[jira] Commented: (LUCENE-2026) Refactoring of IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789618#action_12789618 ] Jake Mannix commented on LUCENE-2026: - bq. Now we want more speed, and are ready to sacrifice something if needed. bq. You decide to sacrifice new record (in)visibility. No choice, but to hack into IW to allow readers see its hot, fresh innards. Chiming in here: of course you don't *need* to hack into the IW to do this (i.e. there is a choice). Zoie is a completely user-land solution which modifies no IW/IR internals and yet achieves millisecond index-to-query-visibility turnaround while keeping speedy indexing and query performance. It just keeps the RAMDir outside, encapsulated in an object (an IndexingSystem) which has IndexReaders built off of both the RAMDir and the FSDir, and hides the implementation details (in fact the IW itself) from the user. The API for this kind of thing doesn't *have* to be tightly coupled, and I would agree with you that it shouldn't be. > Refactoring of IndexWriter > -- > > Key: LUCENE-2026 > URL: https://issues.apache.org/jira/browse/LUCENE-2026 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 3.1 > > > I've been thinking for a while about refactoring the IndexWriter into > two main components. > One could be called a SegmentWriter and as the > name says its job would be to write one particular index segment. The > default one just as today will provide methods to add documents and > flushes when its buffer is full. > Other SegmentWriter implementations would do things like e.g. appending or > copying external segments [what addIndexes*() currently does]. > The second component's job would be to manage writing the segments > file and merging/deleting segments. It would know about > DeletionPolicy, MergePolicy and MergeScheduler. 
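The Zoie arrangement described above - a hot RAM buffer and the on-disk index hidden behind one object, with readers built off both - can be sketched roughly as below. The class and method names here are illustrative stand-ins, not Zoie's actual API; plain lists of strings stand in for the RAMDir and FSDir segments.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the hot RAM buffer and the on-disk index are both
// hidden behind one object, and searches see a merged view of the two. The
// caller never touches the IndexWriter directly.
public class IndexingSystem {
    private final List<String> diskIndex = new ArrayList<>(); // stands in for the FSDir segments
    private final List<String> ramIndex  = new ArrayList<>(); // stands in for the hot RAMDir

    // New documents go into the RAM buffer and are searchable immediately.
    public synchronized void addDocument(String doc) {
        ramIndex.add(doc);
    }

    // Periodically the RAM buffer is folded into the disk index
    // (Zoie does the real version of this with addIndexesNoOptimize).
    public synchronized void flush() {
        diskIndex.addAll(ramIndex);
        ramIndex.clear();
    }

    // A "reader" built off both halves: fresh docs are visible without
    // exposing any writer internals to the caller.
    public synchronized List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String d : diskIndex) if (d.contains(term)) hits.add(d);
        for (String d : ramIndex)  if (d.contains(term)) hits.add(d);
        return hits;
    }

    public static void main(String[] args) {
        IndexingSystem s = new IndexingSystem();
        s.addDocument("hello world");
        s.flush();
        s.addDocument("hello again");
        System.out.println(s.search("hello").size()); // 2: one disk hit, one RAM hit
    }
}
```

The point of the shape is the encapsulation: freshness comes from searching the RAM half, while the flush cadence to disk is an internal detail.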
Ideally it would > provide hooks that allow users to manage external data structures and > keep them in sync with Lucene's data during segment merges. > API wise there are things we have to figure out, such as where the > updateDocument() method would fit in, because its deletion part > affects all segments, whereas the new document is only being added to > the new segment. > Of course these should be lower level APIs for things like parallel > indexing and related use cases. That's why we should still provide > easy to use APIs like today for people who don't need to care about > per-segment ops during indexing. So the current IndexWriter could > probably keep most of its APIs and delegate to the new classes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2143) Understand why NRT performance is affected by flush frequency
[ https://issues.apache.org/jira/browse/LUCENE-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788871#action_12788871 ] Jake Mannix commented on LUCENE-2143: - So why, also, are you only picking one indexing rate? Should you not also be varying this from low, all the way up to saturating the system with maximum indexing rate, to properly stress the system? > Understand why NRT performance is affected by flush frequency > - > > Key: LUCENE-2143 > URL: https://issues.apache.org/jira/browse/LUCENE-2143 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.1 > > > In LUCENE-2061 (perf tests for NRT), I test NRT performance by first > getting a baseline QPS with only searching, using enough threads to > saturate. > Then, I pick an indexing rate (I used 100 docs/sec), and index docs at > that rate, and I also reopen a NRT reader at different frequencies > (10/sec, 1/sec, every 5 seconds, etc.), and then again test QPS > (saturated). > I think this is a good approach for testing NRT -- apps can see, as a > function of "freshness" and at a fixed indexing rate, what the cost is > to QPS. You'd expect as index rate goes up, and freshness goes up, > QPS will go down. > But I found something very strange: the low frequency reopen rates > often caused a highish hit to QPS. When I forced IW to flush every > 100 docs (= once per second), the performance was generally much > better. > I actually would've expected the reverse -- flushing in batch ought to > use fewer resources. > One theory is something odd about my test env (based on OpenSolaris), > so I'd like to retest on a more mainstream OS. > I'm opening this issue to get to the bottom of it...
[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR
[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787647#action_12787647 ] Jake Mannix commented on LUCENE-1377: - bq. A solr developer adds something to solr w/o putting it in lucene - only solr users benefit. bq. A lucene developer adds something to lucene w/o adding support for that in solr - only lucene users benefit. I agree with the first, but not the second: solr users benefit on the next release cycle of solr which includes the lucene changes. Your statements would only both hold if there were no dependency of Solr on Lucene. > Add HTMLStripReader and WordDelimiterFilter from SOLR > - > > Key: LUCENE-1377 > URL: https://issues.apache.org/jira/browse/LUCENE-1377 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.3.2 >Reporter: Jason Rutherglen >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > SOLR has two classes HTMLStripReader and WordDelimiterFilter which are very > useful for a wide variety of use cases. It would be good to place them into > core Lucene.
[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR
[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787605#action_12787605 ] Jake Mannix commented on LUCENE-1377: - bq. But we've been unable to make that happen in the past, so betting on it going forward seems like a mistake. Sure, but keeping code which could be used by any Lucene project sequestered in Solr because you want to be able to modify it on a different time-schedule seems like bad open-source policy. I guess what I'd change my statement to is that yes, it would be a downside to the Solr community (anytime you depend on another project, you also depend on its release schedule). But I'd still hold that this fact alone should not keep code *out* of Lucene-core. Anything which isn't Solr-specific should belong deeper in, to allow use by more people, otherwise only Solr users get the benefit.
[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR
[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787595#action_12787595 ] Jake Mannix commented on LUCENE-1377: - bq. There are some downsides for Solr: if these are in Lucene, we (solr) lose the ability to easily add new features or fix bugs... mainly because Solr doesn't use Lucene trunk anymore. Who is this "we" of which you speak? If someone using Solr needs something changed in Lucene, a patch submitted to the Lucene JIRA should be no harder to get committed than a patch submitted to the Solr JIRA, no? The release cycle may be slower, but yes, as Earwin says, maybe this just gives another reason for more rapid minor version iteration in Lucene.
[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777068#action_12777068 ] Jake Mannix commented on LUCENE-1526: - bq. OK. It's clear Zoie's design is optimized for insanely fast reopen. That, and maxing out QPS and indexing rate while keeping query latency degradation to a minimum. From trying to turn off the extra deleted check, the latency overhead on a 5M doc index is a difference of queries taking 12-13ms with the extra check turned on, and 10ms without it, and you only really start to notice on the extreme edges (the queries hitting all 5 million docs by way of an actual query (not MatchAllDocs)), when your performance goes from maybe 100ms to 140-150ms. bq. EG what I'd love to see is, as a function of reopen rate, the "curve" of QPS vs docs per sec. Ie, if you reopen 1X per second, that consumes some of your machine's resources. What's left can be spent indexing or searching or both, so, it's a curve/line. So we should set up fixed rate indexing, and then redline the QPS to see what's possible, and do this for multiple indexing rates, and for multiple reopen rates. Yes, that curve would be a very useful benchmark. Now that I think of it, it wouldn't be too hard to just sneak some reader caching into the ZoieSystem with a tunable parameter for how long you hang onto it, so that we could see how much that can help. 
One of the nice things we could do in Zoie with this kind of index-latency backoff: because we have an in-memory two-way mapping of zoie-specific UID to docId, if we actually have time (in the background, since we're caching these readers now) we can zip through and update the real delete BitVectors on the segments and lose the extra check at query time, only falling back to that check when the index-latency setting is below some threshold (determined by how long it takes the system to do this resolution - mapping docId to UID is an array lookup, the reverse is a little slower). bq. Right, Zoie is making determined tradeoffs. I would expect that most apps are fine with controlled reopen frequency, ie, they would choose to not lose indexing and searching performance if it means they can "only" reopen, eg, 2X per second. In theory Zoie is making tradeoffs - in practice, at least against what is on trunk, Zoie's just going way faster in both indexing and querying in the redline perf test. I agree that in principle, once LUCENE-1313 and other improvements and bugs have been worked out of NRT, query performance should be faster, and if zoie's default BalancedMergePolicy (nee ZoieMergePolicy) is in use for NRT, the indexing performance should be faster too - it's just not quite there yet at this point. bq. I agree - having such well defined API semantics ("once updateDoc returns, searches can see it") is wonderful. But I think they can be cleanly built on top of Lucene NRT as it is today, with a pre-determined (reopen) latency. Of course! These api semantics are already built up on top of plain-old Lucene - even without NRT, so I can't imagine how NRT would *remove* this ability! :) bq. I think the "large merge just finished" case is the most costly for such apps (which the "merged segment warmer" on IW should take care of)? (Because otherwise the segments are tiny, assuming everything is cutover to "per segment"). Definitely. 
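The two-way UID/docId mapping mentioned above - an array lookup in one direction, a somewhat slower map lookup in the other - could be sketched like this. The names are illustrative, not Zoie's actual classes.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an in-memory two-way UID <-> docId mapping for one segment.
// docId -> UID is a plain array read, O(1); UID -> docId goes through a hash
// map, which is why the reverse direction is "a little slower".
public class UidMapper {
    private final long[] uidByDocId;                               // docId -> UID
    private final Map<Long, Integer> docIdByUid = new HashMap<>(); // UID -> docId

    public UidMapper(long[] uids) {
        this.uidByDocId = uids;
        for (int docId = 0; docId < uids.length; docId++) {
            docIdByUid.put(uids[docId], docId);
        }
    }

    public long uid(int docId) { return uidByDocId[docId]; }

    // Returns -1 if the UID is not present in this segment.
    public int docId(long uid) { return docIdByUid.getOrDefault(uid, -1); }

    public static void main(String[] args) {
        UidMapper m = new UidMapper(new long[]{100L, 200L, 300L});
        System.out.println(m.uid(1) + " " + m.docId(300L)); // 200 2
    }
}
```

With this in hand, resolving a pending UID delete down to a concrete docId (so the segment's real delete BitVector can be updated in the background) is just the map lookup.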
One thing that Zoie benefited from, from an API standpoint, which might be nice in Lucene now that 1.5 is in place, is that the IndexReaderWarmer could replace the raw SegmentReader with a warmed, user-specified subclass of SegmentReader: {code}
public abstract class IndexReaderWarmer<T extends IndexReader> {
  public abstract T warm(IndexReader rawReader);
}
{code} Which could replace the reader in the readerPool with the possibly-user-overridden subclass of SegmentReader (now that SegmentReader is as public as IndexReader itself is) which has now been warmed. For users who like to decorate their readers to keep additional state, instead of using them as keys into WeakHashMaps kept separate, this could be extremely useful (I know that the people I talked to at Apple's iTunes store do this, as well as in bobo, and zoie, to name a few examples off the top of my head). bq. I think Lucene could handle this well, if we made an IndexReader impl that directly searches DocumentWriter's RAM buffer. But that's somewhat challenging Jason mentioned this approach in his talk at ApacheCon, but I'm not at all convinced it's necessary - if a single box can handle indexing a couple hundred smallish documents a second (into a RAMDirectory), and could be sped up by using multiple IndexWriters (writing into multiple RAMDirectories in parallel, if you were willing to give up some CPU cores to indexing), and you can search against them without having to do any deduplication / bloomfilter check against th
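The generic warmer pattern being proposed can be illustrated with plain stand-in classes; RawReader here plays the role of SegmentReader, and none of these names are real Lucene API. The payoff is that per-reader state (caches, facet arrays) lives on the reader subclass itself rather than in an external WeakHashMap keyed by reader.

```java
// Illustrative stand-ins only. "RawReader" plays the role of SegmentReader;
// the generic warmer returns a user subclass carrying extra warmed-up state.
class RawReader {
    int maxDoc() { return 1000; }
}

abstract class ReaderWarmer<T extends RawReader> {
    abstract T warm(RawReader raw);
}

// A decorated reader holding per-reader state (e.g. a field cache), filled
// eagerly at warm time so the first search doesn't pay for it.
class CachingReader extends RawReader {
    final int[] cachedValues;
    CachingReader(RawReader raw) {
        this.cachedValues = new int[raw.maxDoc()]; // "warming" work happens here
    }
}

public class WarmerDemo {
    public static void main(String[] args) {
        ReaderWarmer<CachingReader> warmer = new ReaderWarmer<CachingReader>() {
            CachingReader warm(RawReader raw) { return new CachingReader(raw); }
        };
        CachingReader warmed = warmer.warm(new RawReader());
        System.out.println(warmed.cachedValues.length); // 1000
    }
}
```

A reader pool that called such a warmer after each merge could then hand searches the decorated subclass directly.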
[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776568#action_12776568 ] Jake Mannix commented on LUCENE-1526: - bq. These sound serious - if you can provide any details, that'd help. I'll do some stress testing too. Thanks for testing and reporting. Out of these, the specific issue of incorrectness of applied deletes is easiest to see - we saw it by indexing up to a million docs, then continuing to add docs, but only after first deleting on the UID, where the UID, instead of increasing, wraps around mod 1 million. Calling numDocs (not maxDoc) on the reader with Zoie always returns 1M after looping around, but with NRT, it starts slowly growing above 1M. The CPU and memory are undoubtedly due to the constant reopening of these readers, and yes, could be alleviated by not doing this - we're just comparing to the zoie case, where we *do* reopen (the RAMDir) on every request (and copy the delSet) if there have been modifications since the last update. bq. Lucene NRT makes a clone of the BitVector for every reader that has new deletions. Once this is done, searching is "normal" - it's as if the reader were a disk reader. There's no extra checking of deleted docs (unlike Zoie), no OR'ing of 2 BitVectors, etc. Ok, so if this is copy-on-write, it's done every time there is a new delete for that segment? If the disk index is optimized that means it would happen on every update, a clone of the full numDocs sized BitVector? I'm still a little unsure of how this happens. * somebody calls getReader() - they've got all the SegmentReaders for the disk segments, and each of them have BitVectors for deletions. * IW.update() gets called - the BitVector for the segment which now has a deletion is cloned, and set on a new pooled SegmentReader as its deletedSet * maybe IW.update() gets called a bunch more - do these modify the pooled but as-yet-unused SegmentReader? New readers in the pool? What? 
* another call to getReader() comes in, and gets an IndexReader wrapping the pooled SegmentReaders. bq. Yes, this makes Lucene's reopen more costly. But, then there's no double checking for deletions. That's the tradeoff, and this is why the 64 msec is added to Zoie's search time. Zoie's searches are slower. So we re-ran some of our tests last night, commenting out our deleted check to measure its cost in the most extreme case possible: a dead easy query (in that it's only one term), but one which yes, hits the entire index (doing a MatchAllDocs query is actually special-cased in our code, and is perfectly fast, so not a good worst case to check), and as the index grows up above a million documents, zoie could shave somewhere from 22-28% of its time off by not doing the extra check. We haven't re-run the test to see what happens as the index grows to 5M or 10M yet, but I can probably run that later today. bq. The fact that Zoie on the pure indexing case (ie no deletions) was 10X faster than Lucene is very weird - that means something else is up, besides how deletions are carried in RAM. It's entirely possible it's the fact that Lucene doesn't flush the tiny segments to a RAMDir (which LUCENE-1313 addresses). Yeah, if you call getReader() a bunch of times per second, each one does a flush(true,true,true), right? Without having LUCENE-1313, this kills the indexing performance if querying is going on. If no getReader() is being called at all, Zoie is about 10% slower than pure Lucene IndexWriter.add() (that's the cost of doing it in two steps - index into *two* RAMDirs [so they are hot-swappable] and then writing segments to disk with addIndexesNoOptimize() periodically). I don't think there's any difference in the MergePolicy - I think they're both using the BalancedMergePolicy (since that's the one which is optimized for the realtime case). bq. 
Actually I thought of a simple way to run the "search only" (not reopen) test - I'll just augment TopScoreDocCollector to optionally check the IntSetAccelerator, and measure the cost in practice, for different numbers of docs added to the IntSet. Due to the bloomfilter living on top of the hashSet, at least at the scales we're dealing with, we didn't see any change in cost due to the number of deletions (zoie by default keeps no more than 10k modifications in memory before flushing to disk, so the biggest the delSet is going to be is that, and we don't see the more-than-constant scaling yet at that size). bq. But your test is missing a dimension: frequency of reopen. If you reopen once per second, how do Zoie/Lucene compare? Twice per second? Once every 5 seconds? Etc. Yep, this is true. It's a little more invasive to put this into Zoie, because the reopen time is so fast that there's no pooling, so it would need to be kinda hacked in, or tacked on to the outside. Not rocket science, but not
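The copy-on-write deletes being discussed in this exchange - clone the vector when new deletions arrive, so pooled point-in-time readers are unaffected - can be sketched with java.util.BitSet standing in for Lucene's BitVector. This is illustrative only: for simplicity it clones on every delete, whereas a real implementation clones at most once per reader generation.

```java
import java.util.BitSet;

// Copy-on-write deleted-docs sketch. A pooled reader holds a snapshot of the
// current vector; a new delete clones the vector and mutates the clone, so
// every existing snapshot keeps its point-in-time view.
public class CowDeletes {
    private BitSet deletes;

    public CowDeletes(int maxDoc) { this.deletes = new BitSet(maxDoc); }

    // Snapshot handed to a pooled reader: shares the current vector, no copy.
    public BitSet snapshot() { return deletes; }

    // Copy-on-write: clone first, set the bit on the clone, then swap it in.
    // (Cloning per delete is the simplification; real code would clone once
    // per reader generation, which is exactly the cost being debated above.)
    public void delete(int docId) {
        BitSet clone = (BitSet) deletes.clone();
        clone.set(docId);
        deletes = clone;
    }

    public static void main(String[] args) {
        CowDeletes d = new CowDeletes(8);
        BitSet snapshot = d.snapshot();
        d.delete(3);
        System.out.println(snapshot.get(3) + " " + d.snapshot().get(3)); // false true
    }
}
```

The search-time benefit is that a reader checks exactly one BitSet, with no OR against a side set of new deletes; the cost is the full-vector copy whenever a segment picks up fresh deletions.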
[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776329#action_12776329 ] Jake Mannix commented on LUCENE-1526: - bq. Whoa, pretty insane volume. Aiming for maxing out indexing speed and query throughput at the same time is what we're testing here, and this is a reasonable extreme limit to aim for when stress-testing real-time search. bq. A handful by pooling the BitVector fixed size bytes arrays (see LUCENE-1574). Pooling, you say? But what if updates come in too fast to reuse your pool? If you're indexing at the speeds I'm describing, won't you run out of BitVectors in the pool? bq. I really need a solution that will absolutely not affect query performance from what is today "You" really need this? Why is the core case for real-time search a scenario where taking a hit of a huge reduction in throughput worth a possible gain in query latency? If the cost was 20% query latency drop in exchange for 7x throughput cost when doing heavy indexing, is that worth it? What about 10% latency cost vs 2x throughput loss? These questions aren't easily answered by saying real-time search with Lucene needs to _absolutely not affect query performance from what it is today_. These kinds of absolute statements should be backed up by comparisons with real performance and load testing. There are many axes of performance to optimize for: * query latency * query throughput * indexing throughput * index freshness (how fast before documents are visible) Saying that one of these is absolutely of more importance than the others without real metrics showing which ones are affected in which ways by different implementation choices is doing a disservice to the community, and is not by any means "conservative". 
> For near real-time search, use paged copy-on-write BitVector impl > - > > Key: LUCENE-1526 > URL: https://issues.apache.org/jira/browse/LUCENE-1526 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Minor > Attachments: LUCENE-1526.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > SegmentReader currently uses a BitVector to represent deleted docs. > When performing rapid clone (see LUCENE-1314) and delete operations, > performing a copy on write of the BitVector can become costly because > the entire underlying byte array must be created and copied. A way to > make this clone delete process faster is to implement tombstones, a > term coined by Marvin Humphrey. Tombstones represent new deletions > plus the incremental deletions from previously reopened readers in > the current reader. > The proposed implementation of tombstones is to accumulate deletions > into an int array represented as a DocIdSet. With LUCENE-1476, > SegmentTermDocs iterates over deleted docs using a DocIdSet rather > than accessing the BitVector by calling get. This allows a BitVector > and a set of tombstones to by ANDed together as the current reader's > delete docs. > A tombstone merge policy needs to be defined to determine when to > merge tombstone DocIdSets into a new deleted docs BitVector as too > many tombstones would eventually be detrimental to performance. A > probable implementation will merge tombstones based on the number of > tombstones and the total number of documents in the tombstones. The > merge policy may be set in the clone/reopen methods or on the > IndexReader.
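The tombstone scheme in the issue description could be sketched like this; the names are illustrative, with java.util.BitSet standing in for BitVector and a TreeSet standing in for the sorted int-array DocIdSet, and the merge policy reduced to a simple count threshold.

```java
import java.util.BitSet;
import java.util.TreeSet;

// Tombstone sketch: new deletions accumulate in a small sorted set; a doc is
// deleted if it is in either the base vector or the tombstones; past a
// threshold the tombstones are folded down into a fresh base vector.
public class TombstoneDeletes {
    private BitSet base;                                   // existing deleted docs
    private final TreeSet<Integer> tombstones = new TreeSet<>();
    private final int mergeThreshold;                      // crude merge policy

    public TombstoneDeletes(int maxDoc, int mergeThreshold) {
        this.base = new BitSet(maxDoc);
        this.mergeThreshold = mergeThreshold;
    }

    // Deleting is cheap: just record a tombstone; no full-vector copy.
    public void delete(int docId) {
        tombstones.add(docId);
        if (tombstones.size() >= mergeThreshold) merge();
    }

    // The per-hit check a reader does: base OR tombstones.
    public boolean isDeleted(int docId) {
        return base.get(docId) || tombstones.contains(docId);
    }

    // Fold tombstones into a cloned base vector (copy-on-write, so readers
    // holding the old vector keep their point-in-time view).
    private void merge() {
        BitSet merged = (BitSet) base.clone();
        for (int docId : tombstones) merged.set(docId);
        base = merged;
        tombstones.clear();
    }

    public static void main(String[] args) {
        TombstoneDeletes t = new TombstoneDeletes(100, 3);
        t.delete(5);
        t.delete(6);
        System.out.println(t.isDeleted(5)); // true, via the tombstone set
    }
}
```

The tradeoff is visible in isDeleted(): deletes stop paying for full-vector clones, but every hit pays a second membership test until the next merge.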
[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776294#action_12776294 ] Jake Mannix commented on LUCENE-1526: - bq. Zoie must do the IntSet check plus the BitVector check (done by Lucene), right? Yes, so how does Lucene NRT deal with new deletes? The disk-backed IndexReader still does its internal check for deletions, right? I haven't played with the latest patches on LUCENE-1313, so I'm not sure what has changed, but if you're leaving the disk index alone (to preserve point-in-time status of the index without writing to disk all the time), you've got your in-memory BitVector of newly uncommitted deletes, and then the SegmentReaders from the disk have their own internal deletedDocs BitVector. Are these two OR'ed with each other somewhere? What is done in NRT to minimize the time of checking both of these without modifying the read-only SegmentReader? In the current 2.9.0 code, the segment is reloaded completely on getReader() if there are new add/deletes, right? bq. Ie comparing IntSet lookup vs BitVector lookup isn't the comparison you want to do. You should compare the IntSet lookup (Zoie's added cost) to 0. If you've got that technique for resolving new deletes against the disk-based ones while maintaining point-in-time nature and can completely amortize the reopen cost so that it doesn't affect performance, then yeah, that would be the comparison. I'm not sure I understand how the NRT implementation is doing this currently - I tried to step through the debugger while running the TestIndexWriterReader test, but I'm still not quite sure what is going on during the reopen. bq. So, for a query that hits 5M docs, Zoie will take 64 msec longer than Lucene, due to the extra check. What I'd like to know is what pctg. slowdown that works out to be, eg for a simple TermQuery that hits those 5M results - that's Zoie's worst case search slowdown. 
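As a sanity check on those numbers (an illustration, not a measurement): 64 msec spread over 5M hits works out to roughly 13 nanoseconds per extra deleted-doc test, which is plausible for a hash-set membership check.

```java
// Back-of-envelope arithmetic for the extra per-hit deleted check.
// The 12.8 ns/check figure is inferred from the 64 msec / 5M docs quoted
// above, not measured.
public class ExtraCheckCost {
    public static double extraMillis(long docsHit, double nanosPerCheck) {
        return docsHit * nanosPerCheck / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f ms%n", extraMillis(5_000_000L, 12.8)); // 64.0 ms
    }
}
```

The same arithmetic says a query touching only 50k docs would pay well under a millisecond, which is why the overhead only shows on queries that hit most of the index.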
Yes, this would be a good check to see - it is still a micro-benchmark, really, since it would be done in isolation, with no other production tasks going on (like rapid indexing and the consequent flushes to disk and reader reopening) - but it would be useful to see. What would be even better, however, would be to have a running system whereby there is continual updating of the index, and many concurrent requests are coming in which hit all 5M documents, and measure the mean latency for zoie in this case, in both comparison to NRT, and in comparison to lucene when you *don't* reopen the index (ie. you do things the pre-lucene2.9 way, where the CPU is still being consumed by indexing, but the reader is out of date until the next time it's scheduled by the application to reopen). This would measure the effective latency and throughput costs of zoie and NRT vs non-NRT lucene. I'm not really sure it's terribly helpful to see "what is zoie's latency when you're not indexing at all" - why on earth would you use either NRT or zoie if you're not doing lots of indexing?
[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776277#action_12776277 ] Jake Mannix commented on LUCENE-1526: - But Jason, if you're indexing 300 documents a second (possibly all of which are delete+re-add), and querying at a thousand queries a second, how many of these BitVectors are you going to end up making?
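Some rough arithmetic behind the how-many-BitVectors question above (illustrative, assuming a full clone per update against a single 5M-doc segment): each clone copies 5M/8 = 625 KB, so 300 update-triggered clones per second is on the order of 187 MB/sec of allocation and copying.

```java
// Back-of-envelope cost of full BitVector cloning under rapid updates.
// Assumes one full clone per update against one 5M-doc segment; real
// behavior depends on how often a segment actually picks up new deletes.
public class CloneCost {
    public static long bytesPerClone(long maxDoc) {
        return maxDoc / 8; // one bit per doc
    }

    public static long bytesPerSecond(long maxDoc, int updatesPerSec) {
        return bytesPerClone(maxDoc) * updatesPerSec;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerClone(5_000_000L));       // 625000 (~625 KB per clone)
        System.out.println(bytesPerSecond(5_000_000L, 300)); // 187500000 (~187 MB/s)
    }
}
```

Numbers like these are what motivate both the tombstone proposal (avoid the clone) and pooling (reuse the backing arrays).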
[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775151#action_12775151 ] Jake Mannix commented on LUCENE-1526: - bq. But how many msec does this clone add in practice? Note that it's only done if there is a new deletion against that segment. I do agree it's silly wasteful, but searching should then be faster than using AccelerateIntSet or MultiBitSet. It's a tradeoff of the turnaround time for search perf. I actually don't know for sure if this is the majority of the time, as I haven't actually run either the AcceleratedIntSet or 2.9 NRT through a profiler, but if you're indexing at high speed (which is what is done in our load/perf tests), you're going to be cloning these things hundreds of times per second (look at the indexing throughput we're forcing the system to go through), and even if it's fast, that's costly. bq. I'd love to see how the worst-case queries (matching millions of hits) perform with each of these three options. It's pretty easy to change the index and query files in our test to do that, that's a good idea. You can feel free to check out our load testing framework too - it will let you monkey with various parameters, monitor the whole thing via JMX, and so forth, both for the full zoie-based stuff, and where the zoie api is wrapped purely around Lucene 2.9 NRT. The instructions for how to set it up are on the zoie wiki. bq. When a doc needs to be updated, you index it immediately into the RAMDir, and reopen the RAMDir's IndexReader. You add its UID to the AcceleratedIntSet, and all searches are "and NOT"'d against that set. You don't tell Lucene to delete the old doc, yet. Yep, basically. The IntSetAccelerator (of UIDs) is set on the (long lived) IndexReader for the disk index - this is why it's done as a ThreadLocal - everybody is sharing that IndexReader, but different threads have different point-in-time views of how much of it has been deleted. bq. 
These are great results! If I'm reading them right, it looks like you generally get faster query throughput, and roughly equal indexing throughput, on upgrading from 2.4 to 2.9? That's about right. Of course, the comparison between zoie with either 2.4 or 2.9 against lucene 2.9 NRT is the important one to look at: zoie is pushing about 7-9x better throughput for both queries and indexing than NRT. I'm sure the performance numbers would change if we relaxed the realtime requirement - yes, that's one of the many dimensions to consider here (along with the percentage of indexing events which are deletes, how many of those are from really old segments vs. newer ones, how big the queries are, etc...). bq. One optimization you could make with Zoie is, if a real-time deletion (from the AcceleratedIntSet) is in fact hit, it could mark the corresponding docID, to make subsequent searches a bit faster (and save the bg CPU when flushing the deletes to Lucene). That sounds interesting - how would that work? We don't really touch the disk indexReader, other than to set this modSet on it in the ThreadLocal, so where would this mark live? > For near real-time search, use paged copy-on-write BitVector impl > - > > Key: LUCENE-1526 > URL: https://issues.apache.org/jira/browse/LUCENE-1526 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Minor > Attachments: LUCENE-1526.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > SegmentReader currently uses a BitVector to represent deleted docs. > When performing rapid clone (see LUCENE-1314) and delete operations, > performing a copy on write of the BitVector can become costly because > the entire underlying byte array must be created and copied. A way to > make this clone-delete process faster is to implement tombstones, a > term coined by Marvin Humphrey. 
Tombstones represent new deletions > plus the incremental deletions from previously reopened readers in > the current reader. > The proposed implementation of tombstones is to accumulate deletions > into an int array represented as a DocIdSet. With LUCENE-1476, > SegmentTermDocs iterates over deleted docs using a DocIdSet rather > than accessing the BitVector by calling get. This allows a BitVector > and a set of tombstones to be ANDed together as the current reader's > delete docs. > A tombstone merge policy needs to be defined to determine when to > merge tombstone DocIdSets into a new deleted docs BitVector as too > many tombstones would eventually be detrimental to performance. A > probable implementation will merge tombstones based on the number of > tombstones and the total number of documents in the tombstones. The > merge policy may be set in the clone/reopen methods or on the > IndexReader.
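The per-thread point-in-time delete view described in the comment above can be sketched roughly as follows. This is a minimal illustration of the ThreadLocal idea only; the class and method names are hypothetical and do not correspond to Zoie's actual API:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: one long-lived disk IndexReader is shared by all searcher
// threads, while each thread carries its own point-in-time snapshot of
// "UIDs deleted since the last flush" in a ThreadLocal. Hits from the
// disk index are "AND NOT"-ed against that snapshot at search time.
public class PointInTimeDeletes {
    private final ThreadLocal<Set<Long>> deletedUids =
        ThreadLocal.withInitial(HashSet::new);

    // Called when a document is updated: the new version is indexed into
    // the RAMDir, and the old UID is hidden from this thread's disk view.
    public void markDeleted(long uid) {
        deletedUids.get().add(uid);
    }

    // Search-time visibility check for a disk-index hit.
    public boolean isVisible(long uid) {
        return !deletedUids.get().contains(uid);
    }

    // Batch flush: the deletes have been applied to the disk index through
    // the regular APIs, so the in-memory snapshot is emptied.
    public void onFlush() {
        deletedUids.get().clear();
    }
}
```

The point of the ThreadLocal is that no locking is needed on the shared reader: each thread's view of "how much of the index has been deleted" advances independently until the next batch boundary.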
[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775082#action_12775082 ] Jake Mannix commented on LUCENE-1526: - I'll try to get those numbers for you; they should be in our logs, and if not, it's easy enough to add them. My guess is that if zoie is doing 7x the QPS, the latency is significantly *less* than the NRT latency, not more, but I could be wrong. Note that without concurrent indexing, the query speed will be the same, as all docs will be flushed to disk and the isDeleted check reduces to exactly the raw lucene case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774693#action_12774693 ] Jake Mannix commented on LUCENE-1526: - bq. But, I agree it's wasteful of space when deletes are so sparse... though it is fast. It's fast for random access, but it's really slow if you need to make a lot of these (either during heavy indexing if copy-on-write, or during heavy query load if copy-on-reopen). bq. So are you using this, only, as your deleted docs? Ie you don't store the deletions with Lucene? I'm getting confused if this is only for the NRT case, or, in general. These are only to augment the deleted docs *of the disk reader* - the disk reader isn't reopened at all except infrequently. Once a batch (a big enough RAMDirectory is filled, or enough time goes by, depending on configuration) is ready to be flushed to disk, diskReader.addIndexes is called, and when the diskReader is reopened, the deletes live in the normal diskReader's delete set. Before that flush, while there is a batch in RAM, the IntSetAccelerator is applied to the not-reopened diskReader. It's a copy-on-read ThreadLocal. So I'm not sure if that described it correctly: only the deletes which should have been applied to the diskReader are treated separately - those are basically batched: for T amount of time or D amount of docs (configurable), whichever comes first, they are applied to the diskReader, which knows about Lucene's regular deletions and now these new ones as well. Once the memory is flushed to disk, the in-memory delSet is emptied and applied to the diskReader using regular APIs before reopening. bq. OK, I think I'm catching up here... so you only open a new reader at the batch boundary, right? Ie, a batch update (all its adds & deletes) is atomic from the reader's standpoint? Yes - the disk reader, you mean, right? That is only reopened at the batch boundary. bq. 
OK so a batch is quickly reopened, using bloom filter + int set for fast "contains" check for the deletions that occurred during that batch (and, custom TermDocs that does the "and not deleted"). This gets you your fast turnaround and decent search performance. The reopening isn't that quick, but it's in the background - or are you talking about the RAMDirectory? Yeah, that is reopened per query (if necessary - if there are no changes, of course no reopen), but it is kept very small (10k docs or less, for example). It's actually pretty fantastic performance - check out the zoie perf pages: http://code.google.com/p/zoie/wiki/Performance_Comparisons_for_ZoieLucene24ZoieLucene29LuceneNRT
[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774628#action_12774628 ] Jake Mannix commented on LUCENE-1526: - bq. Another approach we might take here is to track new deletions using a sorted tree. New deletions insert in O(log(N)) time, and then we can efficiently expose a DocIdSetIterator from that. We did this in Zoie for a while, and it turned out to be a bottleneck - not as much of a bottleneck as continually cloning a bit vector (that was even worse), but still not good. We currently use a bloom filter on top of an open int set, which performs pretty fantastically: constant-time adds and even faster constant-time contains() checks, with small size (necessary for the new-Reader-per-query scenario, since this requires lots of deep cloning of this structure). Just a note from our experience over in zoie-land. It also helped to not produce a DocIdSet iterator using these bits, but instead to override the TermDocs returned on the disk reader, and keep track of it directly there.
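The "bloom filter on top of an open int set" idea from the comment above can be sketched like this. This is a simplified illustration, not Zoie's actual classes: the point is that a contains() check on a non-deleted doc (the overwhelmingly common case) is answered by a couple of bit tests, and only bloom "maybe" hits fall through to the exact set:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: a small bloom filter fronting an exact int set. Negative
// contains() answers never touch the set; positive bloom answers are
// confirmed against it, so the structure is exact overall.
public class BloomIntSet {
    private final long[] bits = new long[1024];        // 65536-bit filter
    private final Set<Integer> exact = new HashSet<>();

    // Two cheap hash functions mapping an int to a 16-bit bucket.
    private int h1(int v) { return (v * 0x9E3779B9) >>> 16; }
    private int h2(int v) { return (v * 0x85EBCA6B) >>> 16; }

    public void add(int v) {
        setBit(h1(v));
        setBit(h2(v));
        exact.add(v);
    }

    public boolean contains(int v) {
        // Fast path: if either bit is clear, v was definitely never added.
        if (!getBit(h1(v)) || !getBit(h2(v))) return false;
        // Slow path: bloom says "maybe"; consult the exact set.
        return exact.contains(v);
    }

    private void setBit(int i) { bits[i >>> 6] |= 1L << (i & 63); }
    private boolean getBit(int i) { return (bits[i >>> 6] & (1L << (i & 63))) != 0; }
}
```

Deep-cloning this per reopened reader is cheap because the filter is a small fixed-size long array and the exact set holds only the deletes accumulated since the last flush.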
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773114#action_12773114 ] Jake Mannix commented on LUCENE-1997: - bq. Since each approach has distinct advantages, why not offer both ("simple" and "expert") comparator extension APIs? +1 from me on this one, as long as the simpler one is around. I'll bet we'll find that we regret keeping the "expert" one by 3.2 or so, but I'll take any compromise which gets the simpler API in there. bq. Don't forget that this is multiplied by however many queries are currently in flight. Sure, so if you're running with 100 queries per second on a single shard (pretty fast!), with 100 segments, and you want to sort by value on the top 1000 values (how far down the long tail of extreme cases are we at now? Do librarians hit their search servers with 100 QPS, with indices poorly built with hundreds of segments, and can't take downtime to *ever* optimize?), we're now talking about 40MB. *Forty megabytes*. On a beefy machine which is supposed to be handling 100QPS across an index big enough to need 100 segments. How much heap would such a machine already be allocating? 4GB? 6? More? We're talking about less than 1% of the heap being used by the multiPQ approach in comparison to singlePQ. 
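The "forty megabytes" figure above can be checked with a back-of-envelope calculation. The 4-bytes-per-queue-slot figure is an assumption for illustration (real queue entries carry more state, so treat this as a lower bound on the extra multi-PQ footprint):

```java
// Back-of-envelope memory estimate for the multi-PQ worst case discussed
// above: in-flight queries x segments x top-N slots x assumed bytes/slot.
public class MultiPqMemory {
    public static long extraBytes(int inFlightQueries, int segments,
                                  int topN, int bytesPerEntry) {
        return (long) inFlightQueries * segments * topN * bytesPerEntry;
    }

    public static void main(String[] args) {
        // 100 in-flight queries, 100 segments, top 1000, ~4 bytes per slot.
        long bytes = extraBytes(100, 100, 1000, 4);
        System.out.println(bytes / 1_000_000 + " MB"); // prints "40 MB"
    }
}
```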
> Explore performance of multi-PQ vs single-PQ sorting API > > > Key: LUCENE-1997 > URL: https://issues.apache.org/jira/browse/LUCENE-1997 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.9 >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, > LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, > LUCENE-1997.patch, LUCENE-1997.patch > > > Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev, > where a simpler (non-segment-based) comparator API is proposed that > gathers results into multiple PQs (one per segment) and then merges > them in the end. > I started from John's multi-PQ code and worked it into > contrib/benchmark so that we could run perf tests. Then I generified > the Python script I use for running search benchmarks (in > contrib/benchmark/sortBench.py). > The script first creates indexes with 1M docs (based on > SortableSingleDocSource, and based on wikipedia, if available). Then > it runs various combinations: > * Index with 20 balanced segments vs index with the "normal" log > segment size > * Queries with different numbers of hits (only for wikipedia index) > * Different top N > * Different sorts (by title, for wikipedia, and by random string, > random int, and country for the random index) > For each test, 7 search rounds are run and the best QPS is kept. The > script runs singlePQ then multiPQ, and records the resulting best QPS > for each and produces a table (in Jira format) as output.
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772741#action_12772741 ] Jake Mannix commented on LUCENE-1997: - The current concern is with the memory? I'm more concerned with the weird "java ghosts" that are flying around, sometimes swaying results by 20-40%... the memory could only be an issue on a setup with hundreds of segments, sorting the top 1000 values (do we really try to optimize for this performance case?). In the normal case (no more than tens of segments, and the top 10 or 100 hits), we're talking about what, 100-1000 PQ entries?
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771041#action_12771041 ] Jake Mannix commented on LUCENE-1997: - Can you tell me more about this? What kind of comparator can't pre-create a fixed ordinal list for all the possible values? I'm sure I've seen this too, but I can't bring one to mind right now.
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771017#action_12771017 ] Jake Mannix commented on LUCENE-1997: - But part of the point is that you don't have to get the values - you can have a fast in-memory structure which just encodes their sort order, right? This is the whole point of using the ordinal - you pre-sort all of the possible values to get the ordinals, and now an arbitrarily complex comparator reduces to an int compare at sort time (*). In the custom comparators we use, for example, this allows even sorting by multivalued fields in a custom way, via the simple compare(int doca, int docb) method. (*) This reminds me: Mike, you switched the compare for ord values from "return ordA - ordB" to "return ordA > ordB ? 1 : (ordA == ordB ? 0 : -1)" on the basis of int overflow at some point, right? That is only necessary if we're really sorting by integers, which could overflow - if they're ordinals, then these are both non-negative numbers, and their difference will always be greater than -MAX_INT, so the branching can be avoided in this innermost comparison in this case.
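The overflow point in the footnote above is easy to demonstrate: subtraction gives the wrong sign for arbitrary ints but is always safe when both operands are non-negative ordinals, so the branch-free form can be used there. A minimal sketch:

```java
// Why "return ordA - ordB" is unsafe for arbitrary ints but fine for
// ordinals: two non-negative ints always differ by less than 2^31, so
// the subtraction cannot overflow and its sign is the comparison result.
public class OrdCompare {
    // Branch-free compare; valid ONLY when both arguments are >= 0.
    public static int compareOrds(int ordA, int ordB) {
        return ordA - ordB;
    }

    // General-purpose compare that never overflows, for arbitrary ints.
    public static int compareSafe(int a, int b) {
        return a > b ? 1 : (a == b ? 0 : -1);
    }
}
```

For general ints the subtraction trick fails: `Integer.MIN_VALUE - 1` wraps around to `Integer.MAX_VALUE`, so `compareOrds` would claim `Integer.MIN_VALUE > 1`.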
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771005#action_12771005 ] Jake Mannix commented on LUCENE-1997: - That should speed things up, but it's way subleading in complexity. This is an additive term of O(numSegments * numDesiredResults) total operations when done "slowly" (as opposed to the best merge, which is O(numDesiredResults * log(numSegments))), in comparison to the primary subleading piece for multiPQ, which is O(numSegments * numDesiredResults * log(numDesiredResults) * log(numHitsPerSegment)). So it's taking a piece of the CPU time which is already smaller by a factor of 20-100 than the total PQ insert time, and reducing it by a further factor of maybe 5-10. If it's easy to code up, sure, why not. But it's not really an "inner loop" optimization anymore, I'd argue.
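The relative sizes of the terms in the argument above can be made concrete by plugging in illustrative numbers (the 10-segment / top-100 / 100k-hits values below are assumptions for the sake of the example, not figures from the benchmark):

```java
// Rough operation counts behind the complexity argument above, using
// illustrative values: the naive O(segs * N) merge term is a small
// fraction of the dominant multi-PQ insert term
// O(segs * N * log(N) * log(hitsPerSeg)).
public class OpCounts {
    public static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static double naiveMergeOps(int segs, int topN) {
        return (double) segs * topN;
    }

    public static double pqInsertOps(int segs, int topN, int hitsPerSeg) {
        return segs * topN * log2(topN) * log2(hitsPerSeg);
    }

    public static void main(String[] args) {
        int segs = 10, topN = 100, hitsPerSeg = 100_000;
        double merge = naiveMergeOps(segs, topN);
        double insert = pqInsertOps(segs, topN, hitsPerSeg);
        // The merge term is roughly two orders of magnitude below the
        // insert term for these values, matching the argument above.
        System.out.printf("merge ~%.0f ops, inserts ~%.0f ops, ratio ~%.0f%n",
                merge, insert, insert / merge);
    }
}
```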
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770945#action_12770945 ] Jake Mannix commented on LUCENE-1997: - bq. search segment 2 into queue B (with possible short circuiting by the smallest value in queueA) Well, we're not doing the short-circuit trick on multiPQ right now, are we? It would certainly speed things up, but it requires that the API have the convert() method available, which was the big savings on the API side of multiPQ. If it was available, I think multiPQ (either with N or 2 queues) would perform *strictly* better than singlePQ, but I didn't suggest this because it seems to negate the cleanliness of the API. One thing John mentioned offhand is that perhaps the convert() method could be optional? If you don't implement it, you don't get to short-circuit using knowledge of previous segments, but if you do, you get maximum performance in the cases where multiPQ performs worse (mid-range hitCount, high numResultsToReturn, and in the numeric sorting case). I think maybe combining this idea with 2 queues could be the best of all worlds, with the best overall speed, only twice the memory of singlePQ, and the simplest API with the addition of one new *optional* method?
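The two-queue short-circuit discussed above can be sketched with plain integer scores standing in for comparator values. This is an illustration of the idea only (the names and the use of java.util.PriorityQueue are mine, not a Lucene API): once queue A is full after segment 1, any segment-2 hit that cannot beat A's worst retained value can never appear in the merged top N, so it is skipped without a queue insert:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of two-queue collection with short circuiting: collect segment 1
// into queue A, then collect segment 2 into queue B, skipping hits that
// cannot beat the smallest value retained in the full queue A.
public class TwoQueueCollect {
    public static List<Integer> topN(int n, int[] seg1Scores, int[] seg2Scores) {
        PriorityQueue<Integer> queueA = new PriorityQueue<>(); // min-heap
        for (int s : seg1Scores) {
            queueA.offer(s);
            if (queueA.size() > n) queueA.poll();              // drop worst
        }
        // Worst value queue A kept; nothing below it can reach the top N.
        int floorA = queueA.size() == n ? queueA.peek() : Integer.MIN_VALUE;

        PriorityQueue<Integer> queueB = new PriorityQueue<>();
        for (int s : seg2Scores) {
            if (s <= floorA) continue;  // short circuit: cannot make top N
            queueB.offer(s);
            if (queueB.size() > n) queueB.poll();
        }

        // Merge: keep the n largest values across both queues.
        PriorityQueue<Integer> merged = new PriorityQueue<>();
        for (PriorityQueue<Integer> q : List.of(queueA, queueB)) {
            for (int s : q) {
                merged.offer(s);
                if (merged.size() > n) merged.poll();
            }
        }
        List<Integer> result = new ArrayList<>(merged);
        result.sort(Comparator.reverseOrder());                // best first
        return result;
    }
}
```

With N segments this generalizes by carrying the floor of all previous full queues forward, which is what the optional convert() method would enable across segments with differently-ordered comparator values.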
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770709#action_12770709 ] Jake Mannix commented on LUCENE-1997: - Not terribly useful, in that it's a MacOS X laptop - 2.4 GHz Intel Core 2 Duo, 4GB 667 MHz DDR2 SDRAM, Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304), Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing) - but it's run with the latest patch.
The > script runs singlePQ then multiPQ, and records the resulting best QPS > for each and produces table (in Jira format) as output. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
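The multi-PQ approach described above (one queue per segment, merged at the end) can be sketched in plain Java. This is a hypothetical standalone illustration of the idea only; `MultiPQSketch`, `Hit`, and `topNMultiPQ` are made-up names, not Lucene's actual collector classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical standalone sketch (not Lucene's actual classes) of the
// multi-PQ idea: collect the top N hits of each segment into its own
// priority queue, then merge the per-segment queues at the end instead
// of maintaining one global queue across all segments.
public class MultiPQSketch {

    // A hit pairs a global doc id with its sort value; lower value ranks higher.
    static final class Hit {
        final int doc;
        final int value;
        Hit(int doc, int value) { this.doc = doc; this.value = value; }
    }

    static List<Hit> topNMultiPQ(int[][] segments, int[] docBase, int n) {
        // Max-heap on value, so the worst of the current top-N sits on top.
        Comparator<Hit> worstFirst =
            Comparator.comparingInt((Hit h) -> h.value).reversed();
        List<PriorityQueue<Hit>> perSegment = new ArrayList<>();
        for (int s = 0; s < segments.length; s++) {
            PriorityQueue<Hit> pq = new PriorityQueue<>(worstFirst);
            for (int d = 0; d < segments[s].length; d++) {
                Hit h = new Hit(docBase[s] + d, segments[s][d]);
                if (pq.size() < n) {
                    pq.add(h);
                } else if (h.value < pq.peek().value) { // beats the current worst
                    pq.poll();
                    pq.add(h);
                }
            }
            perSegment.add(pq);
        }
        // Merge step: pool the per-segment candidates and keep the global top N.
        PriorityQueue<Hit> merged = new PriorityQueue<>(worstFirst);
        for (PriorityQueue<Hit> pq : perSegment) {
            for (Hit h : pq) {
                if (merged.size() < n) {
                    merged.add(h);
                } else if (h.value < merged.peek().value) {
                    merged.poll();
                    merged.add(h);
                }
            }
        }
        List<Hit> out = new ArrayList<>(merged);
        out.sort(Comparator.comparingInt(h -> h.value)); // best first
        return out;
    }

    public static void main(String[] args) {
        // Three "segments" with per-doc sort values; docBase maps to global ids.
        int[][] segments = { {5, 1, 9}, {4, 2}, {8, 3, 7, 6} };
        int[] docBase = { 0, 3, 5 };
        for (Hit h : topNMultiPQ(segments, docBase, 3)) {
            System.out.println("doc=" + h.doc + " value=" + h.value);
        }
    }
}
```

Note the per-segment bound check is purely local here: a hit only has to beat the worst entry of *its own* queue, which is exactly the property the merge step later pays for.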
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770708#action_12770708 ]

Jake Mannix commented on LUCENE-1997:

||Source||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|random|balanced||500|rand int|10|22.02|22.99|{color:green}4.4%{color}|
|random|balanced||500|rand int|25|21.45|22.91|{color:green}6.8%{color}|
|random|balanced||500|rand int|50|21.40|22.58|{color:green}5.5%{color}|
|random|balanced||500|rand int|100|21.16|21.93|{color:green}3.6%{color}|
|random|balanced||500|rand int|500|21.27|17.45|{color:red}-18.0%{color}|
|random|balanced||500|rand int|1000|20.62|14.33|{color:red}-30.5%{color}|
|random|balanced||500|rand string|10|18.33|21.91|{color:green}19.5%{color}|
|random|balanced||500|rand string|25|18.37|22.20|{color:green}20.8%{color}|
|random|balanced||500|rand string|50|18.51|22.13|{color:green}19.6%{color}|
|random|balanced||500|rand string|100|20.30|21.91|{color:green}7.9%{color}|
|random|balanced||500|rand string|500|18.34|18.69|{color:green}1.9%{color}|
|random|balanced||500|rand string|1000|17.83|15.04|{color:red}-15.6%{color}|
|random|balanced||500|country|10|18.49|22.16|{color:green}19.8%{color}|
|random|balanced||500|country|25|18.62|22.28|{color:green}19.7%{color}|
|random|balanced||500|country|50|18.30|21.86|{color:green}19.5%{color}|
|random|balanced||500|country|100|20.46|22.08|{color:green}7.9%{color}|
|random|balanced||500|country|500|18.02|18.38|{color:green}2.0%{color}|
|random|balanced||500|country|1000|18.28|14.62|{color:red}-20.0%{color}|
|random|log||500|rand int|10|22.04|23.18|{color:green}5.2%{color}|
|random|log||500|rand int|25|21.48|23.02|{color:green}7.2%{color}|
|random|log||500|rand int|50|21.37|22.96|{color:green}7.4%{color}|
|random|log||500|rand int|100|21.47|22.43|{color:green}4.5%{color}|
|random|log||500|rand int|500|21.09|19.60|{color:red}-7.1%{color}|
|random|log||500|rand int|1000|20.37|17.58|{color:red}-13.7%{color}|
|random|log||500|rand string|10|18.54|22.49|{color:green}21.3%{color}|
|random|log||500|rand string|25|18.50|22.32|{color:green}20.6%{color}|
|random|log||500|rand string|50|18.53|22.24|{color:green}20.0%{color}|
|random|log||500|rand string|100|20.46|22.43|{color:green}9.6%{color}|
|random|log||500|rand string|500|18.59|20.83|{color:green}12.0%{color}|
|random|log||500|rand string|1000|18.27|18.01|{color:red}-1.4%{color}|
|random|log||500|country|10|18.55|22.50|{color:green}21.3%{color}|
|random|log||500|country|25|18.48|22.35|{color:green}20.9%{color}|
|random|log||500|country|50|18.48|22.24|{color:green}20.3%{color}|
|random|log||500|country|100|20.73|22.34|{color:green}7.8%{color}|
|random|log||500|country|500|18.77|20.12|{color:green}7.2%{color}|
|random|log||500|country|1000|17.92|18.44|{color:green}2.9%{color}|
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770604#action_12770604 ]

Jake Mannix commented on LUCENE-1997:

What happened with these ones, Mike?
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770587#action_12770587 ]

Jake Mannix commented on LUCENE-1997:

Excellent, good to see that my big-O analysis is holding up on the 5M doc set: as the sub-leading terms drop off and become negligible, any improvement of singlePQ over multiPQ starts to go away entirely. There still seem to be some statistical fluctuations here and there, though (why would 1000 hits ever have *better* perf for multiPQ vs singlePQ compared to 500 hits?), but I guess that's entropy for you...
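A back-of-the-envelope version of the big-O argument above (my sketch, not from the thread, with N docs scanned, S segments, per-queue size K, and constants ignored):

```latex
\underbrace{O(N \log K)}_{\text{singlePQ: inserts into one queue}}
\quad\text{vs.}\quad
\underbrace{O(N \log K)}_{\text{multiPQ: inserts into the segment's queue}}
\;+\;
\underbrace{O(S\,K \log K)}_{\text{final merge of } S \text{ queues}}
```

As N grows to 5M with S and K held fixed, the merge term \(S\,K \log K\) becomes sub-leading, so both approaches share the same leading cost; the residual differences come down to constant factors, e.g. how often the "beats the queue bottom" test passes, which differs because singlePQ's bottom is globally tight while each multiPQ bottom is only locally tight.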
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769860#action_12769860 ]

Jake Mannix commented on LUCENE-1997:

Mark, you say that with the previous numbers you'd say "-1", but if you look at the most common use case (top 10), the simpler API is faster in almost all cases, and in some cases it's 10-20% faster. Top 500 and top 1000 are not just "not as common"; they're probably at the 1% level, or less.

As far as shifting back, API-wise, that really shouldn't be a factor: 2.9 *just* came out. Do we stick with a slightly *slower* API (for the most common use case across all Lucene users) which happens to be *more complex* and, more importantly, just very nonstandard? Comparable is very familiar to everyone, even if you have to have two forms, one for primitives and one for Objects. An api which *doesn't* have the whole slew of compare(), compareBottom(), copy(), setBottom(), value() and setNextReader() has a tremendous advantage over one which does. It's "advanced" to implement a custom sort, but it will be *easier* if it's not complex, and then it doesn't *need* to be "advanced" (shouldn't we be striving to have fewer APIs which are listed as "advanced", and instead more features which can *do* complex things but are still listed as things "normal users" can do?).

I think it's *great* precedent to set with users to say, "oops! we found that this new (just now as of this version) api was unnecessarily clumsy, we're shifting back to a simpler one which is just like the one you used to have". Sticking with a worse api because it performs better only in extreme scenarios, because "we already moved on to this new api, shouldn't go back now, don't want to admit we ever made a mistake!", is what is "ugly".

The main thing to remember is that the entire thinking around making this api different from the old one was *only* because it seemed that a simpler api would perform much worse than this one, and it does not appear that this is the case. If that original reasoning turns out to have been incorrect, then the answer is simple: go with the simpler API *now*, before users *do* get used to the new one. If it turns out I'm wrong, and lots of users often sort by field values for the top 1000 entries, or the most recent runs turn out to be flukes and not typical performance, only then would I change my opinion.
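To make the API-surface point above concrete, here are two hypothetical interface shapes; the method names deliberately echo the ones listed in the comment, but these are illustrative sketches, not Lucene's actual signatures. The Comparable-style contract asks the user for a single method per custom sort, while the stateful comparator contract also pushes queue bookkeeping onto the user:

```java
// Hypothetical interfaces (illustrative only, not Lucene's real API)
// contrasting the two shapes of custom-sort contract discussed above.
public class ComparatorShapes {

    // Old/simple style: expose one value per doc; ordering comes from Comparable.
    interface SimpleDocComparator<T extends Comparable<T>> {
        T value(int doc);
    }

    // New/advanced style: the user also implements the queue lifecycle.
    interface AdvancedDocComparator {
        int compare(int slot1, int slot2);   // order two queued slots
        int compareBottom(int doc);          // test a doc against the queue bottom
        void copy(int slot, int doc);        // stash a doc's value into a slot
        void setBottom(int slot);            // tell the comparator which slot is bottom
        Comparable<?> value(int slot);       // read a slot back out
        void setNextReader(int docBase);     // switch to the next segment
    }

    // With the simple shape, "sort by a string field" is a one-liner:
    static SimpleDocComparator<String> byTitle(String[] titles) {
        return doc -> titles[doc];
    }

    public static void main(String[] args) {
        String[] titles = { "b", "a", "c" };
        SimpleDocComparator<String> cmp = byTitle(titles);
        // doc 1 ("a") sorts before doc 0 ("b")
        System.out.println(cmp.value(1).compareTo(cmp.value(0)) < 0);
    }
}
```

The contrast is the point: six interdependent methods versus one, which is exactly the "tremendous advantage" the comment claims for the simpler shape, independent of the performance question.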
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769328#action_12769328 ]

Jake Mannix commented on LUCENE-1997:

Mike, thanks for all the hard work on this - it's clearly far more work than anyone has spent yet on just doing the upgrade to the newer api, and that's appreciated.

Am I wrong in thinking that these results are pretty ambiguous? How often do people take the top 500 or top 1000 sorted hits? If you don't focus on that case (that of looking for pages *50 through 100* of normal 10-per-page search results), there's a bunch of green and a bunch of red, and the two techniques are within +/- 10-20% of each other. Is that what everyone else sees in Mike's newest numbers here, or am I misreading them?
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769058#action_12769058 ]

Jake Mannix commented on LUCENE-1997:

I would say that of course the linux and solaris results should be weighted more highly than the ones on macs, because while I love my mac, I've yet to see a production cluster running on MacBook Pros... :)
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769056#action_12769056 ]

Jake Mannix commented on LUCENE-1997:

Java6 is standard in production servers since when? What justified lucene staying on java1.4 for so long if this is the case? In my own experience, my last job only moved to java1.5 a year ago, and at my current company we're still on 1.5. I've seen that be pretty common, and I'm in the Valley, where things update pretty quickly.
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769042#action_12769042 ]

Jake Mannix commented on LUCENE-1997:

Hah! Thanks for posting that, Mark! Much easier to read. :)

Hey John, can you comment with your hardware specs on this, so it can be recorded for posterity? ;)