[jira] Commented: (LUCENE-806) Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers
[ https://issues.apache.org/jira/browse/LUCENE-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486252 ] Paul Cowan commented on LUCENE-806:
---
Otis, you're probably right -- it may not be wise to try to kill two birds with one stone. My concern is that doing this the quick and dirty way may mean exposing an API to enable/disable the behaviour, which a subsequent refactor would then remove, and I'd obviously rather keep the API stable. I'm about to attach 3 patches with varying levels of impact on the code; I'd be interested to hear which approach people think is best, given the possible refactor.

Hoss, I've had a look at your patch and rather like it, but it tackles a slightly different problem: it cleans up the FieldCache (which is a great idea), whereas cleaning up FieldSortedHitQueue is only incidentally related to FieldCache. FSHQ uses the cache (and if FSHQ were broken up, each comparator source would use your much cleaner API), but I think the two coexist quite happily. In other words, I'd like to see both!

Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers
--
Key: LUCENE-806
URL: https://issues.apache.org/jira/browse/LUCENE-806
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.0.0
Reporter: Paul Cowan
Priority: Minor
Attachments: lucene-806-proposed-direction.patch, lucene-806.patch

The below is from a post by (my colleague) Paul Smith to the java-users list:
---
Hi ho peoples. We have an application that is internationalized and stores data from many languages (each project has its own index, mostly aligned with a single language, maybe 2). While looking at some thread dumps to diagnose a performance issue, I noticed what appears to be a _potential_ synchronization bottleneck when using Locale-based sorting of Strings. I don't think this problem is the root cause of our performance problem, but I thought I'd mention it here. Here's the stack dump of a waiting thread:

"http-1001-Processor245" daemon prio=1 tid=0x31434da0 nid=0x3744 waiting for monitor entry [0x2cd44000..0x2cd45f30]
    at java.text.RuleBasedCollator.compare(RuleBasedCollator.java)
    - waiting to lock <0x6b1e8c68> (a java.text.RuleBasedCollator)
    at org.apache.lucene.search.FieldSortedHitQueue$4.compare(FieldSortedHitQueue.java:320)
    at org.apache.lucene.search.FieldSortedHitQueue.lessThan(FieldSortedHitQueue.java:114)
    at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:120)
    at org.apache.lucene.util.PriorityQueue.put(PriorityQueue.java:47)
    at org.apache.lucene.util.PriorityQueue.insert(PriorityQueue.java:58)
    at org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSortedHitQueue.java:90)
    at org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSortedHitQueue.java:97)
    at org.apache.lucene.search.TopFieldDocCollector.collect(TopFieldDocCollector.java:47)
    at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:291)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
    at com.aconex.index.search.FastLocaleSortIndexSearcher.search(FastLocaleSortIndexSearcher.java:90)
    ...

In our case we had 12 threads waiting like this while one thread held the lock on the RuleBasedCollator. It turns out RuleBasedCollator's compare(...) method is synchronized. I wonder if a ThreadLocal-based Collator would be better here? There doesn't appear to be a reason for the threads searching the same index to wait on this sort; it would be just as easy for each to use its own Collator. (Is RuleBasedCollator a heavy object memory-wise? I wouldn't have thought so, per thread.) Thoughts?
---
I've investigated this somewhat, and agree that this is a potential problem with a number of possible workarounds. Further discussion (including a proof-of-concept patch) to follow.
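The attached patches aren't reproduced in this digest, but purely as an illustrative sketch of the ThreadLocal idea under discussion (the class and method names here are invented), the core of it would look something like:

    import java.text.Collator;
    import java.util.Locale;

    public class PerThreadCollator {

      private final Locale locale;

      // One Collator per thread: initialValue() runs lazily, the first time each
      // thread calls get(), so compare() below never blocks on another thread's collator.
      private final ThreadLocal collator = new ThreadLocal() {
        protected Object initialValue() {
          return Collator.getInstance(locale);
        }
      };

      public PerThreadCollator(Locale locale) {
        this.locale = locale;
      }

      public int compare(String a, String b) {
        return ((Collator) collator.get()).compare(a, b);
      }
    }

The trade-off is one Collator instance per searching thread per locale, which matches Paul Smith's guess that the per-thread memory cost should be small.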
[jira] Updated: (LUCENE-806) Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers
[ https://issues.apache.org/jira/browse/LUCENE-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Cowan updated LUCENE-806:
--
Attachment: LUCENE-806-minimal-usealways.patch

Minimal ThreadLocal wrapper, implementation #1: an always-on version. The per-thread collator is used all the time, which may not be ideal (I'm not sure there are any major disadvantages, mind you: ThreadLocals are very low-impact, Collators are quite lightweight, and there shouldn't be any duplicated object instances floating around). Note that with this version the original comparatorStringLocale() method could be removed; I've left it in place for now, though.
[jira] Updated: (LUCENE-806) Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers
[ https://issues.apache.org/jira/browse/LUCENE-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Cowan updated LUCENE-806:
--
Attachment: LUCENE-806-minimal-systemproperty.patch

Minimal ThreadLocal wrapper, implementation #2: behaviour controlled by a system property (org.apache.lucene.usePerThreadLocaleComparators). This is messy, but it leaves the current behaviour as the default, and it's not unprecedented in the Lucene codebase. If it's decided the behaviour shouldn't be always-on, this may be the best compromise: it still (in a way) exposes a public API, but because it's a system property it's less visible, and it may be less painful to yank later.
[jira] Updated: (LUCENE-806) Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers
[ https://issues.apache.org/jira/browse/LUCENE-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Cowan updated LUCENE-806:
--
Attachment: LUCENE-806-minimal-publicapi.patch

Minimal ThreadLocal wrapper, implementation #3: a public static API. This is the easiest way to do it, but it means that if the code is later refactored so the switch becomes unnecessary (or, more accurately, is done in a cleaner way), the API may get yanked after only a relatively short lifespan.
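Taken together, implementations #2 and #3 amount to a switch around the same per-thread collator. As a rough, hypothetical sketch only (this is not the attached patches' actual API; the class name, method names and cache are invented for illustration), the two toggles could be combined like this:

    import java.text.Collator;
    import java.util.HashMap;
    import java.util.Locale;
    import java.util.Map;

    public class LocaleCollatorFactory {

      // Implementation #2: default comes from the system property.
      private static boolean perThread =
          Boolean.getBoolean("org.apache.lucene.usePerThreadLocaleComparators");

      private static final Map CACHE = new HashMap();   // Locale -> ThreadLocal

      // Implementation #3: explicit public API to flip the behaviour at runtime.
      public static synchronized void setUsePerThreadCollators(boolean on) {
        perThread = on;
      }

      // Returns a Collator: per-thread if the feature is enabled, shared otherwise.
      // The synchronization here guards only the brief cache lookup, not compare().
      public static synchronized Collator getCollator(final Locale locale) {
        if (!perThread) {
          return Collator.getInstance(locale);   // current behaviour
        }
        ThreadLocal tl = (ThreadLocal) CACHE.get(locale);
        if (tl == null) {
          tl = new ThreadLocal() {
            protected Object initialValue() {
              return Collator.getInstance(locale);   // one Collator per thread per locale
            }
          };
          CACHE.put(locale, tl);
        }
        return (Collator) tl.get();
      }
    }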
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486292 ] Michael McCandless commented on LUCENE-843:
---
Some details on how I measure RAM usage: both the baseline (current Lucene trunk) and my patch have two general classes of RAM usage.

The first class, document processing RAM, is RAM used while processing a single doc. This RAM is re-used for each document (in the trunk it's GC'd and new RAM is allocated; in my patch I explicitly re-use these objects), and how large it gets is driven by how big each document is.

The second class, indexed documents RAM, is the RAM used up by previously indexed documents. This RAM grows with each added document, and how large it gets is driven by the number and size of docs indexed since the last flush.

So when I say the writer is allowed to use 32 MB of RAM, I'm only measuring the indexed documents RAM. With trunk I do this by calling ramSizeInBytes(), and with my patch I do the analogous thing by measuring how many RAM buffers are held up storing previously indexed documents.

I then define RAM efficiency (docs/MB) as how many docs we can hold in indexed documents RAM per MB of RAM, at the point that we flush to disk. I think this is an important metric because it drives how large your initial (level 0) segments are; the larger these segments are, the less merging you generally need to do for a given # of docs in the index.

I also measure overall RAM used in the JVM (using MemoryMXBean.getHeapMemoryUsage().getUsed()) just prior to each flush except the last, to also capture the document processing RAM, object overhead, etc.

improve how IndexWriter uses RAM to buffer added documents
--
Key: LUCENE-843
URL: https://issues.apache.org/jira/browse/LUCENE-843
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.2
Reporter: Michael McCandless
Assigned To: Michael McCandless
Priority: Minor
Attachments: LUCENE-843.patch, LUCENE-843.take2.patch, LUCENE-843.take3.patch, LUCENE-843.take4.patch

I'm working on a new class (MultiDocumentWriter) that writes more than one document directly into a single Lucene segment, more efficiently than the current approach. This only affects the creation of an initial segment from added documents; I haven't changed anything after that, e.g. how segments are merged. The basic ideas are:
* Write stored fields and term vectors directly to disk (don't use up RAM for these).
* Gather posting lists and term infos in RAM, but periodically do in-RAM merges. Once RAM is full, flush buffers to disk (and merge them later when it's time to make a real segment).
* Recycle objects/buffers to reduce time/stress in GC.
* Other various optimizations.
Some of these changes are similar to how KinoSearch builds a segment. But I haven't made any changes to Lucene's file format nor added a requirement for a global fields schema. So far the only externally visible change is a new method setRAMBufferSize in IndexWriter (and setMaxBufferedDocs is deprecated), so that it flushes according to RAM usage rather than a fixed number of added documents.
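As a concrete (hypothetical) sketch of those two measurements, assuming the trunk writer's ramSizeInBytes() and the standard MemoryMXBean, the sampling done just before a flush looks roughly like this; the class name and the exact sampling hook are invented here, and the benchmark tool itself is not shown:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import org.apache.lucene.index.IndexWriter;

    public class FlushStats {
      public static void sample(IndexWriter writer, int docsSinceLastFlush) {
        // "indexed documents RAM" only: what the writer has buffered for prior docs
        long bufferedBytes = writer.ramSizeInBytes();
        double docsPerMB = docsSinceLastFlush / (bufferedBytes / (1024.0 * 1024.0));

        // whole-JVM heap: also captures document-processing RAM, object overhead, etc.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        double heapUsedMB = mem.getHeapMemoryUsage().getUsed() / (1024.0 * 1024.0);

        System.out.println("docs/MB @ flush = " + docsPerMB
            + ", heap used (MB) = " + heapUsedMB);
      }
    }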
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486332 ] Michael McCandless commented on LUCENE-843:
---
A couple more details on the testing: I run java -server to get all optimizations in the JVM, and the IO system is a local OS X RAID 0 of 4 SATA drives.

Using the above tool I ran an initial set of benchmarks comparing old (= Lucene trunk) vs new (= this patch), varying the document size (~550 bytes, ~5,500 bytes and ~55,000 bytes of plain text from Europarl en). For each document size I run 4 combinations: term vectors and stored fields on or off, and autoCommit true or false. I measure net docs/sec (= total # docs indexed divided by total time taken), RAM efficiency (= avg # docs flushed with each flush divided by RAM buffer size), and avg heap RAM usage before each flush.

Here are the results for the 10K-token docs (= ~55,000 bytes plain text per document):

20000 DOCS @ ~55,000 bytes plain text
RAM = 32 MB
NUM THREADS = 1
MERGE FACTOR = 10

No term vectors nor stored fields

AUTOCOMMIT = true (commit whenever RAM is full)
  old: 20000 docs in 200.3 secs, index size = 358M
  new: 20000 docs in 126.0 secs, index size = 356M
  Total Docs/sec:            old  99.8; new 158.7  [ 59.0% faster]
  Docs/MB @ flush:           old  24.2; new  49.1  [102.5% more]
  Avg RAM used (MB) @ flush: old  74.5; new  36.2  [ 51.4% less]

AUTOCOMMIT = false (commit only once at the end)
  old: 20000 docs in 202.7 secs, index size = 358M
  new: 20000 docs in 120.0 secs, index size = 354M
  Total Docs/sec:            old  98.7; new 166.7  [ 69.0% faster]
  Docs/MB @ flush:           old  24.2; new  48.9  [101.7% more]
  Avg RAM used (MB) @ flush: old  74.3; new  37.0  [ 50.2% less]

With term vectors (positions + offsets) and 2 small stored fields

AUTOCOMMIT = true (commit whenever RAM is full)
  old: 20000 docs in 374.7 secs, index size = 1.4G
  new: 20000 docs in 236.1 secs, index size = 1.4G
  Total Docs/sec:            old  53.4; new  84.7  [ 58.7% faster]
  Docs/MB @ flush:           old  10.2; new  49.1  [382.8% more]
  Avg RAM used (MB) @ flush: old 129.3; new  36.6  [ 71.7% less]

AUTOCOMMIT = false (commit only once at the end)
  old: 20000 docs in 385.7 secs, index size = 1.4G
  new: 20000 docs in 182.8 secs, index size = 1.4G
  Total Docs/sec:            old  51.9; new 109.4  [111.0% faster]
  Docs/MB @ flush:           old  10.2; new  48.9  [380.9% more]
  Avg RAM used (MB) @ flush: old  76.0; new  37.3  [ 50.9% less]
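The benchmark tool itself isn't attached to this thread; a bare-bones stand-in for its timing loop, showing only how net docs/sec falls out, might look like the following (the index path, field name and analyzer are placeholders, and the patch's setRAMBufferSize call is omitted because its exact signature isn't shown in these mails):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class ThroughputBench {
      public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/bench-index", new WhitespaceAnalyzer(), true);
        BufferedReader lines = new BufferedReader(new FileReader(args[0]));  // one doc per line
        long start = System.currentTimeMillis();
        int count = 0;
        for (String line = lines.readLine(); line != null; line = lines.readLine()) {
          Document doc = new Document();
          doc.add(new Field("body", line, Field.Store.NO, Field.Index.TOKENIZED));
          writer.addDocument(doc);
          count++;
        }
        writer.close();
        double secs = (System.currentTimeMillis() - start) / 1000.0;
        System.out.println(count + " docs in " + secs + " secs = " + (count / secs) + " docs/sec");
      }
    }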
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486334 ] Michael McCandless commented on LUCENE-843:
---
Here are the results for normal sized docs (1K tokens = ~5,500 bytes plain text each):

200000 DOCS @ ~5,500 bytes plain text
RAM = 32 MB
NUM THREADS = 1
MERGE FACTOR = 10

No term vectors nor stored fields

AUTOCOMMIT = true (commit whenever RAM is full)
  old: 200000 docs in 397.6 secs, index size = 415M
  new: 200000 docs in 167.5 secs, index size = 411M
  Total Docs/sec:            old  503.1; new 1194.1  [137.3% faster]
  Docs/MB @ flush:           old   81.6; new  406.2  [397.6% more]
  Avg RAM used (MB) @ flush: old   87.3; new   35.2  [ 59.7% less]

AUTOCOMMIT = false (commit only once at the end)
  old: 200000 docs in 394.6 secs, index size = 415M
  new: 200000 docs in 168.4 secs, index size = 408M
  Total Docs/sec:            old  506.9; new 1187.7  [134.3% faster]
  Docs/MB @ flush:           old   81.6; new  432.2  [429.4% more]
  Avg RAM used (MB) @ flush: old  126.6; new   36.9  [ 70.8% less]

With term vectors (positions + offsets) and 2 small stored fields

AUTOCOMMIT = true (commit whenever RAM is full)
  old: 200000 docs in 754.2 secs, index size = 1.7G
  new: 200000 docs in 304.9 secs, index size = 1.7G
  Total Docs/sec:            old  265.2; new  656.0  [147.4% faster]
  Docs/MB @ flush:           old   46.7; new  406.2  [769.6% more]
  Avg RAM used (MB) @ flush: old   92.9; new   35.2  [ 62.1% less]

AUTOCOMMIT = false (commit only once at the end)
  old: 200000 docs in 743.9 secs, index size = 1.7G
  new: 200000 docs in 244.3 secs, index size = 1.7G
  Total Docs/sec:            old  268.9; new  818.7  [204.5% faster]
  Docs/MB @ flush:           old   46.7; new  432.2  [825.2% more]
  Avg RAM used (MB) @ flush: old   93.0; new   36.6  [ 60.6% less]
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486335 ] Michael McCandless commented on LUCENE-843:
---
Last are the results for small docs (100 tokens = ~550 bytes plain text each):

2000000 DOCS @ ~550 bytes plain text
RAM = 32 MB
NUM THREADS = 1
MERGE FACTOR = 10

No term vectors nor stored fields

AUTOCOMMIT = true (commit whenever RAM is full)
  old: 2000000 docs in 886.7 secs, index size = 438M
  new: 2000000 docs in 230.5 secs, index size = 435M
  Total Docs/sec:            old 2255.6; new 8676.4  [ 284.7% faster]
  Docs/MB @ flush:           old  128.0; new 4194.6  [3176.2% more]
  Avg RAM used (MB) @ flush: old  107.3; new   37.7  [  64.9% less]

AUTOCOMMIT = false (commit only once at the end)
  old: 2000000 docs in 888.7 secs, index size = 438M
  new: 2000000 docs in 239.6 secs, index size = 432M
  Total Docs/sec:            old 2250.5; new 8348.7  [ 271.0% faster]
  Docs/MB @ flush:           old  128.0; new 4146.8  [3138.9% more]
  Avg RAM used (MB) @ flush: old  108.1; new   38.9  [  64.0% less]

With term vectors (positions + offsets) and 2 small stored fields

AUTOCOMMIT = true (commit whenever RAM is full)
  old: 2000000 docs in 1480.1 secs, index size = 2.1G
  new: 2000000 docs in 462.0 secs, index size = 2.1G
  Total Docs/sec:            old 1351.2; new 4329.3  [ 220.4% faster]
  Docs/MB @ flush:           old   93.1; new 4194.6  [4405.7% more]
  Avg RAM used (MB) @ flush: old  296.4; new   38.3  [  87.1% less]

AUTOCOMMIT = false (commit only once at the end)
  old: 2000000 docs in 1489.4 secs, index size = 2.1G
  new: 2000000 docs in 347.9 secs, index size = 2.1G
  Total Docs/sec:            old 1342.8; new 5749.4  [ 328.2% faster]
  Docs/MB @ flush:           old   93.1; new 4146.8  [4354.5% more]
  Avg RAM used (MB) @ flush: old  297.1; new   38.6  [  87.0% less]

200000 DOCS @ ~5,500 bytes plain text

No term vectors nor stored fields

AUTOCOMMIT = true (commit whenever RAM is full)
  old: 200000 docs in 397.6 secs, index size = 415M
  new: 200000 docs in 167.5 secs, index size = 411M
  Total Docs/sec:            old  503.1; new 1194.1  [137.3% faster]
  Docs/MB @ flush:           old   81.6; new  406.2  [397.6% more]
  Avg RAM used (MB) @ flush: old   87.3; new   35.2  [ 59.7% less]

AUTOCOMMIT = false (commit only once at the end)
  old: 200000 docs in 394.6 secs, index size = 415M
  new: 200000 docs in 168.4 secs, index size = 408M
  Total Docs/sec:            old  506.9; new 1187.7  [134.3% faster]
  Docs/MB @ flush:           old   81.6; new  432.2  [429.4% more]
  Avg RAM used (MB) @ flush: old  126.6; new   36.9  [ 70.8% less]

With term vectors (positions + offsets) and 2 small stored fields

AUTOCOMMIT = true (commit whenever RAM is full)
  old: 200000 docs in 754.2 secs, index size = 1.7G
  new: 200000 docs in 304.9 secs, index size = 1.7G
  Total Docs/sec:            old  265.2; new  656.0  [147.4% faster]
  Docs/MB @ flush:           old   46.7; new  406.2  [769.6% more]
  Avg RAM used (MB) @ flush: old   92.9; new   35.2  [ 62.1% less]

AUTOCOMMIT = false (commit only once at the end)
  old: 200000 docs in 743.9 secs, index size = 1.7G
  new: 200000 docs in 244.3 secs, index size = 1.7G
  Total Docs/sec:            old  268.9; new  818.7  [204.5% faster]
  Docs/MB @ flush:           old   46.7; new  432.2  [825.2% more]
  Avg RAM used (MB) @ flush: old   93.0; new   36.6  [ 60.6% less]
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486339 ] Michael McCandless commented on LUCENE-843:
---
A few notes on these results:

* A real Lucene app won't see gains this large, because in practice retrieving docs from the content source and tokenizing them take a substantial amount of time. Those costs are intentionally tiny in this test: I'm 1) pulling one line at a time from a big text file, and 2) using my simplistic SimpleSpaceAnalyzer, which just breaks tokens at the space character.
* The best speedup is ~4.3X, for tiny docs (~550 bytes) with term vectors and stored fields enabled and autoCommit=false.
* The smallest speedup is still ~1.6X, for large docs (~55,000 bytes) with autoCommit=true.
* The autoCommit=false cases are a little unfair to the new patch: with the patch you end up with a single-segment (optimized) index, whereas with the existing Lucene trunk you don't.
* With term vectors and/or stored fields, autoCommit=false is quite a bit faster with the patch, because we never pay the price to merge them; they are written only once.
* With term vectors and/or stored fields, the new patch has substantially better RAM efficiency.
* The patch is especially faster, and more RAM efficient, with smaller documents.
* The actual heap RAM usage is quite a bit more stable with the patch, especially with term vectors and stored fields enabled. I think this is because the patch creates far less garbage for GC to periodically reclaim. I think this also means you could push your RAM buffer size even higher to get better performance.
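SimpleSpaceAnalyzer itself isn't included in this thread; an equivalent throwaway analyzer (illustrative name, not the actual class) that breaks tokens only at the space character would be roughly:

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharTokenizer;
    import org.apache.lucene.analysis.TokenStream;

    public class SpaceOnlyAnalyzer extends Analyzer {
      public TokenStream tokenStream(String fieldName, Reader reader) {
        return new SpaceTokenizer(reader);
      }

      private static class SpaceTokenizer extends CharTokenizer {
        SpaceTokenizer(Reader in) {
          super(in);
        }
        protected boolean isTokenChar(char c) {
          return c != ' ';   // break tokens only at the space character, no lowercasing or stop words
        }
      }
    }

The point of such an analyzer in the benchmark is simply to keep analysis cost near zero so the measured difference is dominated by the indexing path itself.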
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486373 ] Marvin Humphrey commented on LUCENE-843:
---
> The actual heap RAM usage is quite a bit more stable with the patch, especially with term vectors and stored fields enabled. I think this is because the patch creates far less garbage for GC to periodically reclaim. I think this also means you could push your RAM buffer size even higher to get better performance.

For KinoSearch, the sweet spot seems to be a buffer of around 16 MB when benchmarking with the Reuters corpus on my G4 laptop. Larger than that and things actually slow down, unless the buffer is large enough that it never needs flushing. My hypothesis is that RAM fragmentation is slowing down malloc/free. I'll be interested to see whether you see the same effect.
Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
On 4/3/07, Michael McCandless (JIRA) [EMAIL PROTECTED] wrote:
> * With term vectors and/or stored fields, the new patch has substantially better RAM efficiency.

Impressive numbers! The new patch improves RAM efficiency quite a bit even with no term vectors nor stored fields, because of the periodic in-RAM merges of posting lists, term infos, etc. The frequency of the in-RAM merges is controlled by flushedMergeFactor, which is measured in doc count, right? How sensitive is performance to the value of flushedMergeFactor?

Cheers,
Ning
Re: improve how IndexWriter uses RAM to buffer added documents
Wow, very nice results Mike!

-Yonik

On 4/3/07, Michael McCandless (JIRA) [EMAIL PROTECTED] wrote:
> [ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486335 ] Michael McCandless commented on LUCENE-843:
> ---
> Last are the results for small docs (100 tokens = ~550 bytes plain text each):
> [snip]
[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486385 ] Michael McCandless commented on LUCENE-843:
---
> > The actual heap RAM usage is quite a bit more stable with the patch, especially with term vectors and stored fields enabled. I think this is because the patch creates far less garbage for GC to periodically reclaim. I think this also means you could push your RAM buffer size even higher to get better performance.
>
> For KinoSearch, the sweet spot seems to be a buffer of around 16 MB when benchmarking with the Reuters corpus on my G4 laptop. Larger than that and things actually slow down, unless the buffer is large enough that it never needs flushing. My hypothesis is that RAM fragmentation is slowing down malloc/free. I'll be interested to see whether you see the same effect.

Interesting. OK, I will run the benchmark across increasing RAM sizes to see where the sweet spot seems to be!
Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
Ning Li [EMAIL PROTECTED] wrote:
> On 4/3/07, Michael McCandless (JIRA) [EMAIL PROTECTED] wrote:
> > * With term vectors and/or stored fields, the new patch has substantially better RAM efficiency.
>
> Impressive numbers! The new patch improves RAM efficiency quite a bit even with no term vectors nor stored fields, because of the periodic in-RAM merges of posting lists, term infos, etc. The frequency of the in-RAM merges is controlled by flushedMergeFactor, which is measured in doc count, right? How sensitive is performance to the value of flushedMergeFactor?

Right, the in-RAM merges seem to help *a lot*, because you get great compression of the terms dictionary, and also some compression of the freq postings since the docIDs are delta encoded. Also, you waste less buffer space at the end (the buffers are fixed sizes) when you merge together into a large segment.

The in-RAM merges are triggered by the number of bytes used vs the RAM buffer size. Each doc is indexed into its own RAM segment; once these level 0 segments use 1/Nth of the RAM buffer size, I merge them into level 1. Then, once level 1 segments are using 1/Mth of the RAM buffer size, I merge them into level 2. I don't do any merges beyond that. Right now N = 14 and M = 7, but I haven't really tuned them yet...

Once RAM is full, all of those segments are merged into a single on-disk segment. Once enough on-disk segments accumulate, they are periodically merged as well (based on flushedMergeFactor). Finally, when it's time to commit a real segment, I merge all RAM segments and flushed segments into a real Lucene segment.

I haven't done much testing to find the sweet spot for these merge settings just yet. Still plenty to do!

Mike
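In outline (this is not the patch's actual code; the class, field and method names below are invented to mirror the description above), the in-RAM merge triggers read roughly like:

    // Level-0 segments (one per doc) merge up once they occupy 1/N of the RAM buffer,
    // level-1 segments once they occupy 1/M of it; nothing merges beyond level 2 in RAM.
    class RamMergeTriggerSketch {
      static final int LEVEL0_DIVISOR = 14;  // "N" above, untuned
      static final int LEVEL1_DIVISOR = 7;   // "M" above, untuned

      long ramBufferBytes;                   // total RAM budget for buffered docs
      long level0Bytes, level1Bytes;

      void docIndexed(long newSegmentBytes) {
        level0Bytes += newSegmentBytes;
        if (level0Bytes >= ramBufferBytes / LEVEL0_DIVISOR) {
          level1Bytes += mergeLevel0ToLevel1();     // compresses terms dict + freq postings
          level0Bytes = 0;
        }
        if (level1Bytes >= ramBufferBytes / LEVEL1_DIVISOR) {
          mergeLevel1ToLevel2();
          level1Bytes = 0;
        }
        // When the whole buffer fills, all RAM segments are flushed and merged
        // into a single on-disk segment (not shown here).
      }

      long mergeLevel0ToLevel1() { return level0Bytes; }  // placeholder: returns merged size
      void mergeLevel1ToLevel2() {}
    }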
Re: improve how IndexWriter uses RAM to buffer added documents
Yonik Seeley [EMAIL PROTECTED] wrote:
> Wow, very nice results Mike!

Thanks :) I'm just praying I don't have some sneaky bug making the results far better than they really are!! And still plenty to do...

Mike
publish to maven-repository
Hi there,

I will give it another try: could you please publish the Lucene 2.* artifacts (including contribs) to the Maven 2 repository at ibiblio? Currently only lucene-core is available there, and only up to version 2.0.0: http://repo1.maven.org/maven2/org/apache/lucene/

JARs and POMs go to: scp://people.apache.org/www/www.apache.org/dist/maven-repository

If you need assistance I am happy to help, but I am not an official Apache member and do NOT have access to do the deployment myself.

Thank you so much...

Jörg
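For reference, the only coordinates currently resolvable from that repository correspond to lucene-core 2.0.0 (taken from the repository path above); a Maven 2 project would declare it roughly as:

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>2.0.0</version>
    </dependency>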