[jira] [Created] (LUCENE-5793) Add equals/hashCode to FieldType
Shay Banon created LUCENE-5793: -- Summary: Add equals/hashCode to FieldType Key: LUCENE-5793 URL: https://issues.apache.org/jira/browse/LUCENE-5793 Project: Lucene - Core Issue Type: Improvement Reporter: Shay Banon It would be nice to have equals and hashCode on FieldType, so one can easily check whether two instances are the same and, for example, reuse existing default implementations. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
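A minimal sketch of what such an equals/hashCode pair could look like, using a simplified stand-in class (the real FieldType carries many more options; the class and field names here are illustrative, not Lucene's):

```java
import java.util.Objects;

// Simplified stand-in for Lucene's FieldType; field names are illustrative.
public class SimpleFieldType {
    private final boolean stored;
    private final boolean tokenized;
    private final boolean indexed;

    public SimpleFieldType(boolean stored, boolean tokenized, boolean indexed) {
        this.stored = stored;
        this.tokenized = tokenized;
        this.indexed = indexed;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleFieldType)) return false;
        SimpleFieldType other = (SimpleFieldType) o;
        // two field types are "the same" iff every option matches
        return stored == other.stored
            && tokenized == other.tokenized
            && indexed == other.indexed;
    }

    @Override
    public int hashCode() {
        // must be consistent with equals: equal instances share a hash
        return Objects.hash(stored, tokenized, indexed);
    }
}
```

With this in place, checking whether a field reuses one of the shared default types becomes a plain `equals` call.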
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986177#comment-13986177 ] Shay Banon commented on LUCENE-5634: This optimization has proven to help a lot in the context of ES, where we can use a static thread local since we are fully in control of the threading model. With Lucene itself, which can be used in many different environments, this can cause some unexpected behavior. For example, it might cause Tomcat to warn about leaked resources when unloading a war. > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal...
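As a rough illustration of the reuse pattern (and the container-leak concern) discussed above, a toy sketch with stand-in names, not Lucene's actual implementation:

```java
// Toy illustration of the reuse pattern under discussion: a static
// ThreadLocal caching one per-thread instance. Class names are stand-ins.
public class ReusePattern {
    static final class TokenStreamLike {
        int resets; // counts reuse, for demonstration only
        void reset(String value) { resets++; }
    }

    // Per-thread cached instance: avoids allocating a new stream (and its
    // attributes) for every field. Because the cached value outlives the
    // caller's classloader, containers such as Tomcat may warn about leaked
    // resources when a webapp that touched this class is unloaded.
    private static final ThreadLocal<TokenStreamLike> CACHE =
        ThreadLocal.withInitial(TokenStreamLike::new);

    static TokenStreamLike get(String value) {
        TokenStreamLike ts = CACHE.get(); // same instance on each call per thread
        ts.reset(value);
        return ts;
    }
}
```

The trade-off is exactly the one in the comment: the pattern is safe when one party controls the threading model, and risky in a library loaded into arbitrary environments.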
[jira] [Commented] (LUCENE-5516) Forward information that trigger a merge to MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930457#comment-13930457 ] Shay Banon commented on LUCENE-5516: +1, this looks great! Exactly the info we would love to have to better control merges. > Forward information that trigger a merge to MergeScheduler > -- > > Key: LUCENE-5516 > URL: https://issues.apache.org/jira/browse/LUCENE-5516 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 4.7 >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5516.patch, LUCENE-5516.patch > > > Today we pass information about the merge trigger to the merge policy. Yet, > no matter if the MP finds a merge or not, we call the MergeScheduler, which runs > & blocks even if we didn't find a merge. In some cases we don't even want > this to happen, but inside the MergeScheduler we have no way to opt out > since we don't know what triggered the merge. We should forward the info we > have to the MergeScheduler as well.
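The idea of forwarding the trigger so a scheduler can opt out could look roughly like this; everything below (enum values, interface, method names) is an illustrative sketch, not Lucene's actual API:

```java
// Hedged sketch: pass the reason a merge was requested down to the
// scheduler so it can decide not to run. Names are illustrative.
public class MergeTriggerSketch {
    enum MergeTrigger { SEGMENT_FLUSH, FULL_FLUSH, EXPLICIT, MERGE_FINISHED }

    interface MergeScheduler {
        /** Returns true if the scheduler chose to run merges for this trigger. */
        boolean maybeMerge(MergeTrigger trigger, boolean foundMerges);
    }

    // Example scheduler: only runs when the merge policy actually found
    // merges, and ignores flush-triggered calls entirely. Without the
    // trigger argument, this opt-out would be impossible.
    static final MergeScheduler FLUSH_IGNORING = (trigger, foundMerges) ->
        foundMerges && trigger != MergeTrigger.SEGMENT_FLUSH;
}
```

This is the flexibility the comment is asking for: throttling decisions can be made per trigger instead of unconditionally blocking.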
[jira] [Commented] (LUCENE-5373) Lucene42DocValuesProducer.ramBytesUsed is over-estimated
[ https://issues.apache.org/jira/browse/LUCENE-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853098#comment-13853098 ] Shay Banon commented on LUCENE-5373: As the one who found this issue: on top of the wrong computation, it's also very expensive. This call should be lightweight and ideally not use sizeOf at all. At the very least, if possible, its result should be cached; maybe even introduce size caching at a higher level (in the calling code) if possible. > Lucene42DocValuesProducer.ramBytesUsed is over-estimated > > > Key: LUCENE-5373 > URL: https://issues.apache.org/jira/browse/LUCENE-5373 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Priority: Minor > > Lucene42DocValuesProducer.ramBytesUsed uses > {{RamUsageEstimator.sizeOf(this)}} to return an estimation of the memory > usage. One of the issues (there might be other ones) is that this class has a > reference to an IndexInput that might link to other data structures that we > wouldn't want to take into account. For example, index inputs of a > {{RAMDirectory}} all point to the directory itself, so > {{Lucene42DocValuesProducer.ramBytesUsed}} would return the amount of memory > used by the whole directory.
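The caching suggested in the comment could be sketched as follows; `CachedRamBytesUsed` is a hypothetical name, and the supplier stands in for the expensive `RamUsageEstimator.sizeOf` call:

```java
import java.util.function.LongSupplier;

// Sketch of memoizing an expensive RAM estimate: compute it once,
// return the cached value afterwards. Names are illustrative.
public class CachedRamBytesUsed {
    private final LongSupplier estimator; // stand-in for RamUsageEstimator.sizeOf
    private long cached = -1;             // -1 = not yet computed

    public CachedRamBytesUsed(LongSupplier estimator) {
        this.estimator = estimator;
    }

    public synchronized long ramBytesUsed() {
        if (cached < 0) {
            cached = estimator.getAsLong(); // pay the traversal cost only once
        }
        return cached;
    }
}
```

Caching is safe here only if the estimate does not need to track live changes; for an immutable, already-loaded producer that assumption holds.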
[jira] [Commented] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
[ https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699892#comment-13699892 ] Shay Banon commented on LUCENE-5086: The Java version on the Mac is the latest one: java version "1.6.0_51" Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-11M4509) Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01-457, mixed mode) Regarding the catch, I think Throwable is the right exception to catch here. Catch all: you don't want a bug in the JVM that throws an unexpected runtime exception to cause Lucene to break the app completely because it's in a static block, and I have been there a few times. But if you feel differently, go ahead and change it to explicitly catch what's needed. > RamUsageEstimator causes AWT classes to be loaded by calling > ManagementFactory#getPlatformMBeanServer > - > > Key: LUCENE-5086 > URL: https://issues.apache.org/jira/browse/LUCENE-5086 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Shay Banon >Assignee: Dawid Weiss > > Yea, that type of day and that type of title :). > Since the last update of Java 6 on OS X, I started to see an annoying icon > pop up in the dock whenever running elasticsearch. By default, all of our > scripts add the headless AWT flag so people will probably not encounter it, > but it was strange that I saw it when before I didn't. > I started to dig around, and saw that when RamUsageEstimator was being > loaded, it was causing AWT classes to be loaded. Further investigation showed > that, for some reason, calling > ManagementFactory#getPlatformMBeanServer with the new Java version causes > AWT classes to be loaded (at least on the Mac, haven't tested on other > platforms yet). 
> There are several ways to try and solve it, for example, by identifying the
> bug in the JVM itself, but I think that there should be a fix for it in
> Lucene itself, specifically since there is no need to call
> #getPlatformMBeanServer to get the hotspot diagnostics one (it's a heavy
> call...).
> Here is a simple call that will allow to get the hotspot mxbean without using
> the #getPlatformMBeanServer method, and not causing it to be loaded and
> loading all those nasty AWT classes:
> {code}
> Object getHotSpotMXBean() {
>   try {
>     // Java 6
>     Class sunMF = Class.forName("sun.management.ManagementFactory");
>     return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
>   } catch (Throwable t) {
>     // ignore
>   }
>   // potentially Java 7
>   try {
>     return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
>         .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
>   } catch (Throwable t) {
>     // ignore
>   }
>   return null;
> }
> {code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
[ https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-5086: --- Description: Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag so people will probably not encounter it, but it was strange that I saw it when before I didn't. I started to dig around, and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac, haven't tested on other platforms yet). There are several ways to try and solve it, for example, by identifying the bug in the JVM itself, but I think that there should be a fix for it in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the hotspot diagnostics one (it's a heavy call...). Here is a simple call that will allow to get the hotspot mxbean without using the #getPlatformMBeanServer method, and not causing it to be loaded and loading all those nasty AWT classes:
{code}
Object getHotSpotMXBean() {
  Object hotSpotBean = null;
  try {
    // Java 6
    Class sunMF = Class.forName("sun.management.ManagementFactory");
    return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
  } catch (Throwable t) {
    // ignore
  }
  // potentially Java 7
  try {
    return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
        .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
  } catch (Throwable t) {
    // ignore
  }
  return null;
}
{code}
was: (the same description, with the code block wrapped in [code] ... [/code] tags instead of {code})
> RamUsageEstimator causes AWT classes to be loaded by calling
> ManagementFactory#getPlatformMBeanServer
> -
>
> Key: LUCENE-5086
> URL: https://issues.apache.org/jira/browse/LUCENE-5086
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Shay Banon
>
> Yea, that type of day and that type of title :).
> Since the last update of Java 6 on OS X, I started to see an annoying icon
> pop up in the dock whenever running elasticsearch. By default, all of our
> scripts add the headless AWT flag so people will probably not encounter it,
> but it was strange that I saw it when before I didn't.
> I started to dig around, and saw that when RamUsageEstimator was being
> loaded, it was causing AWT classes to be loaded. Further investigation showed
> that, for some reason, calling ManagementFactory#getPlatformMBeanServer with
> the new Java version causes AWT classes to be loaded (at least on the Mac,
> haven't tested on other platforms yet).
> There are several ways to try and solv
[jira] [Updated] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
[ https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-5086: --- Description: Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag so people will probably not encounter it, but it was strange that I saw it when before I didn't. I started to dig around, and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac, haven't tested on other platforms yet). There are several ways to try and solve it, for example, by identifying the bug in the JVM itself, but I think that there should be a fix for it in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the hotspot diagnostics one (it's a heavy call...). Here is a simple call that will allow to get the hotspot mxbean without using the #getPlatformMBeanServer method, and not causing it to be loaded and loading all those nasty AWT classes:
{code}
Object getHotSpotMXBean() {
  try {
    // Java 6
    Class sunMF = Class.forName("sun.management.ManagementFactory");
    return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
  } catch (Throwable t) {
    // ignore
  }
  // potentially Java 7
  try {
    return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
        .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
  } catch (Throwable t) {
    // ignore
  }
  return null;
}
{code}
was: (the same description, with the previous version of the code block that still declared an unused Object hotSpotBean = null;)
> RamUsageEstimator causes AWT classes to be loaded by calling
> ManagementFactory#getPlatformMBeanServer
> -
>
> Key: LUCENE-5086
> URL: https://issues.apache.org/jira/browse/LUCENE-5086
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Shay Banon
>
> Yea, that type of day and that type of title :).
> Since the last update of Java 6 on OS X, I started to see an annoying icon
> pop up in the dock whenever running elasticsearch. By default, all of our
> scripts add the headless AWT flag so people will probably not encounter it,
> but it was strange that I saw it when before I didn't.
> I started to dig around, and saw that when RamUsageEstimator was being
> loaded, it was causing AWT classes to be loaded. Further investigation showed
> that, for some reason, calling ManagementFactory#getPlatformMBeanServer with
> the new Java version causes AWT classes to be loaded (at least on the Mac,
> haven't tested on other platforms yet).
> There are several ways to try and solve it, for example, by identifying th
[jira] [Created] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
Shay Banon created LUCENE-5086: -- Summary: RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer Key: LUCENE-5086 URL: https://issues.apache.org/jira/browse/LUCENE-5086 Project: Lucene - Core Issue Type: Bug Reporter: Shay Banon Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag so people will probably not encounter it, but it was strange that I saw it when before I didn't. I started to dig around, and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac, haven't tested on other platforms yet). There are several ways to try and solve it, for example, by identifying the bug in the JVM itself, but I think that there should be a fix for it in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the hotspot diagnostics one (it's a heavy call...). Here is a simple call that will allow to get the hotspot mxbean without using the #getPlatformMBeanServer method, and not causing it to be loaded and loading all those nasty AWT classes:
[code]
Object getHotSpotMXBean() {
  Object hotSpotBean = null;
  try {
    // Java 6
    Class sunMF = Class.forName("sun.management.ManagementFactory");
    return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
  } catch (Throwable t) {
    // ignore
  }
  // potentially Java 7
  try {
    return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
        .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
  } catch (Throwable t) {
    // ignore
  }
  return null;
}
[/code]
[jira] [Commented] (LUCENE-4472) Add setting that prevents merging on updateDocument
[ https://issues.apache.org/jira/browse/LUCENE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473617#comment-13473617 ] Shay Banon commented on LUCENE-4472: Agree with Robert on the additional context flag; that would make things most flexible. A flag on IW makes things simpler from the user perspective, though, because then there is no need to customize the built-in merge policies. > Add setting that prevents merging on updateDocument > --- > > Key: LUCENE-4472 > URL: https://issues.apache.org/jira/browse/LUCENE-4472 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 4.0 >Reporter: Simon Willnauer > Fix For: 4.1, 5.0 > > Attachments: LUCENE-4472.patch > > > Currently we always call maybeMerge if a segment was flushed after > updateDocument. Some apps, and in particular ElasticSearch, use some hacky > workarounds to disable that, e.g. for merge throttling. It should be easier to > enable this kind of behavior.
[jira] [Created] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and "global" cross indices control
NRT Caching Dir to allow for exact memory usage, better buffer allocation and "global" cross indices control Key: LUCENE-3425 URL: https://issues.apache.org/jira/browse/LUCENE-3425 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Shay Banon A discussion on IRC raised several improvements that can be made to the NRT caching dir. Some of the problems it currently has are:
1. Not explicitly controlling the memory usage, which can result in overusing memory (for example, large new segments being committed because refreshing is too far behind).
2. Heap fragmentation because of constant allocation of (probably promoted to old gen) byte buffers.
3. Not being able to control the memory usage across indices for multi-index usage within a single JVM.
A suggested solution (which still needs to be ironed out) is to have a BufferAllocator that controls allocation of byte[], and allows returning unused byte[] to it. It will have a cap on the size of memory it allows to be allocated. The NRT caching dir will use the allocator, which can either be provided (for usage across several indices) or created internally. The caching dir will also create a wrapped IndexOutput that will flush to the main dir if the allocator can no longer provide byte[] (exhausted). When a file is "flushed" from the cache to the main directory, it will return all its currently allocated byte[] to the BufferAllocator to be reused by other "files".
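A possible shape for the proposed BufferAllocator, at sketch level only (all names and the API are illustrative assumptions, not Lucene's implementation):

```java
import java.util.ArrayDeque;

// Hedged sketch of the proposed BufferAllocator: hands out fixed-size
// byte[] blocks up to a byte budget and recycles returned blocks, so
// buffers are reused instead of constantly reallocated.
public class BufferAllocator {
    private final int blockSize;
    private final long maxBytes;          // the hard cap on allocated memory
    private long allocatedBytes;          // bytes handed out so far (incl. recycled)
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    public BufferAllocator(int blockSize, long maxBytes) {
        this.blockSize = blockSize;
        this.maxBytes = maxBytes;
    }

    /** Returns a block, reusing a recycled one if possible, or null when the
     *  budget is exhausted (the caching dir would then flush to the main dir). */
    public synchronized byte[] allocate() {
        byte[] recycled = free.pollFirst();
        if (recycled != null) return recycled;
        if (allocatedBytes + blockSize > maxBytes) return null;
        allocatedBytes += blockSize;
        return new byte[blockSize];
    }

    /** Returns a block to the pool so other "files" can reuse it. */
    public synchronized void release(byte[] block) {
        free.addFirst(block);
    }
}
```

Sharing one such allocator across several directories gives the "global" cross-indices control the issue asks for, since the cap applies to the pool, not to any single index.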
[jira] [Updated] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3416: --- Attachment: LUCENE-3416.patch A new patch removing the synchronization. It also adds another field to RateLimiter to record the original mbPerSec value set, so we can easily get it back. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch, LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
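A hedged sketch of a rate limiter that records its configured rate so it can be read back, and that could be shared across several directories; `SimpleRateLimiter` and its methods are stand-ins, not Lucene's RateLimiter API:

```java
// Toy rate limiter shared across directories: all writers consult the same
// instance, so their combined merge writes respect a single MB/s cap.
public class SharedRateLimiterSketch {
    static class SimpleRateLimiter {
        // the configured cap, kept (as in the patch) so it can be read back
        private final double mbPerSec;

        SimpleRateLimiter(double mbPerSec) { this.mbPerSec = mbPerSec; }

        double getMbPerSec() { return mbPerSec; }

        /** How many nanoseconds the caller should pause after writing `bytes`
         *  to stay under the configured rate. */
        synchronized long pauseNanos(long bytes) {
            double seconds = bytes / (mbPerSec * 1024 * 1024);
            return (long) (seconds * 1_000_000_000L);
        }
    }
}
```

Passing one instance to every FSDirectory in the JVM is what gives the cross-directory limit the issue describes; each directory alone would otherwise enforce the cap independently.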
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099287#comment-13099287 ] Shay Banon commented on LUCENE-3416: I must say that I am at a loss trying to understand why we need this "optimization", but it does not really matter to me as long as the ability to set the rate limiter instance gets in. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099233#comment-13099233 ] Shay Banon commented on LUCENE-3416: I agree with Mike; I think it should remain synchronized. It does safeguard concurrent calls to setMaxMergeWriteMBPerSec from falling over each other (who "wins" the call is not really relevant). Since that's synchronized, the method I added should be as well. Personally, I really don't think there is a need to make it thread safe without "blocking", since calling the "setters" is not something people do frequently at all, so the optimization is moot, and it would complicate the code. As for making mergeWriteRateLimiter volatile, it can be done, though in practice there really is no need (there is a memory barrier when reading it before). But I think that should go in a different issue, just to keep changes clean and isolated. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
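The synchronization trade-off debated in this thread, as a toy sketch (all names are illustrative, not the actual patch):

```java
// Toy illustration: the setter is synchronized because it performs a
// multi-step update that must not interleave with a concurrent setter
// call; the getters are synchronized to pair with it. Without that (or a
// volatile field) another thread might briefly observe a stale value,
// which the thread argues is acceptable for a rarely-called setter.
public class MergeWriteSettings {
    private double maxMergeWriteMBPerSec = Double.POSITIVE_INFINITY;
    private double originalMBPerSec = Double.POSITIVE_INFINITY;

    public synchronized void setMaxMergeWriteMBPerSec(double mbPerSec) {
        // two related writes that must stay consistent with each other
        this.originalMBPerSec = mbPerSec;
        this.maxMergeWriteMBPerSec = mbPerSec;
    }

    public synchronized double getMaxMergeWriteMBPerSec() {
        return maxMergeWriteMBPerSec;
    }

    public synchronized double getOriginalMBPerSec() {
        return originalMBPerSec;
    }
}
```

The alternative discussed (a volatile field with unsynchronized reads) would trade immediate visibility guarantees for cheaper reads; for a setting changed rarely, either choice is defensible.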
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099160#comment-13099160 ] Shay Banon commented on LUCENE-3416: > this makes no sense to me. If you don't want to set this concurrently how does > a lock protect you from this? I mean if you have two threads accessing > this you have either A B or B A, but this would happen without a lock too. If > you want the changes to take effect immediately you need to either > lock on each read of this var or make it volatile, which is almost equivalent > (a mem barrier). No, that's not correct. setMaxMergeWriteMBPerSec (not the method I added, the other one) is a complex method, and I think Mike wanted to protect against two threads setting the value concurrently. As for reading the value, I think Mike's logic was that it's not important enough to have "immediate" visibility of the change to require a volatile field (which is understandable). So, since setMaxMergeWriteMBPerSec is synchronized, the method added in this patch has to be as well. > My concern here was related to making this var volatile, which would be a > cache line invalidation each time you read the var. I think we should get rid > of the synchronized. Reading a volatile var on x86 is not a cache invalidation, though it does come with a cost. It's not relevant here based on what I explained before (and second-guessing Mike :) ) > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them. 
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099146#comment-13099146 ] Shay Banon commented on LUCENE-3416: The only reason it's synchronized is that the setMaxMergeWriteMBPerSec method is synchronized (I guess to protect against setting the rate limit concurrently). In practice, I don't see users changing it that often, so concerns about cache lines are not really relevant. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098018#comment-13098018 ] Shay Banon commented on LUCENE-3416: It is possible, but requires more work, and depends on overriding the createOutput method (as well as all the other methods in Directory). If rate limiting makes sense to expose as a "feature" at the directory level, I think this small change allows greater control over it. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
[jira] [Updated] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3416: --- Attachment: LUCENE-3416.patch > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
[jira] [Created] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them.
[jira] [Commented] (LUCENE-3335) jrebug causes porter stemmer to sigsegv
[ https://issues.apache.org/jira/browse/LUCENE-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076053#comment-13076053 ] Shay Banon commented on LUCENE-3335: @Uwe I actually forgot about this, and did not think it was because of the porter stemmer at the time, especially since I did try to reproduce it and never managed to (I thought it was a coincidence that it crashed there). From my experience, you get very little help from sun/oracle when using unorthodox flags like aggressive opts without a proper reproduction. Well, you get very little help there even when you do provide a reproduction... (see this issue that I opened for example: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129). I am the reason behind the Lucene 1.9.1 release with the major buffering bug introduced in 1.9 way back in the day; do you really think I would not reach out if I thought there really was a problem associated with Lucene? > jrebug causes porter stemmer to sigsegv > --- > > Key: LUCENE-3335 > URL: https://issues.apache.org/jira/browse/LUCENE-3335 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 1.9, 1.9.1, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4, > 2.4.1, 2.9, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2, > 3.3, 3.4, 4.0 > Environment: - JDK 7 Preview Release, GA (may also affect update _1, > targeted fix is JDK 1.7.0_2) > - JDK 1.6.0_20+ with -XX:+OptimizeStringConcat or -XX:+AggressiveOpts >Reporter: Robert Muir >Assignee: Robert Muir > Labels: Java7 > Attachments: LUCENE-3335.patch, LUCENE-3335_slow.patch, > patch-0uwe.patch > > > happens easily on java7: ant test -Dtestcase=TestPorterStemFilter > -Dtests.iter=100 > might happen on 1.6.0_u26 too, a user reported something that looks like the > same bug already: > http://www.lucidimagination.com/search/document/3beaa082c4d2fdd4/porterstemfilter_kills_jvm
[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072979#comment-13072979 ] Shay Banon commented on LUCENE-3282: Hi, sorry for the late response, I missed the comment. Yea, I agree that there will be false positives, but that's the idea of it (sometimes you want to run facets, for example, on "sub queries"). Btw, I got your point on advance; do you think that, if a collector exists, advance should be implemented by iterating over all docs up to the provided doc? Regarding the wrapper, interesting! I need to have a look at how to generalize it, but it should be simple, I think; I'll try and work on it. > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch, LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
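As a rough illustration of the feature being proposed (a callback invoked for every matching child doc), here is a standalone sketch. ChildCollector and CountingChildCollector are hypothetical names, not the actual BlockJoinQuery API; the real patch would hook the callback into the block-join scorer, which is omitted here:

```java
// Hypothetical callback interface for child hits; in the real patch the
// block-join scorer would invoke this for every matching child doc.
interface ChildCollector {
    void collectChild(int childDoc);
}

// Example use: count matching children, e.g. as raw input to facet counts.
class CountingChildCollector implements ChildCollector {
    int count;

    @Override
    public void collectChild(int childDoc) {
        count++;
    }
}
```

The point of the design is that the caller owns the collector, so counts, facets, or any other per-child aggregation can be layered on without changing the query itself.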
[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066536#comment-13066536 ] Shay Banon commented on LUCENE-3282: The idea of this is to collect matching child docs regardless of what matches parent-wise, and yea, we might miss some depending on the type of query that is actually "wrapping" it, but I think it's still useful. > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch, LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3282: --- Attachment: LUCENE-3282.patch New version, with CollectorProvider. > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch, LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063619#comment-13063619 ] Shay Banon commented on LUCENE-3282: Heya, In my app, I have a wrapper around OBS that has a common interface allowing bits to be accessed by index (similar to Bits in trunk), so I need to extract the OBS from it. Regarding the Collector, I will work on a CollectorProvider interface. I liked the NoOpCollector option since then you don't have to check for nulls each time... > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3282: --- Attachment: LUCENE-3282.patch > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
[jira] [Created] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction - Key: LUCENE-3282 URL: https://issues.apache.org/jira/browse/LUCENE-3282 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 3.4, 4.0 Reporter: Shay Banon It would be nice to allow to add a custom child collector to the BlockJoinQuery to be called on every matching doc (so we can do things with it, like counts and such). Also, allow to extend BlockJoinQuery to have a custom code that converts the filter bitset to an OpenBitSet.
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006214#comment-13006214 ] Shay Banon commented on LUCENE-2960: Just a note regarding the IWC and being able to consult it for live changes: it feels strange to me that setting something on the config would affect the IW in real time. Maybe it's just me, but it feels nicer to have the "live" setters on IW compared to IWC. I also like the ability to decouple construction-time configuration (through IWC) from live settings (through setters on IW). It is then very clear what can be set at construction time and what can be set on a live IW. It also allows a compile-time / static check of what can be changed at which lifecycle phase. > Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter > -- > > Key: LUCENE-2960 > URL: https://issues.apache.org/jira/browse/LUCENE-2960 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shay Banon >Priority: Blocker > Fix For: 3.1, 4.0 > > > In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. > It would be great to be able to control that on a live IndexWriter. Other > possible two methods that would be great to bring back are > setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other > setters can actually be set on the MergePolicy itself, so no need for setters > for those (I think).
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006049#comment-13006049 ] Shay Banon commented on LUCENE-2960: Heya, If I had to choose between being able to change things in real time and better concurrency thanks to immutability, I would definitely go with better concurrency. I have no problem with closing the writers and reopening them, though, as Mike said, this can come with a big cost. The funny thing is that a lot of the setters that were already there on the IndexWriter are still exposed, basically, through settings on the relevant MergePolicy, so I don't think we are talking about that many setters to begin with (I don't think we should bring those back to the IndexWriter). I think that the notion of IWC is a good one, and should remain, but only to provide construction-time parameters to IW. It should not be consulted once the construction phase of IW is done. If explicit real-time parameters are to be set, then IW should expose them as setters. Now, the question is which setters, if any, should be exposed. Going through the list of current setters on IW, my vote is for setRAMBufferSizeMB. I am not sure that it's that obscure a use case. I believe Solr, for example, has a notion of cores (or something like that), so it can also be adaptive in terms of indexing buffer size depending on the number of cores running in the VM. Also, one can easily run a system that does bulk indexing and then lowers the indexing buffer size for more "streamline" work. It's just a shame to close the writer for that (and having to pause all indexing work while this happens). The term interval and divisor, I agree, are so obscure (funnily, I use the divisor quite a lot) that closing the writer and opening it again makes sense.
> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter > -- > > Key: LUCENE-2960 > URL: https://issues.apache.org/jira/browse/LUCENE-2960 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shay Banon >Priority: Blocker > Fix For: 3.1, 4.0 > > > In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. > It would be great to be able to control that on a live IndexWriter. Other > possible two methods that would be great to bring back are > setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other > setters can actually be set on the MergePolicy itself, so no need for setters > for those (I think).
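The split argued for in this thread, construction-time config on an immutable object versus live settings as explicit setters, can be sketched like this. The class and field names are made up for illustration; they are not the actual IndexWriter/IndexWriterConfig API:

```java
// Construction-time settings live on an immutable config object...
final class WriterConfig {
    final int termIndexInterval;

    WriterConfig(int termIndexInterval) {
        this.termIndexInterval = termIndexInterval;
    }
}

// ...while live-tunable settings get explicit setters on the writer itself,
// so the lifecycle of each setting is checkable at compile time.
class Writer {
    private final WriterConfig config;        // fixed once the writer is open
    private volatile double ramBufferSizeMB;  // safe to change on a live writer

    Writer(WriterConfig config, double ramBufferSizeMB) {
        this.config = config;
        this.ramBufferSizeMB = ramBufferSizeMB;
    }

    void setRAMBufferSizeMB(double mb) {
        this.ramBufferSizeMB = mb;
    }

    double getRAMBufferSizeMB() {
        return ramBufferSizeMB;
    }
}
```

With this shape, trying to mutate `termIndexInterval` on an open writer simply does not compile, which is the "static check" benefit mentioned above.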
[jira] Created: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter -- Key: LUCENE-2960 URL: https://issues.apache.org/jira/browse/LUCENE-2960 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shay Banon Fix For: 3.2, 4.0 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. It would be great to be able to control that on a live IndexWriter. Other possible two methods that would be great to bring back are setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other setters can actually be set on the MergePolicy itself, so no need for setters for those (I think).
[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2474: --- Attachment: MapBackedSet.java A MapBackedSet implementation that can wrap a CHM to get a concurrent set implementation. We can consider using that instead of a synchronized set with copy-on-read when notifying listeners. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon >Assignee: Michael McCandless > Fix For: 3.1, 4.0 > > Attachments: LUCENE-2474.patch, LUCENE-2474.patch, LUCENE-2474.patch, > LUCENE-2574.patch, MapBackedSet.java > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
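A minimal version of such a wrapper might look like the following. This is a sketch only, not the attached MapBackedSet.java; note that since Java 6 the JDK ships the same idea as Collections.newSetFromMap:

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;

// A Set view over any Map: wrap a ConcurrentHashMap to get a concurrent set.
class MapBackedSet<E> extends AbstractSet<E> {
    private final Map<E, Boolean> map;

    MapBackedSet(Map<E, Boolean> map) {
        this.map = map;
    }

    @Override public int size() { return map.size(); }
    @Override public Iterator<E> iterator() { return map.keySet().iterator(); }
    @Override public boolean contains(Object o) { return map.containsKey(o); }
    @Override public boolean add(E e) { return map.put(e, Boolean.TRUE) == null; }
    @Override public boolean remove(Object o) { return map.remove(o) != null; }
}
```

The set inherits the concurrency properties of whatever Map it wraps, so backing it with a ConcurrentHashMap gives lock-free reads during listener notification.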
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984263#action_12984263 ] Shay Banon commented on LUCENE-2871: Agreed Earwin, let's first see if it makes sense; this is just an experiment and might not make sense for single-threaded writes. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch, LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984206#action_12984206 ] Shay Banon commented on LUCENE-2871: bq. Looking at the current patch, the class seems wrong. In my opinion, this should be only in NIOFSDirectory. SimpleFSDir should only use RAF. It's a good question; I'm not sure what to do with it. Here is the problem: the channel output can be used with all 3 FS dirs (simple, nio, and mmap), and it might actually make sense even with SimpleFS (i.e. using non-NIO reads but file channel writes). In order to support all of them, currently the simplest way is to put it in the base class so the code is shared. On IRC there was a discussion about externalizing the outputs and inputs so one can more easily pick and choose, but I think that belongs in a different patch. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch, LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
[jira] Updated: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2871: --- Attachment: LUCENE-2871.patch Fixed the buffer overflow exception (I hope; I can't really reproduce it, Mike can...). Also, per the IRC discussion, made SimpleFSDirectory default to not using the file channel output, while NIO and MMap default to using it. One can still control whether it is used via the setter method. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch, LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984134#action_12984134 ] Shay Banon commented on LUCENE-2871: Strange, I did not get it when running the tests; I will try to find out why it can happen. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982695#action_12982695 ] Shay Banon commented on LUCENE-2474: Yea, I got the reasoning for a Set; we can use that, a CHM with PRESENT. If you want, I can attach a simple MapBackedSet that turns any Map into a Set. Still, I think that using CopyOnWriteArrayList is best here. I don't think adding and removing listeners is something that is done often in an app (but I might be mistaken), and in that case traversal over the listeners is much better with CopyOnWriteArrayList compared to a CHM. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch, LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader.
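To make the trade-off concrete, a listener registry along these lines (ListenerRegistry is an illustrative name, not the patch's API) iterates over a lock-free snapshot while addIfAbsent preserves set semantics:

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Sketch of a listener registry: registration is rare, notification is frequent,
// so CopyOnWriteArrayList's cheap snapshot iteration fits well.
class ListenerRegistry<T> {
    private final CopyOnWriteArrayList<T> listeners = new CopyOnWriteArrayList<>();

    boolean add(T listener) {
        return listeners.addIfAbsent(listener); // keeps set-like behavior
    }

    boolean remove(T listener) {
        return listeners.remove(listener);
    }

    void notifyListeners(Consumer<T> action) {
        // iterates a snapshot: no global lock is held while listeners run
        for (T listener : listeners) {
            action.accept(listener);
        }
    }
}
```

Registration pays the copy cost, but notification (the hot path when readers close) never blocks or holds a lock, which is the argument being made in the comment.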
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982509#action_12982509 ] Shay Banon commented on LUCENE-2474: bq. OK, here's a patch exposing the readerFinishedListeners as static methods on IndexReader. I think we should use a CopyOnWriteArrayList so calling the listeners does not happen under a global synchronized block. If maintaining set behavior is required, I can patch in a ConcurrentHashSet implementation, or we can simply replace it with a CHM with PRESENT, or any other solution that does not require calling the listeners under a global sync block. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch, LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader.
[jira] Updated: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2871: --- Attachment: LUCENE-2871.patch Patch supporting writes through a file channel. FSDirectory still retains the ability to use RAF for writes. FSDirectory#setUseChannelOutput: allows reverting to RAF when set to false. FSDirectory#setCacheChannelBuffers: allows controlling whether buffers are cached when using the file channel. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
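The general shape of a file-channel-based output with a reusable buffer might look like this. It is a standalone sketch, not the attached patch's actual code; ChannelOutput and its methods are made-up names:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Buffered writes through a FileChannel; the direct ByteBuffer is reused
// across writes, which is the "cached buffers" idea in the patch description.
class ChannelOutput implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(8192);

    ChannelOutput(Path path) throws IOException {
        channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
    }

    void writeBytes(byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            int chunk = Math.min(len, buffer.remaining());
            buffer.put(b, off, chunk);
            off += chunk;
            len -= chunk;
            if (!buffer.hasRemaining()) {
                flush();  // buffer full: drain it to the channel
            }
        }
    }

    void flush() throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) {
            channel.write(buffer);  // write() may not drain the buffer in one call
        }
        buffer.clear();
    }

    @Override
    public void close() throws IOException {
        flush();
        channel.close();
    }
}
```

Whether this beats a plain RandomAccessFile for Lucene's mostly single-threaded, sequential segment writes is exactly the open question of the issue.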
[jira] Created: (LUCENE-2871) Use FileChannel in FSDirectory
Use FileChannel in FSDirectory -- Key: LUCENE-2871 URL: https://issues.apache.org/jira/browse/LUCENE-2871 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Explore using FileChannel in FSDirectory to see if it improves write operations performance
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978975#action_12978975 ] Shay Banon commented on LUCENE-2474: bq. But: I think we'd want to have composite reader just forward the registration down to the atomic readers? (And, forward on reopen). I am not sure that you would want to do that. Any properly written caching layer or external component would work on the low-level segment readers; it would not even compile against compound readers. This helps direct people to write proper code that deals only with segment readers. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978719#action_12978719 ] Shay Banon commented on LUCENE-2474: > It would be a cache of anything... one element of that cache would be the > FieldCache, there could be one for filters, or one entry per-filter. > edit: Maybe a better way to think about it is like a ServletContext or > something - it's just a way to attach anything arbitrary to a reader. Got you. My personal taste is to try and keep those readers as lightweight as possible, and have the proper constructs in place to allow external components to use them for caching, without having the readers manage it as well. > Not with this current patch, as there is no mechanism to get a callback when > you do care about deletes. If I want to cache something that depends on > deletions, I want to purge that cache when the actual reader is closed (as > opposed to the reader's core cache key that is shared amongst all readers > that just have different deletions). So if we go a "close event" route, we > really want two different events... one for the close of a reader (i.e. > deleted matter), and one for the close of the segment (deletes don't matter). I think that a cache that is affected by deletes is a problematic cache to begin with, so I was thinking that maybe it should be discouraged by not allowing for it. Especially with NRT. My idea was to simply expand the purge capability that the FC gets for free to other external custom components. Also, if we did have a type-safe separation between segment readers and compound readers, I would not have added the ability to register a listener on the compound readers, just the segment readers, as this will encourage people to write caches that only work on segment readers (since the registration for the "purge event" will happen within the cache, and it should work only with segment readers). 
That was why my patch does not take compound readers into account. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
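The design argued for in this thread — caches keyed on per-segment cache keys, with the purge registration happening inside the cache itself — can be sketched roughly as follows. This is a hypothetical stand-in for illustration, not Lucene's actual API: the SegmentReader interface, getCacheKey, and addPurgeListener names are invented here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a cache keyed on a per-segment cache key that
// registers a purge callback the first time it sees a segment, so eviction
// is eager rather than left to GC of a WeakHashMap entry.
public class SegmentCache<V> {

    // Minimal stand-in for a Lucene segment reader; only what the sketch needs.
    public interface SegmentReader {
        Object getCacheKey();                     // cf. IndexReader#getFieldCacheKey
        void addPurgeListener(Runnable onClose);  // hypothetical registration hook
    }

    private final Map<Object, V> cache = new ConcurrentHashMap<>();

    public V get(SegmentReader reader, Function<SegmentReader, V> loader) {
        Object key = reader.getCacheKey();
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(reader);
            if (cache.putIfAbsent(key, value) == null) {
                // Registration happens inside the cache, so it naturally
                // works only with segment readers, as argued above.
                reader.addPurgeListener(() -> cache.remove(key));
            }
        }
        return value;
    }

    public int size() {
        return cache.size();
    }
}
```

Because the cache only ever sees segment readers, a compound reader never needs to forward registrations; reopening an NRT reader reuses unchanged segments and therefore reuses their cache entries.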
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978480#action_12978480 ] Shay Banon commented on LUCENE-2474: Right, I was thinking that it's a low-level API that you can just add to the low-level readers, but I agree, it will be nicer to have it at the high level as well. Regarding the close method name, I guess we can name it similarly to the FieldCache one, maybe purge? > We've talked before about putting caches directly on the readers - that still > seems like the most straightforward approach? Not sure I understand that. Do you mean getting FieldCache into the readers? And then what about cached filters? And other custom caching constructs that rely on the same mechanism as the CachingWrapperFilter? I think that if one implements such caching, it's an advanced enough feature that you should know how to handle deletes and other tidbits (if you need to). > We really need one cache that doesn't care about deletions, and one cache > that does. Isn't that up to the cache to decide? That cache can be anything (internally implemented in Lucene or external) that follows the mechanism of caching based on (segment) readers. As long as there are constructs to get the deleted docs to handle deletes (for example), then the implementation can use them. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. 
Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap
[ https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2292: --- Attachment: LUCENE-2292.patch A fixed patch that now passes all tests using the byte buffer directory. It also includes refactoring into a different package (store.bytebuffer) and a custom ByteBufferAllocator interface that controls how buffers are allocated, with plain and caching implementations. > ByteBuffer Directory - allowing to store the index outside the heap > --- > > Key: LUCENE-2292 > URL: https://issues.apache.org/jira/browse/LUCENE-2292 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2292.patch, LUCENE-2292.patch, LUCENE-2292.patch > > > A byte buffer based directory with the benefit of being able to create direct > byte buffer thus storing the index outside the JVM heap.
[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966376#action_12966376 ] Shay Banon commented on LUCENE-2779: If the assumption still stands that an IndexInput will not be opened on a "writing" / unclosed IndexOutput, then RAMFile can also be improved when it comes to concurrency. The RAMOutputStream can maintain its own list of buffers (a simple array list, no need to sync), and only when it gets closed initialize the respective RAMFile with the list. This means most of the synchronized aspects of RAMFile can be removed. Also, on RAMFile, lastModified can be made volatile, removing the sync on its accessor methods. > Use ConcurrentHashMap in RAMDirectory > - > > Key: LUCENE-2779 > URL: https://issues.apache.org/jira/browse/LUCENE-2779 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Shai Erera >Assignee: Shai Erera >Priority: Minor > Fix For: 3.1, 4.0 > > Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, > LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java > > > RAMDirectory synchronizes on its instance in many places to protect access to > map of RAMFiles, in addition to updating the sizeInBytes member. In many > places the sync is done for 'read' purposes, while only in few places we need > 'write' access. This looks like a perfect use case for ConcurrentHashMap > Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong > ... > I'll post a patch shortly.
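The restructuring proposed in the comment above — the output stream buffering privately and publishing to the file in one step on close() — can be sketched like this. These are simplified stand-ins, not Lucene's actual RAMFile/RAMOutputStream, and they assume (as the comment does) that no input is ever opened on an unclosed output.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the writer keeps a private, unsynchronized buffer list and hands
// it to the file in a single step on close(), so readers never observe a
// half-written state and RAMFile needs no synchronized blocks.
class RAMFile {
    volatile List<byte[]> buffers;   // published once, by close()
    volatile long length;
    volatile long lastModified;      // volatile instead of synchronized accessors
}

class RAMOutputStream {
    private final RAMFile file;
    private final List<byte[]> pending = new ArrayList<>(); // private: no sync needed
    private long length;

    RAMOutputStream(RAMFile file) {
        this.file = file;
    }

    void writeBytes(byte[] b, int len) {
        byte[] copy = new byte[len];
        System.arraycopy(b, 0, copy, 0, len);
        pending.add(copy);
        length += len;
    }

    void close() {
        // Single publication point: initialize the RAMFile with the list.
        file.length = length;
        file.lastModified = System.currentTimeMillis();
        file.buffers = pending; // the volatile write publishes everything above it
    }
}
```

The volatile write to buffers is the linchpin: under the Java memory model it publishes the buffer contents and length written before it, which is why the per-method synchronization becomes unnecessary.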
[jira] Commented: (LUCENE-2773) Don't create compound file for large segments by default
[ https://issues.apache.org/jira/browse/LUCENE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935299#action_12935299 ] Shay Banon commented on LUCENE-2773: Mike, are you sure regarding the default maxMergeMB set to 2 GB? This is a big change in default behavior. For systems that do updates (deletes) we are covered, because deletes are taken (partially) into account when computing the segment size. But let's say you have a 100 GB index: you will end up with 50 segments, no? > Don't create compound file for large segments by default > > > Key: LUCENE-2773 > URL: https://issues.apache.org/jira/browse/LUCENE-2773 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 2.9.4, 3.0.3, 3.1, 4.0 > > Attachments: LUCENE-2773.patch > > > Spinoff from LUCENE-2762. > CFS is useful for keeping the open file count down. But, it costs > some added time during indexing to build, and also ties up temporary > disk space, causing eg a large spike on the final merge of an > optimize. > Since MergePolicy dictates which segments should be CFS, we can > change it to only build CFS for "smallish" merges. > I think we should also set a maxMergeMB by default so that very large > merges aren't done.
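The arithmetic behind the 50-segment concern above is simple: a merge-size cap puts a floor on the number of segments a large index can shrink to. A back-of-the-envelope sketch (the 2 GB cap and 100 GB size are the figures from the comment):

```java
// If merges are capped at maxMergeMB ~ 2 GB, an index can never merge
// below roughly totalSize / cap segments, regardless of merge factor.
public class SegmentCountEstimate {
    static long minSegments(double indexSizeGb, double maxMergedSegmentGb) {
        return (long) Math.ceil(indexSizeGb / maxMergedSegmentGb);
    }

    public static void main(String[] args) {
        // 100 GB index with a 2 GB cap: at least 50 max-sized segments remain.
        System.out.println(minSegments(100.0, 2.0));
    }
}
```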
[jira] Commented: (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928859#action_12928859 ] Shay Banon commented on LUCENE-1536: Hi Mike, Wondering what your thoughts on fixing filters correctly are? I think that the initial idea of getting filters all the way down to the postings enumeration, if they support random access, is a great one. A random-access doc id set can be added (an interface), and if a filter returns it (which can be checked using instanceof), then that doc set can be passed all the way to the enumeration (and intersected per doc with the deleted docs). I think that any type of solution should support this great feature of Lucene queries: for example, FilteredQuery should use it, allowing complex query expressions to be built without the mentioned optimization being applied only at the top-level search. As most filter results do support random access, either because they use OpenBitSet or because they are built on top of FieldCache functionality, I think this feature will give great speed improvements to query execution time. > if a filter can support random access API, we should use it > --- > > Key: LUCENE-1536 > URL: https://issues.apache.org/jira/browse/LUCENE-1536 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 4.0 > > Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, > LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch > > > I ran some performance tests, comparing applying a filter via > random-access API instead of current trunk's iterator API. > This was inspired by LUCENE-1476, where we realized deletions should > really be implemented just like a filter, but then in testing found > that switching deletions to iterator was a very sizable performance > hit. 
> Some notes on the test: > * Index is first 2M docs of Wikipedia. Test machine is Mac OS X > 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. > * I test across multiple queries. 1-X means an OR query, eg 1-4 > means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 > AND 3 AND 4. "u s" means "united states" (phrase search). > * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, > 95, 98, 99, 99.9 (filter is non-null but all bits are set), > 100 (filter=null, control)). > * Method high means I use random-access filter API in > IndexSearcher's main loop. Method low means I use random-access > filter API down in SegmentTermDocs (just like deleted docs > today). > * Baseline (QPS) is current trunk, where filter is applied as iterator up > "high" (ie in IndexSearcher's search loop). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
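The approach described in the comment above — detect a random-access doc id set via instanceof and test it per document down in the postings enumeration, next to the deleted-docs check — can be sketched as follows. RandomAccessDocIdSet and the loop are hypothetical stand-ins for illustration, not Lucene's actual classes.

```java
import java.util.BitSet;

// Hypothetical sketch of the random-access filter path: instead of
// leapfrogging a filter iterator against the query up high, test the
// filter's bits per document where deleted docs are already checked.
public class RandomAccessFilterSketch {

    // Interface a filter's doc id set could implement when its bits support
    // O(1) lookup (e.g. backed by OpenBitSet or FieldCache-derived data).
    public interface RandomAccessDocIdSet {
        boolean get(int docId);
    }

    // Stand-in for the low-level postings loop: count docs that survive
    // both the deleted-docs check and the per-doc filter test.
    public static int countHits(int[] postings, BitSet deletedDocs, Object filter) {
        RandomAccessDocIdSet bits =
            (filter instanceof RandomAccessDocIdSet) ? (RandomAccessDocIdSet) filter : null;
        int hits = 0;
        for (int doc : postings) {
            if (deletedDocs != null && deletedDocs.get(doc)) continue; // skip deletes
            if (bits != null && !bits.get(doc)) continue;              // random-access filter
            hits++;
        }
        return hits;
    }
}
```

A filter whose doc id set does not implement the marker interface would fall back to the existing iterator path (elided here); the instanceof check is what lets both coexist.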
[jira] Created: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
ArrayIndexOutOfBoundsException when iterating over TermDocs --- Key: LUCENE-2666 URL: https://issues.apache.org/jira/browse/LUCENE-2666 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 3.0.2 Reporter: Shay Banon A user got this very strange exception, and I managed to get the index that it happens on. Basically, iterating over the TermDocs causes an AIOOB exception. I easily reproduced it using the FieldCache, which does exactly that (the field in question is indexed as numeric). Here is the exception: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183) at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470) at TestMe.main(TestMe.java:56) It happens on the following segment: _26t docCount: 914 delCount: 1 delFileName: _26t_1.del And as you can see, it smells like a corner case (it fails for document number 912, the AIOOB happens from the deleted docs). The code to recreate it is simple: FSDirectory dir = FSDirectory.open(new File("index")); IndexReader reader = IndexReader.open(dir, true); IndexReader[] subReaders = reader.getSequentialSubReaders(); for (IndexReader subReader : subReaders) { Field field = subReader.getClass().getSuperclass().getDeclaredField("si"); field.setAccessible(true); SegmentInfo si = (SegmentInfo) field.get(subReader); System.out.println("--> " + si); if (si.getDocStoreSegment().contains("_26t")) { // this is the problematic one... 
System.out.println("problematic one..."); FieldCache.DEFAULT.getLongs(subReader, "__documentdate", FieldCache.NUMERIC_UTILS_LONG_PARSER); } } Here is the result of a check index on that segment: 8 of 10: name=_26t docCount=914 compound=true hasProx=true numFiles=2 size (MB)=1.641 diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.} has deletions [delFileName=_26t_1.del] test: open reader.OK [1 deleted docs] test: fields..OK [32 fields] test: field norms.OK [32 fields] test: terms, freq, prox...ERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102) at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) test: stored fields...ERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34) at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) test: term vectorsERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34) at 
org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) The creation of the index does not do anything fancy (all defaults), though there is usage of the near-real-time aspect (IndexWriter#getReader), which does complicate deleted docs handling. Seems like the deleted docs got written without matching the number of docs? Sadly, I don't have something that recreates it from scratch, but I do have the index if someone wants to have a look at it (mail me directly and I will provide a download link). I wil
[jira] Commented: (LUCENE-2161) Some concurrency improvements for NRT
[ https://issues.apache.org/jira/browse/LUCENE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874454#action_12874454 ] Shay Banon commented on LUCENE-2161: Thanks! > Some concurrency improvements for NRT > - > > Key: LUCENE-2161 > URL: https://issues.apache.org/jira/browse/LUCENE-2161 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.9.3, 3.0.2, 3.1, 4.0 > > Attachments: LUCENE-2161.patch > > > Some concurrency improvements for NRT > I found & fixed some silly thread bottlenecks that affect NRT: > * Multi/DirectoryReader.numDocs is synchronized, I think so only 1 > thread computes numDocs if it's -1. I removed this sync, and made > numDocs volatile, instead. Yes, multiple threads may compute the > numDocs for the first time, but I think that's harmless? > * Fixed BitVector's ctor to set count to 0 on creating a new BV, and > clone to copy the count over; this saves CPU computing the count > unecessarily. > * Also strengthened assertions done in SR, testing the delete docs > count. > I also found an annoying thread bottleneck that happens, due to CMS. > Whenever CMS hits the max running merges (default changed from 3 to 1 > recently), and the merge policy now wants to launch another merge, it > forces the incoming thread to wait until one of the BG threads > finishes. > This is a basic crude throttling mechanism -- you force the mutators > (whoever is causing new segments to appear) to stop, so that merging > can catch up. > Unfortunately, when stressing NRT, that thread is the one that's > opening a new NRT reader. > So, the first serious problem happens when you call .reopen() on your > NRT reader -- this call simply forwards to IW.getReader if the reader > was an NRT reader. But, because DirectoryReader.doReopen is > synchronized, this had the horrible effect of holding the monitor lock > on your main IR. 
In my test, this blocked all searches (since each > search uses incRef/decRef, still sync'd until LUCENE-2156, at least). > I fixed this by making doReopen only sync'd on this if it's not simply > forwarding to getWriter. So that's a good step forward. > This prevents searches from being blocked while trying to reopen to a > new NRT. > However... it doesn't fix the problem that when an immense merge is > off and running, opening an NRT reader could hit a tremendous delay > because CMS blocks it. The BalancedSegmentMergePolicy should help > here... by avoiding such immense merges. > But, I think we should also pursue an improvement to CMS. EG, if it > has 2 merges running, where one is huge and one is tiny, it ought to > increase thread priority of the tiny one. I think with such a change > we could increase the max thread count again, to prevent this > starvation. I'll open a separate issue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2161) Some concurrency improvements for NRT
[ https://issues.apache.org/jira/browse/LUCENE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873475#action_12873475 ] Shay Banon commented on LUCENE-2161: Mike, is there a reason why this is not backported to 3.0.2? > Some concurrency improvements for NRT > - > > Key: LUCENE-2161 > URL: https://issues.apache.org/jira/browse/LUCENE-2161 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.9.3, 4.0 > > Attachments: LUCENE-2161.patch > > > Some concurrency improvements for NRT > I found & fixed some silly thread bottlenecks that affect NRT: > * Multi/DirectoryReader.numDocs is synchronized, I think so only 1 > thread computes numDocs if it's -1. I removed this sync, and made > numDocs volatile, instead. Yes, multiple threads may compute the > numDocs for the first time, but I think that's harmless? > * Fixed BitVector's ctor to set count to 0 on creating a new BV, and > clone to copy the count over; this saves CPU computing the count > unecessarily. > * Also strengthened assertions done in SR, testing the delete docs > count. > I also found an annoying thread bottleneck that happens, due to CMS. > Whenever CMS hits the max running merges (default changed from 3 to 1 > recently), and the merge policy now wants to launch another merge, it > forces the incoming thread to wait until one of the BG threads > finishes. > This is a basic crude throttling mechanism -- you force the mutators > (whoever is causing new segments to appear) to stop, so that merging > can catch up. > Unfortunately, when stressing NRT, that thread is the one that's > opening a new NRT reader. > So, the first serious problem happens when you call .reopen() on your > NRT reader -- this call simply forwards to IW.getReader if the reader > was an NRT reader. 
But, because DirectoryReader.doReopen is > synchronized, this had the horrible effect of holding the monitor lock > on your main IR. In my test, this blocked all searches (since each > search uses incRef/decRef, still sync'd until LUCENE-2156, at least). > I fixed this by making doReopen only sync'd on this if it's not simply > forwarding to getWriter. So that's a good step forward. > This prevents searches from being blocked while trying to reopen to a > new NRT. > However... it doesn't fix the problem that when an immense merge is > off and running, opening an NRT reader could hit a tremendous delay > because CMS blocks it. The BalancedSegmentMergePolicy should help > here... by avoiding such immense merges. > But, I think we should also pursue an improvement to CMS. EG, if it > has 2 merges running, where one is huge and one is tiny, it ought to > increase thread priority of the tiny one. I think with such a change > we could increase the max thread count again, to prevent this > starvation. I'll open a separate issue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869779#action_12869779 ] Shay Banon commented on LUCENE-2468: Hi Mike, First, I opened and attached a patch regarding the cache eviction listeners for IndexReader: https://issues.apache.org/jira/browse/LUCENE-2474, tell me what you think. Regarding your last comment, I agree. Though, trying to streamline its usage, in terms of having all built-in components and possible extensions work well with it, makes sense. That's what you suggest with the filtered doc set, which is cool. > reopen on NRT reader should share readers w/ unchanged segments > --- > > Key: LUCENE-2468 > URL: https://issues.apache.org/jira/browse/LUCENE-2468 > Project: Lucene - Java > Issue Type: Bug >Reporter: Yonik Seeley >Assignee: Michael McCandless > Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, > LUCENE-2468.patch, LUCENE-2468.patch > > > A repoen on an NRT reader doesn't seem to share readers for those segments > that are unchanged. > http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2474: --- Attachment: LUCENE-2474.patch First revision of the patch, tell me what you think... . > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey) Key: LUCENE-2474 URL: https://issues.apache.org/jira/browse/LUCENE-2474 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Shay Banon Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey). A spin-off of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it makes a lot of sense to cache things based on IndexReader#getFieldCacheKey, even Lucene itself uses it, for example, with the CachingWrapperFilter. FieldCache enjoys being called explicitly to purge its cache when possible (which is tricky to know from the "outside", especially when using NRT - reader attack of the clones). The provided patch allows plugging in a CacheEvictionListener which will be called when the cache should be purged for an IndexReader.
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869369#action_12869369 ] Shay Banon commented on LUCENE-2468: bq. So... why not do this in CachingWrapper/SpanFilter, but, instead of discarding the cache entry when deletions must be enforced, we dynamically apply the deletions? (I think we could use FilteredDocIdSet). Yea, that would work well. You will need to somehow still know when to enable or disable this based on the filter you use (it should basically only be enabled for filters that are passed to a constant score query...). bq. Really... we need a more generic solution here (but, it's a much bigger change), where somehow in creating the scorer per-segment we dynamically determine who/where the deletions are enforced. A Filter need not care about deletions if it's AND'd w/ a query that already enforces the deletions. Agreed. As I see it, caching based on IndexReader is key in Lucene, and with NRT it should feel the same way as it does without it. NRT should not change the way you build your system. > reopen on NRT reader should share readers w/ unchanged segments > --- > > Key: LUCENE-2468 > URL: https://issues.apache.org/jira/browse/LUCENE-2468 > Project: Lucene - Java > Issue Type: Bug >Reporter: Yonik Seeley >Assignee: Michael McCandless > Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, > LUCENE-2468.patch, LUCENE-2468.patch > > > A repoen on an NRT reader doesn't seem to share readers for those segments > that are unchanged. > http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
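The FilteredDocIdSet idea discussed above — keep the cached filter entry and dynamically AND in the current reader's deletions instead of discarding it — might look roughly like this. This is a simplified stand-in with hypothetical types, playing the role Lucene's FilteredDocIdSet would, with isDeleted supplied by the (possibly reopened) reader.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Sketch: wrap a cached, deletion-unaware filter result and apply the
// current reader's deletions per document, so the cache entry survives
// reopens that only change deletes.
public class DeletionAwareDocIdSet {
    private final BitSet cachedBits;       // cached filter result (deletion-unaware)
    private final IntPredicate isDeleted;  // current reader's deleted docs

    public DeletionAwareDocIdSet(BitSet cachedBits, IntPredicate isDeleted) {
        this.cachedBits = cachedBits;
        this.isDeleted = isDeleted;
    }

    // cf. FilteredDocIdSet#match: accept only live docs the cache accepts.
    public boolean match(int docId) {
        return cachedBits.get(docId) && !isDeleted.test(docId);
    }
}
```

The per-doc deletion test costs a little at search time, but the expensive part (computing the filter bits) stays cached under the segment's core key across NRT reopens.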
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869253#action_12869253 ] Shay Banon commented on LUCENE-2468: bq. With the perf fix we are doing here, the problem (not correctly "seeing" deletes on a reopened reader) is isolated to CachingWrapperFilter/CachingSpanFilter, right? Yes, but this means that ConstantScoreQuery should basically not be cached when using NRT (even when using IndexReader as the key...), because of the excessive readers created. With the one that is deletion aware, you can cache it based on the cache key. bq. I think this would be a good change - it would make eviction immediate instead of just when GC gets around to pruning the WeakHashMap. Can you open a separate issue and maybe work out a patch? Sure, I will do it. > reopen on NRT reader should share readers w/ unchanged segments > --- > > Key: LUCENE-2468 > URL: https://issues.apache.org/jira/browse/LUCENE-2468 > Project: Lucene - Java > Issue Type: Bug >Reporter: Yonik Seeley >Assignee: Michael McCandless > Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, > LUCENE-2468.patch, LUCENE-2468.patch > > > A repoen on an NRT reader doesn't seem to share readers for those segments > that are unchanged. > http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868959#action_12868959 ]

Shay Banon commented on LUCENE-2468:

Another quick question, Mike: what do you think about the ability to know when a "cache key" is actually closed, so it can be removed from a cache? Similar in concept to the eviction done from the field cache in trunk by readers, but open, so that other Reader#cacheKey based caches (which are the simplest way to do caching in Lucene) can use it.
[jira] Updated: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shay Banon updated LUCENE-2468:
---
Attachment: DeletionAwareConstantScoreQuery.java

Here is a go at making ConstantScoreQuery deletion aware. I named it differently, but it can replace ConstantScoreQuery with a flag making it deletion aware. What do you think?
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868923#action_12868923 ]

Shay Banon commented on LUCENE-2468:

Ahh, now I see that, sorry I missed it. But basically, enforcing deletions means that we are back to the original problem... I think it would be quite confusing for users, to be honest. Of the filters, the problematic ones are those that can be converted to queries. From what I can see, FilteredQuery is ok, so maybe ConstantScoreQuery can be changed (if possible) to do the same...
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868869#action_12868869 ]

Shay Banon commented on LUCENE-2468:

Check two comments above :), we discussed it. Basically, it does not work with your change when a cached filter is used.
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868816#action_12868816 ]

Shay Banon commented on LUCENE-2468:

Thanks for the work, Michael! Is this issue going to include the ConstantScoreQuery change, or should I open a different issue for it?
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868680#action_12868680 ]

Shay Banon commented on LUCENE-2468:

Agreed, seems like ConstantScoreQuery is the only problematic one...
[jira] Updated: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shay Banon updated LUCENE-2468:
---
Attachment: CacheTest.java
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868659#action_12868659 ]

Shay Banon commented on LUCENE-2468:

I think that the suggested solution, to use the FieldCacheKey, is sadly not good enough. I am attaching a simple test showing that it does not work for cases where a query is passed to a searcher without a filter, but the query itself is, for example, a ConstantScoreQuery. I have simply taken the CachingWrapperFilter and changed it to use getFieldCacheKey instead of the IndexReader. This is problematic, since a filter can be used somewhere in the query tree and wrapped for caching. I am running against 3.0.1.
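The getFieldCacheKey approach being tested here — keying the cache on a per-segment core object that survives reopens, instead of on the IndexReader instance itself — can be illustrated with a generic sketch. The class and names are hypothetical stand-ins, not Lucene's API:

```java
import java.util.Map;
import java.util.WeakHashMap;

// Sketch: cache filter results per segment "core key" so that a reopened
// reader sharing the same underlying segment reuses the cached entry,
// while a brand-new segment gets a cache miss.
public class CoreKeyedCache<V> {

    // WeakHashMap: an entry disappears once its core key is GC'd,
    // i.e. once no reader references that segment anymore.
    private final Map<Object, V> cache = new WeakHashMap<>();

    public synchronized V get(Object coreKey) {
        return cache.get(coreKey);
    }

    public synchronized void put(Object coreKey, V value) {
        cache.put(coreKey, value);
    }
}
```

As the comment notes, the shortcoming is that the cached bits are shared across reopens even when new deletions have arrived, which is exactly what the attached test demonstrates.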
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868647#action_12868647 ]

Shay Banon commented on LUCENE-2468:

bq. Shay, as far as CachingWrapperFilter and CacheEvictionListener, it seems more powerful to just let apps create a new query type themselves? That's the nice part of lucene's openness to user query types - start with the code for CachingWrapperFilter and hook up your own caching logic.

Yea, but it would be great to know when an IndexReader has actually been closed, so caches can be eagerly cleaned. Even if one writes a custom implementation, it would benefit from it.
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868617#action_12868617 ]

Shay Banon commented on LUCENE-2468:

Sounds like a good solution to me. I just noticed that in trunk there is also an explicit purge from the FieldCache when possible. I think it would be great to enable this for other caches that are keyed the same way (like CachingWrapperFilter, but externally written ones as well). I was thinking of an expert API allowing one to add a "CacheEvictionListener" or something similar, which would be called when this happens. What do you think?
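The proposed expert API could look roughly like this sketch. The CacheEvictionListener interface and the notifier class are hypothetical, reflecting the proposal in the comment rather than any existing Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed expert API: external caches register a listener
// and are notified eagerly when a reader's cache key is closed, instead
// of waiting for GC to prune a WeakHashMap.
interface CacheEvictionListener {
    void onEviction(Object cacheKey);
}

public class EvictionNotifier {

    private final List<CacheEvictionListener> listeners = new ArrayList<>();

    public void addListener(CacheEvictionListener listener) {
        listeners.add(listener);
    }

    // Would be invoked by the reader when it actually closes.
    public void notifyClosed(Object cacheKey) {
        for (CacheEvictionListener listener : listeners) {
            listener.onEviction(cacheKey);
        }
    }
}
```

An external cache could then register a listener that simply removes the entry for the closed cache key, making eviction immediate instead of GC-dependent.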
[jira] Commented: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864157#action_12864157 ]

Shay Banon commented on LUCENE-2387:

Thanks!

> IndexWriter retains references to Readers used in Fields (memory leak)
> -----------------------------------------------------------------------
>
> Key: LUCENE-2387
> URL: https://issues.apache.org/jira/browse/LUCENE-2387
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 3.0.1
> Reporter: Ruben Laguna
> Assignee: Michael McCandless
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2387-29x.patch, LUCENE-2387.patch
>
> As described in [1], IndexWriter retains references to Readers used in Fields, and that can lead to big memory leaks when using tika's ParsingReaders (as those can take 1MB per ParsingReader).
> [2] shows a screenshot of the reference chain to the Reader from the IndexWriter, taken with Eclipse MAT (Memory Analysis Tool). The chain is the following:
> IndexWriter -> DocumentsWriter -> DocumentsWriterThreadState -> DocFieldProcessorPerThread -> DocFieldProcessorPerField -> Fieldable -> Field (fieldsData)
>
> [1] http://markmail.org/thread/ndmcgffg2mnwjo47
> [2] http://skitch.com/ecerulm/n7643/eclipse-memory-analyzer
[jira] Commented: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864044#action_12864044 ]

Shay Banon commented on LUCENE-2387:

Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be really helpful to get this into the next Lucene version as soon as possible.
[jira] Commented: (LUCENE-2283) Possible Memory Leak in StoredFieldsWriter
[ https://issues.apache.org/jira/browse/LUCENE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864042#action_12864042 ]

Shay Banon commented on LUCENE-2283:

Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be really helpful to get this into the next Lucene version as soon as possible.

> Possible Memory Leak in StoredFieldsWriter
> ------------------------------------------
>
> Key: LUCENE-2283
> URL: https://issues.apache.org/jira/browse/LUCENE-2283
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 2.4.1
> Reporter: Tim Smith
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2283.patch, LUCENE-2283.patch, LUCENE-2283.patch
>
> StoredFieldsWriter creates a pool of PerDoc instances. This pool will grow but never be reclaimed by any mechanism. Furthermore, each PerDoc instance contains a RAMFile. This RAMFile will also never be truncated (and will only ever grow), as far as I can tell.
> When feeding documents with a large number of stored fields (or one large dominating stored field), this can result in memory being consumed in the RAMFile but never reclaimed. Eventually, each pooled PerDoc could grow very large, even if large documents are rare.
> Seems like there should be some attempt to reclaim memory from the PerDoc[] instance pool (or otherwise limit the size of cached RAMFiles), etc.