[jira] [Created] (LUCENE-5793) Add equals/hashCode to FieldType
Shay Banon created LUCENE-5793: -- Summary: Add equals/hashCode to FieldType Key: LUCENE-5793 URL: https://issues.apache.org/jira/browse/LUCENE-5793 Project: Lucene - Core Issue Type: Improvement Reporter: Shay Banon It would be nice to have equals and hashCode on FieldType, so one can easily check whether two instances are the same and, for example, reuse existing default implementations. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
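A minimal sketch of the kind of value-style equals/hashCode the issue asks for. The class and its fields here are illustrative stand-ins, not Lucene's actual FieldType API:

```java
import java.util.Objects;

// Hypothetical field-type holder: two instances with the same flags
// compare equal, so callers can detect and reuse identical configurations.
public class SimpleFieldType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;

    public SimpleFieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleFieldType)) return false;
        SimpleFieldType other = (SimpleFieldType) o;
        return indexed == other.indexed
                && stored == other.stored
                && tokenized == other.tokenized;
    }

    @Override
    public int hashCode() {
        // equal flags -> equal hash, as the equals/hashCode contract requires
        return Objects.hash(indexed, stored, tokenized);
    }
}
```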
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986177#comment-13986177 ] Shay Banon commented on LUCENE-5634: This optimization has proven to help a lot in the context of ES, where we can use a static thread local since we are fully in control of the threading model. With Lucene itself, which can be used in many different environments, this can cause some unexpected behavior. For example, it might cause Tomcat to warn about leaked resources when unloading a war. Reuse TokenStream instances in Field Key: LUCENE-5634 URL: https://issues.apache.org/jira/browse/LUCENE-5634 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 4.9, 5.0 Attachments: LUCENE-5634.patch If you don't reuse your Doc/Field instances (which is very expert: I suspect few apps do), then there's a lot of garbage created to index each StringField, because we make a new StringTokenStream or NumericTokenStream (and their Attributes). We should be able to reuse these instances via a static ThreadLocal...
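The reuse pattern under discussion can be sketched as a static ThreadLocal that hands each thread one cached, resettable instance instead of allocating a new one per field. ReusableStream is a hypothetical stand-in for Lucene's StringTokenStream; the container-unload concern in the comment applies to exactly this kind of static ThreadLocal:

```java
public class TokenStreamReuse {
    // Stand-in for a reusable token stream: reset per use instead of reallocated.
    static final class ReusableStream {
        private String value;
        ReusableStream setValue(String value) {  // reset for the next document
            this.value = value;
            return this;
        }
        String value() { return value; }
    }

    // Static ThreadLocal: one cached instance per thread for the JVM's lifetime.
    // This is the part that can leak when a container unloads a webapp.
    private static final ThreadLocal<ReusableStream> STREAMS =
            ThreadLocal.withInitial(ReusableStream::new);

    public static ReusableStream streamFor(String value) {
        return STREAMS.get().setValue(value);
    }
}
```

Repeated calls on the same thread return the same instance, so indexing many StringFields produces no per-field garbage.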
[jira] [Commented] (LUCENE-5516) Forward information that trigger a merge to MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930457#comment-13930457 ] Shay Banon commented on LUCENE-5516: +1, this looks great! Exactly the info we would love to have to better control merges. Forward information that trigger a merge to MergeScheduler -- Key: LUCENE-5516 URL: https://issues.apache.org/jira/browse/LUCENE-5516 Project: Lucene - Core Issue Type: Improvement Components: core/index Affects Versions: 4.7 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.8, 5.0 Attachments: LUCENE-5516.patch, LUCENE-5516.patch Today we pass information about the merge trigger to the merge policy. Yet, no matter whether the MP finds a merge or not, we call the MergeScheduler, which then runs and blocks even if we didn't find a merge. In some cases we don't even want this to happen, but inside the MergeScheduler we have no way to opt out, since we don't know what triggered the merge. We should forward the info we have to the MergeScheduler as well.
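What "forwarding the trigger" enables can be sketched as a scheduler that opts out based on why it was invoked. The enum values and interface below are illustrative, loosely mirroring the patch's intent rather than quoting the actual Lucene API:

```java
public class MergeTriggerSketch {
    // Illustrative reasons a merge check might be requested.
    enum MergeTrigger { SEGMENT_FLUSH, FULL_FLUSH, EXPLICIT, MERGE_FINISHED }

    interface TriggerAwareScheduler {
        // returns true when the scheduler actually ran merges
        boolean maybeMerge(MergeTrigger trigger, boolean policyFoundMerge);
    }

    // A scheduler that skips all work for flush-triggered calls where the
    // merge policy found nothing -- the blocking case the issue wants to avoid.
    static final TriggerAwareScheduler SCHEDULER = (trigger, policyFoundMerge) -> {
        if (!policyFoundMerge && trigger == MergeTrigger.SEGMENT_FLUSH) {
            return false;  // opt out instead of running and blocking
        }
        return policyFoundMerge;
    };
}
```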
[jira] [Commented] (LUCENE-5373) Lucene42DocValuesProducer.ramBytesUsed is over-estimated
[ https://issues.apache.org/jira/browse/LUCENE-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853098#comment-13853098 ] Shay Banon commented on LUCENE-5373: As someone who ran into this issue: on top of the wrong computation, it's also very expensive. This call should be lightweight and hopefully not use sizeOf at all... At the very least, if possible, the result should be cached. Maybe even introduce size caching at a higher level (in the calling code) if possible. Lucene42DocValuesProducer.ramBytesUsed is over-estimated Key: LUCENE-5373 URL: https://issues.apache.org/jira/browse/LUCENE-5373 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Priority: Minor Lucene42DocValuesProducer.ramBytesUsed uses {{RamUsageEstimator.sizeOf(this)}} to return an estimation of the memory usage. One of the issues (there might be others) is that this class has a reference to an IndexInput that might link to other data structures that we wouldn't want to take into account. For example, index inputs of a {{RAMDirectory}} all point to the directory itself, so {{Lucene42DocValuesProducer.ramBytesUsed}} would return the amount of memory used by the whole directory. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
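The caching suggested in the comment can be sketched as computing the expensive estimate once and memoizing it, since the producer's structures don't change after load. Class and method names are illustrative, and the estimate itself is a stand-in for the costly RamUsageEstimator.sizeOf walk:

```java
public class CachedRamBytesUsed {
    private final long[] packedValues;
    private volatile long cachedEstimate = -1L;  // -1 means "not computed yet"

    public CachedRamBytesUsed(long[] packedValues) {
        this.packedValues = packedValues;
    }

    public long ramBytesUsed() {
        long v = cachedEstimate;
        if (v == -1L) {
            v = expensiveEstimate();
            cachedEstimate = v;  // benign race: every thread computes the same value
        }
        return v;
    }

    private long expensiveEstimate() {
        // stand-in for the heavyweight reflective size computation
        return 16L + 8L * packedValues.length;
    }
}
```

After the first call, ramBytesUsed is a cheap volatile read, which is what "this call should be lightweight" asks for.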
[jira] [Commented] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
[ https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699892#comment-13699892 ] Shay Banon commented on LUCENE-5086: The Java version on the Mac is the latest one: java version 1.6.0_51 Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-11M4509) Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01-457, mixed mode) Regarding the catch, I think Throwable is the right exception to catch here. Catch all, who cares: you don't want a bug in the JVM that throws an unexpected runtime exception to cause Lucene to break the app completely because it's a static block, and I have been right there a few times. But if you feel differently, go ahead and change it to explicitly catch what's needed. RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer - Key: LUCENE-5086 URL: https://issues.apache.org/jira/browse/LUCENE-5086 Project: Lucene - Core Issue Type: Improvement Reporter: Shay Banon Assignee: Dawid Weiss Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag, so people will probably not encounter it, but it was strange that I saw it now when before I didn't. I started to dig around and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac; I haven't tested on other platforms yet).
There are several ways to try to solve it, for example by identifying the bug in the JVM itself, but I think there should also be a fix in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the HotSpot diagnostics bean (it's a heavy call...). Here is a simple snippet that gets the HotSpot MXBean without using the #getPlatformMBeanServer method, and without loading all those nasty AWT classes:
{code}
Object getHotSpotMXBean() {
    try {
        // Java 6
        Class<?> sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
                .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
Shay Banon created LUCENE-5086: -- Summary: RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer Key: LUCENE-5086 URL: https://issues.apache.org/jira/browse/LUCENE-5086 Project: Lucene - Core Issue Type: Bug Reporter: Shay Banon Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag, so people will probably not encounter it, but it was strange that I saw it now when before I didn't. I started to dig around and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac; I haven't tested on other platforms yet). There are several ways to try to solve it, for example by identifying the bug in the JVM itself, but I think there should also be a fix in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the HotSpot diagnostics bean (it's a heavy call...). Here is a simple snippet that gets the HotSpot MXBean without using the #getPlatformMBeanServer method, and without loading all those nasty AWT classes:
[code]
Object getHotSpotMXBean() {
    Object hotSpotBean = null;
    try {
        // Java 6
        Class<?> sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
                .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
[/code]
-- This message is automatically generated by JIRA.
[jira] [Updated] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
[ https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-5086: --- Description: Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag, so people will probably not encounter it, but it was strange that I saw it now when before I didn't. I started to dig around and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac; I haven't tested on other platforms yet). There are several ways to try to solve it, for example by identifying the bug in the JVM itself, but I think there should also be a fix in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the HotSpot diagnostics bean (it's a heavy call...). Here is a simple snippet that gets the HotSpot MXBean without using the #getPlatformMBeanServer method, and without loading all those nasty AWT classes:
{code}
Object getHotSpotMXBean() {
    try {
        // Java 6
        Class<?> sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
                .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
{code}
was: Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag, so people will probably not encounter it, but it was strange that I saw it now when before I didn't. I started to dig around and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac; I haven't tested on other platforms yet). There are several ways to try to solve it, for example by identifying the bug in the JVM itself, but I think there should also be a fix in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the HotSpot diagnostics bean (it's a heavy call...). Here is a simple snippet that gets the HotSpot MXBean without using the #getPlatformMBeanServer method, and without loading all those nasty AWT classes:
{code}
Object getHotSpotMXBean() {
    Object hotSpotBean = null;
    try {
        // Java 6
        Class<?> sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
                .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
{code}
RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer - Key: LUCENE-5086 URL: https://issues.apache.org/jira/browse/LUCENE-5086 Project: Lucene - Core Issue Type: Bug Reporter: Shay Banon
[jira] [Updated] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
[ https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-5086: --- Description: Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag, so people will probably not encounter it, but it was strange that I saw it now when before I didn't. I started to dig around and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac; I haven't tested on other platforms yet). There are several ways to try to solve it, for example by identifying the bug in the JVM itself, but I think there should also be a fix in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the HotSpot diagnostics bean (it's a heavy call...). Here is a simple snippet that gets the HotSpot MXBean without using the #getPlatformMBeanServer method, and without loading all those nasty AWT classes:
{code}
Object getHotSpotMXBean() {
    Object hotSpotBean = null;
    try {
        // Java 6
        Class<?> sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
                .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
{code}
was: Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag, so people will probably not encounter it, but it was strange that I saw it now when before I didn't. I started to dig around and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac; I haven't tested on other platforms yet). There are several ways to try to solve it, for example by identifying the bug in the JVM itself, but I think there should also be a fix in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the HotSpot diagnostics bean (it's a heavy call...). Here is a simple snippet that gets the HotSpot MXBean without using the #getPlatformMBeanServer method, and without loading all those nasty AWT classes:
[code]
Object getHotSpotMXBean() {
    Object hotSpotBean = null;
    try {
        // Java 6
        Class<?> sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
                .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
[/code]
RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer - Key: LUCENE-5086 URL: https://issues.apache.org/jira/browse/LUCENE-5086 Project: Lucene - Core Issue Type: Bug Reporter: Shay Banon
[jira] [Commented] (LUCENE-4472) Add setting that prevents merging on updateDocument
[ https://issues.apache.org/jira/browse/LUCENE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13473617#comment-13473617 ] Shay Banon commented on LUCENE-4472: Agree with Robert on the additional context flag; that would make things most flexible. A flag on IW makes things simpler from the user perspective, though, because then there is no need to customize the built-in merge policies. Add setting that prevents merging on updateDocument --- Key: LUCENE-4472 URL: https://issues.apache.org/jira/browse/LUCENE-4472 Project: Lucene - Core Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.1, 5.0 Attachments: LUCENE-4472.patch Currently we always call maybeMerge if a segment was flushed after updateDocument. Some apps, and in particular ElasticSearch, use some hacky workarounds to disable that, e.g. for merge throttling. It should be easier to enable this kind of behavior.
[jira] [Created] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control
NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control Key: LUCENE-3425 URL: https://issues.apache.org/jira/browse/LUCENE-3425 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Shay Banon A discussion on IRC raised several improvements that can be made to the NRT caching dir. Some of the problems it currently has are:
1. Not explicitly controlling the memory usage, which can result in overusing memory (for example, large new segments being committed because refreshing is too far behind).
2. Heap fragmentation because of constant allocation of (probably promoted to old gen) byte buffers.
3. Not being able to control the memory usage across indices for multi-index usage within a single JVM.
A suggested solution (which still needs to be ironed out) is to have a BufferAllocator that controls allocation of byte[] and allows returning unused byte[] to it. It will have a cap on the amount of memory it allows to be allocated. The NRT caching dir will use the allocator, which can either be provided (for usage across several indices) or created internally. The caching dir will also create a wrapped IndexOutput that will flush to the main dir if the allocator can no longer provide byte[] (exhausted). When a file is flushed from the cache to the main directory, it will return all its currently allocated byte[] to the BufferAllocator to be reused by other files.
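The BufferAllocator idea from the description can be sketched as a capped pool of fixed-size byte[] blocks that callers borrow and return; when the cap is exhausted, allocation fails and the caching dir would flush to the main directory instead. All names and the single fixed block size are assumptions, since the design still needed to be ironed out:

```java
import java.util.ArrayDeque;

public class BufferAllocator {
    private final int blockSize;
    private final long maxBytes;
    private long allocatedBytes;
    private final ArrayDeque<byte[]> freeBlocks = new ArrayDeque<>();

    public BufferAllocator(int blockSize, long maxBytes) {
        this.blockSize = blockSize;
        this.maxBytes = maxBytes;
    }

    // Borrow a block, reusing a returned one when possible -- reuse addresses
    // the old-gen fragmentation concern (problem 2 in the description).
    public synchronized byte[] allocate() {
        byte[] block = freeBlocks.poll();
        if (block != null) return block;
        if (allocatedBytes + blockSize > maxBytes) return null;  // cap reached: caller flushes
        allocatedBytes += blockSize;
        return new byte[blockSize];
    }

    // Return blocks when a cached file is flushed to the main directory.
    public synchronized void release(byte[] block) {
        freeBlocks.push(block);
    }
}
```

Sharing one allocator across several NRT caching dirs is what gives the global, cross-index cap (problem 3).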
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099146#comment-13099146 ] Shay Banon commented on LUCENE-3416: The only reason it's synchronized is that the setMaxMergeWriteMBPerSec method is synchronized (I guess to protect against setting the rate limit concurrently). In practice, I don't see users changing it that often, so concerns about cache lines are not really relevant. Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon Assignee: Simon Willnauer Attachments: LUCENE-3416.patch This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them.
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099160#comment-13099160 ] Shay Banon commented on LUCENE-3416: "this makes no sense to me. If you don't want to set this concurrently, how does a lock protect you from it? I mean, if you have two threads accessing this you have either A B or B A, but this would happen without a lock too. If you want the changes to take effect immediately you need to either lock on each read of this var or make it volatile, which is almost equivalent (a mem barrier)." No, that's not correct. setMaxMergeWriteMBPerSec (not the method I added, the other one) is a complex method, and I think Mike wanted to protect against two threads setting the value concurrently. As for reading the value, I think Mike's logic was that it's not so important to have immediate visibility of the change that it requires a volatile field (which is understandable). So, since setMaxMergeWriteMBPerSec is synchronized, the method added in this patch has to be as well. "My concern here was related to making this var volatile, which would be a cache-line invalidation each time you read the var. I think we should get rid of the synchronized." Reading a volatile var on x86 is not a cache invalidation, though it does come with a cost. It's not relevant here based on what I explained before (and second-guessing Mike :) ). Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon Assignee: Simon Willnauer Attachments: LUCENE-3416.patch This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them. -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099233#comment-13099233 ] Shay Banon commented on LUCENE-3416: I agree with Mike; I think it should remain synchronized, since it does safeguard concurrent calls to setMaxMergeWriteMBPerSec from falling over each other (who wins the call is not really relevant). Since that's synchronized, the method I added should be as well. Personally, I really don't think there is a need to make it thread safe without blocking, since calling the setters is not something people do frequently at all, so the optimization is moot, and it would complicate the code. As for making mergeWriteRateLimiter volatile, it can be done. Though, in practice, there really is no need for it (there is a memory barrier when reading it before). But I think that should go in a different issue? Just to keep changes clean and isolated? Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon Assignee: Simon Willnauer Attachments: LUCENE-3416.patch This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them.
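The two approaches debated in this thread can be sketched side by side: a synchronized getter/setter pair keeps a compound update atomic and gives readers visibility through the same lock, while a volatile field would give visibility alone, cheaper per read but with no protection for multi-step updates. The names below are illustrative, not the actual FSDirectory members:

```java
public class RateLimitSetting {
    private double maxMBPerSec;        // guarded by the object's monitor
    private long recomputedNsPerByte;  // derived state: why the setter is compound

    public synchronized void setMaxMergeWriteMBPerSec(double mb) {
        this.maxMBPerSec = mb;
        // second step of the compound update the lock protects; two unlocked
        // concurrent setters could leave the two fields inconsistent
        this.recomputedNsPerByte = (long) (1_000_000_000.0 / (mb * 1024 * 1024));
    }

    public synchronized double getMaxMergeWriteMBPerSec() {
        return maxMBPerSec;  // synchronized so it pairs with the setter's lock
    }
}
```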
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099287#comment-13099287 ] Shay Banon commented on LUCENE-3416: I must say that I am at a loss trying to understand why we need this optimization, but it does not really matter to me as long as the ability to set the rate limiter instance gets in. Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon Assignee: Simon Willnauer Attachments: LUCENE-3416.patch This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them.
[jira] [Created] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them.
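The point of the feature can be sketched as several directories sharing one limiter, so merge IO is throttled globally rather than per index. SimpleRateLimiter here is a hypothetical stand-in for Lucene's RateLimiter; pauseSeconds just computes how long a writer should stall to stay under the cap:

```java
public class SharedMergeRateLimit {
    // Stand-in limiter: converts bytes written into a pause that keeps the
    // aggregate write rate at or below mbPerSec.
    static final class SimpleRateLimiter {
        private final double mbPerSec;
        SimpleRateLimiter(double mbPerSec) { this.mbPerSec = mbPerSec; }
        double pauseSeconds(long bytesWritten) {
            return (bytesWritten / (1024.0 * 1024.0)) / mbPerSec;
        }
    }

    // Stand-in directory that consults the (possibly shared) limiter on merge writes.
    static final class ThrottledDir {
        private final SimpleRateLimiter limiter;
        ThrottledDir(SimpleRateLimiter limiter) { this.limiter = limiter; }
        double writeMerge(long bytes) { return limiter.pauseSeconds(bytes); }
    }

    public static void main(String[] args) {
        // one limiter instance handed to every directory in the JVM:
        // both indices draw from the same 10 MB/sec budget
        SimpleRateLimiter shared = new SimpleRateLimiter(10.0);
        ThrottledDir indexA = new ThrottledDir(shared);
        ThrottledDir indexB = new ThrottledDir(shared);
        System.out.println(indexA.writeMerge(1024 * 1024) + indexB.writeMerge(1024 * 1024));
    }
}
```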
[jira] [Updated] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3416: --- Attachment: LUCENE-3416.patch Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon Attachments: LUCENE-3416.patch This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them.
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098018#comment-13098018 ] Shay Banon commented on LUCENE-3416: It is possible, but it requires more work and depends on overriding the createOutput method (as well as all the other methods in Directory). If rate limiting makes sense to expose as a feature at the directory level, I think this small change allows for greater control over it. Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon Attachments: LUCENE-3416.patch This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them.
[jira] [Commented] (LUCENE-3335) jrebug causes porter stemmer to sigsegv
[ https://issues.apache.org/jira/browse/LUCENE-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13076053#comment-13076053 ] Shay Banon commented on LUCENE-3335: @Uwe I actually forgot about this, and did not think it was caused by the porter stemmer at the time, especially since I did try to reproduce it and never managed to (I thought it was a coincidence that it crashed there). From my experience, you get very little help from Sun/Oracle when using unorthodox flags like AggressiveOpts without a proper reproduction. Well, you get very little help there even when you do provide a reproduction... (see this issue that I opened, for example: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129). I am the reason behind the Lucene 1.9.1 release, with the major buffering bug introduced in 1.9 way back in the day; do you really think I would not make contact if I thought there really was a problem in Lucene? jrebug causes porter stemmer to sigsegv --- Key: LUCENE-3335 URL: https://issues.apache.org/jira/browse/LUCENE-3335 Project: Lucene - Java Issue Type: Bug Affects Versions: 1.9, 1.9.1, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4, 2.4.1, 2.9, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2, 3.3, 3.4, 4.0 Environment: - JDK 7 Preview Release, GA (may also affect update _1, targeted fix is JDK 1.7.0_2) - JDK 1.6.0_20+ with -XX:+OptimizeStringConcat or -XX:+AggressiveOpts Reporter: Robert Muir Assignee: Robert Muir Labels: Java7 Attachments: LUCENE-3335.patch, LUCENE-3335_slow.patch, patch-0uwe.patch happens easily on java7: ant test -Dtestcase=TestPorterStemFilter -Dtests.iter=100 might happen on 1.6.0_u26 too, a user reported something that looks like the same bug already: http://www.lucidimagination.com/search/document/3beaa082c4d2fdd4/porterstemfilter_kills_jvm
[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072979#comment-13072979 ] Shay Banon commented on LUCENE-3282: Hi, sorry for the late response, I missed the comment. Yeah, I agree that there will be false positives, but that's the idea of it (sometimes you want to run facets, for example, on sub-queries). Btw, I got your point on advance; do you think, if a collector exists, advance should be implemented by iterating over all docs up to the doc being advanced to? Regarding the wrapper, interesting! I need to have a look at how to generalize it, but it should be simple, I think; I'll try to work on it. BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction - Key: LUCENE-3282 URL: https://issues.apache.org/jira/browse/LUCENE-3282 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 3.4, 4.0 Reporter: Shay Banon Attachments: LUCENE-3282.patch, LUCENE-3282.patch It would be nice to be able to add a custom child collector to the BlockJoinQuery, called on every matching doc (so we can do things with it, like counts and such). Also, allow extending BlockJoinQuery with custom code that converts the filter bitset to an OpenBitSet.
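The "implement advance by iterating" idea from the comment above can be sketched with a standalone doc-id iterator (illustrative class and method names, not Lucene's BlockJoinQuery API): when a side collector must observe every matching child doc, advance(target) cannot skip ahead; it steps through nextDoc() instead, collecting along the way.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: an iterator over sorted doc ids whose advance() walks
// doc-by-doc via nextDoc(), so a child collector still sees every matching
// doc that a skipping advance() would silently jump over.
final class CollectingIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;
    final List<Integer> collected = new ArrayList<>(); // stand-in for a child Collector

    CollectingIterator(int[] docs) { this.docs = docs; }

    int nextDoc() {
        idx++;
        int doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        if (doc != NO_MORE_DOCS) collected.add(doc); // collect every doc we pass
        return doc;
    }

    // advance() by repeated nextDoc(), so no matching doc is skipped
    int advance(int target) {
        int doc;
        do { doc = nextDoc(); } while (doc < target);
        return doc;
    }
}
```

The cost is that advance() becomes linear in the number of skipped docs, which is exactly the trade-off the comment raises.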
[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13066536#comment-13066536 ] Shay Banon commented on LUCENE-3282: The idea of this is to collect matching child docs regardless of what matches parent-wise. And yes, we might miss some depending on the type of query that is actually wrapping it, but I think it's still useful.
[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063619#comment-13063619 ] Shay Banon commented on LUCENE-3282: Heya, in my app I have a wrapper around OBS with a common interface that allows accessing bits by index (similar to Bits in trunk), so I need to extract the OBS from it. Regarding the Collector, I will work on a CollectorProvider interface. I liked the NoOpCollector option since then you don't have to check for nulls each time...
[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3282: --- Attachment: LUCENE-3282.patch New version, with CollectorProvider.
[jira] [Created] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction - Key: LUCENE-3282 URL: https://issues.apache.org/jira/browse/LUCENE-3282 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 3.4, 4.0 Reporter: Shay Banon
[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3282: --- Attachment: LUCENE-3282.patch
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006214#comment-13006214 ] Shay Banon commented on LUCENE-2960: Just a note regarding the IWC and being able to consult it for live changes: it feels strange to me that setting something on the config will affect the IW in real time. Maybe it's just me, but it feels nicer to have the live setters on IW compared to IWC. I also like the ability to decouple construction-time configuration through IWC from live settings through setters on IW. It is then very clear what can be set at construction time and what can be set on a live IW. It also allows compile-time / static checking of what can be changed at which lifecycle phase. Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter -- Key: LUCENE-2960 URL: https://issues.apache.org/jira/browse/LUCENE-2960 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shay Banon Priority: Blocker Fix For: 3.1, 4.0 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. It would be great to be able to control that on a live IndexWriter. Two other methods that would be great to bring back are setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other setters can actually be set on the MergePolicy itself, so no need for setters for those (I think).
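The separation argued for in the comment above can be sketched with two illustrative classes (these are not the actual Lucene IndexWriter/IndexWriterConfig API): construction-time options live on an immutable config object, while knobs that may change on a live writer are setters on the writer itself, so the compiler documents which settings belong to which lifecycle phase.

```java
// Illustrative sketch of "immutable config + live setters" (hypothetical names).
final class WriterConfig {              // fixed at construction time
    final int termIndexInterval;
    WriterConfig(int termIndexInterval) { this.termIndexInterval = termIndexInterval; }
}

final class Writer {
    private final WriterConfig config;        // immutable snapshot, never consulted live
    private volatile double ramBufferSizeMB;  // live-tunable knob

    Writer(WriterConfig config, double ramBufferSizeMB) {
        this.config = config;
        this.ramBufferSizeMB = ramBufferSizeMB;
    }

    // The live setter sits on the writer, so it is statically clear this
    // setting may change after construction.
    void setRAMBufferSizeMB(double mb) { this.ramBufferSizeMB = mb; }
    double getRAMBufferSizeMB() { return ramBufferSizeMB; }
    int getTermIndexInterval() { return config.termIndexInterval; }
}
```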
[jira] Created: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter -- Key: LUCENE-2960 URL: https://issues.apache.org/jira/browse/LUCENE-2960 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shay Banon Fix For: 3.2, 4.0
[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2474: --- Attachment: MapBackedSet.java A MapBackedSet implementation that wraps a CHM to get a concurrent set implementation. We can consider using it instead of a synchronized set plus copy-on-read when notifying listeners. Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey) Key: LUCENE-2474 URL: https://issues.apache.org/jira/browse/LUCENE-2474 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Shay Banon Assignee: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2474.patch, LUCENE-2474.patch, LUCENE-2474.patch, LUCENE-2574.patch, MapBackedSet.java Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey). A spin-off of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; even Lucene itself uses it, for example with the CachingWrapperFilter. FieldCache benefits from being called explicitly to purge its cache when possible (which is tricky to know from the outside, especially when using NRT - reader attack of the clones). The provided patch allows plugging in a CacheEvictionListener which will be called when the cache should be purged for an IndexReader.
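A minimal sketch of what a MapBackedSet like the attached one could look like: adapt any Map<E, Boolean> into a Set<E>, so wrapping a ConcurrentHashMap yields a concurrent set. (This is a reconstruction for illustration, not the attached file; the JDK's Collections.newSetFromMap, available since Java 6, does the same job.)

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;

// Illustrative sketch: any Map becomes a Set; wrap a ConcurrentHashMap
// to get a concurrent set without a global synchronized block.
final class MapBackedSet<E> extends AbstractSet<E> {
    private final Map<E, Boolean> map;

    MapBackedSet(Map<E, Boolean> map) { this.map = map; }

    @Override public int size() { return map.size(); }
    @Override public boolean contains(Object o) { return map.containsKey(o); }
    @Override public boolean add(E e) { return map.put(e, Boolean.TRUE) == null; }
    @Override public boolean remove(Object o) { return map.remove(o) != null; }
    @Override public Iterator<E> iterator() { return map.keySet().iterator(); }
}
```

Iteration inherits the backing map's guarantees, so a CHM-backed instance gives weakly consistent, non-throwing traversal while listeners are being added or removed.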
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984134#action_12984134 ] Shay Banon commented on LUCENE-2871: Strange, I did not get it when running the tests; I will try to find out why it can happen. Use FileChannel in FSDirectory -- Key: LUCENE-2871 URL: https://issues.apache.org/jira/browse/LUCENE-2871 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Attachments: LUCENE-2871.patch Explore using FileChannel in FSDirectory to see if it improves write operations performance
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984206#action_12984206 ] Shay Banon commented on LUCENE-2871: bq. Looking at the current patch, the class seems wrong. In my opinion, this should be only in NIOFSDirectory. SimpleFSDir should only use RAF. It's a good question, and I'm not sure what to do about it. Here is the problem: the channel output can be used with all 3 FS dirs (simple, nio, and mmap), and it might actually make sense even with SimpleFS (i.e. using non-NIO to read, but a file channel to write). In order to support all of them, currently the simplest way is to put it in the base class so the code is shared. On IRC there was a discussion about externalizing the outputs and inputs, so one can more easily pick and choose, but I think that belongs in a different patch.
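The channel-output experiment discussed above can be sketched as a small buffered writer over a FileChannel (illustrative class, not the patch's actual code): bytes accumulate in a reusable ByteBuffer and are written through the channel when the buffer fills or the writer closes, instead of going through RandomAccessFile.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch of a FileChannel-backed output with a reusable buffer.
final class ChannelWriter implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buffer = ByteBuffer.allocate(8192);

    ChannelWriter(Path path) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    void write(byte[] bytes) throws IOException {
        int off = 0;
        while (off < bytes.length) {
            int n = Math.min(buffer.remaining(), bytes.length - off);
            buffer.put(bytes, off, n);
            off += n;
            if (!buffer.hasRemaining()) flush(); // buffer full: push to the channel
        }
    }

    private void flush() throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) channel.write(buffer);
        buffer.clear();
    }

    @Override public void close() throws IOException { flush(); channel.close(); }
}
```

Keeping the buffer handling in one shared class is exactly why the patch put the logic in the base directory class rather than duplicating it per directory flavor.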
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984263#action_12984263 ] Shay Banon commented on LUCENE-2871: Agreed Earwin, let's first see if it makes sense; this is just an experiment and might not make sense for single-threaded writes.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982695#action_12982695 ] Shay Banon commented on LUCENE-2474: Yea, I got the reasoning for a Set; we can use that, a CHM with PRESENT. If you want, I can attach a simple MapBackedSet that makes any Map a Set. Still, I think that using CopyOnWriteArrayList is best here. I don't think that adding and removing listeners is something that will be done often in an app (but I might be mistaken), and in that case traversal over the listeners is much better on a CopyOnWriteArrayList compared to a CHM.
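The trade-off above, listeners added or removed rarely but traversed on every event, is the textbook case for CopyOnWriteArrayList: mutation copies the array, while iteration walks an immutable snapshot with no lock at all. A minimal sketch (names are illustrative, not Lucene's readerFinishedListeners API):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative sketch: a listener registry where notification never takes
// a global lock, matching the comment's argument for CopyOnWriteArrayList.
final class CloseListeners {
    interface Listener { void onClose(Object cacheKey); }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void add(Listener l) { listeners.add(l); }
    void remove(Listener l) { listeners.remove(l); }

    // Iteration walks an immutable snapshot, so a listener that registers or
    // deregisters during notification cannot deadlock or corrupt traversal.
    void notifyClosed(Object cacheKey) {
        for (Listener l : listeners) l.onClose(cacheKey);
    }
}
```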
[jira] Created: (LUCENE-2871) Use FileChannel in FSDirectory
Use FileChannel in FSDirectory -- Key: LUCENE-2871 URL: https://issues.apache.org/jira/browse/LUCENE-2871 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Explore using FileChannel in FSDirectory to see if it improves write operations performance
[jira] Updated: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2871: --- Attachment: LUCENE-2871.patch Patch supporting the use of a file channel to write. FSDirectory still retains the ability to use RAF for writes. FSDirectory#setUseChannelOutput: allows reverting to RAF by setting it to false. FSDirectory#setCacheChannelBuffers: controls whether, when using a file channel, buffers should be cached.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982509#action_12982509 ] Shay Banon commented on LUCENE-2474: bq. OK, here's a patch exposing the readerFinishedListeners as static methods on IndexReader. I think we should use a CopyOnWriteArrayList so calling the listeners will not happen under a global synchronized block. If maintaining set behavior is required, I can provide a patch with a ConcurrentHashSet implementation, or we can simply replace it with a CHM with PRESENT, or any other solution that does not require calling the listeners under a global sync block.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978719#action_12978719 ] Shay Banon commented on LUCENE-2474: bq. It would be a cache of anything... one element of that cache would be the FieldCache, there could be one for filters, or one entry per-filter. edit: Maybe a better way to think about it is like a ServletContext or something - it's just a way to attach anything arbitrary to a reader. Got you. My personal taste is to try and keep those readers as lightweight as possible, and have the proper constructs in place to allow external components to use them for caching, without the readers having to manage it as well. bq. Not with this current patch, as there is no mechanism to get a callback when you do care about deletes. If I want to cache something that depends on deletions, I want to purge that cache when the actual reader is closed (as opposed to the reader's core cache key that is shared amongst all readers that just have different deletions). So if we go a close event route, we really want two different events... one for the close of a reader (i.e. deletes matter), and one for the close of the segment (deletes don't matter). I think that a cache that is affected by deletes is a problematic cache to begin with, so I was thinking that maybe it should be discouraged by not allowing for it, especially with NRT. My idea was to simply expand the purge capability that the FC gets for free to other external custom components. Also, if we did have a type-safe separation between segment readers and compound readers, I would not have added the ability to register a listener on the compound readers, just the segment readers, as this encourages people to write caches that only work on segment readers (since the registration for the purge event will happen within the cache, and it should work only with segment readers). That was why my patch does not take compound readers into account.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978975#action_12978975 ] Shay Banon commented on LUCENE-2474: bq. But: I think we'd want to have composite reader just forward the registration down to the atomic readers? (And, forward on reopen). I am not sure you would want to do that. Any caching layer or external component that is properly written will work on the low-level segment readers; it will not even compile against compound readers. This helps direct people to write proper code that deals only with segment readers.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978480#action_12978480 ] Shay Banon commented on LUCENE-2474: Right, I was thinking that it's a low-level API that you can just add to the low-level readers, but I agree, it would be nicer to have it on the high level as well. Regarding the close method name, I guess we can name it similarly to the FieldCache one, maybe purge? bq. We've talked before about putting caches directly on the readers - that still seems like the most straightforward approach? Not sure I understand that. Do you mean getting FieldCache into the readers? And then what about cached filters? And other custom caching constructs that rely on the same mechanism as the CachingWrapperFilter? I think that if one implements such caching, it's an advanced enough feature where you should know how to handle deletes and other tidbits (if you need to). bq. We really need one cache that doesn't care about deletions, and one cache that does. Isn't that up to the cache to decide? That cache can be anything (internally implemented in Lucene or externally) that follows the mechanism of caching based on (segment) readers. As long as there are constructs to get the deleted docs to handle deletes (for example), the implementation can use them.
[jira] Updated: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap
[ https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2292: --- Attachment: LUCENE-2292.patch A fixed patch that now passes all tests using the byte buffer directory. It also includes refactoring into a different package (store.bytebuffer), and a custom ByteBufferAllocator interface that can control how buffers are allocated, with plain and caching implementations. ByteBuffer Directory - allowing to store the index outside the heap --- Key: LUCENE-2292 URL: https://issues.apache.org/jira/browse/LUCENE-2292 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Attachments: LUCENE-2292.patch, LUCENE-2292.patch, LUCENE-2292.patch A byte buffer based directory with the benefit of being able to create direct byte buffers, thus storing the index outside the JVM heap.
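As a rough sketch of the allocator abstraction described above (hypothetical names, not the actual store.bytebuffer classes from the patch): a plain implementation hands out fresh direct buffers, which live outside the JVM heap, while a caching implementation recycles released buffers of a fixed size.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical sketch of an allocator that decides how buffers are obtained.
interface ByteBufferAllocator {
    ByteBuffer allocate(int size);
    void release(ByteBuffer buffer);
}

// "Plain" implementation: every allocation is a fresh direct (off-heap) buffer.
class PlainDirectAllocator implements ByteBufferAllocator {
    public ByteBuffer allocate(int size) {
        return ByteBuffer.allocateDirect(size);
    }
    public void release(ByteBuffer buffer) {
        // Nothing to do: the native memory is reclaimed when the buffer is GC'd.
    }
}

// "Caching" implementation: recycles released buffers of one fixed size,
// avoiding the cost of repeated direct-buffer allocation.
class CachingDirectAllocator implements ByteBufferAllocator {
    private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();
    private final int bufferSize;

    CachingDirectAllocator(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public synchronized ByteBuffer allocate(int size) {
        ByteBuffer b = pool.poll();
        if (b == null) {
            b = ByteBuffer.allocateDirect(bufferSize);
        }
        b.clear();
        return b;
    }

    public synchronized void release(ByteBuffer buffer) {
        pool.push(buffer); // keep it for the next allocate()
    }
}
```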
[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966376#action_12966376 ] Shay Banon commented on LUCENE-2779: If the assumption still stands that an IndexInput will not be opened on a writing / unclosed IndexOutput, then RAMFile can also be improved when it comes to concurrency. The RAMOutputStream can maintain its own list of buffers (a simple array list, no need to sync), and only when it gets closed, initialize the respective RAMFile with the list. This means most of the synchronized aspects of RAMFile can be removed. Also, on RAMFile, lastModified can be made volatile, and the sync on its accessor methods removed. Use ConcurrentHashMap in RAMDirectory - Key: LUCENE-2779 URL: https://issues.apache.org/jira/browse/LUCENE-2779 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java RAMDirectory synchronizes on its instance in many places to protect access to its map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in a few places do we need 'write' access. This looks like a perfect use case for ConcurrentHashMap. Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ... I'll post a patch shortly.
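The single-writer idea above can be sketched like this (hypothetical classes, not the real RAMFile/RAMOutputStream code): the output stream keeps a private, unsynchronized buffer list, and publishes it to the file with a single volatile write on close(). As long as no input is opened before close(), the volatile write is enough for safe publication and all per-buffer synchronization disappears.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed concurrency change; hypothetical stand-ins
// for RAMFile / RAMOutputStream.
class InMemoryFile {
    // Written once on close, read by any later-opened input:
    // a volatile write safely publishes the fully-built list.
    volatile List<byte[]> buffers;
    volatile long lastModified; // volatile instead of synchronized accessors
}

class InMemoryOutput {
    private final List<byte[]> pending = new ArrayList<>(); // writer-private, no sync
    private final InMemoryFile file;

    InMemoryOutput(InMemoryFile file) {
        this.file = file;
    }

    void writeBuffer(byte[] buffer) {
        pending.add(buffer); // single writer thread, no locking needed
    }

    void close() {
        file.lastModified = System.currentTimeMillis();
        file.buffers = pending; // safe publication via the volatile write
    }
}
```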
[jira] Commented: (LUCENE-2773) Don't create compound file for large segments by default
[ https://issues.apache.org/jira/browse/LUCENE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935299#action_12935299 ] Shay Banon commented on LUCENE-2773: Mike, are you sure regarding the default maxMergeMB set to 2gb? This is a big change in default behavior. For systems that do updates (deletes) we are covered, because they are taken (partially) into account when computing the segment size. But let's say you have a 100gb index: you will end up with 50 segments, no? Don't create compound file for large segments by default Key: LUCENE-2773 URL: https://issues.apache.org/jira/browse/LUCENE-2773 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 2.9.4, 3.0.3, 3.1, 4.0 Attachments: LUCENE-2773.patch Spinoff from LUCENE-2762. CFS is useful for keeping the open file count down. But it costs some added time during indexing to build, and also ties up temporary disk space, causing e.g. a large spike on the final merge of an optimize. Since MergePolicy dictates which segments should be CFS, we can change it to only build CFS for smallish merges. I think we should also set a maxMergeMB by default so that very large merges aren't done.
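The arithmetic behind the concern is simple: if no merge may ever produce a segment larger than the cap, the segment count can never drop below the total index size divided by that cap. A trivial sketch:

```java
// Back-of-the-envelope check of the concern above: with merged segments
// capped at maxMergeGb, an index of indexSizeGb can never merge below
// indexSizeGb / maxMergeGb segments.
class MergeMath {
    static long minSegments(long indexSizeGb, long maxMergeGb) {
        return indexSizeGb / maxMergeGb;
    }
}
```

So with a 2gb cap, a 100gb index bottoms out at 50 segments, which is the "you will end up with 50 segments" worry above.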
[jira] Created: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
ArrayIndexOutOfBoundsException when iterating over TermDocs --- Key: LUCENE-2666 URL: https://issues.apache.org/jira/browse/LUCENE-2666 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 3.0.2 Reporter: Shay Banon A user got this very strange exception, and I managed to get the index that it happens on. Basically, iterating over the TermDocs causes an AIOOBE. I easily reproduced it using the FieldCache, which does exactly that (the field in question is indexed as numeric). Here is the exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
at org.apache.lucene.util.BitVector.get(BitVector.java:104)
at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
at TestMe.main(TestMe.java:56)

It happens on the following segment: _26t docCount: 914 delCount: 1 delFileName: _26t_1.del And as you can see, it smells like a corner case (it fails for document number 912; the AIOOBE comes from the deleted docs). The code to recreate it is simple:

FSDirectory dir = FSDirectory.open(new File("index"));
IndexReader reader = IndexReader.open(dir, true);
IndexReader[] subReaders = reader.getSequentialSubReaders();
for (IndexReader subReader : subReaders) {
  Field field = subReader.getClass().getSuperclass().getDeclaredField("si");
  field.setAccessible(true);
  SegmentInfo si = (SegmentInfo) field.get(subReader);
  System.out.println("-- " + si);
  if (si.getDocStoreSegment().contains("_26t")) {
    // this is the problematic one...
    System.out.println("problematic one...");
    FieldCache.DEFAULT.getLongs(subReader, "__documentdate", FieldCache.NUMERIC_UTILS_LONG_PARSER);
  }
}

Here is the result of a check index on that segment:

8 of 10: name=_26t docCount=914 compound=true hasProx=true numFiles=2 size (MB)=1.641 diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.} has deletions [delFileName=_26t_1.del]
test: open reader.........OK [1 deleted docs]
test: fields..............OK [32 fields]
test: field norms.........OK [32 fields]
test: terms, freq, prox...ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
at org.apache.lucene.util.BitVector.get(BitVector.java:104)
at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
at TestMe.main(TestMe.java:47)
test: stored fields.......ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
at org.apache.lucene.util.BitVector.get(BitVector.java:104)
at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
at TestMe.main(TestMe.java:47)
test: term vectors........ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
at org.apache.lucene.util.BitVector.get(BitVector.java:104)
at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
at TestMe.main(TestMe.java:47)

The creation of the index does not do anything fancy (all defaults), though there is usage of the near-real-time aspect (IndexWriter#getReader), which does complicate deleted docs handling. Seems like the deleted docs got written without matching the number of docs? Sadly, I don't have something that recreates it from scratch, but I do have the index if someone wants to have a look at it (mail me directly and I will provide a download link). I will continue to
[jira] Commented: (LUCENE-2161) Some concurrency improvements for NRT
[ https://issues.apache.org/jira/browse/LUCENE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874454#action_12874454 ] Shay Banon commented on LUCENE-2161: Thanks! Some concurrency improvements for NRT - Key: LUCENE-2161 URL: https://issues.apache.org/jira/browse/LUCENE-2161 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.9.3, 3.0.2, 3.1, 4.0 Attachments: LUCENE-2161.patch I found and fixed some silly thread bottlenecks that affect NRT:
* Multi/DirectoryReader.numDocs is synchronized, I think so only 1 thread computes numDocs if it's -1. I removed this sync, and made numDocs volatile, instead. Yes, multiple threads may compute the numDocs for the first time, but I think that's harmless?
* Fixed BitVector's ctor to set count to 0 on creating a new BV, and clone to copy the count over; this saves CPU computing the count unnecessarily.
* Also strengthened assertions done in SR, testing the deleted docs count.
I also found an annoying thread bottleneck that happens due to CMS. Whenever CMS hits the max running merges (default changed from 3 to 1 recently), and the merge policy now wants to launch another merge, it forces the incoming thread to wait until one of the BG threads finishes. This is a basic, crude throttling mechanism -- you force the mutators (whoever is causing new segments to appear) to stop, so that merging can catch up. Unfortunately, when stressing NRT, that thread is the one that's opening a new NRT reader. So, the first serious problem happens when you call .reopen() on your NRT reader -- this call simply forwards to IW.getReader if the reader was an NRT reader. But, because DirectoryReader.doReopen is synchronized, this had the horrible effect of holding the monitor lock on your main IR.
In my test, this blocked all searches (since each search uses incRef/decRef, still sync'd until LUCENE-2156, at least). I fixed this by making doReopen only sync'd on this if it's not simply forwarding to getWriter. So that's a good step forward. This prevents searches from being blocked while trying to reopen to a new NRT. However... it doesn't fix the problem that when an immense merge is off and running, opening an NRT reader could hit a tremendous delay because CMS blocks it. The BalancedSegmentMergePolicy should help here... by avoiding such immense merges. But I think we should also pursue an improvement to CMS. E.g., if it has 2 merges running, where one is huge and one is tiny, it ought to increase the thread priority of the tiny one. I think with such a change we could increase the max thread count again, to prevent this starvation. I'll open a separate issue.
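The volatile numDocs change described above boils down to this idiom (an illustrative sketch, not the actual Multi/DirectoryReader code): the benign race lets several threads compute the same value at most once each, after which everyone reads the cached copy without any lock.

```java
// Sketch of replacing a synchronized lazy compute with a volatile one.
// The race is benign: concurrent threads all compute the same number.
class DocCounter {
    static int computations = 0; // only here so a test can observe the behavior
    private volatile int numDocs = -1;

    int numDocs() {
        int n = numDocs;          // single volatile read
        if (n == -1) {
            n = computeNumDocs(); // may run in several threads concurrently
            numDocs = n;          // last writer wins; all writers agree anyway
        }
        return n;
    }

    private int computeNumDocs() {
        computations++;
        return 914; // stand-in for summing the sub-readers' doc counts
    }
}
```

This trades the possibility of a little duplicated work on first access for zero contention on every access afterwards, which is exactly the bargain the issue description makes.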
[jira] Commented: (LUCENE-2161) Some concurrency improvements for NRT
[ https://issues.apache.org/jira/browse/LUCENE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12873475#action_12873475 ] Shay Banon commented on LUCENE-2161: Mike, is there a reason why this is not backported to 3.0.2? Some concurrency improvements for NRT - Key: LUCENE-2161 URL: https://issues.apache.org/jira/browse/LUCENE-2161 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.9.3, 4.0 Attachments: LUCENE-2161.patch I found and fixed some silly thread bottlenecks that affect NRT:
* Multi/DirectoryReader.numDocs is synchronized, I think so only 1 thread computes numDocs if it's -1. I removed this sync, and made numDocs volatile, instead. Yes, multiple threads may compute the numDocs for the first time, but I think that's harmless?
* Fixed BitVector's ctor to set count to 0 on creating a new BV, and clone to copy the count over; this saves CPU computing the count unnecessarily.
* Also strengthened assertions done in SR, testing the deleted docs count.
I also found an annoying thread bottleneck that happens due to CMS. Whenever CMS hits the max running merges (default changed from 3 to 1 recently), and the merge policy now wants to launch another merge, it forces the incoming thread to wait until one of the BG threads finishes. This is a basic, crude throttling mechanism -- you force the mutators (whoever is causing new segments to appear) to stop, so that merging can catch up. Unfortunately, when stressing NRT, that thread is the one that's opening a new NRT reader. So, the first serious problem happens when you call .reopen() on your NRT reader -- this call simply forwards to IW.getReader if the reader was an NRT reader. But, because DirectoryReader.doReopen is synchronized, this had the horrible effect of holding the monitor lock on your main IR.
In my test, this blocked all searches (since each search uses incRef/decRef, still sync'd until LUCENE-2156, at least). I fixed this by making doReopen only sync'd on this if it's not simply forwarding to getWriter. So that's a good step forward. This prevents searches from being blocked while trying to reopen to a new NRT. However... it doesn't fix the problem that when an immense merge is off and running, opening an NRT reader could hit a tremendous delay because CMS blocks it. The BalancedSegmentMergePolicy should help here... by avoiding such immense merges. But I think we should also pursue an improvement to CMS. E.g., if it has 2 merges running, where one is huge and one is tiny, it ought to increase the thread priority of the tiny one. I think with such a change we could increase the max thread count again, to prevent this starvation. I'll open a separate issue.
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869779#action_12869779 ] Shay Banon commented on LUCENE-2468: Hi Mike, First, I opened and attached a patch regarding the cache eviction listeners to IndexReader: https://issues.apache.org/jira/browse/LUCENE-2474, tell me what you think. Regarding your last comment, I agree. Though trying to streamline its usage, in terms of having all built-in components and possible extensions work well with it, makes sense. That's what you suggested with the filtered doc set, which is cool. reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, LUCENE-2468.patch, LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869369#action_12869369 ] Shay Banon commented on LUCENE-2468: bq. So... why not do this in CachingWrapper/SpanFilter, but, instead of discarding the cache entry when deletions must be enforced, we dynamically apply the deletions? (I think we could use FilteredDocIdSet). Yea, that would work well. You will still need to somehow know when to enable or disable this based on the filter you use (it should basically only be enabled for filters that are passed to a constant score query)... bq. Really... we need a more generic solution here (but, it's a much bigger change), where somehow in creating the scorer per-segment we dynamically determine who/where the deletions are enforced. A Filter need not care about deletions if it's AND'd w/ a query that already enforces the deletions. Agreed. As I see it, caching based on IndexReader is key in Lucene, and with NRT, it should feel the same way as it does without it. NRT should not change the way you build your system. reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, LUCENE-2468.patch, LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
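The FilteredDocIdSet approach discussed above can be sketched with plain BitSets as stand-ins for Lucene's doc-id sets (hypothetical class, not the committed patch): the cached match bits stay deletion-free, and deletions are applied on the fly at read time, so an NRT reopen can keep sharing the cache entry instead of discarding it.

```java
import java.util.BitSet;

// Hypothetical stand-in for wrapping a cached DocIdSet with a
// FilteredDocIdSet that drops deleted docs dynamically.
class DeletionAwareDocIdSet {
    private final BitSet cachedMatches; // cached once per segment, ignores deletions
    private final BitSet deletedDocs;   // current deletions, may change across reopens

    DeletionAwareDocIdSet(BitSet cachedMatches, BitSet deletedDocs) {
        this.cachedMatches = cachedMatches;
        this.deletedDocs = deletedDocs;
    }

    boolean matches(int docId) {
        // Deletions are enforced at read time, so the cached bits can be
        // shared by NRT reopens of the same segment.
        return cachedMatches.get(docId) && !deletedDocs.get(docId);
    }
}
```

This is also why a filter AND'd with a deletion-enforcing query would not need the wrapper at all: the deletions would already be applied by the other side of the conjunction.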
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868617#action_12868617 ] Shay Banon commented on LUCENE-2468: Sounds like a good solution to me. I just noticed in trunk that there is also explicit purging from FieldCache when possible. I think it would be great to enable this for other caches that are based on it (like the CachingWrapperFilter, but externally written ones as well). I was thinking of an expert API that allows adding a CacheEvictionListener or something similar, which will be called when this happens. What do you think? reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868647#action_12868647 ] Shay Banon commented on LUCENE-2468: bq. Shay, as far as CachingWrapperFilter and CacheEvictionListener, it seems more powerful to just let apps create a new query type themselves? That's the nice part of lucene's openness to user query types - start with the code for CachingWrapperFilter and hook up your own caching logic. Yea, but it would be great to know when an IndexReader has decided to actually close, so caches can be eagerly cleaned. Even someone writing a custom implementation would benefit from it. reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868659#action_12868659 ] Shay Banon commented on LUCENE-2468: I think that the suggested solution, to use the FieldCacheKey, is sadly not good enough. I am attaching a simple test that shows this does not work for cases when a query is passed to a searcher without a filter, but that query is, for example, a ConstantScoreQuery. I have simply taken the CachingWrapperFilter and changed it to use getFieldCacheKey instead of the IndexReader. This is problematic, since a filter can be used somewhere in the query tree and wrapped for caching. I am running against 3.0.1. reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
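For context, caching per getFieldCacheKey rather than per reader instance amounts to something like this sketch (hypothetical names): an NRT reopen returns a new reader object, but unchanged segments keep the same key, so the cached entry is reused. That reuse is exactly why stale results can surface when nothing on the query side re-applies deletions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of a filter cache keyed by a stable per-segment "core" key
// (what getFieldCacheKey provides) instead of the reader instance.
class CoreKeyedFilterCache {
    private final Map<Object, Object> cache = new ConcurrentHashMap<>();

    // Returns the cached value for this core key, computing it only once.
    Object getOrCompute(Object coreKey, Supplier<Object> compute) {
        return cache.computeIfAbsent(coreKey, k -> compute.get());
    }
}
```

Two reader instances produced by an NRT reopen of the same unchanged segment share one core key, so both get the same cached value; whether that is a feature or a bug depends on whether something downstream enforces the current deletions.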
[jira] Updated: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2468: --- Attachment: CacheTest.java reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: CacheTest.java, LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868816#action_12868816 ] Shay Banon commented on LUCENE-2468: Thanks for the work, Michael! Is this issue going to include the ConstantScoreQuery, or should I open a different issue for it? reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: CacheTest.java, LUCENE-2468.patch, LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868869#action_12868869 ] Shay Banon commented on LUCENE-2468: Check the two comments above :), we discussed it. Basically, it does not work with your change when using a cached filter. reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: CacheTest.java, LUCENE-2468.patch, LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868923#action_12868923 ] Shay Banon commented on LUCENE-2468: Ahh, now I see it, sorry I missed it. But basically, enforcing deletions means that we are back to the original problem... I think it would be quite confusing for users, to be honest. Of the filters, the problematic ones are those that can be converted to queries. From what I can see, the FilteredQuery is ok, so maybe the ConstantScoreQuery can be changed (if possible) to do that... reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: CacheTest.java, LUCENE-2468.patch, LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Updated: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2468: --- Attachment: DeletionAwareConstantScoreQuery.java Here is a go at making ConstantScoreQuery deletion aware. I named it differently, but it can replace ConstantScoreQuery, with a flag making it deletion aware. What do you think? reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, LUCENE-2468.patch, LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868959#action_12868959 ] Shay Banon commented on LUCENE-2468: Another quick question, Mike: what do you think about the ability to know when a cache key is actually closed, so it can be removed from a cache? Similar in concept to the eviction done from the field cache in trunk by readers, but open so that other Reader#cacheKey based caches (which is the simplest way to do caching in Lucene) can use it. reopen on NRT reader should share readers w/ unchanged segments --- Key: LUCENE-2468 URL: https://issues.apache.org/jira/browse/LUCENE-2468 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Assignee: Michael McCandless Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, LUCENE-2468.patch, LUCENE-2468.patch A reopen on an NRT reader doesn't seem to share readers for those segments that are unchanged. http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2283) Possible Memory Leak in StoredFieldsWriter
[ https://issues.apache.org/jira/browse/LUCENE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864042#action_12864042 ] Shay Banon commented on LUCENE-2283: Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be really helpful to get this as soon as possible in the next Lucene version. Possible Memory Leak in StoredFieldsWriter -- Key: LUCENE-2283 URL: https://issues.apache.org/jira/browse/LUCENE-2283 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.4.1 Reporter: Tim Smith Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2283.patch, LUCENE-2283.patch, LUCENE-2283.patch StoredFieldsWriter creates a pool of PerDoc instances; this pool will grow but never be reclaimed by any mechanism. Furthermore, each PerDoc instance contains a RAMFile, and this RAMFile will also never be truncated (and will only ever grow), as far as I can tell. When feeding documents with a large number of stored fields (or one large dominating stored field) this can result in memory being consumed in the RAMFile but never reclaimed. Eventually, each pooled PerDoc could grow very large, even if large documents are rare. Seems like there should be some attempt to reclaim memory from the PerDoc[] instance pool (or otherwise limit the size of RAMFiles that are cached), etc.
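One way to bound such a pool, sketched with hypothetical names (this is not the committed LUCENE-2283 patch): recycle buffers for reuse, but refuse to retain any buffer that grew past a threshold, so a single huge document cannot pin memory forever.

```java
import java.util.ArrayDeque;

// Sketch of a pool that caps what it retains; oversized buffers are
// dropped on release and left for GC instead of being cached.
class BoundedBufferPool {
    private final ArrayDeque<byte[]> pool = new ArrayDeque<>();
    private final int maxRetainedSize;

    BoundedBufferPool(int maxRetainedSize) {
        this.maxRetainedSize = maxRetainedSize;
    }

    byte[] acquire(int size) {
        byte[] b = pool.poll();
        // Reuse a pooled buffer if it is big enough, else allocate fresh.
        return (b != null && b.length >= size) ? b : new byte[size];
    }

    void release(byte[] buffer) {
        if (buffer.length <= maxRetainedSize) {
            pool.push(buffer); // small buffers are recycled
        }
        // oversized buffers are intentionally not retained
    }

    int pooled() {
        return pool.size();
    }
}
```

With this shape, steady-state memory is bounded by poolSize * maxRetainedSize regardless of how large the occasional outlier document is.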
[jira] Commented: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864044#action_12864044 ] Shay Banon commented on LUCENE-2387: Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be really helpful to get this as soon as possible in the next Lucene version. IndexWriter retains references to Readers used in Fields (memory leak) -- Key: LUCENE-2387 URL: https://issues.apache.org/jira/browse/LUCENE-2387 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.1 Reporter: Ruben Laguna Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2387-29x.patch, LUCENE-2387.patch As described in [1], IndexWriter retains references to Readers used in Fields, and that can lead to big memory leaks when using Tika's ParsingReaders (as those can take 1 MB per ParsingReader). [2] shows a screenshot of the reference chain to the Reader from the IndexWriter, taken with Eclipse MAT (Memory Analyzer Tool). The chain is the following: IndexWriter - DocumentsWriter - DocumentsWriterThreadState - DocFieldProcessorPerThread - DocFieldProcessorPerField - Fieldable - Field (fieldsData) - [1] http://markmail.org/thread/ndmcgffg2mnwjo47 [2] http://skitch.com/ecerulm/n7643/eclipse-memory-analyzer
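The general shape of the fix for this class of leak is to drop the reference once the payload has been consumed, so per-thread state that outlives the document cannot pin a large Reader. A minimal sketch in plain Java, with the hypothetical FieldHolder class standing in for Lucene's Field (it is not the actual patched code):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Sketch of a field holder that releases its Reader once consumed; the
// FieldHolder name is illustrative, not Lucene's Field class.
class FieldHolder {
    private Reader fieldsData; // potentially large per-document payload

    FieldHolder(Reader r) { this.fieldsData = r; }

    // Consume the reader's contents, then drop the reference so the Reader
    // can be GC'd even if this holder is retained by per-thread state.
    String consume() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = fieldsData.read()) != -1) sb.append((char) c);
        fieldsData.close();
        fieldsData = null; // release the reference after use
        return sb.toString();
    }

    boolean retainsReader() { return fieldsData != null; }
}
```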
[jira] Created: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap
ByteBuffer Directory - allowing to store the index outside the heap --- Key: LUCENE-2292 URL: https://issues.apache.org/jira/browse/LUCENE-2292 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon A byte buffer based directory, with the benefit of being able to create direct byte buffers and thus store the index outside the JVM heap.
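The core idea, holding file contents in direct ByteBuffers so the bytes live outside the JVM heap, can be sketched in plain Java. The OffHeapFile class and its chunk size are illustrative, not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of off-heap file storage built from direct ByteBuffers: data is
// appended into fixed-size chunks allocated outside the JVM heap.
class OffHeapFile {
    private static final int CHUNK = 1024; // illustrative chunk size
    private final List<ByteBuffer> chunks = new ArrayList<>();
    private long length;

    void append(byte[] src) {
        int off = 0;
        while (off < src.length) {
            int pos = (int) (length % CHUNK);
            if (pos == 0) chunks.add(ByteBuffer.allocateDirect(CHUNK));
            ByteBuffer cur = chunks.get(chunks.size() - 1);
            int n = Math.min(CHUNK - pos, src.length - off);
            cur.position(pos);
            cur.put(src, off, n);
            off += n;
            length += n;
        }
    }

    byte readByteAt(long p) {
        ByteBuffer chunk = chunks.get((int) (p / CHUNK));
        return chunk.get((int) (p % CHUNK)); // absolute get, position untouched
    }

    long length() { return length; }
}
```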
[jira] Updated: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap
[ https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2292: --- Attachment: LUCENE-2292.patch ByteBuffer Directory - allowing to store the index outside the heap --- Key: LUCENE-2292 URL: https://issues.apache.org/jira/browse/LUCENE-2292 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Attachments: LUCENE-2292.patch A byte buffer based directory, with the benefit of being able to create direct byte buffers and thus store the index outside the JVM heap.
[jira] Commented: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap
[ https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840379#action_12840379 ] Shay Banon commented on LUCENE-2292: Hi, this looks interesting as a replacement for RAMDirectory. This class uses ByteBuffer, which has its overhead over a simple byte[], though the same logic (if you verify it) can be used to improve the concurrency in RAMDirectory (just use byte[]). Your patch uses a sun.* internal package. If you want to do something similar to MMapDirectory and release the buffer without waiting for GC, do it the same way, using reflection as in MMapDirectory. From what I know, it was there in all the JDKs I worked with (it's like sun.misc.Unsafe). Have you seen otherwise? If so, it's a simple change (though I am not sure about the access control check in MMapDirectory; it's a performance killer, and caching the Method(s) makes sense). ByteBuffer Directory - allowing to store the index outside the heap --- Key: LUCENE-2292 URL: https://issues.apache.org/jira/browse/LUCENE-2292 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Attachments: LUCENE-2292.patch A byte buffer based directory, with the benefit of being able to create direct byte buffers and thus store the index outside the JVM heap.
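The cache-the-Method pattern suggested in the comment can be sketched with a benign target. The real MMapDirectory-style code reflects on the buffer's cleaner, which lives in JDK-internal classes and varies across JDK versions, so this sketch caches and invokes a public method instead, purely to show the pattern.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Sketch of looking up a reflective Method once and reusing it, as suggested
// for buffer cleaning. We reflect on the public capacity() method here; the
// actual cleaner method is JDK-internal and version-dependent.
class CachedMethodCall {
    private static final Method CAPACITY;
    static {
        Method m;
        try {
            m = ByteBuffer.class.getMethod("capacity");
        } catch (NoSuchMethodException e) {
            m = null; // lookup failed: fall back to a no-op answer below
        }
        CAPACITY = m;
    }

    // Invoke the cached Method; no per-call lookup or access check setup.
    static int capacityOf(ByteBuffer b) {
        try {
            return CAPACITY != null ? (Integer) CAPACITY.invoke(b) : -1;
        } catch (ReflectiveOperationException e) {
            return -1;
        }
    }
}
```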
[jira] Updated: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap
[ https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2292: --- Attachment: LUCENE-2292.patch Attached a new patch that does not use the sun.* package. I still cache the Method, since cleaning a buffer is not done only on close of the directory. ByteBuffer Directory - allowing to store the index outside the heap --- Key: LUCENE-2292 URL: https://issues.apache.org/jira/browse/LUCENE-2292 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Attachments: LUCENE-2292.patch, LUCENE-2292.patch A byte buffer based directory, with the benefit of being able to create direct byte buffers and thus store the index outside the JVM heap.
[jira] Commented: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap
[ https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840394#action_12840394 ] Shay Banon commented on LUCENE-2292: By the way, an implementation note: I thought about preallocating a large direct buffer and then slicing it into chunks, but currently I think that the complexity (and the overhead of maintaining slice locations) is not really needed, and the current caching should do the trick (with the ability to control both the buffer size and the cache size). ByteBuffer Directory - allowing to store the index outside the heap --- Key: LUCENE-2292 URL: https://issues.apache.org/jira/browse/LUCENE-2292 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Attachments: LUCENE-2292.patch, LUCENE-2292.patch A byte buffer based directory, with the benefit of being able to create direct byte buffers and thus store the index outside the JVM heap.
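The rejected alternative, one large direct allocation sliced into fixed-size chunks, looks roughly like this in plain Java; the chunk size and count are illustrative. Each slice is an independent view over the same backing memory, which is why the comment notes that a free-list of slice locations would have to be maintained.

```java
import java.nio.ByteBuffer;

// Sketch of slicing one preallocated direct buffer into fixed-size chunks.
// The slices share backing memory, so reclaiming and reusing individual
// chunks requires the extra bookkeeping the comment argues against.
class SlicedArena {
    static ByteBuffer[] slice(int chunkSize, int chunkCount) {
        ByteBuffer arena = ByteBuffer.allocateDirect(chunkSize * chunkCount);
        ByteBuffer[] slices = new ByteBuffer[chunkCount];
        for (int i = 0; i < chunkCount; i++) {
            arena.position(i * chunkSize).limit((i + 1) * chunkSize);
            slices[i] = arena.slice(); // independent view over one chunk
        }
        return slices;
    }
}
```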
[jira] Created: (LUCENE-1637) Getting an IndexReader from a committed IndexWriter
Getting an IndexReader from a committed IndexWriter --- Key: LUCENE-1637 URL: https://issues.apache.org/jira/browse/LUCENE-1637 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.9 Reporter: Shay Banon I just had a look at the work done in IndexWriter to provide an IndexReader reflecting all the ongoing changes made through the IndexWriter. This feature is very useful, and I was wondering if another feature, which (I think) is simple to implement compared to the previous one, might make sense. Many times, an application opens an IndexWriter, makes whatever changes it makes, and then commits the changes. It would be nice to get an IndexReader (a read-only one is fine) that corresponds to the committed (or even closed) IndexWriter. This would allow a cached IndexReader that is already in use to be replaced with a fresh IndexReader, without the need to reopen one (which should be slower than opening one based on the IndexWriter's information). The main difference is that the mentioned IndexReader could still be reopened without throwing an AlreadyClosedException. More information can be found here: http://www.nabble.com/Getting-an-IndexReader-from-a-committed-IndexWriter-td23551978.html
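The refresh pattern the request describes, swapping a cached read-only view for a fresh one derived from the writer's state after each commit, can be sketched in plain Java. The Writer and Snapshot types below are hypothetical stand-ins for IndexWriter and IndexReader, not Lucene API.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins: a Writer whose commits advance a generation, and
// a Snapshot that captures the committed generation (read-only view).
class Writer {
    private int generation;
    void commit() { generation++; }
    Snapshot openSnapshot() { return new Snapshot(generation); }
}

class Snapshot {
    final int generation;
    Snapshot(int generation) { this.generation = generation; }
}

// Cache holding the latest committed view; after each commit the cached
// snapshot is swapped for one derived directly from the writer's state,
// instead of reopening a reader from scratch.
class SnapshotCache {
    private final AtomicReference<Snapshot> current = new AtomicReference<>();

    void refresh(Writer w) { current.set(w.openSnapshot()); }
    Snapshot get() { return current.get(); }
}
```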
[jira] Commented: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12579874#action_12579874 ] Shay Banon commented on LUCENE-1239: Yeah, it looks like it is my bad; great catch! While trying to create a better scheduler (at least in terms of reusing threads instead of creating them), I wondered whether the current scheduler could be enhanced with an extension point for that. I can give such a refactoring a go if you think it makes sense. IndexWriter deadlock when using ConcurrentMergeScheduler Key: LUCENE-1239 URL: https://issues.apache.org/jira/browse/LUCENE-1239 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3.1 Environment: Compass 2.0.0M3 (nightly build #57), Lucene 2.3.1, Spring Framework 2.0.7.0 Reporter: Michael Lossos Assignee: Michael McCandless I'm trying to update our application from Compass 2.0.0M1 with Lucene 2.2 to Compass 2.0.0M3 (latest build) with Lucene 2.3.1. I'm holding all other things constant and only changing the Compass and Lucene jars. I'm recreating the search index for our data and seeing a deadlock in Lucene's IndexWriter. It appears to be waiting on a signal from the merge thread. I've tried creating a simple reproduction case for this, but to no avail. Doing the exact same steps with Compass 2.0.0M1 and Lucene 2.2 has no problems and recreates our search index; that is to say, it's not our code. In particular, the main thread performing the commit (Lucene document save) from Compass is calling Lucene's IndexWriter.optimize(). We're using Compass's ExecutorMergeScheduler to handle the merging, and it is calling IndexWriter.merge(). The main thread in IndexWriter.optimize() enters the wait() at the bottom of that method and is never notified. I can't tell if this is because optimizeMergesPending() is returning true incorrectly, or if IndexWriter.merge()'s notifyAll() is being called prematurely.
Looking at the code, it doesn't seem possible for IndexWriter.optimize() to be waiting and miss a notifyAll(), and Lucene's IndexWriter.merge() was recently fixed to always call notifyAll() even on exceptions; that is, all the relevant IndexWriter code looks properly synchronized. Nevertheless, I'm seeing the deadlock behavior described, and it's reproducible using our app and our test data set. Could someone familiar with IndexWriter's synchronization code take another look at it? I'm sorry that I can't give you a simple reproduction test case.
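The synchronization discipline the report examines, a waiter that re-checks its condition in a loop and a worker that signals with notifyAll() even when an exception is thrown, can be sketched in plain Java; MergeTracker is an illustrative name, not IndexWriter's actual code.

```java
// Sketch of the wait/notifyAll discipline described above: the waiter
// re-checks its condition in a loop (guarding against spurious wakeups and
// missed state changes), and the worker signals in a finally block so an
// exception cannot leave the waiter stuck.
class MergeTracker {
    private int pendingMerges;

    synchronized void merge(Runnable work) {
        pendingMerges++;
        try {
            work.run();
        } finally {
            pendingMerges--;
            notifyAll(); // signal even if work.run() threw
        }
    }

    synchronized void awaitNoPending() throws InterruptedException {
        while (pendingMerges > 0) { // condition re-checked in a loop
            wait();
        }
    }
}
```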
[jira] Updated: (LUCENE-511) New BufferedIndexOutput optimization fails to update bufferStart
[ http://issues.apache.org/jira/browse/LUCENE-511?page=all ] Shay Banon updated LUCENE-511: -- Attachment: BufferedIndexOutput.java New BufferedIndexOutput optimization fails to update bufferStart Key: LUCENE-511 URL: http://issues.apache.org/jira/browse/LUCENE-511 Project: Lucene - Java Type: Bug Components: Store Versions: 1.9 Reporter: Shay Banon Priority: Critical Attachments: BufferedIndexOutput.java, RAMOutputTest.java The new BufferedIndexOutput optimization of writeBytes fails to update bufferStart under some conditions. Test case and fix attached. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
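The bufferStart bookkeeping at issue can be illustrated with a minimal buffered output in plain Java: bufferStart records the file offset of the buffer's first byte, the absolute file position is bufferStart + bufferPosition, and every flush, including the ones writeBytes triggers internally, must advance bufferStart. MiniBufferedOutput is an illustrative sketch of the invariant, not the patched Lucene class.

```java
import java.io.ByteArrayOutputStream;

// Minimal buffered output showing the bufferStart invariant: the absolute
// file position is bufferStart + bufferPosition, so every flush (including
// those triggered inside writeBytes) must advance bufferStart. Failing to
// do so is the class of bug reported in this issue.
class MiniBufferedOutput {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private final byte[] buffer = new byte[16];
    private long bufferStart;    // file offset of buffer[0]
    private int bufferPosition;  // next free slot in buffer

    void writeBytes(byte[] src, int off, int len) {
        for (int i = 0; i < len; i++) {
            if (bufferPosition == buffer.length) flush();
            buffer[bufferPosition++] = src[off + i];
        }
    }

    void flush() {
        sink.write(buffer, 0, bufferPosition);
        bufferStart += bufferPosition; // keep the invariant on every flush
        bufferPosition = 0;
    }

    long filePointer() { return bufferStart + bufferPosition; }
}
```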