[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730272#action_12730272 ] Michael McCandless commented on LUCENE-1726: {quote} I haven't really figured out a clean way to move the reader creation out of the reader pool synchronization. It turns out to be somewhat tricky, unless we redesign our synchronization. {quote} Maybe we should simply hold off for now? I don't think this sync is costing much in practice, now. Ie, IndexReader.open is not concurrent when opening its segments; nor would we expect multiple threads to be calling IndexWriter.getReader concurrently. There is a wee bit of concurrency we are preventing, ie for a merge or applyDeletes to get a reader just as an NRT reader is being opened, but realistically 1) that's not very costly, and 2) we can't gain that concurrency back anyway because we synchronize on IW when opening the reader. IndexWriter.readerPool create new segmentReader outside of sync block - Key: LUCENE-1726 URL: https://issues.apache.org/jira/browse/LUCENE-1726 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Assignee: Michael McCandless Priority: Trivial Fix For: 3.1 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.trunk.test.patch Original Estimate: 48h Remaining Estimate: 48h I think we will want to do something like what field cache does with CreationPlaceholder for IndexWriter.readerPool. Otherwise we have the (I think somewhat problematic) issue of all other readerPool.get* methods waiting for an SR to warm. It would be good to implement this for 2.9. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
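For readers unfamiliar with the placeholder idea mentioned in the issue description, here is a minimal sketch of the general pattern: the pool-wide lock is held only long enough to find or install a per-key placeholder, and the expensive creation then runs under the placeholder's own lock. This is not the FieldCache or IndexWriter code; `PlaceholderPool`, `Placeholder` and the `loader` function are hypothetical names used only for illustration, and the sketch assumes the loader never returns null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical pool illustrating the placeholder pattern; not Lucene code.
class PlaceholderPool<K, V> {
    // One slot per key; its monitor guards creation of that key's value only.
    private static final class Placeholder<V> {
        V value;
    }

    private final Map<K, Placeholder<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // assumed expensive, assumed non-null result

    PlaceholderPool(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        Placeholder<V> slot;
        // Short critical section: look up or install the placeholder.
        synchronized (entries) {
            slot = entries.computeIfAbsent(key, k -> new Placeholder<>());
        }
        // Long work happens here, blocking only callers that want the same key.
        synchronized (slot) {
            if (slot.value == null) {
                slot.value = loader.apply(key);
            }
            return slot.value;
        }
    }
}
```

With this shape, a caller asking for a different key is not blocked while one value is being created, which is the concurrency the issue is after. As the comment above notes, that gain may not matter if callers already synchronize on a wider lock.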
[jira] Resolved: (LUCENE-1740) Lucli: Command to change the Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1740. Resolution: Fixed I just committed this. Thanks Bernd! Lucli: Command to change the Analyzer - Key: LUCENE-1740 URL: https://issues.apache.org/jira/browse/LUCENE-1740 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 2.9 Reporter: Bernd Fondermann Fix For: 2.9 Attachments: analyzer_command.patch Currently, Lucli is hardcoded to use StandardAnalyzer. The provided patch introduces a command analyzer to specify a different Analyzer class. If something fails, StandardAnalyzer is the fall-back.
Re: Relevance's scores on TopFieldCollector/FieldComparator
It's odd that this was necessary. The ScoreCachingWrappingScorer simply wraps (and caches) the result from calling score(), per hit, so that if score() is called more than once we don't have to re-compute it. I don't understand why you were always seeing a score of 0 come back from it.

Mike

On Thu, Jul 9, 2009 at 9:09 AM, Raimon Bosch <raimon.bo...@gmail.com> wrote:
> It worked for me after changing:
>
>     public void setScorer(Scorer scorer) { this.scorer = new ScoreCachingWrappingScorer(scorer); }
>
> to
>
>     public void setScorer(Scorer scorer) { this.scorer = scorer; }
>
> in my PseudoRandomFieldComparator.
>
> Regards, Raimon Bosch.
>
> Raimon Bosch wrote:
>> Hi, I've just implemented my PseudoRandomFieldComparator (migrated from PseudoRandomComparatorSource) on Solr. The problem I see is that I don't have access to the relevance scores as I did in the deprecated class ComparatorSource. I saw that the TopFieldCollector sets the scorer on my PseudoRandomFieldComparator, but the method scorer.score() always returns 0. It's as if the scorer is passed correctly but refers to the wrong scores array. How can I get the relevance scores in my PseudoRandomFieldComparator? Any ideas?
>>
>> Regards, Raimon Bosch.
>
> -- View this message in context: http://www.nabble.com/Relevance%27s-scores-on-TopFieldCollector-FieldComparator-tp24407379p24409794.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
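As an illustration of the per-hit caching Mike describes, here is a minimal, self-contained sketch. It deliberately does not use the Lucene classes: `DocScorer` and `CachingDocScorer` are hypothetical stand-ins, and keying the cache on the current document id is an assumption of this sketch rather than a statement about how ScoreCachingWrappingScorer is implemented.

```java
// Hypothetical stand-ins for illustration only; not the Lucene Scorer API.
class ScoreCachingSketch {
    /** Exposes the current document and its (possibly costly) score. */
    interface DocScorer {
        int docID();
        float score();
    }

    /** Computes the wrapped score at most once per document id. */
    static final class CachingDocScorer implements DocScorer {
        private final DocScorer in;
        private int cachedDoc = -1; // assumes real doc ids are never negative
        private float cachedScore;

        CachingDocScorer(DocScorer in) {
            this.in = in;
        }

        public int docID() {
            return in.docID();
        }

        public float score() {
            int doc = in.docID();
            if (doc != cachedDoc) {
                // First request for this hit: compute and remember it.
                cachedScore = in.score();
                cachedDoc = doc;
            }
            return cachedScore;
        }
    }
}
```

A comparator that reads the score several times per hit would then trigger only one call to the wrapped `score()` for each document.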
[jira] Assigned: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue
[ https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1566: -- Assignee: Simon Willnauer (was: Michael McCandless) Assigning this one back to you Simon! Large Lucene index can hit false OOM due to Sun JRE issue - Key: LUCENE-1566 URL: https://issues.apache.org/jira/browse/LUCENE-1566 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.4.1 Reporter: Michael McCandless Assignee: Simon Willnauer Priority: Minor Fix For: 2.9 Attachments: LUCENE-1566.patch, LUCENE-1566.patch This is not a Lucene issue, but I want to open this so future google diggers can more easily find it. There's this nasty bug in Sun's JRE: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 The gist seems to be, if you try to read a large (eg 200 MB) number of bytes during a single RandomAccessFile.read call, you can incorrectly hit OOM. Lucene does this, with norms, since we read in one byte per doc per field with norms, as a contiguous array of length maxDoc(). The workaround was a custom patch to do large file reads as several smaller reads. Background here: http://www.nabble.com/problems-with-large-Lucene-index-td22347854.html
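The workaround described in the issue (doing one large read as several smaller reads) can be sketched roughly as follows. This is not the attached LUCENE-1566 patch; the class name, the `chunkSize` parameter and the `roundTrip` demo helper are assumptions made for illustration.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Illustrative sketch only; names and structure are not taken from the patch.
class ChunkedReads {
    /** Fills dest from the file, never asking for more than chunkSize bytes per read call. */
    static void readFully(RandomAccessFile file, byte[] dest, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int offset = 0;
        while (offset < dest.length) {
            int toRead = Math.min(chunkSize, dest.length - offset);
            int n = file.read(dest, offset, toRead);
            if (n < 0) {
                throw new EOFException("file ended after " + offset + " bytes");
            }
            offset += n; // read() may return fewer bytes than requested
        }
    }

    /** Demo helper: writes data to a temp file and reads it back in chunks. */
    static byte[] roundTrip(byte[] data, int chunkSize) {
        try {
            File tmp = File.createTempFile("chunked", ".bin");
            try {
                try (RandomAccessFile out = new RandomAccessFile(tmp, "rw")) {
                    out.write(data);
                }
                byte[] back = new byte[data.length];
                try (RandomAccessFile in = new RandomAccessFile(tmp, "r")) {
                    readFully(in, back, chunkSize);
                }
                return back;
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The destination array is still one contiguous block of length maxDoc(); only the size of each individual read request changes.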
[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream
[ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730303#action_12730303 ] Uwe Schindler commented on LUCENE-1678: --- Your solution is also a nice way to fix the remaining problems with the core token streams in LUCENE-1693: if somebody overrides a deprecated method in one of the core token streams (those that are not final), the method is never called, because the indexer uses incrementToken by default. The same approach can be used to fix this problem in TokenStream, too. I will prepare a patch for this (I am currently preparing a new patch with some tests and a fix for the problem that the number of attribute instances may not equal the number of attributes). Deprecate Analyzer.tokenStream -- Key: LUCENE-1678 URL: https://issues.apache.org/jira/browse/LUCENE-1678 Project: Lucene - Java Issue Type: Bug Components: Analysis Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.9 Attachments: LUCENE-1678.patch The addition of reusableTokenStream to the core analyzers unfortunately broke back compat of external subclasses: http://www.nabble.com/Extending-StandardAnalyzer-considered-harmful-td23863822.html On upgrading, such subclasses would silently not be used anymore, since Lucene's indexing invokes reusableTokenStream. I think we should at least deprecate Analyzer.tokenStream, today, so that users see deprecation warnings if their classes override this method. But going forward when we want to change the API of core classes that are extended, I think we have to introduce entirely new classes, to keep back compatibility.
[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730306#action_12730306 ] Uwe Schindler commented on LUCENE-1693: --- Mike implemented a nice idea to solve the problems with tokenstreams overriding deprecated methods in LUCENE-1678. I will try this out here and also fix the problems with # of attribute instances != # of attributes and the iterator problems because of this. AttributeSource/TokenStream API improvements Key: LUCENE-1693 URL: https://issues.apache.org/jira/browse/LUCENE-1693 Project: Lucene - Java Issue Type: Improvement Components: Analysis Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: 2.9 Attachments: LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, lucene-1693.patch, TestCompatibility.java, TestCompatibility.java, TestCompatibility.java, TestCompatibility.java This patch makes the following improvements to AttributeSource and TokenStream/Filter: - removes the set/getUseNewAPI() methods (including the standard ones). Instead by default incrementToken() throws a subclass of UnsupportedOperationException. The indexer tries to call incrementToken() initially once to see if the exception is thrown; if so, it falls back to the old API. - introduces interfaces for all Attributes. The corresponding implementations have the postfix 'Impl', e.g. TermAttribute and TermAttributeImpl. AttributeSource now has a factory for creating the Attribute instances; the default implementation looks for implementing classes with the postfix 'Impl'. Token now implements all 6 TokenAttribute interfaces. - new method added to AttributeSource: addAttributeImpl(AttributeImpl). Using reflection it walks up in the class hierarchy of the passed in object and finds all interfaces that the class or superclasses implement and that extend the Attribute interface. 
It then adds the interface-instance mappings to the attribute map for each of the found interfaces. - AttributeImpl now has a default implementation of toString that uses reflection to print out the values of the attributes in a default formatting. This makes it a bit easier to implement AttributeImpl, because toString() was declared abstract before. - Cloning is now done much more efficiently in captureState. The method figures out which unique AttributeImpl instances are contained as values in the attributes map, because those are the ones that need to be cloned. It creates a singly linked list that supports deep cloning (in the inner class AttributeSource.State). AttributeSource keeps track of when this state changes, i.e. whenever new attributes are added to the AttributeSource. Only in that case will captureState recompute the state, otherwise it will simply clone the precomputed state and return the clone. restoreState(AttributeSource.State) walks the linked list and uses the copyTo() method of AttributeImpl to copy all values over into the attribute that the source stream (e.g. SinkTokenizer) uses. The cloning performance can be greatly improved if a TokenStream does not use multiple AttributeImpl instances. A user can e.g. simply add a Token instance to the stream instead of the individual attributes. Or the user could implement a subclass of AttributeImpl that implements exactly the Attribute interfaces needed. I think this should be considered an expert API (addAttributeImpl), as this manual optimization is only needed if cloning performance is crucial. I ran some quick performance tests using Tee/Sink tokenizers (which do cloning) and the performance was roughly 20% faster with the new API. I'll run some more performance tests and post more numbers then. Note also that when we add serialization to the Attributes, e.g.
to support storing serialized TokenStreams in the index, then the serialization should benefit even more from the new API than cloning does. Also, the TokenStream API does not change, except for the removal of the set/getUseNewAPI methods. So the patches in LUCENE-1460 should still work. All core tests pass; however, I need to update all the documentation and also add some unit tests for the new AttributeSource functionality. So this patch is not ready to commit yet, but I wanted to post it already for some feedback.
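The reflection step described for addAttributeImpl (walk up the class hierarchy and collect every implemented interface that extends Attribute) can be sketched as below. This is a simplified illustration, not the code from the patch: all types here are hypothetical stand-ins declared inside the example, and the sketch only inspects interfaces declared directly on each class in the hierarchy.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified illustration; all types are hypothetical stand-ins, not Lucene's.
class AttributeInterfaces {
    interface Attribute {}
    interface TermAttribute extends Attribute {}
    interface OffsetAttribute extends Attribute {}

    // A small hierarchy: one attribute interface on the base class, one on the subclass.
    static class BaseImpl implements TermAttribute {}
    static class TokenLike extends BaseImpl implements OffsetAttribute, Cloneable {}

    /** Collects interfaces extending Attribute from implClass and its superclasses. */
    static Set<Class<?>> findAttributeInterfaces(Class<?> implClass) {
        Set<Class<?>> found = new LinkedHashSet<>();
        for (Class<?> c = implClass; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                // Keep sub-interfaces of Attribute, skip the marker itself and unrelated ones.
                if (iface != Attribute.class && Attribute.class.isAssignableFrom(iface)) {
                    found.add(iface);
                }
            }
        }
        return found;
    }
}
```

Each interface found this way would then be mapped to the same instance, which is why a single object such as a Token can back several attributes and only needs to be cloned once.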
[jira] Created: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts --- Key: LUCENE-1741 URL: https://issues.apache.org/jira/browse/LUCENE-1741 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9 Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Minor Fix For: 3.1 This is a follow-up to a java-user thread: http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b It is easy to implement: just add a setter method for this parameter to MMapDir.
[jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1741: -- Attachment: LUCENE-1741.patch Patch that allows configuration of the chunk size. I will commit in the evening (CET).
[jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1741: -- Fix Version/s: (was: 3.1) 2.9
[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730368#action_12730368 ] Michael McCandless commented on LUCENE-1741: Should we default the chunking size to something smaller (128 MB?) on 32 bit JRE?
[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730373#action_12730373 ] Uwe Schindler commented on LUCENE-1741: --- Good idea. Do we still have this 64-bit detection property in the utils? If yes, this could easily be done.
[jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1741: -- Attachment: LUCENE-1741.patch Attached is a patch using the JRE_IS_64BIT in Constants. I set the default to 256 MiBytes (128 seems too small for large indexes; if the index is e.g. about 1.5 GiBytes, you would get 6 chunks with 256 MiBytes). I have no test data on which size is good; it is just trial and error (and depends e.g. on how often you reboot Windows, as Eks said).
Re: [jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
> I have no test data which size is good, it is just trying out

Sure, for this you need a bad OS and a large index; you are not as lucky as I am to have them :) Anyhow, I would argue against a default value. The algorithm is quite simple: if you hit OOM on map(), reduce this value until it fits :) No need to touch it if it works...

Original Message
From: Uwe Schindler (JIRA) j...@apache.org
To: java-dev@lucene.apache.org
Sent: Monday, 13 July, 2009 17:21:15
Subject: [jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
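The manual procedure Eks describes (if map() fails with OOM, reduce the value until it fits) could be sketched as the loop below. The message only says "reduce"; halving the size on each failure, the `minSize` floor and the `tryMap` callback are all assumptions of this sketch. The callback stands in for whatever actually attempts the mapping, so the example does not touch real memory-mapped files.

```java
import java.util.function.LongPredicate;

// Hypothetical helper; the halving strategy is an assumption, not from the thread.
class ChunkSizeTuner {
    /**
     * Returns the first size, starting at startSize and halving each time,
     * for which tryMap reports success, or -1 if none down to minSize works.
     */
    static long shrinkUntilItFits(long startSize, long minSize, LongPredicate tryMap) {
        if (minSize < 1) {
            throw new IllegalArgumentException("minSize must be at least 1");
        }
        for (long size = startSize; size >= minSize; size /= 2) {
            if (tryMap.test(size)) {
                return size; // mapping succeeded with this chunk size
            }
        }
        return -1; // nothing fit; the caller has to give up or fall back
    }
}
```

For example, starting at 256 MiB with a mapper that only succeeds at 64 MiB or less, the loop tries 256, 128 and then settles on 64 MiB.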
[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730378#action_12730378 ] Michael McCandless commented on LUCENE-1741: Patch looks good!
[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730424#action_12730424 ] Jason Rutherglen commented on LUCENE-1726: -- I was thinking the sync on all of readerPool could delay someone trying to call IW.getReader who would wait for a potentially large new segment to be warmed. However because IW.mergeMiddle isn't loading the term index, IW.getReader will pay the cost of loading the term index. So yeah, it doesn't seem necessary.
[jira] Updated: (LUCENE-1712) Set default precisionStep for NumericField and NumericRangeFilter
[ https://issues.apache.org/jira/browse/LUCENE-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1712: -- Attachment: LUCENE-1712.patch Attached is a patch with the default precisionStep of 4. The javadocs of NumericRangeQuery list all possible and sensible values. This patch also contains some cleanup in NumericUtils (renamed constants) and a lot of other JavaDocs fixes. It also changes the token types of the TokenStream (no distinction between 32/64 bit values is needed) and adds a test for them. Set default precisionStep for NumericField and NumericRangeFilter - Key: LUCENE-1712 URL: https://issues.apache.org/jira/browse/LUCENE-1712 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9 Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 2.9 Attachments: LUCENE-1712.patch This is a spinoff from LUCENE-1701. A user using Numeric* should not need to understand what's under the hood in order to do their indexing and searching. They should be able to simply: {code} doc.add(new NumericField("price", 15.50)); {code} And have a decent default precisionStep selected for them. Actually, if we add ctors to NumericField for each of the supported types (so the above code works), we can set the default per-type. I think we should do that? 4 for int and 6 for long was proposed as good defaults. The default need not be perfect, as advanced users can always optimize their precisionStep, and for users experiencing slow RangeQuery performance, NumericRangeQuery with any of the defaults we are discussing will be much faster.
[jira] Commented: (LUCENE-1712) Set default precisionStep for NumericField and NumericRangeFilter
[ https://issues.apache.org/jira/browse/LUCENE-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730538#action_12730538 ] Michael McCandless commented on LUCENE-1712: Patch looks good Uwe!
[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730543#action_12730543 ] Uwe Schindler commented on LUCENE-1741: --- Eks Dev wrote in java-dev: bq. I have no test data which size is good, it is just trying out Sure, for this you need a bad OS and a large index; you are not as lucky as I am to have them :) Anyhow, I would argue against a default value. The algorithm is quite simple: if you hit OOM on map(), reduce this value until it fits :) No need to touch it if it works...
[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730547#action_12730547 ] Uwe Schindler commented on LUCENE-1741: --- OK, we have two patches; we can think about using one of them. In my opinion, there is no problem with limiting the chunk size on 32 bit systems. The overhead of choosing the right chunk is negligible, as it only affects seeking. Normal sequential reads only have to check whether the current chunk has enough data and, if not, move to the next one. The non-chunked stream does this check, too (to throw EOF). With a chunk size of 256 MB, the theoretical maximum number of chunks is 8 (which can never be reached...). Any other comments? Eks: What was your value that fixed your problem without rebooting? And how big was your biggest index file?
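The chunk arithmetic behind this discussion is small enough to write out. The sketch below is only an illustration of the idea (it is not MMapDirectory code, and the names are made up): a seek maps a file position to a chunk index and an offset within that chunk, and the chunk count is the file length divided by the chunk size, rounded up.

```java
// Illustration of the chunk arithmetic only; not taken from MMapDirectory.
class ChunkMath {
    /** Number of chunks needed to cover fileLength bytes (rounded up). */
    static int chunkCount(long fileLength, long chunkSize) {
        return (int) ((fileLength + chunkSize - 1) / chunkSize);
    }

    /** Index of the chunk that contains the absolute file position pos. */
    static int chunkIndex(long pos, long chunkSize) {
        return (int) (pos / chunkSize);
    }

    /** Offset of pos inside its chunk. */
    static int chunkOffset(long pos, long chunkSize) {
        return (int) (pos % chunkSize);
    }
}
```

With 256 MiB chunks, a 1.5 GiB file needs 6 chunks, which matches the figure given earlier in the thread, and a 2 GiB file needs 8, which is presumably the theoretical maximum referred to above.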
[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730551#action_12730551 ] Paul Smith commented on LUCENE-1741:

An algorithm is nice if no specific settings are given, but in an environment where large indexes are opened more frequently than in the common use cases, the memory layer hits OOM conditions too often and forces too much GC activity while attempting the operation. I'd vote for checking whether settings have been requested and using them, and relying on a self-tuning algorithm only if they are not set.

In a really long-running application, the process address space may become more and more fragmented, and the malloc library may not be able to defragment it, so auto-tuning is nice, but it may not suit everyone's needs.

For example, our specific use case (crazy as this may be) is to have many different indexes open at any one time, closing and opening them frequently (the Realtime Search stuff we are following very closely indeed.. :) ). I'm just thinking that our VM (64bit) may find it difficult to find the contiguous non-heap space for the MMap operation after many days/weeks in operation.

Maybe I'm just paranoid. But for operational purposes, it'd be nice to know we could change the setting based on our observations.

thanks!
[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730560#action_12730560 ] Eks Dev commented on LUCENE-1741:

Uwe, you convinced me. I looked at the code, and indeed, there is no performance penalty for this.

What helped me was 1.1G... (I tried to find the maximum); max file size is 1.4G ... but 1.1 is just an OS coincidence, no magic about it.

I guess 512 MB makes a good value: if memory is so fragmented that you cannot allocate 0.5G, you are definitely having some other problems around. We are talking here about VM memory, and even on Windows having 512 MB in one block is not an issue (or better said, I have never seen problems with this value).

@Paul: It is a misunderstanding, my algorithm was meant to be manual... no catching OOM and retrying (I've already burned my fingers on catching RuntimeException, do it only when absolutely desperate :). Uwe made this value user settable anyhow.

Thanks Uwe!
[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730580#action_12730580 ] Michael McCandless commented on LUCENE-1741:

I'd be more comfortable w/ 256 MB (or smaller); I think fragmentation could easily cause 512 MB to give the false OOM. I don't think we'll see real perf costs from buffer switching unless the chunk size is very small (eg 1 MB).

In any event, Uwe, can you add a note to the javadocs describing this false OOM problem and what to do if you hit it?
[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730590#action_12730590 ] Uwe Schindler commented on LUCENE-1741:

Javadocs state (in FileChannel#map): "For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory."

So it should be as big as possible. A second problem with too many buffers is that the MMU/TLB cannot handle too many of them effectively.

In my opinion, maybe we could enhance MMapDirectory to work together with FileSwitchDirectory or something like that, to only use mmap for large files and have all others handled by NIO/Simple. E.g. mapping the segments.gen file into memory is really wasting resources. So MMapDir would only return the MMapIndexInput if the underlying file is larger than X bytes (e.g. 8 megabytes by default) and fall back to SimpleFSIndexInput otherwise.

bq. In any event, Uwe can you add to the javadocs describing this false OOM problem and what to do if you hit it?

Will do this tomorrow, will go to bed now.

Here are also some other numbers about this problem: http://groups.google.com/group/jsr203-interest/browse_thread/thread/66f6a5042f2b0c4a/12228bbd57d1956d
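The size-based fallback proposed in this comment could be sketched roughly as follows. This is only an illustration of the decision, not how MMapDirectory or FileSwitchDirectory work: `SizeBasedOpenSketch`, `open` and `DEFAULT_MIN_MMAP_BYTES` are made-up names, the 8 MB default is taken from the "e.g." in the comment, and the comparison is assumed to mean "larger than the threshold".

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: mmap only files above a size threshold, read small ones normally. */
final class SizeBasedOpenSketch {
    /** Assumed default, from the "e.g. 8 Megabytes" in the comment. */
    static final long DEFAULT_MIN_MMAP_BYTES = 8L * 1024 * 1024;

    static ByteBuffer open(Path file, long minMmapBytes) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for a single buffer: " + size);
            }
            if (size > minMmapBytes) {
                // Large file: the cost of setting up the mapping is worth paying.
                return channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            }
            // Small file: plain read into a heap buffer, no mapping involved.
            ByteBuffer heap = ByteBuffer.allocate((int) size);
            while (heap.hasRemaining()) {
                if (channel.read(heap) < 0) {
                    break; // file shrank while reading
                }
            }
            heap.flip();
            return heap;
        }
    }
}
```

Callers see a `ByteBuffer` either way, so the choice stays an internal detail, which is roughly what returning either MMapIndexInput or SimpleFSIndexInput would achieve.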
Lucene contrib build failing on jdk 1.4
Hi, I checked out Lucene java trunk and ran build-contrib. When I run using Sun JDK 1.6.0_07 it builds successfully, but using Sun JDK 1.4.2_19 I get the error below. Am I doing something wrong? Thanks in advance, Adriano Crestani Campos ... build-artifacts-and-tests: bdb: BUILD FAILED /home/adcampos/lucene/trunk2/build.xml:626: The following error occurred while executing this line: /home/adcampos/lucene/trunk2/build.xml:616: The following error occurred while executing this line: /home/adcampos/lucene/trunk2/contrib/db/build.xml:29: The following error occurred while executing this line: /home/adcampos/lucene/trunk2/contrib/db/bdb/build.xml:14: java.lang.UnsupportedClassVersionError: com/sleepycat/db/internal/Db (Unsupported major.minor version 49.0) Total time: 17 seconds
[jira] Updated: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-1567:

Attachment: lucene_1567_adriano_crestani_07_13_2009.patch

Hey guys,

Here is a patch containing some changes I made on top of Luis' last patch (lucene_trunk_FlexQueryParser_2009July10_v5.patch):

- javadoc reviewed and improved
- 2 new classes: QueryParserHelper and LuceneQueryParserHelper, which make it easier to use the new query parser
- added the ability to set the prefix length for fuzzy queries, which was still missing in the new query parser
- resolved some TODOs
- AnalyzerQueryNodeProcessor now uses only the new TokenStream API... is it required to be compatible with the old API even if it is in contrib?
- I duplicated the test cases so they run using the query parser API directly, the query parser helpers and the query parser wrappers; this way we test the three ways the user can actually use the query parser.

I think that is everything. I will keep reviewing and improving the documentation; I think there might still be some broken javadoc links.

I would also like to rename the package and everything else that refers to lucene2 to lucene. I think it does not make sense to have a package name tied to a version. So, the package org.apache.lucene.queryParser.lucene2 would be renamed to org.apache.lucene.queryParser.lucene. I know it's kind of weird, because there are 2 "lucene" in the package declaration, but I think it's better than lucene2. Anyway, suggestions about this are welcome :) ... if nobody replies I will feel free to rename it and submit a new patch soon.

I will also work on writing documentation for the Lucene wiki that explains how to easily migrate from the old query parser to the new one, but I will only add it to the wiki when the code is committed to the trunk; wiki documentation about something that is not even committed doesn't make sense, agreed?

Suggestions?
Regards, Adriano Crestani Campos New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. 
We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730627#action_12730627 ] Adriano Crestani commented on LUCENE-1567: -- Ah, I also couldn't run ant build-contrib using Java 1.4, it fails, I even tried a clean trunk and it did not work. Were you able to run it using 1.4 Luis? I already opened a thread on the ML about this: http://markmail.org/thread/3fyldf7t423fhwbm New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. 
I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. 
Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not
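As a rough illustration of the three layers described in the issue description above (parser, processor chain, builder), here is a toy pipeline. Nothing in it is the contributed code: `QueryPipelineSketch`, `Node`, `Processor` and the string output are all invented for this sketch, the "parser" only understands `x AND y`, and the "builder" emits a `+a +b` string instead of Lucene Query objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of the parser / processor / builder split; all names are hypothetical. */
final class QueryPipelineSketch {
    /** Node of the intermediate tree: a term (no children) or an operator with children. */
    static final class Node {
        final String value;
        final List<Node> children = new ArrayList<Node>();

        Node(String value) {
            this.value = value;
        }

        boolean isTerm() {
            return children.isEmpty();
        }
    }

    /** Layer 1: syntax only. Handles just "x AND y AND ..." for illustration. */
    static Node parse(String text) {
        String[] parts = text.trim().split("\\s+AND\\s+");
        if (parts.length == 1) {
            return new Node(parts[0]);
        }
        Node and = new Node("AND");
        for (String part : parts) {
            and.children.add(new Node(part));
        }
        return and;
    }

    /** Layer 2: a processor walks the tree and returns a possibly rewritten tree. */
    interface Processor {
        Node process(Node root);
    }

    /** Example processor: normalizes terms to lower case, leaves operators alone. */
    static final Processor LOWERCASE_TERMS = new Processor() {
        public Node process(Node node) {
            if (node.isTerm()) {
                return new Node(node.value.toLowerCase(Locale.ROOT));
            }
            Node copy = new Node(node.value);
            for (Node child : node.children) {
                copy.children.add(process(child));
            }
            return copy;
        }
    };

    /** Layer 3: turn the tree into the target form (here a "+a +b" string). */
    static String build(Node node) {
        if (node.isTerm()) {
            return node.value;
        }
        StringBuilder sb = new StringBuilder();
        for (Node child : node.children) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(build(child));
        }
        return sb.toString();
    }

    /** Runs text through parse, the processor chain, and build. */
    static String run(String text, List<Processor> chain) {
        Node tree = parse(text);
        for (Processor p : chain) {
            tree = p.process(tree);
        }
        return build(tree);
    }
}
```

Swapping `parse` for a different syntax leaves the processors and the builder untouched, which is the separation of syntax and semantics the description argues for.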
Re: Lucene contrib build failing on jdk 1.4
Hey Adriano,

Only core is fully 1.4. To build all of contrib you must use at least Java 1.5. If you want to build a contrib that is supposed to support 1.4 with 1.4, use the individual build file for that contrib.

--
- Mark

http://www.lucidimagination.com

On Mon, Jul 13, 2009 at 7:44 PM, Adriano Crestani adrianocrest...@apache.org wrote:

Hi, I checked out Lucene java trunk and ran build-contrib. When I run using Sun JDK 1.6.0_07 it builds successfully, but using Sun JDK 1.4.2_19 I get the error below. Am I doing something wrong? Thanks in advance, Adriano Crestani Campos

...
build-artifacts-and-tests:
bdb:
BUILD FAILED
/home/adcampos/lucene/trunk2/build.xml:626: The following error occurred while executing this line:
/home/adcampos/lucene/trunk2/build.xml:616: The following error occurred while executing this line:
/home/adcampos/lucene/trunk2/contrib/db/build.xml:29: The following error occurred while executing this line:
/home/adcampos/lucene/trunk2/contrib/db/bdb/build.xml:14: java.lang.UnsupportedClassVersionError: com/sleepycat/db/internal/Db (Unsupported major.minor version 49.0)
Total time: 17 seconds
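As background for the quoted error: "major.minor version 49.0" is the class file format version of com/sleepycat/db/internal/Db. Major version 49 corresponds to Java 5, while a 1.4 JVM accepts class files only up to major version 48, which is why the bdb contrib cannot load there. The sketch below reads that number from the first eight bytes of a class file; `ClassVersionSketch` and its method names are made up for illustration.

```java
/** Hypothetical helper: inspects the version header of a Java class file. */
final class ClassVersionSketch {
    /**
     * Class file layout: 4 bytes magic (0xCAFEBABE), 2 bytes minor version,
     * 2 bytes major version, all big-endian.
     */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("not a class file: too short");
        }
        int magic = ((classBytes[0] & 0xFF) << 24)
                | ((classBytes[1] & 0xFF) << 16)
                | ((classBytes[2] & 0xFF) << 8)
                | (classBytes[3] & 0xFF);
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: bad magic");
        }
        return ((classBytes[6] & 0xFF) << 8) | (classBytes[7] & 0xFF);
    }

    /** Oldest Java release that can load a class file with this major version. */
    static String minimumJava(int major) {
        switch (major) {
            case 45: return "1.1";
            case 46: return "1.2";
            case 47: return "1.3";
            case 48: return "1.4";
            case 49: return "5";
            case 50: return "6";
            default: return "unknown (major " + major + ")";
        }
    }
}
```

Feeding it the bytes of the Db class from the bdb jar would report major version 49 according to the error message, so that class needs at least Java 5.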
[jira] Created: (LUCENE-1742) Wrap SegmentInfos in public class
Wrap SegmentInfos in public class -- Key: LUCENE-1742 URL: https://issues.apache.org/jira/browse/LUCENE-1742 Project: Lucene - Java Issue Type: Improvement Reporter: Jason Rutherglen Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not need to be in the org.apache.lucene.index package. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1742) Wrap SegmentInfos in public class
[ https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1742: - Component/s: Index Fix Version/s: 3.0 Priority: Trivial (was: Major) Affects Version/s: 2.4.1 Wrap SegmentInfos in public class -- Key: LUCENE-1742 URL: https://issues.apache.org/jira/browse/LUCENE-1742 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Priority: Trivial Fix For: 3.0 Original Estimate: 48h Remaining Estimate: 48h Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not need to be in the org.apache.lucene.index package. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Lucene contrib build failing on jdk 1.4
Hi Mark, Thanks for the explanation, it makes sense now : ) Adriano Crestani Campos On Mon, Jul 13, 2009 at 5:41 PM, Mark Miller markrmil...@gmail.com wrote: Hey Adriano, Only core is fully 1.4. To build all of contrib you must use at least java 1.5. If you want to build a contrib that is supposed to 1.4 with 1.4, use the individual build file for that contrib. -- - Mark http://www.lucidimagination.com On Mon, Jul 13, 2009 at 7:44 PM, Adriano Crestani adrianocrest...@apache.org wrote: Hi, I checked out Lucene java trunk and ran build-contrib. When I run using Sun JDK 1.6.0_07 it builds successfully, but using Sun JDK 1.4.2_19 I get the error below. Am I doing something wrong? Thanks in advance, Adriano Crestani Campos ... build-artifacts-and-tests: bdb: BUILD FAILED /home/adcampos/lucene/trunk2/build.xml:626: The following error occurred while executing this line: /home/adcampos/lucene/trunk2/build.xml:616: The following error occurred while executing this line: /home/adcampos/lucene/trunk2/contrib/db/build.xml:29: The following error occurred while executing this line: /home/adcampos/lucene/trunk2/contrib/db/bdb/build.xml:14: java.lang.UnsupportedClassVersionError: com/sleepycat/db/internal/Db (Unsupported major.minor version 49.0) Total time: 17 seconds
[jira] Commented: (LUCENE-1742) Wrap SegmentInfos in public class
[ https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730640#action_12730640 ] Jason Rutherglen commented on LUCENE-1742:

In order for this class to be compatible with our current default LogMergePolicy, we'll need to expose readers from the IW reader pool. This is because merge policy classes may presumably need to access readers, as LogMergePolicy.findMergesToExpungeDeletes does.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730641#action_12730641 ] Adriano Crestani commented on LUCENE-1567:

{quote} Ah, I also couldn't run ant build-contrib using Java 1.4, it fails, I even tried a clean trunk and it did not work. Were you able to run it using 1.4 Luis? I already opened a thread on the ML about this: http://markmail.org/thread/3fyldf7t423fhwbm {quote}

Mark Miller just replied to the thread, and based on his response there is no need for contrib projects to compile using JDK 1.4. So, Luis, could you roll back the changes you made to the build files?

Thanks, Adriano Crestani Campos
[jira] Updated: (LUCENE-1742) Wrap SegmentInfos in public class
[ https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1742: - Attachment: LUCENE-1742.patch After looking at it, I wasn't sure why we couldn't simply make the read only methods in SegmentInfo and SegmentInfos (and the classes) public. Maybe this can make it into 2.9? Wrap SegmentInfos in public class -- Key: LUCENE-1742 URL: https://issues.apache.org/jira/browse/LUCENE-1742 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Priority: Trivial Fix For: 3.0 Attachments: LUCENE-1742.patch Original Estimate: 48h Remaining Estimate: 48h Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not need to be in the org.apache.lucene.index package. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730653#action_12730653 ] Mark Miller commented on LUCENE-1567: - Hang on a sec - it sounds like the target was 1.4 because this was going to replace a 1.4 core piece of functionality. I don't know that all of the details are fully straightened out though. 1. I'm not pro moving the QueryParser to contrib myself, unless we actually move forward on that 'modules' thread - if not, it doesn't appear very helpful to me. 2. If we move this to contrib, perhaps it can be 1.5? But then in 3.0, can we have 1.5 already? Or is that 3.1? If its 3.1, than if we remove the deprecated query parser in 3.0, you won't have a java 1.4 replacement to move to (if course we could keep the old QueryParser till 4.0 ... ). I'm not clear that we can't add new functionality to 3.0 though. I know Mike has mentioned it, but I can't find where it says that - I just see that we can remove deprecations, not that we can't also add new features. I may be missing something though? We should get things fully straightened out before you spend too much time switching between 1.4 and 1.5 though. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. 
Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. 
This layer is in fact a configurable chain of processors. Each processor can walk the tree and modify nodes or even the tree's structure. That makes it possible, e.g., to do query optimization before the query is executed or to tokenize terms.
3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects.

Furthermore, the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow resource bundles to be attached. This makes it possible to translate messages, which is an important feature of a query parser.

This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few
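
For readers who want something concrete, here is a rough Java sketch of the three-layer idea (parse, then process, then build). It is not the contributed code: every class, interface and method name below is hypothetical, the two toy parsers only understand a flat conjunction of terms, and the builder emits a plain string where the real third layer would create Lucene Query objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; none of these names come from the LUCENE-1567 patch.
class QueryPipelineSketch {

    /** Tree node: an operator such as "AND" with children, or a term leaf. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    /** Layer 1: turns query text into a tree (syntax only). */
    interface SyntaxParser { Node parse(String text); }

    /** Layer 2: rewrites the tree (semantics such as term normalization). */
    interface Processor { Node process(Node root); }

    /** Layer 3: turns the final tree into the target query representation. */
    interface Builder<T> { T build(Node root); }

    /** Toy parser for the infix syntax "a AND b". */
    static Node parseInfix(String text) {
        Node and = new Node("AND");
        for (String term : text.trim().split("\\s+AND\\s+")) {
            and.children.add(new Node(term));
        }
        return and;
    }

    /** Toy parser for the prefix syntax "+a +b"; assumes every token starts with '+'. */
    static Node parsePrefix(String text) {
        Node and = new Node("AND");
        for (String token : text.trim().split("\\s+")) {
            and.children.add(new Node(token.substring(1)));
        }
        return and;
    }

    /** Example processor: lower-cases term leaves, leaves operators untouched. */
    static Node lowercaseTerms(Node node) {
        if (node.children.isEmpty()) {
            return new Node(node.label.toLowerCase(Locale.ROOT));
        }
        Node copy = new Node(node.label);
        for (Node child : node.children) {
            copy.children.add(lowercaseTerms(child));
        }
        return copy;
    }

    /** Example builder: renders an AND of leaves as "+a +b" (stand-in for a Query). */
    static String toRequiredClauses(Node root) {
        StringBuilder sb = new StringBuilder();
        for (Node child : root.children) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(child.label);
        }
        return sb.toString();
    }

    /** Runs the three layers in order: parse, apply each processor, build. */
    static <T> T run(String text, SyntaxParser parser, List<Processor> chain, Builder<T> builder) {
        Node tree = parser.parse(text);
        for (Processor p : chain) {
            tree = p.process(tree);
        }
        return builder.build(tree);
    }
}
```

The point of the split is that only the first argument to `run` changes when the syntax changes; the processor chain and the builder are shared, so 'a AND b' and '+a +b' end up as the same tree and the same output.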
[jira] Commented: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue
[ https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730704#action_12730704 ]

Thulasi Ram Naidu P commented on LUCENE-1566:

Any temporary solution for this problem?

Large Lucene index can hit false OOM due to Sun JRE issue
Key: LUCENE-1566
URL: https://issues.apache.org/jira/browse/LUCENE-1566
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.4.1
Reporter: Michael McCandless
Assignee: Simon Willnauer
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1566.patch, LUCENE-1566.patch

This is not a Lucene issue, but I want to open this so future google diggers can more easily find it.

There's this nasty bug in Sun's JRE:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546

The gist seems to be, if you try to read a large (eg 200 MB) number of bytes during a single RandomAccessFile.read call, you can incorrectly hit OOM. Lucene does this, with norms, since we read in one byte per doc per field with norms, as a contiguous array of length maxDoc().

The workaround was a custom patch to do large file reads as several smaller reads.

Background here:
http://www.nabble.com/problems-with-large-Lucene-index-td22347854.html
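
As an illustration of the "several smaller reads" workaround described in the issue, a minimal Java sketch follows. It is not the attached LUCENE-1566.patch: the class name, method name and the idea of passing the chunk size as a parameter are assumptions made for this example, and a suitable chunk size would need to be chosen by experiment.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper; names and signature are not taken from the Lucene patch.
class ChunkedRead {

    /**
     * Fills dest[offset .. offset+length) from the file's current position,
     * never asking RandomAccessFile.read for more than chunkSize bytes at once.
     */
    static void readInChunks(RandomAccessFile file, byte[] dest, int offset,
                             int length, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int done = 0;
        while (done < length) {
            // Cap each request so no single read call sees a huge byte count.
            int toRead = Math.min(chunkSize, length - done);
            int n = file.read(dest, offset + done, toRead);
            if (n == -1) {
                throw new EOFException("file ended after " + done + " of " + length + " bytes");
            }
            // read() may return fewer bytes than requested; advance by what we got.
            done += n;
        }
    }
}
```

The destination array is still allocated in one piece (e.g. one byte per document for norms); only the individual read calls are kept small.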