RE: software grants
Hi Grant, I think it is pretty clear that when the code lives in public somewhere else (e.g. SourceForge or Google Code, etc.) it needs to go through a grant. That being said, I'm not particularly concerned about Trie, for the record. Trie was in SourceForge's SVN as part of panFMP, so it lived in public before. The last revision was 342:

http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/metadataportal/search/TrieRangeQuery.java?revision=315&view=markup&pathrev=342
http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/metadataportal/utils/TrieUtils.java?revision=308&view=markup&pathrev=342

The first version in Lucene's contrib was a modified version of the above SVN revision (see LUCENE-1470). After that it was deleted from panFMP's SVN, and the new, further optimized Lucene version was used for this project. If you like, we can fill out a software grant to be sure (if it is still possible to do this after the code transfer). I am the only person who must sign the grant on my side. I can do a checkout of these two files, then tar and md5 them. Uwe
Re: broken links when building web-site
Yes, I've seen those too and have always written them off as Forrest errors. I could never track down anything actually wrong on the site, so I ignored it. The broken-links.xml file has been checked in for a good long time, I believe.

On Jul 7, 2009, at 3:00 PM, Uwe Schindler wrote: I tried to build the docs inside trunk and also the docs in the site (https://svn.apache.org/repos/asf/lucene/java/site); both fail to build. The error is the same here (Win XP), except that it says it cannot find the images (which are indeed not available). The last time I generated the site docs was for revision 784758; after that, Grant applied LUCENE-1706. Maybe he forgot to commit some new images for the Lucid Imagination-powered search. But from the change in broken-links.xml, I see that Grant must have seen the same error but ignored it. The docs seem to be correct, so I think this error is not fatal. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

-----Original Message----- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Tuesday, July 07, 2009 8:25 PM To: java-dev@lucene.apache.org Subject: broken links when building web-site

I'm trying to regen the web site docs (w/ Forrest) for LUCENE-1522, but I'm hitting a BUILD FAILED at the end, I think because of these broken links:

X [0] images/instruction_arrow.png BROKEN: /lucene/h2.1522/src/site/src/documentation/content/xdocs/images.instruction_arrow.png (No such file or directory)
X [0] skin/images/current.gif BROKEN: /tango/offload/usr/local/src/apache-forrest-0.8/main/webapp/. (Is a directory)
X [0] skin/images/chapter.gif BROKEN: /tango/offload/usr/local/src/apache-forrest-0.8/main/webapp/. (Is a directory)
X [0] skin/images/page.gif BROKEN: /tango/offload/usr/local/src/apache-forrest-0.8/main/webapp/. (Is a directory)

Does anyone else see this?
Mike

- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: software grants
On Tue, Jul 7, 2009 at 10:27 PM, Grant Ingersoll gsing...@apache.org wrote: I think it is pretty clear that when the code lives in the public somewhere else (i.e. source forge or Google code, etc.) it needs to go through a grant. It's not clear to me... I think it's just another factor to consider. It also matters how big a body of code it is, how many people developed it over how long, what licenses were used over its development history, etc. Just because someone may make a patch or feature available on GitHub first does not mean a software grant is automatically needed. -Yonik http://www.lucidimagination.com
[jira] Updated: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1726: - Attachment: LUCENE-1726.trunk.test.patch I tried the test on trunk and get the same error. They're all docstore related files so maybe extra doc stores are being opened? {code} [junit] MockRAMDirectory: cannot close: there are still open files: {_s4.fdt=2, _g2.fdx=2, _s4.fdx=2, _g2.tvf=2, _dw.fdx=2, _g2.tvd=2, _g2.tvx=2, _ks.tvf=2, _n9.tvx=2, _ks.tvx=2, _n9.fdx=2, _ks.fdx=2, _dw.cfx=1, _n9.tvf=2, _cp.cfx=1, _s4.tvf=2, _dw.tvx=2, _87.fdx=2, _fr.tvx=2, _87.tvf=2, _fr.tvd=2, _87.fdt=2, _ks.tvd=2, _s4.tvd=2, _dw.tvd=2, _n9.fdt=2, _g2.fdt=2, _87.tvd=2, _fr.fdt=2, _dw.fdt=2, _dj.cfx=1, _s4.tvx=2, _ks.fdt=2, _n9.tvd=2, _fr.tvf=2, _fr.fdx=2, _dw.tvf=2, _87.tvx=2} [junit] java.lang.RuntimeException: MockRAMDirectory: cannot close: there are still open files: {_s4.fdt=2, _g2.fdx=2, _s4.fdx=2, _g2.tvf=2, _dw.fdx=2, _g2.tvd=2, _g2.tvx=2, _ks.tvf=2, _n9.tvx=2, _ks.tvx=2, _n9.fdx=2, _ks.fdx=2, _dw.cfx=1, _n9.tvf=2, _cp.cfx=1, _s4.tvf=2, _dw.tvx=2, _87.fdx=2, _fr.tvx=2, _87.tvf=2, _fr.tvd=2, _87.fdt=2, _ks.tvd=2, _s4.tvd=2, _dw.tvd=2, _n9.fdt=2, _g2.fdt=2, _87.tvd=2, _fr.fdt=2, _dw.fdt=2, _dj.cfx=1, _s4.tvx=2, _ks.fdt=2, _n9.tvd=2, _fr.tvf=2, _fr.fdx=2, _dw.tvf=2, _87.tvx=2} [junit] at org.apache.lucene.store.MockRAMDirectory.close(MockRAMDirectory.java:278) [junit] at org.apache.lucene.index.Test1726.testIndexing(Test1726.java:48) [junit] at org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88) {code} IndexWriter.readerPool create new segmentReader outside of sync block - Key: LUCENE-1726 URL: https://issues.apache.org/jira/browse/LUCENE-1726 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Assignee: Michael McCandless Priority: Trivial Fix For: 3.1 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
LUCENE-1726.trunk.test.patch Original Estimate: 48h Remaining Estimate: 48h I think we will want to do something like what field cache does with CreationPlaceholder for IndexWriter.readerPool. Otherwise we have the (I think somewhat problematic) issue of all other readerPool.get* methods waiting for an SR to warm. It would be good to implement this for 2.9. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728823#action_12728823 ] Mark Miller commented on LUCENE-1693: Mr. Busch my friend, I'll buy both you and Uwe *many* beers if you resolve this issue soon! AttributeSource/TokenStream API improvements Key: LUCENE-1693 URL: https://issues.apache.org/jira/browse/LUCENE-1693 Project: Lucene - Java Issue Type: Improvement Components: Analysis Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: 2.9 Attachments: LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, lucene-1693.patch, TestCompatibility.java, TestCompatibility.java, TestCompatibility.java, TestCompatibility.java This patch makes the following improvements to AttributeSource and TokenStream/Filter: - removes the set/getUseNewAPI() methods (including the standard ones). Instead, by default, incrementToken() throws a subclass of UnsupportedOperationException. The indexer initially tries to call incrementToken() once to see if the exception is thrown; if so, it falls back to the old API. - introduces interfaces for all Attributes. The corresponding implementations have the postfix 'Impl', e.g. TermAttribute and TermAttributeImpl. AttributeSource now has a factory for creating the Attribute instances; the default implementation looks for implementing classes with the postfix 'Impl'. Token now implements all 6 TokenAttribute interfaces. - new method added to AttributeSource: addAttributeImpl(AttributeImpl). Using reflection, it walks up the class hierarchy of the passed-in object and finds all interfaces that the class or superclasses implement and that extend the Attribute interface. It then adds the interface-instance mappings to the attribute map for each of the found interfaces.
- AttributeImpl now has a default implementation of toString() that uses reflection to print out the values of the attributes in a default formatting. This makes it a bit easier to implement AttributeImpl, because toString() was declared abstract before. - Cloning is now done much more efficiently in captureState. The method figures out which unique AttributeImpl instances are contained as values in the attributes map, because those are the ones that need to be cloned. It creates a single linked list that supports deep cloning (in the inner class AttributeSource.State). AttributeSource keeps track of when this state changes, i.e. whenever new attributes are added to the AttributeSource. Only in that case will captureState recompute the state; otherwise it will simply clone the precomputed state and return the clone. restoreState(AttributeSource.State) walks the linked list and uses the copyTo() method of AttributeImpl to copy all values over into the attribute that the source stream (e.g. SinkTokenizer) uses. Cloning performance can be greatly improved if multiple AttributeImpl instances are not used in one TokenStream. A user can, e.g., simply add a Token instance to the stream instead of the individual attributes. Or the user could implement a subclass of AttributeImpl that implements exactly the Attribute interfaces needed. I think addAttributeImpl should be considered an expert API, as this manual optimization is only needed if cloning performance is crucial. I ran some quick performance tests using Tee/Sink tokenizers (which do cloning), and performance was roughly 20% faster with the new API. I'll run some more performance tests and post more numbers then. Note also that when we add serialization to the Attributes, e.g. for supporting storing serialized TokenStreams in the index, the serialization should benefit even more significantly from the new API than cloning does.
Also, the TokenStream API does not change, except for the removal of the set/getUseNewAPI methods. So the patches in LUCENE-1460 should still work. All core tests pass; however, I need to update all the documentation and also add some unit tests for the new AttributeSource functionality. So this patch is not ready to commit yet, but I wanted to post it already for some feedback.
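The reflection walk that addAttributeImpl performs, as described above, can be sketched roughly like this. This is a simplified, hypothetical illustration with invented names, not Lucene's actual code (the real version also recurses into superinterfaces):

```java
// Hypothetical sketch: walk up the class hierarchy of an implementation and
// collect every implemented interface that extends a marker Attribute interface.
import java.util.LinkedHashSet;
import java.util.Set;

public class AttributeWalk {
    interface Attribute {}                       // marker, stands in for Lucene's Attribute
    interface TermAttribute extends Attribute { String term(); }
    interface OffsetAttribute extends Attribute { int startOffset(); }

    // A single impl class implementing several Attribute interfaces at once,
    // analogous to Token implementing all 6 TokenAttribute interfaces.
    static class TokenImpl implements TermAttribute, OffsetAttribute {
        public String term() { return "foo"; }
        public int startOffset() { return 0; }
    }

    // Collect all Attribute sub-interfaces implemented anywhere in the hierarchy.
    static Set<Class<?>> findAttributeInterfaces(Class<?> clazz) {
        Set<Class<?>> found = new LinkedHashSet<>();
        for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                if (Attribute.class.isAssignableFrom(iface) && iface != Attribute.class) {
                    found.add(iface);            // would be mapped interface -> instance
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<Class<?>> ifaces = findAttributeInterfaces(TokenImpl.class);
        System.out.println(ifaces.size());       // 2: TermAttribute and OffsetAttribute
    }
}
```

Registering one impl under every Attribute interface it implements is what lets addAttribute(TermAttribute.class) and addAttribute(OffsetAttribute.class) later return the same shared instance.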
[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728824#action_12728824 ] Michael McCandless commented on LUCENE-1726: Hmm... I'll dig into this test case.
[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728833#action_12728833 ] Jason Rutherglen commented on LUCENE-1726: Mike, I was wondering if you can recommend techniques or tools for debugging this type of multithreading issue? (i.e. how do you go about figuring this type of issue out?)
[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728853#action_12728853 ] Michael McCandless commented on LUCENE-1726: I don't have any particular tools... First I simplify the test as much as possible while still hitting the failure (eg this failure happens w/ only 2 threads), then see if the error will happen if I turn on IndexWriter's infoStream (it doesn't for this, so far). If so, I scrutinize the series of events to find the hazard; else, I turn off infoStream and add back in a small number of prints, as long as the failure still happens. Often I use a simple Python script that runs the test over and over until a failure happens, saving the log, and then scrutinize that. It's good to start with a rough guess, eg this failure is w/ only doc stores, so it seems likely the merging logic that opens doc stores just before kicking off the merge may be to blame.
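The rerun-until-failure loop Mike describes (his is a simple Python script) might look like this in Java; the command, class name, and log path below are invented examples:

```java
// Hypothetical sketch: run a test command repeatedly until it fails,
// saving the output of the failing run for later scrutiny.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RepeatUntilFailure {
    /** Runs cmd up to maxRuns times; on the first non-zero exit, writes the
     *  combined stdout/stderr to logPath and returns the run number.
     *  Returns -1 if no failure was observed. */
    static int runUntilFailure(String[] cmd, Path logPath, int maxRuns)
            throws IOException, InterruptedException {
        for (int run = 1; run <= maxRuns; run++) {
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            byte[] output = p.getInputStream().readAllBytes();
            if (p.waitFor() != 0) {
                Files.write(logPath, output);   // keep the log of the failing run
                return run;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        // In a real session the command might be something like
        // {"ant", "-Dtestcase=Test1726", "test-core"}; here a trivially
        // failing POSIX command stands in.
        int runs = runUntilFailure(new String[]{"false"}, Paths.get("failure.log"), 5);
        System.out.println(runs);   // 1 on a POSIX system, where `false` exits non-zero
    }
}
```

For intermittent concurrency failures the payoff is the saved log: only the run that actually tripped the race is kept, so the interleaving that caused it can be reconstructed from its prints.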
[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728879#action_12728879 ] Michael Busch commented on LUCENE-1693: Alright, I hope you are coming to Oakland in November! I had a few (literally) sleepless nights last week to meet some internal deadlines, but it looks like I'll now have time to work on Lucene, so I'll continue on this issue tonight!
[jira] Updated: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1726: Attachment: LUCENE-1726.patch OK, the problem happens when a segment is first opened by a merge that doesn't need to merge the doc stores; later, an NRT reader is opened that separately opens the doc stores of the same [pooled] SegmentReader, but then it's the merge that closes the read-only clone of the reader. In this case the separately opened (by the NRT reader) doc stores are not closed by the merge thread. It's the mirror image of LUCENE-1639. I've fixed it by pulling all shared readers in a SegmentReader into a separate static class (CoreReaders). Cloned SegmentReaders share the same instance of this class, so that if a clone later opens the doc stores, any prior ancestor (that the clone was created from) will also close those readers if it's the reader that decRefs to 0. I did something similar for LUCENE-1609 (which I'll now hit conflicts on after committing this... sigh). I plan to commit in a day or so.
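A toy sketch of the shared-core idea (invented names, not the actual patch): all clones of a SegmentReader hold one ref-counted core, so whichever reader decRefs it to zero tears down everything the core has opened, including doc stores a clone opened later:

```java
// Hypothetical sketch: clones share a ref-counted CoreReaders, so the last
// reader to close tears down shared state regardless of which reader opened it.
import java.util.concurrent.atomic.AtomicInteger;

public class SharedCoreSketch {
    static class CoreReaders {
        private final AtomicInteger refCount = new AtomicInteger(1);
        boolean docStoresOpen;          // opened lazily, possibly by a clone
        boolean closed;

        synchronized void openDocStores() { docStoresOpen = true; }
        void incRef() { refCount.incrementAndGet(); }
        void decRef() {
            if (refCount.decrementAndGet() == 0) {
                docStoresOpen = false;  // last reader out closes shared files,
                closed = true;          // including lazily opened doc stores
            }
        }
    }

    static class SegmentReader {
        final CoreReaders core;
        SegmentReader() { core = new CoreReaders(); }
        private SegmentReader(CoreReaders core) { this.core = core; core.incRef(); }
        SegmentReader makeClone() { return new SegmentReader(core); } // clones share the core
        void close() { core.decRef(); }
    }

    public static void main(String[] args) {
        SegmentReader merged = new SegmentReader();  // opened by a merge, no doc stores yet
        SegmentReader nrt = merged.makeClone();      // NRT reader clone of the pooled reader
        nrt.core.openDocStores();                    // clone opens doc stores later
        nrt.close();                                 // one reader closes...
        merged.close();                              // ...last close tears down the core
        System.out.println(merged.core.closed);      // true
    }
}
```

The point of the design is that it no longer matters whether the merge thread or the NRT reader closes last: the doc stores belong to the core, not to whichever reader happened to open them.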
[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728909#action_12728909 ] Jason Rutherglen commented on LUCENE-1726: The test now passes; it needs to go into the patch, perhaps in TestIndexWriterReader? Great work on this; it's easier to understand SegmentReader now that all the shared objects are in one object (CoreReaders). It should make debugging go more smoothly. Is there a reason we're not synchronizing on SR.core in openDocStores? Couldn't we synchronize on core for the cloning methods?
Re: A Comparison of Open Source Search Engines
Interesting, I never realized there was lucene-java-...@apache.org. My thoughts are at http://www.jroller.com/otis/entry/open_source_search_engine_benchmark (and in several comments on the blog itself). Otis - Original Message From: Sean Owen sro...@gmail.com To: lucene-java-...@apache.org Sent: Monday, July 6, 2009 11:06:14 AM Subject: A Comparison of Open Source Search Engines http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/ I imagine many of you already saw this -- Lucene does pretty well in this shootout. The only area it tended to lag, it seems, is memory usage and speed in some cases.
[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728938#action_12728938 ] Michael McCandless commented on LUCENE-1726: bq. Is there a reason we're not synchronizing on SR.core in openDocStores? I was going to say because IW synchronizes, but in fact it doesn't, properly, because when merging we go and open doc stores in an unsynchronized context. So I'll synchronize(core) in SR.openDocStores. bq. Couldn't we synchronize on core for the cloning methods? I don't think that's needed? The core is simply carried over to the newly cloned reader.
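The synchronize(core) idea for SR.openDocStores might be sketched like this (hypothetical and simplified, with invented names): lazy doc-store opening is guarded by a lock on the shared core, so two readers racing to open it can't double-open the files:

```java
// Hypothetical sketch: guard lazy doc-store opening with a lock on the
// shared core object, so concurrent openers open the files exactly once.
public class OpenDocStoresSketch {
    static class Core {
        private boolean docStoresOpen;
        private int openCount;              // how many times the files were opened

        void openDocStores() {
            synchronized (this) {           // all clones lock the same core
                if (!docStoresOpen) {
                    docStoresOpen = true;   // open fdt/fdx/tvx... exactly once
                    openCount++;
                }
            }
        }
        synchronized boolean isOpen() { return docStoresOpen; }
        synchronized int openCount() { return openCount; }
    }

    public static void main(String[] args) throws Exception {
        Core core = new Core();
        // e.g. a merge thread and an NRT-reader thread racing to open doc stores
        Thread t1 = new Thread(core::openDocStores);
        Thread t2 = new Thread(core::openDocStores);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(core.isOpen() + " " + core.openCount());
    }
}
```

Without the lock, both threads could pass the `!docStoresOpen` check before either sets the flag and each would open its own copy of the files, which is exactly the kind of leak the open-files failure above points at.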
Re: A Comparison of Open Source Search Engines
On Mon, Jul 6, 2009 at 6:01 PM, Earwin Burrfoot ear...@gmail.com wrote:
> Anybody knows other interesting open-source search engines?

http://hounder.org
[jira] Commented: (LUCENE-1706) Site search powered by Lucene/Solr
[ https://issues.apache.org/jira/browse/LUCENE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729007#action_12729007 ]

Grant Ingersoll commented on LUCENE-1706:
-----------------------------------------

Checking...

Site search powered by Lucene/Solr
----------------------------------

                Key: LUCENE-1706
                URL: https://issues.apache.org/jira/browse/LUCENE-1706
            Project: Lucene - Java
         Issue Type: New Feature
           Reporter: Grant Ingersoll
           Assignee: Grant Ingersoll
           Priority: Minor
            Fix For: 2.9
        Attachments: LUCENE-1706.patch, LUCENE-1706.patch

For a number of years now, the Lucene community has been criticized for not eating our own dog food when it comes to search. My company has built and hosts a site search (http://www.lucidimagination.com/search) that is powered by Apache Solr and Lucene, and we'd like to donate its use to the Lucene community. Additionally, it allows one to search all of the Lucene content from a single place, including web, wiki, JIRA and mail archives. See also http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org

You can see it live on Mahout, Tika and Solr. Lucid has a fault-tolerant setup with replication and failover, as well as monitoring services in place. We are committed to maintaining and expanding the search capabilities on the site.

The following patch adds a skin to the Forrest site that enables the Lucene site to search Lucene-only content using Lucene/Solr. When a search is submitted, it automatically selects the Lucene facet so that only Lucene content is searched. From there, users can narrow or broaden their search criteria. I plan on committing in 3 or 4 days.
[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729024#action_12729024 ]

Jason Rutherglen commented on LUCENE-1726:
------------------------------------------

{quote}I don't think that's needed? The core is simply carried over to the newly cloned reader.{quote}

Right, however wouldn't it be somewhat cleaner to sync on core for all clone operations, given we don't want those to occur (external to IW) at the same time? Ultimately we want core to be the controller of its resources, rather than the SR being cloned.

I ran the test with the SRMapValue sync code (4 threads), with the sync on SR.core in openDocStores, for 10 minutes on a 2-core Windows XP laptop with Java 6.14, and saw no errors. Then the same with 2 threads for 5 minutes, and no errors. I'll keep running it to see if we can get an error.

I'm still a little confused as to why we're going to see the bug if readerPool.get is syncing on the SRMapValue. I guess there's a slight possibility of the error, and perhaps a more randomized test would produce it.

IndexWriter.readerPool create new segmentReader outside of sync block
---------------------------------------------------------------------

                Key: LUCENE-1726
                URL: https://issues.apache.org/jira/browse/LUCENE-1726
            Project: Lucene - Java
         Issue Type: Improvement
         Components: Index
   Affects Versions: 2.4.1
           Reporter: Jason Rutherglen
           Assignee: Michael McCandless
           Priority: Trivial
            Fix For: 3.1
        Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.trunk.test.patch

  Original Estimate: 48h
  Remaining Estimate: 48h

I think we will want to do something like what field cache does with CreationPlaceholder for IndexWriter.readerPool. Otherwise we have the (I think somewhat problematic) issue of all other readerPool.get* methods waiting for an SR to warm. It would be good to implement this for 2.9.
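The CreationPlaceholder idea from the issue description can be sketched as follows. This is a hedged illustration, not Lucene's readerPool code: the names are invented, and a FutureTask stands in for the placeholder. The point is that a thread installs a placeholder with a cheap map operation, then warms the reader outside any pool-wide lock, so concurrent get() calls for other segments never wait on one segment's warm-up; only callers asking for the *same* segment block, and they block on that entry alone.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hedged sketch of a placeholder-based reader pool (illustrative names only).
public class ReaderPoolSketch {
    private final ConcurrentHashMap<String, FutureTask<String>> pool =
            new ConcurrentHashMap<>();

    String get(String segment) throws ExecutionException, InterruptedException {
        FutureTask<String> task = pool.get(segment);
        if (task == null) {
            FutureTask<String> created =
                    new FutureTask<>(() -> warm(segment)); // expensive warm-up
            task = pool.putIfAbsent(segment, created);     // install placeholder
            if (task == null) {
                task = created;
                task.run();    // warm OUTSIDE any pool-wide sync block
            }
        }
        return task.get();     // other callers wait on this entry only
    }

    // Stand-in for opening and warming a SegmentReader.
    private String warm(String segment) {
        return "SegmentReader(" + segment + ")";
    }

    public static void main(String[] args) throws Exception {
        ReaderPoolSketch pool = new ReaderPoolSketch();
        System.out.println(pool.get("_0"));
        System.out.println(pool.get("_0"));  // second call hits the cached entry
    }
}
```

This mirrors what the description says FieldCache does with CreationPlaceholder: the pool-wide critical section shrinks to a single putIfAbsent, and the slow work moves outside it.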
[jira] Commented: (LUCENE-1731) Allow ConstantScoreQuery to use custom rewrite method if using for highlighting
[ https://issues.apache.org/jira/browse/LUCENE-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729034#action_12729034 ]

Mark Miller commented on LUCENE-1731:
-------------------------------------

Hey Ashley,

This was added to the SpanScorer for the Highlighter a while back as part of resolving that Solr issue. Hopefully I will have time to make it the default by 2.9's release, but it's there as an option now if you use the SpanScorer. The issue was: LUCENE-1425 - Add ConstantScore highlighting support to SpanScorer.

Allow ConstantScoreQuery to use custom rewrite method if using for highlighting
-------------------------------------------------------------------------------

                Key: LUCENE-1731
                URL: https://issues.apache.org/jira/browse/LUCENE-1731
            Project: Lucene - Java
         Issue Type: Improvement
         Components: contrib/highlighter
   Affects Versions: 2.4, 2.4.1
           Reporter: Ashley Sole
           Priority: Minor

I'd like to submit a patch for ConstantScoreQuery which simply contains a setter method to state whether it is being used for highlighting or not. If it is being used for highlighting, then the rewrite method can take each of the terms in the filter and create a BooleanQuery to return (if the number of terms in the filter is less than 1024); otherwise it simply uses the old rewrite method. This allows you to highlight up to 1024 terms when using a ConstantScoreQuery, which, since it is a filter, will currently not be highlighted. The idea for this came from Mark Miller's article "Bringing the Highlighter back to Wildcard Queries in Solr 1.4"; I would just like to make it available in core Lucene.
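The rewrite the issue proposes can be sketched as below. This is not the patch or Lucene's code; the types are simplified stand-ins (queries as strings), and the 1024 cutoff mirrors BooleanQuery's default clause limit mentioned in the description. The idea: when highlighting, expand the filter's terms into an OR BooleanQuery the highlighter can walk; past the cutoff, fall back to the opaque constant-score form.

```java
import java.util.List;

// Hedged sketch of the highlighting-aware rewrite (illustrative types only).
public class HighlightRewriteSketch {
    static final int MAX_HIGHLIGHT_TERMS = 1024; // BooleanQuery clause limit

    static String rewriteForHighlighting(List<String> filterTerms) {
        if (filterTerms.size() >= MAX_HIGHLIGHT_TERMS) {
            // Too many clauses: keep the old constant-score rewrite, which
            // the highlighter cannot expand into visible terms.
            return "ConstantScore(filter)";
        }
        // Expand each filter term into an OR clause the highlighter can see.
        return "BooleanQuery(" + String.join(" OR ", filterTerms) + ")";
    }

    public static void main(String[] args) {
        // e.g. the terms a wildcard query like "wild*" might enumerate
        System.out.println(rewriteForHighlighting(List.of("wild", "wilder", "wildest")));
    }
}
```

The trade-off named in the thread is the same one this sketch encodes: term expansion makes highlighting work but is only safe for bounded term counts, which is why the setter-based opt-in (or the SpanScorer option Mark mentions) is needed rather than changing ConstantScoreQuery's rewrite unconditionally.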