[jira] [Commented] (SOLR-3049) UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported
[ https://issues.apache.org/jira/browse/SOLR-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195719#comment-13195719 ]

Tommaso Teofili commented on SOLR-3049:
---------------------------------------

Good catch. If you can provide that patch, I will take care of reviewing it and committing it, if that is ok.

> UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-3049
>                 URL: https://issues.apache.org/jira/browse/SOLR-3049
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>            Reporter: Harsh P
>            Priority: Minor
>              Labels: uima, update_request_handler
>
> The solrconfig.xml file has an option to override certain UIMA runtime parameters in the UpdateRequestProcessorChain section. Certain UIMA annotators, such as RegexAnnotator, define a runtimeParameters value as an Array, which is not currently supported by the Solr-UIMA interface. In java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java, the private Object getRuntimeValue(AnalysisEngineDescription desc, String attributeName) method defines the overrides for UIMA analysis engine runtimeParameters as they are passed to the UIMA Analysis Engine. The runtimeParameters types currently supported by the Solr-UIMA interface are: String, Integer, Boolean, and Float. I have made a hack that fixes this issue by adding Array support. I would like to submit it as a patch if no one else is working on fixing this issue.
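For context, a rough sketch of what such Array support might look like in getRuntimeValue(); this is not the actual patch, and the comma-split convention, the helper shape, and the explicit runtimeParameters argument are assumptions:

{code}
import java.util.Map;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.resource.metadata.ConfigurationParameter;

public class OverridingParamsSketch {
  /** Sketch: resolve a configured override, handling multi-valued (array)
   *  UIMA parameters in addition to the existing scalar types. */
  static Object getRuntimeValue(AnalysisEngineDescription desc,
      Map<String, Object> runtimeParameters, String attributeName) {
    ConfigurationParameter declared = desc.getAnalysisEngineMetaData()
        .getConfigurationParameterDeclarations()
        .getConfigurationParameter(null, attributeName);
    Object configured = runtimeParameters.get(attributeName);
    if (declared != null && declared.isMultiValued()) {
      // Array support: hand UIMA a String[] split from the configured value
      return String.valueOf(configured).split(",");
    }
    // fall through to the existing String/Integer/Boolean/Float handling
    return configured;
  }
}
{code}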
[JENKINS] Solr-3.x - Build # 584 - Failure
Build: https://builds.apache.org/job/Solr-3.x/584/

1 tests failed.

REGRESSION: org.apache.solr.search.TestSort.testRandomFieldNameSorts

Error Message:
Over 0.2% oddities in test: 12/5900 have func/query parsing semenatics gotten broader?

Stack Trace:
junit.framework.AssertionFailedError: Over 0.2% oddities in test: 12/5900 have func/query parsing semenatics gotten broader?
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
    at org.apache.solr.search.TestSort.__CLR2_6_3ug45jo16xn(TestSort.java:140)
    at org.apache.solr.search.TestSort.testRandomFieldNameSorts(TestSort.java:62)
    at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)

Build Log (for compile errors):
[...truncated 30216 lines...]
RE: Welcome Tommaso Teofili as Lucene/Solr committer
Hi Tommaso,

Changing the HTML on the Apache server is not enough; you should change the XML files, then run forrest (have fun!) locally on your computer, and then svn export the files to people.apache.org (as far as I know, the cronjob is no longer running):

http://wiki.apache.org/lucene-java/HowToUpdateTheWebsite

The same applies to Solr, where the webpage lives directly in the trunk/3.x Solr checkout. The root webpage for both projects currently only lists the PMC, so no change is needed there.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
Sent: Sunday, January 29, 2012 6:05 AM
To: dev@lucene.apache.org
Subject: Re: Welcome Tommaso Teofili as Lucene/Solr committer

Thank you all guys for this warm welcome.
Cheers,
Tommaso

p.s. I did update http://lucene.apache.org/java/docs/whoweare.html but I didn't have the permission to change /lucene/cms/trunk/content/whoweare.mdtext for the (yet unreleased) CMS-based Lucene website.

2012/1/28 Martijn v Groningen martijn.v.gronin...@gmail.com

Welcome!

On 27 January 2012 05:49, Shai Erera ser...@gmail.com wrote:

Welcome !
Shai

On Fri, Jan 27, 2012 at 12:42 AM, Michael McCandless luc...@mikemccandless.com wrote:

Welcome Tommaso!

--
Met vriendelijke groet,
Martijn van Groningen
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1641 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1641/

1 tests failed.

FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:24750/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: http://localhost:24750/solr/collection1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
    at org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
    at org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
    at org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
    at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
    at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)

Build Log (for compile errors):
[...truncated 10232 lines...]
Re: Welcome Tommaso Teofili as Lucene/Solr committer
Hi Uwe,

I already followed the instructions on that wiki page: changing the XML file, running 'forrest run/site', and then committing both the XML and the generated HTML files (see [1][2]). I just didn't know the cronjob was not running anymore; however, I can see the page published as if the cronjob had run regularly. Maybe I missed something; if that's the case, please let me know.

Tommaso

p.s.: I just realized I didn't commit the updated whoweare.pdf; will do that shortly.

[1] : http://svn.apache.org/repos/asf/lucene/java/site/src/documentation/content/xdocs/whoweare.xml
[2] : http://svn.apache.org/repos/asf/lucene/java/site/docs/whoweare.html
RE: Welcome Tommaso Teofili as Lucene/Solr committer
Hi,

Ok, then the cronjob seems to work again! I just somehow missed your commits; to me it looked like you made the change directly on Apache's servers. The PDF has also been committed for a few minutes now, thanks.

You should also add yourself to Solr; that's in the source checkout, same procedure. In that case you should make the change on trunk and also merge it back to the 3.x branch. Solr unfortunately keeps its web page in its versioned source checkout, so you have to keep everything in sync.

About the new CMS: I have no idea about the status; maybe it's currently not yet made available to all committers. Grant?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12296 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12296/

1 tests failed.

FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:20690/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: http://localhost:20690/solr/collection1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
    at org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
    at org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
    at org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
    at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
    at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:146)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)

Build Log (for compile errors):
[...truncated 8721 lines...]
[JENKINS] Solr-trunk - Build # 1748 - Failure
Build: https://builds.apache.org/job/Solr-trunk/1748/

1 tests failed.

FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:33651/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: http://localhost:33651/solr/collection1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
    at org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
    at org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
    at org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
    at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
    at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:146)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)

Build Log (for compile errors):
[...truncated 9929 lines...]
Re: Welcome Tommaso Teofili as Lucene/Solr committer
On Sun, Jan 29, 2012 at 11:26 AM, Uwe Schindler u...@thetaphi.de wrote:

> About the new CMS: I have no idea about the status; maybe it's currently not yet made available to all committers. Grant?

/lucene/cms was pmc-only in the auth file. I opened that path up for all lucene/solr committers. Tommaso, you should be able to change that path now.

simon
Re: Welcome Tommaso Teofili as Lucene/Solr committer
Uwe, Simon, thanks. I'll add myself to the Solr and CMS websites as well.

Tommaso
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195732#comment-13195732 ]

Robert Muir commented on SOLR-2358:
------------------------------------

{quote}
I can't currently get into the hudson machine - used the wrong username the other day and seemed to get IP-banned pretty much right away. Looking into getting that undone.
{quote}

Yeah, that's probably the best way to move forward. Otherwise you have to wait about an hour just to see if one tweak to a single test worked.

{quote}
Which tricks? This could be part of it by the sound of things.
{quote}

It depends on what the test is doing, but here are a few ideas:

* Any client operations in tests should have a low connect timeout/so_timeout. If you always set these, the test will never hang for long periods of time.
* If you absolutely need to test the case where you don't get a timeout but another exception, use an IPv6 test address (e.g. [ff01::114]). Because jenkins has no IPv6, it fails fast, always. This won't work forever...
* In a situation where you have A talking to B and you want to test a condition where B goes down, instead of just bringing B down, you can consider mocking up a remote node to test failures: bring up a mock downed server (e.g. just a ServerSocket on that same port with reuseAddress=true; a sketch follows this message). This one can return whatever error you want, or just disconnect, and can even assert that A tried to connect to it.

Maybe instead of using real remote jettys at all, most tests could even be totally implemented this way: it would be faster and simpler than spinning up so many jettys in all the tests.

> Distributing Indexing
> ---------------------
>
>                 Key: SOLR-2358
>                 URL: https://issues.apache.org/jira/browse/SOLR-2358
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud, update
>            Reporter: William Mayor
>            Priority: Minor
>             Fix For: 4.0
>         Attachments: 2shard4server.jpg, SOLR-2358.patch, SOLR-2358.patch, apache-solr-noggit-r1211150.jar, zookeeper-3.3.4.jar
>
> The indexing side of SolrCloud - the goal of this issue is to provide durable, fault tolerant indexing to an elastic cluster of Solr instances.
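To make the third idea concrete, here is a minimal sketch of such a mock downed server; the class and method names are hypothetical, not code from the issue:

{code}
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Sketch: binds the port a "downed" node just released and immediately
 *  disconnects anyone who connects, so the client under test fails fast. */
public class MockDownedServer implements Runnable {
  private final ServerSocket serverSocket;
  private volatile boolean connectionAttempted = false;

  public MockDownedServer(int port) throws Exception {
    serverSocket = new ServerSocket();
    serverSocket.setReuseAddress(true); // rebind the just-released port
    serverSocket.bind(new InetSocketAddress(port));
  }

  @Override
  public void run() {
    try {
      Socket client = serverSocket.accept(); // wait for node A to connect
      connectionAttempted = true;            // lets the test assert the attempt
      client.close();                        // disconnect: A sees a failed request
    } catch (Exception e) {
      // serverSocket closed during test teardown: nothing to do
    }
  }

  /** True once a client tried to talk to the "downed" node. */
  public boolean connectionAttempted() {
    return connectionAttempted;
  }

  public void close() throws Exception {
    serverSocket.close();
  }
}
{code}

A test would start this on the port of the node it just shut down, run the operation that should fail, and then assert both the client-side failure and connectionAttempted() on the mock.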
[jira] [Commented] (LUCENE-3661) move deletes under codec
[ https://issues.apache.org/jira/browse/LUCENE-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195733#comment-13195733 ]

Robert Muir commented on LUCENE-3661:
--------------------------------------

Merging this to trunk now... We can use the LUCENE-3613 issue for any remaining splitting of 4.x/3.x codec impls (stored fields, deletes).

> move deletes under codec
> -------------------------
>
>                 Key: LUCENE-3661
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3661
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3661.patch
>
> After LUCENE-3631, this should be easier I think. I haven't looked at it much myself, but I'll play around a bit. At a glance:
> * SegmentReader should have Bits liveDocs instead of BitVector
> * address the TODO in the IW-using ctors so that SegmentReader doesn't take a parent but just an existing core
> * we need some type of minimal MutableBits or similar subinterface of Bits (sketched after this message); BitVector and maybe Fixed/OpenBitSet could implement it
> * BitVector becomes an impl detail and moves to the codec (maybe we have a shared base class and split the 3.x/4.x impls up rather than the conditional backwards)
> * I think invertAll should not be used by IndexWriter; instead we define the codec interface to say "give me a new MutableBits, by default all are set"
> * redundant internally-consistent checks in checkLiveCounts should be done in the codec impl instead of in SegmentReader
> * plain text impl in SimpleText
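As a concrete reading of the MutableBits bullet, a minimal sketch of the subinterface; the exact method set here is a guess, not what was actually committed:

{code}
import org.apache.lucene.util.Bits;

/** Bits whose values can be flipped, so a codec can hand IndexWriter a
 *  fresh liveDocs instance where all bits start out set. */
public interface MutableBits extends Bits {
  /** Marks the given document as deleted (clears its bit). */
  void clear(int index);
}
{code}

BitVector could implement this directly, and the codec API could return "all set" instances instead of IndexWriter calling invertAll.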
[jira] [Resolved] (LUCENE-3661) move deletes under codec
[ https://issues.apache.org/jira/browse/LUCENE-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3661.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0

> move deletes under codec
> -------------------------
>
>                 Key: LUCENE-3661
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3661
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>             Fix For: 4.0
>         Attachments: LUCENE-3661.patch
[jira] [Created] (LUCENE-3728) better handling of files inside/outside CFS by codec
better handling of files inside/outside CFS by codec
-----------------------------------------------------

                 Key: LUCENE-3728
                 URL: https://issues.apache.org/jira/browse/LUCENE-3728
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Robert Muir


Since norms and deletes were moved under Codec (LUCENE-3606, LUCENE-3661), we never really properly addressed the issue of how Codec.files() should work, considering these files are always stored outside of CFS. LUCENE-3606 added a hack, LUCENE-3661 cleaned the hack up a little bit more, but it's still a hack.

Currently the logic in SegmentInfo.files() is:

{code}
clearCache();
if (compoundFile) {
  // don't call Codec.files(), hardcoded CFS extensions, etc
} else {
  Codec.files();
}
// always add files stored outside CFS regardless of CFS setting
Codec.separateFiles();
if (sharedDocStores) {
  // hardcoded shared doc store extensions, etc
}
{code}

Also, various codec methods take a Directory parameter, but it's inconsistent what this Directory is in the case of CFS: for some parts of the index it's the CFS directory, for others (deletes, separate norms) it's not.

I wonder if instead we could restructure this so that the SegmentInfo.files() logic is:

{code}
clearCache();
Codec.files();
{code}

so that the Codec is instead responsible. The Codec.files() logic by default would do the "if (compoundFile)" thing, the Lucene3x codec itself would be the only one with the "if (sharedDocStores)" thing, and any part of the codec that wants to put stuff outside of CFS unconditionally (e.g. Lucene3x separate norms, deletes) could just use SegmentInfo.dir. Directory parameters in the case of CFS would then always consistently be the CFSDirectory.

I haven't totally tested whether this will work, but there are definitely some cleanups we can do either way, and I think it would be a good step toward cleaning this up and simplifying it.
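To illustrate the proposed restructuring, a hypothetical sketch of a default Codec.files() that owns the compound-file conditional; the method names (innerFiles, separateFiles) and signatures are illustrative guesses, not the committed API:

{code}
import java.io.IOException;
import java.util.Set;
import org.apache.lucene.index.IndexFileNames;
import org.apache.lucene.index.SegmentInfo;

public abstract class Codec {
  /** All files for this segment, honoring the compound-file setting. */
  public void files(SegmentInfo info, Set<String> files) throws IOException {
    if (info.getUseCompoundFile()) {
      // everything inside the CFS is represented by the .cfs file itself
      files.add(IndexFileNames.segmentFileName(info.name, "", IndexFileNames.COMPOUND_FILE_EXTENSION));
    } else {
      innerFiles(info, files); // per-format files eligible for the CFS
    }
    separateFiles(info, files); // e.g. deletes: always outside the CFS
  }

  /** Files written by this codec's formats, eligible for the CFS. */
  protected abstract void innerFiles(SegmentInfo info, Set<String> files) throws IOException;

  /** Files that must live outside the CFS regardless of the setting. */
  protected abstract void separateFiles(SegmentInfo info, Set<String> files) throws IOException;
}
{code}

With this shape, SegmentInfo.files() reduces to clearCache() plus Codec.files(), and only Lucene3x would override things like the shared-doc-store handling.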
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12297 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12297/

1 tests failed.

FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:17075/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: http://localhost:17075/solr/collection1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
    at org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
    at org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
    at org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
    at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
    at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:146)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)

Build Log (for compile errors):
[...truncated 8169 lines...]
[jira] [Commented] (LUCENE-3728) better handling of files inside/outside CFS by codec
[ https://issues.apache.org/jira/browse/LUCENE-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195741#comment-13195741 ]

Robert Muir commented on LUCENE-3728:
--------------------------------------

I'm going to slowly iterate on cleaning this up on the branch for LUCENE-3661 (branches/lucene3661: I recreated it), in case anyone wants to jump in and help or test out some ideas.

> better handling of files inside/outside CFS by codec
> -----------------------------------------------------
>
>                 Key: LUCENE-3728
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3728
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1642 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1642/

1 tests failed.

FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:31200/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: http://localhost:31200/solr/collection1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
    at org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
    at org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
    at org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
    at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
    at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)

Build Log (for compile errors):
[...truncated 10465 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12298 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12298/

1 tests failed.

FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:19819/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: http://localhost:19819/solr/collection1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
    at org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
    at org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
    at org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
    at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
    at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:146)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)

Build Log (for compile errors):
[...truncated 8165 lines...]
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195757#comment-13195757 ]

Uwe Schindler commented on LUCENE-2858:
----------------------------------------

I have now fixed the branch's test-framework and all remaining TODOs about the API. Now the horrible, stupid slave-work of porting all the tests starts. I assume the API is now fixed, as nobody complained after one week.

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-2858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2858
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader, which is a composite reader. The interface of IndexReader now has lots of methods that simply throw UOE (in fact, more than 50% of the commonly used methods are unusable now). This confuses users and makes the API hard to understand. This issue should split atomic readers from reader collections with a separate API. After that, you are no longer able to get a TermsEnum from those composite readers without wrapping.
> We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly - or Multi*); those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but take a composite reader as ctor param; maybe it could also simply take a List<AtomicReader>). In my opinion, composite readers could maybe implement some collection APIs and also have the ReaderUtil methods directly built in (possibly as a view in the util.Collection sense). In general, composite readers do not really need to look like the previous IndexReaders; they could simply be a collection of SegmentReaders with some functionality like reopen.
> On the other side, do atomic readers need reopen logic anymore? When a segment changes, you need a new atomic reader? Maybe because of deletions that's not the best idea, but we should investigate. Maybe make the whole reopen logic simpler to use (at least on the collection-reader level).
> We should decide about good names; I have no preference at the moment.
[jira] [Resolved] (LUCENE-3716) Discussion topic: Move all Commit/VersionReopen stuff from abstract IR to DirectoryReader
[ https://issues.apache.org/jira/browse/LUCENE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3716.
------------------------------------

    Resolution: Not A Problem

Nobody complained after one week, so I will proceed with the branch and close this sub-task.

> Discussion topic: Move all Commit/VersionReopen stuff from abstract IR to DirectoryReader
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3716
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3716
>             Project: Lucene - Java
>          Issue Type: Sub-task
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
> When implementing the parent issue, I noticed a lot of other stuff in IndexReader that's only implemented in DirectoryReader/SegmentReader and is not really related to IndexReader at all:
> - getVersion (and maybe also isCurrent) only affects DirectoryReaders; because of the commit stuff there is no easy way for e.g. MultiReader to implement this
> - reopen/openIfChanged cannot be implemented easily by most AtomicIndexReaders, but CompositeIndexReader is also the wrong place to define those methods
> - all methods returning/opening IndexCommits
> In the parent issue, I already let IndexReader.open() return DirectoryReader and I made this class public. We should move all of this stuff (including IR.open) to DirectoryReader. Reopening outside DirectoryReader is not really needed. If some people think it should maybe stay abstract (this affects only the reopen/version stuff), there are ways for other readers to implement it, but it's certainly not specific to IRs in general. In that case I would declare an interface that DirectoryReader implements. Code like SearcherManager/Solr could then instanceof the IR instance and find out whether it's worth reopening/version checking.
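For the SearcherManager/Solr case mentioned at the end, the instanceof check could look roughly like this sketch, assuming the post-move API where openIfChanged lives on DirectoryReader; this is illustrative, not the committed code:

{code}
import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;

public class ReopenSketch {
  /** Reopens the reader if it has commit/version semantics and has changed;
   *  otherwise keeps the old instance. */
  static IndexReader maybeReopen(IndexReader reader) throws IOException {
    if (reader instanceof DirectoryReader) {
      DirectoryReader newReader = DirectoryReader.openIfChanged((DirectoryReader) reader);
      if (newReader != null) {
        reader.close(); // the caller owns the old reader in this sketch
        return newReader;
      }
    }
    return reader; // no commit/version semantics: nothing to check
  }
}
{code}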
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195759#comment-13195759 ]

Mark Miller commented on SOLR-2358:
------------------------------------

These tests really need to be done with real jetty instances (at least some of them). I'll try adding some timeouts where we are not currently using them (they are generally used from the test code, but not always from the non-test code).

> Distributing Indexing
> ---------------------
>
>                 Key: SOLR-2358
>                 URL: https://issues.apache.org/jira/browse/SOLR-2358
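For reference, setting such timeouts on the SolrJ client is straightforward; the values below are arbitrary test-only choices and (per the caution in the next comment) would not be sensible defaults in non-test code:

{code}
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class TestClientFactory {
  /** A client with low timeouts so a hung node fails the test fast. */
  public static CommonsHttpSolrServer newTestClient(String url) throws Exception {
    CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
    server.setConnectionTimeout(100); // ms: fail fast if the node is unreachable
    server.setSoTimeout(15000);       // ms: bound socket reads during the test
    return server;
  }
}
{code}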
[jira] [Created] (SOLR-3072) FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch
FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch
------------------------------------------------------------------

                 Key: SOLR-3072
                 URL: https://issues.apache.org/jira/browse/SOLR-3072
             Project: Solr
          Issue Type: Test
          Components: SolrCloud
            Reporter: Mark Miller
            Assignee: Mark Miller
            Priority: Minor
             Fix For: 4.0


Another test that seems to fail often on jenkins but not on other systems. We take down a jetty instance and then try to query the index of a still-up jetty with the load-balancing solr client (within the SolrCloud client). We get a socket read timeout on the request.

I saw this once in early dev - I couldn't figure out what to blame other than httpclient; the other server was down, and somehow that was affecting the request to the server that was still up. I tried not sharing the httpclient instance between requests and making a new one each time, and it started working - I reverted that and it was still working though, and it has worked since. It seems to fail consistently on jenkins, however.
Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1642 - Still Failing
I've created SOLR-3072 to track this: a read timeout on an available, up-and-running jetty instance on jenkins.

On Jan 29, 2012, at 7:30 AM, Apache Jenkins Server wrote:

> Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1642/
>
> 1 tests failed.
>
> FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch
>
> Error Message:
> http://localhost:31200/solr/collection1

- Mark Miller
lucidimagination.com
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195761#comment-13195761 ]

Yonik Seeley commented on SOLR-2358:
-------------------------------------

We should be careful about using socket read timeouts in non-test code for operations that could potentially take a long time: commit, optimize, and even query requests (depending on what the request is). By default, Solr currently does not time out requests, because we don't know what the upper bound is.

> Distributing Indexing
> ---------------------
>
>                 Key: SOLR-2358
>                 URL: https://issues.apache.org/jira/browse/SOLR-2358
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195768#comment-13195768 ] Yonik Seeley commented on LUCENE-2858: -- bq. SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi* +1, the Slow* is misleading as it makes it seem like there's a faster way you should be doing it. CompositeReaderWrapper should be fine. And no, it doesn't sound too cool for the hypothetical developers who use that as a criteria when coding ;-) Other possibilities include AtomicReaderEmulator, AtomicEmulatorReader, CompositeAsAtomicReader, etc Separate SegmentReaders (and other atomic readers) from composite IndexReaders -- Key: LUCENE-2858 URL: https://issues.apache.org/jira/browse/LUCENE-2858 Project: Lucene - Java Issue Type: Task Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.0 With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand. This issue should split atomic readers from reader collections with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a ListAtomicReader). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a view in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a collection of SegmentReaders with some functionality like reopen. On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level). We should decide about good names, i have no preference at the moment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195783#comment-13195783 ] Mark Miller commented on SOLR-2358: --- Yup, I agree - in general in non test code we don't want to time out by default - that is why I've stuck to only using them in the tests until now. I've tried adding one to the Solr cmd distributor for a bit though - just to see if that helps on Jenkins any. I'd like to narrow in and at least know if this is the problem or not (blackhole hangups). For some things, like a request to recover, timeouts may be fine I think. Once I am able to log into jenkins again, I can hopefully narrow down what is happening a lot faster. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: 2shard4server.jpg, SOLR-2358.patch, SOLR-2358.patch, apache-solr-noggit-r1211150.jar, zookeeper-3.3.4.jar The indexing side of SolrCloud - the goal of this issue is to provide durable, fault tolerant indexing to an elastic cluster of Solr instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195787#comment-13195787 ] Yonik Seeley commented on SOLR-2358: bq. For some things, like a request to recover, timeouts may be fine I think. Definitely - we have a lot better handle on Solr created requests. Replication (although it can take a long time to send a big file, there shouldn't be long periods where no packets are sent), PeerSync, etc. Although IIRC, a new cloud-style replication request involves the recipient doing a commit? Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: 2shard4server.jpg, SOLR-2358.patch, SOLR-2358.patch, apache-solr-noggit-r1211150.jar, zookeeper-3.3.4.jar The indexing side of SolrCloud - the goal of this issue is to provide durable, fault tolerant indexing to an elastic cluster of Solr instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2795) Genericize DirectIOLinuxDir -> UnixDir
[ https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2795. Resolution: Fixed Fix Version/s: 4.0 Thanks Varun! Genericize DirectIOLinuxDir -> UnixDir -- Key: LUCENE-2795 URL: https://issues.apache.org/jira/browse/LUCENE-2795 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to use it for indexWriter and not IndexReader (searching). It's a trap. But, once we do LUCENE-2793, we can make it fully general purpose because then a single native Dir impl can be used. I'd also like to make it generic to other Unices, if we can, so that it becomes UnixDirectory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3729) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED
Allow using FST to hold terms data in DocValues.BYTES_*_SORTED -- Key: LUCENE-3729 URL: https://issues.apache.org/jira/browse/LUCENE-3729 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3017) Allow edismax stopword filter factory implementation to be specified
[ https://issues.apache.org/jira/browse/SOLR-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195810#comment-13195810 ] Erick Erickson commented on SOLR-3017: -- A couple of questions: 1> I notice that guava is in here. The only other place I see imports for google.common is in the carrot code. Does anyone object to guava getting used in core? I only ask because it's used in so few places, do we prefer apache commons StringUtils for this kind of stuff or do we just not care? 2> In ExtendedDismaxQParserPlugin, around lines 1140 (in noStopwordFilterAnalyzer) there are a couple of tests like: if (stopwordFilterFactoryClass.isInstance(tf)) { Scanning the code, it seems like stopwordFilterFactoryClass could be null, so an NPE here seems possible. Otherwise, this seems fine to me from a tactical perspective, anyone want to weigh in on whether this is a good thing overall? Allow edismax stopword filter factory implementation to be specified Key: SOLR-3017 URL: https://issues.apache.org/jira/browse/SOLR-3017 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Michael Dodsworth Priority: Minor Fix For: 4.0 Attachments: SOLR-3017.patch, edismax_stop_filter_factory.patch Currently, the edismax query parser assumes that stopword filtering is being done by StopFilter: the removal of the stop filter is performed by looking for an instance of 'StopFilterFactory' (hard-coded) within the associated field's analysis chain. We'd like to be able to use our own stop filters whilst keeping the edismax stopword removal goodness. The supplied patch allows the stopword filter factory class to be supplied as a param, stopwordFilterClassName. If no value is given, the default (StopFilterFactory) is used. Another option I looked into was to extend StopFilterFactory to create our own filter. Unfortunately, StopFilterFactory's 'create' method returns StopFilter, not TokenStream. StopFilter is also final. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
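[Editorial aside: for the NPE concern in 2>, a guard along these lines would suffice. This is a sketch only; the names follow the patch description, not necessarily its actual code.]
{noformat}
// Sketch: stopwordFilterFactoryClass is resolved from the stopwordFilterClassName
// param, so it can be null if the named class could not be loaded.
private boolean isStopwordFilterFactory(Object tf, Class<?> stopwordFilterFactoryClass) {
  return stopwordFilterFactoryClass != null && stopwordFilterFactoryClass.isInstance(tf);
}
{noformat}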
[jira] [Updated] (LUCENE-3729) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED
[ https://issues.apache.org/jira/browse/LUCENE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3729: --- Attachment: LUCENE-3729.patch Prototype patch just for testing... As a quick test for viability here... I hacked FieldCacheImpl.DocTermsIndexImpl, to build an FST to map term -> ord, and changed the lookup method to use the new Util.getByOutput method. Then I tested perf on 10M docs from Wikipedia:
{noformat}
Task              QPS base  StdDev base  QPS fstfc  StdDev fstfc      Pct diff
TermGroup1M          47.75         1.59      25.75          0.36  -48% -  -43%
TermBGroup1M         17.10         0.58      14.20          0.37  -21% -  -11%
PKLookup            158.73         6.07     155.84          3.00   -7% -    4%
TermTitleSort        43.49         2.54      42.73          1.84  -11% -    8%
Respell              81.13         3.24      80.67          3.83   -8% -    8%
Term                106.13         3.59     106.03          1.28   -4% -    4%
TermBGroup1M1P       25.31         0.44      25.37          0.54   -3% -    4%
Fuzzy2               55.32         1.21      55.76          2.55   -5% -    7%
Fuzzy1               74.06         1.21      74.88          2.80   -4% -    6%
SloppyPhrase          9.82         0.61       9.95          0.42   -8% -   12%
SpanNear              3.39         0.16       3.47          0.15   -6% -   12%
Phrase                9.29         0.69       9.66          0.69  -10% -   20%
Wildcard             20.15         0.66      21.23          0.46    0% -   11%
AndHighHigh          13.43         0.55      14.24          0.70   -3% -   15%
Prefix3              10.05         0.53      10.70          0.19    0% -   14%
AndHighMed           56.62         3.36      60.54          4.28   -6% -   21%
OrHighMed            25.78         0.98      27.75          1.51   -1% -   17%
OrHighHigh           10.97         0.41      11.82          0.63   -1% -   17%
IntNRQ                9.74         0.81      10.83          0.26    0% -   24%
{noformat}
Two-pass grouping took a big hit... and single-pass grouping a moderate hit... but TermTitleSort was a minor slowdown, which is good news. The net RAM required across all segs for the title field FST was 30.2 MB, vs 46.5 MB for the current FieldCache terms storage (PagedBytes + PackedInts), which is ~35% less. The FST for the group-by fields was quite a bit larger (~60%) RAM usage than PagedBytes + PackedInts, because these fields are actually randomly generated unicode strings... I didn't make the change to use the FST for term -> ord lookup (have to fix the binarySearchLookup method), but we really should do this for real because it's doing an unnecessary binary search (repeated ord -> term lookup) now. Ie, perf should be better than above... grouping is a heavier user of binarySearchLookup than sorting so it should help recover some of that slowdown. Also, Util.getByOutput currently doesn't optimize for array'd arcs... so if we fix that we should get some small perf gain. To do this for real I think we should do it only with DocValues, because the FST build time is relatively costly. Allow using FST to hold terms data in DocValues.BYTES_*_SORTED -- Key: LUCENE-3729 URL: https://issues.apache.org/jira/browse/LUCENE-3729 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3729.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
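[Editorial aside: for readers following along, the term/ord mapping described above looks roughly like this. This is a sketch against the trunk FST API of the time; exact signatures may have shifted, and sortedTerms is a placeholder for the field's term dictionary in sorted order.]
{noformat}
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(true);
Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);
IntsRef scratch = new IntsRef();
long ord = 1; // start at 1: 0 is the reserved no-output value for PositiveIntOutputs
for (BytesRef term : sortedTerms) { // placeholder: terms must be added in sorted order
  builder.add(Util.toIntsRef(term, scratch), ord++);
}
FST<Long> fst = builder.finish();
Long termOrd = Util.get(fst, new BytesRef("foo")); // term -> ord: a normal FST lookup
IntsRef termBytes = Util.getByOutput(fst, 42);     // ord -> term: the new reverse lookup
{noformat}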
[jira] [Updated] (LUCENE-3729) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED
[ https://issues.apache.org/jira/browse/LUCENE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3729: --- Attachment: LUCENE-3729.patch Updated patch to use FST for term -> ord lookup during 2 pass grouping... much better results:
{noformat}
Task              QPS base  StdDev base  QPS fstfc  StdDev fstfc      Pct diff
TermGroup1M          45.69         1.25      43.43          0.90   -9% -    0%
PKLookup            164.02         9.00     157.12          6.23  -12% -    5%
Respell              64.28         2.75      61.96          2.48  -11% -    4%
SloppyPhrase          5.99         0.26       5.79          0.35  -13% -    7%
Fuzzy2               54.07         2.47      52.56          1.83  -10% -    5%
Phrase               14.97         0.16      14.61          0.44   -6% -    1%
TermTitleSort        39.53         0.56      38.71          1.93   -8% -    4%
OrHighMed            32.91         1.33      32.27          1.48  -10% -    6%
OrHighHigh           15.10         0.62      14.83          0.68   -9% -    7%
AndHighMed           53.49         0.56      52.53          2.02   -6% -    3%
SpanNear             14.28         0.21      14.04          0.28   -5% -    1%
Fuzzy1               60.37         1.25      59.39          1.65   -6% -    3%
AndHighHigh           9.15         0.12       9.01          0.25   -5% -    2%
TermBGroup1M         33.53         0.61      33.07          0.77   -5% -    2%
IntNRQ               10.16         0.47      10.04          0.90  -14% -   12%
Prefix3              40.44         0.59      40.44          1.99   -6% -    6%
Wildcard             35.38         0.63      35.55          1.49   -5% -    6%
TermBGroup1M1P       16.54         0.34      16.78          0.54   -3% -    6%
Term                100.39         2.64     103.13          3.69   -3% -    9%
{noformat}
Allow using FST to hold terms data in DocValues.BYTES_*_SORTED -- Key: LUCENE-3729 URL: https://issues.apache.org/jira/browse/LUCENE-3729 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3729.patch, LUCENE-3729.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2764) Create a NorwegianLightStemmer and NorwegianMinimalStemmer
[ https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2764: -- Attachment: SOLR-2764.patch Thanks Christian. I further refined stuff: - For MinimalStemmer, we now do two-pass removal for the -dom and -het endings. This means that the word kristendom will first be stemmed to kristen, and then all the general rules apply so it will be further stemmed to krist. The effect of this is that both kristen,kristendom,kristendommen,kristendommens will all be stemmed to krist (due to in this case incorrect interpretation of -en as plural ending), but when stopping at -dom removal, kristendom would not match inflections of kristen. What do you think, is this a reasonable improvement or could there be side effects? I've not added these rules to the MinimalStemmer, to keep it simpler. Create a NorwegianLightStemmer and NorwegianMinimalStemmer -- Key: SOLR-2764 URL: https://issues.apache.org/jira/browse/SOLR-2764 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Jan Høydahl Fix For: 3.6, 4.0 Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch We need a simple light-weight stemmer and a minimal stemmer for plural/singlular only in Norwegian -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili reassigned SOLR-3013: - Assignee: Tommaso Teofili Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2764) Create a NorwegianLightStemmer and NorwegianMinimalStemmer
[ https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195820#comment-13195820 ] Jan Høydahl edited comment on SOLR-2764 at 1/29/12 10:09 PM: - Thanks Christian. I further refined stuff: - I think the MinimalStemmer is more or less good to go, it seems to do what it's supposed to - For LightStemmer, we now do two-pass removal for the -dom and -het endings. This means that the word kristendom will first be stemmed to kristen, and then all the general rules apply so it will be further stemmed to krist. The effect of this is that both kristen,kristendom,kristendommen,kristendommens will all be stemmed to krist (due to in this case incorrect interpretation of -en as singular definite ending). - Added some more tests to highlight this What do you think, is this -dom -het thing a reasonable improvement or could there be side effects? Are there some other general rules that could easily be incorporated to catch semi-regular conjugations for the light stemmer? was (Author: janhoy): Thanks Christian. I further refined stuff: - For MinimalStemmer, we now do two-pass removal for the -dom and -het endings. This means that the word kristendom will first be stemmed to kristen, and then all the general rules apply so it will be further stemmed to krist. The effect of this is that both kristen,kristendom,kristendommen,kristendommens will all be stemmed to krist (due to in this case incorrect interpretation of -en as plural ending), but when stopping at -dom removal, kristendom would not match inflections of kristen. What do you think, is this a reasonable improvement or could there be side effects? I've not added these rules to the MinimalStemmer, to keep it simpler. Create a NorwegianLightStemmer and NorwegianMinimalStemmer -- Key: SOLR-2764 URL: https://issues.apache.org/jira/browse/SOLR-2764 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Jan Høydahl Fix For: 3.6, 4.0 Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch We need a simple light-weight stemmer and a minimal stemmer for plural/singlular only in Norwegian -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
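[Editorial aside: a concrete illustration of the two-pass idea described above. This is a sketch only, not the actual NorwegianLightStemmer code; the real stemmer has many more rules.]
{noformat}
// Pass 1: strip the derivational -dom/-het ending, then fall through to the
// general rules, so kristendom -> kristen -> krist, as in the comment above.
static String stemTwoPass(String word) {
  if (word.endsWith("dom") || word.endsWith("het")) {
    word = word.substring(0, word.length() - 3);
  }
  // Pass 2: general rules, e.g. the definite/plural -en ending.
  if (word.endsWith("en")) {
    word = word.substring(0, word.length() - 2);
  }
  return word;
}
{noformat}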
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195861#comment-13195861 ] Tommaso Teofili commented on SOLR-3013: --- If no one objects I'll commit this shortly. Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2764) Create a NorwegianLightStemmer and NorwegianMinimalStemmer
[ https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195863#comment-13195863 ] Robert Muir commented on SOLR-2764: --- just some general suggestions: in a light stemmer, I would be wary of derivational endings. It seems in the case of dom/het, because it's dealing with adj/noun, that it's on the edge (maybe ok here), but if possible it would be more ideal to avoid multiple passes... this is the kind of thing that causes snowball problems. Can you think of examples for dom/het where the meaning would be changed? For example: freedom is used the same way in English, but stemming this to free is very lossy, since free has a variety of meanings (such as costs nothing), some of which are incompatible with freedom. This is the danger of stripping derivational suffixes... Create a NorwegianLightStemmer and NorwegianMinimalStemmer -- Key: SOLR-2764 URL: https://issues.apache.org/jira/browse/SOLR-2764 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Jan Høydahl Fix For: 3.6, 4.0 Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch We need a simple light-weight stemmer and a minimal stemmer for plural/singlular only in Norwegian -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195868#comment-13195868 ] Robert Muir commented on LUCENE-2858: - Can we please do some eclipse-renames like: AtomicIndexReader -> AtomicReader AtomicIndexReader.AtomicReaderContext -> AtomicReader.Context The verbosity of the api is killing me :) Separate SegmentReaders (and other atomic readers) from composite IndexReaders -- Key: LUCENE-2858 URL: https://issues.apache.org/jira/browse/LUCENE-2858 Project: Lucene - Java Issue Type: Task Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.0 With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand. This issue should split atomic readers from reader collections with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a ListAtomicReader). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a view in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a collection of SegmentReaders with some functionality like reopen. On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level). We should decide about good names, i have no preference at the moment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195873#comment-13195873 ] Michael McCandless commented on LUCENE-2858: +1 for those names. Separate SegmentReaders (and other atomic readers) from composite IndexReaders -- Key: LUCENE-2858 URL: https://issues.apache.org/jira/browse/LUCENE-2858 Project: Lucene - Java Issue Type: Task Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.0 With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand. This issue should split atomic readers from reader collections with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a ListAtomicReader). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a view in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a collection of SegmentReaders with some functionality like reopen. On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level). We should decide about good names, i have no preference at the moment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195876#comment-13195876 ] Uwe Schindler commented on LUCENE-2858: --- Jaja, will fix this... Separate SegmentReaders (and other atomic readers) from composite IndexReaders -- Key: LUCENE-2858 URL: https://issues.apache.org/jira/browse/LUCENE-2858 Project: Lucene - Java Issue Type: Task Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.0 With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand. This issue should split atomic readers from reader collections with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a ListAtomicReader). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a view in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a collection of SegmentReaders with some functionality like reopen. On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level). We should decide about good names, i have no preference at the moment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1646 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1646/ 1 tests failed. REGRESSION: org.apache.solr.cloud.ZkControllerTest.testUploadToCloud Error Message: KeeperErrorCode = NodeExists for /configs/config1/schema-reversed.xml Stack Trace: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /configs/config1/schema-reversed.xml at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:643) at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:486) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:483) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:369) at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:896) at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:689) at org.apache.solr.cloud.ZkControllerTest.testUploadToCloud(ZkControllerTest.java:121) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) Build Log (for compile errors): [...truncated 9809 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3725) Add optional packing to FST building
[ https://issues.apache.org/jira/browse/LUCENE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3725. Resolution: Fixed Add optional packing to FST building Key: LUCENE-3725 URL: https://issues.apache.org/jira/browse/LUCENE-3725 Project: Lucene - Java Issue Type: Improvement Components: core/FSTs Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3725.patch, LUCENE-3725.patch, LUCENE-3725.patch, Perf.java The FSTs produced by Builder can be further shrunk if you are willing to spend highish transient RAM to do so... our Builder today tries hard not to use much RAM (and has options to tweak down the RAM usage, in exchange for a somewhat larger FST), even when building immense FSTs. But for apps that can afford highish transient RAM to get a smaller net FST, I think we should offer packing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3714: Attachment: LUCENE-3714.patch I've been wanting to work on this... haven't found the time. This just syncs the patch up to trunk's FST api changes. add suggester that uses shortest path/wFST instead of buckets - Key: LUCENE-3714 URL: https://issues.apache.org/jira/browse/LUCENE-3714 Project: Lucene - Java Issue Type: New Feature Components: modules/spellchecker Reporter: Robert Muir Attachments: LUCENE-3714.patch, LUCENE-3714.patch, LUCENE-3714.patch, LUCENE-3714.patch, LUCENE-3714.patch, TestMe.java, out.png Currently the FST suggester (really an FSA) quantizes weights into buckets (e.g. single byte) and puts them in front of the word. This makes it fast, but you lose granularity in your suggestions. Lately the question was raised, if you build lucene's FST with positiveintoutputs, does it behave the same as a tropical semiring wFST? In other words, after completing the word, we instead traverse min(output) at each node to find the 'shortest path' to the best suggestion (with the highest score). This means we wouldn't need to quantize weights at all and it might make some operations (e.g. adding fuzzy matching etc) a lot easier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
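[Editorial aside: the min-output traversal from the issue description, in rough pseudo-Java. followPrefix and minOutputArc are hypothetical helpers, not real FST API; real code also has to handle final outputs and ties.]
{noformat}
// After matching the typed prefix, greedily follow the arc with the smallest
// output at each node; with PositiveIntOutputs this reaches the completion
// with the minimal accumulated cost, i.e. the top-weighted suggestion.
FST.Arc<Long> arc = followPrefix(fst, prefix);   // hypothetical helper
long cost = outputSoFar;                         // output accumulated along the prefix
StringBuilder suggestion = new StringBuilder(prefix);
while (!arc.isFinal()) {
  arc = minOutputArc(fst, arc);                  // hypothetical: child arc with smallest output
  cost += arc.output;
  suggestion.append((char) arc.label);
}
{noformat}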
[JENKINS] Lucene-3.x - Build # 627 - Failure
Build: https://builds.apache.org/job/Lucene-3.x/627/ No tests ran. Build Log (for compile errors): [...truncated 10841 lines...] [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/java/overview.html [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org/apache [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org/apache/lucene [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org/apache/lucene/util [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org/apache/lucene/util/europarl.lines.txt.gz [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/index.html [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/README.txt [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/JRE_VERSION_MIGRATION.txt [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/BUILD.txt [exec] A /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/build.xml [exec] Exported revision 1237511. [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/tools/javadoc/java5 [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/docs/changes [exec] --2012-01-30 00:39:15-- https://issues.apache.org/jira/rest/api/2.0.alpha1/project/LUCENE [exec] Resolving issues.apache.org (issues.apache.org)... 140.211.11.121 [exec] Connecting to issues.apache.org (issues.apache.org)|140.211.11.121|:443... connected. [exec] WARNING: cannot verify issues.apache.org's certificate, issued by `/C=US/O=Thawte, Inc./CN=Thawte SSL CA': [exec] Unable to locally verify the issuer's authority. [exec] HTTP request sent, awaiting response... 200 OK [exec] Length: unspecified [application/json] [exec] Saving to: `STDOUT' [exec] [exec] 0K .. .. 46.4M=0s [exec] [exec] 2012-01-30 00:39:17 (46.4 MB/s) - written to stdout [16744] [exec] [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. 
[exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl line 384, line 5949. [exec] Use of uninitialized value $heading in lc at /usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 12319 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12319/ No tests ran. Build Log (for compile errors): [...truncated 16 lines...] U lucene/CHANGES.txt U lucene/src/test/org/apache/lucene/util/fst/TestFSTs.java U lucene/src/java/org/apache/lucene/util/FixedBitSet.java U lucene/src/java/org/apache/lucene/util/fst/PairOutputs.java U lucene/src/java/org/apache/lucene/util/fst/Util.java U lucene/src/java/org/apache/lucene/util/fst/FSTEnum.java U lucene/src/java/org/apache/lucene/util/fst/PositiveIntOutputs.java U lucene/src/java/org/apache/lucene/util/fst/Outputs.java U lucene/src/java/org/apache/lucene/util/fst/Builder.java U lucene/src/java/org/apache/lucene/util/fst/NodeHash.java U lucene/src/java/org/apache/lucene/util/fst/FST.java U lucene/src/java/org/apache/lucene/util/UnicodeUtil.java Ulucene U. At revision 1237512 Reverting /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/nightly Updating http://svn.apache.org/repos/asf/lucene/dev/nightly At revision 1237512 no change for http://svn.apache.org/repos/asf/lucene/dev/nightly since the previous build No emails were triggered. [Lucene-Solr-tests-only-3.x] $ /bin/bash -xe /var/tmp/hudson5884032007436283436.sh + sh /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/nightly/hudson-lusolr-tests-3.x.sh + ANT_HOME=/home/hudson/tools/ant/latest1.7 + SVNVERSION_EXE=svnversion + SVN_EXE=svn + CLOVER=/home/hudson/tools/clover/clover2latest + JAVA_HOME_15=/home/hudson/tools/java/latest1.5 + [ -z '' ] + JAVA_HOME_16=/home/hudson/tools/java/latest1.6 + JAVADOC_HOME_15=/usr/local/share/doc/jdk1.5/api + JAVADOC_HOME_16=/usr/local/share/doc/jdk1.6/api + ROOT_DIR=checkout + CORE_DIR=checkout/lucene + MODULES_DIR=checkout/modules + SOLR_DIR=checkout/solr + ARTIFACTS=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/artifacts + JAVADOCS_ARTIFACTS=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/javadocs + DUMP_DIR=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/heapdumps + TESTS_MULTIPLIER=3 + TEST_LINE_DOCS_FILE=/home/hudson/lucene-data/enwiki.random.lines.txt.gz + TEST_JVM_ARGS='-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/heapdumps/ ' + set +x + mkdir -p /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/heapdumps + rm -rf /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/heapdumps/README.txt + echo 'This directory contains heap dumps that may be generated by test runs when OOM occurred.' + TESTS_MULTIPLIER=5 + cd /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout + JAVA_HOME=/home/hudson/tools/java/latest1.5 /home/hudson/tools/ant/latest1.7/bin/ant clean Buildfile: build.xml clean: clean: [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build [echo] Building solr... clean: [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build [echo] TODO: fix tests to not write files to 'core/src/test-files/solr/data'! 
[delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/test-files/solr/data BUILD SUCCESSFUL Total time: 41 seconds + cd /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene + JAVA_HOME=/home/hudson/tools/java/latest1.5 /home/hudson/tools/ant/latest1.7/bin/ant compile compile-test build-contrib compile-backwards Buildfile: build.xml jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/classes/java [javac] Compiling 498 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/classes/java [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:34: warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] int getColumn(); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:41: warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] int getLine(); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/src/java/org/apache/lucene/util/fst/FST.java:1764: method does not override a method from its superclass [javac] @Override [javac] ^
[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 1673 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/1673/ No tests ran. Build Log (for compile errors): [...truncated 13 lines...] U lucene/CHANGES.txt U lucene/src/test/org/apache/lucene/util/fst/TestFSTs.java U lucene/src/java/org/apache/lucene/util/FixedBitSet.java U lucene/src/java/org/apache/lucene/util/fst/PairOutputs.java U lucene/src/java/org/apache/lucene/util/fst/Util.java U lucene/src/java/org/apache/lucene/util/fst/FSTEnum.java U lucene/src/java/org/apache/lucene/util/fst/PositiveIntOutputs.java U lucene/src/java/org/apache/lucene/util/fst/Outputs.java U lucene/src/java/org/apache/lucene/util/fst/Builder.java U lucene/src/java/org/apache/lucene/util/fst/NodeHash.java U lucene/src/java/org/apache/lucene/util/fst/FST.java U lucene/src/java/org/apache/lucene/util/UnicodeUtil.java Ulucene U. At revision 1237513 Reverting /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/nightly Updating http://svn.apache.org/repos/asf/lucene/dev/nightly At revision 1237513 no change for http://svn.apache.org/repos/asf/lucene/dev/nightly since the previous build No emails were triggered. [Lucene-Solr-tests-only-3.x-java7] $ /bin/bash -xe /var/tmp/hudson2486050198597281484.sh + sh /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/nightly/hudson-lusolr-tests-3.x.sh + ANT_HOME=/home/hudson/tools/ant/latest1.7 + SVNVERSION_EXE=svnversion + SVN_EXE=svn + CLOVER=/home/hudson/tools/clover/clover2latest + JAVA_HOME_15=/home/hudson/tools/java/latest1.5 + [ -z yes ] + JAVA_HOME_16=/home/hudson/tools/java/latest1.7 + JAVADOC_HOME_15=/usr/local/share/doc/jdk1.5/api + JAVADOC_HOME_16=/usr/local/share/doc/jdk1.6/api + ROOT_DIR=checkout + CORE_DIR=checkout/lucene + MODULES_DIR=checkout/modules + SOLR_DIR=checkout/solr + ARTIFACTS=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/artifacts + JAVADOCS_ARTIFACTS=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/javadocs + DUMP_DIR=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/heapdumps + TESTS_MULTIPLIER=3 + TEST_LINE_DOCS_FILE=/home/hudson/lucene-data/enwiki.random.lines.txt.gz + TEST_JVM_ARGS='-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/heapdumps/ ' + set +x + mkdir -p /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/heapdumps + rm -rf /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/heapdumps/README.txt + echo 'This directory contains heap dumps that may be generated by test runs when OOM occurred.' + TESTS_MULTIPLIER=5 + cd /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout + JAVA_HOME=/home/hudson/tools/java/latest1.5 /home/hudson/tools/ant/latest1.7/bin/ant clean Buildfile: build.xml clean: clean: [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build [echo] Building solr... clean: [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/build [echo] TODO: fix tests to not write files to 'core/src/test-files/solr/data'! 
[delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/core/src/test-files/solr/data BUILD SUCCESSFUL Total time: 28 seconds + cd /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene + JAVA_HOME=/home/hudson/tools/java/latest1.5 /home/hudson/tools/ant/latest1.7/bin/ant compile compile-test build-contrib compile-backwards Buildfile: build.xml jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/classes/java [javac] Compiling 498 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/classes/java [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:34: warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] int getColumn(); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:41: warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] int getLine(); [javac] ^ [javac]
[jira] [Created] (SOLR-3073) Distributed Grouping fails if the uniqueKey is a UUID
Distributed Grouping fails if the uniqueKey is a UUID - Key: SOLR-3073 URL: https://issues.apache.org/jira/browse/SOLR-3073 Project: Solr Issue Type: Bug Affects Versions: 3.5 Reporter: Devon Krisman Priority: Minor Fix For: 3.6 Attachments: SOLR-3073-3x.patch Attempting use distributed grouping (using a StrField as the group.fieldname) with a UUID as the uniqueKey results in an error because the classname (java.util.UUID) is prepended to the field value during the second phase of the grouping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3073) Distributed Grouping fails if the uniqueKey is a UUID
[ https://issues.apache.org/jira/browse/SOLR-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devon Krisman updated SOLR-3073: Attachment: SOLR-3073-3x.patch Distributed Grouping fails if the uniqueKey is a UUID - Key: SOLR-3073 URL: https://issues.apache.org/jira/browse/SOLR-3073 Project: Solr Issue Type: Bug Affects Versions: 3.5 Reporter: Devon Krisman Priority: Minor Fix For: 3.6 Attachments: SOLR-3073-3x.patch Attempting use distributed grouping (using a StrField as the group.fieldname) with a UUID as the uniqueKey results in an error because the classname (java.util.UUID) is prepended to the field value during the second phase of the grouping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3073) Distributed Grouping fails if the uniqueKey is a UUID
[ https://issues.apache.org/jira/browse/SOLR-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195898#comment-13195898 ] Devon Krisman commented on SOLR-3073: - This is the error that happens when you try to run a distributed grouping search with a UUID as the uniqueKey: SEVERE: org.apache.solr.common.SolrException: Invalid UUID String: 'java.util.UUID:317db1e1-b778-ec66-ef68-ddd00b096632' at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:85) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:217) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) The request handlers append the field Object's classname to its string value if it is from an unrecognized class, the attached patch should add java.util.UUID to the recognized classtypes for Solr's response handlers. Distributed Grouping fails if the uniqueKey is a UUID - Key: SOLR-3073 URL: https://issues.apache.org/jira/browse/SOLR-3073 Project: Solr Issue Type: Bug Affects Versions: 3.5 Reporter: Devon Krisman Priority: Minor Fix For: 3.6 Attachments: SOLR-3073-3x.patch Attempting use distributed grouping (using a StrField as the group.fieldname) with a UUID as the uniqueKey results in an error because the classname (java.util.UUID) is prepended to the field value during the second phase of the grouping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
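[Editorial aside: a sketch of the failure mode, illustrative only and not the actual response writer code. Solr's writers fall back to className + ":" + toString() for field values of unrecognized types, which is where the prefix in the trace above comes from; the patch effectively adds java.util.UUID to the recognized set.]
{noformat}
static String writeVal(Object val) {
  // Recognized types are written as their plain string value...
  if (val instanceof String || val instanceof java.util.UUID) {
    return val.toString();
  }
  // ...unknown types fall back to "classname:value", producing
  // "java.util.UUID:317db1e1-..." before the fix.
  return val.getClass().getName() + ':' + val.toString();
}
{noformat}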
[jira] [Resolved] (LUCENE-3727) fix assertions/checks that use File.length() to use getFilePointer()
[ https://issues.apache.org/jira/browse/LUCENE-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3727. - Resolution: Fixed Fix Version/s: 3.6 fix assertions/checks that use File.length() to use getFilePointer() Key: LUCENE-3727 URL: https://issues.apache.org/jira/browse/LUCENE-3727 Project: Lucene - Java Issue Type: Task Affects Versions: 3.6 Reporter: Robert Muir Fix For: 3.6 Attachments: LUCENE-3727.patch, LUCENE-3727.patch This came up in the thread "Getting RuntimeException: after flush: fdx size mismatch while Indexing" (http://www.lucidimagination.com/search/document/a8db01a220f0a126). In trunk, a side effect of the codec refactoring is that these assertions were pushed into codecs as finish() before close(). There they check getFilePointer() in this computation instead, which verifies that Lucene did its part (instead of falsely tripping if directory metadata is stale). I think we should fix these checks/asserts on 3.x too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
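The distinction between the two checks can be shown with a small self-contained sketch (the class and its fields are made up for illustration, not the actual Lucene writer code): validate against the bytes the writer itself produced -- analogous to IndexOutput.getFilePointer() -- rather than the OS-reported file size, which may be stale.
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FlushCheckSketch {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    int numDocs = 3;
    for (int i = 0; i < numDocs; i++) {
      out.writeLong(i); // pretend: one fdx pointer per document
    }
    long expected = 8L * numDocs;
    // like getFilePointer(): our own write position, immune to stale metadata
    if (out.size() != expected) {
      throw new RuntimeException("after flush: fdx size mismatch: expected "
          + expected + " but got " + out.size());
    }
    System.out.println("flush check passed: " + out.size() + " bytes");
  }
}
{code}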
[jira] [Updated] (SOLR-3073) Distributed Grouping fails if the uniqueKey is a UUID
[ https://issues.apache.org/jira/browse/SOLR-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devon Krisman updated SOLR-3073: Description: Attempting to use distributed grouping (using a StrField as the group.fieldname) with a UUID as the uniqueKey results in an error because the classname (java.util.UUID) is prepended to the field value during the second phase of the grouping. (was: Attempting use distributed grouping (using a StrField as the group.fieldname) with a UUID as the uniqueKey results in an error because the classname (java.util.UUID) is prepended to the field value during the second phase of the grouping.) Distributed Grouping fails if the uniqueKey is a UUID - Key: SOLR-3073 URL: https://issues.apache.org/jira/browse/SOLR-3073 Project: Solr Issue Type: Bug Affects Versions: 3.5 Reporter: Devon Krisman Priority: Minor Fix For: 3.6 Attachments: SOLR-3073-3x.patch Attempting to use distributed grouping (using a StrField as the group.fieldname) with a UUID as the uniqueKey results in an error because the classname (java.util.UUID) is prepended to the field value during the second phase of the grouping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195904#comment-13195904 ] Uwe Schindler commented on LUCENE-2858: --- I renamed the enclosing classes and also removed the public ctors from ReaderContexts (to prevent stupid things already reported on mailing lists). Renaming the ReaderContexts all to the same name, Context, but with different enclosing classes is a refactoring Eclipse cannot do (it creates invalid code). It seems only NetBeans can do this; I will try to find a solution. The problem is that Eclipse always tries to import the inner class, which causes conflicts. Finally, e.g. the method getDocIdSet should look like getDocIdSet(AtomicReader.Context,...) [only importing AtomicReader], but Eclipse always tries to use Context [and import oal.AtomicReader.Context]. In the end we should have abstract IndexReader.Context, AtomicReader.Context, CompositeReader.Context. Will go to bed now. Separate SegmentReaders (and other atomic readers) from composite IndexReaders -- Key: LUCENE-2858 URL: https://issues.apache.org/jira/browse/LUCENE-2858 Project: Lucene - Java Issue Type: Task Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.0 With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader, which is a composite reader. The interface of IndexReader now has lots of methods that simply throw UOE (in fact more than 50% of the commonly used methods are unusable now). This confuses users and makes the API hard to understand. This issue should split atomic readers from reader collections with a separate API. After that, you are no longer able to get a TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*); those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a view in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders; they could simply be a collection of SegmentReaders with some functionality like reopen. On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions that's not the best idea, but we should investigate. Maybe make the whole reopen logic simpler to use (at least on the collection reader level). We should decide about good names; I have no preference at the moment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
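The naming scheme Uwe describes can be sketched in plain Java with simplified stand-ins (these are not the actual Lucene classes): each reader type carries its own nested Context class, so a signature reads as getDocIdSet(AtomicReader.Context, ...) and only the enclosing class needs importing -- which is precisely the import behavior Eclipse gets wrong.
{code:java}
// Illustrative stand-ins for the proposed class layout.
public class ContextNamingSketch {
  abstract static class IndexReader {
    public static class Context { }
  }
  abstract static class AtomicReader extends IndexReader {
    public static class Context extends IndexReader.Context { }
  }
  abstract static class CompositeReader extends IndexReader {
    public static class Context extends IndexReader.Context { }
  }

  // call sites qualify the nested class through its enclosing reader type,
  // e.g. a filter would accept only the atomic variant:
  static String getDocIdSet(AtomicReader.Context context) {
    return "doc ids for " + context;
  }

  public static void main(String[] args) {
    System.out.println(getDocIdSet(new AtomicReader.Context()));
  }
}
{code}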
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195906#comment-13195906 ] Chris Male commented on SOLR-3013: -- Hey Tommaso, Did a quick glance over the patch. Couple of things: - Could UIMATypeAwareAnalyzerTest (and any other Analyzer/Tokenizer tests) use BaseTokenStreamTestCase? It has some useful utility methods to verify that your Analyzer works as expected - UIMABaseAnalyzerTest could do the same, and could probably make use of newDirectory() etc to handle some of the boilerplate Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
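For readers unfamiliar with the suggestion, this is roughly what a BaseTokenStreamTestCase-based test looks like; the analyzer and expected tokens below are placeholders, not taken from the patch, and MockAnalyzer merely stands in for the UIMA analyzer under test.
{code:java}
import java.util.Random;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.BaseTokenStreamTestCase;
import org.apache.lucene.analysis.MockAnalyzer;

public class UIMAAnalyzerTestSketch extends BaseTokenStreamTestCase {
  public void testSimpleInput() throws Exception {
    // stand-in analyzer; a real test would construct the UIMA analyzer here
    Analyzer analyzer = new MockAnalyzer(new Random(42));
    // one call verifies the produced terms (other overloads also check
    // offsets, types and position increments)
    assertAnalyzesTo(analyzer, "the big brown fox",
        new String[] { "the", "big", "brown", "fox" });
  }
}
{code}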
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195908#comment-13195908 ] Robert Muir commented on SOLR-3013: --- In addition to what Chris said:
* it looks like some correctOffset() calls etc. are missing (these would likely be detected by BaseTokenStreamTestCase.checkRandomData)
* the analysis components look as if they might be able to work with Lucene too... maybe we could refactor the Tokenizer/Analyzer/etc. into a new modules/analysis/uima that depends on UIMA? And the Solr uima module would provide the factories to integrate.
Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
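As a reference for the correctOffset() point, here is a simplified 3.x-style tokenizer sketch (not the UIMA tokenizer from the patch): every offset stored in OffsetAttribute is mapped through correctOffset(), so CharFilters placed in front of the tokenizer can translate offsets back to the original text; checkRandomData tends to catch tokenizers that skip this.
{code:java}
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Emits the whole input as a single token; illustration only.
public final class SingleTokenSketch extends Tokenizer {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private boolean done = false;
  private int finalOffset = 0;

  public SingleTokenSketch(Reader input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (done) return false;
    done = true;
    clearAttributes();
    char[] buffer = new char[255];
    int length = input.read(buffer);
    if (length <= 0) return false;
    termAtt.copyBuffer(buffer, 0, length);
    // the important part: map offsets through correctOffset()
    offsetAtt.setOffset(correctOffset(0), correctOffset(length));
    finalOffset = length;
    return true;
  }

  @Override
  public void end() {
    // the final offset must be corrected as well
    offsetAtt.setOffset(correctOffset(finalOffset), correctOffset(finalOffset));
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    done = false;
    finalOffset = 0;
  }
}
{code}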
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195909#comment-13195909 ] Chris Male commented on SOLR-3013: -- {quote} the analysis components look as if they might be able to work with Lucene too... maybe we could refactor the Tokenizer/Analyzer/etc. into a new modules/analysis/uima that depends on UIMA? And the Solr uima module would provide the factories to integrate. {quote} I absolutely agree. Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3719) FVH: slow performance on very large queries
[ https://issues.apache.org/jira/browse/LUCENE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved LUCENE-3719. Resolution: Fixed Fix Version/s: 4.0 3.6 Assignee: Koji Sekiguchi trunk: Committed revision 1237528. 3x: Committed revision 1237531. FVH: slow performance on very large queries --- Key: LUCENE-3719 URL: https://issues.apache.org/jira/browse/LUCENE-3719 Project: Lucene - Java Issue Type: Bug Components: modules/highlighter Affects Versions: 3.5, 4.0 Reporter: Igor Motov Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3719.patch The change from HashSet to ArrayList for flatQueries in LUCENE-3019 resulted in a very significant slowdown in some of our e-discovery queries after upgrading from 3.4.0 to 3.5.0. Our queries sometimes contain tens of thousands of terms. As a result, a major portion of the execution time for such queries is now spent in the flatQueries.contains( sourceQuery ) method calls. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
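The complexity problem behind the issue is easy to reproduce in isolation. The sketch below only demonstrates the trade-off -- ArrayList.contains is O(n) per call, so flattening degenerates to O(n^2), while a LinkedHashSet keeps insertion order with O(1) contains -- and does not claim to show the committed fix itself.
{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;

public class ContainsBench {
  // mimics the hot loop in flattening: add each term if not already present
  static long fill(Collection<String> c, int n) {
    long start = System.nanoTime();
    for (int i = 0; i < n; i++) {
      String term = "term" + i;
      if (!c.contains(term)) {
        c.add(term);
      }
    }
    return (System.nanoTime() - start) / 1000000L; // ms
  }

  public static void main(String[] args) {
    int n = 50000; // "tens of thousands of terms"
    System.out.println("ArrayList:     " + fill(new ArrayList<String>(), n) + " ms");
    System.out.println("LinkedHashSet: " + fill(new LinkedHashSet<String>(), n) + " ms");
  }
}
{code}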
[jira] [Commented] (LUCENE-3719) FVH: slow performance on very large queries
[ https://issues.apache.org/jira/browse/LUCENE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195913#comment-13195913 ] Koji Sekiguchi commented on LUCENE-3719: Thanks Igor for reporting the issue and providing the patch! FVH: slow performance on very large queries --- Key: LUCENE-3719 URL: https://issues.apache.org/jira/browse/LUCENE-3719 Project: Lucene - Java Issue Type: Bug Components: modules/highlighter Affects Versions: 3.5, 4.0 Reporter: Igor Motov Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3719.patch The change from HashSet to ArrayList for flatQueries in LUCENE-3019 resulted in a very significant slowdown in some of our e-discovery queries after upgrading from 3.4.0 to 3.5.0. Our queries sometimes contain tens of thousands of terms. As a result, a major portion of the execution time for such queries is now spent in the flatQueries.contains( sourceQuery ) method calls. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3725) Add optional packing to FST building
[ https://issues.apache.org/jira/browse/LUCENE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195914#comment-13195914 ] Robert Muir commented on LUCENE-3725: - Just some numbers with another (CJK) FST I have been playing with; this one uses BYTE2 + SingleByteOutput.
Before: Finished: 326915 words, 77222 nodes, 358677 arcs, 2617255 bytes... Zipped: 1812629 bytes
Packed: Finished: 326915 words, 77222 nodes, 358677 arcs, 2027763 bytes... Zipped: 1735486 bytes
Add optional packing to FST building Key: LUCENE-3725 URL: https://issues.apache.org/jira/browse/LUCENE-3725 Project: Lucene - Java Issue Type: Improvement Components: core/FSTs Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3725.patch, LUCENE-3725.patch, LUCENE-3725.patch, Perf.java The FSTs produced by Builder can be further shrunk if you are willing to spend highish transient RAM to do so... our Builder today tries hard not to use much RAM (and has options to tweak down the RAM usage, in exchange for a somewhat larger FST), even when building immense FSTs. But for apps that can afford highish transient RAM to get a smaller net FST, I think we should offer packing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
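The "Zipped" figures above can be reproduced for any serialized FST (or any byte blob) with a few lines of plain Java; the FST build and save steps are omitted here, so the byte array below is only a stand-in for the file the builder would write.
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class ZippedSize {
  static int zippedSize(byte[] data) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GZIPOutputStream gz = new GZIPOutputStream(bos);
    gz.write(data);
    gz.close(); // flushes the gzip trailer
    return bos.size();
  }

  public static void main(String[] args) throws IOException {
    byte[] fstBytes = new byte[2617255]; // stand-in for the saved FST file
    System.out.println("raw: " + fstBytes.length
        + " bytes, zipped: " + zippedSize(fstBytes) + " bytes");
  }
}
{code}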
[jira] [Created] (LUCENE-3730) Improved Kuromoji search mode segmentation/decompounding
Improved Kuromoji search mode segmentation/decompounding Key: LUCENE-3730 URL: https://issues.apache.org/jira/browse/LUCENE-3730 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen Kuromoji has a segmentation mode for search that uses a heuristic to promote additional segmentation of long candidate tokens to get a decompounding effect. This heuristic has been improved. Patch is coming up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-1198) confine all solrconfig.xml parsing to SolrConfig.java
[ https://issues.apache.org/jira/browse/SOLR-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-1198. -- Resolution: Fixed Fix Version/s: (was: 3.6) (was: 4.0) 1.4 This was resolved in 1.4. confine all solrconfig.xml parsing to SolrConfig.java - Key: SOLR-1198 URL: https://issues.apache.org/jira/browse/SOLR-1198 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch Currently, XPath evaluations are spread across the Solr code. It would be cleaner if we could do it all in one place. All the parsing can be done in SolrConfig.java. Another problem with the current design is that we are not able to benefit from re-use of the SolrConfig object across cores. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
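A minimal sketch of the design goal (names are illustrative, not Solr's actual code): parse solrconfig.xml once, keep all XPath evaluation inside one class, and let consumers -- including multiple cores -- share the resulting object through typed getters.
{code:java}
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class ConfigSketch {
  private final int filterCacheSize;

  public ConfigSketch(File configFile) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(configFile);
    XPath xpath = XPathFactory.newInstance().newXPath();
    // all XPath evaluation happens here, once, at load time
    filterCacheSize = Integer.parseInt(
        xpath.evaluate("/config/query/filterCache/@size", doc));
  }

  // consumers use getters; no XPath outside this class, and the same
  // instance can be reused across cores
  public int getFilterCacheSize() {
    return filterCacheSize;
  }
}
{code}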
[jira] [Updated] (LUCENE-3730) Improved Kuromoji search mode segmentation/decompounding
[ https://issues.apache.org/jira/browse/LUCENE-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated LUCENE-3730: --- Attachment: LUCENE-3730_trunk.patch Improved Kuromoji search mode segmentation/decompounding Key: LUCENE-3730 URL: https://issues.apache.org/jira/browse/LUCENE-3730 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen Attachments: LUCENE-3730_trunk.patch Kuromoji has a segmentation mode for search that uses a heuristic to promote additional segmentation of long candidate tokens to get a decompounding effect. This heuristic has been improved. Patch is coming up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3730) Improved Kuromoji search mode segmentation/decompounding
[ https://issues.apache.org/jira/browse/LUCENE-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195927#comment-13195927 ] Christian Moen commented on LUCENE-3730: Find attached a patch for {{trunk}} that improves the heuristic. Search segmentation tests/examples are in {{search-segmentation-tests.txt}} and are validated by {{TestSearchMode}}. Note that both the tests and the heuristic are tuned for IPADIC. Hence, we need to revisit this when we add support for other dictionaries/models. I've also moved the ASF license header in {{TestExtendedMode.java}} to the right place. Improved Kuromoji search mode segmentation/decompounding Key: LUCENE-3730 URL: https://issues.apache.org/jira/browse/LUCENE-3730 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen Attachments: LUCENE-3730_trunk.patch Kuromoji has a segmentation mode for search that uses a heuristic to promote additional segmentation of long candidate tokens to get a decompounding effect. This heuristic has been improved. Patch is coming up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3730) Improved Kuromoji search mode segmentation/decompounding
[ https://issues.apache.org/jira/browse/LUCENE-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195929#comment-13195929 ] Christian Moen commented on LUCENE-3730: If you want to try the new search mode, there's a simple Kuromoji web interface available at http://atilika.org/kuromoji that may be useful. After inputting some text and pressing enter, click normal mode to switch to search mode and test the various segmentation modes for the given input. Improved Kuromoji search mode segmentation/decompounding Key: LUCENE-3730 URL: https://issues.apache.org/jira/browse/LUCENE-3730 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen Attachments: LUCENE-3730_trunk.patch Kuromoji has a segmentation mode for search that uses a heuristic to promote additional segmentation of long candidate tokens to get a decompounding effect. This heuristic has been improved. Patch is coming up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3726) Default KuromojiAnalyzer to use search mode
[ https://issues.apache.org/jira/browse/LUCENE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195932#comment-13195932 ] Christian Moen commented on LUCENE-3726: I've improved the heuristic and submitted a patch to LUCENE-3730, which covers the issue. We can now deal with cases such as コニカミノルタホールディングス and many others just fine. The former becomes コニカ ミノルタ ホールディングス as we'd like. I think we should apply LUCENE-3730 before changing any defaults -- and also independently of changing any defaults. I think we should also make sure that the default we use for Lucene is consistent with Solr's default in {{schema.xml}} for {{text_ja}}. I'll do additional tests on a Japanese corpus and provide feedback, and we can use this as a basis for how to follow up. Hopefully, we'll have sufficient and good data to conclude on this. Default KuromojiAnalyzer to use search mode --- Key: LUCENE-3726 URL: https://issues.apache.org/jira/browse/LUCENE-3726 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.6, 4.0 Reporter: Robert Muir Kuromoji supports an option to segment text in a way more suitable for search, by preventing long compound nouns as indexing terms. In general 'how you segment' can be important depending on the application (see http://nlp.stanford.edu/pubs/acl-wmt08-cws.pdf for some studies on this in Chinese). The current algorithm punishes the cost based on some parameters (SEARCH_MODE_PENALTY, SEARCH_MODE_LENGTH, etc.) for long runs of kanji. Some questions (these can be separate future issues if any useful ideas come out):
* should these parameters continue to be static-final, or configurable?
* should POS also play a role in the algorithm (can/should we refine exactly what we decompound)?
* is the Tokenizer the best place to do this, or should we do it in a tokenfilter? or both? With a tokenfilter, one idea would be to also preserve the original indexing term, overlapping it: e.g. ABCD -> AB, CD, ABCD(posInc=0). From my understanding this tends to help with noun compounds in other languages, because the IDF of the original term boosts 'exact' compound matches. But does a tokenfilter provide the segmenter enough 'context' to do this properly?
Either way, I think as a start we should turn on what we have by default: it's likely a very easy win. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
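To make the penalty idea concrete, here is an illustrative sketch; the constants and the penalty shape are invented for the example and are not Kuromoji's actual values or code. The essence: candidate tokens whose kanji run exceeds a threshold get their path cost inflated, so the Viterbi decoder prefers splitting them into shorter tokens.
{code:java}
public class SearchModePenaltySketch {
  // assumed values, for illustration only
  static final int SEARCH_MODE_LENGTH = 3;
  static final int SEARCH_MODE_PENALTY = 10000;

  static int penalizedCost(int baseCost, int kanjiRunLength) {
    if (kanjiRunLength > SEARCH_MODE_LENGTH) {
      // the longer the compound, the higher the cost, so decompounding wins
      return baseCost + (kanjiRunLength - SEARCH_MODE_LENGTH) * SEARCH_MODE_PENALTY;
    }
    return baseCost;
  }

  public static void main(String[] args) {
    System.out.println(penalizedCost(5000, 2)); // short token: unchanged
    System.out.println(penalizedCost(5000, 8)); // long compound: heavily penalized
  }
}
{code}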
[jira] [Commented] (SOLR-3056) Introduce Japanese field type in schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195966#comment-13195966 ] Christian Moen commented on SOLR-3056: -- Robert, I've improved the search mode heuristic (see LUCENE-3730 with patch) and I've also provided some feedback on LUCENE-3726. Before providing a patch to use search mode as our default, I'd like to do some corpus-based testing to make sure overall segmentation quality is where I'd like it to be. As for this JIRA, I guess it has branched out into the following topics:
# Introduce a field type for Japanese in {{schema.xml}}
# Move Kuromoji to core to make it generally available in Solr
# Get rid of contrib altogether
There seems to be consensus from at least three people (excluding myself) to move Kuromoji to core. Do you prefer that we conclude on LUCENE-3726 before we follow up on getting Japanese support for Solr and Lucene working out-of-the-box -- or can we conclude on default search mode separately? I'm happy to start JIRAs for moving Kuromoji to get Japanese support in place if that's the best next course of action. Please advise. Many thanks. Introduce Japanese field type in schema.xml --- Key: SOLR-3056 URL: https://issues.apache.org/jira/browse/SOLR-3056 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen Kuromoji (LUCENE-3305) is now on both trunk and branch_3x (thanks again Robert, Uwe and Simon). It would be very good to get a default field type defined for Japanese in {{schema.xml}} so we get good Japanese out-of-the-box support in Solr. I've been playing with the below configuration today, which I think is a reasonable starting point for Japanese. There's a lot to be said about the various considerations necessary when searching Japanese, but perhaps a wiki page is more suitable to cover the wider topic? In order to make the below {{text_ja}} field type work, Kuromoji itself and its analyzers need to be seen by the Solr classloader. However, these are currently in contrib and I'm wondering if we should consider moving them to core to make them directly available. If there are concerns with additional memory usage, etc. for non-Japanese users, we can make sure resources are loaded lazily and only when needed in factory-land. Any thoughts?
{code:xml}
<!-- Text field type suitable for Japanese text using morphological analysis

     NOTE: Please copy the files
       contrib/analysis-extras/lucene-libs/lucene-kuromoji-x.y.z.jar
       dist/apache-solr-analysis-extras-x.y.z.jar
     to your Solr lib directory (i.e. example/solr/lib) before starting Solr
     (x.y.z refers to a version number).

     If you would like to optimize for precision, default the operator to AND with
       <solrQueryParser defaultOperator="AND"/>
     below (this file). Use OR if you would like to optimize for recall (default).
-->
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer>
    <!-- Kuromoji Japanese morphological analyzer/tokenizer

         Use search mode to get a noun-decompounding effect useful for search.

         Example: 関西国際空港 (Kansai International Airport) becomes
           関西 (Kansai) 国際 (international) 空港 (airport),
         so we get a match for 空港 (airport) as we would expect from a good search engine.

         Valid values for mode are:
           normal:   default segmentation
           search:   segmentation useful for search (extra compound splitting)
           extended: search mode with unigramming of unknown words (experimental)

         NOTE: Search mode improves segmentation for search at the expense
         of part-of-speech accuracy.
    -->
    <tokenizer class="solr.KuromojiTokenizerFactory" mode="search"/>
    <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
    <filter class="solr.KuromojiBaseFormFilterFactory"/>
    <!-- Optionally remove tokens with certain parts-of-speech
    <filter class="solr.KuromojiPartOfSpeechStopFilterFactory" tags="stopTags.txt" enablePositionIncrements="true"/>
    -->
    <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Lower-cases romaji characters -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-3060) add highlighter support to SurroundQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195968#comment-13195968 ] abhimanyu commented on SOLR-3060: - Thanks for your patch, but I am not able to apply it. I am not using an SVN checkout; please tell me how to apply this patch. add highlighter support to SurroundQParserPlugin - Key: SOLR-3060 URL: https://issues.apache.org/jira/browse/SOLR-3060 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Ahmet Arslan Priority: Minor Fix For: 4.0 Attachments: SOLR-3060.patch, SOLR-3060.patch Highlighter does not recognize SrndQuery family. http://search-lucene.com/m/FuDsU1sTjgM http://search-lucene.com/m/wD8c11gNTb61 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3060) add highlighter support to SurroundQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195969#comment-13195969 ] abhimanyu commented on SOLR-3060: - I am using {{patch -p0 -i SOLR-3060.patch --dry-run}} as mentioned in the docs, but I get an error saying that -p is not a correct option. Please tell me how to apply your patch. add highlighter support to SurroundQParserPlugin - Key: SOLR-3060 URL: https://issues.apache.org/jira/browse/SOLR-3060 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Ahmet Arslan Priority: Minor Fix For: 4.0 Attachments: SOLR-3060.patch, SOLR-3060.patch Highlighter does not recognize SrndQuery family. http://search-lucene.com/m/FuDsU1sTjgM http://search-lucene.com/m/wD8c11gNTb61 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3060) add highlighter support to SurroundQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195976#comment-13195976 ] Shalu Singh commented on SOLR-3060: --- I am facing the same problem and don't know how to apply SOLR-3060.patch. add highlighter support to SurroundQParserPlugin - Key: SOLR-3060 URL: https://issues.apache.org/jira/browse/SOLR-3060 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Ahmet Arslan Priority: Minor Fix For: 4.0 Attachments: SOLR-3060.patch, SOLR-3060.patch Highlighter does not recognize SrndQuery family. http://search-lucene.com/m/FuDsU1sTjgM http://search-lucene.com/m/wD8c11gNTb61 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org