[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019234#comment-13019234 ]

Simon Willnauer commented on LUCENE-2956:
-----------------------------------------

bq. Though it worries me a little how complex the whole delete/update logic is becoming (not only the part this patch adds).

I could not agree more. It has been very complex making all the tests pass and figuring out all the nifty little corner cases here. A different, somewhat simpler approach would be great. Eventually, for searchable RAM buffers, we might need to switch to sequence ids anyway, but I think for landing DWPT on trunk we can go with the current approach.

I will update the latest patch, commit it to the branch, and merge with trunk again. Once that is done I will set up a Hudson build for RT so we give it a little exercise while we prepare moving to trunk.

Support updateDocument() with DWPTs
-----------------------------------

    Key: LUCENE-2956
    URL: https://issues.apache.org/jira/browse/LUCENE-2956
    Project: Lucene - Java
    Issue Type: Bug
    Components: Index
    Affects Versions: Realtime Branch
    Reporter: Michael Busch
    Assignee: Simon Willnauer
    Priority: Minor
    Fix For: Realtime Branch
    Attachments: LUCENE-2956.patch

With separate DocumentsWriterPerThreads (DWPT) it can currently happen that the delete part of an updateDocument() is flushed and committed separately from the corresponding new document. We need to make sure that updateDocument() is always an atomic operation from an IW.commit() and IW.getReader() perspective. See LUCENE-2324 for more details.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
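The atomicity requirement described in the issue can be illustrated with a small self-contained model in plain Java (illustrative only, not Lucene's actual implementation): if the delete half and the add half of an update are not applied under one lock, a concurrent commit/snapshot can observe the delete without the new document.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of an atomic update: the delete and the add happen
// under one lock, so a concurrent snapshot ("commit") can never observe
// the intermediate state where the old doc is gone but the new one is
// not yet visible. Class and method names are illustrative.
class UpdateModel {
    private final List<String> docs = new ArrayList<>();

    // Atomic update: remove any old version of the doc, then add the new
    // one, all while holding the same monitor a snapshot would need.
    synchronized void updateDocument(String id, String newDoc) {
        docs.removeIf(d -> d.startsWith(id + ":"));
        docs.add(id + ":" + newDoc);
    }

    // A snapshot models what IW.commit()/IW.getReader() may observe.
    synchronized List<String> snapshot() {
        return new ArrayList<>(docs);
    }
}
```

With per-thread buffers the same guarantee has to hold across all DWPTs at flush time, which is exactly what makes the real implementation hard.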
Re: Patch for http_proxy support in solr-ruby client
Hi Otis,

The fork you're talking about is mine! But the repo I forked is not official, so I am trying to find out where the official version is so I can patch it.

D

On 13/04/2011 04:45, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Hi,

Hm, maybe you are asking where solr-ruby actually lives and is being developed? I'm not sure. I see it under solr/client/ruby/solr-ruby (no new development in ages?), but I also see an *active* solr-ruby fork over on https://github.com/bbcrd/solr-ruby . So if you want to contribute to solr-ruby on Github, get yourself a Github account, fork that solr-ruby, make your change, and submit it via a pull request. This is separate from Solr @ Apache.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Duncan Robertson duncan.robert...@bbc.co.uk
To: dev@lucene.apache.org
Sent: Tue, April 12, 2011 4:36:17 AM
Subject: Patch for http_proxy support in solr-ruby client

Hi,

I have a patch for adding http_proxy support to the solr-ruby client. I thought the project was managed via Github, but this turns out not to be the case. Is the process the same as for Solr itself?

https://github.com/bbcrd/solr-ruby/compare/5b06e66f4e%5E...a76aee983e

Best,
Duncan

http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
[jira] [Updated] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2956:
------------------------------------
    Attachment: LUCENE-2956.patch

Here is an updated patch that fixes some spellings, adds atomic updates for Term[] and Query[], and removes the LogMergePolicy restriction from TestRollingUpdates.
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7061 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7061/

14 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety

Error Message:
Error occurred in thread Thread-110: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/1/test8006296579247039339tmp/_e_2.doc (Too many open files in system)

Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-110: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/1/test8006296579247039339tmp/_e_2.doc (Too many open files in system)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/1/test8006296579247039339tmp/_e_2.doc (Too many open files in system)
    at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:833)

REGRESSION: org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeDocCount0

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/test8275723700845306539tmp/_0_0.tiv (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/test8275723700845306539tmp/_0_0.tiv (Too many open files in system)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:448)
    at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:312)
    at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:348)
    at org.apache.lucene.index.codecs.VariableGapTermsIndexWriter.<init>(VariableGapTermsIndexWriter.java:161)
    at org.apache.lucene.index.codecs.standard.StandardCodec.fieldsConsumer(StandardCodec.java:58)
    at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.<init>(PerFieldCodecWrapper.java:64)
    at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:54)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:78)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:103)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:65)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:55)
    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2497)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2462)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1180)
    at org.apache.lucene.index.TestIndexWriterMergePolicy.addDoc(TestIndexWriterMergePolicy.java:221)
    at org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeDocCount0(TestIndexWriterMergePolicy.java:189)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)

REGRESSION: org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull

Error Message:
addIndexes(Directory[]) + optimize() hit IOException after disk space was freed up

Stack Trace:
junit.framework.AssertionFailedError: addIndexes(Directory[]) + optimize() hit IOException after disk space was freed up
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
    at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull(TestIndexWriterOnDiskFull.java:327)

REGRESSION: org.apache.lucene.index.TestLongPostings.testLongPostings

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/longpostings.6978566692871504462/_14_0.tib (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/longpostings.6978566692871504462/_14_0.tib (Too many open files in system)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:448)
    at
TestIndexWriterDelete#testUpdatesOnDiskFull can false fail
In TestIndexWriterDelete#testUpdatesOnDiskFull, especially between lines 538 and 553, we could get a random exception from the MockDirectoryWrapper, which makes the test fail since we are not catching / expecting those exceptions. I can't make this fail on trunk even in 1000 runs, but on realtime it fails quickly after I merged this morning. I think we should just disable the random exception for this part and re-enable it after we are done, see patch below!

Thoughts?

Index: lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
===================================================================
--- lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java (revision 1091721)
+++ lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java (working copy)
@@ -536,7 +536,9 @@
         fail(testName + " hit IOException after disk space was freed up");
       }
     }
-
+    // prevent throwing a random exception here!!
+    final double randomIOExceptionRate = dir.getRandomIOExceptionRate();
+    dir.setRandomIOExceptionRate(0.0);
     if (!success) {
       // Must force the close else the writer can have
       // open files which cause exc in MockRAMDir.close
@@ -549,6 +551,7 @@
       _TestUtil.checkIndex(dir);
       TestIndexWriter.assertNoUnreferencedFiles(dir, "after writer.close");
     }
+    dir.setRandomIOExceptionRate(randomIOExceptionRate);
     // Finally, verify index is not corrupt, and, if
     // we succeeded, we see all docs changed, and if

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
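The save/disable/restore idea in the patch can also be expressed as a generic try/finally helper. This sketch uses a hypothetical FaultInjector stand-in for MockDirectoryWrapper (names are illustrative, not the Lucene test API); the try/finally guarantees the old exception rate is restored even if the protected section throws.

```java
// Generic pattern: temporarily force the injected-failure rate to zero
// around a section that cannot tolerate random exceptions, restoring
// the saved rate afterwards. FaultInjector is a made-up stand-in for
// MockDirectoryWrapper, used only to demonstrate the pattern.
class FaultInjector {
    private double randomIOExceptionRate = 0.1;

    double getRandomIOExceptionRate() { return randomIOExceptionRate; }
    void setRandomIOExceptionRate(double rate) { randomIOExceptionRate = rate; }

    void runWithoutInjectedFailures(Runnable section) {
        final double saved = getRandomIOExceptionRate();
        setRandomIOExceptionRate(0.0);       // prevent random exceptions here
        try {
            section.run();
        } finally {
            setRandomIOExceptionRate(saved); // always restore the old rate
        }
    }
}
```

Wrapping the unprotected section in try/finally (rather than two bare calls, as in the patch) also keeps the rate restored if an unexpected exception escapes the guarded block.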
[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019298#comment-13019298 ]

Simon Willnauer commented on LUCENE-2956:
-----------------------------------------

I committed that patch and merged with trunk.
Re: Numerical ids for terms?
On Tue, 2011-04-12 at 11:41 +0200, Gregor Heinrich wrote:

Hi -- has there been any effort to create a numerical representation of Lucene indices? That is, to use the Lucene Directory backend as a large term-document matrix at index level. As this would require a bijective mapping between terms (per-field, as customary in Lucene) and a numerical index (integer, monotonic from 0 to numTerms()-1), I guess this requires some special modifications to the Lucene core.

Maybe you're thinking about something like TermsEnum?
https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/index/TermsEnum.html

It provides ordinal access to terms, represented as longs. In order to make the access work at index level rather than segment level, you will have to merge the ordinals from the different segments. Unfortunately it is optional whether a codec supports ordinal-based term access, and the default codec does not, so you will have to explicitly select a codec when you build your index.
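The segment-to-index ordinal merge mentioned above can be sketched with a small self-contained model (class and method names are illustrative, not the Lucene API): merge each segment's sorted term set into one global sorted list, and let a term's index-level ordinal be its position in that list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of index-level term ordinals: per-segment term sets are
// merged (sorted, deduplicated) and the global ordinal of a term is
// its rank in the merged order, giving a bijective term <-> ordinal
// mapping from 0 to numTerms()-1.
class GlobalTermOrds {
    private final List<String> ordToTerm = new ArrayList<>();

    GlobalTermOrds(List<List<String>> segments) {
        TreeSet<String> merged = new TreeSet<>();   // sorted + deduplicated
        for (List<String> seg : segments) merged.addAll(seg);
        ordToTerm.addAll(merged);                   // ord i -> i-th term in sort order
    }

    String term(int ord) { return ordToTerm.get(ord); }

    // Negative result means "term not in the index", following the
    // java.util.Collections.binarySearch convention.
    int ord(String term) {
        return java.util.Collections.binarySearch(ordToTerm, term);
    }

    int numTerms() { return ordToTerm.size(); }
}
```

A real implementation would of course stream the per-segment TermsEnums in a merge rather than materialize all terms in memory; this only shows the mapping being discussed.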
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019326#comment-13019326 ]

Simon Willnauer commented on LUCENE-3018:
-----------------------------------------

Varun, pastebin links are not ideal for working on issues here. You can post small snippets directly or upload a patch so we can review it. Nevertheless, the example you have added to pastebin seems like just a generic example. Can you try to integrate it into the trunk/lucene/contrib/misc/build.xml file and make it compile NativePosixUtil.cpp? Once you have that, you can create a patch with svn diff > LUCENE-3018.patch and upload it. If you need 3rd-party libs like ant-contrib, you can upload them here too.

simon

Lucene Native Directory implementation need automated build
-----------------------------------------------------------

    Key: LUCENE-3018
    URL: https://issues.apache.org/jira/browse/LUCENE-3018
    Project: Lucene - Java
    Issue Type: Wish
    Components: Build
    Affects Versions: 4.0
    Reporter: Simon Willnauer
    Assignee: Varun Thacker
    Priority: Minor
    Fix For: 4.0

Currently the native directory impl in contrib/misc requires manual action to compile the C code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html , yet it would be nice if we had an ant task and documentation for all platforms describing how to compile it and set up the prerequisites.
Re: Patch for http_proxy support in solr-ruby client
Duncan -

I'm the original creator of solr-ruby and put it under Solr's svn. But many folks are now using RSolr, and even in our own (JRuby-based) product we simply use Net::HTTP and not a library like solr-ruby or RSolr. I don't personally have an incentive to continue to maintain solr-ruby, so maybe your fork is now official? Though the git craze has made me wary, because so many "official" versions are simply someone's personal fork. We can pull solr-ruby from Solr's svn eventually, as something else more official takes its place.

	Erik

On Apr 13, 2011, at 04:13 , Duncan Robertson wrote:

Hi Otis,

The fork you're talking about is mine! But the repo I forked is not official, so I am trying to find out where the official version is so I can patch it.

D

[...]
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019329#comment-13019329 ]

Uwe Schindler commented on LUCENE-3018:
---------------------------------------

Hi, I suggest using ant-contrib for compiling the C parts. It integrates fine into our build infrastructure and supplies ANT tasks for compiling and linking: [http://ant-contrib.sourceforge.net/cpptasks/index.html] I think your pastebin example is using this. We only need to add the JAR to the lib folder of Lucene, so ANT can load the plugin.
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019356#comment-13019356 ]

Varun Thacker commented on LUCENE-3018:
---------------------------------------

I made the mistake of adding only the ant-contrib jar and trying to compile with it. This requires cpptasks, which is not part of ant-contrib. Link to the cpptasks jar: http://sourceforge.net/projects/ant-contrib/files/ant-contrib/cpptasks-1.0-beta4/

After adding this jar, I was able to compile the code.
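For reference, a hedged sketch of what a cpptasks-based target in contrib/misc's build.xml might look like. The taskdef resource name and `<cc>` element follow the cpptasks documentation, but the paths, target name, and attribute choices here are assumptions for illustration, not the build file that was eventually committed:

```xml
<!-- Hypothetical sketch only: load cpptasks from the lib folder, then
     compile NativePosixUtil.cpp into a shared library. Paths and the
     target name are illustrative assumptions. -->
<taskdef resource="cpptasks.tasks" classpath="lib/cpptasks.jar"/>

<target name="build-native" description="Compile NativePosixUtil.cpp">
  <mkdir dir="build/native"/>
  <cc outtype="shared"
      outfile="build/native/NativePosixUtil"
      objdir="build/native">
    <fileset dir="src/java/org/apache/lucene/store"
             includes="NativePosixUtil.cpp"/>
    <!-- JNI headers; the exact include path is platform-dependent -->
    <includepath path="${java.home}/../include"/>
  </cc>
</target>
```

The main open question such a target leaves is exactly what the later comments discuss: per-platform compiler/linker setup and where the JNI include directories live.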
Re: [HUDSON] Lucene-trunk - Build # 1528 - Still Failing
GC overhead limit exceeded...

Mike

http://blog.mikemccandless.com

On Tue, Apr 12, 2011 at 10:43 PM, Apache Hudson Server hud...@hudson.apache.org wrote:

Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1528/

1 tests failed.

REGRESSION: org.apache.lucene.index.TestNRTThreads.testNRTThreads

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
    at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521)

Build Log (for compile errors):
[...truncated 11900 lines...]
Re: TestIndexWriterDelete#testUpdatesOnDiskFull can false fail
+1

Mike

http://blog.mikemccandless.com

On Wed, Apr 13, 2011 at 5:58 AM, Simon Willnauer simon.willna...@googlemail.com wrote:

In TestIndexWriterDelete#testUpdatesOnDiskFull, especially between lines 538 and 553, we could get a random exception from the MockDirectoryWrapper, which makes the test fail since we are not catching / expecting those exceptions. I can't make this fail on trunk even in 1000 runs, but on realtime it fails quickly after I merged this morning. I think we should just disable the random exception for this part and re-enable it after we are done, see patch below!

[...]
[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019370#comment-13019370 ]

Jason Rutherglen commented on LUCENE-2956:
------------------------------------------

Simon, nice work. I agree with Michael B. that the deletes are super complex. We had discussed using sequence ids for all segments (not just the RT-enabled DWPT ones), however we never worked out a specification, e.g. for things like wrap-around if a primitive short[] was used. Shall we start again on LUCENE-2312? I think we still need/want to use sequence ids there. The RT DWPTs shouldn't have so many documents that using a long[] for the sequence ids is too RAM consuming?
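The sequence-id idea can be sketched in a few lines of plain Java (illustrative only, not a Lucene implementation, and names are made up): every add or delete operation draws a monotonically increasing id, and a delete applies exactly to the documents whose ids are smaller than its own, which is what makes a wrap-around-prone short[] risky and a long[] the safe but more RAM-hungry choice.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of per-document sequence ids in a single buffer: a delete
// affects precisely the documents that were added before it, determined
// by comparing sequence ids rather than tracking delete terms per flush.
class SeqIdDeletes {
    private final AtomicLong seqGen = new AtomicLong();
    private final long[] docSeqIds;   // one long per buffered doc (the RAM cost discussed)
    private int docCount = 0;

    SeqIdDeletes(int maxDocs) { docSeqIds = new long[maxDocs]; }

    // Each added document records the sequence id of its add operation.
    int addDocument() {
        docSeqIds[docCount] = seqGen.incrementAndGet();
        return docCount++;
    }

    // A delete draws its own sequence id and applies to every doc whose
    // id is smaller; returns how many buffered docs it affects.
    int deleteAll() {
        long deleteSeq = seqGen.incrementAndGet();
        int affected = 0;
        for (int i = 0; i < docCount; i++) {
            if (docSeqIds[i] < deleteSeq) affected++;
        }
        return affected;
    }
}
```

With ordering carried by the ids themselves, an update's delete and add can be reasoned about as two operations with adjacent sequence ids instead of special-cased atomicity bookkeeping, which is the simplification being discussed.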
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7062 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7062/

1 tests failed.

REGRESSION: org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2894)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589)
    at java.lang.StringBuffer.append(StringBuffer.java:337)
    at java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617)
    at org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93)
    at org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304)
    at org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1076)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1008)

Build Log (for compile errors):
[...truncated 9243 lines...]
need help in constructing a query
Need help in constructing a Solr query: I need the values for a field, but only those values which do not contain an embedded space. The value of the indexed field should not have an embedded space. Please help.

Thanks,
Premila
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019377#comment-13019377 ]

Tommaso Teofili commented on SOLR-2436:
---------------------------------------

Hello Koji, your patch seems fine to me from the functional point of view. However, I don't think the SolrUIMAConfigurationReader should be emptied; I wouldn't remove it, preferring instead to give it the simple responsibility of reading the args, without the previous explicit Node traversal but, as you did, using the Solr way. I also made some fixes to remove warnings while getting objects from the NamedList.

move uimaConfig to under the uima's update processor in solrconfig.xml
----------------------------------------------------------------------

    Key: SOLR-2436
    URL: https://issues.apache.org/jira/browse/SOLR-2436
    Project: Solr
    Issue Type: Improvement
    Affects Versions: 3.1
    Reporter: Koji Sekiguchi
    Priority: Minor
    Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch

Solr contrib UIMA has its config just beneath <config>. I think it should move under uima's update processor tag.
[jira] [Updated] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommaso Teofili updated SOLR-2436:
----------------------------------
    Attachment: SOLR-2436-3.patch
Re: Patch for http_proxy support in solr-ruby client
Thanks Erik,

I hadn't seen RSolr, and it looks like it fixes all the problems I was having. Maybe rather than keeping many solutions around, I'll just take a look at this one.

Duncan

On 13/04/2011 14:51, Erik Hatcher erik.hatc...@gmail.com wrote:

Duncan - I'm the original creator of solr-ruby and put it under Solr's svn. But many folks are now using RSolr, and even in our own (JRuby-based) product we simply use Net::HTTP and not a library like solr-ruby or RSolr. I don't personally have an incentive to continue to maintain solr-ruby, so maybe your fork is now official? Though the git craze has made me wary, because so many "official" versions are simply someone's personal fork. We can pull solr-ruby from Solr's svn eventually, as something else more official takes its place.

	Erik

[...]
Is the process the same as for Solr itself? https://github.com/bbcrd/solr-ruby/compare/5b06e66f4e%5E...a76aee983e Best, Duncan http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
GSoC: LUCENE-2308: Separately specify a field's type
Hi all, if everything goes well I'll be delighted to be part of this project this summer, together with my assigned mentor Mike. My task will be to introduce new classes to Lucene core that make it possible to separate a field's Lucene properties from its value ( https://issues.apache.org/jira/browse/LUCENE-2308). As you can imagine, this will have a large impact on Lucene and Solr, so we need to think this through thoroughly. Changes will include: - Introduction of a FieldType class that will hold all the extra properties now stored inside a Field instance, other than the field value itself. - A new FieldTypeAttribute interface to handle extension with new field properties, inspired by IndexWriterConfig. - Refactoring and dividing of settings for term frequency and positioning can also be done (LUCENE-2048, https://issues.apache.org/jira/browse/LUCENE-2048) - Discuss possible effects of completion of LUCENE-2310 (https://issues.apache.org/jira/browse/LUCENE-2310) on this project - An adequate factory class for easier configuration of new Field instances, together with manually added new FieldTypeAttributes - FieldType, once instantiated, is read-only. Only the field's value can be changed. - A simple hierarchy of Field classes with core properties logically pre-defaulted, e.g.: - NumberField, - StringField, - TextField, - NonIndexedField. My questions and issues: - Backward compatibility? Will this go to Lucene 3.0? - What is the best way to break this into small baby steps? Kindly, Nikola Tanković
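A rough sketch of the proposed separation (hypothetical names and API, not the committed LUCENE-2308 design): type-level settings are frozen at construction and shareable, while only the value on the field instance stays mutable:

```java
// Hypothetical sketch of LUCENE-2308's proposal: field *settings* live in an
// immutable FieldType, field *values* live in the Field instance. All names
// here are illustrative assumptions, not the final API.
public class FieldTypeSketch {

    static final class FieldType {
        final boolean indexed, stored, tokenized;
        FieldType(boolean indexed, boolean stored, boolean tokenized) {
            this.indexed = indexed;
            this.stored = stored;
            this.tokenized = tokenized;
        }
        // read-only: no setters, so one FieldType can be shared by many fields
    }

    static final class TextField {
        // core properties logically pre-defaulted for full-text content
        static final FieldType TYPE = new FieldType(true, true, true);
        private String value;
        TextField(String value) { this.value = value; }
        void setValue(String value) { this.value = value; } // only the value changes
        String value() { return value; }
    }

    public static void main(String[] args) {
        TextField body = new TextField("first document");
        body.setValue("second document"); // reuse the field, swap only its value
        System.out.println(body.value() + " / indexed=" + TextField.TYPE.indexed);
    }
}
```

The point of the read-only type object is that indexing code can trust it not to change mid-flight, which is one of the corner cases the current mutable Field setters create.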
[jira] [Created] (LUCENE-3026) smartcn analysis throw NullPointer exception when the length of analysed text over 32767
smartcn analysis throw NullPointer exception when the length of analysed text over 32767 Key: LUCENE-3026 URL: https://issues.apache.org/jira/browse/LUCENE-3026 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.1, 4.0 Reporter: wangzhenghang That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:

public List<SegToken> makeIndex() {
  List<SegToken> result = new ArrayList<SegToken>();
  int s = -1, count = 0, size = tokenListTable.size();
  List<SegToken> tokenList;
  short index = 0;
  while (count < size) {
    if (isStartExist(s)) {
      tokenList = tokenListTable.get(s);
      for (SegToken st : tokenList) {
        st.index = index;
        result.add(st);
        index++;
      }
      count++;
    }
    s++;
  }
  return result;
}

'short index = 0;' should be 'int index = 0;'. And that's reported here http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2, http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11; the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
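The 32767 boundary in the summary is exactly Short.MAX_VALUE; a two-line demonstration of why the short counter breaks on the 32768th token:

```java
// Why 'short index' fails past 32767 tokens: incrementing Short.MAX_VALUE
// wraps to -32768, so subsequent tokens receive negative indexes, which is
// what ultimately surfaces as the NullPointerException in LUCENE-3026.
public class ShortOverflowDemo {
    public static void main(String[] args) {
        short index = Short.MAX_VALUE; // 32767, the text-length limit in the report
        index++;                       // overflow: wraps around
        System.out.println(index);     // prints -32768
    }
}
```

Switching the counter to int (max 2^31 - 1) pushes the same wraparound far beyond any realistic token count.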
[jira] [Updated] (LUCENE-3026) smartcn analysis throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangzhenghang updated LUCENE-3026: -- Description: That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:

public List<SegToken> makeIndex() {
  List<SegToken> result = new ArrayList<SegToken>();
  int s = -1, count = 0, size = tokenListTable.size();
  List<SegToken> tokenList;
  short index = 0;
  while (count < size) {
    if (isStartExist(s)) {
      tokenList = tokenListTable.get(s);
      for (SegToken st : tokenList) {
        st.index = index;
        result.add(st);
        index++;
      }
      count++;
    }
    s++;
  }
  return result;
}

here 'short index = 0;' should be 'int index = 0;'. And that's reported here http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11; the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java was: That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:

public List<SegToken> makeIndex() {
  List<SegToken> result = new ArrayList<SegToken>();
  int s = -1, count = 0, size = tokenListTable.size();
  List<SegToken> tokenList;
  short index = 0;
  while (count < size) {
    if (isStartExist(s)) {
      tokenList = tokenListTable.get(s);
      for (SegToken st : tokenList) {
        st.index = index;
        result.add(st);
        index++;
      }
      count++;
    }
    s++;
  }
  return result;
}

'short index = 0;' should be 'int index = 0;'.
And that's reported here http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2, http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11; the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java smartcn analysis throw NullPointer exception when the length of analysed text over 32767 Key: LUCENE-3026 URL: https://issues.apache.org/jira/browse/LUCENE-3026 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.1, 4.0 Reporter: wangzhenghang That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:

public List<SegToken> makeIndex() {
  List<SegToken> result = new ArrayList<SegToken>();
  int s = -1, count = 0, size = tokenListTable.size();
  List<SegToken> tokenList;
  short index = 0;
  while (count < size) {
    if (isStartExist(s)) {
      tokenList = tokenListTable.get(s);
      for (SegToken st : tokenList) {
        st.index = index;
        result.add(st);
        index++;
      }
      count++;
    }
    s++;
  }
  return result;
}

here 'short index = 0;' should be 'int index = 0;'. And that's reported here http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11; the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019391#comment-13019391 ] Jason Rutherglen commented on LUCENE-2312: -- In the current patch, I'm copying the parallel array for the end of a term's postings per reader [re]open. However, in the case where we're opening a reader after each document is indexed, this is wasteful. We can simply queue the term ids from the last indexed document, and only copy the newly updated values over to the 'read' only consistent parallel array. Search on IndexWriter's RAM Buffer -- Key: LUCENE-2312 URL: https://issues.apache.org/jira/browse/LUCENE-2312 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Assignee: Michael Busch Fix For: Realtime Branch Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch In order to offer users near-realtime search without incurring an indexing performance penalty, we can implement search on IndexWriter's RAM buffer. This is the buffer that is filled in RAM as documents are indexed. Currently the RAM buffer is flushed to the underlying directory (usually disk) before being made searchable. Today's Lucene-based NRT systems must incur the cost of merging segments, which can slow indexing. Michael Busch has good suggestions regarding how to handle deletes using max doc ids. https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923 The area that isn't fully fleshed out is the terms dictionary, which needs to be sorted prior to queries executing. Currently IW implements a specialized hash table. Michael B has a suggestion here: https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915 -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
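Jason's optimization can be sketched with plain arrays (a toy model, not the actual DWPT code): clone the parallel array once, then on each reopen copy over only the slots touched since the last snapshot instead of re-cloning everything:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Toy model of the idea above: keep a read-only snapshot of the write-side
// parallel array, and on reopen copy only the term ids queued while indexing
// the last document. Array contents and sizes are purely illustrative.
public class IncrementalSnapshot {

    // copy just the touched slots from the live array into the snapshot
    static void refresh(int[] snapshot, int[] live, Queue<Integer> touched) {
        while (!touched.isEmpty()) {
            int termId = touched.poll();
            snapshot[termId] = live[termId];
        }
    }

    public static void main(String[] args) {
        int[] lastPosting = new int[8];          // termId -> end of term's postings
        int[] snapshot = lastPosting.clone();    // consistent copy for searchers
        Queue<Integer> touched = new ArrayDeque<>();
        for (int termId : new int[]{2, 5}) {     // one document touches two terms
            lastPosting[termId] += 10;           // stand-in for real posting writes
            touched.add(termId);
        }
        refresh(snapshot, lastPosting, touched); // per-document "reopen"
        System.out.println(Arrays.toString(snapshot)); // [0, 0, 10, 0, 0, 10, 0, 0]
    }
}
```

The win is that the per-reopen cost becomes proportional to the number of terms in the last document rather than to the size of the whole terms dictionary.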
Re: Numerical ids for terms?
Thanks Toke and Kirill -- I guess that's the way to go (at least until v4.0). Best regards gregor On 4/13/11 3:42 PM, Toke Eskildsen wrote: On Tue, 2011-04-12 at 11:41 +0200, Gregor Heinrich wrote: Hi -- has there been any effort to create a numerical representation of Lucene indices? That is, to use the Lucene Directory backend as a large term-document matrix at index level. As this would require a bijective mapping between terms (per-field, as customary in Lucene) and a numerical index (integer, monotonic from 0 to numTerms()-1), I guess this requires some special modifications to the Lucene core. Maybe you're thinking about something like TermsEnum? https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/index/TermsEnum.html It provides ordinal access to terms, represented as longs. In order to make the access at index level rather than segment level, you will have to perform a merge of the ordinals from the different segments. Unfortunately it is optional whether a codec supports ordinal-based terms access, and the default codec does not, so you will have to explicitly select a codec when you build your index. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
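The merge step Toke mentions can be illustrated without Lucene: each segment yields its terms in sorted order (as TermsEnum does), and a whole-index term-to-ordinal mapping, monotonic from 0 to numTerms()-1, falls out of merging the sorted lists. The hard-coded segment data here is purely illustrative:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Sketch of merging per-segment term ordinals into index-level ordinals.
public class GlobalOrdinals {

    static Map<String, Long> globalOrds(List<List<String>> segments) {
        TreeSet<String> merged = new TreeSet<>();      // sorted, de-duplicated
        for (List<String> seg : segments) merged.addAll(seg);
        Map<String, Long> ords = new LinkedHashMap<>();
        long ord = 0;                                  // TermsEnum ords are longs
        for (String term : merged) ords.put(term, ord++);
        return ords;
    }

    public static void main(String[] args) {
        List<String> seg1 = Arrays.asList("apple", "lucene", "search");
        List<String> seg2 = Arrays.asList("apple", "index", "search");
        System.out.println(globalOrds(Arrays.asList(seg1, seg2)));
        // {apple=0, index=1, lucene=2, search=3}
    }
}
```

Building the full map up front costs memory proportional to the number of distinct terms; a real implementation would more likely merge the segment enums lazily, but the resulting numbering is the same.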
[jira] [Updated] (SOLR-64) strict hierarchical facets
[ https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Relephant updated SOLR-64: -- Attachment: SOLR-64_3.1.0.diff Hi all, we have just tried to apply solr-64 to 3.1. Attached SOLR-64_3.1.0.diff. Hope that helps. strict hierarchical facets -- Key: SOLR-64 URL: https://issues.apache.org/jira/browse/SOLR-64 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Assignee: Koji Sekiguchi Fix For: 4.0 Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, SOLR-64_3.1.0.diff Strict Facet Hierarchies... each tag has at most one parent (a tree). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-64) strict hierarchical facets
[ https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Relephant updated SOLR-64: -- Attachment: (was: SOLR-64_3.1.0.diff) strict hierarchical facets -- Key: SOLR-64 URL: https://issues.apache.org/jira/browse/SOLR-64 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Assignee: Koji Sekiguchi Fix For: 4.0 Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, SOLR-64_3.1.0.patch Strict Facet Hierarchies... each tag has at most one parent (a tree). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-64) strict hierarchical facets
[ https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Relephant updated SOLR-64: -- Attachment: SOLR-64_3.1.0.patch strict hierarchical facets -- Key: SOLR-64 URL: https://issues.apache.org/jira/browse/SOLR-64 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Assignee: Koji Sekiguchi Fix For: 4.0 Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, SOLR-64_3.1.0.patch Strict Facet Hierarchies... each tag has at most one parent (a tree). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-64) strict hierarchical facets
[ https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019399#comment-13019399 ] Relephant edited comment on SOLR-64 at 4/13/11 4:04 PM: Hi all, we have just tried to apply solr-64 to 3.1. Attached SOLR-64_3.1.0.patch. Hope that helps. was (Author: relephant): Hi all, we have just tried to apply solr-64 to 3.1. Attached SOLR-64_3.1.0.diff. Hope that helps. strict hierarchical facets -- Key: SOLR-64 URL: https://issues.apache.org/jira/browse/SOLR-64 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Assignee: Koji Sekiguchi Fix For: 4.0 Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, SOLR-64_3.1.0.patch Strict Facet Hierarchies... each tag has at most one parent (a tree). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2939) Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream
[ https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019421#comment-13019421 ] Mark Miller commented on LUCENE-2939: - Okay - I'm going to commit to trunk shortly. Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream Key: LUCENE-2939 URL: https://issues.apache.org/jira/browse/LUCENE-2939 Project: Lucene - Java Issue Type: Bug Components: contrib/highlighter Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 3.1.1, 3.2, 4.0 Attachments: LUCENE-2939.patch, LUCENE-2939.patch, LUCENE-2939.patch, LUCENE-2939.patch Huge documents can be drastically slower than need be because the entire field is added to the memory index. This cost can be greatly reduced in many cases if we try and respect maxDocCharsToAnalyze. Things can be improved even further by respecting this setting with CachingTokenStream. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
PayloadProcessorProvider Usage
Hey, In Lucene 3.1 we've introduced PayloadProcessorProvider, which allows you to rewrite payloads of terms during a merge. The main scenario is merging indexes, where you want to rewrite/remap the payloads of the incoming indexes, but one can certainly use it to rewrite the payloads of a term in a given index. When we worked on it, we thought of two ways a user can rewrite payloads while merging indexes: 1) Set PPP on the target IW and call addIndexes(IndexReader); the PPP will be applied to the incoming directories only. 2) Set PPP on the source IW, call IW.optimize(), then use targetIW.addIndexes(Directory). The latter is better since in both cases the incoming segments are rewritten anyway; however, in the first case you might run into merging segments of the target index as well, something you might want to avoid (that was the purpose of the optimize() + addIndexes(Directory) approach). But it turns out the latter is not so easy to achieve. If the source index has only 1 segment (at least in my case, ~100% of the time), then calling optimize() doesn't do anything, because the MP thinks the index is already optimized and returns no MergeSpec. To overcome this, I wrote a ForceOptimizeMP which extends LogMP and forces an optimize even if there is only one segment. Another option is to set noCFSRatio to 1.0 and flip the useCompoundFile flag (i.e. if the source is compound, create non-compound and vice versa). That can work too, but I don't think it's very good, because the source index will be changed from compound to non-compound (or vice versa), which is something the app didn't want. So I think option 1 is better, but I wanted to ask if someone knows of a better way to achieve this? Shai
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019465#comment-13019465 ] Uwe Schindler commented on SOLR-2436: - I just looked at the patch; is SOLR-2436_2.patch still active, or was it replaced by Koji's? I ask because:

{noformat}
+    try {
+      final InputSource is = new InputSource(loader.openConfig(uimaConfigFile));
+      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      // only enable xinclude, if SystemId is present (makes no sense otherwise)
+      if (is.getSystemId() != null) {
+        try {
+          dbf.setXIncludeAware(true);
+          dbf.setNamespaceAware(true);
+        } catch (UnsupportedOperationException e) {
+          LOG.warn("XML parser doesn't support XInclude option");
+        }
+      }
{noformat}

This XInclude handling is broken (the if-clause never gets executed). We have a new framework that makes XML loading from ResourceLoaders work correctly, even with relative paths! Just look at the example committed during the cleanup issue (look at other places in Solr where DocumentBuilders or XMLStreamReaders are instantiated). The new Solr way to load such files is a special URI scheme that is internally used to resolve ResourceLoader resources correctly (see SOLR-1656). The latest patch looks fine, it embeds the config directly, which seems much more consistent. move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag. -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019470#comment-13019470 ] Uwe Schindler commented on SOLR-2436: - Here is the new way to load XML from ResourceLoaders in Solr (taken from Config). This code also intercepts errors and warnings and logs them correctly (parsers tend to write them to System.err):

{code:java}
is = new InputSource(loader.openConfig(name));
is.setSystemId(SystemIdResolver.createSystemIdFromResourceName(name));
// only enable xinclude, if a SystemId is available
if (is.getSystemId() != null) {
  try {
    dbf.setXIncludeAware(true);
    dbf.setNamespaceAware(true);
  } catch (UnsupportedOperationException e) {
    log.warn(name + " XML parser doesn't support XInclude option");
  }
}
final DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new SystemIdResolver(loader));
db.setErrorHandler(xmllog);
try {
  doc = db.parse(is);
} finally {
  // some XML parsers are broken and don't close the byte stream (but they should according to spec)
  IOUtils.closeQuietly(is.getByteStream());
}
{code}

move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019470#comment-13019470 ] Uwe Schindler edited comment on SOLR-2436 at 4/13/11 6:14 PM: -- Here is the new way to load XML from ResourceLoaders in Solr (taken from Config). This code also intercepts errors and warnings and logs them correctly (parsers tend to write them to System.err):

{code:java}
public static final Logger log = LoggerFactory.getLogger(Config.class);
private static final XMLErrorLogger xmllog = new XMLErrorLogger(log);
...
final InputSource is = new InputSource(loader.openConfig(name));
is.setSystemId(SystemIdResolver.createSystemIdFromResourceName(name));
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// only enable xinclude, if a SystemId is available
if (is.getSystemId() != null) {
  try {
    dbf.setXIncludeAware(true);
    dbf.setNamespaceAware(true);
  } catch (UnsupportedOperationException e) {
    log.warn(name + " XML parser doesn't support XInclude option");
  }
}
final DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new SystemIdResolver(loader));
db.setErrorHandler(xmllog);
try {
  doc = db.parse(is);
} finally {
  // some XML parsers are broken and don't close the byte stream (but they should according to spec)
  IOUtils.closeQuietly(is.getByteStream());
}
{code}

was (Author: thetaphi): Here is the new way to load XML from ResourceLoaders in Solr (taken from Config).
This code also intercepts errors and warnings and logs them correctly (parsers tend to write them to System.err):

{code:java}
is = new InputSource(loader.openConfig(name));
is.setSystemId(SystemIdResolver.createSystemIdFromResourceName(name));
// only enable xinclude, if a SystemId is available
if (is.getSystemId() != null) {
  try {
    dbf.setXIncludeAware(true);
    dbf.setNamespaceAware(true);
  } catch (UnsupportedOperationException e) {
    log.warn(name + " XML parser doesn't support XInclude option");
  }
}
final DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new SystemIdResolver(loader));
db.setErrorHandler(xmllog);
try {
  doc = db.parse(is);
} finally {
  // some XML parsers are broken and don't close the byte stream (but they should according to spec)
  IOUtils.closeQuietly(is.getByteStream());
}
{code}

move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019473#comment-13019473 ] Uwe Schindler commented on SOLR-2436: - Maybe we should add my last comment into the Wiki: Howto load XML from Solr's config resources, to prevent broken code again from appearing (if this no issue here anymore this is fine, I was just alarmed). I had a hard time to fix all XML handling in Solr (DIH is still broken with charsets), but XInclude now works as expected everywhere. move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019474#comment-13019474 ] Mark Miller commented on SOLR-2436: --- bq. Maybe we should add my last comment into the Wiki: +1 move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019476#comment-13019476 ] Mark Miller commented on SOLR-2436: --- Or perhaps we need a utility method and pointer to that? move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: An IDF variation with penalty for very rare terms
On Wed, Apr 13, 2011 at 01:01:09AM +0400, Earwin Burrfoot wrote: Excuse me for somewhat of an offtopic, but has anybody ever seen/used -subj-? Something that looks like http://dl.dropbox.com/u/920413/IDFplusplus.png Traditional log(N/x) tail, but when nearing zero freq, instead of going to +inf you do a nice round bump (with controlled height/location/sharpness) and drop down to -inf (or zero). I haven't used that technique, nor can I quote academic literature blessing it. Nevertheless, what you're doing makes sense to me. Rationale is that most good, discriminating terms are found in at least a certain percentage of your documents, but there are lots of mostly unique crapterms, which at some collection sizes stop being strictly unique and, with IDF's help, explode your scores. So you've designed a heuristic that allows you to filter a certain kind of noise. It sounds a lot like how people tune length normalization to adapt to their document collections. Many tuning techniques are corpus-specific. Whatever works, works! Marvin Humphrey - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
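One concrete functional form with the shape Earwin describes (my assumption; the thread gives no formula) is the classic log(N/x) minus a c/x penalty: for x well above c the penalty is negligible, the bump peaks near x = c, and as x approaches zero the score drops toward minus infinity:

```java
// Hedged guess at a curve matching the description in this thread; the
// constant c controls where the bump sits and how hard ultra-rare terms
// are penalized. Illustrative only, not code from any Lucene Similarity.
public class PenalizedIdf {

    static double idf(double docFreq, double numDocs, double c) {
        return Math.log(numDocs / docFreq) - c / docFreq;
    }

    public static void main(String[] args) {
        double n = 1_000_000;
        // rises toward the bump near docFreq = c, then classic IDF decay beyond it
        for (int df : new int[]{1, 2, 10, 1000}) {
            System.out.printf("df=%d idf=%.3f%n", df, idf(df, n, 2.0));
        }
    }
}
```

Setting c to roughly the document frequency below which terms in your collection are "mostly unique crapterms" reproduces the filter-the-noise behavior described above.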
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7075 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7075/ 1 tests failed. REGRESSION: org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2894) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589) at java.lang.StringBuffer.append(StringBuffer.java:337) at java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617) at org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93) at org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304) at org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010) Build Log (for compile errors): [...truncated 5276 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangzhenghang updated LUCENE-3026: -- Summary: smartcn analyzer throw NullPointer exception when the length of analysed text over 32767 (was: smartcn analysis throw NullPointer exception when the length of analysed text over 32767) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767 Key: LUCENE-3026 URL: https://issues.apache.org/jira/browse/LUCENE-3026 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.1, 4.0 Reporter: wangzhenghang That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:
{code}
public List<SegToken> makeIndex() {
  List<SegToken> result = new ArrayList<SegToken>();
  int s = -1, count = 0, size = tokenListTable.size();
  List<SegToken> tokenList;
  short index = 0;
  while (count < size) {
    if (isStartExist(s)) {
      tokenList = tokenListTable.get(s);
      for (SegToken st : tokenList) {
        st.index = index;
        result.add(st);
        index++;
      }
      count++;
    }
    s++;
  }
  return result;
}
{code}
Here 'short index = 0;' should be 'int index = 0;'. This was reported at http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
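The 32767 threshold in this report is exactly Short.MAX_VALUE, which is why the counter's type matters: a Java short silently wraps to -32768 on overflow, so once the analyzed text yields more than 32767 tokens the assigned indexes go negative, corrupting the downstream token bookkeeping that leads to the reported exception. A minimal demonstration of the wrap-around:

```java
// Why the failure starts at text length 32767: a Java short wraps at
// Short.MAX_VALUE via two's-complement overflow, with no exception thrown.
class ShortOverflowDemo {
    static short incremented(short index) {
        index++; // silent wrap at Short.MAX_VALUE
        return index;
    }

    public static void main(String[] args) {
        short index = Short.MAX_VALUE;          // 32767
        System.out.println(incremented(index)); // wraps negative
    }
}
```

Widening the counter to int (as the proposed fix does) gives it a range of over two billion, far beyond any realistic token count.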
[jira] [Commented] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019636#comment-13019636 ] Robert Muir commented on LUCENE-3026: - This sounds like a bug, do you want to try your hand at contributing a patch? See http://wiki.apache.org/lucene-java/HowToContribute for some instructions. smartcn analyzer throw NullPointer exception when the length of analysed text over 32767 Key: LUCENE-3026 URL: https://issues.apache.org/jira/browse/LUCENE-3026 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.1, 4.0 Reporter: wangzhenghang That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:
{code}
public List<SegToken> makeIndex() {
  List<SegToken> result = new ArrayList<SegToken>();
  int s = -1, count = 0, size = tokenListTable.size();
  List<SegToken> tokenList;
  short index = 0;
  while (count < size) {
    if (isStartExist(s)) {
      tokenList = tokenListTable.get(s);
      for (SegToken st : tokenList) {
        st.index = index;
        result.add(st);
        index++;
      }
      count++;
    }
    s++;
  }
  return result;
}
{code}
Here 'short index = 0;' should be 'int index = 0;'. This was reported at http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
[ https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019637#comment-13019637 ] Robert Muir commented on LUCENE-3022: - This sounds like a bug, do you want to try your hand at contributing a patch? See http://wiki.apache.org/lucene-java/HowToContribute for some instructions. DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect - Key: LUCENE-3022 URL: https://issues.apache.org/jira/browse/LUCENE-3022 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 2.9.4, 3.1 Reporter: Johann Höchtl Priority: Minor Original Estimate: 5m Remaining Estimate: 5m When using the DictionaryCompoundWordTokenFilter with a German dictionary, I got a strange behaviour: The German word streifenbluse (blouse with stripes) was decompounded to streifen (stripe), reifen (tire), which makes no sense at all. I thought the flag onlyLongestMatch would fix this, because streifen is longer than reifen, but it had no effect.
So I reviewed the source code and found the problem:
[code]
protected void decomposeInternal(final Token token) {
  // Only words longer than minWordSize get processed
  if (token.length() < this.minWordSize) {
    return;
  }
  char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.buffer());
  for (int i = 0; i < token.length() - this.minSubwordSize; ++i) {
    Token longestMatchToken = null;
    for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
      if (i + j > token.length()) {
        break;
      }
      if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
        if (this.onlyLongestMatch) {
          if (longestMatchToken != null) {
            if (longestMatchToken.length() < j) {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            longestMatchToken = createToken(i, j, token);
          }
        } else {
          tokens.add(createToken(i, j, token));
        }
      }
    }
    if (this.onlyLongestMatch && longestMatchToken != null) {
      tokens.add(longestMatchToken);
    }
  }
}
[/code]
should be changed to
[code]
protected void decomposeInternal(final Token token) {
  // Only words longer than minWordSize get processed
  if (token.termLength() < this.minWordSize) {
    return;
  }
  char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.termBuffer());
  Token longestMatchToken = null;
  for (int i = 0; i < token.termLength() - this.minSubwordSize; ++i) {
    for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
      if (i + j > token.termLength()) {
        break;
      }
      if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
        if (this.onlyLongestMatch) {
          if (longestMatchToken != null) {
            if (longestMatchToken.termLength() < j) {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            longestMatchToken = createToken(i, j, token);
          }
        } else {
          tokens.add(createToken(i, j, token));
        }
      }
    }
  }
  if (this.onlyLongestMatch && longestMatchToken != null) {
    tokens.add(longestMatchToken);
  }
}
[/code]
so that only the longest token is really indexed and the onlyLongestMatch flag makes sense. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
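The effect of the reporter's proposed change can be modeled outside Lucene with a toy decompounder: collect every dictionary substring and, with onlyLongestMatch set, emit only the single longest hit for the whole word. The class, method, and dictionary below are illustrative, not the Lucene API; they only mirror the selection logic under discussion.

```java
import java.util.*;

// Toy model of the reporter's patched decomposeInternal() (not the Lucene
// API): scan all substrings of [minSub, maxSub] length, and with
// onlyLongest=true emit only the single longest dictionary match.
class LongestMatchSketch {
    static List<String> decompose(String word, Set<String> dict,
                                  int minSub, int maxSub, boolean onlyLongest) {
        List<String> out = new ArrayList<>();
        String longest = null; // declared outside the scan, as in the patch
        for (int i = 0; i + minSub <= word.length(); i++) {
            for (int j = minSub; j <= maxSub && i + j <= word.length(); j++) {
                String cand = word.substring(i, i + j);
                if (!dict.contains(cand)) continue;
                if (onlyLongest) {
                    if (longest == null || cand.length() > longest.length()) {
                        longest = cand; // remember only the longest hit
                    }
                } else {
                    out.add(cand); // original behavior: emit every hit
                }
            }
        }
        if (onlyLongest && longest != null) out.add(longest);
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("streifen", "reifen", "bluse"));
        System.out.println(decompose("streifenbluse", dict, 4, 15, false));
        System.out.println(decompose("streifenbluse", dict, 4, 15, true));
    }
}
```

With this model, the unflagged scan finds streifen, reifen, and bluse, while the patched onlyLongestMatch path keeps just streifen, dropping the spurious reifen the reporter complained about (note it also drops bluse, since the patch keeps one token for the whole word, not one per position).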
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019649#comment-13019649 ] Koji Sekiguchi commented on SOLR-2436: -- Hi Uwe, The problematic snippet regarding XInclude handling was first introduced in my patch, which I borrowed from DIH; I missed something when I did it. Thank you for the heads-up. Now that we are embedding the config in the update processor instead of loading it from outside solrconfig.xml, the problematic snippet is gone. move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019652#comment-13019652 ] Koji Sekiguchi commented on SOLR-2436: -- The patch looks good, Tommaso! If it gets committed, it breaks back-compat, so I think we need a note for users in CHANGES.txt. move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2467) Custom analyzer load exceptions are not logged.
Custom analyzer load exceptions are not logged. --- Key: SOLR-2467 URL: https://issues.apache.org/jira/browse/SOLR-2467 Project: Solr Issue Type: Bug Affects Versions: 3.1 Reporter: Alexander Kistanov Priority: Minor If any exception occurs while loading a custom analyzer, the following catch block runs:
{code:title=solr/src/java/org/apache/solr/schema/IndexSchema.java}
} catch (Exception e) {
  throw new SolrException( SolrException.ErrorCode.SERVER_ERROR, "Cannot load analyzer: " + analyzerName );
}
{code}
The analyzer load exception e is not logged at all. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
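The usual fix is to either log e before rethrowing or pass it along as the exception cause, so the original stack trace survives (SolrException has constructors taking a Throwable cause; worth confirming against the Solr version at hand). The sketch below uses a plain RuntimeException to keep the cause-chaining pattern self-contained:

```java
// Sketch of the fix: chain the original exception as the cause so its
// stack trace is not swallowed. RuntimeException stands in for
// SolrException here to keep the example self-contained.
class AnalyzerLoadSketch {
    static RuntimeException wrapLoadFailure(String analyzerName, Exception e) {
        // The two-argument constructor preserves e as getCause(), so any
        // logger that prints the throwable also prints the root cause.
        return new RuntimeException("Cannot load analyzer: " + analyzerName, e);
    }

    public static void main(String[] args) {
        Exception cause = new ClassNotFoundException("com.example.MyAnalyzer");
        RuntimeException wrapped = wrapLoadFailure("com.example.MyAnalyzer", cause);
        System.out.println(wrapped.getMessage());
        System.out.println(wrapped.getCause());
    }
}
```

Because the cause is chained rather than discarded, the eventual error report shows why the analyzer failed to load (missing class, bad constructor, etc.) instead of only the generic "Cannot load analyzer" message.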
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7082 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7082/ No tests ran. Build Log (for compile errors): [...truncated 118 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019671#comment-13019671 ] wangzhenghang commented on LUCENE-3026: --- It's done. smartcn analyzer throw NullPointer exception when the length of analysed text over 32767 Key: LUCENE-3026 URL: https://issues.apache.org/jira/browse/LUCENE-3026 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.1, 4.0 Reporter: wangzhenghang Attachments: LUCENE-3026.patch That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:
{code}
public List<SegToken> makeIndex() {
  List<SegToken> result = new ArrayList<SegToken>();
  int s = -1, count = 0, size = tokenListTable.size();
  List<SegToken> tokenList;
  short index = 0;
  while (count < size) {
    if (isStartExist(s)) {
      tokenList = tokenListTable.get(s);
      for (SegToken st : tokenList) {
        st.index = index;
        result.add(st);
        index++;
      }
      count++;
    }
    s++;
  }
  return result;
}
{code}
Here 'short index = 0;' should be 'int index = 0;'. This was reported at http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangzhenghang updated LUCENE-3026: -- Attachment: LUCENE-3026.patch smartcn analyzer throw NullPointer exception when the length of analysed text over 32767 Key: LUCENE-3026 URL: https://issues.apache.org/jira/browse/LUCENE-3026 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.1, 4.0 Reporter: wangzhenghang Attachments: LUCENE-3026.patch That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:
{code}
public List<SegToken> makeIndex() {
  List<SegToken> result = new ArrayList<SegToken>();
  int s = -1, count = 0, size = tokenListTable.size();
  List<SegToken> tokenList;
  short index = 0;
  while (count < size) {
    if (isStartExist(s)) {
      tokenList = tokenListTable.get(s);
      for (SegToken st : tokenList) {
        st.index = index;
        result.add(st);
        index++;
      }
      count++;
    }
    s++;
  }
  return result;
}
{code}
Here 'short index = 0;' should be 'int index = 0;'. This was reported at http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org