[jira] Commented: (LUCENE-493) Nightly build archives do not contain Java source code.
[ http://issues.apache.org/jira/browse/LUCENE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461740 ]

Grant Ingersoll commented on LUCENE-493:

I have updated nightly.sh to run package-tgz-src after running the binary distribution. I have done a few test runs and it looks good. I have saved and committed the change and it should run tonight. The src packaging will be named lucene-DATE-src.tar.gz where DATE is the date of the build.

I'll keep an eye on it for the next couple of days, then close this out.

Nightly build archives do not contain Java source code.

Key: LUCENE-493
URL: http://issues.apache.org/jira/browse/LUCENE-493
Project: Lucene - Java
Issue Type: Bug
Components: Website
Reporter: James Pine
Assigned To: Grant Ingersoll
Priority: Minor

Under the Lucene News section of the Overview page, this item's link:

26 January 2006 - Nightly builds available
http://cvs.apache.org/dist/lucene/java/nightly/

goes to a directory with several 1.9M files, none of which have the src/java tree in them.

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
[ http://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-675:

Priority: Minor (was: Major)

Lucene benchmark: objective performance test for Lucene

Key: LUCENE-675
URL: http://issues.apache.org/jira/browse/LUCENE-675
Project: Lucene - Java
Issue Type: Improvement
Reporter: Andrzej Bialecki
Assigned To: Grant Ingersoll
Priority: Minor
Attachments: benchmark.byTask.patch, benchmark.patch, BenchmarkingIndexer.pm, extract_reuters.plx, LuceneBenchmark.java, LuceneIndexer.java, taskBenchmark.zip, timedata.zip, tiny.alg, tiny.properties

We need an objective way to measure the performance of Lucene, both indexing and querying, on a known corpus. This issue is intended to collect comments and patches implementing a suite of such benchmarking tests.

Regarding the corpus: one of the widely used and freely available corpora is the original Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically retrieve it from known locations, and cache it locally.
Re: New Issues
I have created an infrastructure issue to request help with this: https://issues.apache.org/jira/browse/INFRA-1093

Having not dealt w/ infra. before, I hope this is the correct way to do this.

-Grant

On Dec 21, 2006, at 2:15 PM, Doug Cutting wrote:

Grant Ingersoll wrote:
I don't have permission to change it (otherwise I would.)

I don't see where to change it except globally for all of the Jira installation.

Doug

--
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
Searching an empty index with a sort field specified throws a runtime exception
Subject says it all. Just wondering if this behavior is acceptable to everyone and I should code around it, or if we might get a fix. The exception is thrown when the sort field is examined for what type it is (type set to auto). Lucene throws an exception saying that the field is empty and so the type cannot be inferred. The field is empty because the index is empty. Perhaps there is an easy fix short of a Lucene patch as well. Little help? - Mark
Re: Searching an empty index with a sort field specified throws a runtime exception
On 1/2/07, Mark Miller [EMAIL PROTECTED] wrote: Subject says it all. Just wondering if this behavior is acceptable to everyone and I should code around it, or if we might get a fix. The exception is thrown when the sort field is examined for what type it is (type set to auto). Lucene throws an exception saying that the field is empty and so the type cannot be inferred. The field is empty because the index is empty. Perhaps there is an easy fix short of a Lucene patch as well. Little help?

I remember fixing this for every sort type except auto, which I don't recommend using since it's very error prone. No harm in fixing it if someone wants to submit a patch though.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
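For anyone coding around this in the meantime, here is a minimal, Lucene-free sketch of why auto detection cannot work on an empty field while an explicitly declared type can. Every name in it (SortTypeSketch, inferType, resolveType) is hypothetical and made up for illustration; it is not Lucene's code. The assumption is only what the thread states: auto looks at the field's contents to guess a type, and an empty index gives it nothing to look at.

```java
import java.util.ArrayList;
import java.util.List;

/** Lucene-free illustration; all names are hypothetical, not Lucene's API. */
class SortTypeSketch {
    enum SortType { INT, FLOAT, STRING }

    /** "Auto" detection: guesses the type from the first term, so it needs one. */
    static SortType inferType(List<String> terms, String field) {
        if (terms.isEmpty()) {
            // Mirrors the failure described in the thread: nothing to infer from.
            throw new RuntimeException("no terms in field " + field + ", cannot infer sort type");
        }
        String first = terms.get(0);
        try { Integer.parseInt(first); return SortType.INT; } catch (NumberFormatException e) { /* not an int */ }
        try { Float.parseFloat(first); return SortType.FLOAT; } catch (NumberFormatException e) { /* not a float */ }
        return SortType.STRING;
    }

    /** With a declared type there is nothing to infer, so an empty field is harmless. */
    static SortType resolveType(List<String> terms, String field, SortType declared) {
        return declared != null ? declared : inferType(terms, field);
    }

    public static void main(String[] args) {
        List<String> emptyIndex = new ArrayList<String>();
        System.out.println(resolveType(emptyIndex, "date", SortType.STRING));
        try {
            resolveType(emptyIndex, "date", null); // falls back to auto detection
        } catch (RuntimeException e) {
            System.out.println("auto failed: " + e.getMessage());
        }
    }
}
```

In Lucene itself the analogous workaround would presumably be to build the SortField with an explicit type constant rather than leaving it on auto, which matches the recommendation above.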
[jira] Created: (LUCENE-763) LuceneDictionary skips first word in enumeration
LuceneDictionary skips first word in enumeration

Key: LUCENE-763
URL: http://issues.apache.org/jira/browse/LUCENE-763
Project: Lucene - Java
Issue Type: Bug
Components: Other
Affects Versions: 2.0.0
Environment: Windows Sun JRE 1.4.2_10_b03
Reporter: Dan Ertman

The current code for LuceneDictionary will always skip the first word of the TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - its first call is to TermEnum.next, which moves it past the first term (line 76).

To see this problem cause a failure, add this test to TestSpellChecker:

    similar = spellChecker.suggestSimilar("eihgt", 2);
    assertEquals(1, similar.length);
    assertEquals(similar[0], "eight");

Because "eight" is the first word in the index, it will fail.
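The pattern described in this report can be sketched without Lucene. The Cursor class below is a hypothetical stand-in for an enumeration that starts out positioned on its first element, which is how the report describes the TermEnum in question; it is not Lucene's TermEnum and the method names are assumptions made for illustration. The buggy loop advances before reading, the fixed loop reads before advancing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Lucene-free illustration of the skipped-first-element pattern. */
class EnumSkipSketch {
    /** Hypothetical cursor that starts positioned ON its first element. */
    static class Cursor {
        private final List<String> words;
        private int pos = 0;

        Cursor(List<String> words) { this.words = words; }

        /** Current word, or null once the cursor is exhausted. */
        String current() { return pos < words.size() ? words.get(pos) : null; }

        /** Moves forward; returns true if a word is available afterwards. */
        boolean advance() { pos++; return pos < words.size(); }
    }

    /** Buggy: advances before the first read, so the first word is never seen. */
    static List<String> collectBuggy(Cursor c) {
        List<String> out = new ArrayList<String>();
        while (c.advance()) {
            out.add(c.current());
        }
        return out;
    }

    /** Fixed: reads the current word first, then advances. */
    static List<String> collectFixed(Cursor c) {
        List<String> out = new ArrayList<String>();
        while (c.current() != null) {
            out.add(c.current());
            c.advance();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("eight", "five", "four");
        System.out.println(collectBuggy(new Cursor(index))); // prints [five, four]
        System.out.println(collectFixed(new Cursor(index))); // prints [eight, five, four]
    }
}
```

With "eight" sorted first, the buggy loop never offers it as a candidate, which is consistent with the failing test in the report.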
[jira] Commented: (LUCENE-740) Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception
[ https://issues.apache.org/jira/browse/LUCENE-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461873 ]

Steven Parkes commented on LUCENE-740:

I don't see that redistribution in binary form makes any difference as far as the BSD license is concerned. The only difference between source and binary under BSD is the condition that the license terms be included in the docs as opposed to the sources.

It looks like an explicit ASF policy on 3party inclusion is in the works: http://people.apache.org/~cliffs/3party.html but at this point it's only a proposal. If that, or something close to it, becomes policy, it doesn't look like the snowball stuff poses any problem: the BSD is a Category A (good) license. At some point it looks like the policy will require highlighting the fact that inclusion of the snowball stuff makes the affected distributions multi-licensed, but that doesn't look terribly onerous.

I've added a patch with a copy of the BSD license suitably modified (they only reference the BSD license in the snowball materials) and I've added a few lines to NOTICE.txt as seems to be required(?): http://www.apache.org/licenses/example-NOTICE.txt

Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception

Key: LUCENE-740
URL: https://issues.apache.org/jira/browse/LUCENE-740
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 1.9
Environment: linux amd64
Reporter: Andreas Kohn
Priority: Minor
Attachments: 740-license.txt, lucene-1.9.1-SnowballProgram.java, snowball.patch.txt

(copied from mail to java-user)

While playing with the various stemmers of Lucene(-1.9.1), I got an index out of bounds exception:

lucene-1.9.1> java -cp build/contrib/snowball/lucene-snowball-1.9.2-dev.jar net.sf.snowball.TestApp Kp bla.txt
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:615)
    at net.sf.snowball.TestApp.main(TestApp.java:56)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 11
    at java.lang.StringBuffer.charAt(StringBuffer.java:303)
    at net.sf.snowball.SnowballProgram.find_among_b(SnowballProgram.java:270)
    at net.sf.snowball.ext.KpStemmer.r_Step_4(KpStemmer.java:1122)
    at net.sf.snowball.ext.KpStemmer.stem(KpStemmer.java:1997)

This happens when executing

lucene-1.9.1> java -cp build/contrib/snowball/lucene-snowball-1.9.2-dev.jar net.sf.snowball.TestApp Kp bla.txt

bla.txt contains just this word: 'spijsvertering'.

After some debugging, and some tests with the original snowball distribution from snowball.tartarus.org, it seems that the attached change is needed to avoid the exception. (The change comes from tartarus' SnowballProgram.java)
[jira] Updated: (LUCENE-740) Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception
[ https://issues.apache.org/jira/browse/LUCENE-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Parkes updated LUCENE-740:

Attachment: 740-license.txt
[jira] Commented: (LUCENE-707) Lucene Java Site docs
[ https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461894 ]

George Aroush commented on LUCENE-707:

Hi,

What will it take to fix the page at http://lucene.apache.org/ so that Lucene.Net also gets a tabbed link? On the 22nd of Nov Otis pointed this out but I still don't see a mention of Lucene.Net. Since Lucene4c is a dead project now, replacing it with Lucene.Net is an appropriate thing to do (Lucene.Net can use some exposure.)

The link to Lucene.Net is: http://incubator.apache.org/lucene.net/

Also, as a note, the project name is Lucene.Net and not Lucene.NET.

Thanks!

-- George

Lucene Java Site docs

Key: LUCENE-707
URL: https://issues.apache.org/jira/browse/LUCENE-707
Project: Lucene - Java
Issue Type: Improvement
Components: Website
Environment: N/A
Reporter: Grant Ingersoll
Assigned To: Grant Ingersoll
Priority: Minor

It would be really nice if the Java site docs were consistent with the rest of the Lucene family (namely, with navigation tabs, etc.) so that one can easily go between Nutch, Hadoop, etc.
[jira] Commented: (LUCENE-707) Lucene Java Site docs
[ https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461902 ]

Grant Ingersoll commented on LUCENE-707:

Hi George,

This is a top level issue. You should be properly linked under the Lucene Java Related Projects section. I would supply a patch based on http://svn.apache.org/viewvc/lucene/site and then somehow get the attention of one of the TLP committers (PMC members, Doug?, Yonik?).

Good luck,
Grant
[jira] Commented: (LUCENE-707) Lucene Java Site docs
[ https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461904 ]

George Aroush commented on LUCENE-707:

Thanks for the quick response, Grant!

Unfortunately, I am not familiar with Forrest, which is how I believe the patch must be generated. If I supply the required text changes, can someone take care of making the changes? If not, can someone point me to where I can learn about Forrest?

I believe Doug does have commit privilege.

Regards,

-- George