[jira] Commented: (LUCENE-493) Nightly build archives do not contain Java source code.

2007-01-02 Thread Grant Ingersoll (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461740
 ] 

Grant Ingersoll commented on LUCENE-493:


I have updated nightly.sh to run package-tgz-src after building the binary 
distribution. I have done a few test runs and it looks good. I have committed 
the change, and it should run tonight. The source package will be named 
lucene-DATE-src.tar.gz, where DATE is the date of the build.

I'll keep an eye on it for the next couple of days, then close this out.

 Nightly build archives do not contain Java source code.
 ---

 Key: LUCENE-493
 URL: http://issues.apache.org/jira/browse/LUCENE-493
 Project: Lucene - Java
  Issue Type: Bug
  Components: Website
Reporter: James Pine
 Assigned To: Grant Ingersoll
Priority: Minor

 Under the Lucene News section of the Overview page, this item's link:
 26 January 2006 - Nightly builds available
 http://cvs.apache.org/dist/lucene/java/nightly/
 goes to a directory with several 1.9 MB files, none of which contain the src/java 
 tree.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2007-01-02 Thread Grant Ingersoll (JIRA)

 [ 
http://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-675:
---

Priority: Minor  (was: Major)

 Lucene benchmark: objective performance test for Lucene
 ---

 Key: LUCENE-675
 URL: http://issues.apache.org/jira/browse/LUCENE-675
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Andrzej Bialecki 
 Assigned To: Grant Ingersoll
Priority: Minor
 Attachments: benchmark.byTask.patch, benchmark.patch, 
 BenchmarkingIndexer.pm, extract_reuters.plx, LuceneBenchmark.java, 
 LuceneIndexer.java, taskBenchmark.zip, timedata.zip, tiny.alg, tiny.properties


 We need an objective way to measure the performance of Lucene, both indexing 
 and querying, on a known corpus. This issue is intended to collect comments 
 and patches implementing a suite of such benchmarking tests.
 Regarding the corpus: one of the widely used and freely available corpora is 
 the original Reuters collection, available from 
 http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz 
 or 
 http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
  I propose to use this corpus as a base for benchmarks. The benchmarking 
 suite could automatically retrieve it from known locations, and cache it 
 locally.
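A minimal Java sketch of the retrieve-and-cache step proposed above, assuming only the JDK. `CorpusCache` and `fetch` are hypothetical names, not part of Lucene or of any attached patch; the sketch treats any file already present in the cache directory as valid and does no checksum verification.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Lucene: fetch a corpus archive once and
// reuse the local copy on later benchmark runs.
public class CorpusCache {

    /**
     * Returns a local copy of the archive at {@code source}, downloading it
     * into {@code cacheDir} only if no file of the same name is cached yet.
     */
    public static Path fetch(URI source, Path cacheDir) throws IOException {
        Files.createDirectories(cacheDir);
        String path = source.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        Path cached = cacheDir.resolve(fileName);
        if (Files.exists(cached)) {
            return cached; // cache hit: no download
        }
        // Download into a temporary file first so that a failed transfer
        // never leaves a truncated archive under the final name.
        Path tmp = Files.createTempFile(cacheDir, "download-", ".part");
        try (InputStream in = source.toURL().openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        // Same directory, so this is a rename rather than a second copy.
        Files.move(tmp, cached, StandardCopyOption.REPLACE_EXISTING);
        return cached;
    }
}
```

A benchmark run might call `CorpusCache.fetch(URI.create("http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz"), Path.of("work", "corpus"))`, where the `work/corpus` directory is an arbitrary choice: the first call downloads the archive, and later calls return the cached path without touching the network.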




Re: New Issues

2007-01-02 Thread Grant Ingersoll
I have created an infrastructure issue to request help with this:  
https://issues.apache.org/jira/browse/INFRA-1093


Not having dealt with Infra before, I hope this is the correct way to  
go about it.


-Grant

On Dec 21, 2006, at 2:15 PM, Doug Cutting wrote:


Grant Ingersoll wrote:

I don't have permission to change it (otherwise I would.)


I don't see where to change it except globally for all of the Jira  
installation.


Doug




--
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ







Searching an empty index with a sort field specified throws a runtime exception

2007-01-02 Thread Mark Miller

Subject says it all. Just wondering if this behavior is acceptable to
everyone and I should code around it, or if we might get a fix. The
exception is thrown when the sort field is examined for what type it is
(type set to auto). Lucene throws an exception saying that the field is
empty and so the type cannot be inferred. The field is empty because the
index is empty. Perhaps there is an easy fix short of a Lucene patch as
well. Little help?

- Mark


Re: Searching an empty index with a sort field specified throws a runtime exception

2007-01-02 Thread Yonik Seeley

On 1/2/07, Mark Miller [EMAIL PROTECTED] wrote:

Subject says it all. Just wondering if this behavior is acceptable to
everyone and I should code around it, or if we might get a fix. The
exception is thrown when the sort field is examined for what type it is
(type set to auto). Lucene throws an exception saying that the field is
empty and so the type cannot be inferred. The field is empty because the
index is empty. Perhaps there is an easy fix short of a Lucene patch as
well. Little help?


I remember fixing this for every sort type except auto, which I
don't recommend using since it's very error-prone. No harm in fixing
it if someone wants to submit a patch, though.


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
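The failure can be reproduced in miniature without Lucene. The JDK-only model below is hypothetical: it is not Lucene's implementation, and its int, then float, then string detection order is an assumption. Auto detection inspects the first term of the sort field, so an empty index leaves nothing to inspect. The model also shows declaring the type explicitly, which per Yonik's reply avoids the problem, and one possible (hypothetical) fix that falls back to a default type when the field has no terms. In Lucene 2.x terms, declaring the type would mean sorting with something like `new SortField("title", SortField.STRING)` instead of `SortField.AUTO`, where `title` is a placeholder field name.

```java
import java.util.List;

// Hypothetical model of "auto" sort-type detection; not Lucene's code.
public class AutoSortTypeSketch {

    enum SortType { INT, FLOAT, STRING }

    /** Guesses the type from the field's first term; fails if there are none. */
    static SortType detectAuto(List<String> sortedTerms) {
        if (sortedTerms.isEmpty()) {
            // The empty-index case: there is no term to inspect.
            throw new IllegalStateException(
                "no terms in field, cannot determine sort type");
        }
        String first = sortedTerms.get(0).trim();
        try {
            Integer.parseInt(first);
            return SortType.INT;
        } catch (NumberFormatException notAnInt) {
            try {
                Float.parseFloat(first);
                return SortType.FLOAT;
            } catch (NumberFormatException notAFloat) {
                return SortType.STRING;
            }
        }
    }

    /** Work-around: an explicitly declared type needs no terms at all. */
    static SortType resolve(SortType declared, List<String> sortedTerms) {
        return declared != null ? declared : detectAuto(sortedTerms);
    }

    /** Hypothetical fix: keep guessing, but fall back when the field is empty. */
    static SortType detectAutoOrDefault(List<String> sortedTerms, SortType fallback) {
        return sortedTerms.isEmpty() ? fallback : detectAuto(sortedTerms);
    }
}
```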




[jira] Created: (LUCENE-763) LuceneDictionary skips first word in enumeration

2007-01-02 Thread Dan Ertman (JIRA)
LuceneDictionary skips first word in enumeration


 Key: LUCENE-763
 URL: http://issues.apache.org/jira/browse/LUCENE-763
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.0.0
 Environment: Windows Sun JRE 1.4.2_10_b03
Reporter: Dan Ertman


The current code for LuceneDictionary always skips the first word of the 
TermEnum. The reason is that it never reads TermEnum.term() before advancing: its 
first call is to TermEnum.next(), which moves it past the first term (line 76).
To see this problem cause a failure, add this test to TestSpellChecker:

  similar = spellChecker.suggestSimilar("eihgt", 2);
  assertEquals(1, similar.length);
  assertEquals("eight", similar[0]);

Because "eight" is the first word in the index, the test fails.
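The bug is the usual off-by-one for an enumeration that starts positioned on its first element. In the JDK-only sketch below, the hypothetical `TermCursor` stands in for such a `TermEnum` (`term()` returns the current term, `next()` advances and reports whether one remains); it is not Lucene's class, and the real LuceneDictionary exposes an `Iterator` rather than building a list. The buggy loop calls `next()` before its first read; the fixed loop reads first and advances afterwards.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstTermSkipSketch {

    /** Hypothetical stand-in for a term enumeration created ON its first term. */
    static final class TermCursor {
        private final List<String> terms;
        private int pos = 0;

        TermCursor(List<String> terms) { this.terms = terms; }

        /** The current term, or null once the enumeration is exhausted. */
        String term() { return pos < terms.size() ? terms.get(pos) : null; }

        /** Advances to the next term; returns false when there is none. */
        boolean next() {
            pos++;
            return pos < terms.size();
        }
    }

    /** Buggy: advancing before the first read drops the first term. */
    static List<String> buggyWords(TermCursor cursor) {
        List<String> words = new ArrayList<>();
        while (cursor.next()) {
            words.add(cursor.term());
        }
        return words;
    }

    /** Fixed: read the current term first, then advance. */
    static List<String> fixedWords(TermCursor cursor) {
        List<String> words = new ArrayList<>();
        if (cursor.term() == null) {
            return words; // empty field
        }
        do {
            words.add(cursor.term());
        } while (cursor.next());
        return words;
    }
}
```

With the terms `eight`, `nine` and `seven`, `buggyWords` returns only `[nine, seven]`, reproducing the missing `eight` from the report, while `fixedWords` returns all three.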





[jira] Commented: (LUCENE-740) Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception

2007-01-02 Thread Steven Parkes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461873
 ] 

Steven Parkes commented on LUCENE-740:
--

I don't see that redistribution in binary form makes any difference as far as 
the BSD license is concerned. The only distinction the BSD license draws between 
source and binary redistribution is that binaries must reproduce the license 
terms in the documentation rather than in the sources.

It looks like an explicit ASF policy on third-party inclusion is in the 
works (http://people.apache.org/~cliffs/3party.html), but at this point it's only 
a proposal.

If that, or something close to it, becomes policy, it doesn't look like the 
snowball stuff poses any problem: BSD is a Category A (good) license.

At some point it looks like the policy will require highlighting the fact that 
inclusion of the snowball stuff makes the affected distributions 
multi-licensed, but that doesn't look terribly onerous.

I've added a patch with a copy of the BSD license, suitably modified (the snowball 
materials only reference the BSD license), and I've added a few lines to 
NOTICE.txt, as seems to be required(?): 
http://www.apache.org/licenses/example-NOTICE.txt

 Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives 
 Index-OOB Exception
 --

 Key: LUCENE-740
 URL: https://issues.apache.org/jira/browse/LUCENE-740
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.9
 Environment: linux amd64
Reporter: Andreas Kohn
Priority: Minor
 Attachments: 740-license.txt, lucene-1.9.1-SnowballProgram.java, 
 snowball.patch.txt


 (copied from mail to java-user)
 While playing with the various stemmers of Lucene (1.9.1), I got an
 index-out-of-bounds exception:
 lucene-1.9.1> java -cp build/contrib/snowball/lucene-snowball-1.9.2-dev.jar net.sf.snowball.TestApp Kp bla.txt
 Exception in thread "main" java.lang.reflect.InvocationTargetException
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:615)
     at net.sf.snowball.TestApp.main(TestApp.java:56)
 Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 11
     at java.lang.StringBuffer.charAt(StringBuffer.java:303)
     at net.sf.snowball.SnowballProgram.find_among_b(SnowballProgram.java:270)
     at net.sf.snowball.ext.KpStemmer.r_Step_4(KpStemmer.java:1122)
     at net.sf.snowball.ext.KpStemmer.stem(KpStemmer.java:1997)
 This happens when executing
 lucene-1.9.1> java -cp build/contrib/snowball/lucene-snowball-1.9.2-dev.jar net.sf.snowball.TestApp Kp bla.txt
 where bla.txt contains just this word: 'spijsvertering'.
 After some debugging, and some tests with the original snowball
 distribution from snowball.tartarus.org, it seems that the attached
 change is needed to avoid the exception.
 (The change comes from tartarus' SnowballProgram.java)
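The trace shows `StringBuffer.charAt` reading past the end of the working buffer, i.e. the positions the stemmer tracks no longer match the buffer's contents. The report does not say which part of `SnowballProgram` the tartarus change touches, so what follows is only a hypothetical, JDK-only illustration of that class of bug, not the attached patch: whenever a slice of the buffer is replaced in place, the cursor and limit must shift by the change in length, and backward scans should check their bounds before calling `charAt`. `BufferAdjustSketch` and its members are invented for illustration.

```java
// Hypothetical illustration only; not the Snowball runtime or the LUCENE-740 patch.
public class BufferAdjustSketch {

    final StringBuffer current;
    int cursor; // scan position; backward scans read current.charAt(cursor - 1)
    int limit;  // end of the region still under consideration

    BufferAdjustSketch(String word) {
        current = new StringBuffer(word);
        cursor = word.length();
        limit = word.length();
    }

    /** Replaces current[bra, ket) with s and returns the change in length. */
    int replace(int bra, int ket, String s) {
        int adjustment = s.length() - (ket - bra);
        current.replace(bra, ket, s);
        limit += adjustment;          // the region grew or shrank with the text
        if (cursor >= ket) {
            cursor += adjustment;     // cursor was after the replaced slice
        } else if (cursor > bra) {
            cursor = bra;             // cursor was inside the replaced slice
        }
        return adjustment;
    }

    /** Tests whether the text just before the cursor ends with suffix. */
    boolean endsWithAtCursor(String suffix) {
        if (cursor - suffix.length() < 0) {
            return false;             // lower-bound check before any charAt call
        }
        for (int i = 0; i < suffix.length(); i++) {
            char expected = suffix.charAt(suffix.length() - 1 - i);
            if (current.charAt(cursor - 1 - i) != expected) {
                return false;
            }
        }
        return true;
    }
}
```

For instance, stripping the last four characters of `spijsvertering` leaves `spijsverte`; with the adjustment the cursor moves from 14 to 10, whereas a cursor left at 14 would make the first backward read, `charAt(13)`, fail on the 10-character buffer.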




[jira] Updated: (LUCENE-740) Bugs in contrib/snowball/.../SnowballProgram.java - Kraaij-Pohlmann gives Index-OOB Exception

2007-01-02 Thread Steven Parkes (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Parkes updated LUCENE-740:
-

Attachment: 740-license.txt




[jira] Commented: (LUCENE-707) Lucene Java Site docs

2007-01-02 Thread George Aroush (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461894
 ] 

George Aroush commented on LUCENE-707:
--

Hi,

What will it take to fix the page at http://lucene.apache.org/ so that 
Lucene.Net also gets a tabbed link? Otis pointed this out on 22 November, 
but I still don't see a mention of Lucene.Net.

Since Lucene4c is now a dead project, replacing it with Lucene.Net would be 
appropriate (Lucene.Net could use some exposure). The link to 
Lucene.Net is: http://incubator.apache.org/lucene.net/

Also, as a note, the project name is Lucene.Net and not Lucene.NET.

Thanks!

-- George


 Lucene Java Site docs
 -

 Key: LUCENE-707
 URL: https://issues.apache.org/jira/browse/LUCENE-707
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Website
 Environment: N/A
Reporter: Grant Ingersoll
 Assigned To: Grant Ingersoll
Priority: Minor

 It would be really nice if the Java site docs were consistent with the rest 
 of the Lucene family (namely, with navigation tabs, etc.) so that one can 
 easily go between Nutch, Hadoop, etc.




[jira] Commented: (LUCENE-707) Lucene Java Site docs

2007-01-02 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461902
 ] 

Grant Ingersoll commented on LUCENE-707:


Hi George, 

This is a top-level project (TLP) issue. You should be properly linked under the 
Lucene Java Related Projects section. 

I would supply a patch based on http://svn.apache.org/viewvc/lucene/site and 
then somehow get the attention of one of the TLP committers (PMC members, 
Doug?, Yonik?).

Good luck,
Grant 




[jira] Commented: (LUCENE-707) Lucene Java Site docs

2007-01-02 Thread George Aroush (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461904
 ] 

George Aroush commented on LUCENE-707:
--

Thanks for the quick response Grant!

Unfortunately, I am not familiar with Forrest, which I believe is how the 
patch must be generated. If I supply the required text changes, can someone 
take care of making them? If not, can someone point me to where I can 
learn about Forrest?

I believe Doug does have commit privilege.

Regards,

-- George
