RE: CHANGES questions

2009-09-21 Thread Uwe Schindler
I've been reading through CHANGES.txt and had a few questions/comments: 1. The attribute entry still says Token is deprecated. I can fix, but isn't a huge deal. Another one? +1 for changing. 2. L-1658 talks about changing FSDirectory for SimpleDirectory and adds a static open() method,

Re: CHANGES questions

2009-09-21 Thread Michael McCandless
On Sun, Sep 20, 2009 at 7:40 PM, Mark Miller markrmil...@gmail.com wrote: Mark Miller wrote: Something along the lines of:  * LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory    (but left an FSDirectory base class).  Added an FSDirectory.open    static method to pick a

RE: svn commit: r817220 - /lucene/java/trunk/CHANGES.txt

2009-09-21 Thread Uwe Schindler
And inline in your diff we have the deprecated Token class: * LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called AttributeSource instead of the now deprecated Token class. All attributes that the Token class had have been moved into separate classes: @@

Re: svn commit: r817220 - /lucene/java/trunk/CHANGES.txt

2009-09-21 Thread Mark Miller
Uwe Schindler wrote: And inline in your diff we have the deprecated Token class: * LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called AttributeSource instead of the now deprecated Token class. All attributes that the Token class had have been moved into

RE: svn commit: r817220 - /lucene/java/trunk/CHANGES.txt

2009-09-21 Thread Uwe Schindler
This was the answer about your first commit (merge FSDir stuff). At the time I posted the answer, you fixed the deprecated Token thing :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Mark Miller

Re: svn commit: r817220 - /lucene/java/trunk/CHANGES.txt

2009-09-21 Thread Mark Miller
I see what you mean! My first change had the dep token piece in the diff. That's a funny coincidence. Threw me for a loop. Uwe Schindler wrote: This was the answer about your first commit (merge FSDir stuff). At the time I posted the answer, you fixed the deprecated Token thing :-) -

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Grant Ingersoll wrote: On Sep 17, 2009, at 3:07 PM, Mark Miller wrote: So in the section: Building the Release artifacts bullet 8: Make sure that for each release file an md5 checksum file exists. At this step in the process, the zip/tars do not have an md5 checksum file that exists (at

RE: ReleaseTodo steps

2009-09-21 Thread Uwe Schindler
Oddly though, while all of the Maven hashes are in a file that's 32 bytes, when I save this hash, it's 33 bytes. Any thoughts? Line feed? - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional
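The 32-versus-33-byte difference is what a single trailing line feed would produce: an MD5 digest written as hex is always 32 characters. A minimal Java sketch of that arithmetic follows; the class name `Md5FileSize` and the sample input are made up for illustration and are not part of the Lucene build.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5FileSize {
    /** Returns the lowercase hex MD5 digest of the given bytes (always 32 characters). */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical content standing in for a release artifact.
        String hash = md5Hex("example artifact contents".getBytes(StandardCharsets.UTF_8));
        byte[] bare = hash.getBytes(StandardCharsets.US_ASCII);
        byte[] withLineFeed = (hash + "\n").getBytes(StandardCharsets.US_ASCII);
        System.out.println(bare.length);         // 32
        System.out.println(withLineFeed.length); // 33
    }
}
```

If the saved file has to match the 32-byte Maven files exactly, the hash would need to be written without the trailing newline that most shell tools and editors add.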

Re: ReleaseTodo steps

2009-09-21 Thread John Wang
Hi Guys: A quick comment on 2.9 release: org.apache.lucene.Weight interface has been changed to an abstract class. This is a non-backward compatible change and would break many custom Query implementations. Is this intentional? Thanks -John On Mon, Sep 21, 2009 at 8:59 PM, Uwe Schindler

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Uwe Schindler wrote: Oddly though, while all of the Maven hashes are in a file that's 32 bytes, when I save this hash, it's 33 bytes. Any thoughts? Line feed?

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Yeah it is, sorry :( Check out the back compat break section in changes - it's the first section I think. John Wang wrote: Hi Guys: A quick comment on 2.9 release: org.apache.lucene.Weight interface has been changed to an abstract class. This is a non-backward compatible change and
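For readers wondering why an interface-to-abstract-class change breaks custom code, here is a small Java sketch. The types `ScoreWeight` and `ConstantWeight` and their methods are hypothetical stand-ins, not the real Lucene Weight API; the point is only the `implements` versus `extends` difference.

```java
// Hypothetical stand-in. Before the change it would have been declared as:
//     interface ScoreWeight { float getValue(); }
abstract class ScoreWeight {
    abstract float getValue();

    // A method added later with a default body: existing subclasses keep
    // compiling, which an interface of that Java era could not offer.
    String describe() {
        return "weight=" + getValue();
    }
}

// A custom implementation. Under the old interface this class would have read
// "implements ScoreWeight"; against the abstract class that no longer compiles,
// so it must be changed to "extends ScoreWeight".
final class ConstantWeight extends ScoreWeight {
    private final float value;

    ConstantWeight(float value) {
        this.value = value;
    }

    @Override
    float getValue() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new ConstantWeight(2.0f).describe()); // weight=2.0
    }
}
```

The break is harder to work around for a class that already extends something else, since Java allows only one superclass.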

TermCount per field

2009-09-21 Thread John Wang
Hi guys: Not sure if this would be a better fit on the users or the dev list. It would be very useful to be able to get term count given a field, e.g. int IndexReader.termCount(String field) Wanted to get your opinion on what is the best way to approach this. After looking

Re: ReleaseTodo steps

2009-09-21 Thread John Wang
Thanks Mark for the clarification! -John On Mon, Sep 21, 2009 at 9:09 PM, Mark Miller markrmil...@gmail.com wrote: Yeah it is, sorry :( Check out the back compat break section in changes - it's the first section I think. John Wang wrote: Hi Guys: A quick comment on 2.9 release:

Re: ReleaseTodo steps

2009-09-21 Thread Yonik Seeley
On Mon, Sep 21, 2009 at 8:56 AM, Mark Miller markrmil...@gmail.com wrote: Have you done this before Yonik? md5sum generates a hash line like this: a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz Remove the '*' character? 1. Lucene 2.4.1 doesn't seem to have these md5 hashes for the non

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Thanks! I assumed you dropped the second part entirely, because the Maven artifact md5's only appear to have the hash. Your link to the dist with the non Maven md5's clears that up though. I guess the mirrors just don't have the md5 files. bq. All of the old releases used to be there, but they

Re: ReleaseTodo steps

2009-09-21 Thread Mark Miller
Yonik Seeley wrote: On Mon, Sep 21, 2009 at 8:56 AM, Mark Miller markrmil...@gmail.com wrote: Have you done this before Yonik? md5sum generates a hash line like this: a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz Remove the '*' character? Oddly, my version of md5sum

[no subject]

2009-09-21 Thread Thomas D'Silva
I would like to contribute a class based on the MoreLikeThis class in contrib/queries that generates a query based on the tags associated with a document. The class assumes that documents are tagged with a set of tags (which are stored in the index in a separate Field). The class determines the

2.9 vote

2009-09-21 Thread Mark Miller
Uploading 2.9 vote candidate as I type. Gonna check it out a bit more after the upload too, but when it's up, I *think* we are ready to begin the vote process. I'll send out an official vote start email a bit later (I've got to CC the general mailing list as well). Hopefully I haven't screwed up

[jira] Commented: (LUCENE-1910) Extension to MoreLikeThis to use tag information

2009-09-21 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757924#action_12757924 ] Mark Harwood commented on LUCENE-1910: -- Hi Thomas, Following your request for

RE: svn commit: r817286 - /lucene/java/site/docs/doap.rdf

2009-09-21 Thread Steven A Rowe
Mark Miller wrote: <release> <Version> + <name>Lucene 2.9.0</name> + <created>2009-09-23</created> + <revision>2.9.0</revision> + </Version> Stupid question from the peanut gallery: Doesn't a VOTE require 3 days? I ask because (3 + 2009-09-21) = 2009-09-24, not -23.
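The date arithmetic in the question can be checked with a short Java sketch. The three-day voting window is taken from the question itself, and the class and method names are invented for illustration.

```java
import java.time.LocalDate;

final class VoteWindow {
    /** Earliest date a vote can close, given its start date and a minimum length in days. */
    static LocalDate earliestClose(LocalDate voteStart, int votingDays) {
        return voteStart.plusDays(votingDays);
    }

    public static void main(String[] args) {
        // Vote opened on 2009-09-21 with an assumed 3-day minimum.
        System.out.println(earliestClose(LocalDate.of(2009, 9, 21), 3)); // 2009-09-24
    }
}
```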

Re: svn commit: r817286 - /lucene/java/site/docs/doap.rdf

2009-09-21 Thread Mark Miller
Steven A Rowe wrote: Mark Miller wrote: <release> <Version> + <name>Lucene 2.9.0</name> + <created>2009-09-23</created> + <revision>2.9.0</revision> + </Version> Stupid question from the peanut gallery: Doesn't a VOTE require 3 days? I ask because (3 +

Re: svn commit: r817286 - /lucene/java/site/docs/doap.rdf

2009-09-21 Thread Yonik Seeley
On Mon, Sep 21, 2009 at 12:45 PM, Mark Miller markrmil...@gmail.com wrote: I actually almost sent an email questioning it, but the day is supposed to be an estimate, so I figure it's likely to be off a day or two anyway. +1, don't worry about it. Need to wait for mirrors to sync anyway, so it's

[VOTE] Release Lucene 2.9.0

2009-09-21 Thread Mark Miller
Okay, let's give this a shot: The (proposed) release artifacts have been built and are up at: http://people.apache.org/~markrmiller/staging-area/lucene2.9/ The changes are here: http://people.apache.org/~markrmiller/staging-area/lucene2.9changes/ Please vote to officially release these

[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757961#action_12757961 ] Michael McCandless commented on LUCENE-1781: bq. Can we go with my patch, and

[jira] Created: (LUCENE-1921) Absurdly large radius (miles) search fails to include entire earth

2009-09-21 Thread Michael McCandless (JIRA)
Absurdly large radius (miles) search fails to include entire earth -- Key: LUCENE-1921 URL: https://issues.apache.org/jira/browse/LUCENE-1921 Project: Lucene - Java Issue Type:

[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757963#action_12757963 ] Michael McCandless commented on LUCENE-1781: I opened LUCENE-1921. Large

Re: TermCount per field

2009-09-21 Thread Michael McCandless
MultiReaders can't quickly compute the exact term count. Would they be allowed to throw UOE? (Like IndexReader.getUniqueTermCount) TermsHashPerField.numPostings (not .numPostingsInt) tells you the # unique terms currently in IndexWriter's RAM buffer, so I think we could save that out with
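A rough Java sketch of why a reader spanning several segments cannot cheaply report an exact count: each segment knows its own unique terms, but a term present in two segments is counted twice by a simple sum, so an exact figure needs the term sets merged. The sets of strings below are a toy model of per-segment term dictionaries, not Lucene data structures, and all names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SegmentTermCounts {
    /** Cheap: adds up each segment's own unique-term count (an upper bound). */
    static int sumOfPerSegmentCounts(List<Set<String>> segments) {
        int total = 0;
        for (Set<String> segment : segments) {
            total += segment.size();
        }
        return total;
    }

    /** Exact: has to walk every term of every segment to remove duplicates. */
    static int exactUniqueCount(List<Set<String>> segments) {
        Set<String> union = new HashSet<>();
        for (Set<String> segment : segments) {
            union.addAll(segment);
        }
        return union.size();
    }

    public static void main(String[] args) {
        Set<String> seg1 = new HashSet<>(Arrays.asList("apple", "banana"));
        Set<String> seg2 = new HashSet<>(Arrays.asList("banana", "cherry"));
        List<Set<String>> segments = Arrays.asList(seg1, seg2);
        System.out.println(sumOfPerSegmentCounts(segments)); // 4
        System.out.println(exactUniqueCount(segments));      // 3
    }
}
```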

[jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757972#action_12757972 ] Michael McCandless commented on LUCENE-1781: Mark is it OK to commit this now?

Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-21 Thread Jason Rutherglen
John, It would be great if Lucene's benchmark were used so everyone could execute the test in their own environment and verify. The settings and code used to generate the results aren't clear, so it's difficult to draw any reliable conclusions. The steep spike shows greater evidence for the IO

[jira] Commented: (LUCENE-1917) ShingleFilter include words

2009-09-21 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758056#action_12758056 ] Jason Rutherglen commented on LUCENE-1917: -- I'm going to port SOLR-908 rather

Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-21 Thread John Wang
Jason: Before jumping to any conclusions, let me describe the test setup. It is rather different from Lucene benchmark as we are testing high updates in a realtime environment: We took a public corpus: medline, indexed to approximately 3 million docs. And update all the docs over and

Re: [jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Mark Miller
+1 - commit away. - Mark http://www.lucidimagination.com (mobile) On Sep 21, 2009, at 2:08 PM, Michael McCandless (JIRA) j...@apache.org wrote: [

Re: [jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless
Super, will do! Mike On Mon, Sep 21, 2009 at 7:52 PM, Mark Miller markrmil...@gmail.com wrote: +1 - commit away. - Mark http://www.lucidimagination.com (mobile) On Sep 21, 2009, at 2:08 PM, Michael McCandless (JIRA) j...@apache.org wrote:   [

[jira] Resolved: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1781. Resolution: Fixed Fix Version/s: (was: 3.1) 2.9

Re: TermCount per field

2009-09-21 Thread John Wang
Thanks Michael! Makes lotta sense to me to wait for LUCENE-1458 then. Should I create an issue with a dependency on 1458? One application for this is within FieldCache construction of StringIndex: If we know the number of terms is small, the orderArray using an int per doc is wasteful. In the
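The saving being described can be estimated with a short Java sketch: given the number of unique terms in a field, pick the narrowest integer width that can hold every ordinal. This is only an illustration of the idea, not the FieldCache implementation; it assumes ordinals run from 0 to termCount with 0 reserved for "no term", and the class name `OrdWidth` is made up.

```java
final class OrdWidth {
    /** Smallest width in bytes (1, 2 or 4) able to hold ordinals 0..termCount. */
    static int bytesPerOrd(int termCount) {
        if (termCount <= 0xFF) {
            return 1; // fits an unsigned byte
        }
        if (termCount <= 0xFFFF) {
            return 2; // fits an unsigned short
        }
        return 4;     // fall back to int
    }

    /** Approximate size in bytes of a per-document ordinal array. */
    static long arrayBytes(int maxDoc, int termCount) {
        return (long) maxDoc * bytesPerOrd(termCount);
    }

    public static void main(String[] args) {
        int maxDoc = 3_000_000; // hypothetical index size
        System.out.println(arrayBytes(maxDoc, 200));     // 3000000
        System.out.println(arrayBytes(maxDoc, 100_000)); // 12000000
    }
}
```

With an int per document the array is always 4 bytes times maxDoc; knowing the term count up front is what would let the narrower widths be chosen before the array is allocated.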

Re: TermCount per field

2009-09-21 Thread Michael McCandless
On Mon, Sep 21, 2009 at 8:11 PM, John Wang john.w...@gmail.com wrote: Makes lotta sense to me to wait for LUCENE-1458 then. Should I create an issue with a dependency on 1458? Yes please open a new issue. One application for this is within FieldCache construction of StringIndex: If we know

[jira] Created: (LUCENE-1922) exposing the ability to get the number of unique term count per field

2009-09-21 Thread John Wang (JIRA)
exposing the ability to get the number of unique term count per field - Key: LUCENE-1922 URL: https://issues.apache.org/jira/browse/LUCENE-1922 Project: Lucene - Java Issue

Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-21 Thread Ted Dunning
John, I think that inherent in your test is a uniform distribution of updates. This seems unrealistic to me, not least because any distribution of updates caused by a population of objects interacting with each other should be translation invariant in time which is something a uniform

Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-21 Thread Jason Rutherglen
I'm not sure I communicated the idea properly. If CMS is set to 1 thread, no matter how intensive the CPU for a merge, it's limited to 1 core of what is in many cases a 4 or 8 core server. That leaves the other 3 or 7 cores for queries, which if slow, indicates that it isn't the merging that's

Re: Welcome, Koji

2009-09-21 Thread Robert Muir
welcome! On Mon, Sep 21, 2009 at 8:06 PM, Michael McCandless luc...@mikemccandless.com wrote: A warm welcome to our newest Lucene contrib committer, Koji Sekiguchi! Koji has given us the FastVectorHighlighter and CharFilter, among other fun things. He's also a committer in Solr. Welcome

Re: Welcome, Koji

2009-09-21 Thread Koji Sekiguchi
Hello everyone, I'm happy to be a new member of the contrib committers of Lucene. I hope I can help to improve Lucene in 3.0 and the future. Currently, I run my own company, RONDHUIT, based in Tokyo. In the company, we provide Lucene/Solr consulting and support services for our customers.

Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-21 Thread John Wang
Hi Ted: In our case it is profile updates. Each profile - 1 document keyed on member id. We do experience people updating their profile and the assumption is every member is likely to update their profile (that is a bit aggressive I'd agree, but it is nevertheless a safe upper bound)

Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-21 Thread John Wang
Jason: You are missing the point. The idea is to avoid merging of large segments. The point of this MergePolicy is to balance segment merges across the index. The aim is not to have 1 large segment, it is to have n segments with balanced sizes. When the large segment is out of the

Re: Welcome, Koji

2009-09-21 Thread Mark Miller
Welcome aboard Koji! - Mark Koji Sekiguchi wrote: Hello everyone, I'm happy to be a new member of the contrib committers of Lucene. I hope I can help to improve Lucene in 3.0 and the future. Currently, I run my own company, RONDHUIT, based in Tokyo. In the company, we provide

[jira] Updated: (LUCENE-995) Add open ended range query syntax to QueryParser

2009-09-21 Thread Adriano Crestani (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-995: Attachment: LUCENE-995_09_21_2009.patch The patch adds open ended range query to

Re: ReleaseTodo steps

2009-09-21 Thread Chris Hostetter
: md5sum generates a hash line like this: : a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz : : Then when you do a check, it knows what file to check against. : : The Maven artifacts just list the hash though. So it seems proper to : remove the second part and just put the hash? Some
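To make the two formats concrete, here is a small Java sketch that splits an md5sum-style line into its hash and file name. It assumes the GNU md5sum layout of 32 hex digits, a space, a mode character (`*` for binary, space for text) and then the file name; the class `Md5SumLine` is illustrative only. A hash-only file, as used for the Maven artifacts, would contain just the `hash` part.

```java
final class Md5SumLine {
    final String hash;
    final boolean binaryMode;
    final String fileName;

    private Md5SumLine(String hash, boolean binaryMode, String fileName) {
        this.hash = hash;
        this.binaryMode = binaryMode;
        this.fileName = fileName;
    }

    /** Parses a line such as "a21f...e1d7 *lucene-2.9.0.tar.gz". */
    static Md5SumLine parse(String line) {
        if (line.length() < 35 || line.charAt(32) != ' ') {
            throw new IllegalArgumentException("not an md5sum line: " + line);
        }
        String hash = line.substring(0, 32);
        if (!hash.matches("[0-9a-fA-F]{32}")) {
            throw new IllegalArgumentException("bad hash: " + hash);
        }
        char mode = line.charAt(33);
        if (mode != '*' && mode != ' ') {
            throw new IllegalArgumentException("bad mode character: " + mode);
        }
        return new Md5SumLine(hash, mode == '*', line.substring(34));
    }

    public static void main(String[] args) {
        Md5SumLine parsed = parse("a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz");
        System.out.println(parsed.hash);       // a21f40c4f4fb1c54903e761caf43e1d7
        System.out.println(parsed.fileName);   // lucene-2.9.0.tar.gz
        System.out.println(parsed.binaryMode); // true
    }
}
```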

Re: Welcome, Koji

2009-09-21 Thread Shalin Shekhar Mangar
On Tue, Sep 22, 2009 at 5:36 AM, Michael McCandless luc...@mikemccandless.com wrote: A warm welcome to our newest Lucene contrib committer, Koji Sekiguchi! Koji has given us the FastVectorHighlighter and CharFilter, among other fun things. He's also a committer in Solr. Welcome aboard!

Build failed in Hudson: Lucene-trunk #955

2009-09-21 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/955/ -- [...truncated 15617 lines...] [junit] [junit] Testsuite: org.apache.lucene.queryParser.TestMultiAnalyzer [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.228 sec

2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread John Wang
Looking at the code, seems there is a disconnect between how/when field cache is loaded when IndexWriter.getReader() is called. Is FieldCache updated? Otherwise, are we reloading FieldCache for each reader instance? Seems for operations that lazily load the field cache, e.g. sorting, this has a

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread Yonik Seeley
On Tue, Sep 22, 2009 at 12:56 AM, John Wang john.w...@gmail.com wrote: Looking at the code, seems there is a disconnect between how/when field cache is loaded when IndexWriter.getReader() is called. I'm not sure what you mean by disconnect Is FieldCache updated? FieldCache entries are

Re: Build failed in Hudson: Lucene-trunk #955

2009-09-21 Thread Yonik Seeley
On Tue, Sep 22, 2009 at 12:44 AM, Apache Hudson Server hud...@hudson.zones.apache.org wrote: BUILD FAILED http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build.xml:142: The following error occurred while executing this line:

Re: 2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread John Wang
Hi Yonik: Actually that is what I am looking for. Can you please point me to where/how sorting is done per-segment? When heavy indexing introduces or modifies segments, would it cause reloading of FieldCache at query time and thus would impact search performance? thanks -John On

RE: [jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-21 Thread Uwe Schindler
I thought we were already in the voting phase? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, September 22, 2009 1:52 AM To:

RE: Welcome, Koji

2009-09-21 Thread Uwe Schindler
Welcome Koji! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Tuesday, September 22, 2009 3:17 AM To: java-dev@lucene.apache.org Subject: Re: Welcome,