I've been reading through CHANGES.txt and had a few questions/comments:
1. The attribute entry still says Token is deprecated. I can fix it, but it
isn't a huge deal.
Another one? +1 for changing.
2. LUCENE-1658 talks about renaming FSDirectory to SimpleFSDirectory and
adds a static open() method,
On Sun, Sep 20, 2009 at 7:40 PM, Mark Miller markrmil...@gmail.com wrote:
Mark Miller wrote:
Something along the lines of:
* LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
(but left an FSDirectory base class). Added an FSDirectory.open
static method to pick a
And inline in your diff we have the deprecated Token class:
* LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class
called
AttributeSource instead of the now deprecated Token class. All
attributes
that the Token class had have been moved into separate classes:
@@
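The new API mentioned in that entry replaces a per-token Token object with shared attribute instances that are updated in place. A toy illustration of that design (these are stand-in classes written for this sketch, not Lucene's actual TokenStream/AttributeSource API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the attribute-based token stream design: token
// state lives in attribute objects created once and reused per token.
public class AttributeDemo {
    static class TermAttribute { String term; }
    static class OffsetAttribute { int start, end; }

    static class WhitespaceStream {
        final TermAttribute termAtt = new TermAttribute();
        final OffsetAttribute offsetAtt = new OffsetAttribute();
        private final String text;
        private int pos = 0;

        WhitespaceStream(String text) { this.text = text; }

        // Advance to the next token, updating the shared attributes
        // in place instead of allocating a new Token object.
        boolean incrementToken() {
            while (pos < text.length() && text.charAt(pos) == ' ') pos++;
            if (pos >= text.length()) return false;
            int start = pos;
            while (pos < text.length() && text.charAt(pos) != ' ') pos++;
            termAtt.term = text.substring(start, pos);
            offsetAtt.start = start;
            offsetAtt.end = pos;
            return true;
        }
    }

    public static void main(String[] args) {
        WhitespaceStream ts = new WhitespaceStream("hello lucene world");
        List<String> terms = new ArrayList<>();
        while (ts.incrementToken()) terms.add(ts.termAtt.term);
        System.out.println(terms);   // [hello, lucene, world]
    }
}
```

The point of the pattern is that consumers hold references to the attributes once and read fresh values after each incrementToken() call, avoiding per-token allocation.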
Uwe Schindler wrote:
And inline in your diff we have the deprecated Token class:
* LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class
called
AttributeSource instead of the now deprecated Token class. All
attributes
that the Token class had have been moved into
This was the answer about your first commit (merge FSDir stuff). At the time
I posted the answer, you fixed the deprecated Token thing :-)
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Mark Miller
I see what you mean! My first change had the deprecated Token piece in the
diff. That's a funny coincidence. Threw me for a loop.
Uwe Schindler wrote:
This was the answer about your first commit (merge FSDir stuff). At the time
I posted the answer, you fixed the deprecated Token thing :-)
-
Grant Ingersoll wrote:
On Sep 17, 2009, at 3:07 PM, Mark Miller wrote:
So in the section: Building the Release artifacts
bullet 8: Make sure that for each release file an md5 checksum file
exists.
At this step in the process, the zip/tars do not yet have an md5 checksum
file (at
Oddly though, while all of the Maven hashes are in a file that's 32 bytes,
when I save this hash, it's 33 bytes.
Any thoughts?
Line feed?
-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional
Hi Guys:
A quick comment on 2.9 release:
org.apache.lucene.Weight interface has been changed to an abstract class.
This is a non-backward compatible change and would break many custom Query
implementations. Is this intentional?
Thanks
-John
On Mon, Sep 21, 2009 at 8:59 PM, Uwe Schindler
Uwe Schindler wrote:
Oddly though, while all of the Maven hashes are in a file that's 32 bytes,
when I save this hash, it's 33 bytes.
Any thoughts?
Line feed?
Yeah it is, sorry :(
Check out the back compat break section in changes - it's the first
section, I think.
John Wang wrote:
Hi Guys:
A quick comment on 2.9 release:
org.apache.lucene.Weight interface has been changed to an abstract
class. This is a non-backward compatible change and
Hi guys:
Not sure if this would be a better fit on the users or the dev list.
It would be very useful to be able to get term count given a field,
e.g.
int IndexReader.termCount(String field)
Wanted to get your opinion on what is the best way to approach this.
After looking
Thanks Mark for the clarification!
-John
On Mon, Sep 21, 2009 at 9:09 PM, Mark Miller markrmil...@gmail.com wrote:
Yeah it is, sorry :(
Check out the back compat break section in changes - it's the first
section, I think.
John Wang wrote:
Hi Guys:
A quick comment on 2.9 release:
On Mon, Sep 21, 2009 at 8:56 AM, Mark Miller markrmil...@gmail.com wrote:
Have you done this before Yonik?
md5sum generates a hash line like this:
a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz
Remove the '*' character?
1. Lucene 2.4.1 doesn't seem to have these md5 hashes for the non
Thanks! I assumed you dropped the second part entirely, because the
Maven artifact md5's only appear to have the hash. Your link to the dist
with the non-Maven md5's clears that up, though. I guess the mirrors just
don't have the md5 files.
bq. All of the old releases used to be there, but they
Yonik Seeley wrote:
On Mon, Sep 21, 2009 at 8:56 AM, Mark Miller markrmil...@gmail.com wrote:
Have you done this before Yonik?
md5sum generates a hash line like this:
a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz
Remove the '*' character?
Oddly, my version of md5sum
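The one-byte difference is consistent with the line-feed theory: an MD5 digest is always 16 bytes, i.e. 32 hex characters, so a 33-byte hash file is the hash plus a trailing newline added on save. A minimal sketch of both points in this thread (the input string is arbitrary; the filename-stripping step mirrors removing the `*lucene-2.9.0.tar.gz` part of the md5sum line):

```java
import java.security.MessageDigest;

public class Md5FileLength {
    // Hex-encode an MD5 digest the way md5sum prints it.
    static String md5Hex(byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String hex = md5Hex("lucene-2.9.0.tar.gz".getBytes("UTF-8"));
        // An MD5 digest is 16 bytes = 32 hex characters...
        System.out.println(hex.length());                          // 32
        // ...so a 33-byte saved hash file is those 32 chars plus a
        // trailing newline added when the file was written.
        System.out.println((hex + "\n").getBytes("UTF-8").length); // 33
        // Dropping the "*filename" part of an md5sum line leaves the
        // bare hash, matching the Maven-style .md5 files:
        String line = hex + " *lucene-2.9.0.tar.gz";
        System.out.println(line.split("\\s+")[0].equals(hex));     // true
    }
}
```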
I would like to contribute a class based on the MoreLikeThis class in
contrib/queries that generates a query based on the tags associated
with a document. The class assumes that documents are tagged with a
set of tags (which are stored in the index in a separate Field). The
class determines the
Uploading 2.9 vote candidate as I type.
Gonna check it out a bit more after the upload too, but when it's up, I
*think* we are ready to begin the vote process.
I'll send out an official vote start email a bit later (I've got to CC
the general mailing list as well).
Hopefully I haven't screwed up
[
https://issues.apache.org/jira/browse/LUCENE-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757924#action_12757924
]
Mark Harwood commented on LUCENE-1910:
--
Hi Thomas,
Following your request for
Mark Miller wrote:
<release>
<Version>
+<name>Lucene 2.9.0</name>
+<created>2009-09-23</created>
+<revision>2.9.0</revision>
+</Version>
Stupid question from the peanut gallery:
Doesn't a VOTE require 3 days? I ask because (3 + 2009-09-21) = 2009-09-24,
not -23.
Steven A Rowe wrote:
Mark Miller wrote:
<release>
<Version>
+<name>Lucene 2.9.0</name>
+<created>2009-09-23</created>
+<revision>2.9.0</revision>
+</Version>
Stupid question from the peanut gallery:
Doesn't a VOTE require 3 days? I ask because (3 +
On Mon, Sep 21, 2009 at 12:45 PM, Mark Miller markrmil...@gmail.com wrote:
I actually almost sent an email questioning it, but the day is supposed
to be an estimate, so I figure it's likely to be off a day or two anyway.
+1, don't worry about it.
Need to wait for mirrors to sync anyway, so it's
Okay, let's give this a shot:
The (proposed) release artifacts have been built and are up at:
http://people.apache.org/~markrmiller/staging-area/lucene2.9/
The changes are here:
http://people.apache.org/~markrmiller/staging-area/lucene2.9changes/
Please vote to officially release these
[
https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757961#action_12757961
]
Michael McCandless commented on LUCENE-1781:
bq. Can we go with my patch, and
Absurdly large radius (miles) search fails to include entire earth
--
Key: LUCENE-1921
URL: https://issues.apache.org/jira/browse/LUCENE-1921
Project: Lucene - Java
Issue Type:
[
https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757963#action_12757963
]
Michael McCandless commented on LUCENE-1781:
I opened LUCENE-1921.
Large
MultiReaders can't quickly compute the exact term count. Would they
be allowed to throw UOE? (Like IndexReader.getUniqueTermCount)
TermsHashPerField.numPostings (not .numPostingsInt) tells you the #
unique terms currently in IndexWriter's RAM buffer, so I think we
could save that out with
[
https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757972#action_12757972
]
Michael McCandless commented on LUCENE-1781:
Mark is it OK to commit this now?
John,
It would be great if Lucene's benchmark were used so everyone
could execute the test in their own environment and verify. It's
not clear what settings or code were used to generate the results,
so it's difficult to draw any reliable conclusions.
The steep spike shows greater evidence for the IO
[
https://issues.apache.org/jira/browse/LUCENE-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758056#action_12758056
]
Jason Rutherglen commented on LUCENE-1917:
--
I'm going to port SOLR-908 rather
Jason:
Before jumping to any conclusions, let me describe the test setup. It
is rather different from the Lucene benchmark, as we are testing high update
rates in a realtime environment:
We took a public corpus (Medline) and indexed it to approximately 3 million
docs, and update all the docs over and
+1 - commit away.
- Mark
http://www.lucidimagination.com (mobile)
On Sep 21, 2009, at 2:08 PM, Michael McCandless (JIRA) j...@apache.org
wrote:
[
Super, will do!
Mike
On Mon, Sep 21, 2009 at 7:52 PM, Mark Miller markrmil...@gmail.com wrote:
+1 - commit away.
- Mark
http://www.lucidimagination.com (mobile)
On Sep 21, 2009, at 2:08 PM, Michael McCandless (JIRA) j...@apache.org
wrote:
[
[
https://issues.apache.org/jira/browse/LUCENE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1781.
Resolution: Fixed
Fix Version/s: (was: 3.1)
2.9
Thanks Michael!
Makes lotta sense to me to wait for LUCENE-1458 then. Should I create an
issue with a dependency on 1458?
One application for this is within FieldCache construction of StringIndex:
If we know the number of terms is small, the orderArray using an int per doc
is wasteful. In the
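A back-of-the-envelope sketch of that memory argument (plain Java, not FieldCache internals; the method and numbers here are illustrative only): StringIndex's orderArray stores a full int ordinal per document, but the ordinal only needs enough bits to address the number of unique terms.

```java
public class OrdinalWidth {
    // Bits needed to store ordinals in [0, termCount); assumes termCount >= 1.
    static int bitsPerOrdinal(int termCount) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(termCount - 1));
    }

    public static void main(String[] args) {
        int numDocs = 10_000_000;
        // With int ordinals: 4 bytes per doc regardless of term count.
        System.out.println((long) numDocs * 4);   // 40000000 bytes
        // With, say, 200 unique terms, 8 bits per doc would suffice,
        // i.e. a byte[] instead of an int[]:
        System.out.println(bitsPerOrdinal(200));  // 8
        System.out.println((long) numDocs * 1);   // 10000000 bytes
    }
}
```

So knowing the per-field unique term count up front would let the cache pick a narrower ordinal representation.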
On Mon, Sep 21, 2009 at 8:11 PM, John Wang john.w...@gmail.com wrote:
Makes lotta sense to me to wait for LUCENE-1458 then. Should I create an
issue with a dependency on 1458?
Yes please open a new issue.
One application for this is within FieldCache construction of StringIndex:
If we know
exposing the ability to get the number of unique term count per field
-
Key: LUCENE-1922
URL: https://issues.apache.org/jira/browse/LUCENE-1922
Project: Lucene - Java
Issue
John,
I think that inherent in your test is a uniform distribution of updates.
This seems unrealistic to me, not least because any distribution of updates
caused by a population of objects interacting with each other should be
translation invariant in time which is something a uniform
I'm not sure I communicated the idea properly. If CMS is set to
1 thread, no matter how CPU-intensive a merge is, it's
limited to 1 core of what is in many cases a 4 or 8 core server.
That leaves the other 3 or 7 cores for queries, which if slow,
indicates that it isn't the merging that's
welcome!
On Mon, Sep 21, 2009 at 8:06 PM, Michael McCandless
luc...@mikemccandless.com wrote:
A warm welcome to our newest Lucene contrib committer, Koji Sekiguchi!
Koji has given us the FastVectorHighlighter and CharFilter, among
other fun things. He's also a committer in Solr.
Welcome
Hello everyone,
I'm happy to be a new member of the contrib committers of Lucene.
I hope I can help to improve Lucene in 3.0 and the future.
Currently, I run my own company, RONDHUIT, based in Tokyo.
In the company, we provide Lucene/Solr consulting and support
services for our customers.
Hi Ted:
In our case it is profile updates. Each profile - 1 document keyed on
member id.
We do experience people updating their profile and the assumption is
every member is likely to update their profile (that is a bit aggressive I'd
agree, but it is nevertheless a safe upper bound)
Jason:
You are missing the point.
The idea is to avoid merging of large segments. The point of this
MergePolicy is to balance segment merges across the index. The aim is not to
have 1 large segment, it is to have n segments with balanced sizes.
When the large segment is out of the
Welcome aboard Koji!
- Mark
Koji Sekiguchi wrote:
Hello everyone,
I'm happy to be a new member of the contrib committers of Lucene.
I hope I can help to improve Lucene in 3.0 and the future.
Currently, I run my own company, RONDHUIT, based in Tokyo.
In the company, we provide
[
https://issues.apache.org/jira/browse/LUCENE-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adriano Crestani updated LUCENE-995:
Attachment: LUCENE-995_09_21_2009.patch
The patch adds open ended range query to
: md5sum generates a hash line like this:
: a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz
:
: Then when you do a check, it knows what file to check against.
:
: The Maven artifacts just list the hash though. So it seems proper to
: remove the second part and just put the hash?
Some
On Tue, Sep 22, 2009 at 5:36 AM, Michael McCandless
luc...@mikemccandless.com wrote:
A warm welcome to our newest Lucene contrib committer, Koji Sekiguchi!
Koji has given us the FastVectorHighlighter and CharFilter, among
other fun things. He's also a committer in Solr.
Welcome aboard!
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/955/
--
[...truncated 15617 lines...]
[junit]
[junit] Testsuite: org.apache.lucene.queryParser.TestMultiAnalyzer
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.228 sec
Looking at the code, seems there is a disconnect between how/when field
cache is loaded when IndexWriter.getReader() is called.
Is FieldCache updated? Otherwise, are we reloading FieldCache for each
reader instance?
Seems for operations that lazily load the field cache, e.g. sorting, this has a
On Tue, Sep 22, 2009 at 12:56 AM, John Wang john.w...@gmail.com wrote:
Looking at the code, seems there is a disconnect between how/when field
cache is loaded when IndexWriter.getReader() is called.
I'm not sure what you mean by disconnect
Is FieldCache updated?
FieldCache entries are
On Tue, Sep 22, 2009 at 12:44 AM, Apache Hudson Server
hud...@hudson.zones.apache.org wrote:
BUILD FAILED
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build.xml:142:
The following error occurred while executing this line:
Hi Yonik:
Actually that is what I am looking for. Can you please point me to
where/how sorting is done per-segment?
When heavy indexing introduces or modifies segments, would it cause
reloading of the FieldCache at query time and thus impact search
performance?
thanks
-John
On
I thought we were already in the voting phase?
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Tuesday, September 22, 2009 1:52 AM
To:
Welcome Koji!
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Tuesday, September 22, 2009 3:17 AM
To: java-dev@lucene.apache.org
Subject: Re: Welcome,