Joerg Hohwiller wrote:
>> Then we'll need .sha1 and .md5 files for all pushed Jars.
>> One of the other developers will have to do that,
>> as I don't have my PGP set up,
>> and hence no key for the KEYS file (if that's needed for the .sha1).
> You do not need PGP or something like this for SHA-* o
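Jörg's point is that checksum files are plain message digests, so no PGP key is involved. Below is a minimal Java sketch of producing them. It assumes each .sha1/.md5 file contains only the lowercase hex digest of the Jar; whether the Maven repository expects anything beyond that is not settled here. The class JarChecksums is hypothetical and uses only java.security.MessageDigest and java.nio.file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: writes <jar>.sha1 and <jar>.md5 next to a Jar.
// These are plain digests; no PGP key is needed to create them.
public class JarChecksums {

    // Lowercase hex encoding of a digest.
    public static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Digest in-memory bytes with the named algorithm ("SHA-1" or "MD5").
    public static String digest(byte[] data, String algorithm) throws NoSuchAlgorithmException {
        return toHex(MessageDigest.getInstance(algorithm).digest(data));
    }

    // Stream a file through the digest so large Jars are not loaded into memory at once.
    public static String digestFile(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    // Assumed layout: each checksum file holds only the hex digest, no file name.
    public static void writeChecksums(Path jar) throws IOException, NoSuchAlgorithmException {
        String name = jar.getFileName().toString();
        Files.write(jar.resolveSibling(name + ".sha1"),
                digestFile(jar, "SHA-1").getBytes(StandardCharsets.US_ASCII));
        Files.write(jar.resolveSibling(name + ".md5"),
                digestFile(jar, "MD5").getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws Exception {
        for (String arg : args) {
            writeChecksums(Paths.get(arg));
        }
    }
}
```

Run as `java JarChecksums lucene-core-2.1.0.jar` (file name illustrative) to produce both checksum files beside the Jar.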
On Apr 5, 2007, at 5:26 PM, Michael McCandless wrote:
What we need to do is cut down on decompression and conflict
resolution costs when reading from one segment to another. KS has
solved this problem for stored fields. Field defs are global and
field values are keyed by name rather than fiel
"Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>
> Michael, like everyone else, I am watching this very closely. So far
> it sounds great!
>
> On Apr 5, 2007, at 8:03 PM, Michael McCandless wrote:
>
> > When I measure "amount of RAM @ flush time", I'm calling
> > MemoryMXBean.getHeapMemoryUsage
Michael, like everyone else, I am watching this very closely. So far
it sounds great!
On Apr 5, 2007, at 8:03 PM, Michael McCandless wrote:
When I measure "amount of RAM @ flush time", I'm calling
MemoryMXBean.getHeapMemoryUsage().getUsed(). So, this measures actual
process memory usage w
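Here is a minimal sketch of taking that measurement with the standard java.lang.management API. The HeapProbe class and the 1 MB = 1024 * 1024 bytes conversion are assumptions for illustration. This is not the LUCENE-843 benchmark code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical probe: samples heap usage via
// MemoryMXBean.getHeapMemoryUsage().getUsed(), as described above.
public class HeapProbe {

    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Bytes of heap in use right now, including garbage not yet collected.
    public static long usedHeapBytes() {
        MemoryUsage heap = MEMORY.getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Assumed reporting unit: 1 MB = 1024 * 1024 bytes.
    public static double toMB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // Stand-in for buffering documents: allocate roughly 16 MB of ints.
        int[] buffer = new int[4 * 1024 * 1024];
        long after = usedHeapBytes();
        System.out.printf("used before: %.1f MB, after: %.1f MB (buffer length %d)%n",
                toMB(before), toMB(after), buffer.length);
    }
}
```

Because getUsed() counts everything on the heap at the moment of the call, including unreachable objects a collection has not yet reclaimed, individual readings can be noisy. Averages over many flushes are more meaningful than any single sample.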
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
> On Apr 5, 2007, at 12:06 PM, Michael McCandless wrote:
>
> >>> (I think for KS you "add" a previous segment not that
> >>> differently from how you "add" a document)?
> >>
> >> Yeah. KS has to decompress and serialize posting content, which sux.
> >
Hi Otis!
"Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:
> You talk about a RAM buffer from 1MB - 96MB, but then you have the amount
> of RAM @ flush time (e.g. Avg RAM used (MB) @ flush: old 34.5; new 3.4
> [90.1% less]).
>
> I don't follow 100% of what you are doing in LUCENE-843, so
"Mike Klaas" <[EMAIL PROTECTED]> wrote:
> On 4/5/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> >
> > : Thanks! But remember many Lucene apps won't see these speedups since I've
> > : carefully minimized cost of tokenization and cost of document retrieval. I
> > : think for many Lucene ap
On 4/4/07, Otis Gospodnetic (JIRA) <[EMAIL PROTECTED]> wrote:
[
https://issues.apache.org/jira/browse/LUCENE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Otis Gospodnetic resolved LUCENE-796.
-
Resolution: Fixed
Makes s
[
https://issues.apache.org/jira/browse/LUCENE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487116
]
Hoss Man commented on LUCENE-857:
-
From email, since I didn't notice Otis opened this issue already...
Date: Thu, 5
On 4/5/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: Thanks! But remember many Lucene apps won't see these speedups since I've
: carefully minimized cost of tokenization and cost of document retrieval. I
: think for many Lucene apps these are a sizable part of time spent indexing.
true, bu
[
https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487108
]
Matt Ericson commented on LUCENE-855:
-
I am almost done with my patch and I wanted to test it against this patch
: Thanks! But remember many Lucene apps won't see these speedups since I've
: carefully minimized cost of tokenization and cost of document retrieval. I
: think for many Lucene apps these are a sizable part of time spent indexing.
true, but as long as the changes you are making have no impact on
[
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Otis Gospodnetic updated LUCENE-584:
Attachment: bench-diff.txt
Perhaps I did something wrong with the benchmark, but I didn't g
: Since caching is built into the public BitSet bits(IndexReader reader)
: method, I don't see a way to deprecate that, which means I'll just cut
: it out and document it in CHANGES.txt. Anyone who wants QueryFilter
: caching will be able to get the caching back by wrapping the QueryFilter
: in y
On Apr 5, 2007, at 12:06 PM, Michael McCandless wrote:
(I think for KS you "add" a previous segment not that
differently from how you "add" a document)?
Yeah. KS has to decompress and serialize posting content, which sux.
The one saving grace is that with the Fibonacci merge schedule and
th
[
https://issues.apache.org/jira/browse/LUCENE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Otis Gospodnetic updated LUCENE-857:
Attachment: LUCENE-857.patch
QueryFilter without caching.
I'll commit it tomorrow (Friday).
I'm not saying I'm against it, but one of the things that makes
Lucene so great is its lack of dependencies in the core. It isn't
necessarily a slippery slope, either, if we do add one dependency.
Javolution is BSD license, AFAICT. I don't know if that is a good or
bad license as far as
Remove BitSet caching from QueryFilter
--
Key: LUCENE-857
URL: https://issues.apache.org/jira/browse/LUCENE-857
Project: Lucene - Java
Issue Type: Improvement
Reporter: Otis Gospodnetic
Sounds like I need to cut that out.
Since caching is built into the public BitSet bits(IndexReader reader) method,
I don't see a way to deprecate that, which means I'll just cut it out and
document it in CHANGES.txt. Anyone who wants QueryFilter caching will be able
to get the caching back by
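The "caching by wrapping" idea can be sketched in plain Java. This is not Lucene's API. BitsSource and CachingBitsSource are hypothetical stand-ins for a cache-free filter and a caching wrapper around it. Results are memoized per reader in a WeakHashMap, so entries for discarded readers can be garbage-collected; that choice is a design assumption.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

public class CachingWrapperSketch {

    // Hypothetical stand-in for a filter: computes matching-doc bits for a reader.
    public interface BitsSource<R> {
        BitSet bits(R reader);
    }

    // Wraps a cache-free source and memoizes its result per reader.
    public static final class CachingBitsSource<R> implements BitsSource<R> {
        private final BitsSource<R> delegate;
        // Weak keys: once a reader is unreachable elsewhere, its entry can be dropped.
        private final Map<R, BitSet> cache = new WeakHashMap<>();

        public CachingBitsSource(BitsSource<R> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized BitSet bits(R reader) {
            BitSet cached = cache.get(reader);
            if (cached == null) {
                cached = delegate.bits(reader); // computed once per reader
                cache.put(reader, cached);
            }
            return cached; // shared instance: callers should not mutate it
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        BitsSource<Object> uncached = reader -> {
            calls[0]++;
            BitSet b = new BitSet();
            b.set(3);
            return b;
        };
        BitsSource<Object> cached = new CachingBitsSource<>(uncached);
        Object reader = new Object();
        cached.bits(reader);
        cached.bits(reader);
        System.out.println("delegate calls: " + calls[0]); // prints "delegate calls: 1"
    }
}
```

The wrapped source stays free of caching, and applications that want the cache opt in explicitly. That matches the split proposed for QueryFilter.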
Quick question, Mike:
You talk about a RAM buffer from 1MB - 96MB, but then you have the amount of
RAM @ flush time (e.g. Avg RAM used (MB) @ flush: old 34.5; new 3.4
[90.1% less]).
I don't follow 100% of what you are doing in LUCENE-843, so could you please
explain what these 2 dif
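For reference, the "90.1% less" figure in the quoted numbers is the relative reduction in average RAM used at flush:

$$\frac{34.5 - 3.4}{34.5} = \frac{31.1}{34.5} \approx 0.901 = 90.1\%$$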
I'm not in love with the dependency idea, though it's not that big of a deal
for me.
However, I think you will want to get some of the performance patches (e.g.
LUCENE-843) in first, so you can compare the latest and greatest version of
Lucene with your Javalutionized version. From what I gathe
Yes, I believe enough in this approach to try it. I'm already starting
to play with it. I took the current trunk and I'm starting to play with
it. That being said, I'm quite busy right now so I can't promise any
steady progress. Also, I won't apply patches that are already in JIRA,
so the numbe
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
> > (I think for KS you "add" a previous segment not that
> > differently from how you "add" a document)?
>
> Yeah. KS has to decompress and serialize posting content, which sux.
>
> The one saving grace is that with the Fibonacci merge schedule and
What Mike said. Without seeing the Javalutionized Lucene in action we won't
get very far.
Jean-Philippe, are you interested in making the changes to Lucene and showing
the performance improvement?
Note that you can use the super-nice and easy to use contrib/benchmark to
compare the "vanilla Luc
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
>
> On Apr 5, 2007, at 8:54 AM, Michael McCandless wrote:
>
> > So you basically do not "de-dup" by field+term on your first pass
> > through the tokens in the doc (which is "roughly" what that hash
> > does). Instead, append all tokens in an array, t
"Paul Elschot" <[EMAIL PROTECTED]> wrote:
> At revision 525912:
>
> [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
> [junit] Tests run: 16, Failures: 1, Errors: 0, Time elapsed: 52.161 sec
> [junit]
> [junit] Testcase: testAddIndexOnDiskFull(org.apache.lucene.
On Apr 5, 2007, at 3:58 AM, Michael McCandless wrote:
Marvin, do you have any sense of what the equivalent cost is
in KS
It's big. I don't have any good optimizations to suggest in this area.
(I think for KS you "add" a previous segment not that
differently from how you "add" a document)?
On 4/4/07, Jean-Philippe Robichaud <[EMAIL PROTECTED]> wrote:
I understand your concerns!
I was a little skeptical at the beginning. But even with the 1.5 jvm,
the improvements still hold.
Lucene creates a lot of "garbage" (strings, tokens, ...) either at
index time or query time. While the
Nothing fancy - Eclipse. It flagged it, I removed it, nothing "turned red"
indicating everything still compiled, unit tests still passed, committed.
If I recall correctly, one has to configure Eclipse to alert you to unused
variables, methods, and such, and I have that turned on.
Otis
. . . .
At revision 525912:
[junit] Testsuite: org.apache.lucene.index.TestIndexWriter
[junit] Tests run: 16, Failures: 1, Errors: 0, Time elapsed: 52.161 sec
[junit]
[junit] Testcase: testAddIndexOnDiskFull(org.apache.lucene.index.TestIndexWriter): FAILED
[junit] max free Directory
On Apr 5, 2007, at 8:54 AM, Michael McCandless wrote:
So you basically do not "de-dup" by field+term on your first pass
through the tokens in the doc (which is "roughly" what that hash
does). Instead, append all tokens in an array, then sort first by
field+text and second by position? This is
[
https://issues.apache.org/jira/browse/LUCENE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487049
]
Michael McCandless commented on LUCENE-856:
---
OK I re-ran the above test (10 MM docs @ ~5,500 bytes plain te
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Eric,
>
> On Apr 4, 2007, at 4:33 PM, Otis Gospodnetic wrote:
>> Eh, missing Jars in the Maven repo again. Why does this always get
>> dropped?
>
> Because none of us Lucene committers care much about Maven? :)
It's okay for you personally. And n
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
> Jörg,
Hi Otis,
> Since you offered to help - please see
> https://issues.apache.org/jira/browse/LUCENE-622 .
> lucene-core POM is there for 2.1.0, but if you need POMs for contrib/*,
> please attach them to that issue. We have Jars, obviously,
>
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
>
> On Apr 5, 2007, at 3:58 AM, Michael McCandless wrote:
>
> > The one thing that still baffles me is: I can't get a persistent
> > Posting hash to be any faster.
>
> Don't use a hash, then. :)
>
> KS doesn't.
>
> * Give Token a "position" memb
[
https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487030
]
Jörg Hohwiller commented on LUCENE-622:
---
If you apply this patch to svn
(http://svn.apache.org/repos/asf/lucen
[
https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jörg Hohwiller updated LUCENE-622:
--
Attachment: lucene-maven.patch
patch for partial mavenization of lucene
> Provide More of Luce
On Apr 5, 2007, at 3:58 AM, Michael McCandless wrote:
The one thing that still baffles me is: I can't get a persistent
Posting hash to be any faster.
Don't use a hash, then. :)
KS doesn't.
* Give Token a "position" member.
* After you've accumulated all the Tokens, calculate
po
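The sort-instead-of-hash approach described in this thread can be sketched as follows: append every Token together with its position, then sort by field+text and then by position. This is an illustrative Java sketch, not KinoSearch's code. Token, Posting and invert are hypothetical names. Positions here are simply assigned in token order, since the original description of how they are calculated is cut off.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedTokenInversion {

    // Hypothetical token that carries its own position.
    public static final class Token {
        final String field;
        final String text;
        final int position;

        public Token(String field, String text, int position) {
            this.field = field;
            this.text = text;
            this.position = position;
        }
    }

    // One distinct field+text pair with all of its positions in the document.
    public static final class Posting {
        final String field;
        final String text;
        final List<Integer> positions = new ArrayList<>();

        Posting(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text + positions;
        }
    }

    // Sort key: field first, then term text, then position.
    static final Comparator<Token> ORDER =
            Comparator.comparing((Token t) -> t.field)
                      .thenComparing(t -> t.text)
                      .thenComparingInt(t -> t.position);

    // Invert one document's tokens without a hash: after sorting, equal
    // field+text pairs are adjacent, so one linear pass builds each posting.
    public static List<Posting> invert(List<Token> tokens) {
        List<Token> sorted = new ArrayList<>(tokens);
        sorted.sort(ORDER);
        List<Posting> postings = new ArrayList<>();
        Posting current = null;
        for (Token t : sorted) {
            if (current == null || !current.field.equals(t.field) || !current.text.equals(t.text)) {
                current = new Posting(t.field, t.text); // new field+text run starts here
                postings.add(current);
            }
            current.positions.add(t.position); // positions arrive in ascending order
        }
        return postings;
    }

    public static void main(String[] args) {
        String[] words = {"the", "quick", "fox", "the", "lazy", "fox"};
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            tokens.add(new Token("body", words[i], i));
        }
        System.out.println(invert(tokens));
    }
}
```

Running main prints `[body:fox[2, 5], body:lazy[4], body:quick[1], body:the[0, 3]]`. Sorting does the de-duplication that the Posting hash would otherwise do, and it also leaves terms in the order in which a segment writer needs them.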
"eks dev" <[EMAIL PROTECTED]> wrote:
> wow, impressive numbers, congrats!
Thanks! But remember many Lucene apps won't see these speedups since I've
carefully minimized cost of tokenization and cost of document retrieval. I
think for many Lucene apps these are a sizable part of time spent index
wow, impressive numbers, congrats!
----- Original Message -----
From: Michael McCandless (JIRA) <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, 5 April, 2007 3:22:32 PM
Subject: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to
buffer added documents
[
h
[
https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486942
]
Michael McCandless commented on LUCENE-843:
---
OK I ran old (trunk) vs new (this patch) with increasing RAM
[
https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jörg Hohwiller updated LUCENE-622:
--
Attachment: lucene-highlighter-2.0.0.pom
pom for lucene-highlighter
> Provide More of Lucene F
Once more, now to java-dev instead of to java-commits:
Otis,
Can I ask which tool you used to catch this, and the previous one?
Regards,
Paul Elschot
On Thursday 05 April 2007 03:06, [EMAIL PROTECTED] wrote:
> Author: otis
> Date: Wed Apr 4 18:06:16 2007
> New Revision: 525669
>
> URL: http:
[
https://issues.apache.org/jira/browse/LUCENE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Lef updated LUCENE-789:
--
Attachment: TestMultiSearcherSimilarity.java
Attached unit test
> Custom similarity is ignored when us
I understand your concerns!
I was a little skeptical at the beginning. But even with the 1.5 jvm,
the improvements still hold.
Lucene creates a lot of "garbage" (strings, tokens, ...) either at
index time or query time. While the new garbage collector strategies did
seriously improve since jav
"Ning Li" <[EMAIL PROTECTED]> wrote:
> On 4/4/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote:
> > Note that for "autoCommit=false", this optimization is somewhat less
> > important, depending on how often you actually close/open a new
> > IndexWriter. In the extreme case, if you open a w
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
> On Apr 4, 2007, at 10:05 AM, Michael McCandless wrote:
>
> >> (: Ironically, the numbers for Lucene on that page are a little
> >> better than they should be because of a sneaky bug. I would have
> >> made updating the results a priority if they'd go