Re: publish to maven-repository

2007-04-05 Thread Sami Siren
Joerg Hohwiller wrote: >> When we'll need .sha1 and .md5 files for all pushed Jars. >> One of the other developers will have to do that, >> as I don't have my PGP set up, >> and hence no key for the KEYS file (if that's needed for the .sha1). > You do not need PGP or something like this for SHA-* o
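As the reply points out, SHA and MD5 checksums are plain digests and need no PGP key. A minimal Java sketch of that point, assuming only the standard java.security.MessageDigest API; the class name ChecksumSketch and the helper hexDigest are made up for illustration, and writing the hex string into a .sha1 or .md5 file next to the Jar is left out:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Lucene or Maven.
final class ChecksumSketch {

    // Returns the lowercase hex digest of data for the given algorithm name.
    static String hexDigest(String algorithm, byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            // "SHA-1" and "MD5" are required on every Java platform.
            throw new IllegalStateException(e);
        }
        byte[] digest = md.digest(data);
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // In practice the bytes would come from the Jar file, not a literal.
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println("sha1: " + hexDigest("SHA-1", data));
        System.out.println("md5:  " + hexDigest("MD5", data));
    }
}
```

No key material is involved at any step; a PGP key would only matter for detached signatures, which are a separate question.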

Re: improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Marvin Humphrey
On Apr 5, 2007, at 5:26 PM, Michael McCandless wrote: What we need to do is cut down on decompression and conflict resolution costs when reading from one segment to another. KS has solved this problem for stored fields. Field defs are global and field values are keyed by name rather than fiel

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Michael McCandless
"Grant Ingersoll" <[EMAIL PROTECTED]> wrote: > > Michael, like everyone else, I am watching this very closely. So far > it sounds great! > > On Apr 5, 2007, at 8:03 PM, Michael McCandless wrote: > > > When I measure "amount of RAM @ flush time", I'm calling > > MemoryMXBean.getHeapMemoryUsage

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Grant Ingersoll
Michael, like everyone else, I am watching this very closely. So far it sounds great! On Apr 5, 2007, at 8:03 PM, Michael McCandless wrote: When I measure "amount of RAM @ flush time", I'm calling MemoryMXBean.getHeapMemoryUsage().getUsed(). So, this measures actual process memory usage w
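For readers who want to reproduce the "RAM @ flush time" measurement, here is a rough sketch of the call mentioned above. It assumes only the standard java.lang.management API; the class name HeapProbe and the 64 MB allocation are illustrative and are not taken from the LUCENE-843 patch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical probe class, for illustration only.
final class HeapProbe {

    // Bytes of heap currently in use, as reported by the JVM.
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();

        // Stand-in for buffered documents: hold on to 64 buffers of 1 MB each.
        byte[][] buffers = new byte[64][];
        for (int i = 0; i < buffers.length; i++) {
            buffers[i] = new byte[1 << 20];
        }

        long after = usedHeapBytes();
        System.out.printf("used before: %d bytes, after: %d bytes (%d buffers held)%n",
                before, after, buffers.length);
    }
}
```

The reported figure includes garbage that has not been collected yet, so it can exceed the size of the live buffer.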

Re: improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Michael McCandless
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote: > On Apr 5, 2007, at 12:06 PM, Michael McCandless wrote: > > >>> (I think for KS you "add" a previous segment not that > >>> differently from how you "add" a document)? > >> > >> Yeah. KS has to decompress and serialize posting content, which sux. > >

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Michael McCandless
Hi Otis! "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote: > You talk about a RAM buffer from 1MB - 96MB, but then you have the amount > of RAM @ flush time (e.g. Avg RAM used (MB) @ flush: old 34.5; new > 3.4 [90.1% less]). > > I don't follow 100% of what you are doing in LUCENE-843, so

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Michael McCandless
"Mike Klaas" <[EMAIL PROTECTED]> wrote: > On 4/5/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > > > : Thanks! But remember many Lucene apps won't see these speedups since I've > > : carefully minimized cost of tokenization and cost of document retrieval. > > I > > : think for many Lucene ap

Re: [jira] Resolved: (LUCENE-796) Change Visibility of fields[] in MultiFieldQueryParser

2007-04-05 Thread Mike Klaas
On 4/4/07, Otis Gospodnetic (JIRA) <[EMAIL PROTECTED]> wrote: [ https://issues.apache.org/jira/browse/LUCENE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved LUCENE-796. - Resolution: Fixed Makes s

[jira] Commented: (LUCENE-857) Remove BitSet caching from QueryFilter

2007-04-05 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487116 ] Hoss Man commented on LUCENE-857: From email since I didn't notice Otis opened this issue already... Date: Thu, 5

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Mike Klaas
On 4/5/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Thanks! But remember many Lucene apps won't see these speedups since I've : carefully minimized cost of tokenization and cost of document retrieval. I : think for many Lucene apps these are a sizable part of time spent indexing. true, bu

[jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2007-04-05 Thread Matt Ericson (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487108 ] Matt Ericson commented on LUCENE-855: - I am almost done with my patch and I wanted to test it against this patch

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Chris Hostetter
: Thanks! But remember many Lucene apps won't see these speedups since I've : carefully minimized cost of tokenization and cost of document retrieval. I : think for many Lucene apps these are a sizable part of time spent indexing. true, but as long as the changes you are making have no impact on

[jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2007-04-05 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated LUCENE-584: Attachment: bench-diff.txt Perhaps I did something wrong with the benchmark, but I didn't g

Re: Caching in QueryFilter - why?

2007-04-05 Thread Chris Hostetter
: Since caching is built into the public BitSet bits(IndexReader reader) : method, I don't see a way to deprecate that, which means I'll just cut : it out and document it in CHANGES.txt. Anyone who wants QueryFilter : caching will be able to get the caching back by wrapping the QueryFilter : in y

Re: improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Marvin Humphrey
On Apr 5, 2007, at 12:06 PM, Michael McCandless wrote: (I think for KS you "add" a previous segment not that differently from how you "add" a document)? Yeah. KS has to decompress and serialize posting content, which sux. The one saving grace is that with the Fibonacci merge schedule and th

[jira] Updated: (LUCENE-857) Remove BitSet caching from QueryFilter

2007-04-05 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated LUCENE-857: Attachment: LUCENE-857.patch QueryFilter without caching. I'll commit it tomorrow (Friday)

Re: Lucene and Javolution: A good mix ?

2007-04-05 Thread Grant Ingersoll
I'm not saying I'm against it, but one of the things that makes Lucene so great is its lack of dependencies in the core. It isn't necessarily a slippery slope, either, if we do add one dependency. Javolution is BSD license, AFAICT. I don't know if that is a good or bad license as far as

[jira] Created: (LUCENE-857) Remove BitSet caching from QueryFilter

2007-04-05 Thread Otis Gospodnetic (JIRA)
Remove BitSet caching from QueryFilter -- Key: LUCENE-857 URL: https://issues.apache.org/jira/browse/LUCENE-857 Project: Lucene - Java Issue Type: Improvement Reporter: Otis Gospodnetic

Re: Caching in QueryFilter - why?

2007-04-05 Thread Otis Gospodnetic
Sounds like I need to cut that out. Since caching is built into the public BitSet bits(IndexReader reader) method, I don't see a way to deprecate that, which means I'll just cut it out and document it in CHANGES.txt. Anyone who wants QueryFilter caching will be able to get the caching back by
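The general idea of getting caching back by wrapping a non-caching filter can be sketched as follows. This is only an illustration under stated assumptions: BitsFilter and CachingWrapper are hypothetical stand-ins rather than Lucene classes, the reader is modeled as a generic type R, and the weak-keyed map is one possible choice so that cached bits go away once a reader is no longer referenced:

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical types for illustration; none of these are Lucene classes.
final class CachingFilterSketch {

    // Stand-in for a filter that computes a BitSet for a given reader.
    interface BitsFilter<R> {
        BitSet bits(R reader);
    }

    // Wraps any filter and remembers its result per reader.
    static final class CachingWrapper<R> implements BitsFilter<R> {
        private final BitsFilter<R> delegate;
        // Weak keys: an entry disappears once its reader is unreachable.
        private final Map<R, BitSet> cache = new WeakHashMap<>();

        CachingWrapper(BitsFilter<R> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized BitSet bits(R reader) {
            BitSet cached = cache.get(reader);
            if (cached == null) {
                cached = delegate.bits(reader); // computed once per reader
                cache.put(reader, cached);
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        BitsFilter<Object> uncached = reader -> {
            calls[0]++;
            BitSet bits = new BitSet(8);
            bits.set(3);
            return bits;
        };
        BitsFilter<Object> cached = new CachingWrapper<>(uncached);

        Object reader = new Object();
        cached.bits(reader);
        cached.bits(reader);
        System.out.println("delegate invoked " + calls[0] + " time(s)");
    }
}
```

Keeping the cache in a wrapper means callers who do not want the memory cost simply use the plain filter.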

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Otis Gospodnetic
Quick question, Mike: You talk about a RAM buffer from 1MB - 96MB, but then you have the amount of RAM @ flush time (e.g. Avg RAM used (MB) @ flush: old 34.5; new 3.4 [90.1% less]). I don't follow 100% of what you are doing in LUCENE-843, so could you please explain what these 2 dif

Re: Lucene and Javolution: A good mix ?

2007-04-05 Thread Otis Gospodnetic
I'm not in love with the dependency idea, though it's not that big of a deal for me. However, I think you will want to get some of the performance patches (e.g. LUCENE-843) in first, so you can compare the latest and greatest version of Lucene with your Javolutionized version. From what I gathe

RE: Lucene and Javolution: A good mix ?

2007-04-05 Thread Jean-Philippe Robichaud
Yes, I believe enough in this approach to try it. I'm already starting to play with it. I took the current trunk and I'm starting to play with it. That being said, I'm quite busy right now so I can't promise any steady progress. Also, I won't apply patches that are already in JIRA, so the numbe

Re: improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Michael McCandless
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote: > > (I think for KS you "add" a previous segment not that > > differently from how you "add" a document)? > > Yeah. KS has to decompress and serialize posting content, which sux. > > The one saving grace is that with the Fibonacci merge schedule and

Re: Lucene and Javolution: A good mix ?

2007-04-05 Thread Otis Gospodnetic
What Mike said. Without seeing the Javolutionized Lucene in action we won't get very far. Jean-Philippe, are you interested in making the changes to Lucene and showing the performance improvement? Note that you can use the super-nice and easy to use contrib/benchmark to compare the "vanilla Luc

Re: Eliminate postings hash (was Re: improve how IndexWriter uses RAM...)

2007-04-05 Thread Michael McCandless
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote: > > On Apr 5, 2007, at 8:54 AM, Michael McCandless wrote: > > > So you basically do not "de-dup" by field+term on your first pass > > through the tokens in the doc (which is "roughly" what that hash > > does). Instead, append all tokens in an array, t

Re: TestIndexWriter.testAddIndexOnDiskFull failed

2007-04-05 Thread Michael McCandless
"Paul Elschot" <[EMAIL PROTECTED]> wrote: > At revision 525912: > > [junit] Testsuite: org.apache.lucene.index.TestIndexWriter > [junit] Tests run: 16, Failures: 1, Errors: 0, Time elapsed: 52.161 > sec > [junit] > [junit] Testcase: > testAddIndexOnDiskFull(org.apache.lucene.

Re: improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Marvin Humphrey
On Apr 5, 2007, at 3:58 AM, Michael McCandless wrote: Marvin do you have any sense of what the equivalent cost is in KS It's big. I don't have any good optimizations to suggest in this area. (I think for KS you "add" a previous segment not that differently from how you "add" a document)?

Re: Lucene and Javolution: A good mix ?

2007-04-05 Thread Mike Klaas
On 4/4/07, Jean-Philippe Robichaud <[EMAIL PROTECTED]> wrote: I understand your concerns! I was a little skeptical at the beginning. But even with the 1.5 JVM, the improvements still hold. Lucene creates a lot of "garbage" (strings, tokens, ...) either at index time or query time. While the

Re: svn commit: r525669 - /lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer.java

2007-04-05 Thread Otis Gospodnetic
Nothing fancy - Eclipse. It flagged it, I removed it, nothing "turned red" indicating everything still compiled, unit tests still passed, committed. If I recall correctly, one has to configure Eclipse to alert you to unused variables, methods, and such, and I have that turned on. Otis . . . .

TestIndexWriter.testAddIndexOnDiskFull failed

2007-04-05 Thread Paul Elschot
At revision 525912: [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Tests run: 16, Failures: 1, Errors: 0, Time elapsed: 52.161 sec [junit] [junit] Testcase: testAddIndexOnDiskFull(org.apache.lucene.index.TestIndexWriter): FAILED [junit] max free Directory

Re: Eliminate postings hash (was Re: improve how IndexWriter uses RAM...)

2007-04-05 Thread Marvin Humphrey
On Apr 5, 2007, at 8:54 AM, Michael McCandless wrote: So you basically do not "de-dup" by field+term on your first pass through the tokens in the doc (which is "roughly" what that hash does). Instead, append all tokens in an array, then sort first by field+text and second by position? This is
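The approach summarized above (collect every token, then sort by field+text and break ties by position, with no hash) could look roughly like this in Java. The Token class and its members here are hypothetical stand-ins, not Lucene's or KinoSearch's actual types:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; all names are hypothetical.
final class TokenSortSketch {

    static final class Token {
        final String field;
        final String text;
        final int position;

        Token(String field, String text, int position) {
            this.field = field;
            this.text = text;
            this.position = position;
        }

        @Override
        public String toString() {
            return field + ":" + text + "@" + position;
        }
    }

    // Order by field, then term text, then position within the document.
    static final Comparator<Token> POSTING_ORDER =
            Comparator.comparing((Token t) -> t.field)
                    .thenComparing((Token t) -> t.text)
                    .thenComparingInt(t -> t.position);

    // Returns a sorted copy; equal field+text pairs end up adjacent,
    // so each term's positions can be read off as one contiguous run.
    static List<Token> sorted(List<Token> tokens) {
        List<Token> copy = new ArrayList<>(tokens);
        copy.sort(POSTING_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("body", "b", 1),
                new Token("title", "z", 0),
                new Token("body", "a", 2),
                new Token("body", "a", 0));
        // prints [body:a@0, body:a@2, body:b@1, title:z@0]
        System.out.println(sorted(tokens));
    }
}
```

The de-duplication that a hash would do up front happens implicitly here: duplicates of a term are grouped by the sort rather than looked up one token at a time.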

[jira] Commented: (LUCENE-856) Optimize segment merging

2007-04-05 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487049 ] Michael McCandless commented on LUCENE-856: --- OK I re-ran the above test (10 MM docs @ ~5,500 bytes plain te

Re: publish to maven-repository

2007-04-05 Thread Joerg Hohwiller
Hi Eric, > > On Apr 4, 2007, at 4:33 PM, Otis Gospodnetic wrote: >> Eh, missing Jars in the Maven repo again. Why does this always get >> dropped? > > Because none of us Lucene committers care much about Maven? :) It's okay for you personally. And n

Re: publish to maven-repository

2007-04-05 Thread Joerg Hohwiller
> Jörg, Hi Otis, > Since you offered to help - please see > https://issues.apache.org/jira/browse/LUCENE-622 . > lucene-core POM is there for 2.1.0, but if you need POMs for contrib/*, > please attach them to that issue. We have Jars, obviously, >

Re: Eliminate postings hash (was Re: improve how IndexWriter uses RAM...)

2007-04-05 Thread Michael McCandless
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote: > > On Apr 5, 2007, at 3:58 AM, Michael McCandless wrote: > > > The one thing that still baffles me is: I can't get a persistent > > Posting hash to be any faster. > > Don't use a hash, then. :) > > KS doesn't. > >* Give Token a "position" memb

[jira] Commented: (LUCENE-622) Provide More of Lucene For Maven

2007-04-05 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487030 ] Jörg Hohwiller commented on LUCENE-622: --- If you apply this patch to svn (http://svn.apache.org/repos/asf/lucen

[jira] Updated: (LUCENE-622) Provide More of Lucene For Maven

2007-04-05 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jörg Hohwiller updated LUCENE-622: -- Attachment: lucene-maven.patch patch for partial mavenization of lucene > Provide More of Luce

Eliminate postings hash (was Re: improve how IndexWriter uses RAM...)

2007-04-05 Thread Marvin Humphrey
On Apr 5, 2007, at 3:58 AM, Michael McCandless wrote: The one thing that still baffles me is: I can't get a persistent Posting hash to be any faster. Don't use a hash, then. :) KS doesn't. * Give Token a "position" member. * After you've accumulated all the Tokens, calculate po

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Michael McCandless
"eks dev" <[EMAIL PROTECTED]> wrote: > wow, impressive numbers, congrats ! Thanks! But remember many Lucene apps won't see these speedups since I've carefully minimized cost of tokenization and cost of document retrieval. I think for many Lucene apps these are a sizable part of time spent index

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread eks dev
wow, impressive numbers, congrats ! - Original Message From: Michael McCandless (JIRA) <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Thursday, 5 April, 2007 3:22:32 PM Subject: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents [ h

[jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486942 ] Michael McCandless commented on LUCENE-843: --- OK I ran old (trunk) vs new (this patch) with increasing RAM

[jira] Updated: (LUCENE-622) Provide More of Lucene For Maven

2007-04-05 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jörg Hohwiller updated LUCENE-622: -- Attachment: lucene-highlighter-2.0.0.pom pom for lucene-highlighter > Provide More of Lucene F

Fwd: Re: svn commit: r525669 - /lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer.java

2007-04-05 Thread Paul Elschot
Once more, now to java-dev instead of to java-commits: Otis, Can I ask which tool you used to catch this, and the previous one? Regards, Paul Elschot On Thursday 05 April 2007 03:06, [EMAIL PROTECTED] wrote: > Author: otis > Date: Wed Apr 4 18:06:16 2007 > New Revision: 525669 > > URL: http:

[jira] Updated: (LUCENE-789) Custom similarity is ignored when using MultiSearcher

2007-04-05 Thread Alexey Lef (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Lef updated LUCENE-789: -- Attachment: TestMultiSearcherSimilarity.java Attached unit test > Custom similarity is ignored when us

RE: Lucene and Javolution: A good mix ?

2007-04-05 Thread Jean-Philippe Robichaud
I understand your concerns! I was a little skeptical at the beginning. But even with the 1.5 JVM, the improvements still hold. Lucene creates a lot of "garbage" (strings, tokens, ...) either at index time or query time. While the new garbage collector strategies did seriously improve since jav

Re: [jira] Created: (LUCENE-856) Optimize segment merging

2007-04-05 Thread Michael McCandless
"Ning Li" <[EMAIL PROTECTED]> wrote: > On 4/4/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: > > Note that for "autoCommit=false", this optimization is somewhat less > > important, depending on how often you actually close/open a new > > IndexWriter. In the extreme case, if you open a w

Re: improve how IndexWriter uses RAM to buffer added documents

2007-04-05 Thread Michael McCandless
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote: > On Apr 4, 2007, at 10:05 AM, Michael McCandless wrote: > > >> (: Ironically, the numbers for Lucene on that page are a little > >> better than they should be because of a sneaky bug. I would have > >> made updating the results a priority if they'd go