Re: best way of reusing IndexSearcher objects
Doug Cutting writes:

Dror Matalon wrote: There are two issues: 1. Having new searches start using the new index only when it's ready, not in a half-baked state, which means that you have to synchronize the switch from the old index to the new one.

That's true. If you're doing updates (as opposed to just additions) then you probably want to do something like:

1. Keep a single open IndexReader used by all searches.
2. Every few minutes, process updates as follows:
   a. open a second IndexReader
   b. delete all documents that will be updated
   c. close this IndexReader, to flush deletions
   d. open an IndexWriter
   e. add all documents that are updated
   f. close the IndexWriter
   g. replace the IndexReader used for searches (1, above)

Right. As long as you can control the reader instance from the update process, it's better to do so than to have the search-side reader itself check whether it is still up to date.

Dror also wrote: 2. It's not trivial to figure out when it's safe to discard the old index while existing searches are still using it. To make things more complicated, the Hits object depends on your IndexSearcher object, so if you have Hits objects in use you probably can't close your IndexSearcher. Is this a correct analysis, or is there an obvious strategy to work around this issue?

Right, you cannot safely close the IndexReader that's being used for searching. Rather, just drop it on the floor and let it get garbage collected. Its files will be closed when that happens. Provided you're not updating more frequently than the garbage collector runs, you should only ever have two IndexReaders open and shouldn't run into file-handle issues.

I guess the alternative would be reference counting that is incremented whenever a search starts and decremented when the Hits object is no longer used. You could then set a flag and close the index when the count reaches 0. Thanks for the comments.
Morus - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
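Doug's swap-and-drop approach and the reference-counting alternative Morus mentions can be modelled in a few lines of plain Java. This is an illustrative sketch only: DummyReader and HotSwap are invented names standing in for an open IndexReader and the code that manages it; this is not Lucene API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative stand-in for an open index reader; the real Lucene
// IndexReader releases its file handles when closed (or, as Doug
// notes, when it is garbage collected).
class DummyReader {
    final int version;
    final AtomicInteger refCount = new AtomicInteger(1); // 1 = the "current" slot holds a ref
    boolean closed = false;
    DummyReader(int version) { this.version = version; }
    void incRef() { refCount.incrementAndGet(); }
    void decRef() { if (refCount.decrementAndGet() == 0) closed = true; }
}

class HotSwap {
    // All searches go through this reference; the updater swaps it atomically.
    private final AtomicReference<DummyReader> current =
        new AtomicReference<>(new DummyReader(1));

    // A search pins the reader it uses so a concurrent swap cannot close it.
    DummyReader acquire() {
        DummyReader r = current.get();
        r.incRef();
        return r;
    }

    void release(DummyReader r) { r.decRef(); }

    // The updater publishes a new reader and drops the slot's ref on the old one.
    void swap(DummyReader fresh) {
        DummyReader old = current.getAndSet(fresh);
        old.decRef(); // old closes once the last in-flight search releases it
    }
}
```

With explicit reference counting, the old reader closes deterministically as soon as the last in-flight search releases it, instead of waiting for the garbage collector.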
Benchmark (WAS: Indexing Speed: Documents vs. Sentences)
Hello, here is a benchmark. I am not sure if that is proper etiquette, but I will just paste it into this mail and hope that it gets funneled into the right channels. Cheers! Jochen

Hardware environment:
- Dedicated machine for indexing: no; some other work performed on it. Shouldn't influence results much since it's a multiple-processor machine.
- CPU: 2x Intel Xeon 3.05GHz
- RAM: 4GB
- Drive configuration: SCSI

Software environment:
- Java version: 1.4.2-b28
- Java VM: Java HotSpot Client VM 1.4.2
- OS version: Redhat 8
- Location of index: local

Lucene indexing variables:
- Number of source documents: 5,000,000
- Total filesize of source documents: 40GB
- Average filesize of source documents: 8kB
- Source documents storage location: DB on remote server
- File type of source documents: pre-parsed HTML
- Parser(s) used, if any: n/a
- Analyzer(s) used: StandardAnalyzer
- Number of fields per document: 5
- Type of fields: actual text is indexed but not stored in the Lucene index
- Index persistence: where the index is stored, e.g. FSDirectory, SqlDirectory, etc.

Figures:
- Time taken (average of at least 3 indexing runs): 332 minutes
- Time taken per 1000 docs indexed: 4 sec
- Memory consumption: about 100MB

Notes:
- With the above configuration we pretty consistently achieve an indexing rate of 250 docs/sec. The actual text cannot be retrieved from the index; this keeps the index size down (6.1GB) and increases indexing speed. When the actual documents are stored in the index, the rate drops by about 30%, to 160 docs/sec.
FW: Indexing Speed: Documents vs. Sentences
Stephane,

The actual indexing is less glamorous than it sounds. When you index 1TB across 10 machines, you end up with 100GB on each machine. We do not merge the indexes either, since we get better speed on indexing as well as querying when we keep indexes smaller and distributed across different machines. (But somehow I think that I'll sit down and merge all of them together and play with it when I get a chance ... 'cause it's cool :-) I'll keep you posted when it happens.) My test set that I am playing with is 40GB, and I just posted a benchmark. Best, Jochen

-----Original Message----- From: Stephane Vaucher [mailto:[EMAIL PROTECTED]] Sent: Thursday, December 18, 2003 9:01 AM To: Lucene Users List; [EMAIL PROTECTED] Subject: RE: Indexing Speed: Documents vs. Sentences

Jochen, If you have a bit of time, could you post some metrics? (As an example, you can look at http://jakarta.apache.org/lucene/docs/benchmarks.html.) I haven't heard of anyone indexing 1TB yet. I'm sure everyone is interested in any problems you could be facing, and we could probably give you some ideas. I know (oddly enough) I sometimes wish I had a dataset greater than a few million docs to experiment with. cheers, sv

On Thu, 18 Dec 2003, Jochen Frey wrote:

Hi, Yes, this is correct, I am dealing with a few 100GB (close to 1TB). I am, however, distributing the data across several machines and then merging the results from all the machines together (until I find a better, faster solution). Cheers!

-----Original Message----- From: Victor Hadianto [mailto:[EMAIL PROTECTED]] Sent: Wednesday, December 17, 2003 10:50 PM To: Lucene Users List Subject: Re: Indexing Speed: Documents vs. Sentences

Hi, I am using Lucene to index a large number of web pages (a few 100GB) and the indexing speed is great. Jochen

.. a few 100GB? Is this correct?
/victor
DoubleMetaphoneQuery
I've seen discussions about using the double metaphone algorithm with Lucene (basically: like Soundex, it is used to find words that sound similar, in English at least) but couldn't find an implementation, so I spent a few minutes and wrote a Query and a TermEnum object for this. I may have missed the prior art, so sorry if I did...

[1] Here are some mail messages that mention double metaphone with respect to Lucene:
http://www.geocrawler.com/archives/3/2626/2000/10/0/4566951/
http://www.geocrawler.com/archives/3/2626/2001/8/50/6382300/
http://www.mail-archive.com/[EMAIL PROTECTED]/msg04648.html

[2] And Phonetix has a double metaphone Analyzer, but not a Query, which I guess is another angle on things:
http://www.tangentum.biz/en/products/phonetix/api/com/tangentum/phonetix/lucene/PhoneticAnalyzer.html

[3] Attached are 2 files (DoubleMetaphoneQuery and DoubleMetaphoneTermEnum) that I think are valid contributions to the Lucene Sandbox. Hopefully all that has to be done is change the package line if the powers that be accept this. Note: my impl uses the Jakarta CODEC package ( http://jakarta.apache.org/commons/codec/ ) for the double metaphone algorithm implementation. Also, any query expansion such as this could exceed the bounds of a boolean query, so BooleanQuery.setMaxClauseCount may need to be used to avoid an exception.

[4] I've updated my Lucene demo site, which has the ~3500 RFCs indexed and searchable by Lucene.
I added an advanced query page to try out the DoubleMetaphoneQuery; it's a few lines down at this URL: http://www.hostmon.com/rfc/advanced.jsp

[5] Most of the above is redundantly stated here as a kind of perma-link: http://www.tropo.com/techno/java/lucene/metaphone.html

[6] While it's easy to write additional Query classes, I suspect they are a kind of dead end and won't really be used unless they are integrated into the QueryParser. Thus one concept is that the Lucene syntax should have some extension mechanism, so you could pass a query like metaphone::protokal to it, and metaphone:: (note the double colons) would mean to use DoubleMetaphoneQuery for this term. Maybe an extensible query parser should be the subject of another email? So: let me know if this is useful and plz enter it into the sandbox... thx, Dave Spencer

package com.tropo.lucene;

/*
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2001 The Apache Software Foundation. All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *       "This product includes software developed by the
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowledgment may appear in the software itself,
 *    if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation" and
 *    "Apache Lucene" must not be used to endorse or promote products
 *    derived from this software without prior written permission. For
 *    written permission, please contact [EMAIL PROTECTED]
 *
 * 5. Products derived from this software may not be called "Apache",
 *    "Apache Lucene", nor may "Apache" appear in their name, without
 *    prior written permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation. For more
 * information on the Apache Software Foundation, please see
 * http://www.apache.org/.
 */

import java.io.IOException;
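For readers unfamiliar with phonetic matching, here is a minimal classic Soundex encoder in plain Java, the simpler ancestor that the post compares Double Metaphone to. This is an illustrative sketch only; it is not the Jakarta CODEC implementation that the attached classes actually use.

```java
// Minimal classic Soundex encoder: keep the first letter, map remaining
// consonants to digit codes, skip vowels (and h/w/y), collapse adjacent
// duplicate codes, and pad/truncate to 4 characters. Shown only to
// illustrate the family of algorithms the post mentions.
class Soundex {
    // Digit code for each letter a..z ('0' = vowel-like, not emitted).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        String w = word.toUpperCase();
        StringBuilder out = new StringBuilder();
        char lastCode = 0;
        for (int i = 0; i < w.length() && out.length() < 4; i++) {
            char c = w.charAt(i);
            if (c < 'A' || c > 'Z') continue;        // ignore non-letters
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                        // keep the first letter verbatim
            } else if (code != '0' && code != lastCode) {
                out.append(code);                     // skip vowels, collapse repeats
            }
            lastCode = code;
        }
        while (out.length() < 4) out.append('0');     // pad to fixed length
        return out.toString();
    }
}
```

Words that sound alike collapse to the same code ("Robert" and "Rupert" both become R163), which is exactly the property the DoubleMetaphoneTermEnum exploits when it enumerates index terms with matching encodings.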
Re: syntax of queries.
Erik, Thanks! The article is very good.

I have new questions:

- apiQuery.add(new TermQuery(new Term("contents", "dot")), false, true);

Does the Term class work for only one word? Is that right? Would new Term("contents", "dot java") search for "dot" OR "java" in contents? My problem is that the user enters a phrase, and I search for any word in the phrase, not the entire phrase. Do I need to parse the string, take it word by word, and add a TermQuery for each word? Bye, Ernesto.

----- Original Message ----- From: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Saturday, December 13, 2003 4:07 AM Subject: Re: syntax of queries.

Try out the toString(fieldName) trick on your Query instances and pair them up with what you have below - this will be quite insightful for the issue - I promise! :) Look at my QueryParser article and search for toString on that page: http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

On Friday, December 12, 2003, at 10:38 PM, Ernesto De Santis wrote:

Thanks Otis, I couldn't resolve my problem. I looked at the query syntax page and the FAQ's search section. I tried many alternatives:

body:(imprimir teclado) title:base = 451 hits
body:(imprimir teclado)^5.1 title:base = 248 hits (under 451)
body:(imprimir teclado^5.1) title:base = 451 hits - first document: 3287.html
body:(imprimir^5.1 teclado) title:base = 451 hits - first document: 1545.html

Conclusion: I think the boost is only applicable to one word, not to parentheses and not to a field. I want to make the boost apply to a field. For me, a hit in title is more important than one in body, for example. In the FAQ's search section: Clause ::= [ Modifier ] [ FieldName ':' ] BasicClause [ Boost ] and BasicClause ::= ( Term | Phrase | PrefixQuery | '(' Query ')' ). Then, in my example, BasicClause = (imprimir teclado) and Boost = ^5.1, but it does not work. Regards, Ernesto.
----- Original Message ----- From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED]; Ernesto De Santis [EMAIL PROTECTED] Sent: Friday, December 12, 2003 7:18 PM Subject: Re: syntax of queries.

Maybe it's the spaces after title:? Try title:importar ... instead. Maybe it's the spaces before ^5.0? Try title:importar^5 instead. You shouldn't need the parentheses in this case either, I believe. See the Query Syntax page on Lucene's site. Otis

--- Ernesto De Santis [EMAIL PROTECTED] wrote:

Hello, I am not understanding the syntax of queries. I search with this string: title: (importar) ^5.0 OR title: (arquivos) and it returns 6 hits. And with this: title: (arquivos) OR title: (importar) ^5.0, 27 hits. Why? In the first, I think it works like AND, but why? :-( Regards, Ernesto.
Sentence Endings: IndexWriter.maxFieldLength and Token.setPositionIncrement()
Hi! I hope this is the right forum for this post. I was wondering if other people would consider this a bug (it might be a feature and I am missing the point of it):

- The default IndexWriter.maxFieldLength is 10,000.
- The point of maxFieldLength is to limit memory usage.
- The current position (which is compared against maxFieldLength) is essentially determined by the sum of the position increments of all Tokens added to the index.

Why does this matter? If you have setPositionIncrement(1000) for sentence-ending tokens, only the first 10 sentences of your document will be indexed; the rest will not be searchable (since the position will be greater than 10,000).

Why I think this is a bug: if you skip 1000 positions, no memory is required by the DocumentWriter for the 999 empty positions, so maxFieldLength ends up limiting available positions rather than memory.

I suggest a counter in DocumentWriter that counts the actual number of tokens in the postingTable (probably in DocumentWriter.addPosition), so that maxFieldLength is compared against the number of actual entries, not the number of actual entries plus the number of skipped entries.

Best, Jochen

PS: Please let me know if this is the wrong forum for this so I'll post to the right one next time.
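The arithmetic behind the complaint can be made concrete with a small stand-alone model. This is plain Java, not actual Lucene code; the constants simply mirror the numbers in the post.

```java
// Simulates how a position counter driven by position increments interacts
// with maxFieldLength, per the post's description. A 1000-position increment
// at each sentence boundary exhausts the 10,000-position budget after only
// 10 sentences, even though very few tokens were actually stored.
class FieldLengthSim {
    static final int MAX_FIELD_LENGTH = 10_000;  // IndexWriter.maxFieldLength default
    static final int SENTENCE_GAP = 1_000;       // setPositionIncrement at sentence ends

    // Returns how many whole sentences fit before the position cap is hit,
    // assuming wordsPerSentence ordinary tokens (increment 1) per sentence.
    static int sentencesIndexed(int wordsPerSentence) {
        int position = 0;
        int sentences = 0;
        while (true) {
            position += wordsPerSentence;  // ordinary tokens advance by 1 each
            position += SENTENCE_GAP;      // sentence-end token advances by 1000
            if (position > MAX_FIELD_LENGTH) return sentences;
            sentences++;
        }
    }
}
```

Counting actual tokens instead of summed positions, as the post suggests, would let the whole document through: 10 sentences of 10 words contribute only about 110 tokens against the 10,000-token budget.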
Re: DoubleMetaphoneQuery
Interestingly, I used a MetaphoneAnalyzer as an example in our book in progress. I'm curious whether you have measured performance doing it at analysis time versus query time. Enumerating all terms at query time is basically the same as doing a WildcardQuery or FuzzyQuery and involves a lot of work, although on moderate-size indexes it is probably not too painful. Nice work on this! I'd be happy to add this to the sandbox, and will do so in the next few days hopefully. Erik

On Friday, December 19, 2003, at 02:51 PM, David Spencer wrote: [quoted message trimmed]
Re: syntax of queries.
On Friday, December 19, 2003, at 05:42 PM, Ernesto De Santis wrote: I have new questions: - apiQuery.add(new TermQuery(new Term("contents", "dot")), false, true); new Term("contents", "dot") - does the Term class work for only one word?

Careful with terminology here. It works for only one term. What is a term? That all depends on what happened during analysis. Generally speaking, though, "word" is the right generalization for a term, but we have to be careful technically speaking.

Ernesto also asked whether new Term("contents", "dot java") searches for "dot" OR "java" in contents. Wrong. When constructing a query through the API, if you want an OR you'd need to add two TermQuerys to a BooleanQuery, one for each word, and make them not required.

Ernesto: My problem is that the user enters a phrase, and I search for any word in the phrase, not the entire phrase. Do I need to parse the string, take it word by word, and add a TermQuery for each word?

Yes. If you have a text string of multiple terms that you want added to a boolean query, you could do so programmatically by analyzing the string as I do in my article's AnalyzerDemo, or by parsing it through some mechanism other than QueryParser, and add each as a TermQuery. You're on track! Erik
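Erik's advice, analyze the phrase into terms and add one optional (OR) clause per term, looks roughly like this. The whitespace/lowercase split below is a crude stand-in for a real Lucene Analyzer, and the method renders the clauses as a query string rather than building an actual BooleanQuery, so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Models "one optional TermQuery per analyzed term": split the user's
// phrase into tokens and emit a field:term clause for each, joined by OR.
// In real Lucene code each clause would be a TermQuery added to a
// BooleanQuery with required=false.
class OrQueryBuilder {
    static String build(String field, String phrase) {
        List<String> clauses = new ArrayList<>();
        for (String token : phrase.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                // lowercase mimics what an analyzer would do to the token
                clauses.add(field + ":" + token.toLowerCase(Locale.ROOT));
            }
        }
        return String.join(" OR ", clauses);
    }
}
```

So the user's phrase "dot java" against the contents field becomes two optional clauses, matching documents that contain either word.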
Re: Sentence Endings: IndexWriter.maxFieldLength and Token.setPositionIncrement()
Jochen, Someone else recently made a similar, reasonable complaint. I agree that this should be fixed. The fastest way to get it fixed would be to submit a patch to lucene-dev, with a test case, etc. Doug

Jochen Frey wrote: [quoted message trimmed]
Lucene and JavaHelp
Has anyone thought about or used Lucene to build an indexed, searchable help system, either server- or application-based?

-M.
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://osprey.hmdc.harvard.edu