(TermInfosReader, SegmentTermEnum) Out Of Memory Exception patch

2005-09-20 Thread kieran
We've been experiencing terrible memory problems on our production 
search server, running lucene (1.4.3).


Our live app regularly opens new indexes and, in doing so, releases old 
IndexReaders for garbage collection.


But...there appears to be a memory leak in 
org.apache.lucene.index.TermInfosReader.java.
Under certain conditions (possibly related to JVM version, although I've 
personally observed it under Linux JVMs 1.4.2_06 and 1.5.0_03, and the 
SunOS JVM 1.4.1), the ThreadLocal member variable "enumerators" doesn't 
get garbage-collected when the TermInfosReader object is gc-ed.


Looking at the code in TermInfosReader.java, there's no reason why it 
_shouldn't_ be gc-ed, so I can only presume (and I've seen this 
suggested elsewhere) that there could be a bug in the garbage collector 
of some JVMs.


I've seen this problem briefly discussed; in particular at the following 
URL:

  http://java2.5341.com/msg/85821.html
The patch that Doug recommended, which is included in lucene-1.4.3, 
doesn't work in our particular circumstances. Doug's patch only clears 
the ThreadLocal variable for the thread running the finalizer (my 
knowledge of Java breaks down here - I'm not sure which thread actually 
runs the finalizer). In our situation the TermInfosReader is 
(potentially) used by more than one thread, so Doug's patch 
_doesn't_ allow the affected JVMs to correctly collect garbage.


So...I've devised a simple patch which, from my observations on Linux 
JVMs 1.4.2_06 and 1.5.0_03, fixes this problem.


I've thought of submitting this to the project as a patch, but the 
Lucene Bugzilla account is disabled at the moment, so... see the diff 
below and the attached patched file:


Kieran

21a22
> import java.util.Hashtable;
32c33
<   private ThreadLocal enumerators = new ThreadLocal();
---
>   private final Hashtable enumeratorsByThread = new Hashtable();
63c64
< SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
---
> SegmentTermEnum termEnum = (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());

66c67
<   enumerators.set(termEnum);
---
>   enumeratorsByThread.put(Thread.currentThread(), termEnum);
197a199,208
>   }
>
>   /* some jvms might have trouble gc-ing enumeratorsByThread */
>   protected void finalize() throws Throwable {
>     try {
>       // make sure gc can clear up.
>       enumeratorsByThread.clear();
>     } finally {
>       super.finalize();
>     }

package org.apache.lucene.index;

/**
 * Copyright 2004 The Apache Software Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

import org.apache.lucene.store.Directory;
import java.util.Hashtable;

/** This stores a monotonically increasing set of <Term, TermInfo> pairs in a
 * Directory.  Pairs are accessed either by Term or by ordinal position in
 * the set.  */

final class TermInfosReader {
  private Directory directory;
  private String segment;
  private FieldInfos fieldInfos;

  private final Hashtable enumeratorsByThread = new Hashtable();
  private SegmentTermEnum origEnum;
  private long size;

  TermInfosReader(Directory dir, String seg, FieldInfos fis)
       throws IOException {
    directory = dir;
    segment = seg;
    fieldInfos = fis;

    origEnum = new SegmentTermEnum(directory.openFile(segment + ".tis"),
                                   fieldInfos, false);
    size = origEnum.size;
    readIndex();
  }

  public int getSkipInterval() {
    return origEnum.skipInterval;
  }

  final void close() throws IOException {
    if (origEnum != null)
      origEnum.close();
  }

  /** Returns the number of term/value pairs in the set. */
  final long size() {
    return size;
  }

  private SegmentTermEnum getEnum() {
    SegmentTermEnum termEnum = (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());
    if (termEnum == null) {
      termEnum = terms();
      enumeratorsByThread.put(Thread.currentThread(), termEnum);
    }
    return termEnum;
  }

  Term[] indexTerms = null;
  TermInfo[] indexInfos;
  long[] indexPointers;

  private final void readIndex() throws IOException {
    SegmentTermEnum indexEnum =
      new SegmentTermEnum(directory.openFile(segment + ".tii"),
                          fieldInfos, true);
    try {
      int indexSize = (int)indexEnum.size;

      indexTerms = new Term[indexSize];
      indexInfos = new TermInfo[indexSize];
      indexPointers = new long[indexSize];

      for (int i = 0; indexEnum.next(); i++) {
        indexTerms[

Re: "Best-practice" in a web application

2005-09-20 Thread Erik Hatcher


On Sep 20, 2005, at 2:24 AM, Magne Skjeret wrote:

I am using lucene to index all my data, and it is working just great.

I will now add search to a web application, so the index can actually be
used, not just sit there.


Good idea... it'd be a shame for the index to sit unsearched!  :)

1. Can a search be performed while the index is being updated (add/delete)?


Yes, no problem with searches occurring during add/delete operations.
The searches will not see new documents and will still return deleted
documents until a new IndexSearcher instance is used, though.



2. Should I create an IndexSearcher for each search?
   Or should I have one IndexSearcher for everyone? If so, is it threadsafe,
or do I queue? Or should I create a pool of IndexSearchers?


For best performance, use a single IndexSearcher instance across your  
entire application.



3. Do I need to close the IndexSearcher while or after an index update
operation?


When you want index changes to be visible to searches, drop the old  
IndexSearcher instance and instantiate a new one.


Erik
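
For illustration, a minimal sketch of that swap-on-change pattern against the
Lucene 1.4-era API (the SearcherHolder class and its check-on-demand policy
are this example's assumptions, not code from the thread):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

/** Hands out one shared searcher and swaps in a new one when the index
 *  version changes. Sketch only: a real version must also delay closing
 *  the old searcher until in-flight searches finish (see the
 *  DelayCloseIndexSearcher post below). */
public class SearcherHolder {
  private final String indexPath;
  private IndexSearcher searcher;
  private long version;

  public SearcherHolder(String indexPath) throws IOException {
    this.indexPath = indexPath;
    this.searcher = new IndexSearcher(indexPath);
    this.version = IndexReader.getCurrentVersion(indexPath);
  }

  public synchronized IndexSearcher getSearcher() throws IOException {
    long current = IndexReader.getCurrentVersion(indexPath);
    if (current != version) {  // index changed: open a fresh searcher
      searcher = new IndexSearcher(indexPath);
      version = current;
    }
    return searcher;
  }
}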





Re: Relevance Feedback

2005-09-20 Thread Grant Ingersoll
I have implemented (more or less) Rocchio relevance feedback.  You have to
make some minor modifications b/c Lucene doesn't support boost values
less than 0, but other than that it is pretty straightforward using the
TermVector support.  At feedback time, get the TermVector for the top X
documents and construct a new query using the frequencies of the terms
for boosting (maybe multiplying by the alpha, beta, gamma parameters if
you want).  I seem to recall others posting that they have implemented
similar things, so you may want to search the archive of this list.

I used the description in "Modern Information Retrieval" by Baeza-Yates
and Ribeiro-Neto for the algorithm.
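
A rough sketch of that feedback step (Lucene 1.4 TermVector API; the field
name, the beta weighting, and the lack of term selection are simplifications
made for this example, and BooleanQuery's clause limit is ignored):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class RocchioSketch {
  /** Build an expansion query from the term vectors of the top docs. */
  public static Query expand(IndexReader reader, int[] topDocs,
                             String field, float beta) throws IOException {
    BooleanQuery expanded = new BooleanQuery();
    for (int i = 0; i < topDocs.length; i++) {
      TermFreqVector tv = reader.getTermFreqVector(topDocs[i], field);
      if (tv == null) continue;            // doc indexed without vectors
      String[] terms = tv.getTerms();
      int[] freqs = tv.getTermFrequencies();
      for (int j = 0; j < terms.length; j++) {
        TermQuery tq = new TermQuery(new Term(field, terms[j]));
        tq.setBoost(beta * freqs[j]);      // weight by term frequency
        expanded.add(tq, false, false);    // optional, not prohibited
      }
    }
    return expanded;
  }
}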

>>> [EMAIL PROTECTED] 09/19/05 3:46 PM >>>
Does anyone have experiences with relevance feedback and lucene or just
knows some good websites?
thx
stefan




RE: live update of index used by Tomcat

2005-09-20 Thread Vanlerberghe, Luc
You should keep your IndexReader open until the merge has finished, but
also until there are no more Hits objects that depend on it (tricky in
multithreaded environments like Tomcat).

The fact that the files cannot be deleted immediately after the merge is
no problem.  The filenames will be stored in the file 'deletable' and
Lucene will attempt to delete them on the next IndexWriter action (or so
I assume).
New IndexReader instances will only open the new files.

I solved the problem like this:

Objects that need the results of a search get an IndexSearcher instance
from a factory, do the search, use the Hits Object and then close the
IndexSearcher instance. That is, they behave as if they are using a
single new IndexReader each time.

The factory doesn't pass a Lucene IndexSearcher Object, but an instance
of a class called "DelayCloseIndexSearcher" that extends IndexSearcher.
The same instance is passed to all callers until a change in index
version is detected (using IndexReader.getCurrentVersion()), but each
time its usage count is incremented.
DelayCloseIndexSearcher overrides the close() method so it does not
close immediately: it only decrements the usage counter.  Only when the
factory signals that it has become obsolete (by calling mayClose()) and
the usage count has dropped to 0 does it call super.close();

In my current implementation the factory checks if the index has changed
every time an IndexSearcher is required, but this could be changed to
happen only every few seconds, or even in a separate thread so no
searchers are blocked while the next DelayCloseIndexSearcher instance is
opening...

I can post the code and testcases if you're interested.
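
A rough sketch of the idea (the class name comes from the description above;
the body is reconstructed for illustration and is not the actual posted code):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class DelayCloseIndexSearcher extends IndexSearcher {
  private int usageCount = 0;
  private boolean obsolete = false;

  public DelayCloseIndexSearcher(IndexReader reader) {
    super(reader);
  }

  /** Called by the factory each time it hands this instance out. */
  synchronized void incUsage() {
    usageCount++;
  }

  /** Callers "close" after every search; the real close is deferred. */
  public synchronized void close() throws IOException {
    usageCount--;
    closeIfPossible();
  }

  /** Called by the factory once a newer index version is detected. */
  synchronized void mayClose() throws IOException {
    obsolete = true;
    closeIfPossible();
  }

  private void closeIfPossible() throws IOException {
    if (obsolete && usageCount == 0)
      super.close();   // nobody is using this searcher any more
  }
}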

Luc


-Original Message-
From: Daniel Naber [mailto:[EMAIL PROTECTED] 
Sent: maandag 19 september 2005 20:49
To: java-user@lucene.apache.org
Subject: live update of index used by Tomcat

Hi,

I need to merge two indexes into one which is accessed by a Searcher in 
Tomcat. Tomcat keeps the searcher (or reader) open for good performance.

However, on Windows you cannot delete a file when it's opened for reading,
so I cannot do the merge while Tomcat is running and the reader is open.

But I don't want to shut down Tomcat or close the reader (not even for 10
seconds) because the search needs to be up all the time. Does anybody have
a clever solution for this problem?

Regards
 Daniel

-- 
http://www.danielnaber.de




date keyword

2005-09-20 Thread haipeng du
I use Lucene to index a keyword field with a Date object. When I search
documents, how can I get the date back from that field? For example, I
index the date with
Field field = Field.Keyword("created", new Date());
...
When I search, I get that field back with
Field f = doc.getField("created");
but the value of that field looks like
0edtel52h
How can I process that to get the Date object back?
Thanks a lot.

-- 
Haipeng Du
Software Engineer
Comphealth, 
Salt Lake City


RE: date keyword

2005-09-20 Thread Mordo, Aviran (EXP N-NANNATEK)
Lucene only uses strings to store and search, so you should convert any
objects to strings.
For dates there is a special DateField class that you should use, which
converts dates to searchable strings.

Aviran
http://www.aviransplace.com 

-Original Message-
From: haipeng du [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 20, 2005 10:40 AM
To: lucene-dev@jakarta.apache.org; lucene-user@jakarta.apache.org
Subject: date keyword

I use lucene to index a key word with date object. When I search
document, how could I process the searching result for that field? For
example:
index date with
Field field = Field.Keyword("created", new Date); .
..
When I search that, I get that field back Field f =
doc.getField("created").
but value of that field is just like
0edtel52h
How could I process that to get Date object back?
Thanks a lot. 

--
Haipeng Du
Software Engineer
Comphealth,
Salt Lake City





Re: date keyword

2005-09-20 Thread haipeng du
I understand that. But the Field API has a Keyword method which accepts a
Date object as its value. When I use that method to index, I cannot get the
real date back. When I pass a string value instead, that works great. Does
that mean I cannot use that method to index a keyword?
Thanks a lot.
Haipeng

On 9/20/05, Mordo, Aviran (EXP N-NANNATEK) <[EMAIL PROTECTED]> wrote:
> 
> Lucene only uses strings to store and search, you should convert any
> objects to string.
> For dates you have a special Date field that you should use which
> converts dated to a searchable strings
> 
> Aviran
> http://www.aviransplace.com
> 
> -Original Message-
> From: haipeng du [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 20, 2005 10:40 AM
> To: lucene-dev@jakarta.apache.org; lucene-user@jakarta.apache.org
> Subject: date keyword
> 
> I use lucene to index a key word with date object. When I search
> document, how could I process the searching result for that field? For
> example:
> index date with
> Field field = Field.Keyword("created", new Date); .
> ..
> When I search that, I get that field back Field f =
> doc.getField("created").
> but value of that field is just like
> 0edtel52h
> How could I process that to get Date object back?
> Thanks a lot.
> 
> --
> Haipeng Du
> Software Engineer
> Comphealth,
> Salt Lake City
> 
> 


-- 
Haipeng Du
Software Engineer
Comphealth, 
Salt Lake City


Re: date keyword

2005-09-20 Thread Erik Hatcher


On Sep 20, 2005, at 12:55 PM, haipeng du wrote:
I understand that. But the Field API has a Keyword method which accepts a
Date object as its value. When I use that method to index, I cannot get the
real date back. When I pass a string value instead, that works great. Does
that mean I cannot use that method to index a keyword?

Thanks a lot.


Have a look at DateField.stringToDate:
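
A quick sketch of the round trip with the 1.4 DateField API (the DateField
calls are real; the demo wrapper is illustrative):

import java.util.Date;
import org.apache.lucene.document.DateField;

public class DateFieldDemo {
  public static void main(String[] args) {
    Date created = new Date();
    // Field.Keyword("created", date) stores this encoded form:
    String encoded = DateField.dateToString(created);
    // ...and stringToDate() turns the stored value back into a Date:
    Date decoded = DateField.stringToDate(encoded);
    System.out.println(encoded + " -> " + decoded);
  }
}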




Please also check the wiki, Lucene in Action, and the e-mail list  
archives if you are searching by date.  Lots of caveats apply.


Erik





Re: (TermInfosReader, SegmentTermEnum) Out Of Memory Exception patch

2005-09-20 Thread Daniel Naber
On Tuesday 20 September 2005 12:13, kieran wrote:

> I've thought of submitting this to the project as a patch, but the
> lucene bugzilla account is disabled at the moment so...see the diff,
> below, and the attached, patched, file:

We've moved to JIRA but the link on our web page isn't updated yet:
http://issues.apache.org/jira/browse/LUCENE

Please create a report there and attach your patch (using diff -u is 
preferred, BTW).

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: Sort by relevance+distance

2005-09-20 Thread markharw00d

To avoid caching 10,025 docs when you only want to see 10,000 to 10,025
(and assuming the user was paging through results), you might have to
remember the lowest score used in the previous page of results, to avoid
adding those 10,000 docs with score > lastLowScore to the HitQueue again.
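
Roughly, with a 1.4 HitCollector (the PageCollector name and page handling
are illustrative, and ties at the cutoff score are glossed over):

import java.util.TreeMap;
import org.apache.lucene.search.HitCollector;

public class PageCollector extends HitCollector {
  private final float lastLowScore;  // lowest score on the previous page
  private final int pageSize;
  private final TreeMap page = new TreeMap();  // Float(score) -> Integer(doc)

  public PageCollector(float lastLowScore, int pageSize) {
    this.lastLowScore = lastLowScore;
    this.pageSize = pageSize;
  }

  public void collect(int doc, float score) {
    if (score >= lastLowScore) return;   // already shown on an earlier page
    page.put(new Float(score), new Integer(doc));
    if (page.size() > pageSize)          // keep only the best pageSize hits
      page.remove(page.firstKey());      // firstKey() is the lowest score
  }

  public TreeMap getPage() { return page; }
}

Used as searcher.search(query, new PageCollector(lastLowScore, 25)), this
never re-queues the documents already shown.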








Storing HashMap as an UnIndexed Field

2005-09-20 Thread Tricia Williams
Hi,

   I'd like to store a HashMap for some extra data to be used when a given
document is retrieved as a Hit for a query.  To add an UnIndexed Field
to an index takes only Strings as parameters.  Does anyone have any
suggestions on how I might convert the HashMap to a String that is
efficiently recomposed into the desired HashMap on the other end?

Thanks,
Tricia





RE: Storing HashMap as an UnIndexed Field

2005-09-20 Thread Mordo, Aviran (EXP N-NANNATEK)
You can store the values as a comma-separated string (which you'll then
need to parse manually back into a HashMap)

-Original Message-
From: Tricia Williams [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 20, 2005 3:14 PM
To: java-user@lucene.apache.org
Subject: Storing HashMap as an UnIndexed Field

Hi,

   I'd like to store a HashMap for some extra data to be used when a
given document is retrieved as a Hit for a query.  To add an UnIndexed
Field to an index takes only Strings as parameters.  Does anyone have
any suggestions on how I might convert the HashMap to a String that is
efficiently recomposed into the desired HashMap on the other end?

Thanks,
Tricia





RE: Storing HashMap as an UnIndexed Field

2005-09-20 Thread Tricia Williams
Do you think there is any way that I could use the serialization already
built into the HashMap data structure?

On Tue, 20 Sep 2005, Mordo, Aviran (EXP N-NANNATEK) wrote:

> You can store the values as a coma separated string (which then you'll
> need to parse manually back to a HashMap)
>
> -Original Message-
> From: Tricia Williams [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 20, 2005 3:14 PM
> To: java-user@lucene.apache.org
> Subject: Storing HashMap as an UnIndexed Field
>
> Hi,
>
>I'd like to store a HashMap for some extra data to be used when a
> given document is retrieved as a Hit for a query.  To add an UnIndexed
> Field to an index takes only Strings as parameters.  Does anyone have
> any suggestions on how I might convert the HashMap to a String that is
> efficiently recomposed into the desired HashMap on the other end?
>
> Thanks,
> Tricia



Re: "Best-practice" in a web application AND live update of index used by Tomcat

2005-09-20 Thread Matthias Bräuer

Hello,

I have a question regarding your answers to two previous posts:


>For best performance, use a single IndexSearcher instance across your  
entire application.


>DelayCloseIndexSearcher overrides the close() method so it does not
>close immediately: it only decrements the usage counter. [...]


I have implemented a Client-Server application where a desktop program 
on a file server regularly updates an index while a web application 
running on Tomcat on the same server answers queries against this index. 
Currently the index is not built incrementally.


Now, the problem is that if I have an open IndexReader (or Searcher or 
the Luke Toolbox, respectively), it is impossible to update the index 
because I get IOExceptions like "Cannot delete _7.cfs". This even happens 
if the conflicting IndexReader isn't doing anything but just sitting 
still inside a Searcher waiting for the next query. So, does that mean I 
have to close the searcher after each request to let the indexer do its 
work? Or is there a way to tell Lucene that the IndexReader is only 
reading and not writing anything?


Thanks for your help,
Matthias






Re: Storing HashMap as an UnIndexed Field

2005-09-20 Thread Erik Hatcher


On Sep 20, 2005, at 3:29 PM, Tricia Williams wrote:
Do you think there is any way that I could use the serialization already
built into the HashMap data structure?


A Document, when reconstituted from Hits, is essentially a glorified  
HashMap-like structure.  I recommend you simply iterate your HashMap  
during indexing and add each entry to the Document as a Field.Keyword  
or as an unindexed field.  It may be slightly more code than doing  
some type of serialization/de-serialization, but not by much.
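
For example, something like this (an untested sketch; the "map." field-name
prefix is just a convention invented for the example):

import java.util.Enumeration;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class MapFields {
  /** Indexing time: one stored-but-unindexed field per map entry. */
  public static void addMap(Document doc, HashMap map) {
    for (Iterator it = map.entrySet().iterator(); it.hasNext();) {
      Map.Entry e = (Map.Entry) it.next();
      doc.add(Field.UnIndexed("map." + e.getKey(), (String) e.getValue()));
    }
  }

  /** Search time: collect the "map." fields back into a HashMap. */
  public static HashMap readMap(Document doc) {
    HashMap map = new HashMap();
    for (Enumeration en = doc.fields(); en.hasMoreElements();) {
      Field f = (Field) en.nextElement();
      if (f.name().startsWith("map."))
        map.put(f.name().substring(4), f.stringValue());
    }
    return map;
  }
}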


Erik




On Tue, 20 Sep 2005, Mordo, Aviran (EXP N-NANNATEK) wrote:


You can store the values as a coma separated string (which then  
you'll

need to parse manually back to a HashMap)

-Original Message-
From: Tricia Williams [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 20, 2005 3:14 PM
To: java-user@lucene.apache.org
Subject: Storing HashMap as an UnIndexed Field

Hi,

   I'd like to store a HashMap for some extra data to be used when a
given document is retrieved as a Hit for a query.  To add an  
UnIndexed

Field to an index takes only Strings as parameters.  Does anyone have
any suggestions on how I might convert the HashMap to a String  
that is

efficiently recomposed into the desired HashMap on the other end?

Thanks,
Tricia





RE: Storing HashMap as an UnIndexed Field

2005-09-20 Thread Mordo, Aviran (EXP N-NANNATEK)
I can't think of a way you can use serialization, since Lucene only
works with strings.

-Original Message-
From: Tricia Williams [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 20, 2005 3:30 PM
To: java-user@lucene.apache.org
Subject: RE: Storing HashMap as an UnIndexed Field

Do you think there is anyway that I could use the serialization already
built into the HashMap data structure?

On Tue, 20 Sep 2005, Mordo, Aviran (EXP N-NANNATEK) wrote:

> You can store the values as a coma separated string (which then you'll

> need to parse manually back to a HashMap)
>
> -Original Message-
> From: Tricia Williams [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 20, 2005 3:14 PM
> To: java-user@lucene.apache.org
> Subject: Storing HashMap as an UnIndexed Field
>
> Hi,
>
>I'd like to store a HashMap for some extra data to be used when a 
> given document is retrieved as a Hit for a query.  To add an UnIndexed

> Field to an index takes only Strings as parameters.  Does anyone have 
> any suggestions on how I might convert the HashMap to a String that is

> efficiently recomposed into the desired HashMap on the other end?
>
> Thanks,
> Tricia



How the default ranking works

2005-09-20 Thread tirupathi reddy
Hello,
 
   How are the hits ranked in the default case? If I have a query like this:
 
title:"measurement procedure" AND id:ep6289*
 
Say some 10 documents match that query: how will my hits be displayed? Which
record will be displayed first, and how is the ranking done in the default
case? Also assume that I don't have any sort field.
 
Please point me to some resources where I can read all about indexing and
ranking in Lucene.
 
Thanx,
MTREDDY 


Tirupati Reddy Manyam 
24-06-08, 
Sundugaullee-24, 
79110 Freiburg 
GERMANY. 

Phone: 00497618811257 
cell : 004917624649007


Re: How the default ranking works

2005-09-20 Thread Erik Hatcher


On Sep 20, 2005, at 4:03 PM, tirupathi reddy wrote:
   How are the hits ranked in the default case? If I have a query like this:

title:"measurement procedure" AND id:ep6289*

Say some 10 documents match that query: how will my hits be displayed?
Which record will be displayed first, and how is the ranking done in the
default case? Also assume that I don't have any sort field.

Please point me to some resources where I can read all about indexing and
ranking in Lucene.


To dig through it yourself, check out IndexSearcher.explain().  To  
dig deeper, look at the javadocs on Similarity.
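
For example (a 1.4 API sketch; the "title" field matches the query above):

import java.io.IOException;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ExplainDemo {
  /** Print the scoring breakdown for every hit. */
  public static void explainHits(IndexSearcher searcher, Query query,
                                 Hits hits) throws IOException {
    for (int i = 0; i < hits.length(); i++) {
      Explanation e = searcher.explain(query, hits.id(i));
      System.out.println(hits.doc(i).get("title") + ":\n" + e.toString());
    }
  }
}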


There is of course a bit on scoring in "Lucene in Action":



You have a copy of this book, right?!  :)

Erik





Re: Storing HashMap as an UnIndexed Field

2005-09-20 Thread jian chen
Well, certainly you can serialize it into a byte stream and encode it using
Base64.
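
Something like this (a sketch; java.util.Base64 here is a modern-JDK
stand-in, since in 2005 you would use a codec library such as commons-codec):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Base64;
import java.util.HashMap;

public class MapCodec {
  /** Serialize the map and encode it as a string Lucene can store. */
  static String encode(HashMap map) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(bytes);
    out.writeObject(map);                  // standard Java serialization
    out.close();
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  /** Decode the stored string back into the original map. */
  static HashMap decode(String s) throws IOException, ClassNotFoundException {
    byte[] data = Base64.getDecoder().decode(s);
    ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(data));
    return (HashMap) in.readObject();
  }
}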

Jian

On 9/20/05, Mordo, Aviran (EXP N-NANNATEK) <[EMAIL PROTECTED]> wrote:
> 
> I can't think of a way you can use serialization, since lucene only
> works with strings.
> 
> -Original Message-
> From: Tricia Williams [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 20, 2005 3:30 PM
> To: java-user@lucene.apache.org
> Subject: RE: Storing HashMap as an UnIndexed Field
> 
> Do you think there is anyway that I could use the serialization already
> built into the HashMap data structure?
> 
> On Tue, 20 Sep 2005, Mordo, Aviran (EXP N-NANNATEK) wrote:
> 
> > You can store the values as a coma separated string (which then you'll
> 
> > need to parse manually back to a HashMap)
> >
> > -Original Message-
> > From: Tricia Williams [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, September 20, 2005 3:14 PM
> > To: java-user@lucene.apache.org
> > Subject: Storing HashMap as an UnIndexed Field
> >
> > Hi,
> >
> > I'd like to store a HashMap for some extra data to be used when a
> > given document is retrieved as a Hit for a query. To add an UnIndexed
> 
> > Field to an index takes only Strings as parameters. Does anyone have
> > any suggestions on how I might convert the HashMap to a String that is
> 
> > efficiently recomposed into the desired HashMap on the other end?
> >
> > Thanks,
> > Tricia


Re: How the default ranking works

2005-09-20 Thread tirupathi reddy
Hello Erik,
 
  I don't have that book. Is it available on the internet?
 
Thanx,
MTREDDY


Tirupati Reddy Manyam 
24-06-08, 
Sundugaullee-24, 
79110 Freiburg 
GERMANY. 

Phone: 00497618811257 
cell : 004917624649007



Re: How the default ranking works

2005-09-20 Thread Otis Gospodnetic
--- tirupathi reddy <[EMAIL PROTECTED]> wrote:

>   I don't have that book. Is it available in the internet?

Of course, here it is: http://www.manning.com/books/hatcher2 (Add to
Cart link)

Otis





Re: How the default ranking works

2005-09-20 Thread Erik Hatcher


On Sep 20, 2005, at 4:19 PM, tirupathi reddy wrote:

  I don't have that book. Is it available in the internet?


You can order it online and have it shipped to you from many online  
bookstores - or buy the PDF directly from Manning online.  Links at  
the top of this page:


http://www.lucenebook.com/






Re: Storing HashMap as an UnIndexed Field

2005-09-20 Thread markharw00d

Or using XMLEncoder:

   import java.beans.XMLEncoder;
   import java.io.ByteArrayOutputStream;
   import java.util.HashMap;

   HashMap map = new HashMap();
   map.put("foo", "bar");
   ByteArrayOutputStream baos = new ByteArrayOutputStream();
   XMLEncoder encoder = new XMLEncoder(baos);
   encoder.writeObject(map);
   encoder.close(); // close(), not just flush(), writes the closing XML tag
   System.out.println(baos.toString());









Re: "Best-practice" in a web application AND live update of index used by Tomcat

2005-09-20 Thread Ramanathan Krishnan

Hi,
I am a newbie to Lucene, but I guess the IOException will only occur on 
Windows, since Windows doesn't allow deleting files that are still open. 
It should work fine on Linux.
I had a similar problem when recreating the same index. The approach I 
took was to delete all documents and start adding new ones, instead of 
overwriting the index.
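
A sketch of that approach with the 1.4 API (the wrapper class is
illustrative; error handling omitted):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;

public class ClearIndex {
  /** Mark every live document deleted; afterwards, re-add documents with
   *  an IndexWriter opened with create=false so the index is reused. */
  public static void clear(String indexPath) throws IOException {
    IndexReader reader = IndexReader.open(indexPath);
    for (int i = 0; i < reader.maxDoc(); i++) {
      if (!reader.isDeleted(i))
        reader.delete(i);
    }
    reader.close();
  }
}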


Krishnan

Matthias Bräuer wrote:


Hello,

I have a question regarding your answers to two previous posts:


>For best performance, use a single IndexSearcher instance across 
your  entire application.


>DelayCloseIndexSearcher overrides the close() method so it does not
>close immediately: it only decrements the usage counter. [...]


I have implemented a Client-Server application where a desktop program 
on a file server regularly updates an index while a web application 
running on Tomcat on the same server answers queries against this 
index. Currently the index is not built incrementally.


Now, the problem is that if I have an open IndexReader (or Searcher or 
Luke Toolbox, respectively), it is impossible to update the index 
because I get IOExceptions ala "Cannot delete _7.cfs". This even 
happens if the conflicting IndexReader isn't doing anything but just 
sitting still inside a Searcher to wait for the next query. So, does 
that mean I have to close the searcher after each request to let the 
indexer do its work? Or is there a way to tell Lucene that the 
IndexReader is only reading and not writing anything.


Thanks for your help,
Matthias






Re: Stale NFS file handle Exception

2005-09-20 Thread John Wang
How do you "update" the index?

-John

On 9/12/05, Harini Raghavan <[EMAIL PROTECTED]> wrote:
> 
> Hi All,
> I have 2 servers in the production environment, one running some Quartz
> jobs and the other one running the application. There is a common NFS
> mount which has the lucene index directory. The jobs fetch the latest
> data and update the lucene index. And the user can search on the index
> to retrieve documents. When I search on the index on nfs while the jobs
> are being run, I get the following exception :
> 
> java.io.IOException: Stale NFS file handle
> at java.io.RandomAccessFile.readBytes(Native Method)
> at java.io.RandomAccessFile.read(RandomAccessFile.java:315)
> at
> org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:420)
> at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
> at
> org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(
> CompoundFileReader.java:220)
> at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
> at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
> at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
> at
> org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:142)
> at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
> at
> org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:143)
> at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:137)
> at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
> at org.apache.lucene.index.MultiTermDocs.termDocs(MultiReader.java:409)
> at org.apache.lucene.index.MultiTermDocs.read(MultiReader.java:377)
> 
> Can I have the index directory on a common nfs mount? Does lucene
> support this?
> Any help would be greatly appreciated.
> Thank you,
> Harini


Re: search performance enhancement

2005-09-20 Thread John Wang
Hi Paul and other gurus:

In a related topic, it seems Lucene is scoring documents that would hit in a 
"prohibited" boolean clause, e.g. NOT field:value. It doesn't seem to make 
sense to score a document that is to be excluded from the result. Is this a 
difficult thing to fix?

Also, in Paul's earlier comment: "... unless you have large indexes, this will 
probably not make much difference", what is "large" in this case?

In our case, say millions of documents match some query, but after either a 
Filter is applied or a NOT query (e.g. a query with a NOT/prohibited clause) 
is applied, the resulting hit list has only 10 documents. It seems the 
millions of calls to score() are wasted, and some of the score() calls can be 
computationally intensive.

Thanks

-John
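
(For reference, a minimal sketch of the next()/skipTo() contract Paul
describes below, over a sorted in-memory array of doc ids; illustrative
only, not the posted FilteredQuery code:)

public class SortedDocIdIterator {
  private final int[] docs;  // matching doc ids, sorted ascending
  private int pos = -1;

  public SortedDocIdIterator(int[] sortedDocs) {
    this.docs = sortedDocs;
  }

  public int doc() {
    return docs[pos];
  }

  /** Advance to the next doc; false when exhausted. */
  public boolean next() {
    return ++pos < docs.length;
  }

  /** Advance to the first doc >= target, as Scorer.skipTo() does.
   *  A linear scan here; a binary search would also work. */
  public boolean skipTo(int target) {
    if (pos < 0) pos = 0;
    while (pos < docs.length && docs[pos] < target) pos++;
    return pos < docs.length;
  }
}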

On 8/19/05, Paul Elschot <[EMAIL PROTECTED]> wrote:
> 
> On Friday 19 August 2005 18:09, John Wang wrote:
> > Hi Paul:
> >
> > Thanks for the pointer.
> >
> > How would I extend from the patch you submitted to filter out
> > more documents not using a Filter. e.g.
> >
> > have a class to skip documents based on a docID: boolean
> > isValid(int docID)
> >
> > My problem is I want to discard documents at query time without
> > having to construct a BitSet via filter. I have my own memory
> > structure to help me skip documents based the query and a docid.
> 
> Basically, you need to implement the next() and skipTo(int targetDoc)
> methods from Scorer on your memory structure. They are somewhat
> redundant (skipTo(doc() + 1) has the same semantics as next(), except
> initially), but that is for a mix of historical and performance reasons.
> Have a look at how this is done in the posted FilteredQuery class
> for SortedVIntList and BitSet.
> With only isValid(docId) these next() and skipTo() methods would
> have to count over the document numbers, which is less than ideal.
> 
> When you use the posted code, iirc it is only necessary to implement the
> SkipFilter interface on your memory structure. One can use that interface
> to build/cache such memory structures using an IndexReader, and
> from there the DocNrSkipper interface will do the rest (of the top of
> my head).
> One slight problem with the current Lucene implementation is that
> java.lang.BitSet is not interface.
> 
> Regards,
> Paul Elschot.
> 
> > Thanks
> >
> > -John
> >
> > On 8/16/05, Paul Elschot <[EMAIL PROTECTED]> wrote:
> > > Hi John,
> > >
> > > On Wednesday 17 August 2005 04:46, John Wang wrote:
> > > > Hi:
> > > >
> > > > I posted a bug (36147) a few days ago and didn't hear anything, so
> > > > I thought I'd try my luck on this list.
> > > >
> > > > The idea is to avoid score calculations on documents to be filtered
> > > > out anyway. (e.g. via Filter object passed to the searcher class)
> > > >
> > > > This seems to be an easy change.
> > >
> > > Have a look here:
> > > http://issues.apache.org/bugzilla/show_bug.cgi?id=32965
> > >
> > > > Also it would be nice to expose a method to return a score given a
> > > > docid, e.g.
> > > >
> > > > float getScore(int docid)
> > > >
> > > > on the Scorer class.
> > >
> > > skipTo(int docid) and score() will do that.
> > >
> > > > I am gonna make the change locally and do some performance analysis
> > > > on it and will post some numbers later.
> > >
> > > The default score computations are mostly table lookups, and pretty fast.
> > > So, unless you have large indexes, this will probably not make
> > > much difference, but any performance improvement is welcome.
> > > In larger indexes, it helps to use skipTo() while searching.
> > >
> > > Regards,
> > > Paul Elschot
> > >