RE: New "Stream closed" exception with Java 6 - solved

2009-09-21 Thread Chris Bamford
Understood.  Thanks Hoss.

- Chris


- Original Message -
From: Chris Hostetter 
Sent: Fri, 18/9/2009 5:58pm
To: java-user@lucene.apache.org
Subject: RE: New "Stream closed" exception with Java 6 - solved


: > not really ... adding a document multiple times is a perfectly legal use
: > case; adding a document with a "Reader" based field where the reader is
: > already closed ... that's not legal.  (And Lucene doesn't really have any
: > way of knowing if the Reader is closed because *it* closed it.)

: Now I am confused, I must be missing something fundamental.  I take no 
: action that I am aware of which closes the Reader, so how is it 
: happening?  The attached code demonstrates the exception - please can 
: you advise on what is happening under the covers?  :-)

sorry for confusing you ... i should have said there are *some* use cases
where adding the same document twice is legal -- but documents that contain
Fields backed by a Reader are not one of those cases -- IndexWriter
consumes and closes the Reader to get the tokens for indexing, so the
next time you re-add that same Document, the Reader is already closed.

My point was in response to your question about why IndexWriter doesn't
give you a different error about duplicate documents: it can't, because
there are *other* cases where indexing the same doc over and over is fine
(docs that just contain simple strings).  All IndexWriter knows when it
sees a closed Reader is that it's closed and it can't read from it -- it
has no way of knowing *why* it's closed.  (Trying to keep track of every
Reader IndexWriter ever closed would be an intractable problem.)
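
To make that concrete, here is a minimal sketch of the failure and the
workaround (the field name, contents, and the "writer" variable are
placeholders, not anything from your attached code):

import java.io.StringReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

void readdDocument(IndexWriter writer) throws Exception {
    Document doc = new Document();
    doc.add(new Field("body", new StringReader("some text")));
    writer.addDocument(doc);    // IndexWriter reads the tokens, then closes the Reader
    // writer.addDocument(doc); // would fail now: the Reader is already closed

    // To re-add the same Document, swap in a fresh Reader first:
    doc.removeField("body");
    doc.add(new Field("body", new StringReader("some text")));
    writer.addDocument(doc);    // fine
}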


-Hoss




Memory consumed by IndexSearcher

2009-09-21 Thread Mindaugas Žakšauskas
Hi,

I was wondering what a sensible amount of memory for an IndexSearcher to
consume would be. In my application we retain a reference to it for
quicker searches; however, I have become a bit worried about it being a
memory hog. We are using Lucene 2.4.0 on an 8-CPU Linux SMP box; the JVM
is Sun's 1.6.0_14 64-Bit Server VM.

I am asking because I have ended up with an IndexSearcher having a
Retained size [1] of 145M. All of this memory is being eaten by
IndexSearcher::reader::subReaders[]. The reader is a MultiSegmentReader
and all subReaders are SegmentReaders. My memory dump showed the
subReaders array holding 37 SegmentReaders, 2 to 5 M each. I can send a
YourKit screenshot if anyone's interested.

All of that should be viewed in the light of index size on the disk,
which is only 22M.
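
(For scale: 37 SegmentReaders at 2 to 5 M each works out to roughly
74-185 M, so the 145 M retained size is consistent with the per-segment
figures - yet it is still more than six times the on-disk size.)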

I appreciate that all of this memory can be used for legitimate
purposes; however, is there a way to know when it goes over a sensible
limit? Can there be a "sensible" limit at all? Also, is it possible to
set a hard boundary that the IndexSearcher would never exceed?

Thanks in advance for all answers.

Regards,
Mindaugas

[1] http://www.yourkit.com/docs/80/help/sizes.jsp



Filtering query results based on relevance/accuracy

2009-09-21 Thread Alex
Hi,

I'm a total newbie with Lucene, trying to understand how to achieve my
(complicated) goals. What I'm doing is still totally experimental for me,
but it is probably extremely trivial for the experts on this list :)

I use Lucene and Hibernate Search to index locations by their name,
type, etc.
The LocationType is an object whose "name" field is indexed both
tokenized and untokenized.

The following LocationType names are indexed:
"Restaurant"
"Mexican Restaurant"
"Chinese Restaurant"
"Greek Restaurant"
etc...

Consider the following query:

"Mexican Restaurant"

I systematically get all the entries as a result, almost certainly
because the "Restaurant" keyword is present in all of them. I'm trying
to get a finer-grained result set. Obviously, for "Mexican Restaurant" I
want the "Mexican Restaurant" entry as a result, but NOT "Chinese
Restaurant" or "Greek Restaurant", as they are irrelevant. Maybe
"Restaurant" itself should be returned with a lower weight/score, or
maybe it shouldn't ... I'm not sure about this one.

1)
How can I do that?

Here is the code I use for querying:

import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;

// Boost the untokenized (exact-match) field above the tokenized one.
String[] typeFields = {"name", "tokenized_name"};
Map<String, Float> boostPerField = new HashMap<String, Float>(2);
boostPerField.put("name", 4f);
boostPerField.put("tokenized_name", 2f);

QueryParser parser = new MultiFieldQueryParser(
        typeFields,
        new StandardAnalyzer(),
        boostPerField);

org.apache.lucene.search.Query luceneQuery;
try {
    luceneQuery = parser.parse(queryString);
} catch (ParseException e) {
    throw new RuntimeException("Unable to parse query: " + queryString, e);
}

I guess there is a way to filter out results whose score falls below a
given threshold, or to filter based on a score gap, or something
similar - but I have no idea how to do this...
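
For example, something along these lines? (A rough sketch on my part,
assuming Lucene 2.x-era APIs, the "parser"/"luceneQuery" from above, an
IndexSearcher named "searcher", and an arbitrary top-50 / half-of-best
cutoff purely for illustration.)

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Option 1: require every term to match, so "Mexican Restaurant" no
// longer matches "Chinese Restaurant" on the shared "Restaurant" term.
parser.setDefaultOperator(QueryParser.AND_OPERATOR);

// Option 2: keep only hits scoring within some fraction of the best hit.
TopDocs topDocs = searcher.search(luceneQuery, null, 50);
ScoreDoc[] hits = topDocs.scoreDocs;            // sorted by descending score
float best = hits.length > 0 ? hits[0].score : 0f;
List<Document> keep = new ArrayList<Document>();
for (ScoreDoc hit : hits) {
    if (hit.score >= best * 0.5f) {             // arbitrary cutoff - tune it
        keep.add(searcher.doc(hit.doc));
    }
}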


What is the best way to achieve what I want?

Thank you for your help !

Cheers,

Alex


2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread John Wang
Looking at the code, there seems to be a disconnect in how/when the
field cache is loaded when IndexWriter.getReader() is called.

Is the FieldCache updated? Or are we reloading the FieldCache for each
reader instance?

For operations that lazily load the field cache, e.g. sorting, this
would be a significant performance issue.
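
For concreteness, here is the flow I mean (a minimal sketch; the query,
the "price" field, and the writer are made up):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

IndexReader reader = writer.getReader();        // 2.9 NRT reader
IndexSearcher searcher = new IndexSearcher(reader);
Sort sort = new Sort(new SortField("price", SortField.INT));

// Sorting lazily populates the FieldCache for the (sub)readers.
TopDocs hits = searcher.search(query, null, 10, sort);

// ... more documents are indexed ...

IndexReader newReader = writer.getReader();     // fresh NRT reader
IndexSearcher newSearcher = new IndexSearcher(newReader);
// Question: do the unchanged SegmentReaders reuse their FieldCache
// entries here, or is the cache rebuilt for the new reader instance?
TopDocs moreHits = newSearcher.search(query, null, 10, sort);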

Please advise.

Thanks

-John