RE: New "Stream closed" exception with Java 6 - solved
Understood. Thanks Hoss.

- Chris

----- Original Message -----
From: Chris Hostetter
Sent: Fri, 18/9/2009 5:58pm
To: java-user@lucene.apache.org
Subject: RE: New "Stream closed" exception with Java 6 - solved

: > not really ... adding a document multiple times is a perfectly legal use
: > case, adding a document with a "Reader" based field where the reader is
: > already closed ... that's not legal (and Lucene doesn't really have any
: > way of knowing if the Reader is closed because *it* closed it).

: Now I am confused, I must be missing something fundamental. I take no
: action that I am aware of which closes the Reader, so how is it
: happening? The attached code demonstrates the exception - please can
: you advise on what is happening under the covers? :-)

Sorry for confusing you ... I should have said there are *some* use cases
where adding the same document twice is legal -- but documents containing
Fields that specify a Reader are not one of those cases. IndexWriter
consumes and closes the Reader to get the tokens for indexing, so the next
time you re-add that same Document, the Reader is already closed.

My point was in response to your question about why IndexWriter doesn't
give you a different error about duplicate documents: it can't, because
there are *other* cases where indexing the same doc over and over is fine
(docs that just contain simple strings). All IndexWriter knows when it sees
a closed Reader is that it's closed and it can't read from it -- it has no
way of knowing *why* it's closed. (Trying to keep track of every Reader
IndexWriter ever closed would be an intractable problem.)

-Hoss
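For readers following along, here is a minimal sketch of the failure mode Hoss describes (not the original attached code; it assumes Lucene 2.4-era APIs, and the index path and field name are placeholders):

    import java.io.StringReader;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class ReAddReaderField {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/tmp/test-index",
                new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
            Document doc = new Document();
            // a Reader-valued field: IndexWriter consumes and closes this
            // Reader while tokenizing it
            doc.add(new Field("body", new StringReader("some text to tokenize")));
            writer.addDocument(doc); // first add succeeds; the Reader is now closed
            writer.addDocument(doc); // re-adding reads the closed Reader:
                                     // java.io.IOException: Stream closed
            writer.close();
        }
    }

The same Document object is safe to re-add only when all of its Fields hold plain strings, which is exactly the distinction Hoss draws above.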
Memory consumed by IndexSearcher
Hi,

I was wondering what would be a sensible amount of memory for an
IndexSearcher to consume. In my application we retain a reference to it for
quicker searches; however, I have become a bit worried about it being a
memory hog. We are using Lucene 2.4.0 on an 8-CPU Linux SMP box; the JVM is
Sun's 1.6.0_14 64-Bit Server VM.

I am asking because I have ended up with an IndexSearcher having a retained
size [1] of 145M. All of this memory is being eaten by
IndexSearcher::reader::subReaders[]. The reader is a MultiSegmentReader and
all subReaders are SegmentReaders. My memory dump showed the subReaders
array holding 37 SegmentReaders, 2 to 5 M each. I can send a YourKit
screenshot if anyone's interested. All of that should be viewed in the
light of the index size on disk, which is only 22M.

I appreciate that all of this memory can be used for legitimate purposes;
however, is there a way to know when it goes over a sensible limit? Can
there be a "sensible" limit at all? Also, is it possible to set a hard
boundary that the IndexSearcher would never go over?

Thanks in advance for all answers.

Regards,
Mindaugas

[1] http://www.yourkit.com/docs/80/help/sizes.jsp
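One detail worth noting here (not from the original post): each SegmentReader holds its own in-memory structures such as the term index and norms, so 37 segments over a 22M index means a lot of duplicated per-segment overhead. A minimal sketch of one common way to shrink that in 2.4 is to merge the segments down; the index path is a placeholder:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.IndexSearcher;

    IndexWriter writer = new IndexWriter("/path/to/index",
        new StandardAnalyzer(), false, IndexWriter.MaxFieldLength.UNLIMITED);
    writer.optimize(); // merge all 37 segments into a single segment
    writer.close();

    // reopen the searcher afterwards so it sees the single merged segment
    IndexSearcher searcher = new IndexSearcher("/path/to/index");

This does not put a hard cap on the searcher's memory, but it removes the per-SegmentReader duplication the retained-size dump is showing.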
Filtering query results based on relevance/accuracy
Hi,

I'm a total newbie with Lucene, trying to understand how to achieve my
(complicated) goals. So what I'm doing is still totally experimental for me
but is probably extremely trivial for the experts on this list :)

I use Lucene and Hibernate Search to index locations by their name, type,
etc. The LocationType is an object that has its "name" field indexed both
tokenized and untokenized. The following LocationType names are indexed:

"Restaurant"
"Mexican Restaurant"
"Chinese Restaurant"
"Greek Restaurant"
etc.

Considering the query "Mexican Restaurant", I systematically get all the
entries as a result, most certainly because the "Restaurant" keyword is
present in all of them. I'm trying to get a finer-grained result set.
Obviously, for "Mexican Restaurant" I want the "Mexican Restaurant" entry
as a result but NOT "Chinese Restaurant" or "Greek Restaurant", as they are
irrelevant. But maybe "Restaurant" itself should be returned with a lower
weight/score, or maybe it shouldn't ... I'm not sure about this one.

1) How can I do that?

Here is the code I use for querying:

    String[] typeFields = {"name", "tokenized_name"};
    Map<String, Float> boostPerField = new HashMap<String, Float>(2);
    boostPerField.put("name", 4f);
    boostPerField.put("tokenized_name", 2f);
    QueryParser parser = new MultiFieldQueryParser(typeFields,
        new StandardAnalyzer(), boostPerField);
    org.apache.lucene.search.Query luceneQuery;
    try {
        luceneQuery = parser.parse(queryString);
    } catch (ParseException e) {
        throw new RuntimeException("Unable to parse query: " + queryString, e);
    }

I guess there is a way to filter out results that have a score below a
given threshold, or a way to filter out results based on a score gap, or
anything similar. But I have no idea how to do this...

What is the best way to achieve what I want?

Thank you for your help!

Cheers,
Alex
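A minimal sketch of the threshold idea (not from the original message): it assumes an already-open IndexSearcher named `searcher` over the same index, and the 0.5 cutoff is an arbitrary choice:

    TopDocs topDocs = searcher.search(luceneQuery, null, 20);
    if (topDocs.scoreDocs.length > 0) {
        // hits come back sorted by descending score, so [0] is the best hit
        float best = topDocs.scoreDocs[0].score;
        for (ScoreDoc sd : topDocs.scoreDocs) {
            if (sd.score < 0.5f * best) {
                break; // drop everything scoring below half the top hit
            }
            Document doc = searcher.doc(sd.doc);
            System.out.println(doc.get("name") + " (score " + sd.score + ")");
        }
    }

Note that an absolute score threshold tends to be fragile, because Lucene scores are not normalized across different queries; a cutoff relative to the top hit's score is usually safer.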
2.9 NRT w.r.t. sorting and field cache
Looking at the code, there seems to be a disconnect in how/when the field
cache is loaded when IndexWriter.getReader() is called. Is the FieldCache
updated? Or are we reloading the FieldCache for each new reader instance?
For operations that lazily load the field cache, e.g. sorting, this seems
like a significant performance issue.

Please advise.

Thanks

-John
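For concreteness, here is a sketch of the scenario being asked about (added for illustration, against the 2.9 API; the Directory `dir` and the "price" field are placeholders):

    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
        IndexWriter.MaxFieldLength.UNLIMITED);
    IndexReader reader = writer.getReader(); // 2.9 near-real-time reader
    IndexSearcher searcher = new IndexSearcher(reader);
    Sort sort = new Sort(new SortField("price", SortField.INT));
    TopDocs hits = searcher.search(new MatchAllDocsQuery(), null, 10, sort);

    // ... add more documents via writer ...

    IndexReader newReader = reader.reopen(); // shares unchanged SegmentReaders
    // In 2.9 the FieldCache is keyed per segment reader, so a sorted search
    // on newReader should only load sort values for the new segments; the
    // shared SegmentReaders keep their existing cache entries.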