I appreciate your explanation, but I think that the use case I described merits a deeper exploration:
Scenario 1: 16 threads indexing; queue size = 1000; present api; need to store In this scenario, there are always 1000 Strings with all the contents of their respective files. Averaging 50k per document = 50MB of String objects in memory. Scenario 2: 16 threads indexing; queue size = 1000; Field constructor with Reader and Index/Store/Tokenize; need to store At any one time, there are only 16 Strings with their respective file contents (i.e. read in at index time); and 984 Readers waiting in the queue. Averaging 50k per document = 800k of String objects in memory + overhead of 984 Readers Or am I not understanding something in your explanation? My understanding is that the IndexWriter serializes Document writes, but does not queue them (explicitly: locking multiple calls that are waiting is not an explicit queue). Of course, a change I could make would be to defer populating the Field from the file Reader until just before it gets indexed, using the String Field constructor, resulting in the equivalent of #2 above. But pushing this to the API would be easier. Thanks, Glen 2009/9/15 Chris Hostetter <hossman_luc...@fucit.org>: > > : Someone has made the decision that we will not be interested in > : storing files read using a Reader (at least not with these > : constructors). > : This is rather arbitrary. > > No, it was not arbitrary at all. > > The javadocs there are not a "decree" of what shall or shan't be > supported, they are an explanation of how the constructor works so that > there isn't any confusion. > > The *reason* why the code works that way, is because when Lucene > processes Fields that use a Reader, it doesn't buffer the Reader so it > can't store the full contents of that Reader. the *reason* it doesnt' > buffer the reader is because then it would have to make very arbitrary > memory decisions that could easily result in OOM (Readers can in fact be > infinite streams of characters) > > clients that want to store the value, need to be able to provide the > entire value as a String. > > : might want to also store files in the index, having a queue of 1000 > : Documents with 1000 Readers to files is vastly preferable to having > : 1000 documents with 1000 (perhaps very large) Strings with all the > : contents of the files. While this is not the best for all cases (#open > > You've just pointed out exactly why it's not feasible for Lucene to store > the contents of the Reader -- it would have to do the exact same thing you > describe. The curerent API leaves it up to the client to decide when to > do this, and what to do if the Reader produces a String bigger then it > wants to deal with. > > > > -Hoss > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- - --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org