Hi,

Thanks a lot for that, I will perform the experiments and publish the results. I'm aware of the risk of performance degradation, but for the pilot I'm trying to run I think it's acceptable.
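For the first suggestion, here is roughly what I have in mind, as a minimal sketch against the Lucene 2.4 API. The SmallSegmentIndexer class, the 2MB threshold, and the RAMDirectory stand-in are all just my own placeholders, to be tuned during the experiments since the flushed segment size won't exactly match the RAM buffer size:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class SmallSegmentIndexer {

    // Commit well below the 10MB limit; the on-disk segment size
    // differs from the in-memory buffer size, so this needs tuning.
    private static final long COMMIT_THRESHOLD_BYTES = 2L * 1024 * 1024;

    public static void main(String[] args) throws Exception {
        // RAMDirectory is a stand-in for the real GAE-backed directory.
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);

        for (int i = 0; i < 100000; i++) {
            Document doc = new Document();
            doc.add(new Field("body", "sample text for document " + i,
                    Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);

            // Flush a new segment whenever the in-memory buffer
            // grows past the threshold, so no single flushed
            // segment gets anywhere near the 10MB file limit.
            if (writer.ramSizeInBytes() > COMMIT_THRESHOLD_BYTES) {
                writer.commit();
            }
        }

        writer.commit();
        writer.close();
    }
}

If the flushed segments still come out too close to 10MB, I'll lower the threshold.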
Thanks again!


Michael McCandless-2 wrote:
> 
> First, you need to limit the size of segments initially created by
> IndexWriter due to newly added documents. Probably the simplest way
> is to call IndexWriter.commit() frequently enough. You might want to
> use IndexWriter.ramSizeInBytes() to gauge how much RAM is currently
> consumed by IndexWriter's buffer to determine when to commit. But it
> won't be an exact science, ie, the segment size will be different from
> the RAM buffer size. So, experiment w/ it...
> 
> Second, you need to prevent merging from creating a segment that's too
> large. For this I would use the setMaxMergeMB method of the
> LogByteSizeMergePolicy (which is IndexWriter's default merge policy).
> But note that this max size applies to the *input* segments, so you'd
> roughly want that to be 1.0 MB (your 10.0 MB divided by the merge
> factor = 10), but probably make it smaller to be sure things stay
> small enough.
> 
> Note that with this approach, if your index is large enough, you'll
> wind up with many segments and search performance will suffer when
> compared to an index that doesn't have this max 10.0 MB file size
> restriction.
> 
> Mike
> 
> On Thu, Sep 10, 2009 at 2:32 AM, Dvora <barak.ya...@gmail.com> wrote:
>>
>> Hello again,
>>
>> Can someone please comment on that, whether what I'm looking for is
>> possible or not?
>>
>>
>> Dvora wrote:
>>>
>>> Hello,
>>>
>>> I'm using Lucene 2.4. I'm developing a web application that uses
>>> Lucene (via Compass) to do the searches.
>>> I'm intending to deploy the application on Google App Engine
>>> (http://code.google.com/appengine/), which limits file lengths to
>>> less than 10MB. I've read about the various policies supported by
>>> Lucene to limit the file sizes, but no matter which policy and
>>> which parameters I used, the index files still grew to a lot more
>>> than 10MB. Looking at the code, I've managed to limit the cfs files
>>> (by predicting the file size in CompoundFileWriter before closing
>>> the file) - I guess that will degrade performance, but it's OK for
>>> now. But now the FDT files are becoming huge (about 60MB) and I
>>> can't identify a way to limit those files.
>>>
>>> Is there some built-in and correct way to limit the length of these
>>> files? If not, can someone please direct me how I should tweak the
>>> source code to achieve that?
>>>
>>> Thanks for any help.
>>>
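P.S. For the merge side, this is how I understood the setMaxMergeMB advice. Again only a sketch: the MergeSizeCap class is my own placeholder, and the 0.8MB cap is my safety margin below the ~1.0MB implied by the default merge factor of 10:

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LogByteSizeMergePolicy;

public class MergeSizeCap {

    // Caps merged segment size by limiting the *input* segments:
    // with merge factor 10, inputs under ~1.0MB should keep the
    // merged output under the 10MB GAE file limit.
    static void capMergeSize(IndexWriter writer) {
        LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
        mergePolicy.setMergeFactor(10); // the default, made explicit
        mergePolicy.setMaxMergeMB(0.8); // a bit under 1.0MB to be safe
        writer.setMergePolicy(mergePolicy);
    }
}

If merges still produce oversized segments, I'll drop the cap further and accept the extra segments.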