Hi again,

Can you add some details and guidelines on how to implement that? Different file types have different structures; is such splitting doable without knowing Lucene internals?

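To make the splitting idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. A logical file is stored as fixed-size parts, and a read at any position is mapped to a part by simple division, so the bytes are treated as opaque. ChunkedFile and its methods are hypothetical names, not Lucene classes, and parts are held in memory only for illustration. The assumption is that a custom Directory would apply the same arithmetic inside its own input and output classes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not a Lucene class: one logical file stored as
 * parts of at most maxPartSize bytes each. Every part except the last
 * is exactly full, so any position maps to (pos / size, pos % size).
 */
final class ChunkedFile {
    private final int maxPartSize;
    private final List<byte[]> parts = new ArrayList<>();
    private long length = 0;

    ChunkedFile(int maxPartSize) {
        if (maxPartSize <= 0) {
            throw new IllegalArgumentException("maxPartSize must be positive");
        }
        this.maxPartSize = maxPartSize;
    }

    /** Appends bytes, starting a new part whenever the current one is full. */
    void write(byte[] src, int offset, int len) {
        while (len > 0) {
            int posInPart = (int) (length % maxPartSize);
            if (posInPart == 0) {
                // Allocated eagerly at full size to keep the sketch short.
                parts.add(new byte[maxPartSize]);
            }
            byte[] part = parts.get(parts.size() - 1);
            int n = Math.min(len, maxPartSize - posInPart);
            System.arraycopy(src, offset, part, posInPart, n);
            offset += n;
            len -= n;
            length += n;
        }
    }

    /** Random-access read; may span several parts. */
    void read(long pos, byte[] dst, int offset, int len) {
        if (pos < 0 || len < 0 || pos + len > length) {
            throw new IndexOutOfBoundsException("read outside file");
        }
        while (len > 0) {
            byte[] part = parts.get((int) (pos / maxPartSize));
            int posInPart = (int) (pos % maxPartSize);
            int n = Math.min(len, maxPartSize - posInPart);
            System.arraycopy(part, posInPart, dst, offset, n);
            pos += n;
            offset += n;
            len -= n;
        }
    }

    long length() {
        return length;
    }

    int partCount() {
        return parts.size();
    }

    /** Logical size of part i; only the last part can be short. */
    int partLength(int i) {
        if (i < 0 || i >= parts.size()) {
            throw new IndexOutOfBoundsException("no such part: " + i);
        }
        return (int) Math.min(maxPartSize, length - (long) i * maxPartSize);
    }
}
```

With a part size of 4, writing 10 bytes yields three parts of 4, 4 and 2 bytes, and a read starting at position 3 crosses part boundaries transparently. Nothing in the logic looks at what the bytes mean, which suggests the approach would not need knowledge of the individual index file formats.
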
Michael McCandless-2 wrote:
>
> You're welcome!
>
> Another, bottom-up option would be to make a custom Directory impl
> that simply splits up files above a certain size. That'd be more
> generic and more reliable...
>
> Mike
>
> On Thu, Sep 10, 2009 at 5:26 AM, Dvora <barak.ya...@gmail.com> wrote:
>>
>> Hi,
>>
>> Thanks a lot for that, I will perform the experiments and publish the
>> results.
>> I'm aware of the risk of performance degradation, but for the pilot I'm
>> trying to run I think it's acceptable.
>>
>> Thanks again!
>>
>>
>> Michael McCandless-2 wrote:
>>>
>>> First, you need to limit the size of segments initially created by
>>> IndexWriter due to newly added documents. Probably the simplest way
>>> is to call IndexWriter.commit() frequently enough. You might want to
>>> use IndexWriter.ramSizeInBytes() to gauge how much RAM is currently
>>> consumed by IndexWriter's buffer to determine when to commit. But it
>>> won't be an exact science, i.e., the segment size will be different from
>>> the RAM buffer size. So, experiment w/ it...
>>>
>>> Second, you need to prevent merging from creating a segment that's too
>>> large. For this I would use the setMaxMergeMB method of the
>>> LogByteSizeMergePolicy (which is IndexWriter's default merge policy).
>>> But note that this max size applies to the *input* segments, so you'd
>>> roughly want that to be 1.0 MB (your 10.0 MB divided by the merge
>>> factor = 10), but probably make it smaller to be sure things stay
>>> small enough.
>>>
>>> Note that with this approach, if your index is large enough, you'll
>>> wind up with many segments and search performance will suffer when
>>> compared to an index that doesn't have this max 10.0 MB file size
>>> restriction.
>>>
>>> Mike
>>>
>>> On Thu, Sep 10, 2009 at 2:32 AM, Dvora <barak.ya...@gmail.com> wrote:
>>>>
>>>> Hello again,
>>>>
>>>> Can someone please comment on whether what I'm looking for is
>>>> possible or not?
>>>>
>>>> Dvora wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm using Lucene 2.4. I'm developing a web application that uses
>>>>> Lucene (via Compass) to do the searches.
>>>>> I'm intending to deploy the application in Google App Engine
>>>>> (http://code.google.com/appengine/), which limits file length to be
>>>>> smaller than 10MB. I've read about the various policies supported by
>>>>> Lucene to limit the file sizes, but no matter which policy I used and
>>>>> which parameters, the index files still grew to be a lot more than 10MB.
>>>>> Looking at the code, I've managed to limit the cfs files (predicting
>>>>> the file size in CompoundFileWriter before closing the file) - I guess
>>>>> that will degrade performance, but it's OK for now. But now the FDT
>>>>> files are becoming huge (about 60MB) and I can't identify a way to
>>>>> limit those files.
>>>>>
>>>>> Is there some built-in and correct way to limit the length of these
>>>>> files? If not, can someone please direct me on how I should tweak the
>>>>> source code to achieve that?
>>>>>
>>>>> Thanks for any help.
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25378056.html
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25380052.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

--
View this message in context:
http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25381489.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
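
The sizing advice quoted in this thread (commit often enough, and cap the merge input size at the file limit divided by the merge factor) can be sketched as plain arithmetic. SegmentSizing, its method names, the safety margin and the commit fraction are all hypothetical and would need tuning by experiment, since the thread notes that segment size differs from the RAM buffer size. In a real indexing loop the RAM figure would presumably come from IndexWriter.ramSizeInBytes(), a true result would trigger IndexWriter.commit(), and the computed cap would be passed to LogByteSizeMergePolicy.setMaxMergeMB.

```java
/** Hypothetical helper, not part of Lucene: sizing arithmetic only. */
final class SegmentSizing {
    private SegmentSizing() {
    }

    /**
     * Cap in MB for segments fed into a merge: the file limit divided by
     * the merge factor, scaled down by a safety margin in (0, 1].
     */
    static double maxMergeMb(double fileLimitMb, int mergeFactor, double safety) {
        if (fileLimitMb <= 0 || mergeFactor < 2 || safety <= 0 || safety > 1) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        return fileLimitMb / mergeFactor * safety;
    }

    /**
     * True once the buffered RAM reaches the given fraction of the file
     * limit. The fraction is a guess to be adjusted by experiment.
     */
    static boolean shouldCommit(long ramBytes, double fileLimitMb, double fraction) {
        long thresholdBytes = (long) (fileLimitMb * 1024 * 1024 * fraction);
        return ramBytes >= thresholdBytes;
    }
}
```

For the 10 MB limit and a merge factor of 10, maxMergeMb(10.0, 10, 1.0) gives the 1.0 MB figure from the quoted reply, and a safety margin below 1 makes it smaller, as suggested there.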