On Tue, Dec 11, 2012 at 3:14 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> You can use Lucene 4.1 nightly builds from http://goo.gl/jZ6YD - it is
> not yet released, but upgrading from Lucene 4.0 is easy. If you are not yet
> on Lucene 4.0, there is more work to do; in that case a solution to your
> problem would be to save the stored fields in a separate database/whatever
> and only add *one* stored field to your index, containing the document ID
> inside this external database.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de

Thank you Uwe. I already tried the nightly build, but the codecs.jar in it
doesn't have a compressing codec at all. I also tried checking out trunk and
compiling it, with the same result: *org.apache.lucene.codecs.compressing*
is missing. Any pointers?

> > -----Original Message-----
> > From: Ramprakash Ramamoorthy [mailto:youngestachie...@gmail.com]
> > Sent: Tuesday, December 11, 2012 10:32 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Separating the document dataset and the index dataset
> >
> > On Fri, Dec 7, 2012 at 1:11 PM, Jain Rahul <ja...@ivycomptech.com> wrote:
> >
> > > If you are using Lucene 4.0 and can afford to compress your document
> > > dataset while indexing, it will be a huge saving in terms of disk
> > > space and also in IO (resulting in better indexing throughput).
> > >
> > > In our case, it has helped us a lot, as the compressed data was
> > > roughly a third of the size of the original document dataset.
> > >
> > > You may want to check the link below.
> > >
> > > http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
> > >
> > > Regards,
> > > Rahul
> >
> > Thank you Rahul. That indeed seems promising. Just one doubt: how do I
> > plug this CompressingStoredFieldsFormat into my app? I tried bundling
> > it in a codec, but I am not sure if I am on the right path. Any pointers
> > would be of great help!
> > >
> > > -----Original Message-----
> > > From: Ramprakash Ramamoorthy [mailto:youngestachie...@gmail.com]
> > > Sent: 07 December 2012 13:03
> > > To: java-user@lucene.apache.org
> > > Subject: Separating the document dataset and the index dataset
> > >
> > > Greetings,
> > >
> > > We are using Lucene in our log analysis tool. We get around 35Gb of
> > > data a day, and we have a practice of zipping week-old indices and
> > > unzipping them when the need arises.
> > >
> > > Though the compression offers a huge saving in disk space, the
> > > decompression becomes an overhead. At times it takes around
> > > 10 minutes (decompression takes 95% of the time) to search across a
> > > month-long set of logs. We need to unzip fully at least to get the
> > > total count from the index.
> > >
> > > My question is: we are setting Index.Store to true. Is there a way
> > > we can split the index dataset and the document dataset? In my
> > > understanding, if separation is at all possible, the document
> > > dataset alone can be zipped, leaving the index dataset on disk.
> > > Would it be feasible to do this? Any pointers?
> > >
> > > Or is adding more disks the only solution? Thanks in advance!
> > >
> > > --
> > > With Thanks and Regards,
> > > Ramprakash Ramamoorthy,
> > > +91 9626975420
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org

--
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420
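
For readers of the archive, here is a minimal sketch of the external-store
idea suggested in the thread: keep the document text outside the index,
compressed, and store only its ID in the index. Everything in it is an
assumption made for illustration. The class name ExternalDocStore is
hypothetical, an in-memory map stands in for the "separate
database/whatever", and compression uses the JDK's java.util.zip Deflate
classes. None of it is Lucene API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical stand-in for the external database that holds the raw documents. */
final class ExternalDocStore {
    // Assumption: an in-memory map plays the role of the external database.
    private final Map<Long, byte[]> blobs = new HashMap<>();
    private long nextId = 0;

    /** Compresses the document and returns the ID to keep as the single stored field. */
    long put(String document) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(document.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        long id = nextId++;
        blobs.put(id, out.toByteArray());
        return id;
    }

    /** Decompresses and returns the document stored under the given ID. */
    String get(long id) {
        byte[] compressed = blobs.get(id);
        if (compressed == null) {
            throw new IllegalArgumentException("unknown document ID: " + id);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated data for ID " + id);
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt data for ID " + id, e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Size in bytes of the compressed form, useful for checking the saving. */
    int compressedSize(long id) {
        return blobs.get(id).length;
    }
}
```

In an indexing path, the value returned by put would be the only stored
field added to the indexed document, and at search time the ID read from a
hit would be passed to get. Under that arrangement only the documents that
are actually displayed need decompressing, while the index itself stays
uncompressed on disk.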