On Fri, Dec 7, 2012 at 1:11 PM, Jain Rahul <ja...@ivycomptech.com> wrote:
> If you are using Lucene 4.0 and can afford to compress your document
> dataset while indexing, it will be a huge saving in terms of disk space
> and also in IO (resulting in better indexing throughput).
>
> In our case, it has helped us a lot, as the compressed data was roughly
> one third the size of the original document dataset.
>
> You may want to check the link below.
>
> http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
>
> Regards,
> Rahul

Thank you, Rahul. That indeed seems promising. Just one question: how do I
plug CompressingStoredFieldsFormat into my app? I tried bundling it in a
codec, but I am not sure I am on the right path. Any pointers would be of
great help!

> -----Original Message-----
> From: Ramprakash Ramamoorthy [mailto:youngestachie...@gmail.com]
> Sent: 07 December 2012 13:03
> To: java-user@lucene.apache.org
> Subject: Separating the document dataset and the index dataset
>
> Greetings,
>
> We are using Lucene in our log analysis tool. We get around 35 GB of
> data a day, and our practice is to zip week-old indices and unzip them
> when the need arises.
>
> Though the compression offers a huge saving in disk space, the
> decompression becomes an overhead. At times it takes around 10 minutes
> (decompression takes 95% of the time) to search across a month-long set
> of logs. We need to unzip fully at least to get the total count from
> the index.
>
> My question is: we are setting Index.Store to true. Is there a way to
> split the index dataset and the document dataset? In my understanding,
> if separation is possible at all, the document dataset alone can be
> zipped, leaving the index dataset uncompressed on disk. Would it be
> feasible to do this? Any pointers?
>
> Or is adding more disks the only solution? Thanks in advance!
>
> --
> With Thanks and Regards,
> Ramprakash Ramamoorthy,
> +91 9626975420

--
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420
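
For readers following the thread, here is a minimal sketch of the general idea being discussed: keep a small uncompressed lookup structure and compress only the stored document data, in chunks, so that fetching one document means decompressing one chunk instead of a whole archive. This is not Lucene code and not the CompressingStoredFieldsFormat API. The class name ChunkedDocStore, its methods, and the chunkSize parameter are all hypothetical, and the JDK's Deflater is used only as a stand-in compressor.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical illustration only; not part of Lucene. */
class ChunkedDocStore {
    private final int chunkSize;                                // flush threshold, uncompressed bytes
    private final List<byte[]> chunks = new ArrayList<>();      // compressed chunks
    private final List<Integer> rawSizes = new ArrayList<>();   // uncompressed size of each chunk
    private final List<int[]> docIndex = new ArrayList<>();     // docId -> {chunk, offset, length}
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    ChunkedDocStore(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    /** Appends a document to the pending buffer and returns its id. */
    int add(String doc) {
        byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
        // The pending buffer becomes chunk number chunks.size() once flushed.
        docIndex.add(new int[] { chunks.size(), pending.size(), bytes.length });
        pending.write(bytes, 0, bytes.length);
        if (pending.size() >= chunkSize) {
            flush();
        }
        return docIndex.size() - 1;
    }

    /** Compresses the pending buffer into a single chunk. */
    void flush() {
        if (pending.size() == 0) {
            return;
        }
        byte[] raw = pending.toByteArray();
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        chunks.add(out.toByteArray());
        rawSizes.add(raw.length);
        pending.reset();
    }

    /** Document count comes from the uncompressed index; nothing is decompressed. */
    int size() {
        return docIndex.size();
    }

    /** Decompresses only the chunk that holds the requested document. */
    String get(int docId) {
        flush(); // make documents still in the pending buffer retrievable
        int[] entry = docIndex.get(docId);
        byte[] raw = new byte[rawSizes.get(entry[0])];
        Inflater inflater = new Inflater();
        inflater.setInput(chunks.get(entry[0]));
        try {
            int read = 0;
            while (read < raw.length && !inflater.finished()) {
                read += inflater.inflate(raw, read, raw.length - read);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk " + entry[0], e);
        } finally {
            inflater.end();
        }
        return new String(raw, entry[1], entry[2], StandardCharsets.UTF_8);
    }
}
```

A caller would add log lines with add(), read the total with size(), and fetch individual lines with get(docId). The trade-off in this sketch is the chunk size: larger chunks usually compress better, but each lookup has to decompress more data.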