Re: Separating the document dataset and the index dataset

Ramprakash Ramamoorthy Tue, 11 Dec 2012 01:32:52 -0800

On Fri, Dec 7, 2012 at 1:11 PM, Jain Rahul <[email protected]> wrote:


> If you are using Lucene 4.0 and can afford to compress your document dataset
> while indexing, it will be a huge saving in terms of disk space and also
> in I/O (resulting in better indexing throughput).
>
> In our case, it has helped us a lot: the compressed data was roughly one
> third the size of the original document dataset.
>
> You may want to check the link below.
>
>
> http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
>
> Regards,
> Rahul
>

Thank you, Rahul. That indeed seems promising. Just one question: how do I plug
CompressingStoredFieldsFormat into my app? I tried bundling it in a codec, but
I am not sure I am on the right path. Any pointers would be of great help!
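
As background for the question above, the approach described in the linked post compresses stored documents in small chunks, so that fetching one document only requires decompressing its chunk, while counts and lookups need no decompression at all. The sketch below illustrates that idea in plain Java with java.util.zip. It is not Lucene's codec API: ChunkedDocStore and all of its methods are hypothetical names made up for illustration, and it assumes documents are single log lines without embedded newlines.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration of chunked document compression; not Lucene code.
final class ChunkedDocStore {
    private final int docsPerChunk;
    private final List<byte[]> compressedChunks = new ArrayList<>();
    private final List<Integer> rawChunkSizes = new ArrayList<>();
    private final List<String> pending = new ArrayList<>(); // tail not yet compressed
    private int docCount = 0;

    ChunkedDocStore(int docsPerChunk) {
        this.docsPerChunk = docsPerChunk;
    }

    // Adds a document and returns its id. Assumes doc contains no '\n'.
    int add(String doc) {
        pending.add(doc);
        if (pending.size() == docsPerChunk) {
            compressPending();
        }
        return docCount++;
    }

    // The total count is available without decompressing anything.
    int size() {
        return docCount;
    }

    // Decompresses only the chunk that holds the requested document.
    String get(int docId) {
        if (docId < 0 || docId >= docCount) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
        int chunk = docId / docsPerChunk;
        if (chunk >= compressedChunks.size()) {
            return pending.get(docId - compressedChunks.size() * docsPerChunk);
        }
        byte[] raw = new byte[rawChunkSizes.get(chunk)];
        Inflater inflater = new Inflater();
        inflater.setInput(compressedChunks.get(chunk));
        try {
            int off = 0;
            while (off < raw.length && !inflater.finished()) {
                off += inflater.inflate(raw, off, raw.length - off);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk " + chunk, e);
        } finally {
            inflater.end();
        }
        String[] docs = new String(raw, StandardCharsets.UTF_8).split("\n", -1);
        return docs[docId % docsPerChunk];
    }

    // Total bytes held in compressed chunks (excludes the uncompressed tail).
    long compressedBytes() {
        long total = 0;
        for (byte[] c : compressedChunks) {
            total += c.length;
        }
        return total;
    }

    private void compressPending() {
        byte[] raw = String.join("\n", pending).getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        compressedChunks.add(out.toByteArray());
        rawChunkSizes.add(raw.length);
        pending.clear();
    }

    public static void main(String[] args) {
        ChunkedDocStore store = new ChunkedDocStore(128);
        for (int i = 0; i < 1000; i++) {
            store.add("2012-12-07 13:03:00 INFO request id=" + i + " status=200");
        }
        System.out.println("documents: " + store.size());
        System.out.println("doc 512:   " + store.get(512));
        System.out.println("compressed bytes: " + store.compressedBytes());
    }
}
```

The chunk size is the trade-off here: larger chunks usually compress better, but each lookup then decompresses more data. Whether a given Lucene release exposes this through a pluggable stored-fields format, and with which constructor arguments, should be checked against that release's Javadoc.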

>
>
> -----Original Message-----
> From: Ramprakash Ramamoorthy [mailto:[email protected]]
> Sent: 07 December 2012 13:03
> To: [email protected]
> Subject: Separating the document dataset and the index dataset
>
> Greetings,
>
> We are using Lucene in our log analysis tool. We get around 35 GB of data
> a day, and our practice is to zip week-old indices and then unzip them
> when the need arises.
>
> Though the compression offers a huge saving in disk space, the
> decompression becomes an overhead. At times it takes around 10 minutes
> (decompression takes 95% of the time) to search across a month-long set
> of logs. We need to unzip fully at least to get the total count from the
> index.
>
> My question is: we are setting Index.Store to true. Is there a way to
> split the index dataset from the document dataset? In my understanding,
> if such a separation is possible, the document dataset alone can be
> zipped, leaving the index dataset uncompressed on disk. Would it be
> feasible to do this? Any pointers?
>
> Or is adding more disks the only solution? Thanks in advance!
>
> --
> With Thanks and Regards,
> Ramprakash Ramamoorthy,
> +91 9626975420


-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420
