[
https://issues.apache.org/jira/browse/LUCENE-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970800#action_12970800
]
Robert Muir commented on LUCENE-2810:
-------------------------------------
I agree with Simon's words here 'specialize for a certain usecase'.
I would prefer that compression not live inside Lucene; I think it belongs in
the app.
However, I think we should ensure that we provide the APIs so that someone
can implement compression in their app.
The problem to me is that I think it's too tricky to have in Lucene any
compression that will be good 'in general' without this app-specific knowledge,
just as people use different compression for image files than they do for
MS Word documents, etc.
Providing a "general purpose" compression with reasonable random access seems
redundant: modern filesystems will do this for you transparently (e.g. with
NTFS you just tell it that the .fdt file should be compressed).
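As a rough illustration of what app-side compression could look like, here is
a minimal sketch using java.util.zip from the JDK. The class and method names
(StoredFieldCompression, compress, decompress) are made up for this example
and are not part of Lucene. The assumption is that the app stores the
resulting bytes in a binary stored field and inflates them after loading the
document.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper living in the application, not in Lucene.
public class StoredFieldCompression {

    // Deflate a field value before handing it to the index as binary data.
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    // Inflate a value that was produced by compress().
    public static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress is possible: the data is truncated or was
                    // written with a preset dictionary we do not have.
                    throw new IllegalArgumentException("incomplete compressed data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }
}
```

The point of keeping this in the app is that the app can pick the algorithm,
level, or preset dictionary that suits its own data, which is exactly the
knowledge Lucene does not have.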
> Stored Fields Compression
> -------------------------
>
> Key: LUCENE-2810
> URL: https://issues.apache.org/jira/browse/LUCENE-2810
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Store
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
>
> In some cases (logs, HTML pages w/ boilerplate, etc.), the stored fields for
> documents contain a lot of redundant information and end up wasting a lot of
> space across a large collection of documents. For instance, simply
> compressing a typical log file often results in > 75% compression rates. We
> should explore mechanisms for applying compression across all the documents
> for a field (or fields) while still maintaining relatively fast lookup (that
> being said, in most logging applications, fast retrieval of a given event is
> not always critical.) For instance, perhaps it is possible to have a part of
> storage that contains the set of unique values for all the fields and the
> document field value simply contains a reference (could be as small as a few
> bits depending on the number of unique items) to that value instead of having
> a full copy. Extending this, perhaps we can leverage some existing
> compression capabilities in Java to provide this as well.
> It may make sense to implement this as a Directory, but it might also make
> sense as a Codec, if and when we have support for changing storage Codecs.
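The "set of unique values plus a small reference" idea in the description
above can be sketched roughly as follows. This is only an in-memory
illustration under the assumption that a field has few distinct values; the
class FieldValueDictionary and its methods are hypothetical and do not
correspond to any existing Lucene API or on-disk format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary: each distinct field value is stored once and
// documents keep only a small integer reference (ordinal) to it.
public class FieldValueDictionary {
    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Return the ordinal for a value, assigning a new one on first sight.
    public int encode(String value) {
        Integer ord = ordinals.get(value);
        if (ord == null) {
            ord = values.size();
            values.add(value);
            ordinals.put(value, ord);
        }
        return ord;
    }

    // Look up the original value for an ordinal returned by encode().
    public String decode(int ordinal) {
        return values.get(ordinal);
    }

    // Number of distinct values seen so far.
    public int size() {
        return values.size();
    }

    // Minimum number of bits needed to address every ordinal
    // (at least 1, even for an empty or single-entry dictionary).
    public int bitsPerReference() {
        int n = values.size();
        return n <= 1 ? 1 : 32 - Integer.numberOfLeadingZeros(n - 1);
    }
}
```

With, say, three distinct log levels, each document would need only a 2-bit
reference instead of a full copy of the string. The cost is that the
dictionary has to be available at read time.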