[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526535#comment-13526535
 ] 

Robert Muir commented on LUCENE-4599:
-------------------------------------

{quote}
If you have ideas to efficiently compress term vectors, you're welcome!
{quote}

I think we waste space on the terms, especially the prefix/suffix lengths (so 
much so that the prefix encoding probably hurts in general for many people). 
These should likely be bulk-compressed. As you already noticed in the patch, 
frequencies are a waste too. 
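
To make the bulk-compression idea concrete, here is a rough sketch (not code 
from the patch; the class and method names are hypothetical). It assumes the 
terms of a document arrive sorted, gathers all prefix and suffix lengths into 
arrays, and packs each array with a fixed bit width instead of writing one 
variable-length int per term:

```java
import java.util.List;

// Hypothetical sketch: collect prefix/suffix lengths, then bit-pack them in bulk.
public class TermLengthPacking {

    /** Length of the prefix that two terms share. */
    static int sharedPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /** Returns {prefixLengths, suffixLengths} for terms given in sorted order. */
    static int[][] prefixSuffixLengths(List<String> sortedTerms) {
        int[] prefix = new int[sortedTerms.size()];
        int[] suffix = new int[sortedTerms.size()];
        String previous = "";
        for (int i = 0; i < sortedTerms.size(); i++) {
            String term = sortedTerms.get(i);
            prefix[i] = sharedPrefix(previous, term);
            suffix[i] = term.length() - prefix[i];
            previous = term;
        }
        return new int[][] {prefix, suffix};
    }

    /** Bits needed to store the largest value in the array (at least 1). */
    static int bitsRequired(int[] values) {
        int max = 0;
        for (int v : values) {
            max = Math.max(max, v);
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
    }

    /** Packs non-negative values into longs, using `bits` bits per value. */
    static long[] pack(int[] values, int bits) {
        long[] out = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            out[word] |= ((long) values[i]) << shift;
            if (shift + bits > 64) {
                // The value straddles two longs: spill the high bits over.
                out[word + 1] |= ((long) values[i]) >>> (64 - shift);
            }
        }
        return out;
    }

    /** Reads back the value stored at `index`. */
    static int unpack(long[] packed, int bits, int index) {
        long bitPos = (long) index * bits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = packed[word] >>> shift;
        if (shift + bits > 64) {
            v |= packed[word + 1] << (64 - shift);
        }
        return (int) (v & ((1L << bits) - 1));
    }
}
```

With short terms the lengths usually fit in 3 or 4 bits each, whereas a 
variable-length int costs at least one byte per length.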

Flags are wasteful and stupid, but it seems like you already tried to address 
that to some extent. If we compress chunks of docs, we should optimize the case 
where the flags are the same. It's crazy that someone would have just positions 
for "body field" of document 2, but positions and offsets for "body field" of 
document 3. 
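
A minimal sketch of that optimization, assuming a chunk is described by two 
parallel arrays (one field number and one flags value per document/field 
pair). The class, the flag constants and the int[] output are all hypothetical 
and only illustrate the idea; they are not the patch's on-disk format:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: write flags once per field when a chunk is consistent.
public class ChunkFlags {
    static final int POSITIONS = 1;
    static final int OFFSETS = 2;
    static final int PAYLOADS = 4;

    /** Returns field number -> flags, or null if a field's flags vary in the chunk. */
    static Map<Integer, Integer> sharedFlags(int[] fieldNums, int[] flags) {
        Map<Integer, Integer> perField = new TreeMap<>();
        for (int i = 0; i < fieldNums.length; i++) {
            Integer previous = perField.putIfAbsent(fieldNums[i], flags[i]);
            if (previous != null && previous != flags[i]) {
                return null;
            }
        }
        return perField;
    }

    /**
     * Encodes the flags of one chunk. Output starts with 0 followed by one
     * flags value per distinct field (in field-number order) when flags are
     * consistent, or with 1 followed by one flags value per entry otherwise.
     */
    static int[] encode(int[] fieldNums, int[] flags) {
        Map<Integer, Integer> shared = sharedFlags(fieldNums, flags);
        if (shared != null) {
            int[] out = new int[1 + shared.size()];
            int i = 1; // out[0] stays 0: "flags are per field"
            for (int f : shared.values()) {
                out[i++] = f;
            }
            return out;
        }
        int[] out = new int[1 + flags.length];
        out[0] = 1; // fallback: "flags are per document/field entry"
        System.arraycopy(flags, 0, out, 1, flags.length);
        return out;
    }
}
```

In the common case the cost of the flags then depends on the number of 
distinct fields in the chunk rather than on the number of documents.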

                
> Compressed term vectors
> -----------------------
>
>                 Key: LUCENE-4599
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4599
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/codecs, core/termvectors
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.1
>
>         Attachments: LUCENE-4599.patch
>
>
> We should have codec-compressed term vectors similarly to what we have with 
> stored fields.

