[jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary

Eks Dev (JIRA) Sun, 20 Jul 2008 04:03:26 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077 ]


Eks Dev commented on LUCENE-1278:
---------------------------------

In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM), I 
think it is worth mentioning that I am working on LUCENE-1340, which stores 
postings without the additional frq info. 

Correct me if I am wrong, but the only difference is that the approach with 
*.frq needs one more seek. At the same time, this could potentially increase 
term dict size, so we lose some locality.

Your last proposal sounds interesting: "inline short postings" into the term 
dict, so that for short postings (about the size of the offset pointer into 
*.frq) with tf==1 (that is always the case if you use omitTf(true) from 
LUCENE-1340) we spare one seek()... this could be a lot. Also, there is no 
need to store such postings in *.frq (this complicates maintenance, I guess).
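
To make the trade-off concrete, here is a minimal sketch of the inlining idea, 
not actual Lucene code. All names (InlinePostingsSketch, TermEntry, frqSeeks) 
are hypothetical, the *.frq file is simulated with an in-memory map, and 
"short postings" is simplified to "exactly one document with tf==1". It only 
counts how often a lookup would have to seek into *.frq.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are Lucene classes.
final class InlinePostingsSketch {

    /** One term dict entry: either an inlined doc id or an offset into *.frq. */
    static final class TermEntry {
        final boolean inlined;
        final long value; // doc id when inlined, otherwise the *.frq offset

        TermEntry(boolean inlined, long value) {
            this.inlined = inlined;
            this.value = value;
        }
    }

    private final Map<String, TermEntry> termDict = new HashMap<>();
    // Simulated *.frq file: offset -> doc ids stored at that offset.
    private final Map<Long, int[]> frqFile = new HashMap<>();
    private long nextOffset = 0;

    /** Number of simulated seeks into *.frq performed by docs(). */
    int frqSeeks = 0;

    /** Assumption: tf==1 for every doc (as with omitTf(true)), so only doc ids are kept. */
    void addTerm(String term, int[] docs) {
        if (docs.length == 1) {
            // Short posting: takes about as much room as the pointer itself, so inline it.
            termDict.put(term, new TermEntry(true, docs[0]));
        } else {
            long offset = nextOffset;
            frqFile.put(offset, docs.clone());
            nextOffset += docs.length;
            termDict.put(term, new TermEntry(false, offset));
        }
    }

    int[] docs(String term) {
        TermEntry entry = termDict.get(term);
        if (entry == null) {
            return new int[0];
        }
        if (entry.inlined) {
            return new int[] { (int) entry.value }; // answered from the term dict alone
        }
        frqSeeks++; // the extra seek that inlining avoids
        return frqFile.get(entry.value).clone();
    }

    public static void main(String[] args) {
        InlinePostingsSketch index = new InlinePostingsSketch();
        index.addTerm("rare", new int[] { 7 });
        index.addTerm("common", new int[] { 1, 2, 3 });
        index.docs("rare");
        System.out.println("seeks after rare lookup: " + index.frqSeeks);
        index.docs("common");
        System.out.println("seeks after common lookup: " + index.frqSeeks);
    }
}
```

The sketch also shows the cost side: every inlined posting lives in the term 
dict itself, so the dictionary only stays compact if inlining is limited to 
postings about as small as the pointer they replace.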

> Add optional storing of document numbers in term dictionary
> -----------------------------------------------------------
>
>                 Key: LUCENE-1278
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: lucene.1278.5.4.2008.patch, 
> lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch, 
> lucene.1278.5.7.2008.patch, lucene.1278.5.7.2008.test.patch, 
> TestTermEnumDocs.java
>
>
> Add optional storing of document numbers in term dictionary.  String index 
> field cache and range filter creation will be faster.  
> Example read code:
> {noformat}
> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> do {
>   Term term = termEnum.term();
>   if (term == null || term.field() != field) break;
>   int[] docs = termEnum.docs();
> } while (termEnum.next());
> {noformat}
> Example write code:
> {noformat}
> Document document = new Document();
> document.add(new Field("tag", "dog", Field.Store.YES, 
> Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> indexWriter.addDocument(document);
> {noformat}
