[jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary

2008-07-20 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077
 ] 

Eks Dev commented on LUCENE-1278:
-

In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM), I 
think it is worth mentioning that I am working on LUCENE-1340, which stores 
postings without additional frq info. 

Correct me if I am wrong: the only difference is that this approach with *.frq 
needs one more seek... at the same time, this could potentially increase term 
dict size, so we lose some locality.

Your last proposal, to "inline short postings" into the term dict, sounds 
interesting: for short postings (about the size of the offset pointer into 
*.frq) with tf==1 (that is always the case if you use omitTf(true) from 
LUCENE-1340) we spare one seek()... this could be a lot. Also, there is no 
need to store those postings in *.frq (this complicates maintenance, I guess).
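
A minimal sketch of the inlining idea, assuming a term dictionary entry with a single slot that holds either an offset into the postings file or, for a term with exactly one posting and tf==1, the doc number itself. All names here (InlinePostingsSketch, TermEntry, postingsFile, seeks) are hypothetical and do not come from Lucene or from any LUCENE-1278 patch; the "file" is just an in-memory list so that seeks can be counted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; this is not Lucene's term dictionary code. */
final class InlinePostingsSketch {

    /** One entry per term: docFreq plus one slot (doc number or file offset). */
    static final class TermEntry {
        final int docFreq;
        final long slot; // docFreq == 1: the doc number; otherwise: offset into postingsFile

        TermEntry(int docFreq, long slot) {
            this.docFreq = docFreq;
            this.slot = slot;
        }

        boolean isInlined() {
            return docFreq == 1;
        }
    }

    /** Stand-in for the *.frq file: postings lists addressed by an offset (a list index here). */
    static final List<int[]> postingsFile = new ArrayList<>();

    /** Number of simulated seeks into the postings file. */
    static int seeks = 0;

    /** Writes a postings list; a single posting is kept in the entry itself. */
    static TermEntry write(int[] docs) {
        if (docs.length == 1) {
            return new TermEntry(1, docs[0]); // nothing is written to the postings file
        }
        postingsFile.add(docs.clone());
        return new TermEntry(docs.length, postingsFile.size() - 1);
    }

    /** Reads a postings list; only non-inlined entries cost a seek. */
    static int[] read(TermEntry entry) {
        if (entry.isInlined()) {
            return new int[] { (int) entry.slot };
        }
        seeks++;
        return postingsFile.get((int) entry.slot);
    }

    public static void main(String[] args) {
        TermEntry rare = write(new int[] { 42 });
        TermEntry common = write(new int[] { 3, 7, 19 });
        System.out.println(read(rare).length + " doc(s), seeks so far: " + seeks);
        System.out.println(read(common).length + " doc(s), seeks so far: " + seeks);
    }
}
```

The trade-off mentioned above still applies: every inlined posting makes the term dictionary entry carry data that would otherwise live in the postings file.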

> Add optional storing of document numbers in term dictionary
> ---
>
> Key: LUCENE-1278
> URL: https://issues.apache.org/jira/browse/LUCENE-1278
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Affects Versions: 2.3.1
> Reporter: Jason Rutherglen
> Priority: Minor
> Attachments: lucene.1278.5.4.2008.patch, 
> lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch, 
> lucene.1278.5.7.2008.patch, lucene.1278.5.7.2008.test.patch, 
> TestTermEnumDocs.java
>
>
> Add optional storing of document numbers in term dictionary.  String index 
> field cache and range filter creation will be faster.  
> Example read code:
> {noformat}
> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> do {
>   Term term = termEnum.term();
>   if (term == null || term.field() != field) break;
>   int[] docs = termEnum.docs();
> } while (termEnum.next());
> {noformat}
> Example write code:
> {noformat}
> Document document = new Document();
> document.add(new Field("tag", "dog", Field.Store.YES, 
> Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> indexWriter.addDocument(document);
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-20 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1340:


Attachment: LUCENE-1340.patch

- fixed stupid bug in SegmentTermDocs (was doc = docCode; instead of doc += 
docCode;)
- TestOmitTf extended a bit 
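
To illustrate the kind of bug fixed here: doc numbers in a postings list are stored as gaps from the previous doc number, so a reader has to accumulate them. The following is a minimal sketch, assuming omitTf-style postings where each code is the plain gap. The class and method names are hypothetical; this is not the actual SegmentTermDocs code.

```java
/** Hypothetical sketch of gap-encoded doc numbers; not Lucene code. */
final class DocDeltaSketch {

    /** Turns ascending doc numbers into gaps from the previous doc number. */
    static int[] encodeGaps(int[] sortedDocs) {
        int[] gaps = new int[sortedDocs.length];
        int previous = 0;
        for (int i = 0; i < sortedDocs.length; i++) {
            gaps[i] = sortedDocs[i] - previous;
            previous = sortedDocs[i];
        }
        return gaps;
    }

    /** The bug: assigning the gap yields the gap itself, not the doc number. */
    static int[] decodeBuggy(int[] gaps) {
        int[] docs = new int[gaps.length];
        int doc = 0;
        for (int i = 0; i < gaps.length; i++) {
            int docCode = gaps[i];
            doc = docCode; // wrong: forgets the previous doc number
            docs[i] = doc;
        }
        return docs;
    }

    /** The fix: accumulate the gaps to recover absolute doc numbers. */
    static int[] decodeFixed(int[] gaps) {
        int[] docs = new int[gaps.length];
        int doc = 0;
        for (int i = 0; i < gaps.length; i++) {
            int docCode = gaps[i];
            doc += docCode; // right: add the gap to the previous doc number
            docs[i] = doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] gaps = encodeGaps(new int[] { 5, 8, 20 });
        System.out.println(java.util.Arrays.toString(decodeBuggy(gaps)));
        System.out.println(java.util.Arrays.toString(decodeFixed(gaps)));
    }
}
```

Note that the buggy version still returns the correct first doc number, which is why a test with single-posting terms would not catch it.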


> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Reporter: Eks Dev
> Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Term frequency is typically not needed for all fields; some CPU (reading one 
> VInt less and one X>>>1...) and IO can be spared by making pure boolean fields 
> possible in Lucene. This topic has already been discussed and accepted as a 
> part of Flexible Indexing... This issue tries to push things forward a bit 
> faster, as I have some concrete customer demands.
> Benefits can be expected for fields that are typical candidates for Filters: 
> enumerations, user rights, IDs or very short "texts", phone numbers, zip 
> codes, names...
> Status: just passed the standard tests (compatibility), committed for early 
> review; I have not tried the new feature, and it is missing some asserts and 
> one or two unit tests.
> Complexity: simpler than expected.
> Can be used via omitTf() (whoever used omitNorms() will know where to find it :)
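
The "one VInt less and one X>>>1" saving can be sketched as follows. This assumes the usual Lucene postings layout in which, with term frequencies, each entry is a VInt holding the doc gap shifted left by one, with the low bit set when freq is 1 and a separate freq VInt otherwise; without term frequencies the entry is just the gap. The class and method names are hypothetical and the sketch is not the code from the LUCENE-1340 patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of postings with and without term frequencies; not Lucene code. */
final class OmitTfSketch {

    /** Variable-length int: 7 data bits per byte, high bit marks continuation. */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    static int readVInt(ByteArrayInputStream in) {
        int b = in.read();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    /** With TF: gap << 1, low bit set if freq == 1, else followed by a freq VInt. */
    static byte[] encodeWithTf(int[] docs, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int i = 0; i < docs.length; i++) {
            int gap = docs[i] - previous;
            previous = docs[i];
            if (freqs[i] == 1) {
                writeVInt(out, (gap << 1) | 1);
            } else {
                writeVInt(out, gap << 1);
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    /** Without TF: only the gap is written. */
    static byte[] encodeOmitTf(int[] docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int doc : docs) {
            writeVInt(out, doc - previous);
            previous = doc;
        }
        return out.toByteArray();
    }

    /** Reader with TF: needs the shift, the low-bit test and sometimes a second VInt. */
    static int[] decodeDocsWithTf(byte[] bytes, int count) {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        int[] docs = new int[count];
        int doc = 0;
        for (int i = 0; i < count; i++) {
            int docCode = readVInt(in);
            doc += docCode >>> 1;
            if ((docCode & 1) == 0) {
                readVInt(in); // freq, read and discarded in this sketch
            }
            docs[i] = doc;
        }
        return docs;
    }

    /** Reader without TF: one VInt per posting, no shift. */
    static int[] decodeDocsOmitTf(byte[] bytes, int count) {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        int[] docs = new int[count];
        int doc = 0;
        for (int i = 0; i < count; i++) {
            doc += readVInt(in);
            docs[i] = doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] docs = { 5, 8, 200 };
        int[] freqs = { 1, 3, 1 };
        System.out.println("with tf: " + encodeWithTf(docs, freqs).length + " bytes");
        System.out.println("omit tf: " + encodeOmitTf(docs).length + " bytes");
    }
}
```

Besides skipping the freq VInt, dropping the shift leaves one more bit per gap, so some gaps fit into fewer VInt bytes.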




Re: [jira] Commented: (LUCENE-1341) BoostingNearQuery class (prototype)

2008-07-20 Thread Grant Ingersoll

Hmm, being 1.5 means waiting until 3.0-dev to commit, so your call...

-Grant

On Jul 19, 2008, at 11:03 AM, Peter Keegan (JIRA) wrote:



   [ https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615004#action_12615004 ]


Peter Keegan commented on LUCENE-1341:
--

Note that this patch requires Java 1.5 or later (easily modified to 
run on 1.4).



BoostingNearQuery class (prototype)
---

   Key: LUCENE-1341
   URL: https://issues.apache.org/jira/browse/LUCENE-1341
   Project: Lucene - Java
Issue Type: Improvement
Components: Query/Scoring
  Affects Versions: 2.3.1
  Reporter: Peter Keegan
  Priority: Minor
   Fix For: 2.3.2

   Attachments: bnq.patch, BoostingNearQuery.java


This patch implements term boosting for SpanNearQuery. Refer to: 
http://www.gossamer-threads.com/lists/lucene/java-user/62779
This patch works but probably needs more work. I don't like the use  
of 'instanceof', but I didn't want to touch Spans or TermSpans.  
Also, the payload code is mostly a copy of what's in  
BoostingTermQuery and could be common-sourced somewhere. Feel free  
to throw darts at it :)





--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











Re: [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary

2008-07-20 Thread eks dev
It seems someone else had the same idea to "inline" very short postings into 
the term dictionary (even for an in-memory index) and save one pointer (and 
one seek, in a disk setup)... nice reading:

http://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf






