Re: [jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

eks dev Thu, 24 Jul 2008 10:41:50 -0700

Great, Mike,

I am a bit short on time to write up our tests in detail, but we are 
getting very similar numbers for indexing speed and index size. Query 
performance is also better, but I have to strip our internal details from 
the numbers before reporting them.


The most important news is that there are no problems whatsoever with our 
regression tests: 30,000 queries in a complex setup (terms expanded via our 
spell checker, pushing BooleanQuery to the limit in all possible variations, 
on an index of 80 million short documents with two fields) gave 100% 
identical responses to our standard reference test. Note that our 
regression test covers neither phrase queries nor payloads.






----- Original Message ----
> From: Michael McCandless (JIRA) <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Thursday, 24 July, 2008 6:05:31 PM
> Subject: [jira] Commented: (LUCENE-1340) Make it posible not to include TF 
> information in index
> 
> 
>     [ https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616513#action_12616513 ]
> 
> Michael McCandless commented on LUCENE-1340:
> --------------------------------------------
> 
> bq. The deltas between the proxPointers are written as vlongs. Since the 
> delta will be zero, it's now only 1 byte; only a bit worse than 0 bytes
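
To make the "1 byte" point concrete, here is a minimal sketch of a variable-length long encoding of the kind described above: 7 payload bits per byte, with the high bit marking that more bytes follow. The class and method names are hypothetical, and this is an illustration only, not code from the patch or from Lucene itself.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class: illustrates a vlong-style encoding, not Lucene's own code.
final class VLongDemo {

    // Encode a non-negative long using 7 bits per byte, low-order group first.
    // The high bit of each byte is set when more bytes follow.
    static byte[] writeVLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value); // final byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A zero delta still costs one byte on disk.
        System.out.println(writeVLong(0L).length);   // 1
        System.out.println(writeVLong(127L).length); // 1
        System.out.println(writeVLong(128L).length); // 2
    }
}
```

So an unused proxPointer delta of zero never disappears completely; it just shrinks to the one-byte minimum.
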
> 
> One more thing here: since the tiis are loaded into RAM, that unused 
> proxPointer wastes 8 bytes for each indexed term. For indices with a lot 
> of terms this can add up to a lot of wasted RAM. But I still think we 
> should wait and fix this as part of flexible indexing, when we may 
> refactor the TermInfos to be "column stride" instead.
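
As a rough back-of-the-envelope check of the "8 bytes per indexed term" figure, the arithmetic can be sketched as below. The term count is a made-up example, not a measurement from our index, and the class name is hypothetical; it only assumes that each in-memory entry holds one unused 8-byte long.

```java
// Hypothetical helper: estimates RAM spent on unused proxPointer fields.
final class ProxPointerOverhead {

    // A Java long occupies 8 bytes.
    static final int BYTES_PER_POINTER = 8;

    // Wasted bytes for a given number of term entries held in RAM.
    static long wastedBytes(long termsInRam) {
        return termsInRam * BYTES_PER_POINTER;
    }

    public static void main(String[] args) {
        long terms = 10_000_000L; // assumed example value
        System.out.println(wastedBytes(terms)); // 80000000 bytes, roughly 76 MiB
    }
}
```
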
> 
> > Make it posible not to include TF information in index
> > ------------------------------------------------------
> >
> >                 Key: LUCENE-1340
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1340
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >          Components: Index
> >            Reporter: Eks Dev
> >            Priority: Minor
> >         Attachments: LUCENE-1340.patch, LUCENE-1340.patch, 
> >                      LUCENE-1340.patch, LUCENE-1340.patch, 
> >                      LUCENE-1340.patch, LUCENE-1340.patch
> >
> >   Original Estimate: 24h
> >  Remaining Estimate: 24h
> >
> > Term frequency is typically not needed for all fields; some CPU (reading 
> > one VInt less and one X>>>1...) and IO can be saved by making pure 
> > boolean fields possible in Lucene. This topic has already been discussed 
> > and accepted as a part of Flexible Indexing... This issue tries to push 
> > things forward a bit faster, as I have some concrete customer demands.
> > Benefits can be expected for fields that are typical candidates for 
> > Filters, enumerations, user rights, IDs or very short "texts", phone 
> > numbers, zip codes, names...
> > Status: just passed the standard tests (compatibility), committed for 
> > early review; I have not tried the new feature yet, and it is missing 
> > some asserts and one or two unit tests.
> > Complexity: simpler than expected
> > Can be used via omitTf() (whoever has used omitNorms() will know where 
> > to find it :)
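
For readers wondering where the "one VInt less and one X>>>1" saving comes from, here is a simplified sketch, assuming the usual postings layout in which the doc delta is shifted left by one bit and the low bit flags a frequency of exactly 1 (otherwise the frequency follows as a separate VInt). With TF omitted, only the plain doc delta is written. All names below are hypothetical and this is not the code from the patch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of postings encoding with and without term frequencies.
final class PostingsSketch {

    // Variable-length int: 7 bits per byte, high bit set when more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // With TF: write (docDelta << 1), low bit set when freq == 1,
    // otherwise follow it with the frequency as its own VInt.
    static byte[] encodeWithTf(int[] docs, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int i = 0; i < docs.length; i++) {
            int delta = docs[i] - lastDoc;
            lastDoc = docs[i];
            if (freqs[i] == 1) {
                writeVInt(out, (delta << 1) | 1);
            } else {
                writeVInt(out, delta << 1);
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    // Without TF: only the doc delta is written, so the reader needs
    // neither the shift nor the extra VInt read.
    static byte[] encodeWithoutTf(int[] docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int doc : docs) {
            writeVInt(out, doc - lastDoc);
            lastDoc = doc;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] docs = {5, 70, 200};  // ascending doc ids (example data)
        int[] freqs = {1, 3, 1};    // matching term frequencies (example data)
        System.out.println(encodeWithTf(docs, freqs).length); // 6
        System.out.println(encodeWithoutTf(docs).length);     // 4
    }
}
```

In this toy example the saving comes both from dropping the frequency VInt and from the unshifted deltas fitting into fewer bytes.
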
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]




