[ 
https://issues.apache.org/jira/browse/LUCENE-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510220#comment-16510220
 ] 

Robert Muir commented on LUCENE-8041:
-------------------------------------

{quote}
+1 to make term vectors consistent across the index; it has always been strange 
that Lucene allows this.  Maybe open a separate issue for that?
{quote}

The description of this issue specifically asks why there is an iterator at 
all; that's why I explained it.

But I am also concerned about this issue because I don't think it's a real 
bottleneck for anyone. I don't want us doing anything risky that could 
potentially hurt ordinary users for some esoteric abuse case with a million 
fields: it would be better to just stay with TreeMap.

It is fine to sort a list in the constructor, or use a LinkedHashMap. This 
won't hurt ordinary users; it will just cost more RAM for abuse cases, so I am 
fine with it. I really don't want to see sneaky optimizations trying to avoid 
sorts or any of that. It does not belong here: this needs to be simple, clear, 
and safe. Instead, any serious effort should go into trying to remove the 
problematic API (the term vectors stuff); then it can be even simpler, since we 
won't need two data structures.
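
To illustrate the "sort in the constructor, then use a LinkedHashMap" approach 
described above, here is a minimal sketch. SortedFieldMap and its methods are 
hypothetical names invented for this example, not Lucene classes, and the 
generic type T merely stands in for whatever per-field object (e.g. a Terms 
instance) a real implementation would hold. It assumes the full set of field 
names is known when the object is constructed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Fields-like holder; not an actual Lucene class. */
final class SortedFieldMap<T> implements Iterable<String> {
  private final Map<String, T> byName;

  SortedFieldMap(Map<String, T> unsorted) {
    // Sort the field names once, up front: O(N log N), paid only at construction.
    List<String> names = new ArrayList<>(unsorted.keySet());
    Collections.sort(names);

    // LinkedHashMap iterates in insertion order, so inserting in sorted
    // order keeps the iterator sorted without any per-lookup tree walk.
    Map<String, T> ordered = new LinkedHashMap<>();
    for (String name : names) {
      ordered.put(name, unsorted.get(name));
    }
    this.byName = Collections.unmodifiableMap(ordered);
  }

  /** Expected O(1) hash lookup, versus O(log N) for a TreeMap. */
  T terms(String field) {
    return byName.get(field);
  }

  /** Field names in sorted order, as the sorted-iterator contract requires. */
  @Override
  public Iterator<String> iterator() {
    return byName.keySet().iterator();
  }

  int size() {
    return byName.size();
  }
}
```

There is nothing clever here: one explicit sort, one map. The cost is the extra 
memory of a hash table plus the linked entries, which only becomes noticeable 
with a very large number of fields.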

> All Fields.terms(fld) impls should be O(1) not O(log(N))
> --------------------------------------------------------
>
>                 Key: LUCENE-8041
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8041
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8041.patch
>
>
> I've seen apps that have a good number of fields -- hundreds.  The O(log(N)) 
> of TreeMap definitely shows up in a profiler; sometimes 20% of search time, 
> if I recall.  There are many Fields implementations that are impacted... in 
> part because Fields is the base class of FieldsProducer.  
> As an aside, I hope Fields goes away some day; FieldsProducer should be 
> TermsProducer and not have an iterator of fields. If DocValuesProducer 
> doesn't have this then why should the terms index part of our API have it?  
> If we did this then the issue here would be a simple transition to a HashMap.
> Or maybe we can switch to HashMap and relax the definition of Fields.iterator 
> to not necessarily be sorted?
> Perhaps the fix can be a relatively simple conversion over to LinkedHashMap 
> in many cases if we can assume when we initialize these internal maps that we 
> consume them in sorted order to begin with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
