[jira] [Commented] (LUCENE-8041) All Fields.terms(fld) impls should be O(1) not O(log(N))

Simon Willnauer (JIRA) Tue, 12 Jun 2018 12:08:48 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510047#comment-16510047
 ]


Simon Willnauer commented on LUCENE-8041:
-----------------------------------------

bq. +1 to make term vectors consistent across the index; it has always been 
strange that Lucene allows this.  Maybe open a separate issue for that?

++ 

bq. This has the downside that it sorts all fields on every call to iterator(). 
My concern is mainly that it will introduce performance problems down the line, 
ones that are difficult to find/debug because of java's syntactic sugar around 
iterator(). Especially if someone is using MultiFields (slow-wrapper crap), 
they will be doing a bunch of sorts on each segment, then merging those, and 
all hidden behind a single call to iterator().

I do wonder if an intermediate step would be to have a map + a Iterable<String> 
so we don't need to do this sorting over and over again?

> All Fields.terms(fld) impls should be O(1) not O(log(N))
> --------------------------------------------------------
>
>                 Key: LUCENE-8041
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8041
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8041.patch
>
>
> I've seen apps that have a good number of fields -- hundreds.  The O(log(N)) 
> of TreeMap definitely shows up in a profiler; sometimes 20% of search time, 
> if I recall.  There are many Field implementations that are impacted... in 
> part because Fields is the base class of FieldsProducer.  
> As an aside, I hope Fields to go away some day; FieldsProducer should be 
> TermsProducer and not have an iterator of fields. If DocValuesProducer 
> doesn't have this then why should the terms index part of our API have it?  
> If we did this then the issue here would be a simple transition to a HashMap.
> Or maybe we can switch to HashMap and relax the definition of Fields.iterator 
> to not necessarily be sorted?
> Perhaps the fix can be a relatively simple conversion over to LinkedHashMap 
> in many cases if we can assume when we initialize these internal maps that we 
> consume them in sorted order to begin with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-8041) All Fields.terms(fld) impls should be O(1) not O(log(N))

Reply via email to