[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

Gilad Barkai (JIRA) Sat, 23 Nov 2013 22:58:35 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830884#comment-13830884
 ]


Gilad Barkai commented on LUCENE-5316:
--------------------------------------

I think we don't have to wait for LUCENE-5339, we could handle some of these 
issues now.

* {{if (kids == null)}} and hashing - there's a solution to that: not hold it 
in a map. We could hold it in an {{int[][]}} and NO_CHILDREN for the entries 
which has no children. So we'll have one array at the size of the taxonomy, and 
the sum of others will be significantly smaller if we have flat dimensions. 
We'll lose some RAM compared to the Map, but it will speed things up because 
we'll simply return what's in {{children\[ord\]}}. 

* {{if (children == null)}} is done only because it's being allocated lazily. 
Does it make sense to keep it that way? We could compute the children upon 
taxonomy opening and be done with it. Today (trunk) reopen is very costly as 
it's in the O(TaxonomySize) which affects NRT reopen (it's not guaranteed that 
for each reopen someone will actually need the new facet information), , but if 
the computation of the children is done according to the O(new segments) than 
we're not wasting much. 

> Taxonomy tree traversing improvement
> ------------------------------------
>
>                 Key: LUCENE-5316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5316
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Gilad Barkai
>            Priority: Minor
>         Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch, 
> LUCENE-5316.patch, LUCENE-5316.patch
>
>
> The taxonomy traversing is done today utilizing the 
> {{ParallelTaxonomyArrays}}. In particular, two taxonomy-size {{int}} arrays 
> which hold for each ordinal it's (array #1) youngest child and (array #2) 
> older sibling.
> This is a compact way of holding the tree information in memory, but it's not 
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes internal implementation
> * Utilizing these arrays for tree traversing is not straight forward
> * Lose reference locality while traversing (the array is accessed in 
> increasing only entries, but they may be distant from one another)
> * In NRT, a reopen is always (not worst case) done at O(Taxonomy-size)
> This issue is about making the traversing more easy, the code more readable, 
> and open it for future improvements (i.e memory footprint and NRT cost) - 
> without changing any of the internals. 
> A later issue(s?) could be opened to address the gaps once this one is done.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

Reply via email to