[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

Gilad Barkai (JIRA) Sun, 03 Nov 2013 01:51:42 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812297#comment-13812297
 ]


Gilad Barkai commented on LUCENE-5316:
--------------------------------------

The patch is not ready for commit - I just got a stackoverflow in my head while 
chasing this NPEs in the loop-as-recursion thing... 
Shai, you are right in the sense that we could avoid an extra loop/recursion if 
the kids are null, but this now works, doesn't throw NPEs in weird places and 
allows returning {{null}} while not calling the extra {{CI.next()}}.
I'm still looking into that. 
I touched the recurring loop as little as possible, as I'm not 100% sure I 
understand it fully (also, changing a small innocent thing breaks things bad). 
In the old code there were hidden assumptions which are now no longer true, I 
traced some but not sure about others.

bq.  how about if we consolidate getPTA() and getChildren() into a single 
getTaxonomyTree()
As for the API change - getPTA is being quietly - though with determination - 
swapped off. getChildren is all that is left.
I'm not sure .getChildren() should be encapsulated with a TaxononyTree object 
if it's the only API.
Do you think other API should be put there as well?

bq. I think that we should experiment here with a TaxoTree object which holds 
the children in a map
The current issue is not yet done, there are still tests to consider and hiding 
PTA even further from the users' eyes.
I feel that this issue is big enough to stand for itself, while the children 
map is also large enough? It would contain code for the iterations but also 
some code in the taxo-reopen phase, that will replace (hopefully ) the PTA 
reopen code.


> Taxonomy tree traversing improvement
> ------------------------------------
>
>                 Key: LUCENE-5316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5316
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Gilad Barkai
>            Priority: Minor
>         Attachments: LUCENE-5316.patch, LUCENE-5316.patch
>
>
> The taxonomy traversing is done today utilizing the 
> {{ParallelTaxonomyArrays}}. In particular, two taxonomy-size {{int}} arrays 
> which hold for each ordinal it's (array #1) youngest child and (array #2) 
> older sibling.
> This is a compact way of holding the tree information in memory, but it's not 
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes internal implementation
> * Utilizing these arrays for tree traversing is not straight forward
> * Lose reference locality while traversing (the array is accessed in 
> increasing only entries, but they may be distant from one another)
> * In NRT, a reopen is always (not worst case) done at O(Taxonomy-size)
> This issue is about making the traversing more easy, the code more readable, 
> and open it for future improvements (i.e memory footprint and NRT cost) - 
> without changing any of the internals. 
> A later issue(s?) could be opened to address the gaps once this one is done.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

Reply via email to