[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812297#comment-13812297 ]
Gilad Barkai commented on LUCENE-5316: -------------------------------------- The patch is not ready for commit - I just got a stackoverflow in my head while chasing this NPEs in the loop-as-recursion thing... Shai, you are right in the sense that we could avoid an extra loop/recursion if the kids are null, but this now works, doesn't throw NPEs in weird places and allows returning {{null}} while not calling the extra {{CI.next()}}. I'm still looking into that. I touched the recurring loop as little as possible, as I'm not 100% sure I understand it fully (also, changing a small innocent thing breaks things bad). In the old code there were hidden assumptions which are now no longer true, I traced some but not sure about others. bq. how about if we consolidate getPTA() and getChildren() into a single getTaxonomyTree() As for the API change - getPTA is being quietly - though with determination - swapped off. getChildren is all that is left. I'm not sure .getChildren() should be encapsulated with a TaxononyTree object if it's the only API. Do you think other API should be put there as well? bq. I think that we should experiment here with a TaxoTree object which holds the children in a map The current issue is not yet done, there are still tests to consider and hiding PTA even further from the users' eyes. I feel that this issue is big enough to stand for itself, while the children map is also large enough? It would contain code for the iterations but also some code in the taxo-reopen phase, that will replace (hopefully ) the PTA reopen code. > Taxonomy tree traversing improvement > ------------------------------------ > > Key: LUCENE-5316 > URL: https://issues.apache.org/jira/browse/LUCENE-5316 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet > Reporter: Gilad Barkai > Priority: Minor > Attachments: LUCENE-5316.patch, LUCENE-5316.patch > > > The taxonomy traversing is done today utilizing the > {{ParallelTaxonomyArrays}}. In particular, two taxonomy-size {{int}} arrays > which hold for each ordinal it's (array #1) youngest child and (array #2) > older sibling. > This is a compact way of holding the tree information in memory, but it's not > perfect: > * Large (8 bytes per ordinal in memory) > * Exposes internal implementation > * Utilizing these arrays for tree traversing is not straight forward > * Lose reference locality while traversing (the array is accessed in > increasing only entries, but they may be distant from one another) > * In NRT, a reopen is always (not worst case) done at O(Taxonomy-size) > This issue is about making the traversing more easy, the code more readable, > and open it for future improvements (i.e memory footprint and NRT cost) - > without changing any of the internals. > A later issue(s?) could be opened to address the gaps once this one is done. -- This message was sent by Atlassian JIRA (v6.1#6144) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org