[ https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496658#comment-14496658 ]
David Smiley commented on LUCENE-6422: -------------------------------------- bq. 1. It uses an on-demand DFS through bit shifting completely eliminating the need for the stack DFS logic in TreeCellIterator.hasNext. I suppose code-wise it would be cleaner to subclass TreeCellIterator and just override hasNext (and possibly next since I don't set thisCell/current in next)? That's a good idea for code maintenance and reuse. Interesting; I need to look at that code closer. Nonetheless, I think a stack vs completely stream one by one is unlikely to be noticeable in a benchmark. The memory use is capped at a small amount either way. If we find there are measurable advantages here, then couldn't you provide this via overriding SpatialPrefixTree.getTreeCellIterator instead of overriding the SpatialStrategy? bq. 2. The on-demand DFS traversal already achieves a "leafy branch pruning" effect by not descending on Cells that already fall "within" the shape. This gives you pruning without having to buffer anything (other than the current and next cell). This does vary a little bit in that the RPT simply prunes all 4 "leaves" that "intersect" the shape. I think we may be talking about different things. A "leafy branch" as I defined it in Lucene spatial is a parent cell in which all sub-cells that could theoretically exist are there for this shape and are also leaves. So for a quad-tree, it's a parent with 4 sub-cells that are all leaves. A "leaf" is a cell that is _either_ within the shape from which it was derived _or_ it overlaps the edge but we've reached a precision threshold. Indexing the 4 leaves produces a larger index than not emitting those leaves at all and instead marking the parent as a leaf -- and the end effect is semantically equivalent in terms of the search predicates. There are some trade-offs; but I won't digress now. bq. Are you suggesting duplicating code and eventually deprecating the LegacyCell? Yes but I need to apply the patch and see how much code is at stake here. At the time I introduced "LegacyCell", I refined the SpatialPrefixTree API and was unsatisfied with the only two implementations at that time, in terms of efficiencies, so I called it LegacyPrefixTree and LegacyCell. Perhaps these could still stick around still; I welcome your input on what it might be named or if there is too little code to worry about. But keeping having this extend "LegacyCell" is problematic because RecursivePrefixTreeStrategy makes an assumption for the branch pruning you can see there, right at the beginning. I welcome whatever thoughts you may have on the API to make it more clear, faster, whatever. > Add StreamingQuadPrefixTree > --------------------------- > > Key: LUCENE-6422 > URL: https://issues.apache.org/jira/browse/LUCENE-6422 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/spatial > Affects Versions: 5.x > Reporter: Nicholas Knize > Attachments: LUCENE-6422.patch > > > To conform to Lucene's inverted index, SpatialStrategies use strings to > represent QuadCells and GeoHash cells. Yielding 1 byte per QuadCell and 5 > bits per GeoHash cell, respectively. To create the terms representing a > Shape, the BytesRefIteratorTokenStream first builds all of the terms into an > ArrayList of Cells in memory, then passes the ArrayList.Iterator back to > invert() which creates a second lexicographically sorted array of Terms. This > doubles the memory consumption when indexing a shape. > This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to > accomplish the following: > 1. Create a packed 8byte representation for a QuadCell > 2. Build the Packed cells 'on demand' when incrementToken is called > Improvements over this approach include the generation of the packed cells > using an AutoPrefixAutomaton -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org