[ 
https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496658#comment-14496658
 ] 

David Smiley commented on LUCENE-6422:
--------------------------------------

bq. 1. It uses an on-demand DFS through bit shifting completely eliminating the 
need for the stack DFS logic in TreeCellIterator.hasNext. I suppose code-wise 
it would be cleaner to subclass TreeCellIterator and just override hasNext (and 
possibly next since I don't set thisCell/current in next)? That's a good idea 
for code maintenance and reuse.

Interesting; I need to look at that code closer.  Nonetheless, I think a stack 
vs completely stream one by one is unlikely to be noticeable in a benchmark.  
The memory use is capped at a small amount either way.  If we find there are 
measurable advantages here, then couldn't you provide this via overriding 
SpatialPrefixTree.getTreeCellIterator instead of overriding the SpatialStrategy?

bq. 2. The on-demand DFS traversal already achieves a "leafy branch pruning" 
effect by not descending on Cells that already fall "within" the shape. This 
gives you pruning without having to buffer anything (other than the current and 
next cell). This does vary a little bit in that the RPT simply prunes all 4 
"leaves" that "intersect" the shape.

I think we may be talking about different things.  A "leafy branch" as I 
defined it in Lucene spatial is a parent cell in which all sub-cells that could 
theoretically exist are there for this shape and are also leaves.  So for a 
quad-tree, it's a parent with 4 sub-cells that are all leaves.  A "leaf" is a 
cell that is _either_ within the shape from which it was derived _or_ it 
overlaps the edge but we've reached a precision threshold.  Indexing the 4 
leaves produces a larger index than not emitting those leaves at all and 
instead marking the parent as a leaf -- and the end effect is semantically 
equivalent in terms of the search predicates.  There are some trade-offs; but I 
won't digress now.  

bq. Are you suggesting duplicating code and eventually deprecating the 
LegacyCell?

Yes but I need to apply the patch and see how much code is at stake here.  At 
the time I introduced "LegacyCell", I refined the SpatialPrefixTree API and was 
unsatisfied with the only two implementations at that time, in terms of 
efficiencies, so I called it LegacyPrefixTree and LegacyCell.  Perhaps these 
could still stick around still; I welcome your input on what it might be named 
or if there is too little code to worry about.  But keeping having this extend 
"LegacyCell" is problematic because RecursivePrefixTreeStrategy makes an 
assumption for the branch pruning you can see there, right at the beginning.

I welcome whatever thoughts you may have on the API to make it more clear, 
faster, whatever.


> Add StreamingQuadPrefixTree
> ---------------------------
>
>                 Key: LUCENE-6422
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6422
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>    Affects Versions: 5.x
>            Reporter: Nicholas Knize
>         Attachments: LUCENE-6422.patch
>
>
> To conform to Lucene's inverted index, SpatialStrategies use strings to 
> represent QuadCells and GeoHash cells. Yielding 1 byte per QuadCell and 5 
> bits per GeoHash cell, respectively.  To create the terms representing a 
> Shape, the BytesRefIteratorTokenStream first builds all of the terms into an 
> ArrayList of Cells in memory, then passes the ArrayList.Iterator back to 
> invert() which creates a second lexicographically sorted array of Terms. This 
> doubles the memory consumption when indexing a shape.
> This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to 
> accomplish the following:
> 1.  Create a packed 8byte representation for a QuadCell
> 2.  Build the Packed cells 'on demand' when incrementToken is called
> Improvements over this approach include the generation of the packed cells 
> using an AutoPrefixAutomaton



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to