[ 
https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496326#comment-14496326
 ] 

Nicholas Knize commented on LUCENE-6422:
----------------------------------------

Thanks David. I will certainly be doing a more thorough benchmark and will 
start with your suggestions. I imagine the savings will not be as extreme as in 
the situation with Wales (that was just the most interesting case).

bq. With it disabled, the underlying BytesRefIteratorTokenStream will consume an 
Iterator<Cell> that is a direct instance of TreeCellIterator, and then you get 
the "streaming" effect.

Just a few thoughts: the StreamingPrefixTreeIterator gives the best of a few 
worlds:
1. It uses an on-demand DFS through bit shifting, completely eliminating the 
need for the stack-based DFS logic in TreeCellIterator.hasNext. I suppose 
code-wise it would be cleaner to subclass TreeCellIterator and just override 
hasNext (and possibly next, since I don't set thisCell/current in next)? That's 
a good idea for code maintenance and reuse.
2. The on-demand DFS traversal already achieves a "leafy branch pruning" effect 
by not descending into Cells that already fall "within" the shape. This gives 
you pruning without having to buffer anything (other than the current and next 
cell). It does differ slightly from the RPT, which simply prunes all 4 "leaves" 
that "intersect" the shape.
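
To make the two points above concrete, here is a minimal sketch of a stackless, 
on-demand DFS over quad cells. It is an illustration only, not the code in the 
patch. The class name, the relation codes, and the bit layout (2 bits per level 
packed from the top of a long, with the level in the low 6 bits) are all 
assumptions made for this example.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.LongToIntFunction;

/** Hypothetical sketch: depth-first walk over packed quad cells without a stack. */
final class StreamingQuadDfsSketch implements Iterator<Long> {
    // Relation codes returned by the caller-supplied shape test (assumed names).
    public static final int DISJOINT = 0, INTERSECTS = 1, WITHIN = 2;

    private static final long LEVEL_MASK = 0x3FL; // low 6 bits hold the level
    private static final long END = -1L;          // level bits would be 63: never a valid cell

    private final int maxLevel;
    private final LongToIntFunction relate;
    private long cursor; // next candidate cell to examine
    private long next;   // next cell to hand out; the only buffered result

    public StreamingQuadDfsSketch(int maxLevel, LongToIntFunction relate) {
        if (maxLevel < 1 || maxLevel > 29) {
            throw new IllegalArgumentException("maxLevel must be in 1..29");
        }
        this.maxLevel = maxLevel;
        this.relate = relate;
        this.cursor = child(0L, 0); // first quadrant below the root
        findNext();
    }

    public static int level(long cell) { return (int) (cell & LEVEL_MASK); }

    private static int shift(int level) { return 64 - 2 * level; }

    /** Descend: append quadrant {@code quad} (0..3) one level below {@code cell}. */
    public static long child(long cell, int quad) {
        int lvl = level(cell) + 1;
        return (cell & ~LEVEL_MASK) | ((long) quad << shift(lvl)) | lvl;
    }

    /** Move to the next sibling, climbing to an ancestor's sibling when quadrant 3 is done. */
    public static long skipSubtree(long cell) {
        for (int lvl = level(cell); lvl > 0; lvl--) {
            int s = shift(lvl);
            long path = cell & ~LEVEL_MASK;
            if (((path >>> s) & 3L) < 3L) {
                return (path + (1L << s)) | lvl; // bump the quadrant at this level
            }
            cell = path & ~(3L << s);            // clear this level and ascend
        }
        return END;
    }

    private void findNext() {
        next = END;
        while (cursor != END) {
            long cell = cursor;
            int rel = relate.applyAsInt(cell);
            // Only descend into cells that partially overlap the shape; cells
            // "within" the shape are emitted as-is and their subtree is pruned.
            boolean descend = rel == INTERSECTS && level(cell) < maxLevel;
            cursor = descend ? child(cell, 0) : skipSubtree(cell);
            if (rel != DISJOINT) {
                next = cell;
                return;
            }
        }
    }

    @Override public boolean hasNext() { return next != END; }

    @Override public Long next() {
        if (next == END) throw new NoSuchElementException();
        long result = next;
        findNext();
        return result;
    }
}
```

The caller supplies the shape test as a function from a packed cell to one of 
the three relation codes. Each call to next() computes the following cell from 
the current one by shifting and masking, so nothing beyond the cursor and the 
pending cell is held in memory.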

bq. Can you consider not subclassing Legacy* ?

I'll certainly take a look at this. I saw the comment about not subclassing and 
thought about it, but since there is so much reuse with the bytes[], b_off, and 
b_len (which could be a BytesRef), it didn't make much sense to duplicate code. 
Are you suggesting duplicating code and eventually deprecating the LegacyCell?



> Add StreamingQuadPrefixTree
> ---------------------------
>
>                 Key: LUCENE-6422
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6422
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>    Affects Versions: 5.x
>            Reporter: Nicholas Knize
>         Attachments: LUCENE-6422.patch
>
>
> To conform to Lucene's inverted index, SpatialStrategies use strings to 
> represent QuadCells and GeoHash cells, yielding 1 byte per QuadCell and 5 
> bits per GeoHash cell, respectively.  To create the terms representing a 
> Shape, the BytesRefIteratorTokenStream first builds all of the terms into an 
> ArrayList of Cells in memory, then passes the ArrayList.Iterator back to 
> invert() which creates a second lexicographically sorted array of Terms. This 
> doubles the memory consumption when indexing a shape.
> This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to 
> accomplish the following:
> 1.  Create a packed 8-byte representation for a QuadCell
> 2.  Build the packed cells 'on demand' when incrementToken is called
> Improvements over this approach include the generation of the packed cells 
> using an AutoPrefixAutomaton.
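
As a rough illustration of the packed 8-byte idea in the description above, the 
sketch below packs a quadrant path into a single long and serializes it to 8 
bytes. The digit-based path notation ("0312"), the class and method names, and 
the bit layout (2 bits per level from the top, level in the low 6 bits) are 
assumptions for this example and may not match the patch.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of packing a quad cell path into 8 bytes. */
final class PackedQuadCellSketch {
    private static final int MAX_LEVELS = 29; // 58 path bits / 2 bits per level

    /** Packs a path such as "0312" (one digit 0..3 per level) into a long. */
    public static long pack(String quadPath) {
        if (quadPath.length() > MAX_LEVELS) {
            throw new IllegalArgumentException("at most " + MAX_LEVELS + " levels fit");
        }
        long term = 0L;
        for (int i = 0; i < quadPath.length(); i++) {
            int quad = quadPath.charAt(i) - '0';
            if (quad < 0 || quad > 3) {
                throw new IllegalArgumentException("bad quadrant: " + quadPath.charAt(i));
            }
            term |= (long) quad << (62 - 2 * i); // level 1 lands in bits 63..62
        }
        return term | quadPath.length();         // level count in the low 6 bits
    }

    /** Recovers the digit path from a packed term. */
    public static String unpack(long term) {
        int level = (int) (term & 0x3FL);
        StringBuilder sb = new StringBuilder(level);
        for (int i = 0; i < level; i++) {
            sb.append((char) ('0' + ((term >>> (62 - 2 * i)) & 3L)));
        }
        return sb.toString();
    }

    /** Big-endian 8-byte form of the packed term. */
    public static byte[] toBytes(long term) {
        return ByteBuffer.allocate(Long.BYTES).putLong(term).array();
    }
}
```

With one byte per level, a 20-level cell needs 20 bytes as a string term, while 
the packed form stays at 8 bytes for any depth the layout supports.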


