[
https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498260#comment-14498260
]
David Smiley commented on LUCENE-6422:
--------------------------------------
bq. Can you attach the spatial.alg file you used so I can verify
It's in the patch.
bq. What choked specifically? I'm using PackedQuad with depth between 26 and
29. 1GB heap size using the shapes I described above.
Running with -Xmx2G resulted in an OutOfMemoryError, both for the legacy quad
tree and for this new packed quad tree.
bq. Out of curiosity, why is this option enabled by default if it uses
transient storage that doubles memory consumption? Seems backwards to me.
"leafy branch pruning" is enabled by default for RPT, although
"StreamingPrefixTreeStrategy" overrides the method that would trigger it. In
order to compare another tree (legacy quad in this case) fairly to packed-quad,
I disabled leafy branch pruning.
Please don't refer to what StreamingPrefixTreeStrategy does as leafy branch
pruning; it confuses the important distinction I'm trying to make. All SPTs
should stop traversing when the relation is _within_ -- that's expected/normal.
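As a rough illustration of the "stop traversing when the relation is within" behavior, here is a minimal sketch over the unit square with a rectangular query shape. The class, the method names, and the digit-based cell labels are hypothetical and are not Lucene's API; the sketch only shows that a cell fully inside the shape is emitted as a leaf without visiting its children.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: quad decomposition that stops descending at WITHIN cells. */
final class WithinTraversalSketch {
    enum Relation { DISJOINT, INTERSECTS, WITHIN }

    /** Relation of the cell [cx, cx+size) x [cy, cy+size) to rect = {minX, minY, maxX, maxY}. */
    static Relation relate(double cx, double cy, double size, double[] rect) {
        double cMaxX = cx + size;
        double cMaxY = cy + size;
        if (cMaxX <= rect[0] || cx >= rect[2] || cMaxY <= rect[1] || cy >= rect[3]) {
            return Relation.DISJOINT;
        }
        if (cx >= rect[0] && cMaxX <= rect[2] && cy >= rect[1] && cMaxY <= rect[3]) {
            return Relation.WITHIN;
        }
        return Relation.INTERSECTS;
    }

    /** Collects leaf cell paths; quadrant digit q has bit 0 = upper x half, bit 1 = upper y half. */
    static void visit(String path, double cx, double cy, double size,
                      int maxLevel, double[] rect, List<String> leaves) {
        Relation rel = relate(cx, cy, size, rect);
        if (rel == Relation.DISJOINT) {
            return; // nothing to index under this cell
        }
        if (rel == Relation.WITHIN || path.length() == maxLevel) {
            leaves.add(path); // stop here: no need to visit children
            return;
        }
        double half = size / 2;
        for (int q = 0; q < 4; q++) {
            visit(path + q, cx + (q & 1) * half, cy + (q >> 1) * half,
                  half, maxLevel, rect, leaves);
        }
    }

    /** Decomposes rect over the unit square down to at most maxLevel levels. */
    static List<String> leavesFor(double[] rect, int maxLevel) {
        List<String> leaves = new ArrayList<>();
        visit("", 0.0, 0.0, 1.0, maxLevel, rect, leaves);
        return leaves;
    }
}
```

For example, a query rectangle covering exactly the lower-left quadrant yields the single level-1 cell "0", regardless of how deep maxLevel is.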
bq. IMHO I would avoid these kinds of absolute statements (especially with the
highly variable nature of spatial use-cases).
I'm sorry. To be more clear, I just don't yet understand how this packed quad
encoding is going to allow for distErrPct=0. I'd like to understand; please
help me. Knowing what I do know about the SPTs, the results were what I
expected -- similar disk size to existing quad.
bq. I think we can do better on simulated test data in the test framework.
Yes! I would love to have more realistic shapes to test with.
RE progress not perfection:
Yes, Mike uses that quote constantly and I even saw it in your code :-) I have
my catch-phrases too. Are you and Mike in a hurry to see this committed? It's
very normal, in this open-source project anyway, that there is back & forth &
peer-review and changes that are asked of the contributor. Don't worry;
something is going to get committed -- the query speed is a nice improvement!
It's not quite done -- that's all.
> Add StreamingQuadPrefixTree
> ---------------------------
>
> Key: LUCENE-6422
> URL: https://issues.apache.org/jira/browse/LUCENE-6422
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Affects Versions: 5.x
> Reporter: Nicholas Knize
> Attachments: LUCENE-6422.patch,
> LUCENE-6422_with_SPT_factory_and_benchmark.patch
>
>
> To conform to Lucene's inverted index, SpatialStrategies use strings to
> represent QuadCells and GeoHash cells, yielding 1 byte per QuadCell and 5
> bits per GeoHash cell, respectively. To create the terms representing a
> Shape, the BytesRefIteratorTokenStream first builds all of the terms into an
> ArrayList of Cells in memory, then passes the ArrayList.Iterator back to
> invert() which creates a second lexicographically sorted array of Terms. This
> doubles the memory consumption when indexing a shape.
> This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to
> accomplish the following:
> 1. Create a packed 8-byte representation for a QuadCell
> 2. Build the Packed cells 'on demand' when incrementToken is called
> Possible improvements over this approach include generating the packed
> cells using an AutoPrefixAutomaton.
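
The two steps in the description above can be sketched roughly as follows. This is not the code from the patch: the class name, the method names, and the bit layout (58 bits of path at 2 bits per level, 5 bits for the level, 1 leaf bit) are assumptions chosen for illustration. The iterator stands in for incrementToken by producing one packed term per call instead of building a list of cells up front.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Illustrative only: the bit layout here is an assumption, not the patch's encoding. */
final class PackedQuadSketch {
    // 29 levels * 2 bits = 58 path bits, leaving 5 bits for the level and 1 leaf bit.
    static final int MAX_LEVELS = 29;

    /** Packs quadrant indices (0..3, coarsest first) as [path:58][level:5][leaf:1]. */
    static long pack(int[] quadrants, boolean leaf) {
        if (quadrants.length < 1 || quadrants.length > MAX_LEVELS) {
            throw new IllegalArgumentException("level must be in 1.." + MAX_LEVELS);
        }
        long term = 0L;
        for (int i = 0; i < quadrants.length; i++) {
            if (quadrants[i] < 0 || quadrants[i] > 3) {
                throw new IllegalArgumentException("quadrant must be in 0..3");
            }
            term |= ((long) quadrants[i]) << (62 - 2 * i); // first level sits in bits 63..62
        }
        term |= ((long) quadrants.length) << 1;
        return leaf ? (term | 1L) : term;
    }

    static int level(long term) { return (int) ((term >>> 1) & 0x1F); }

    static boolean isLeaf(long term) { return (term & 1L) != 0; }

    static int quadrantAt(long term, int i) { return (int) ((term >>> (62 - 2 * i)) & 3L); }

    /** Lazily yields one packed term per level for a point in [0,1) x [0,1). */
    static Iterator<Long> cellsForPoint(double x, double y, int maxLevel) {
        if (maxLevel < 1 || maxLevel > MAX_LEVELS) {
            throw new IllegalArgumentException("maxLevel must be in 1.." + MAX_LEVELS);
        }
        return new Iterator<Long>() {
            private long path = 0L; // path bits accumulated so far
            private int level = 0;
            private double minX = 0.0, minY = 0.0, size = 1.0;

            @Override
            public boolean hasNext() { return level < maxLevel; }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                size /= 2;
                int q = 0; // bit 0 = upper x half, bit 1 = upper y half
                if (x >= minX + size) { q |= 1; minX += size; }
                if (y >= minY + size) { q |= 2; minY += size; }
                path |= ((long) q) << (62 - 2 * level);
                level++;
                long term = path | ((long) level << 1);
                return level == maxLevel ? (term | 1L) : term; // deepest cell is the leaf
            }
        };
    }
}
```

Because each term is computed only when next() is called, the caller never holds more than the current cell's state, which is the memory property the description is after.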