[
https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499138#comment-14499138
]
David Smiley commented on LUCENE-6422:
--------------------------------------
bq. We can always review/iterate/benchmark after they are committed, as long as
net/net the patch is a step forward as (I think?) this one is.
Mike, this project is defacto run as review then commit, not the other way
around. This isn't news to you, of course; it's weird to feel I need to remind
you -- we all know this for good & bad. I have the same expectation of patches
I send elsewhere. I'm not looking for perfection, just moving one thing around
(a "quality" issue as-is IMO), and the rest is minor. I appreciate the points
you make generally, but I don't feel I'm setting the bar too high on this
issue. Because of the fuss being made here; it's going to be harder for me to
give quality feedback on future patches without fear of inducing needless
drama. And that's not good; Lucene improves through solid peer review that I
see on many patches (particularly yours, by the way).
bq. It's important to show new people we are eager and excited for their
contributions
Let it be known that weeks ago, upon discovery that Elastic had a new spatial
SME, I contacted Nick to do a video chat so that I could basically welcome him
into the Lucene/Spatial4j spatial area, that I looked forward to seeing what he
will contribute, and that I seek'ed his honest impressions / feedback. I
simply can't be more sincere, and I hope this demonstrates that I don't want to
be the only spatial person around here.
RE abstractions: I respect your opinion although I don't agree that there are
too many here. On the Geo3D front, it's extremely close now to being
committed, and the _only_ abstraction it is fitting into is a Spatial4j Shape
interface (BTW Geo3D internally has a bunch of abstractions and I didn't
complain to Karl as to the merits of them). It would be good to have one more
Spatial4j abstraction hook but it's not a blocker. On this patch, I'm asking
Nick _to scale it back_ one, to just the SpatialPrefixTree (which technically
includes "Cell" as that's what SPTs generate). And because of these
abstractions, both of them in fact (Shape & SpatialPrefixTree), it's possible
to re-use indexing and search predicates (Intersects, Within, Contains) and
other code that work with these abstractions, without either implementations
needing to know they exist. I think it's very powerful and awesome that these
things can be combined / leveraged. Of course I don't believe any of these
APIs are perfect; the specifics are debatable and I wish there were more people
improving it than just me to help make it even better. Now it looks like there
are :-)
_Sigh_; these conversations are stressfull -- I look forward to getting back to
the technical matters of the patch and moving it forward. [~nknize] can you
please post a new patch:
* baselined on Lucene trunk (standard practice for contributing to Lucene)
* with the streamlined cell iterator implementation (what you call "streaming")
moved to the PackedQuadPrefixTree
My added patch had two little extras: the factory & benchmark module
integration. It would be nice if you could add those. If you don't, I will
any how, perhaps with a factory test which I didn't yet do. In fact if I sadly
never hear from you again, I'll do all that is spoken of here because we'd all
love for this awesomeness to get into Lucene spatial. You've done all the hard
work; what remains is small.
You certainly don't have to prove to me that the indexes are lean... or rather
can be lean under "realistic" circumstances. I remain very curious about that.
> Add StreamingQuadPrefixTree
> ---------------------------
>
> Key: LUCENE-6422
> URL: https://issues.apache.org/jira/browse/LUCENE-6422
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Affects Versions: 5.x
> Reporter: Nicholas Knize
> Attachments: LUCENE-6422.patch,
> LUCENE-6422_with_SPT_factory_and_benchmark.patch
>
>
> To conform to Lucene's inverted index, SpatialStrategies use strings to
> represent QuadCells and GeoHash cells. Yielding 1 byte per QuadCell and 5
> bits per GeoHash cell, respectively. To create the terms representing a
> Shape, the BytesRefIteratorTokenStream first builds all of the terms into an
> ArrayList of Cells in memory, then passes the ArrayList.Iterator back to
> invert() which creates a second lexicographically sorted array of Terms. This
> doubles the memory consumption when indexing a shape.
> This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to
> accomplish the following:
> 1. Create a packed 8byte representation for a QuadCell
> 2. Build the Packed cells 'on demand' when incrementToken is called
> Improvements over this approach include the generation of the packed cells
> using an AutoPrefixAutomaton
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]