[ 
https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499138#comment-14499138
 ] 

David Smiley commented on LUCENE-6422:
--------------------------------------

bq. We can always review/iterate/benchmark after they are committed, as long as 
net/net the patch is a step forward as (I think?) this one is.

Mike, this project is defacto run as review then commit, not the other way 
around.  This isn't news to you, of course; it's weird to feel I need to remind 
you -- we all know this for good & bad.  I have the same expectation of patches 
I send elsewhere.  I'm not looking for perfection, just moving one thing around 
(a "quality" issue as-is IMO), and the rest is minor.  I appreciate the points 
you make generally, but I don't feel I'm setting the bar too high on this 
issue.  Because of the fuss being made here; it's going to be harder for me to 
give quality feedback on future patches without fear of inducing needless 
drama.  And that's not good; Lucene improves through solid peer review that I 
see on many patches (particularly yours, by the way).

bq. It's important to show new people we are eager and excited for their 
contributions

Let it be known that weeks ago, upon discovery that Elastic had a new spatial 
SME, I contacted Nick to do a video chat so that I could basically welcome him 
into the Lucene/Spatial4j spatial area, that I looked forward to seeing what he 
will contribute, and that I seek'ed his honest impressions / feedback.  I 
simply can't be more sincere, and I hope this demonstrates that I don't want to 
be the only spatial person around here.

RE abstractions: I respect your opinion although I don't agree that there are 
too many here.  On the Geo3D front, it's extremely close now to being 
committed, and the _only_ abstraction it is fitting into is a Spatial4j Shape 
interface (BTW Geo3D internally has a bunch of abstractions and I didn't 
complain to Karl as to the merits of them).  It would be good to have one more 
Spatial4j abstraction hook but it's not a blocker.  On this patch, I'm asking 
Nick _to scale it back_ one, to just the SpatialPrefixTree (which technically 
includes "Cell" as that's what SPTs generate).  And because of these 
abstractions, both of them in fact (Shape & SpatialPrefixTree), it's possible 
to re-use indexing and search predicates (Intersects, Within, Contains) and 
other code that work with these abstractions, without either implementations 
needing to know they exist.  I think it's very powerful and awesome that these 
things can be combined / leveraged.  Of course I don't believe any of these 
APIs are perfect; the specifics are debatable and I wish there were more people 
improving it than just me to help make it even better.  Now it looks like there 
are :-)

_Sigh_; these conversations are stressfull -- I look forward to getting back to 
the technical matters of the patch and moving it forward.  [~nknize] can you 
please post a new patch:
* baselined on Lucene trunk (standard practice for contributing to Lucene)
* with the streamlined cell iterator implementation (what you call "streaming") 
moved to the PackedQuadPrefixTree

My added patch had two little extras: the factory & benchmark module 
integration.  It would be nice if you could add those.  If you don't, I will 
any how, perhaps with a factory test which I didn't yet do.  In fact if I sadly 
never hear from you again, I'll do all that is spoken of here because we'd all 
love for this awesomeness to get into Lucene spatial.  You've done all the hard 
work; what remains is small.

You certainly don't have to prove to me that the indexes are lean... or rather 
can be lean under "realistic" circumstances.  I remain very curious about that.

> Add StreamingQuadPrefixTree
> ---------------------------
>
>                 Key: LUCENE-6422
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6422
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>    Affects Versions: 5.x
>            Reporter: Nicholas Knize
>         Attachments: LUCENE-6422.patch, 
> LUCENE-6422_with_SPT_factory_and_benchmark.patch
>
>
> To conform to Lucene's inverted index, SpatialStrategies use strings to 
> represent QuadCells and GeoHash cells. Yielding 1 byte per QuadCell and 5 
> bits per GeoHash cell, respectively.  To create the terms representing a 
> Shape, the BytesRefIteratorTokenStream first builds all of the terms into an 
> ArrayList of Cells in memory, then passes the ArrayList.Iterator back to 
> invert() which creates a second lexicographically sorted array of Terms. This 
> doubles the memory consumption when indexing a shape.
> This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to 
> accomplish the following:
> 1.  Create a packed 8byte representation for a QuadCell
> 2.  Build the Packed cells 'on demand' when incrementToken is called
> Improvements over this approach include the generation of the packed cells 
> using an AutoPrefixAutomaton



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to