[ https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500327#comment-14500327 ]
Nicholas Knize commented on LUCENE-6422: ---------------------------------------- bq. If you can suggest a better name to what "leafy branch pruning" does, then at a minimum it could be expressed in the javadocs ++. Just 'prune' is probably more clear since its universally used all over data structures. We can add a javadoc comment that describes it in further detail if necessary. bq. if SpatialPrefixTree might have a better literature/industry based name then I'd love to know what that is. There are trie based spatial trees all over (kd-Trie, kd-b-trie, buddy tree) industry and the literature. The one you call QuadPrefixTree was originally introduced in 1991 called the QuadTrie. (reference: Gaston H. Gonnet and Ricardo Baeza-Yates, Handbook of Algorithms and Data Structures -- in Pascal and C, 2nd edition, Addison-Wesley, 1991.) Dr. Hanan Samet from UMD has a great section on MX and PR QuadTrees (same as QuadPrefixTree and a name someone mentioned to you in another issue). He provides a nice discussion on the differences between MX, PR and their point based counterparts (compared by the decomposition methods). There's certainly nothing wrong with an implementation specific name. If you are asking for suggestions then I offer: SpatialTrie, GeoHashTrie, QuadTrie as being shorter, more to the point, and probably more relate-able to other spatial SMEs (whom I'm hoping would be willing to get more involved). bq. It's not obvious to me but where in the code of PackedQuadCell are the 5 "depth bits" encoded & decoded? PackedQuadCell.getLevel() decodes, and its encoded in PackedQuadCell.nextTerm() bq. Preferably it would stay enabled but I think you indicated it's not supported by PackedQuadTree? I didn't look closer. That's correct. bq. I also wonder whether we need a new, lighter weight spatial module (spatial2? spatia_light?), or maybe spatial_sandbox, where the barrier is lower? +++ I think this is a great idea for experimental/research features we don't want cluttering up the spatial module. bq. RE abstractions: I respect your opinion although I don't agree that there are too many here. IMHO, this is a slippery slope. There are so many diverse spatial data structures we should be taking a look at for improving spatial search in higher order dimensions (4D space-time for just a start). That's a personal interest area for me, in how the most powerful high dimension structures (that already exist) can fit within the design and implementation of lucene-core (a green field to explore). Something like this does require a sophisticated abstraction framework and this particular one has a bit of a learning curve. I think that can work itself out over time with a bit of refactoring (which it sounds like all are open to?). In the meantime it does set the bar rather high for new contributors. This is another +1 for a spatial sandbox for experimental research (heck make it a separate repo). bq. Sigh; these conversations are stressfull They're very verbose, but maybe that's the kick in the pants needed to help the spatial module really take off. That is, after all, the common goal of the community? > Add StreamingQuadPrefixTree > --------------------------- > > Key: LUCENE-6422 > URL: https://issues.apache.org/jira/browse/LUCENE-6422 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/spatial > Affects Versions: 5.x > Reporter: Nicholas Knize > Attachments: LUCENE-6422.patch, LUCENE-6422.patch, > LUCENE-6422_with_SPT_factory_and_benchmark.patch > > > To conform to Lucene's inverted index, SpatialStrategies use strings to > represent QuadCells and GeoHash cells. Yielding 1 byte per QuadCell and 5 > bits per GeoHash cell, respectively. To create the terms representing a > Shape, the BytesRefIteratorTokenStream first builds all of the terms into an > ArrayList of Cells in memory, then passes the ArrayList.Iterator back to > invert() which creates a second lexicographically sorted array of Terms. This > doubles the memory consumption when indexing a shape. > This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to > accomplish the following: > 1. Create a packed 8byte representation for a QuadCell > 2. Build the Packed cells 'on demand' when incrementToken is called > Improvements over this approach include the generation of the packed cells > using an AutoPrefixAutomaton -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org