Hello Matteo,

Welcome. You are not bothering me or us; you are asking in the right place.

Jack’s right in terms of the field type dictating how it works.

LatLonType simply stores the latitude and longitude internally as separate floating point fields, and it does efficient range queries over them for bounding-box queries. Lucene has remarkably fast/efficient range queries over numbers based on a Trie/PrefixTree. In fact, systems like TitanDB leave such queries to Lucene. For point-radius queries, it iterates over all of the points in memory in a brute-force fashion (not scalable, but may be fine).

BBoxField is similar in spirit to LatLonType; each side of an indexed rectangle gets its own floating point field internally.

Note that for both field types listed above, the underlying storage and range queries use built-in numeric fields.

SpatialRecursivePrefixTreeFieldType (RPT for short) is interesting in that it supports indexing essentially any shape by representing the indexed shape as multiple grid squares. Non-point shapes (e.g. a polygon) are approximated; if you need accuracy, you should additionally store the vector geometry and validate the results in a 2nd pass (see SerializedDVStrategy for help with that). RPT, like Lucene’s numeric fields, uses a Trie/PrefixTree but encodes two dimensions, not one.

The Trie/PrefixTree concept underlies both RPT and numeric fields, which are approaches to using Lucene’s terms index to encode prefixes. So the big point here is that Lucene/Solr doesn’t have side indexes using fundamentally different technologies for different types of data. Lucene’s one versatile index looks up terms (for keyword search), numbers, AND 2-D spatial. For keyword search, the term is a word; for numbers, the term represents a contiguous range of values (e.g. 100-200); and for 2-D spatial, a term is a grid square (a 2-D range).

I am aware many other DBs put spatial data in R-Trees, and I have no interest in investing energy in doing that in Lucene. That isn’t to say I think that other DBs shouldn’t be using R-Trees.
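
To make the "a term is a grid square" idea above concrete, here is a minimal sketch in Python. It is not Lucene's actual code. The function names, the A/B/C/D quadrant labels, and the dictionary standing in for the terms index are all assumptions made for illustration. It shows how a point can be turned into one term per grid level, where each coarser cell's term is a prefix of the finer ones, so that "which documents fall in this grid square?" becomes an ordinary term lookup.

```python
from collections import defaultdict

# Hypothetical quadrant labels for this sketch: A = NW, B = NE, C = SW, D = SE.


def quad_cell_terms(lat, lon, max_levels=4):
    """Return one term per grid level for the cells containing (lat, lon).

    Each term is a prefix of the next one, so the term of a coarse cell
    is a prefix of the term of every finer cell inside it.
    """
    min_lat, max_lat = -90.0, 90.0
    min_lon, max_lon = -180.0, 180.0
    token = ""
    terms = []
    for _ in range(max_levels):
        mid_lat = (min_lat + max_lat) / 2.0
        mid_lon = (min_lon + max_lon) / 2.0
        north = lat >= mid_lat
        east = lon >= mid_lon
        if north:
            token += "B" if east else "A"
            min_lat = mid_lat  # keep the northern half
        else:
            token += "D" if east else "C"
            max_lat = mid_lat  # keep the southern half
        if east:
            min_lon = mid_lon  # keep the eastern half
        else:
            max_lon = mid_lon  # keep the western half
        terms.append(token)
    return terms


def build_index(points, max_levels=4):
    """Toy inverted index: grid-square term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, (lat, lon) in points.items():
        for term in quad_cell_terms(lat, lon, max_levels):
            index[term].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {"doc1": (45.0, -90.0), "doc2": (-10.0, 20.0)}
    index = build_index(docs, max_levels=2)
    for term in sorted(index):
        print(term, sorted(index[term]))
```

A rectangle or circle query would be covered by a handful of such cells at mixed levels, and the matching documents are the union of the lookups for those terms. A non-point shape would be indexed the same way, as the set of cells that cover it.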
I think systems based on sorted keys/terms (like Lucene and Cassandra, Accumulo, HBase, and others) already have a powerful/versatile index, such that it doesn’t warrant the complexity of adding something different. And Lucene’s underlying index continues to improve. I am most excited about an “auto-prefixing” technique McCandless has been working on that will bring performance up to the next level for numeric & spatial data in Lucene’s index.

If you’d like to learn more about RPT and Lucene/Solr spatial, I suggest my “Spatial Deep Dive” presentation at Lucene Revolution in San Diego, May 2013:
Lucene / Solr 4 Spatial Deep Dive <https://www.youtube.com/watch?v=L2cUGv0Rebs&list=PLsj1Ri57ZE94ulvk2vI_WoJrDYs3ckmH0&index=31>

Also, my article here illustrates some RPT concepts in terms of indexing:
http://opensourceconnections.com/blog/2014/04/11/indexing-polygons-in-lucene-with-accuracy/

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Sat, Jan 10, 2015 at 10:26 AM, Matteo Tarantino <matteo.tarant...@gmail.com> wrote:

> Hi all,
> I hope to not bother you, but I think I'm writing to the only mailing list
> that can help me with my question.
>
> I am writing my master thesis about Geographical Information Retrieval
> (GIR) and I'm using Solr to create a little geospatial search engine.
> Reading papers about GIR I noticed that these systems use a separate data
> structure (like an R-tree http://it.wikipedia.org/wiki/R-tree) to save
> geographical coordinates of documents, but I have found nothing about how
> Solr manages coordinates.
>
> Can someone help me, and most of all, can someone address me to documents
> that talk about how and where Solr saves spatial informations?
>
> Thank you in advance
> Matteo
>