Re: Solr geospatial index?

2015-01-11 Thread Matteo Tarantino
Wow, thank you David!
You are really kind to spend your time writing all this information for
me. This will be very helpful for my thesis work.

Thank you again.
MT



Re: Solr geospatial index?

2015-01-10 Thread david.w.smi...@gmail.com
Hello Matteo,

Welcome. You are not bothering me (or us); you are asking in the right place.

Jack’s right: the field type dictates how it works.

LatLonType simply stores the latitude and longitude internally as separate
floating-point fields, and it answers bounding-box queries with efficient
range queries over them.  Lucene has remarkably fast/efficient range queries
over numbers based on a Trie/PrefixTree; in fact, systems like TitanDB leave
such queries to Lucene.  For point-radius queries, LatLonType iterates over
all of the points in memory in a brute-force fashion (not scalable, but may
be fine).
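The two query styles described above can be sketched in Python. This is a toy model under stated assumptions, not Solr's code: each document is a hypothetical `(lat, lon)` pair, and plain dict scans stand in for the indexed numeric range queries Lucene would actually run. The bounding box becomes two independent 1-D range filters whose hits are intersected, while point-radius has no index to lean on and computes a haversine distance for every document.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def bbox_query(docs, min_lat, max_lat, min_lon, max_lon):
    """Bounding box as two 1-D range queries, one per numeric field.

    In Lucene each range would be answered from the terms index; here a scan
    over the dict stands in for it. Dateline-crossing boxes are not handled.
    """
    lat_hits = {d for d, (lat, _) in docs.items() if min_lat <= lat <= max_lat}
    lon_hits = {d for d, (_, lon) in docs.items() if min_lon <= lon <= max_lon}
    return lat_hits & lon_hits  # a document must satisfy both ranges


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def point_radius_query(docs, lat, lon, radius_km):
    """Brute force: one distance computation per document, O(n) per query."""
    return {d for d, (dlat, dlon) in docs.items()
            if haversine_km(lat, lon, dlat, dlon) <= radius_km}


docs = {
    "rome": (41.9028, 12.4964),
    "milan": (45.4642, 9.1900),
    "naples": (40.8518, 14.2681),
}
print(sorted(bbox_query(docs, 40.0, 42.5, 12.0, 15.0)))          # ['naples', 'rome']
print(sorted(point_radius_query(docs, 41.9028, 12.4964, 300)))   # ['naples', 'rome']
```

The contrast is the point: the box only needs two range lookups, whereas the radius query touches every document, which is why it does not scale.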

BBoxField is similar in spirit to LatLonType; each side of an indexed
rectangle gets its own floating point field internally.

Note that for both listed above, the underlying storage and range queries
use built-in numeric fields.
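A minimal sketch of why one numeric field per rectangle side is enough: common rectangle relationships reduce to one-sided comparisons on single numbers, each of which a numeric range query can answer. The `Rect` type and the function names are hypothetical, and dateline-crossing rectangles are ignored; this is an illustration of the idea, not BBoxField's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """An indexed or query rectangle; each side would be its own numeric field."""
    min_x: float
    max_x: float
    min_y: float
    max_y: float


def intersects(doc: Rect, q: Rect) -> bool:
    # Four one-sided range predicates, one per side field of the indexed rect.
    return (doc.min_x <= q.max_x and doc.max_x >= q.min_x
            and doc.min_y <= q.max_y and doc.max_y >= q.min_y)


def within(doc: Rect, q: Rect) -> bool:
    # The indexed rectangle lies entirely inside the query rectangle.
    return (doc.min_x >= q.min_x and doc.max_x <= q.max_x
            and doc.min_y >= q.min_y and doc.max_y <= q.max_y)


doc = Rect(0, 10, 0, 10)
print(intersects(doc, Rect(5, 15, 5, 15)), within(doc, Rect(5, 15, 5, 15)))      # True False
print(intersects(doc, Rect(-1, 11, -1, 11)), within(doc, Rect(-1, 11, -1, 11)))  # True True
print(intersects(doc, Rect(20, 30, 20, 30)))                                     # False
```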

SpatialRecursivePrefixTreeFieldType (RPT for short) is interesting in that
it supports indexing essentially any shape by representing the indexed
shape as multiple grid squares.  Non-point shapes (e.g. a polygon) are
approximated; if you need accuracy, you should additionally store the
vector geometry and validate the results in a second pass (see
SerializedDVStrategy for help with that).  RPT, like Lucene’s numeric
fields, uses a Trie/PrefixTree but encodes two dimensions, not one.
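To make the grid-square idea concrete, here is a toy sketch. It assumes a simple quad tree over lon/lat degrees (RPT can be configured with a geohash or a quad grid, but the encoding here is simplified and is not Lucene's), and every name in it is hypothetical. An indexed rectangle is stored as the set of grid squares covering it; a query point is turned into the path of its grid square at every level; a shared term makes a document a candidate. Squares on the shape's edge make the cover slightly too large, so a second pass against the stored geometry (here, the rectangle itself) removes the false positives.

```python
WORLD = (-180.0, 180.0, -90.0, 90.0)  # (min_x, max_x, min_y, max_y) in degrees


def point_terms(lon, lat, max_level):
    """Grid squares containing a point at levels 0..max_level, as path strings.

    Level 0 is the whole world (path ""). Each digit picks a quadrant
    (0=SW, 1=SE, 2=NW, 3=NE), so every path is a prefix of the next, finer one.
    """
    min_x, max_x, min_y, max_y = WORLD
    path, terms = "", {""}
    for _ in range(max_level):
        mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
        quad = 0
        if lon >= mid_x:
            quad += 1
            min_x = mid_x
        else:
            max_x = mid_x
        if lat >= mid_y:
            quad += 2
            min_y = mid_y
        else:
            max_y = mid_y
        path += str(quad)
        terms.add(path)
    return terms


def cover_rect(rect, max_level):
    """Approximate rect = (min_x, max_x, min_y, max_y) by grid squares.

    Squares fully inside rect are kept whole; squares crossing its edge are
    split down to max_level and then kept anyway, so the cover is a bit larger
    than the shape.
    """
    cells = set()

    def visit(path, min_x, max_x, min_y, max_y):
        if max_x < rect[0] or min_x > rect[1] or max_y < rect[2] or min_y > rect[3]:
            return  # disjoint: nothing to index here
        inside = (rect[0] <= min_x and max_x <= rect[1]
                  and rect[2] <= min_y and max_y <= rect[3])
        if inside or len(path) == max_level:
            cells.add(path)
            return
        mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
        visit(path + "0", min_x, mid_x, min_y, mid_y)
        visit(path + "1", mid_x, max_x, min_y, mid_y)
        visit(path + "2", min_x, mid_x, mid_y, max_y)
        visit(path + "3", mid_x, max_x, mid_y, max_y)

    visit("", *WORLD)
    return cells


def rect_contains(rect, lon, lat):
    return rect[0] <= lon <= rect[1] and rect[2] <= lat <= rect[3]


def search(shapes, lon, lat, max_level):
    """Which indexed rectangles contain the point? Returns (pass 1, pass 2) hits."""
    query_terms = point_terms(lon, lat, max_level)
    # Pass 1: term matching on grid squares (a real index computes each
    # document's cover once, at indexing time). May include false positives.
    candidates = {d for d, rect in shapes.items()
                  if cover_rect(rect, max_level) & query_terms}
    # Pass 2: exact test against the stored vector geometry.
    exact = {d for d in candidates if rect_contains(shapes[d], lon, lat)}
    return candidates, exact


shapes = {"italy_box": (6.0, 19.0, 36.0, 47.0),
          "australia_box": (113.0, 154.0, -44.0, -10.0)}
print(search(shapes, 2.3522, 48.8566, 2))   # Paris, coarse grid: ({'italy_box'}, set())
print(search(shapes, 2.3522, 48.8566, 8))   # Paris, finer grid: (set(), set())
print(search(shapes, 12.4964, 41.9028, 8))  # Rome: ({'italy_box'}, {'italy_box'})
```

With only two levels, Paris falls into an edge square of the Italy box and survives the first pass; the exact check removes it. A finer grid shrinks the error but never eliminates it for arbitrary shapes.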

The Trie/PrefixTree concept underlies both RPT and numeric fields; both are
ways of using Lucene’s terms index to encode prefixes.  So the big point here
is that Lucene/Solr doesn’t keep side indexes built on fundamentally
different technologies for different types of data.  Instead, Lucene’s one
versatile index looks up terms (for keyword search), numbers, AND 2-D
spatial data.  For keyword search, a term is a word; for numbers, a term
represents a contiguous range of values (e.g. 100-200); and for 2-D spatial,
a term is a grid square (a 2-D range).
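The idea that one term can stand for a contiguous range of numbers can be sketched with decimal prefixes. This is an illustration under simplifying assumptions: Lucene's trie numeric fields work on binary prefixes whose granularity is set by a `precisionStep`, whereas the hypothetical helpers below use powers of ten purely for readability and handle only non-negative integers. Each value is indexed once per precision level, and a range query is rewritten into a few coarse terms for its middle plus fine terms for its ragged edges.

```python
def index_terms(value, max_level):
    """Terms for one non-negative integer: its prefix at each precision level.

    (level, p) stands for the block [p * 10**level, (p + 1) * 10**level - 1].
    """
    return {(level, value // 10 ** level) for level in range(max_level + 1)}


def range_terms(lo, hi, max_level):
    """Cover [lo, hi] with few terms: coarse in the middle, fine at the edges."""
    terms = set()
    level = 0
    while lo <= hi:
        size = 10 ** level          # values covered by one term at this level
        block = size * 10           # values covered by one term at the next level
        next_lo = -(-lo // block) * block         # round lo up to a block start
        next_hi = (hi + 1) // block * block - 1   # round hi down to a block end
        if level == max_level or next_lo > next_hi:
            # No coarser level helps: emit the remaining span at this level.
            terms |= {(level, p) for p in range(lo // size, hi // size + 1)}
            break
        terms |= {(level, p) for p in range(lo // size, next_lo // size)}            # left edge
        terms |= {(level, p) for p in range((next_hi + 1) // size, hi // size + 1)}  # right edge
        lo, hi, level = next_lo, next_hi, level + 1
    return terms


def matches(value, query_terms, max_level):
    # A document matches if any of its indexed prefixes is one of the query terms.
    return not index_terms(value, max_level).isdisjoint(query_terms)


q = range_terms(95, 210, max_level=3)
print(len(q))     # 8 terms instead of 116 individual values
print(sorted(q))  # [(0, 95), (0, 96), (0, 97), (0, 98), (0, 99), (0, 210), (1, 20), (2, 1)]
```

Here `(2, 1)` covers 100-199 and `(1, 20)` covers 200-209, so the query touches eight terms rather than one per value; that saving is what makes prefix-encoded range queries fast.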

I am aware that many other DBs put spatial data in R-Trees, and I have no
interest in investing energy in doing that in Lucene.  That isn’t to say I
think other DBs shouldn’t be using R-Trees.  I think a system based on
sorted keys/terms (like Lucene, Cassandra, Accumulo, HBase, and others)
already has a powerful, versatile index, so adding something different
doesn’t warrant the complexity.  And Lucene’s underlying index continues to
improve.  I am most excited about an “auto-prefixing” technique McCandless
has been working on that will bring performance up to the next level for
numeric & spatial data in Lucene’s index.
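The sorted-keys argument can be illustrated with a toy index: one sorted list of keys answers exact lookups and prefix scans alike, whether a key is a word or a grid-square path. This is a hypothetical sketch, not Lucene's terms dictionary; the `field:` prefix on each key is an assumption standing in for Lucene keeping terms per field.

```python
import bisect


class SortedTermIndex:
    """Toy inverted index: sorted keys (terms) mapped to sets of document ids."""

    def __init__(self):
        self._terms = []      # kept sorted, so keys sharing a prefix are contiguous
        self._postings = {}   # term -> set of doc ids

    def add(self, term, doc_id):
        if term not in self._postings:
            bisect.insort(self._terms, term)
            self._postings[term] = set()
        self._postings[term].add(doc_id)

    def exact(self, term):
        return set(self._postings.get(term, ()))

    def prefix(self, prefix):
        """One seek to the first key >= prefix, then a contiguous forward scan."""
        hits = set()
        i = bisect.bisect_left(self._terms, prefix)
        while i < len(self._terms) and self._terms[i].startswith(prefix):
            hits |= self._postings[self._terms[i]]
            i += 1
        return hits


idx = SortedTermIndex()
idx.add("text:pizza", "rome")
idx.add("text:pasta", "naples")
idx.add("geo:30", "rome")      # grid-square paths, as a spatial field might store
idx.add("geo:32", "paris")
idx.add("geo:13", "sydney")
print(sorted(idx.prefix("geo:3")))      # ['paris', 'rome']
print(sorted(idx.exact("text:pizza")))  # ['rome']
```

The same seek-and-scan serves keyword, numeric-prefix, and grid-square lookups, which is why a separate R-Tree is not strictly needed on top of such an index.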

If you’d like to learn more about RPT and Lucene/Solr spatial, I suggest my
“Spatial Deep Dive” presentation at Lucene Revolution in San Diego, May
2013:  Lucene / Solr 4 Spatial Deep Dive

Also, my article here illustrates some RPT concepts in terms of indexing:
http://opensourceconnections.com/blog/2014/04/11/indexing-polygons-in-lucene-with-accuracy/

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Re: Solr geospatial index?

2015-01-10 Thread Jack Krupansky
Every field has its own index, based on the type of the field.

-- Jack Krupansky


Re: Solr geospatial index?

2015-01-10 Thread Matteo Tarantino
Thank you for your reply,
I have read the documentation, but I still don't understand whether Solr
creates two different indexes, one for the text of the documents and one for
the geographic information of the documents (something like this:
http://imgur.com/E0R3alo )


Re: Solr geospatial index?

2015-01-10 Thread Jack Krupansky
See the Solr reference guide section on "Spatial Search":
https://cwiki.apache.org/confluence/display/solr/Spatial+Search

-- Jack Krupansky


Solr geospatial index?

2015-01-10 Thread Matteo Tarantino
Hi all,
I hope I'm not bothering you, but I think I'm writing to the only mailing list
that can help me with my question.

I am writing my master's thesis about Geographical Information Retrieval
(GIR) and I'm using Solr to build a small geospatial search engine.
Reading papers about GIR, I noticed that these systems use a separate data
structure (like an R-tree http://it.wikipedia.org/wiki/R-tree) to store the
geographical coordinates of documents, but I have found nothing about how
Solr manages coordinates.

Can someone help me and, most of all, can someone point me to documents
that describe how and where Solr stores spatial information?

Thank you in advance
Matteo