Re: range types in SOLR
David, thanks, I'm looking forward to LUCENE-5648. I added a comment there about supporting BC dates. We currently use the spatial support to index date ranges with a precision of one day, ranging from year - to .

Just for the record, I had some issues converting bounding-box Intersects queries to polygons with Solr 4.6.1: the polygon version found far more results than it should have. After upgrading to 4.8.0 (and from JTS 1.12 to 1.13), the results are now correct.

--Ere

In reply to david.w.smi...@gmail.com, 6.5.2014 21:26.
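Ere's note mentions converting bounding-box Intersects queries into polygon queries. A minimal sketch of that conversion, assuming x = longitude and y = latitude order, a box that does not cross the dateline, and a hypothetical field name `location`. It also builds the rectangle range form that David recommends for plain rectangles. It only produces query strings; how the polygon variant behaves depends on the Solr and JTS versions in use.

```python
def bbox_to_wkt_polygon(min_x, min_y, max_x, max_y):
    """Return the bounding box as a closed WKT polygon ring (x y order)."""
    ring = [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),  # WKT rings repeat the first vertex to close the ring
    ]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


def polygon_intersects_query(field, min_x, min_y, max_x, max_y):
    """Filter string using the Intersects(POLYGON(...)) syntax."""
    return f'{field}:"Intersects({bbox_to_wkt_polygon(min_x, min_y, max_x, max_y)})"'


def rectangle_range_query(field, min_x, min_y, max_x, max_y):
    """Same box in the rectangle range style: field:["minX minY" TO "maxX maxY"]."""
    return f'{field}:["{min_x} {min_y}" TO "{max_x} {max_y}"]'


if __name__ == "__main__":
    # Hypothetical box, roughly lon 20..32, lat 59..70.
    print(polygon_intersects_query("location", 20, 59, 32, 70))
    print(rectangle_range_query("location", 20, 59, 32, 70))
```

For a box that really is a rectangle, the second form avoids the polygon code path entirely.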
Re: range types in SOLR
David, I made a note of the deprecation you mentioned so that we can take it into account in our software. But now that I have tried to find out more about it, I ran into some confusion, since the Solr documentation on spatial search is currently badly scattered and partly obsolete [1]. I'd appreciate some clarification on what exactly is deprecated. We currently use spatial for both time-duration and geographic searches, and in the latter we also use e.g. Intersects(POLYGON(...)). Is this also deprecated, and if so, how should I rewrite it?

Thanks!

--Ere

[1] It would be really nice if up-to-date documentation of at least all of this could be found in one place:
https://cwiki.apache.org/confluence/display/solr/Spatial+Search
https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
http://wiki.apache.org/solr/SpatialForTimeDurations
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3c1355027722156-4025434.p...@n3.nabble.com%3E

In reply to Smiley, David W., 3.3.2014 20:12.

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland
Re: range types in SOLR
Hi Ere,

I appreciate that the scattered documentation is confusing for users. Using spatial for time durations is definitely not an official way to do it; it's clearly a hack/trick, one that works pretty well if you know the issues to watch out for. So I don't see it getting documented in the reference guide. But you should be happy to know about this: https://issues.apache.org/jira/browse/LUCENE-5648

"Watch" that issue to stay abreast of my development on it, the inevitable Solr FieldType to follow, and the inevitable documentation in the reference guide. With luck it'll get in by 4.9.

The "Intersects(POLYGON(…))" syntax is something I suggest using only when you have to, e.g. when you have a polygon or linestring, or if you are indexing circles. One of these days there will be a more Solr-friendly query parser, definitely in some 4.x release. When that happens, the Intersects syntax will get deprecated/removed in trunk/5.

~ David

In reply to Ere Maijala (ere.maij...@helsinki.fi), Tue, May 6, 2014 at 4:22 AM.
Re: range types in SOLR
The main reference for this approach is here: http://wiki.apache.org/solr/SpatialForTimeDurations

The illustrations Hoss developed for the meetup presentation are great. However, there are bugs in the instructions: specifically, it's important to slightly buffer the query and to choose an appropriate maxDistErr. Also, it's preferable to use the rectangle range query style of spatial query (e.g. field:["minX minY" TO "maxX maxY"]) as opposed to "Intersects(minX minY maxX maxY)". There's no technical difference, but the latter is deprecated and will eventually be removed from Solr 5 / trunk.

All this said, recognize that this is a bit of a hack (one that works well). There is a good chance that a better implementation approach will be developed this year.

~ David

In reply to Shawn Heisey (s...@elyograg.org), 3/1/14, 2:54 PM.
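To make the buffering advice concrete: as I understand the SpatialForTimeDurations approach, each interval [start, end] is indexed as the point (x = start, y = end), so "overlaps [qStart, qEnd]" becomes the box x ≤ qEnd, y ≥ qStart. The sketch below is a plain-Python illustration, not Solr code. It assumes world bounds of 0..3000 (in Solr these would come from the field type's `worldBounds`), a buffer of 0.5 suited to whole-number years, and a hypothetical field name `period`; `point_in_rect` only simulates what the index does.

```python
def overlap_rect(q_start, q_end, world_min, world_max, buffer=0.5):
    """Box (min_x, min_y, max_x, max_y) matching every interval [s, e],
    indexed as the point (x=s, y=e), that overlaps [q_start, q_end].

    Overlap means s <= q_end and e >= q_start.  The buffer widens the two
    query-dependent edges slightly so values sitting exactly on an edge are
    not lost to the index's limited precision (0.5 is an assumed value).
    """
    return (world_min, q_start - buffer, q_end + buffer, world_max)


def range_style(field, rect):
    """Preferred rectangle range syntax: field:["minX minY" TO "maxX maxY"]."""
    min_x, min_y, max_x, max_y = rect
    return f'{field}:["{min_x} {min_y}" TO "{max_x} {max_y}"]'


def intersects_style(field, rect):
    """Equivalent, deprecated form: Intersects(minX minY maxX maxY)."""
    min_x, min_y, max_x, max_y = rect
    return f'{field}:"Intersects({min_x} {min_y} {max_x} {max_y})"'


def point_in_rect(x, y, rect):
    """Local stand-in for the index: is the point inside the box?"""
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y


if __name__ == "__main__":
    rect = overlap_rect(1300, 1699, world_min=0, world_max=3000)
    print(range_style("period", rect))       # period:["0 1299.5" TO "1699.5 3000"]
    print(intersects_style("period", rect))  # period:"Intersects(0 1299.5 1699.5 3000)"
    # Thomas's two periods as points; neither overlaps 1300-1699.
    print([point_in_rect(s, e, rect) for s, e in [(1250, 1299), (1700, 1715)]])  # [False, False]
```

The right buffer and maxDistErr depend on the grid precision configured for the field, so treat 0.5 as a starting point only.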
Re: range types in SOLR
In reply to Smiley, David W., 03.03.2014 19:12.

Thank you. Having a working example is great, but a practical working example that hides this implementation detail would be even better. I would like to store 2014-03-04T07:05:12,345Z, 2014-03-04, 2014-03 and 2014 in one field and run queries on that field. Currently I have to normalize all of them to the first format (inventing information), which is only the worst approximation. In my opinion, normalizing them to a range would be best. Then a query like date:2014 would hit all of them, and so would date:[2014-01 TO 2014-03].

kind regards,
Thomas
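The normalization Thomas describes, turning each partial date into a range instead of inventing a full timestamp, can be sketched outside Solr. The helpers below are hypothetical, not a Solr API: `to_range` expands a value to an inclusive range at millisecond precision (accepting the comma decimal separator from his example), and a query matches a stored value when the two ranges overlap. Storing such ranges in Solr would still need a range-capable field, such as the spatial trick discussed in this thread.

```python
import calendar
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)  # Solr dates carry millisecond precision


def to_range(value):
    """Map a full or partial ISO 8601 date to an inclusive (start, end) range."""
    v = value.strip().rstrip("Z").replace(",", ".")  # "12,345" -> "12.345"
    if "T" in v:
        # A full timestamp is a zero-length range.
        t = datetime.fromisoformat(v)
        return t, t
    parts = [int(p) for p in v.split("-")]
    if len(parts) == 1:  # year, e.g. "2014"
        start = datetime(parts[0], 1, 1)
        following = datetime(parts[0] + 1, 1, 1)
    elif len(parts) == 2:  # month, e.g. "2014-03"
        year, month = parts
        start = datetime(year, month, 1)
        following = start + timedelta(days=calendar.monthrange(year, month)[1])
    elif len(parts) == 3:  # day, e.g. "2014-03-04"
        start = datetime(*parts)
        following = start + timedelta(days=1)
    else:
        raise ValueError(f"unsupported date: {value!r}")
    return start, following - ONE_MS


def overlaps(a, b):
    """Inclusive ranges overlap iff each one starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    stored = ["2014-03-04T07:05:12,345Z", "2014-03-04", "2014-03", "2014"]
    year_query = to_range("2014")                                    # date:2014
    span_query = (to_range("2014-01")[0], to_range("2014-03")[1])   # date:[2014-01 TO 2014-03]
    print(all(overlaps(to_range(s), year_query) for s in stored))
    print(all(overlaps(to_range(s), span_query) for s in stored))
```

Python's `datetime` cannot represent BC years, so historical data would need a different numeric encoding.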
range types in SOLR
Hi,

I need range types in SOLR, similar to those in PostgreSQL: https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf

My schema should allow approximate dates and queries on them. When a document has a single such date, one can split the information into two separate fields. But this is not an option if there are multiple dates: one has to split the document into two or more documents. I wonder whether it has to be so complicated. Does somebody know if SOLR already supports range types? If not, how difficult would it be to implement them? Does anybody else need range types, too?

kind regards,
Thomas
Re: range types in SOLR
I'm not clear what you're really after here. Solr certainly supports ranges, things like time:[* TO date_spec] or date_field:[date_spec TO date_spec] etc.

There's also a really creative use of spatial (of all things) to, say, answer questions involving multiple dates per record. Imagine, for instance, employees with different hours on different days. You can use spatial to answer questions like which employees are available on Wednesday between 4PM and 8PM.

And if none of this is relevant, how about you give us some use cases? This could well be an XY problem.

Best,
Erick

In reply to Thomas Scheffler (thomas.scheff...@uni-jena.de), Sat, Mar 1, 2014 at 6:19 AM.
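Erick's employee-availability example fits the same point encoding used for time durations. A minimal sketch in plain Python, assuming times are encoded as minutes since Monday 00:00 and each shift (start, end) is a point (x = start, y = end). It also assumes "available between 4PM and 8PM" means covering the whole window, which becomes the box start ≤ 16:00 and end ≥ 20:00. A record with several shifts matches if any of its points falls in the box, as with a multivalued spatial field. The sample data is hypothetical, and shifts that wrap past Sunday midnight are not handled.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
WEEK_MINUTES = 7 * 24 * 60


def minute_of_week(day, hour, minute=0):
    """Encode a weekly time as minutes since Monday 00:00."""
    return DAYS.index(day) * 24 * 60 + hour * 60 + minute


def contains_rect(q_start, q_end, world_min=0, world_max=WEEK_MINUTES):
    """Box (min_x, min_y, max_x, max_y) selecting shifts (x=start, y=end)
    that cover the whole window: start <= q_start and end >= q_end."""
    return (world_min, q_end, q_start, world_max)


def point_in_rect(x, y, rect):
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y


def available(employees, rect):
    """Names whose shift list has ANY point inside the box."""
    return [name for name, shifts in employees.items()
            if any(point_in_rect(s, e, rect) for s, e in shifts)]


# Hypothetical sample data: name -> list of (start, end) shifts.
SAMPLE = {
    "alice": [(minute_of_week("Wed", 9), minute_of_week("Wed", 17))],
    "bob": [(minute_of_week("Wed", 15), minute_of_week("Wed", 21))],
    "carol": [(minute_of_week("Mon", 16), minute_of_week("Mon", 20)),
              (minute_of_week("Wed", 12), minute_of_week("Wed", 22))],
}

if __name__ == "__main__":
    window = contains_rect(minute_of_week("Wed", 16), minute_of_week("Wed", 20))
    print(available(SAMPLE, window))  # ['bob', 'carol']
```

Swapping the box for the one used in overlap queries would instead answer "who works at any point during the window".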
Re: range types in SOLR
In reply to Erick Erickson, 01.03.14 18:24.

Hi, let's try this example to show the problem. You have an old text that was written in two periods of time:

1.) 2nd half of the 13th century: 1250-1299
2.) Beginning of the 18th century: 1700-1715

If you search for texts written between 1300 and 1699, the document described above should not be hit. If you make start date and end date multivalued, this results in:

start: [1250, 1700]
end: [1299, 1715]

A search for documents written between 1300 and 1699 would be:

(+start:[1300 TO 1699] +end:[1300 TO 1699]) (+start:[* TO 1300] +end:[1300 TO *]) (+start:[* TO 1699] +end:[1700 TO *])

You see that the document above would obviously be hit by (+start:[* TO 1300] +end:[1300 TO *]).

Hope you see the problem. The problem is the same whenever you face multiple ranges in one field of a document: for every duplication you need to create a separate SOLR document. If you have two such fields in one document, field one with n values and field two with m values, you are forced to create m*n documents. This fact and the rather unwieldy query (see above) are my motivation to ask for range types like those in PostgreSQL, where the problem is solved.

regards,
Thomas
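Thomas's false hit can be reproduced without Solr. The sketch below mimics, in simplified form, how a range clause on a multivalued field matches when any single value is in range. It shows that the pairing between start and end is lost, and it counts the documents needed when two such fields are denormalized. The second field's values are hypothetical.

```python
from itertools import product


def any_in_range(values, lo=None, hi=None):
    """Multivalued-field semantics: true if ANY value lies in [lo, hi] (None = open end)."""
    return any((lo is None or v >= lo) and (hi is None or v <= hi) for v in values)


def overlaps(start, end, q_start, q_end):
    """A single interval [start, end] overlaps [q_start, q_end]."""
    return start <= q_end and end >= q_start


doc = {"start": [1250, 1700], "end": [1299, 1715]}

# (+start:[* TO 1300] +end:[1300 TO *]) checks each field on its own, so
# start=1250 and end=1715, taken from different periods, satisfy it together.
clause_hit = any_in_range(doc["start"], hi=1300) and any_in_range(doc["end"], lo=1300)

# Keeping each (start, end) pair together gives the intended answer.
pair_hit = any(overlaps(s, e, 1300, 1699) for s, e in zip(doc["start"], doc["end"]))

# Two range fields denormalized into one document per combination: n * m documents.
written = [(1250, 1299), (1700, 1715)]                      # n = 2
second_field = [(1480, 1490), (1600, 1610), (1720, 1730)]   # m = 3, hypothetical
combinations = list(product(written, second_field))

if __name__ == "__main__":
    print(clause_hit, pair_hit, len(combinations))  # True False 6
```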
Re: range types in SOLR
Looks like you might be able to use sub-documents (or whatever they are called in SOLR) for this: create the parent document without any dates, and a child document for each date range.

In reply to Thomas Scheffler (thomas.scheff...@uni-jena.de), 01 Mar 2014 at 19:41.
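The sub-document suggestion corresponds to Solr's nested (block) documents plus the block-join parent query parser. A minimal sketch, assuming a Solr version with nested-document support, the JSON update format's `_childDocuments_` key, and the `{!parent which=...}` parser, whose `which` filter must match only parent documents. The field names `type_s`, `title_t`, `start_i` and `end_i` are hypothetical and rely on dynamic-field suffixes like those in the example schema. `parent_matches` is only a local simulation of the query.

```python
import json


def nested_doc(doc_id, title, periods):
    """Parent without dates plus one child document per (start, end) period."""
    return {
        "id": doc_id,
        "type_s": "work",  # hypothetical marker field identifying parents
        "title_t": title,
        "_childDocuments_": [
            {"id": f"{doc_id}-p{i}", "type_s": "period", "start_i": s, "end_i": e}
            for i, (s, e) in enumerate(periods)
        ],
    }


def parent_overlap_query(q_start, q_end):
    """Block join: parents with at least one child period overlapping the query."""
    return (f'{{!parent which="type_s:work"}}'
            f"+start_i:[* TO {q_end}] +end_i:[{q_start} TO *]")


def parent_matches(doc, q_start, q_end):
    # Each child is evaluated as a whole, so start and end stay paired.
    return any(c["start_i"] <= q_end and c["end_i"] >= q_start
               for c in doc["_childDocuments_"])


if __name__ == "__main__":
    doc = nested_doc("text-1", "Old text", [(1250, 1299), (1700, 1715)])
    print(json.dumps([doc], indent=2))  # candidate body for a JSON update request
    print(parent_overlap_query(1300, 1699))
    print(parent_matches(doc, 1300, 1699))  # False
```

Because each period lives in its own child, a second range field only adds more children instead of multiplying parent documents.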
Re: range types in SOLR
In reply to Thomas Scheffler, 3/1/2014 11:41 AM.

This sounds exactly like the spatial use case that Erick just described.

http://wiki.apache.org/solr/SpatialForTimeDurations
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/

I am not sure whether the following presentation covers time series with spatial, but it does say "deep dive". It's over an hour long, and it was given by David Smiley, who wrote most of the spatial code in Solr:

http://www.lucenerevolution.org/2013/Lucene-Solr4-Spatial-Deep-Dive

Hopefully someone who has actually used this can hop in and give you some additional pointers.

Thanks,
Shawn
Re: range types in SOLR
Right, thanks, Shawn, for looking up the URLs. That's exactly what I was thinking of.

Erick

In reply to Shawn Heisey (s...@elyograg.org), Sat, Mar 1, 2014 at 11:54 AM.