Re: Tuning caching of geofilt queries
Chris's response is quite good, and I have a couple of things to add:

1. Since you can tolerate 1km of slop, try defining the dynamic field *_coordinate as tfloat instead of tdouble. This will halve your memory requirements, but I'm not sure whether it will be any faster. It's worth a shot since you've already indicated that your requirements don't call for a double. What I've read varies on the exact accuracy of float vs. double, but at a kilometer there's no question that a double is overkill.

2. Try my Solr 3.x spatial plugin called SOLR-2155 on GitHub: https://github.com/dsmiley/SOLR-2155 It is very fast at filtering (even for circles), as indicated in this Stack Overflow thread: http://stackoverflow.com/questions/11636376/solr-performance-on-ec2-for-geospatial-queries in which it destroys LatLonType in a big data speed test :-D. You should be happy to know that this technology is on its way into Solr 4, albeit not quite yet.

Cheers,
~ David Smiley

- Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
-- View this message in context: http://lucene.472066.n3.nabble.com/Tuning-caching-of-geofilt-queries-tp3998975p4000525.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tuning caching of geofilt queries
On Fri, Aug 10, 2012 at 1:47 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:

Information I've read vary on exactly what is the accuracy of float vs double but at a kilometer there's no question a double is overkill.

Back of the envelope: 23 mantissa bits + 1 implied bit == 24 effective mantissa bits in a 32-bit float. 40,000 km circumference / (2^24) = 0.0024 km (i.e. our resolution at the equator is 2.4m at best; there will be some lost unused space at the beginning and end of the +-180 number line).

Is that in line with what you've read?

-Yonik
http://lucidworks.com
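For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It is not part of the thread: the function names are made up for illustration, and it reuses the rounded 40,000 km circumference from the estimate. The first function reproduces the back-of-the-envelope figure; the other two round a coordinate through an actual 32-bit float to see how far it moves along the equator.

```python
import struct

EARTH_CIRCUMFERENCE_KM = 40_000  # rounded figure taken from the estimate above
FLOAT32_MANTISSA_BITS = 24       # 23 stored bits + 1 implied bit


def envelope_resolution_m():
    """Circumference divided by 2^24 distinct mantissa values, in metres."""
    return EARTH_CIRCUMFERENCE_KM / 2 ** FLOAT32_MANTISSA_BITS * 1000


def to_float32(value):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_error_m(degrees):
    """Displacement along the equator, in metres, from storing `degrees` as a float32."""
    km_per_degree = EARTH_CIRCUMFERENCE_KM / 360
    return abs(to_float32(degrees) - degrees) * km_per_degree * 1000


if __name__ == "__main__":
    print("envelope estimate (m):", envelope_resolution_m())
    print("error at 16.3943 degrees (m):", float32_error_m(16.3943))
    print("error at 179.999999 degrees (m):", float32_error_m(179.999999))
```

The envelope figure is an upper bound of sorts: it treats the whole number line as evenly spaced, whereas real floats are spaced more finely for smaller magnitudes, so the measured error for a given coordinate should come out below it.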
Re: Tuning caching of geofilt queries
Yeah it is... I rather like this write-up: https://sites.google.com/site/trescopter/Home/concepts/required-precision-for-gps-calculations#TOC-Precision-of-Float-and-Double which also arrives at 2.37m worst case.

Aside from RAM savings, I wonder if there is any noticeable performance difference for LatLonType.

- Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
Re: Tuning caching of geofilt queries
In other computations I found exactly zero performance difference between floats and doubles, even with long arrays of numbers, which you would expect to be sensitive to locality effects.

On Fri, Aug 10, 2012 at 11:20 AM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:

Yeah it is... I rather like this write-up: https://sites.google.com/site/trescopter/Home/concepts/required-precision-for-gps-calculations#TOC-Precision-of-Float-and-Double which also arrives at 2.37m worst case. Aside from RAM savings, I wonder if there is any noticeable performance difference for LatLonType.

-- Lance Norskog goks...@gmail.com
Re: Tuning caching of geofilt queries
: My question is: Does it make sense to round these coordinates (a) while
: indexing and/or (b) while querying to optimize cache hits? Our maximum
: required resolution for geo queries is 1km and we can tolerate minor errors
: so I could round to two decimal points for most of our queries.

: fq=_query_:{!geofilt sfield=user.location_p pt=48.19815,16.3943
: d=50.0}&sfield=user.location_p&pt=48.1981,16.394

1) I don't see any reason for the _query_ hack. This should be more efficient, and easier on the eyes:

fq={!geofilt sfield=user.location_p pt=48.19815,16.3943 d=50.0}
sfield=user.location_p
pt=48.1981,16.394

2) As Erick mentioned, rounding will only do you good if you expect lots of queries from different users that, when rounded, result in the same point.

3) You might consider disabling the caching of your geofilt queries completely using the cache=false param. For {!geofilt} you should also be able to combine this with the cost localparam to take advantage of post-filtering, so that the distance calculations are only computed for documents that already match your query and other cached filters:

http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
http://searchhub.org/dev/2012/02/10/advanced-filter-caching-in-solr/

4) Something you also might want to consider (depending on your data and how much geo surface area you are dealing with) is along the lines of Erick's bounding box suggestion: use two filters, a coarse bounding box that you cache and a precise geofilt using the cache and cost params mentioned in #3. That way you have a finite number of bounding box filters that will be cached and help quickly prune the total result set down, and the distance calculations for your {!geofilt} filter will then be applied only to the results inside that bounding box. (Just make sure your bounding boxes overlap by at least as much as the max radius you search on, or you might miss results when your search point is close to the edge of your grid.)

-Hoss
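The two-filter idea in #4 could be put together on the client side roughly as follows. This is only a sketch under several assumptions that are not from the thread: the 1-degree grid size, the 111.2 km-per-degree constant, the cost value of 100 and the function names are all illustrative, and the rectangle is written in the field:[lat,lon TO lat,lon] range form that LatLonType is understood to accept. Poles and the date line are not handled.

```python
import math

KM_PER_DEGREE_LAT = 111.2  # approximate length of one degree of latitude (assumption)
CELL_DEG = 1.0             # hypothetical grid cell size in degrees


def coarse_box_filter(field, lat, lon, max_radius_km, cell_deg=CELL_DEG):
    """Cacheable rectangle: the grid cell containing the point, grown by max_radius_km.

    Every point in the same cell yields the same string, so the number of
    distinct cached filters is bounded by the number of grid cells.
    """
    south = math.floor(lat / cell_deg) * cell_deg
    west = math.floor(lon / cell_deg) * cell_deg
    north, east = south + cell_deg, west + cell_deg

    lat_margin = max_radius_km / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with latitude, so size the margin at the
    # edge of the grown box that lies farthest from the equator.
    widest_lat = max(abs(south - lat_margin), abs(north + lat_margin))
    lon_margin = max_radius_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(widest_lat)))

    return "%s:[%.4f,%.4f TO %.4f,%.4f]" % (
        field,
        south - lat_margin, west - lon_margin,
        north + lat_margin, east + lon_margin,
    )


def precise_filter(field, lat, lon, radius_km, cost=100):
    """Exact distance filter, kept out of the filter cache and given a high cost."""
    return "{!geofilt cache=false cost=%d sfield=%s pt=%s,%s d=%s}" % (
        cost, field, lat, lon, radius_km)


if __name__ == "__main__":
    # Both strings would be sent as separate fq parameters on the same request.
    print(coarse_box_filter("user.location_p", 48.19815, 16.3943, 50.0))
    print(precise_filter("user.location_p", 48.19815, 16.3943, 50.0))
```

Growing each cell by the maximum search radius is what provides the overlap asked for in #4: a document within that radius of any query point in the cell still falls inside the cached rectangle.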
Re: Tuning caching of geofilt queries
I don't think rounding will affect cache hits in either case _unless_ the input points for different queries can be very close to each other. Think of the filter cache as being composed of a map where the key is the (raw) filter query and the value is the set of documents in your corpus that satisfy it.

So the only time rounding would help is if it's likely that two users enter very similar points at query time, e.g. 89.1234 and 89.1236. If you're giving them a set of choices that are pre-defined (city center, say), then the values should be identical to all the decimal places, so rounding doesn't do you much good.

You say you can tolerate some slop, so using a bounding box might speed up your queries...

Best
Erick

On Fri, Aug 3, 2012 at 4:56 AM, Thomas Heigl tho...@umschalt.com wrote:

Hey all,

Our production system is heavily optimized for caching and nearly all parts of queries are satisfied by filter caches. The only filter that varies a lot from user to user is the location and distance. Currently we use the default location field type and index lat/long coordinates as we get them from Geonames and GMaps with varying decimal precision.

My question is: Does it make sense to round these coordinates (a) while indexing and/or (b) while querying to optimize cache hits? Our maximum required resolution for geo queries is 1km and we can tolerate minor errors so I could round to two decimal points for most of our queries.

E.g. Instead of querying like this

fq=_query_:{!geofilt sfield=user.location_p pt=48.19815,16.3943 d=50.0}&sfield=user.location_p&pt=48.1981,16.394

we would round to

fq=_query_:{!geofilt sfield=user.location_p pt=48.19,16.39 d=50.0}&sfield=user.location_p&pt=48.19,16.39

Any feedback would be greatly appreciated.

Cheers,
Thomas
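Erick's picture of the filter cache as a map from the raw filter string to a document set can be mimicked in a few lines of Python. This is a toy model, not Solr code: filter_cache, cached_filter and geofilt_fq are hypothetical names, and the dictionary only stands in for the real cache. It shows why 89.1234 and 89.1236 miss each other's entry unless both are rounded first.

```python
filter_cache = {}  # toy stand-in for the filter cache: raw fq string -> set of doc ids


def geofilt_fq(field, lat, lon, d_km, decimals=None):
    """Build the raw fq string, optionally rounding the point first."""
    if decimals is not None:
        lat, lon = round(lat, decimals), round(lon, decimals)
    return "{!geofilt sfield=%s pt=%s,%s d=%s}" % (field, lat, lon, d_km)


def cached_filter(fq, compute):
    """Return the cached doc set for fq, computing it only on a cache miss."""
    if fq not in filter_cache:
        filter_cache[fq] = compute(fq)
    return filter_cache[fq]


if __name__ == "__main__":
    raw_a = geofilt_fq("user.location_p", 89.1234, 16.3943, 50.0)
    raw_b = geofilt_fq("user.location_p", 89.1236, 16.3943, 50.0)
    print(raw_a == raw_b)  # different keys, so two separate cache entries

    rounded_a = geofilt_fq("user.location_p", 89.1234, 16.3943, 50.0, decimals=2)
    rounded_b = geofilt_fq("user.location_p", 89.1236, 16.3943, 50.0, decimals=2)
    print(rounded_a == rounded_b)  # same key, so the second query is a cache hit
```

With pre-defined choices such as city centers, the raw strings are already identical from user to user, which is why rounding adds little in that case.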
Tuning caching of geofilt queries
Hey all,

Our production system is heavily optimized for caching and nearly all parts of queries are satisfied by filter caches. The only filter that varies a lot from user to user is the location and distance. Currently we use the default location field type and index lat/long coordinates as we get them from Geonames and GMaps with varying decimal precision.

My question is: Does it make sense to round these coordinates (a) while indexing and/or (b) while querying to optimize cache hits? Our maximum required resolution for geo queries is 1km and we can tolerate minor errors so I could round to two decimal points for most of our queries.

E.g. Instead of querying like this

fq=_query_:{!geofilt sfield=user.location_p pt=48.19815,16.3943 d=50.0}&sfield=user.location_p&pt=48.1981,16.394

we would round to

fq=_query_:{!geofilt sfield=user.location_p pt=48.19,16.39 d=50.0}&sfield=user.location_p&pt=48.19,16.39

Any feedback would be greatly appreciated.

Cheers,
Thomas
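To get a feel for how much error two decimal places introduce against a 1km requirement, a rough estimate can be sketched in Python. It is not from the thread: the 111.2 km-per-degree figure and the function name are assumptions, and the flat-earth approximation is only reasonable for small offsets away from the poles.

```python
import math

KM_PER_DEGREE_LAT = 111.2  # approximate length of one degree of latitude (assumption)


def max_rounding_error_km(decimals, lat_deg):
    """Worst-case displacement when lat and lon are both rounded to `decimals` places."""
    half_step = 0.5 * 10 ** -decimals  # rounding moves a value by at most half a step
    dlat_km = half_step * KM_PER_DEGREE_LAT
    # A degree of longitude is shorter away from the equator.
    dlon_km = half_step * KM_PER_DEGREE_LAT * math.cos(math.radians(lat_deg))
    return math.hypot(dlat_km, dlon_km)


if __name__ == "__main__":
    for decimals in (1, 2, 3):
        print(decimals, max_rounding_error_km(decimals, 48.2))
```

Note that the example in the message truncates rather than rounds (48.19815 becomes 48.19, where rounding would give 48.20); truncation can move a value by a full step, so its worst case is about twice the figure this function returns.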