Re: custom ValueSource for decoding geohash into lat lon
On Mar 10, 2011, at 6:21 PM, William Bell wrote:

> OK. But I am concerned that you are trying to bite off more than can be done easily. The sample call is:
>
> http://localhost:8983/solr/select?q=*:*&fq={!geofilt}&sfieldmulti=storemv&pt=43.17614,-90.57341&d=100&sfield=store&sort=geomultidist%28%29%20asc&sfieldmultidir=asc
>
> Notice that geomultidist() needs another field called storemv right now that is bar delimited. I tried to pull out the lat,long from geohash, but Dave stores the geohash values in Ngram for the purpose of filtering (I believe).

Yep. The field cache loader would have to filter out the grams not at full length. Pretty easy.

> Here are the issues as I see them:
>
> 1. ValueSources does not support MultiValue fields. ...

Technically it does. A ValueSource's job seems to simply be to give access to the abstract DocValues.java, which has methods like double doubleVal(int doc) but also void doubleVal(int doc, double[] vals), vals being an output parameter. Current use-cases assume a fixed number of values per document, not a variable number, which is what I want. But I suppose there's nothing stopping me from using it for variable-length values. Of course the caller would have to know that. It's a bit unfortunate that these methods don't return the array, since the caller doesn't know how big to make the array if it's variable length. And again, I suppose there's nothing stopping me from adding a different method that works the way I want. The only consumer of this Values/DocValues would be a special function query of my design, so it's safe.

> 2. Using ValueSource with one value is fast, and splitting it this way might be a lot slower to calculate distances. It is convenient, but could be slow. It might be better to just have solr.GeoHashField append to the internal field so that it can use ValueSource directly.
>
> Use an internal field that uses bars internally:
>
> store_lat_long_bar = 39.90923,-86.19389|42.37577,-72.50858
>
> For each lat,long value:
> - Calculate geohash and Ngram store
> - Append to the internal field store_lat_long_bar based on the field name
>
> Option 2 is easier and makes it supportable now without waiting for redesign of ValueSource.

As I suggest above, I'm not sure I really need to wait for some redesign. I could just add the methods I want in my DocValues subclass for use by my spatial function query.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
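As a rough illustration of the "add a different method" idea above, here is a minimal sketch in plain Java. It does not extend the real Solr DocValues class; MultiPointDocValues, pointsVal and pointCount are hypothetical names, and the input is assumed to be the bar-delimited "lat,lon|lat,lon" form quoted in this message. The point is only that a method returning the array removes the need for the caller to pre-size it.

```java
/** Hypothetical stand-in for a DocValues subclass; it is NOT the Solr class itself. */
final class MultiPointDocValues {
    // One variable-length array per document, laid out as [lat0, lon0, lat1, lon1, ...].
    private final double[][] pointsByDoc;

    /** Parses every document's bar-delimited value once, up front. */
    MultiPointDocValues(String[] barDelimitedByDoc) {
        pointsByDoc = new double[barDelimitedByDoc.length][];
        for (int doc = 0; doc < barDelimitedByDoc.length; doc++) {
            pointsByDoc[doc] = parse(barDelimitedByDoc[doc]);
        }
    }

    /** Parses "lat,lon|lat,lon|..." into a flat array; null or empty input yields no points. */
    static double[] parse(String barDelimited) {
        if (barDelimited == null || barDelimited.isEmpty()) {
            return new double[0];
        }
        String[] pairs = barDelimited.split("\\|");
        double[] out = new double[pairs.length * 2];
        for (int i = 0; i < pairs.length; i++) {
            String[] latLon = pairs[i].split(",");
            out[2 * i] = Double.parseDouble(latLon[0].trim());
            out[2 * i + 1] = Double.parseDouble(latLon[1].trim());
        }
        return out;
    }

    /** Returns the document's points; unlike doubleVal(int, double[]), the caller does not size the array. */
    double[] pointsVal(int doc) {
        return pointsByDoc[doc];
    }

    /** Number of lat/lon pairs stored for the document. */
    int pointCount(int doc) {
        return pointsByDoc[doc].length / 2;
    }
}
```

Only a function query that knows about this class could call pointsVal, which matches the "only consumer" caveat above.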
Re: custom ValueSource for decoding geohash into lat lon
On Mar 10, 2011, at 6:21 PM, William Bell wrote:

> 1. ValueSources does not support MultiValue fields.

I think the problem isn't ValueSources, it's the FieldCache. The FieldCache is fundamentally limited to one indexed primitive value per document. I took a look at UninvertedField, but that appears to be tied to faceting and it's not sufficiently flexible anyway. I think I need to do as UninvertedField does and create a cache registered in solrconfig.xml.

The other tricky bit is somehow accessing it. I think I figured it out. In my field type's getValueSource(SchemaField field, QParser parser), the parser is a FunctionQParser implementation, which has access to SolrQueryRequest, which has access to SolrIndexSearcher, which allows me to look up the cache by the name I choose. That's quite a chain of indirection that took time to track down; I nearly gave up :-).

~ David
Re: custom ValueSource for decoding geohash into lat lon
Cool. I am definitely looking forward to that!!

On 3/11/11 3:25 PM, Smiley, David W. dsmi...@mitre.org wrote:

> On Mar 10, 2011, at 6:21 PM, William Bell wrote:
>
>> 1. ValueSources does not support MultiValue fields.
>
> I think the problem isn't ValueSources, it's the FieldCache. The FieldCache is fundamentally limited to one indexed primitive value per document. I took a look at UninvertedField, but that appears to be tied to faceting and it's not sufficiently flexible anyway. I think I need to do as UninvertedField does and create a cache registered in solrconfig.xml.
>
> The other tricky bit is somehow accessing it. I think I figured it out. In my field type's getValueSource(SchemaField field, QParser parser), the parser is a FunctionQParser implementation, which has access to SolrQueryRequest, which has access to SolrIndexSearcher, which allows me to look up the cache by the name I choose. That's quite a chain of indirection that took time to track down; I nearly gave up :-).
>
> ~ David
Re: custom ValueSource for decoding geohash into lat lon
Rather than use the FieldCache, you may consider a WeakHashMap<IndexReader, YourObject>. Solr uses this, and the internals of FieldCache are implemented like this.

Long term, I want to see the FieldCache moved to a map directly on the IndexReader (LUCENE-2665, but that has a ways to go).

On Fri, Mar 11, 2011 at 5:25 PM, Smiley, David W. dsmi...@mitre.org wrote:

> On Mar 10, 2011, at 6:21 PM, William Bell wrote:
>
>> 1. ValueSources does not support MultiValue fields.
>
> I think the problem isn't ValueSources, it's the FieldCache. The FieldCache is fundamentally limited to one indexed primitive value per document. I took a look at UninvertedField, but that appears to be tied to faceting and it's not sufficiently flexible anyway. I think I need to do as UninvertedField does and create a cache registered in solrconfig.xml.
>
> The other tricky bit is somehow accessing it. I think I figured it out. In my field type's getValueSource(SchemaField field, QParser parser), the parser is a FunctionQParser implementation, which has access to SolrQueryRequest, which has access to SolrIndexSearcher, which allows me to look up the cache by the name I choose. That's quite a chain of indirection that took time to track down; I nearly gave up :-).
>
> ~ David
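For readers unfamiliar with the weak-map pattern suggested in this reply, a minimal sketch follows. It is self-contained and does not use Lucene: the type parameter K stands in for IndexReader and V for "YourObject", and PerReaderCache and its loader are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Per-reader cache sketch: K stands in for IndexReader, V for the decoded per-reader data. */
final class PerReaderCache<K, V> {
    // Keys are held weakly, so an entry can be dropped once its reader is otherwise unreachable.
    // The cached value must not hold a strong reference back to its key, or the entry never clears.
    private final Map<K, V> cache = new WeakHashMap<>();
    private final Function<K, V> loader;

    PerReaderCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for this reader, loading it on first use. */
    synchronized V get(K reader) {
        V value = cache.get(reader);
        if (value == null) {
            value = loader.apply(reader);
            cache.put(reader, value);
        }
        return value;
    }

    /** Current number of entries; may shrink on its own after garbage collection. */
    synchronized int size() {
        return cache.size();
    }
}
```

The loader is where the expensive work (for example, decoding every geohash in a segment) would go, so that it runs once per reader rather than once per query.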
custom ValueSource for decoding geohash into lat lon
I'm looking for validation of my approach to geospatial sorting from committers.

I'm starting work on implementing sorting for my geohash based filter code in https://issues.apache.org/jira/browse/SOLR-2155

The existing GeohashHaversineFunction uses ValueSources based on the natural string value in the index, StrFieldSource, and it decodes them on each pass through. This is obviously sub-optimal. So I think a remedy is to implement my own ValueSource extending MultiValueSource that will decode the geohash into a pair of doubles on initialization. It would do this using a CachedArrayCreator implementation of my design. I don't think I can/should use VectorValueSource, since that one is predicated on being composed of multiple other value sources, which is not my scenario. Unfortunately my proposed ValueSource subclass cannot simultaneously subclass both MultiValueSource and FieldCacheSource, but the latter doesn't appear to really be necessary. Actually I'm surprised MultiValueSource isn't an interface, since it only has an abstract method.

Another aspect to this problem is that geohashes support multiple points per document. I intend to subclass DocValues() with a method that will return an array of simple objects holding the pair.

If someone has hints as to some issues/problems with this approach then please let me know. Bill Bell, if you're reading this, I know you did a patch attached to SOLR-2155 for sorting, but it uses separate fields to hold the lat lon for sorting and I'm trying to fix this.

~ David Smiley
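To make "decode the geohash into a pair of doubles on initialization" concrete, here is a minimal, self-contained sketch of standard geohash decoding. It is not the code from SOLR-2155 and not a Lucene class; GeohashDecoder, decode and decodeAll are hypothetical names. The returned point is the center of the geohash cell.

```java
/** Standalone geohash decoding sketch; class and method names are illustrative only. */
final class GeohashDecoder {
    // The geohash base-32 alphabet (digits plus letters, without a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Returns {lat, lon} for the center of the cell named by the geohash. */
    static double[] decode(String geohash) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        boolean lonBit = true; // bits alternate between longitude and latitude, longitude first
        for (int i = 0; i < geohash.length(); i++) {
            int value = BASE32.indexOf(Character.toLowerCase(geohash.charAt(i)));
            if (value < 0) {
                throw new IllegalArgumentException("Not a geohash character: " + geohash.charAt(i));
            }
            // Each character carries five bits, most significant first.
            for (int mask = 16; mask > 0; mask >>= 1) {
                boolean upperHalf = (value & mask) != 0;
                if (lonBit) {
                    double mid = (lonMin + lonMax) / 2;
                    if (upperHalf) lonMin = mid; else lonMax = mid;
                } else {
                    double mid = (latMin + latMax) / 2;
                    if (upperHalf) latMin = mid; else latMax = mid;
                }
                lonBit = !lonBit;
            }
        }
        return new double[] {(latMin + latMax) / 2, (lonMin + lonMax) / 2};
    }

    /** Decodes once per document into a flat [lat0, lon0, lat1, lon1, ...] array. */
    static double[] decodeAll(String[] geohashByDoc) {
        double[] out = new double[geohashByDoc.length * 2];
        for (int doc = 0; doc < geohashByDoc.length; doc++) {
            double[] latLon = decode(geohashByDoc[doc]);
            out[2 * doc] = latLon[0];
            out[2 * doc + 1] = latLon[1];
        }
        return out;
    }
}
```

Doing decodeAll once when the per-reader data is built, rather than calling decode on every distance computation, is the saving this message is after. The flat array assumes one point per document; the multi-point case needs a variable-length layout.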
Re: custom ValueSource for decoding geohash into lat lon
OK. But I am concerned that you are trying to bite off more than can be done easily. The sample call is:

http://localhost:8983/solr/select?q=*:*&fq={!geofilt}&sfieldmulti=storemv&pt=43.17614,-90.57341&d=100&sfield=store&sort=geomultidist%28%29%20asc&sfieldmultidir=asc

Notice that geomultidist() needs another field called storemv right now that is bar delimited. I tried to pull out the lat,long from geohash, but Dave stores the geohash values in Ngram for the purpose of filtering (I believe).

Here are the issues as I see them:

1. ValueSources does not support MultiValue fields. The PointType.java would be extended or solr.GeoHashField would be extended to support this. The way the spatial stuff works is that the Lat, Long is created as separate fields and then the fc can get the matching values from the cache. I think this is totally sub-optimal. There should be a generic ValueSource that works on MultiValue fields directly and generically.

It would be extremely useful if this all happened behind the scenes:

    <field name="store_lat_lon" type="geohash" indexed="true" stored="true" multiValued="true" />

My code is dependent on the following to do distance quickly (using ValueSource):

    <arr name="storemv">
      <str>39.90923,-86.19389|42.37577,-72.50858</str>
    </arr>

storing: 39.90923,-86.19389 and 42.37577,-72.50858

Internally it could be stored as (and converted using the type):

    store_lat_long_num = 2
    store_lat_long_lat_1 = 39.90923
    store_lat_long_lon_1 = -86.19389
    store_lat_long_lat_2 = 42.37577
    store_lat_long_lon_2 = -72.50858

2. Using ValueSource with one value is fast, and splitting it this way might be a lot slower to calculate distances. It is convenient, but could be slow. It might be better to just have solr.GeoHashField append to the internal field so that it can use ValueSource directly.

Use an internal field that uses bars internally:

    store_lat_long_bar = 39.90923,-86.19389|42.37577,-72.50858

For each lat,long value:
- Calculate geohash and Ngram store
- Append to the internal field store_lat_long_bar based on the field name

Option 2 is easier and makes it supportable now without waiting for redesign of ValueSource.

On Thu, Mar 10, 2011 at 2:16 PM, Smiley, David W. dsmi...@mitre.org wrote:

> I'm looking for validation of my approach to geospatial sorting from committers.
>
> I'm starting work on implementing sorting for my geohash based filter code in https://issues.apache.org/jira/browse/SOLR-2155
>
> The existing GeohashHaversineFunction uses ValueSources based on the natural string value in the index, StrFieldSource, and it decodes them on each pass through. This is obviously sub-optimal. So I think a remedy is to implement my own ValueSource extending MultiValueSource that will decode the geohash into a pair of doubles on initialization. It would do this using a CachedArrayCreator implementation of my design. I don't think I can/should use VectorValueSource, since that one is predicated on being composed of multiple other value sources, which is not my scenario. Unfortunately my proposed ValueSource subclass cannot simultaneously subclass both MultiValueSource and FieldCacheSource, but the latter doesn't appear to really be necessary. Actually I'm surprised MultiValueSource isn't an interface, since it only has an abstract method.
>
> Another aspect to this problem is that geohashes support multiple points per document. I intend to subclass DocValues() with a method that will return an array of simple objects holding the pair.
>
> If someone has hints as to some issues/problems with this approach then please let me know. Bill Bell, if you're reading this, I know you did a patch attached to SOLR-2155 for sorting, but it uses separate fields to hold the lat lon for sorting and I'm trying to fix this.
>
> ~ David Smiley
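The bar-delimited approach in this message can be sketched roughly as follows: split the stored value on bars, compute a great-circle distance to the query point for each pair, and sort on the smallest. This is not the geomultidist() patch itself; MultiPointDistance, haversineKm and minDistanceKm are hypothetical names, and the 6371 km mean Earth radius is an assumption.

```java
/** Sketch of a multi-point distance helper; all names here are hypothetical. */
final class MultiPointDistance {
    // Mean Earth radius in kilometers (an assumption; the thread does not name a value).
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometers between two lat/lon points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        // Clamp guards against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Smallest distance from (lat, lon) to any point in "lat,lon|lat,lon|..."; infinity if there are none. */
    static double minDistanceKm(String barDelimited, double lat, double lon) {
        double best = Double.POSITIVE_INFINITY;
        for (String pair : barDelimited.split("\\|")) {
            if (pair.isEmpty()) {
                continue; // tolerate an empty field or a stray bar
            }
            String[] latLon = pair.split(",");
            double d = haversineKm(lat, lon,
                    Double.parseDouble(latLon[0].trim()), Double.parseDouble(latLon[1].trim()));
            best = Math.min(best, d);
        }
        return best;
    }

    public static void main(String[] args) {
        // Values taken from the sample call and storemv field in the message above.
        String storemv = "39.90923,-86.19389|42.37577,-72.50858";
        System.out.println(minDistanceKm(storemv, 43.17614, -90.57341));
    }
}
```

Parsing the string on every call is the cost point 2 of the message worries about; parsing once per document and caching the doubles avoids it.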