Re: custom ValueSource for decoding geohash into lat lon

2011-03-11 Thread Smiley, David W.

On Mar 10, 2011, at 6:21 PM, William Bell wrote:

 OK. But I am concerned that you are trying to bite off more than can
 be done easily. The sample call is:
 
 http://localhost:8983/solr/select?q=*:*&fq={!geofilt}&sfieldmulti=storemv&pt=43.17614,-90.57341&d=100&sfield=store&sort=geomultidist%28%29%20asc&sfieldmultidir=asc
 
 Notice that geomultidist() needs another field called storemv right
 now that is bar delimited. I tried to pull out the lat,long from
 geohash, but Dave stores the geohash values in Ngram for the purpose
 of filtering (I believe).

yep.  The field cache loader would have to filter out the grams not at full 
length.  Pretty easy.
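
A minimal sketch of that filtering step, assuming every full geohash is indexed 
at one fixed length and the shorter terms are its n-grams. The class and method 
names are hypothetical and not taken from the SOLR-2155 patch:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps only the terms that are full-length geohashes. */
final class FullLengthGramFilter {

    /**
     * @param terms      indexed terms of the field (full geohashes plus their n-grams)
     * @param fullLength assumed length of a complete geohash
     * @return the terms whose length equals fullLength, in input order
     */
    static List<String> keepFullLength(Iterable<String> terms, int fullLength) {
        List<String> full = new ArrayList<>();
        for (String term : terms) {
            if (term.length() == fullLength) {
                full.add(term);
            }
        }
        return full;
    }
}
```

A cache loader could apply the same length check while iterating the field's 
terms, so that only complete geohashes are decoded and cached.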

 Here are the issues as I see them:
 
 1. ValueSources does not support MultiValue fields.
...
Technically it does.  A ValueSource's job seems to be simply to give access to 
the abstract DocValues.java, which has methods like double doubleVal(int doc) but 
also void doubleVal(int doc, double[] vals), vals being an output parameter. 
Current use-cases assume a fixed number of values per document, not a variable 
number, which is what I want. But I suppose there's nothing stopping me from 
using it for variable-length values.  Of course the caller would have to know 
that.  It's a bit unfortunate that the signatures of these methods don't return 
the array, since the caller doesn't know how big to make the array if 
it's variable length.  And again, I suppose there's nothing stopping me from 
adding a different method that works the way I want it to. The only consumer of 
this Values/DocValues would be a special function query of my design, so it's 
safe.
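
One way such an added method could look, as a rough sketch. The classes below 
are stand-alone stand-ins rather than Solr's DocValues, and the names 
MultiPointDocValues and pointsVal are made up for illustration. The method 
returns the array, so the caller does not need to know its size in advance:

```java
/** Hypothetical stand-in for a DocValues subclass; not Solr's actual class. */
abstract class MultiPointDocValues {

    /** Returns a flat array {lat0, lon0, lat1, lon1, ...}; empty if the doc has no points. */
    abstract double[] pointsVal(int doc);

    /** Number of lat/lon pairs stored for the document. */
    int pointCount(int doc) {
        return pointsVal(doc).length / 2;
    }
}

/** Array-backed implementation, as a decode-once cache might provide. */
final class ArrayMultiPointDocValues extends MultiPointDocValues {
    private static final double[] NO_POINTS = new double[0];

    // Indexed by document id; a null entry means the document has no points.
    private final double[][] pointsByDoc;

    ArrayMultiPointDocValues(double[][] pointsByDoc) {
        this.pointsByDoc = pointsByDoc;
    }

    @Override
    double[] pointsVal(int doc) {
        double[] points = pointsByDoc[doc];
        return points == null ? NO_POINTS : points;
    }
}
```

Returning a flat double[] avoids one small object per point; an array of 
pair objects would work the same way at the cost of more allocation.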

 2. Using ValueSource with one value is fast, and splitting it this way
 might be a lot slower to calculate distances. It is convenient, but
 could be slow. It might be better to just have solr.GeoHashField
 append to the internal field so that it can use ValueSource directly.
 
 Use an internal field that uses bars internally:
 
 store_lat_long_bar =  39.90923,-86.19389|42.37577,-72.50858
 
 For each lat,long value
- Calculate geohash and Ngram store
- Append to the internal field store_lat_long_bar based on the field name
 
 Option 2 is easier and makes it supportable now without waiting for
 redesign of ValueSource.

As I suggest above, I'm not sure I really need to wait for some redesign. I 
could just add the methods I want in my DocValues subclass for use by my 
spatial function query.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: custom ValueSource for decoding geohash into lat lon

2011-03-11 Thread Smiley, David W.
On Mar 10, 2011, at 6:21 PM, William Bell wrote:

 1. ValueSources does not support MultiValue fields. 

I think the problem isn't ValueSources, it's the FieldCache.  The FieldCache is 
fundamentally limited to one indexed primitive value per document. I took 
a look at UnInvertedField but that appears to be tied to faceting and it's not 
sufficiently flexible anyway. I think I need to do what UnInvertedField does 
and create a cache registered in solrconfig.xml.  The other tricky bit is somehow 
accessing it.  I think I figured it out. In my field type's 
getValueSource(SchemaField field, QParser parser), the parser is a 
FunctionQParser implementation, which has access to SolrQueryRequest, which has 
access to SolrIndexSearcher, which allows me to look up the cache by the name I 
choose.  That's quite a chain of indirection that took time to track down; I 
nearly gave up :-).
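
The chain of indirection can be pictured with the sketch below. The three 
interfaces are minimal stand-ins, not Solr's classes, so that the example 
compiles on its own. The accessor names getReq, getSearcher and getCache are 
assumed to mirror the Solr accessors and should be checked against the Solr 
version in use; the cache name "geohashPoints" is invented:

```java
/** Stand-in for the searcher that owns the named caches (not Solr's class). */
interface StandInSearcher {
    Object getCache(String name);
}

/** Stand-in for the request, which exposes the searcher (not Solr's class). */
interface StandInRequest {
    StandInSearcher getSearcher();
}

/** Stand-in for the query parser handed to getValueSource (not Solr's class). */
interface StandInParser {
    StandInRequest getReq();
}

final class CacheLookupSketch {

    /** Walks parser -> request -> searcher -> named cache. */
    static Object lookUpCache(StandInParser parser, String cacheName) {
        StandInRequest req = parser.getReq();
        StandInSearcher searcher = req.getSearcher();
        Object cache = searcher.getCache(cacheName);
        if (cache == null) {
            // In this sketch a missing cache is treated as a configuration error.
            throw new IllegalStateException("No cache registered under name: " + cacheName);
        }
        return cache;
    }
}
```

Failing fast when the cache is missing is a design choice of the sketch; it 
makes a forgotten solrconfig.xml entry show up on the first query.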

~ David



Re: custom ValueSource for decoding geohash into lat lon

2011-03-11 Thread Bill Bell
Cool.

I am definitely looking forward to that!!






Re: custom ValueSource for decoding geohash into lat lon

2011-03-11 Thread Ryan McKinley
Rather than use the FieldCache, you may consider a
WeakHashMap<IndexReader,YourObject>.  Solr uses this, and the internals
of FieldCache are implemented like this.  Long term, I want to see the
FieldCache moved to a map directly on the IndexReader (LUCENE-2665, but
that has a ways to go).
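
A minimal sketch of such a per-reader cache. It is generic over the key type 
so it compiles without Lucene on the classpath; in practice the key would be 
the IndexReader. PerReaderCache and its loader function are hypothetical names:

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical per-reader cache: entries vanish once the reader is garbage collected. */
final class PerReaderCache<R, V> {
    // WeakHashMap holds its keys weakly, so a closed and unreferenced reader
    // does not keep its cached value alive.
    private final Map<R, V> cache = new WeakHashMap<>();
    private final Function<R, V> loader;

    PerReaderCache(Function<R, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for the reader, building it on first access. */
    synchronized V get(R reader) {
        V value = cache.get(reader);
        if (value == null) {
            value = loader.apply(reader);
            cache.put(reader, value);
        }
        return value;
    }
}
```

The cached value must not hold a strong reference back to the reader, 
otherwise the weak key can never be collected. The single lock keeps the 
sketch short; loading a large reader under it blocks other lookups.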






custom ValueSource for decoding geohash into lat lon

2011-03-10 Thread Smiley, David W.
I'm looking for validation of my approach to geospatial sorting from committers.

I'm starting work on implementing sorting for my geohash based filter code in 
https://issues.apache.org/jira/browse/SOLR-2155  The existing 
GeohashHaversineFunction uses ValueSources based on the natural string 
value in the index, StrFieldSource, and it decodes them each pass through.  
This is obviously sub-optimal.  So I think a remedy is to implement my own 
ValueSource extending MultiValueSource that will decode the geohash into a pair 
of doubles on initialization.  It would do this using a CachedArrayCreator 
implementation of my design.  I don't think I can/should use VectorValueSource 
since that one is predicated on being composed of multiple other value sources 
which is not my scenario.  Unfortunately my proposed ValueSource subclass 
cannot simultaneously subclass both MultiValueSource and FieldCacheSource but 
the latter doesn't appear to really be necessary.  Actually I'm surprised 
MultiValueSource isn't an interface since it only has an abstract method.
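
For illustration, decoding a geohash into a pair of doubles could look like the 
stand-alone sketch below. It follows the standard geohash scheme (base-32 
characters, bits alternating between longitude and latitude, starting with 
longitude) and returns the center of the cell. The class name GeohashDecoder is 
made up, and real code would presumably reuse an existing geohash utility:

```java
/** Illustrative geohash decoder; returns the center of the encoded cell. */
final class GeohashDecoder {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** @return {latitude, longitude} in degrees */
    static double[] decode(String geohash) {
        double latMin = -90.0, latMax = 90.0;
        double lonMin = -180.0, lonMax = 180.0;
        boolean lonBit = true; // bits alternate, starting with longitude

        for (int i = 0; i < geohash.length(); i++) {
            char ch = Character.toLowerCase(geohash.charAt(i));
            int value = BASE32.indexOf(ch);
            if (value < 0) {
                throw new IllegalArgumentException("Invalid geohash character: " + ch);
            }
            // Each character carries 5 bits, most significant first.
            for (int mask = 16; mask != 0; mask >>= 1) {
                boolean bit = (value & mask) != 0;
                if (lonBit) {
                    double mid = (lonMin + lonMax) / 2;
                    if (bit) lonMin = mid; else lonMax = mid;
                } else {
                    double mid = (latMin + latMax) / 2;
                    if (bit) latMin = mid; else latMax = mid;
                }
                lonBit = !lonBit;
            }
        }
        return new double[] { (latMin + latMax) / 2, (lonMin + lonMax) / 2 };
    }
}
```

Doing this once when the cache is populated, rather than on every pass of the 
distance function, is the saving the paragraph above is after.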

Another aspect to this problem is that geohashes support multiple points per 
document.  I intend to subclass DocValues() with a method that will return an 
array of simple objects holding the pair.  If someone has hints as to some 
issues/problems with this approach then please let me know.

Bill Bell, if you're reading this, I know you did a patch attached to SOLR-2155 
for sorting but it uses separate fields to hold the lat & lon for sorting and 
I'm trying to fix this.

~ David Smiley



Re: custom ValueSource for decoding geohash into lat lon

2011-03-10 Thread William Bell
OK. But I am concerned that you are trying to bite off more than can
be done easily. The sample call is:

http://localhost:8983/solr/select?q=*:*&fq={!geofilt}&sfieldmulti=storemv&pt=43.17614,-90.57341&d=100&sfield=store&sort=geomultidist%28%29%20asc&sfieldmultidir=asc

Notice that geomultidist() needs another field called storemv right
now that is bar delimited. I tried to pull out the lat,long from
geohash, but Dave stores the geohash values in Ngram for the purpose
of filtering (I believe).

 Here are the issues as I see them:

1. ValueSources does not support MultiValue fields. The PointType.java
would be extended or solr.GeoHashField would be extended to support
this. The way the spatial stuff works is that the lat and long are created
as separate fields and then the fc can get the matching values from the
cache. I think this is totally sub-optimal. There should be a generic
ValueSource that works on MultiValue fields directly and generically.
It would be extremely useful if this all happened behind the scenes:

<field name="store_lat_lon" type="geohash" indexed="true"
stored="true" multiValued="true" />

My code is dependent on the following to do distance quickly (using
ValueSource):
<arr name="storemv">
<str>39.90923,-86.19389|42.37577,-72.50858</str>
</arr>

storing: 39.90923,-86.19389 and 42.37577,-72.50858

Internally it could be stored as (and converted using the type):

store_lat_long_num=2
store_lat_long_lat_1 = 39.90923
store_lat_long_lon_1 = -86.19389
store_lat_long_lat_2 = 42.37577
store_lat_long_lon_2 = -72.50858

2. Using ValueSource with one value is fast, and splitting it this way
might be a lot slower to calculate distances. It is convenient, but
could be slow. It might be better to just have solr.GeoHashField
append to the internal field so that it can use ValueSource directly.

Use an internal field that uses bars internally:

store_lat_long_bar =  39.90923,-86.19389|42.37577,-72.50858

For each lat,long value
- Calculate geohash and Ngram store
- Append to the internal field store_lat_long_bar based on the field name

Option 2 is easier and makes it supportable now without waiting for
redesign of ValueSource.
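
As a rough sketch of what reading such a bar-delimited value involves, the code 
below parses the string and computes the haversine distance to the nearest 
point. It assumes the sort key is the distance to the closest of a document's 
points, which the thread does not spell out. The class and method names are 
hypothetical, and the Earth radius is the commonly used mean value in km:

```java
/** Illustrative parser and distance helper for "lat,lon|lat,lon" values. */
final class BarDelimitedPoints {
    private static final double EARTH_RADIUS_KM = 6371.0088;

    /** Parses "lat,lon|lat,lon|..." into an array of {lat, lon} pairs. */
    static double[][] parse(String value) {
        String[] parts = value.trim().split("\\|");
        double[][] points = new double[parts.length][];
        for (int i = 0; i < parts.length; i++) {
            String[] latLon = parts[i].split(",");
            if (latLon.length != 2) {
                throw new IllegalArgumentException("Expected lat,lon but got: " + parts[i]);
            }
            points[i] = new double[] {
                Double.parseDouble(latLon[0].trim()),
                Double.parseDouble(latLon[1].trim())
            };
        }
        return points;
    }

    /** Great-circle distance in km between two points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp guards against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Distance in km from (lat, lon) to the nearest point in the delimited value. */
    static double minDistanceKm(String value, double lat, double lon) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : parse(value)) {
            best = Math.min(best, haversineKm(lat, lon, p[0], p[1]));
        }
        return best;
    }
}
```

Parsing the string on every call is the per-document cost point 2 worries 
about; caching the parsed pairs per document would remove it.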


