Re: Custom Solr indexer/searcher

2012-11-19 Thread Smiley, David W.
FWIW, I helped someone with a similar problem a few days ago and likewise 
advised modifying SpatialPrefixTree:
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tt4020445.html

IMO GeoHashField should be deprecated because it adds no value.

~ David




RE: Custom Solr indexer/searcher

2012-11-16 Thread Scott Smith
Thanks for the suggestions.  I'll take a look at these things.



Re: Custom Solr indexer/searcher

2012-11-15 Thread John Whelan
Scott,

I probably have no idea what I'm talking about, but if you're looking to
find results in an N-dimensional space, you might look at creating a
field of type 'point'. Point-type fields have a 'dimension' attribute; I
believe that it can be set to a large integer value.

Barring that, there is also a 'dist()' function that can be used to work
with multiple numeric fields in order to sort results based on closeness to
a desired coordinate. The 'dist()' function takes a parameter that specifies
how the distance is calculated. (For example, 2 means 'Euclidean distance'.
I don't know the other options.)
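
To make the role of that first parameter concrete, here is a minimal Java
sketch of a p-norm distance, which is my understanding of what the power
argument selects (1 gives Manhattan distance, 2 gives Euclidean distance).
This is an illustration only, not Solr's implementation; the class and
method names are made up for the example.

```java
public final class DistSketch {

    // Hypothetical helper: p-norm distance between two points of equal
    // dimension. This illustrates the idea only; it is not Solr code.
    static double dist(double power, double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        if (power < 1) {
            throw new IllegalArgumentException("this sketch only handles power >= 1");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            // Accumulate |a_i - b_i|^power over every dimension.
            sum += Math.pow(Math.abs(a[i] - b[i]), power);
        }
        // Take the power-th root of the accumulated sum.
        return Math.pow(sum, 1.0 / power);
    }

    public static void main(String[] args) {
        double[] origin = {0.0, 0.0};
        double[] point = {3.0, 4.0};
        System.out.println("Manhattan: " + dist(1, origin, point));
        System.out.println("Euclidean: " + dist(2, origin, point));
    }
}
```

In a Solr request the equivalent would be a function query along the lines
of sort=dist(2, x, y, 0, 0) asc, where x and y are assumed numeric field
names, not fields from this thread.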

In the worst case, my response is worthless, but pops your question back up
in the e-mails...

Regards,
John


Re: Custom Solr indexer/searcher

2012-11-15 Thread Mikhail Khludnev
Scott,
It sounds like you need to look into a few samples of similar things in
Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms
similar to the given one in the FST for query expansion. Generic query
expansion is done via MultiTermQuery. Index-time term expansion is shown in
TrieField and, by the way, NumericRangeQuery (it should match your goal
closely). All of these are single-dimension samples, but AFAIK a KD-tree is
multidimensional, so look into GeoHashField, which puts two-dimensional
points into single terms with the ability to build ranges on them; see
GeoHashField.createSpatialQuery().
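
To illustrate how a two-dimensional point can become a single term, here is
a minimal Java sketch of the standard geohash encoding: longitude and
latitude bits are interleaved and written out in base 32, so nearby points
tend to share a prefix and a prefix corresponds to a bounding cell. This is
a standalone illustration under that assumption, not the code inside
GeoHashField; the class and method names are made up for the example.

```java
public final class GeoHashSketch {

    // The base-32 alphabet used by the standard geohash scheme.
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Hypothetical helper: encode a latitude/longitude pair into a geohash
    // string with the requested number of characters.
    static String encode(double lat, double lon, int length) {
        double latMin = -90.0, latMax = 90.0;
        double lonMin = -180.0, lonMax = 180.0;
        StringBuilder hash = new StringBuilder(length);
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                // Halve the longitude interval and record which half we are in.
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else            { value = value << 1;       lonMax = mid; }
            } else {
                // Same step for the latitude interval.
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else            { value = value << 1;       latMax = mid; }
            }
            lonTurn = !lonTurn;
            if (++bitCount == 5) {
                // Every five bits become one base-32 character.
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Prints "u4pru" for this point at five characters.
        System.out.println(encode(57.64911, 10.40744, 5));
        // A longer hash of the same point starts with the shorter one.
        System.out.println(encode(57.64911, 10.40744, 8));
    }
}
```

Because a longer hash always extends the shorter one, a range or prefix
query over such terms selects every point inside one cell, which is the
kind of term range the paragraph above refers to.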

Happy hacking!



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Custom Solr indexer/searcher

2012-11-13 Thread Scott Smith
Suppose I have a special data search type (something different than a string or 
numeric value) that I want to integrate into the Solr server.  For example, 
suppose I wanted to implement a KD-tree as a filter that would integrate with 
standard Solr filters and queries.  I might want to say, "find all of the 
documents in the index with the word 'tree' in them that are within a certain 
distance of a particular document in the KD-tree."  Let me add that I'm not 
really looking for a KD-tree implementation for Solr; I just assume that a fair 
number of people will know what a KD-tree is and so will have some idea that 
I'm talking about adding a new data type (different than string, long, etc.) 
that Solr will need to be able to index and search with.  It's important that 
the new data type integrate with the existing standard Solr data types for 
searching purposes.

First, is there a way to build and specify a plugin that provides Solr both the 
indexer and search interfaces and therefore hides the internal details of 
what's going on in the search from Solr so it just thinks it's another search 
type?  Or, would I have to hack Solr in a lot of places to add my custom data 
type in?

Second, if the interface(s) exist to add in a new data type, is there 
documentation (tutorial, examples, etc.) anywhere on how to do this?  Or is my 
only option to dig into the Solr code?

Mostly, I'm looking for some links or suggestions on where to start looking.  I 
doubt this subject is simple enough to fit into an email post (though I'd be 
happy to be surprised :) ).  You can assume Solr 4.0 if that makes things 
easier.  You can also assume that I have some familiarity with Lucene (though I 
haven't hacked that code either).

Hopefully, I've explained this well enough so that people know what I'm looking 
for.

Cheers

Scott