Chris,

In the field of Content-Based Image Retrieval (CBIR), I think a
Bag-of-Words approach is frequently used: image features are classified
into "words" (these can be actual words like "dog" or just sequences of
letters that represent a numeric value). This turns the image similarity
search problem into a document similarity search problem, for which there
are loads of (mostly Lucene-based) tools available.

http://stackoverflow.com/questions/16660177/algorithm-for-finding-visually-similar-photos-from-a-database

http://www.hindawi.com/journals/isrn/2012/376804/

http://www.semanticmetadata.net/lire/
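
To make the "sequences of letters that represent a numeric value" idea concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not how LIRE or any particular tool does it: the function names, the token format, and the bin width of 32 are all made up for the example. Each of the 16 values is rounded into a bucket and tagged with its position, so two similar images end up sharing many tokens.

```python
def vector_to_words(vector, bin_width=32):
    """Turn a vector of 0..255 integers into position-tagged tokens.

    bin_width is an assumed tuning knob: larger bins make more images
    share a token, smaller bins make matching stricter.
    """
    words = []
    for position, value in enumerate(vector, start=1):
        if not 0 <= value <= 255:
            raise ValueError(f"value out of range at position {position}: {value}")
        bucket = value // bin_width
        # Hypothetical token format: f<position>b<bucket>, e.g. "f1b3"
        words.append(f"f{position}b{bucket}")
    return words


def shared_word_count(a, b, bin_width=32):
    """Count the tokens two vectors have in common (a crude similarity score)."""
    words_a = set(vector_to_words(a, bin_width))
    words_b = set(vector_to_words(b, bin_width))
    return len(words_a & words_b)
```

The resulting tokens could then be indexed as an ordinary text field. The obvious weakness is that two values sitting on opposite sides of a bucket boundary (say 31 and 32) get different tokens even though they are close, so this only approximates the distance calculation described in the quoted message.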


- john



On Fri, Apr 11, 2014 at 1:18 PM, Guyren Howe <[email protected]> wrote:

> On Apr 11, 2014, at 12:59 PM, Chris McCann <[email protected]> wrote:
>
> I'm looking for a solution to a search problem and want to survey the
> community to see if anyone else has dealt with this type of search.
>
> The application I'm building supports an image processing system.  We have
> a mathematical way of uniquely representing any particular image as a
> vector of 16 values, each ranging between 0 and 255.
>
> I need to implement a search mechanism that finds the closest matches to a
> given image, also represented as a 16 element vector.  This is usually
> called a "vector space model" search, and it's implemented for full text
> search in Postgres as well as Lucene, and probably many other full text
> search systems.
>
> The problem I'm wrestling with is I'm not searching on text, I'm searching
> on integers.  I basically need to search for the closest match like this:
>
> Say my search image has a vector with elements q(1) to q(16): [q(1) =
> 122, q(2) = 7, q(3) = 89, ..., q(16) = 224].
>
> To compare that vector against the image vectors in the database I need to
> calculate the "distance" between the query vector (q) and each of the
> database vectors (d):
>
> distance = square_root( (q(1) - d(1))^2 + (q(2) - d(2))^2 + ... + (q(16) -
> d(16))^2 )
>
> The lower the distance the closer the match, with distance == 0 being an
> exact match.
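
For reference, the distance calculation described above can be sketched in a few lines of Python. This is a brute-force version that compares the query against every stored vector, which is probably fine for a small collection but will not scale the way an index would. The function names and the dict-of-vectors layout are assumptions made for the example.

```python
import math


def euclidean_distance(q, d):
    """Square root of the summed squared differences between two vectors."""
    if len(q) != len(d):
        raise ValueError("vectors must have the same number of elements")
    return math.sqrt(sum((qi - di) ** 2 for qi, di in zip(q, d)))


def closest_matches(query, database, k=3):
    """Return the k nearest (image_id, distance) pairs, closest first.

    database is assumed to be a dict mapping an image id to its vector.
    """
    scored = [
        (image_id, euclidean_distance(query, vector))
        for image_id, vector in database.items()
    ]
    # Sort on the distance only, so image ids never need to be comparable.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

A distance of 0.0 in the result means the stored vector is identical to the query.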
>
> My research hasn't led me to a direct implementation of this in Postgres
> or Lucene since they are designed for text searching, though the underlying
> principles are the exact same.  Anyone ever tackle this type of search with
> numerical values?
>
>
> This isn't my area of expertise, but I believe Postgres is the go-to
> database for spatial work because of its advanced indexing options. I'm
> 75% sure you can do something that will give you an index across your 16
> values that will let you do a fast nearest-neighbor or k-nearest-neighbor
> search.
>
> Stack Overflow appears to agree:
>
> <http://stackoverflow.com/questions/16676644/postgresql-k-nearest-neighbor-knn-on-multidimensional-cube>
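
As a rough idea of what the indexed approach could look like, here is a Python sketch that builds a nearest-neighbor query for the Postgres cube extension. Several things here are assumptions: the table name `images`, its `id` and `vec` columns, and the psycopg2-style `%s` placeholders are all hypothetical, and as far as I know the `<->` distance operator on cubes is only available in PostgreSQL 9.6 and later with the cube extension installed. The function only builds the SQL and its parameters; it does not talk to a database.

```python
VECTOR_LENGTH = 16  # from the original question: 16 values per image


def knn_query(query_vector, k=10):
    """Build (sql, params) for a k-nearest-neighbor search on a cube column.

    Assumes a hypothetical table `images` with an `id` column and a `vec`
    column of type cube, ideally with a GiST index on `vec`.
    """
    if len(query_vector) != VECTOR_LENGTH:
        raise ValueError(f"expected {VECTOR_LENGTH} values, got {len(query_vector)}")
    values = [float(v) for v in query_vector]
    sql = (
        "SELECT id, vec <-> cube(%s::float8[]) AS distance "
        "FROM images "
        "ORDER BY vec <-> cube(%s::float8[]) "
        "LIMIT %s"
    )
    # The vector appears twice: once for the reported distance and once
    # for the ORDER BY that a GiST index can use.
    return sql, (values, values, k)
```

With a driver such as psycopg2 the pair would be passed to something like `cursor.execute(sql, params)`, with the driver converting the Python list into a Postgres array.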
>
> Regards,
>
> Guyren G Howe
> Relevant Logic LLC
>
> guyren-at-relevantlogic.com ~ http://relevantlogic.com ~ +1 512 784 3178
>
> Ruby/Rails,  Xojo, PHP programming
> PostgreSQL, MySQL database design and consulting
> Technical writing and training
>
> Read my book, Real OOP with REALbasic: <
> http://relevantlogic.com/oop-book/about-the-oop-book.php>
>
>  --
> --
> SD Ruby mailing list
> [email protected]
> http://groups.google.com/group/sdruby
> ---
> You received this message because you are subscribed to the Google Groups
> "SD Ruby" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
