On Mon, May 14, 2007 at 11:11:50AM +0100, Alex Young wrote:
> Jeremy Hinegardner wrote:
> >Hi all,
> >
> >I'm looking at using Ferret for categorizing documents.
> >Essentially what I have are thousands of query rules that if a document
> >matches, then it belongs to the category that is associated with that
> >rule.  Normally what we all do is have documents indexed and then run a
> >query against the index to get back the documents that match the query.
> >
> >What I want to do is the inverse.  I have thousands of queries and I
> >want to run all of them against one document at a time.  The queries
> >that match the document essentially categorize the document into the
> >associated category.
> <snip>
> >Thoughts, comments, rants, raves, brainstorms?
> Random thought that might or might not work, depending on whether your 
> queries are simple enough and how much data you want back:  just invert 
> the problem.  Store the queries in Ferret, and treat your document as 
> the query.  Random example:
> 
> irb(main):015:0> index = Index::Index.new
> irb(main):016:0> index << "hat"
> irb(main):017:0> index << "fox"
> irb(main):018:0> doc = "the quick brown fox jumped over the lazy dog"
> irb(main):018:0> index.search_each(doc) { |id, score| puts index[id].load.to_yaml + score.to_s }
> --- !map:Ferret::Index::LazyDoc
> :id: fox
> 0.0425622686743736
> => 1
> 
> I've got absolutely no idea how well the query parser will handle larger 
> documents, but it's worth a try...

I did give some thought to this, but we have some fairly complex
categorization queries, some of which are the equivalent of
SpanTermQuery. Since there is no FQL for those types of queries yet, I
don't think your approach will work for me.  But it is a good idea.
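
To make the "many rules, one document" shape concrete, here is a rough
sketch in plain Ruby. It does not use Ferret at all: the helpers
(tokenize, term, near, all_of, categorize) and the rule names are made
up for illustration, and near is only a loose stand-in for an ordered
span-style proximity match, not a reimplementation of Ferret's span
queries.

```ruby
# Hypothetical sketch in plain Ruby; none of these helpers are Ferret APIs.
# Each rule is a category name mapped to a predicate over the document's tokens.

# Split text into lowercase word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Rule: the document contains the given term.
def term(word)
  ->(tokens) { tokens.include?(word) }
end

# Rule: `first` appears before `second` with at most `slop` tokens between them
# (an ordered proximity match, loosely in the spirit of a span-near query).
def near(first, second, slop)
  lambda do |tokens|
    starts = tokens.each_index.select { |i| tokens[i] == first }
    ends   = tokens.each_index.select { |i| tokens[i] == second }
    starts.any? { |s| ends.any? { |e| e > s && e - s - 1 <= slop } }
  end
end

# Rule: every sub-rule matches.
def all_of(*rules)
  ->(tokens) { rules.all? { |rule| rule.call(tokens) } }
end

# Illustrative rule set; the category names are invented for this example.
RULES = {
  "animals"   => all_of(term("fox"), term("dog")),
  "hats"      => term("hat"),
  "quick-fox" => near("quick", "fox", 1),
  "fox-dog"   => near("fox", "dog", 2)
}

# Run every rule against one document and return the matching categories.
def categorize(text, rules = RULES)
  tokens = tokenize(text)
  rules.select { |_name, rule| rule.call(tokens) }.keys
end

doc = "the quick brown fox jumped over the lazy dog"
puts categorize(doc).inspect  # prints ["animals", "quick-fox"]
```

This walks every rule for every document, so it is linear in the number
of rules. It only shows the control flow; the point of using an index
would be to avoid exactly that scan.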

enjoy,

-jeremy

-- 
========================================================================
 Jeremy Hinegardner                              [EMAIL PROTECTED] 

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
