On Mon, May 14, 2007 at 08:00:02AM +0000, Jan Prill wrote:
> interesting approach. You might build your Query-Objects once by calling
> QueryParser#parse, serialize these Query-Objects and use them.

Yup, that's one item I need to look into.  One of the issues is that the
query language we're using right now has 'NEAR' keywords, so we'll need
to convert those into SpanTermQuery objects.  I'm thinking of having the
DSL generate Ruby code and then serializing those Query objects, or maybe
just running them as code.
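
A minimal sketch of the NEAR conversion step, in plain Ruby so it runs
without Ferret installed.  The names here (NearClause, parse_near, the
:content field, the default slop of 10) are all hypothetical; the idea is
only to show an intermediate form that a later step could map onto span
queries, for example one SpanTermQuery per term wrapped in a
SpanNearQuery.

```ruby
# Hypothetical intermediate form for one "a NEAR b" expression.
# Nothing here calls Ferret; it only describes the query to build later.
NearClause = Struct.new(:field, :terms, :slop, :in_order)

# Split an expression on the NEAR keyword and return a NearClause.
# The field name and slop default are assumptions, not values from our DSL.
def parse_near(expression, field: :content, slop: 10)
  terms = expression.split(/\s+NEAR\s+/).map { |t| t.strip.downcase }
  if terms.size < 2 || terms.any?(&:empty?)
    raise ArgumentError, "expected a term on each side of NEAR"
  end
  # in_order is false because NEAR usually means "close by, either order".
  NearClause.new(field, terms, slop, false)
end

clause = parse_near("ruby NEAR search")
puts clause.terms.inspect
```

Keeping the parsed form as plain data also leaves the "serialize or run
as code" question open, since a Struct like this can be dumped and
reloaded independently of how the final Query objects get built.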

> IMHO your problem wouldn't be query parsing but the number of queries that
> you are issuing on each document. On the other hand, Ferret is quite fast
> and it may work out if your process is not that time critical. Have you
> considered combining queries? Ferret's Query Language is quite powerful and
> you might bring down the number of queries if you combine the queries that
> are useful to only one categorization anyway. Check out the QueryParser API
> regarding this approach.

I will investigate the API more.  Currently we don't have multiple
queries that equate to a single category; it's a one-to-one relationship
between category and query.  The speed of my initial experiments is
within our tolerances, but may not be good enough for serial execution.
Of course, since all of this is in a single memory index, per document,
it could be parallelized.
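
A rough sketch of what running the category checks in parallel could
look like, using a small thread pool.  The CATEGORIES table and the
categorize method are hypothetical stand-ins: each lambda takes the place
of running one category's query against the per-document memory index,
and this assumes those searches are safe to run concurrently.

```ruby
require "thread"

# Hypothetical stand-ins: one callable per category (one-to-one, as in our
# setup).  In practice each would search the per-document memory index.
CATEGORIES = {
  "ruby"   => ->(text) { text.include?("ruby") },
  "search" => ->(text) { text.include?("index") },
  "java"   => ->(text) { text.include?("java") }
}

# Run every category check against one document using a few worker threads
# and return the sorted names of the categories that matched.
def categorize(text, categories, workers: 4)
  pending = Queue.new
  categories.each { |pair| pending << pair }
  matched = Queue.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        name, matcher = begin
          pending.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        matched << name if matcher.call(text)
      end
    end
  end
  threads.each(&:join)

  Array.new(matched.size) { matched.pop }.sort
end

puts categorize("a ruby gem that builds an index", CATEGORIES).inspect
```

Whether threads give a real speedup depends on whether the search call
releases the interpreter lock; if it does not, splitting the category
list across separate processes is the other obvious option.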

> At least the lines
> 
> top_docs = index.search(row[:boolean])
>           if top_docs.hits.size > 0 then
> 
> should read "if index.search(row[:boolean]).total_hits > 0" so that you
> don't need to read in the hits-array to get the size.

Good tip, thanks.

> As a last tip you might be interested in the underlying code of the
> more_like_this method of aaf to get the most used terms in your documents.
> This might be able to let your categorizations "learn" while documents get
> categorized.

I will definitely check more into that.  Who knows, maybe a
categorization engine based on Ferret will fall out of this :-)

enjoy,

-jeremy

-- 
========================================================================
 Jeremy Hinegardner                              [EMAIL PROTECTED] 

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
