Re: Search agents

Wolfgang Hoschek Wed, 04 Jan 2006 12:53:26 -0800

If you'd consider using a MemoryIndex for this, I'd recommend also having a look at nux.xom.pool.FullTextUtil and nux.xom.pool.FullTextPool, adding smart caching for indexes, queries and results on top of a MemoryIndex. With some luck this (or some variant of it) could help speed up your use cases, at least as far as I gather.


[It's part of the Nux download]


Wolfgang.

Snippet from the javadoc:

/**
 * Thread-safe XQuery/XPath fulltext search utilities; implemented with the
 * Lucene engine and a custom high-performance adapter for
 * on-the-fly main memory indexing with smart caching for indexes, queries and results.
 * <p>
 * Complementing the standard XPath string and regular
 * expression matching functionality, Lucene has a powerful query syntax with support
 * for word stemming, fuzzy searches, similarity searches, approximate searches,
 * boolean operators, wildcards, grouping, range searches, term boosting, etc.
 * For details see the <a target="_blank"
 * href="http://lucene.apache.org/java/docs/queryparsersyntax.html">Lucene Query
 * Syntax and Examples</a>.
 * Also see {@link org.apache.lucene.index.memory.MemoryIndex}
 * and {@link PatternAnalyzer} for detailed documentation.
 * <p>
 * Example Java usage:
 * <pre>
 * Analyzer analyzer = PatternAnalyzer.DEFAULT_ANALYZER;
 * float score = FullTextUtil.match(
 *    "Readings about Salmons and other select Alaska fishing Manuals",
 *    "+salmon~ +fish* manual~",
 *    analyzer, analyzer);
 * if (score &gt; 0.0f) {
 *     // query matches text
 * } else {
 *     // query does not match text
 * }
 * </pre>


On Jan 4, 2006, at 6:03 AM, karl wettin wrote:

Hello list,
I wrote a search agent thingy for Lucene. It was built to handle huge numbers of agents.
Rather than running one query per agent to find out whether a new document is interesting or not, agent trigger queries are stored in an index that is queried with the tokens of the new document.
Since it uses the index a bit backwards, the agent trigger queries are somewhat limited:
At least one token in an OR or FUZZY OR per agent field must match the new document.
Any NOT token in an agent must not match the new document.
It is fairly easy to add more query types, but they are limited to single-token and non-wildcard types since the query is created from the new document tokens.
Agents are clustered by their required fields, and each cluster is stored in its own index. When a new document is sent to the AgentManager it creates one query per possible cluster. I'm not sure this actually speeds things up, just a gut feeling.
Example agents in pseudo trigger query language:

Possible agent:

AND (OR ("category","media"))
AND (OR ("name", "hotel") OR ("name","rowanda"))
AND (NOT("name", "paradise"))

Impossible agent:

AND (OR ("category","media"))
AND (("name", "hotel") AND ("name","rowanda"))
AND (NOT("name", "paradise"))

In effect the agents can't trigger on AND queries of the same field.
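
The matching rules above can be sketched in plain Java without Lucene. This is only an illustration of the "index the agents, query with the document" idea, not the actual code: the class and method names (AgentIndexSketch, addOr, addNot, match) are hypothetical, plain hash maps stand in for the Lucene index, and fuzzy matching and the per-cluster indexes are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: agent triggers are indexed by (field, token) and
// looked up with the tokens of each incoming document.
class AgentIndexSketch {
    // "field:token" -> ids of agents listing that token in an OR clause
    private final Map<String, Set<String>> orPostings = new HashMap<>();
    // "field:token" -> ids of agents listing that token in a NOT clause
    private final Map<String, Set<String>> notPostings = new HashMap<>();
    // agent id -> fields in which at least one OR token has to match
    private final Map<String, Set<String>> requiredFields = new HashMap<>();

    public void addOr(String agent, String field, String token) {
        orPostings.computeIfAbsent(field + ":" + token, k -> new HashSet<>()).add(agent);
        requiredFields.computeIfAbsent(agent, k -> new HashSet<>()).add(field);
    }

    public void addNot(String agent, String field, String token) {
        notPostings.computeIfAbsent(field + ":" + token, k -> new HashSet<>()).add(agent);
    }

    // Returns the ids of all agents triggered by a document (field -> tokens).
    public Set<String> match(Map<String, List<String>> document) {
        Map<String, Set<String>> matchedFields = new HashMap<>();
        Set<String> vetoed = new HashSet<>();
        for (Map.Entry<String, List<String>> e : document.entrySet()) {
            String field = e.getKey();
            for (String token : e.getValue()) {
                String key = field + ":" + token;
                for (String agent : orPostings.getOrDefault(key, Collections.<String>emptySet())) {
                    matchedFields.computeIfAbsent(agent, k -> new HashSet<>()).add(field);
                }
                // Any matching NOT token rules the agent out.
                vetoed.addAll(notPostings.getOrDefault(key, Collections.<String>emptySet()));
            }
        }
        Set<String> triggered = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : matchedFields.entrySet()) {
            String agent = e.getKey();
            // Every required field needs at least one matching OR token.
            if (!vetoed.contains(agent) && e.getValue().equals(requiredFields.get(agent))) {
                triggered.add(agent);
            }
        }
        return triggered;
    }

    public static void main(String[] args) {
        AgentIndexSketch index = new AgentIndexSketch();
        // The "possible agent" from the example above.
        index.addOr("agent-1", "category", "media");
        index.addOr("agent-1", "name", "hotel");
        index.addOr("agent-1", "name", "rowanda");
        index.addNot("agent-1", "name", "paradise");

        Map<String, List<String>> doc = new HashMap<>();
        doc.put("category", new ArrayList<>(Arrays.asList("media")));
        doc.put("name", new ArrayList<>(Arrays.asList("grand", "hotel")));
        System.out.println(index.match(doc));
    }
}
```

Because an agent is found only through the tokens the document actually contains, a trigger that needs two tokens of the same field at once (the "impossible agent") cannot be expressed with per-field bookkeeping like this; it would need a count of matched tokens per clause.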
One could of course place a more complex query on the new document as the agent triggers, or use some classifier or whatever, if speed is not a big deal. The agent triggers could then be built from the original query. I probably won't implement such a thing myself.
Should I post the code to the sandbox when I've tested it? Are there any restrictions on the code if I do that?
--
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


