Really cool stuff!

Maurits van Wijland wrote:

Hi All and Marc,

There is the Carrot project:
http://www.cs.put.poznan.pl/dweiss/carrot/

The Carrot system consists of web services that can easily be fed a Lucene
result list. You simply create a JSP that generates the XML result file
described below, plus a custom process and input component. The input
component descriptor for Lucene could look like:

<?xml version="1.0" encoding="UTF-8"?>
<service xmlns="http://www.dawidweiss.com/projects/carrot/componentDescriptor"
         framework="Carrot2">
   <component id="carrot2.input.lucene"
              type="input"
              serviceURL="http://localhost/weblucene/c2.jsp"
              infoURL="http://localhost/weblucene/"
   />
</service>

The c2.jsp file simply has to translate a result list into an XML file such
as:
<searchresult>
   <document id="1">
      <title>...</title>
      <weight>1.0</weight>
      <url>http://...</url>
      <summary>sum 1</summary>
      <snippet>snip 1</snippet>
   </document>
   <document id="2">
      <title>...</title>
      <weight>1.0</weight>
      <url>http://...</url>
      <summary>sum 2</summary>
      <snippet>snip 2</snippet>
   </document>
</searchresult>
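For anyone who wants to try this, here is a minimal Java sketch of the XML-writing part such a c2.jsp might contain. It assumes the search results have already been collected into a list; SearchResultXml, SearchHit and toSearchResultXml are hypothetical names (not part of Lucene or Carrot2), and in a real JSP the hits would come from a Lucene query rather than being built by hand.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the XML-writing part of c2.jsp. Class and method
// names are illustrative only; in a real JSP the hits would be read from a
// Lucene search instead of being constructed by hand as in main().
public class SearchResultXml {

    // Hypothetical holder for one search result; the fields mirror the
    // <title>, <weight>, <url>, <summary> and <snippet> elements.
    public static final class SearchHit {
        final String title;
        final float weight;
        final String url;
        final String summary;
        final String snippet;

        public SearchHit(String title, float weight, String url,
                         String summary, String snippet) {
            this.title = title;
            this.weight = weight;
            this.url = url;
            this.summary = summary;
            this.snippet = snippet;
        }
    }

    // Escapes XML special characters so titles or snippets containing
    // '&' or '<' do not make the result list malformed.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Appends one "<name>text</name>" line with the text escaped.
    private static void element(StringBuilder xml, String name, String text) {
        xml.append("      <").append(name).append('>')
           .append(escape(text))
           .append("</").append(name).append(">\n");
    }

    // Builds the <searchresult> document; ids are assigned 1, 2, 3, ...
    // in result order, matching the example format above.
    public static String toSearchResultXml(List<SearchHit> hits) {
        StringBuilder xml = new StringBuilder();
        xml.append("<searchresult>\n");
        int id = 1;
        for (SearchHit h : hits) {
            xml.append("   <document id=\"").append(id++).append("\">\n");
            element(xml, "title", h.title);
            element(xml, "weight", Float.toString(h.weight));
            element(xml, "url", h.url);
            element(xml, "summary", h.summary);
            element(xml, "snippet", h.snippet);
            xml.append("   </document>\n");
        }
        xml.append("</searchresult>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        List<SearchHit> hits = Arrays.asList(
            new SearchHit("First title", 1.0f, "http://example.com/1", "sum 1", "snip 1"),
            new SearchHit("Q&A", 0.5f, "http://example.com/2", "sum 2", "snip 2"));
        System.out.print(toSearchResultXml(hits));
    }
}
```

The escaping step matters because titles and snippets taken from indexed documents can easily contain '&' or '<', and a single unescaped character would make the whole result list unparseable.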

Feed this into the Carrot system and you will get a nicely clustered
result list. The amazing part of this clustering mechanism is the
cluster labels: they are great!

Then there is an open source project called Classifier4J that can
be used for classification, the supervised counterpart of clustering.
These other open source projects are a great addition to Lucene.

I hope this helps...

Marc, what are you building?? Maybe we can help!

Kind regards,

Maurits


----- Original Message ----- From: "marc" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, November 11, 2003 5:15 PM
Subject: Document Clustering



Hi,


Does anyone have sample code or documentation for document-based
clustering using Lucene?

Thanks,
Marc



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





--
day time: www.media-style.com
spare time: www.text-mining.org | www.weta-group.net



