respecting sentence boundaries and using them to affect a document's score in the 
ranking algorithm requires linguistic knowledge, not NLP knowledge. think about it.

Herb....

-----Original Message-----
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2003 9:13 PM
To: Lucene Users List
Subject: Re: inter-term correlation [was Re: Vector Space Model in Lucene?]


What you can do is use a pos tagger (i.e. a maximum entropy model based 
or  Brill tagger if you just have english) and use a data mining 
algorithm for weight your terms.
May be you can use a hidden Markov model for that.

You can build this on top of lucene, shouldn't be that difficult.

But may be I understand you wrong.. ..

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to