Respecting sentence boundaries, and using them to affect a document's score in the ranking algorithm, requires linguistic knowledge, not NLP knowledge. Think about it.
Herb....

-----Original Message-----
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2003 9:13 PM
To: Lucene Users List
Subject: Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

What you can do is use a POS tagger (e.g. a maximum-entropy-based tagger, or a Brill tagger if you only have English text) and use a data mining algorithm to weight your terms. Maybe you can use a hidden Markov model for that. You can build this on top of Lucene; it shouldn't be that difficult. But maybe I have misunderstood you...
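
To make the term-weighting suggestion above concrete, here is a rough sketch in Python. It is only an illustration under loud assumptions: the `TOY_LEXICON` lookup stands in for a real POS tagger (maximum entropy, Brill, or HMM based), the `TAG_WEIGHTS` values are made up rather than learned by any data mining algorithm, and none of these names come from Lucene.

```python
from collections import Counter

# Hypothetical, hand-picked weights per part of speech (assumption:
# content words should count more than function words).
TAG_WEIGHTS = {"NOUN": 2.0, "VERB": 1.5, "ADJ": 1.2, "OTHER": 0.5}

# Toy lexicon standing in for a real POS tagger. A real tagger would
# use context; this lookup does not.
TOY_LEXICON = {
    "lucene": "NOUN",
    "index": "NOUN",
    "documents": "NOUN",
    "builds": "VERB",
    "searches": "VERB",
    "fast": "ADJ",
}


def tag(tokens):
    """Return (token, tag) pairs; unknown words are tagged OTHER."""
    return [(t, TOY_LEXICON.get(t, "OTHER")) for t in tokens]


def weighted_term_frequencies(text):
    """Sum the tag weight of every occurrence of each term."""
    tokens = text.lower().split()
    weights = Counter()
    for token, pos in tag(tokens):
        weights[token] += TAG_WEIGHTS[pos]
    return dict(weights)


if __name__ == "__main__":
    print(weighted_term_frequencies("Lucene builds the index"))
```

The resulting per-term weights could then be applied on top of Lucene, for instance as boosts on query terms, but how best to wire that in is left open here.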