On 4 Aug, 11:59, Fred Mangusta <[EMAIL PROTECTED]> wrote:
> Hi,
>
> are you aware of any nlp packages or algorithms in Python to spot
> whether a '.' represents an end of sentence or rather something else (eg
> Mr., [EMAIL PROTECTED], etc)?

I wouldn't mind finding out about such packages, either. I see that
NLTK offers a few options, with the following tokeniser being
interesting if you don't mind training the software:

http://nltk.org/doc/guides/tokenize.html#punkt-tokenizer

There was also discussion of this topic on Ned Batchelder's blog a
while back:

http://nedbatchelder.com/blog/200804/separating_sentences.html

My comment there (that I'm using a regular expression with some
postprocessing) still stands.
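
As a rough illustration of the regex-plus-postprocessing idea (a minimal sketch only, not the actual expression referred to above): split on sentence-ending punctuation followed by whitespace and a capital letter, then glue back any piece that ended in a known abbreviation. The names split_sentences, BOUNDARY and ABBREVIATIONS, and the tiny abbreviation list itself, are made up for this example.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (stored without the
# trailing dot); a real list would be much longer and domain-specific.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "e.g", "i.e"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and then
# an uppercase ASCII letter. A dot inside an e-mail address or a number
# is never followed by whitespace, so it is not a candidate.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def ends_with_abbreviation(chunk):
    """Return True if the last word of chunk is a listed abbreviation."""
    words = chunk.split()
    if not words:
        return False
    last = words[-1]
    return last.endswith(".") and last[:-1].lower() in ABBREVIATIONS


def split_sentences(text):
    """Split text at candidate boundaries, then undo false splits."""
    text = text.strip()
    if not text:
        return []
    sentences = []
    for piece in BOUNDARY.split(text):
        # Postprocessing step: if the previous piece stopped at an
        # abbreviation such as "Mr.", the split was wrong, so rejoin.
        if sentences and ends_with_abbreviation(sentences[-1]):
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


if __name__ == "__main__":
    sample = "Mr. Smith arrived at noon. He left at 3.15 p.m. sharp. Was it late? No."
    for sentence in split_sentences(sample):
        print(sentence)
```

This still gets things wrong in obvious cases: an abbreviation that genuinely ends a sentence is merged with the next one, and a sentence starting with a lowercase letter or a digit is never split off.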

Paul
--
http://mail.python.org/mailman/listinfo/python-list
