Re: [Apertium-stuff] Capitalizing the first letter of every sentence

Mikel Forcada Sat, 12 Apr 2014 11:11:34 -0700

On 04/12/2014 07:25 PM, Rafi Kamal wrote:

Another question: in the script, I need to split the corpus into sentences. Is there any tool in Apertium that does this (like the NLTK tokenizer for Python)? I could simply split the corpus on ".", "?" or "!", but then sentences where "." is not the last character will cause problems. For example, "Washington D.C. is a beautiful place." will be split into three parts: "Washington D", "C", and "is a beautiful place".

First of all, the sentence is not a unit in Apertium. Apertium works with the chunks its modules deal with (lexical units, patterns, etc.). If you check the English dictionary, you will see that some words have "." inside. For instance, "Washington D.C." is an entry in Apertium-en-es, and there are many other such entries. Symbols such as "?" and "!" are analyzed as "sent", that is, as sentence markers, like ".".
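For a standalone script, one way to mimic the dictionary behaviour described above is to protect known multi-period entries before splitting on ".", "?" and "!". The sketch below is not how Apertium works internally. The PROTECTED list is a hypothetical stand-in for entries you would take from the monolingual dictionary, and split_sentences/capitalize_first are illustrative helper names.

```python
import re

# Hypothetical list of entries that contain "." internally; in practice these
# would be taken from the Apertium monolingual dictionary (e.g. "Washington D.C.").
PROTECTED = ["D.C.", "e.g.", "i.e.", "Mr.", "Dr."]

PLACEHOLDER = "\x00"  # character assumed not to occur in the corpus


def split_sentences(text, protected=PROTECTED):
    """Split after '.', '?' or '!' followed by whitespace, except inside protected entries."""
    # Hide the periods of protected entries so they are not seen as sentence ends.
    for entry in protected:
        text = text.replace(entry, entry.replace(".", PLACEHOLDER))
    # Split on whitespace that immediately follows a sentence-ending symbol.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    # Restore the hidden periods and drop empty pieces.
    return [p.replace(PLACEHOLDER, ".") for p in parts if p]


def capitalize_first(sentence):
    """Upper-case the first character of a sentence, leaving the rest unchanged."""
    return sentence[:1].upper() + sentence[1:]


if __name__ == "__main__":
    text = "Washington D.C. is a beautiful place. is it? yes!"
    for s in split_sentences(text):
        print(capitalize_first(s))
```

A known limitation of this approach: a protected entry that really does end a sentence (e.g. "I live in Washington D.C. It is nice.") will not be split there, which is one reason Apertium treats such strings as lexical units instead of relying on sentence boundaries.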

On the other hand, I believe the -m option (used in connection with translation memories) does deal with sentences in some way, but you should check.


All the best

Mikel

--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff