When Bayes tokenizes a message, it ignores words shorter than 3 characters and, using a regexp, also drops a list of stop words, since they lie in the gray area (they show up in spam and non-spam alike). For messages in other languages, however, the presence of these English stop words can be a strong indicator of spam. Is there a way to keep these words when the message is not in English?
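
To illustrate the behavior I am asking about, here is a minimal sketch in Python. It is not the actual Bayes code: the stop word list, the `tokenize` function, and the `language` parameter are all hypothetical, and it assumes the message language is already known from somewhere else.

```python
import re

# Hypothetical, abbreviated stop word list; the real list is assumed to be longer.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "from", "have"}
MIN_LENGTH = 3  # tokens shorter than this are always ignored


def tokenize(text, language="en"):
    """Return lowercase word tokens.

    Short tokens are dropped for every language; English stop words
    are dropped only when the message itself is English.
    """
    # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    kept = []
    for token in tokens:
        if len(token) < MIN_LENGTH:
            continue
        if language == "en" and token in STOP_WORDS:
            continue
        kept.append(token)
    return kept


message = "Das ist the best offer for you"
print(tokenize(message, language="en"))  # stop words removed
print(tokenize(message, language="de"))  # stop words kept as tokens
```

With `language="de"`, "the" and "for" survive as tokens and could be learned as spam indicators, which is the behavior I would like to have.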
Regards, Shreyansh Shrivastava
