https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6119
--- Comment #7 from Karsten Bräckelmann <[email protected]> 2009-05-26 14:58:30 PST ---

Just had a quick look at attachment 4452 and the BodyEval tvd_vertical_words() function, adding some noisy debugging love.

The reason is quite simple -- the space to non-space ratio doesn't exceed 9%, which is less than the default 10% max.

This wasn't apparent from reading the code alone, without the debugging output, though. I expected it to check the body line by line. However, it actually checks the space ratio for *paragraphs* in the traditional UN*X sense, where a paragraph ends with *two* newlines.

This line, for example, would have a ratio of 18% on its own, and still 13% with the longish header-style prefix and no (munged?) linebreak.

  Over_to_maintainer_(via_the_GNATS_Auto_Assign_Tool)

The text being looked at is the entire paragraph, though, including all lines immediately preceding or following without an empty line in between, resulting in 20/201, or about 9%.

One reason, and an explanation of why it loves to hit on such messages, is the very long words prefixing each line. Or, in other words: There's not much real, human generated text there. Compare it to this very paragraph...

A quick and easy fix is to lower the max threshold (second argument) in 20_body_tests.cf, which currently reads:

  body TVD_SPACE_RATIO eval:tvd_vertical_words('0','10')

However, given the idea is to identify lots of *vertical* words, I seriously wonder if this used to work on actual *lines*, rather than whole paragraphs. Theo?
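To illustrate the paragraph-versus-line difference described in this comment, here is a rough Python sketch. It is not the BodyEval code. The helper names (space_ratio, ratio_per_line, ratio_per_paragraph) are made up for this example, and it assumes the ratio is the truncated percentage of spaces to non-space characters, and that the lines of a paragraph are joined with a single space before counting.

```python
import re
from typing import List


def space_ratio(text: str) -> int:
    """Spaces as a truncated percentage of non-space characters (assumed formula)."""
    spaces = text.count(" ")
    nonspaces = len(text) - spaces
    # Assumption: cap at 100 when there is nothing but spaces, or more spaces than text.
    if nonspaces == 0 or spaces > nonspaces:
        return 100
    return 100 * spaces // nonspaces


def ratio_per_line(body: str) -> List[int]:
    """Ratio for every non-empty line, i.e. what a line-by-line check would see."""
    return [space_ratio(line) for line in body.splitlines() if line.strip()]


def ratio_per_paragraph(body: str) -> List[int]:
    """Ratio for every paragraph, where a paragraph ends at an empty line."""
    ratios = []
    for para in re.split(r"\n\s*\n", body):
        lines = [line for line in para.splitlines() if line.strip()]
        if lines:
            # Assumption: lines of one paragraph are joined by a single space.
            ratios.append(space_ratio(" ".join(lines)))
    return ratios


if __name__ == "__main__":
    # Two lines with long, space-free prefixes and no empty line between them.
    body = "a" * 20 + ": x y z\n" + "b" * 25 + ": ok"
    print(ratio_per_line(body))       # [12, 3]
    print(ratio_per_paragraph(body))  # [9]
```

In this made-up input the first line alone comes out at 12%, above a max of 10, while the paragraph as a whole drops to 9% because the long prefix words dilute it. That is the same effect as the 20/201 case above.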
