http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5056





------- Additional Comments From [EMAIL PROTECTED]  2006-12-26 17:39 -------
Theo, kindly keeper of the assassinator of spam wrote:
> Linda, "corruptor" of programs, and occasional Pain in the *** wrote:
>> While Text::Wrap did change, the change uncovered an SA bug.

T> I disagree, please see below.

>> The pattern '(?<=[\s,])' (Passed in the line:
>> ./lib/Mail/SpamAssassin/PerMsgStatus.pm:996:      $hdr =
>> Mail::SpamAssassin::Util::wrap($hdr, "\t", "", 79, 0, '(?<=[\s,])');
>> is used with a *.

T> Due to a change in Text::Wrap, yes.

>> What may have been meant here was simply '[\s,]'.

T> Nope.  Originally that's what it was.  However, it caused bug 2165.  The
T> solution (r5605 if anyone's interested) was to use the zero width assertion
T> which avoided removing the break character.
---
    I was afraid it was something like that.  But....


T> Unless I'm missing something in the Text::Wrap docs, there's no other way to 
say
T> "break on this character, but don't remove it from the string".
---
    Line::Break modules states:

    "It is possible to control which characters terminate words by modifying
    $Text::Wrap::break. Set this to a string such as '[\s:]' (to break
    before spaces or colons) or a pre-compiled regexp such as "qr/[\s']/"
    (to break before spaces or apostrophes). The default is simply '\s';
    that is, words are terminated by spaces."

    Unless I read it improperly, "$break" must contain characters.
A zero width assertion doesn't (as much as it would be interesting) 
qualify as a character in any character set. :-)

    Having it, both correspond to the break character, and having the code
swallow extra break characters makes sense, in the general scheme of things: the
code would look for a break character "break character" when it is time to
"break" by using '(:$break)*'.  That would swallow up extra break characters
after a printing line.

    Looking at the code for Text::Wrap (TW), I don't believe it was designed
to accept a zero-width pattern.  By not passing in a match for 1 or more
characters, I believe SA is calling Text::Width with illegal parameters.

    No matter if it is decided to maintain "SA"'s call to TW with an invalid
parameter or if it is fixed, the patch hiding a *valid* perl warning is bad 
form.

    Why is "," needed as a breaking character?  Would it work (i.e. not cause
bug 2165) if you just used '\s'?

    Regarding the warning, it is a valid warning about broken/bad perl code. 
The broken code:
'(zero width assertion)<unbounded wildcard>' is invalid / undefined perl code.
I wouldn't be certain what it would do -- I suppose we should feel fortunate
that perl is smart enough to catch it -- since when trying to match a pattern, a
pattern like '(zero-width-pat)*' should repeat indefinitely.

    If SA needs to "fork" the Text::Wrap code from 0705 to solve SA's
need, that would be one "valid" solution, but hiding away valid warnings about
invalid perl regular expressions or "useless use of infinitely-repeating
zero-width assertion" is just hiding a bug and asking for trouble.  

Maybe Text::Wrap does need to be forked to provide the, functionality that SA
needs?  

But please, don't shut off a valid warning.  Anyone linking with TW after the
0705 version will silently be running the bad perl code.  

It is technically experimental, but its been there, I believe since 5.8.1,
maybe the construct '([\s,]+)(?{$mybreakchars=$^N})' would provide a workaround?
You'd still have the "\s," pattern, and for any given invocation of TW, you'd
save the value of the break char such that it could be inserted back in? 
If Text::Wrap might execute that line multiple times, maybe:
'([\s,]+)(?{push @mybreakchars,$^N})' would allow pushing on successive 
break characters for each time it's used.  

It sounds potentially ugly/messy.  Is it possible just to ask for zero-width
assertion "support" in the Text::Width function?  The code in 0705 was
technically "broken", in that it didn't properly use the value in "$break" as
the value to break-on -- it used '\s'.  It still would have swallowed the break
char though.

*sigh*


-l


    




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

Reply via email to