This is pretty esoteric, I am looking for comments from those who understand.

Last week I modified NextToken to eat leading spaces.  I did this to deal 
with the following font definition:
{\f3\fnil       \fcharset0\fprq0 Sans Serif 10cpi;}

However, now importing unknown fields is broken, as importing
{\field\fldedit{\*\fldinst {\rtlch \af28 \ltrch title}}{\fldrslt {\rtlch 
\af28 \ltrch A Multiword Value For A Title Field}}}
results in AbiWord displaying
AMultiwordValueForATitleField
since NextToken is used to parse out the fldrslt value.

I went back to the RTF standard, and I quote:
-----------------------------------------
A control word takes the following form:
\LetterSequence<Delimiter>
Note that a backslash begins each control word.
The LetterSequence is made up of lowercase alphabetic characters between 
"a" and "z" inclusive. RTF is case sensitive, and all RTF control words 
must be lowercase.
The delimiter marks the end of an RTF control word, and can be one of the 
following:
�       A space. In this case, the space is part of the control word.
�       A digit or a hyphen (-), which indicates that a numeric parameter 
follows. ...
�       Any character other than a letter or a digit. In this case, the 
delimiting character terminates the control word but is not actually part 
of the control word.
If a space delimits the control word, the space does not appear in the 
document. Any characters following the delimiter, including spaces, will 
appear in the document. For this reason, you should use spaces only where 
necessary; do not use spaces merely to break up RTF code.
--------------------------------------------

According to this quote, the correct behavior is for NextToken to swallow 
the delimiter if it is a space.  Removing leading spaces is a big nono, and 
the application that created bug 1211 (document 1207.rtf in the test/bugs 
directory) is creating non-valid RTF.

I propose we stick to the standard, and closing bug 1207 by stating that 
the RTF file is non-compliant.  This requires backing out the change I made 
to ie_imp_RTF.cpp, specifically function NextToken where:
-----------------------
        while( pKeyword[0] == ' ')
        {
        if (!ReadCharFromFile(pKeyword))
        {
                tokenType = RTF_TOKEN_ERROR;
        }
        }
-----------------------
should be returned to:
----------------------
        if (!ReadCharFromFile(pKeyword))
        {
                tokenType = RTF_TOKEN_ERROR;
        }
---------------------

What are your thoughts?

Matti 


Reply via email to