On Nov 10, 5:56 pm, "Edward K. Ream" <edream...@gmail.com> wrote:

> > Here is the test case, boiled down to its essence from data.html::
>
> >     <td><a href="1">Standards</a> <a href="2">Fees</a></td>

> It should be possible to extend the first <a> element so that it
> contains the troublesome space.

This worked.  The new code is yet another scanner, this time at the
end of startsHelper.  All these scanners are similar, but at present
there seems to be no way to use common code.  It's not a big deal,
imo.

As of rev 4772 all unit tests pass, and data.html imports "correctly"
if not "well": tags are placed in odd locations.

The reason for this unsightly "perfect" import is that
skipToMatchingTag thinks there is a tag mismatch.  Supposedly this is
a "user error", but it does not actually spoil the "perfect" import.

I have my doubts that there really is a user error, and even if there
were unmatched tags, skipToMatchingTag should do a better job of error
recovery.  That's tomorrow's project.  That might be the end of this
saga.

Edward

P.S.  The html importer now uses a more rigorous version of
filterTokens.  It uses a two-pass algorithm.

The first pass inserts newlines before </ and after >, which is not
*quite* right because a '>' need not terminate an open tag.
But that's likely to be a nit that will never cause a problem.

The second pass collapses adjacent ws tokens into a single blank, and
all runs of newlines into a single newline.  These operations are
separate, so inserting a newline in the first pass can *not* affect
the final ws tokens, and the presence or absence of ws tokens will
have no effect on the final newline tokens.
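
The two passes can be sketched roughly as follows.  This is only an
illustration, not Leo's actual filterTokens: the tokenizer and the names
tokenize, insert_newlines, collapse_whitespace and filter_tokens are
hypothetical, and the sketch assumes tokens are plain strings in which
a newline and a run of blanks/tabs are distinct kinds of token.

```python
import re

# Hypothetical tokenizer (an assumption, not Leo's): yields '</', '>',
# '\n', runs of non-newline whitespace, runs of other text, and a lone '<'.
TOKEN_RE = re.compile(r'</|>|\n|[^\S\n]+|[^<>\s]+|<')

def tokenize(s):
    return TOKEN_RE.findall(s)

def insert_newlines(tokens):
    """Pass 1: put a newline token before every '</' and after every '>'."""
    result = []
    for tok in tokens:
        if tok == '</':
            result.append('\n')
        result.append(tok)
        if tok == '>':
            result.append('\n')
    return result

def collapse_whitespace(tokens):
    """Pass 2: adjacent ws tokens become one blank; runs of newlines
    become one newline.  Each kind is collapsed only against its own kind."""
    result = []
    for tok in tokens:
        if tok == '\n':
            if not (result and result[-1] == '\n'):
                result.append('\n')
        elif tok.isspace():
            if not (result and result[-1] == ' '):
                result.append(' ')
        else:
            result.append(tok)
    return result

def filter_tokens(tokens):
    return collapse_whitespace(insert_newlines(tokens))

# Two layouts of the same markup filter to the same token list.
compact = filter_tokens(tokenize('<a>x</a>'))
spread = filter_tokens(tokenize('<a>\nx\n</a>\n'))
print(compact == spread)
```

With a filter like this, a comparison of the filtered tokens ignores
where the importer chose to break lines around tags, while a blank
between two elements still shows up as a ws token.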

Because filterTokens is *almost* perfect, it will be very rare for the
perfect import checks to give false positives (falsely claiming to
have imported the file perfectly) and it should be impossible for the
perfect import checks to give false negatives (falsely reporting
import errors).  If the (rare!) false positives ever become a problem
I can "perfect" the filterTokens code, but that would be a bit
expensive, so let's see how things work for now.

EKR

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To post to this group, send email to leo-editor@googlegroups.com.
To unsubscribe from this group, send email to 
leo-editor+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/leo-editor?hl=en.
