Re: Source scrubbing (fwd)

John-Mark Bell Wed, 17 Oct 2007 13:09:35 -0700


---------- Forwarded message ----------
Date: Wed, 17 Oct 2007 21:42:38 +0200
From: Franz Korntner <[EMAIL PROTECTED]>
To: John-Mark Bell <[EMAIL PROTECTED]>
Subject: Re: Source scrubbing

John,

I'm not sure if this is a private reply or if it gets posted on the mailing list.

I currently don't have time to look at your patch (it'll probably be next week when I get time), but 10000 lines is most likely too big to evaluate in one go. Is there any possibility that you could break it up into more manageable chunks?

Nearly all patch snippets are independent, so you can slice it into chunks as large as you like. The smaller patch file is an extract containing the cherries. The patch consists largely of adding typecasts and layout changes to fix the scope of inner enums. It also contains lots of explicit casts from float to int to mark loss of precision. In some places I made signed/unsigned, int/long and const usage uniform. Other stuff I found I'll keep for later so as not to make the patch too complicated.
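
To illustrate the kind of explicit float-to-int cast described above, here is a minimal sketch. The helper name px_from_float is hypothetical and not taken from the patch or from NetSurf; it only shows how a cast marks a deliberate loss of precision.

```c
#include <assert.h>

/* Hypothetical helper (not from the patch): convert a float
 * coordinate to an integer pixel position.  The explicit cast
 * documents that the fractional part is dropped on purpose and
 * silences compiler warnings about implicit conversion. */
static int px_from_float(float x)
{
    return (int) x;   /* truncates toward zero */
}
```

Note that a C cast truncates toward zero rather than rounding, so callers that need rounding have to say so explicitly.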


What I did bump into:

- Some const nasties with regard to tree nodes. This is because the text fields are sometimes populated by const strings, sometimes 'shared strings' and sometimes free()able. This makes the code very complex and breaks the const attribute. I really suggest that when the node is allocated, you allocate a couple of bytes more and copy the title field into the node itself. This relaxes management and you can drop the related code and be more const strict.
- The rendering coordinate system is limping on two legs (ints and floats) and it feels like it's having hidden side-effects. I also believe I found some points (in CSS) where sizes and counts are getting mixed. I am currently looking into getting these two separated.
- At some places enums are being used instead of bitfields, resulting in lots of warnings.
- For example, in CSS there is the struct css_border_width; it contains css_length and the field percent. Technically it is not a length, but it seems mutually exclusive with css_length. It might seem that broadening the concept of css_length might optimize both code and storage size.
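
The suggestion to copy the title into the node itself could be sketched roughly as follows, assuming a C99 compiler. The struct tree_node layout and the helper names are hypothetical and do not reflect NetSurf's actual tree code; the point is only that one allocation owns both node and title.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; a real tree node would carry more fields. */
struct tree_node {
    struct tree_node *next;
    char title[];              /* C99 flexible array member, stored inline */
};

/* Allocate a node with room for a private copy of the title.
 * The caller's string may be const, shared or heap-allocated;
 * the node never keeps a pointer to it. */
static struct tree_node *tree_node_create(const char *title)
{
    assert(title != NULL);

    size_t len = strlen(title) + 1;            /* include the terminator */
    struct tree_node *n = malloc(sizeof *n + len);
    if (n == NULL)
        return NULL;

    n->next = NULL;
    memcpy(n->title, title, len);
    return n;
}

/* Read-only access; the title lives exactly as long as the node. */
static const char *tree_node_title(const struct tree_node *n)
{
    return n->title;
}
```

With this layout a single free() on the node releases the title as well, so no per-node bookkeeping is needed to remember whether the text may be freed.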

Finally I constructed autoconf/automake templates as I require certain functionality not delivered by the supplied makefile. I suggest you include these files so that package/distribution builders can choose which system to use.
Are you suggesting that the existing build system be replaced? If so, that's a non-starter as it's highly unlikely that autotools will ever work on RISC OS.

No, keep the build system! What I suggest is to include the autoconf/automake templates so that they can be activated on request. The templates by themselves are harmless and do not stand in the way or interfere with the build system. I need them because of my non-standard environment.

For the same reasons I suggest you include precompiled versions of lemon/re2c generated files.
Currently, these are available from http://netsurf.strcprstskrzkrk.co.uk/developer/ -- they're recreated automatically as needed. The CSS parser is due for a major overhaul at some point in the relatively near future. I currently have no idea whether this will have an impact on the use of lemon/re2c.

I also maintain distributions. In general it's a pain to recreate intermediate files, and it's annoying when their contents are not affected by the build/hosting environment. I have even encountered massive problems reproducing older packages because the required tools were not (or only with difficulty) reproducible. These files do not require build-time regeneration.

Good to know.
I have a good feeling that DOM functionality can easily be injected in the current HTML parser.
The current layout engine cannot cope with the document changing.
That depends; whilst libxml already produces a tree which is fairly close to the W3C DOM, its HTML parser is particularly non-robust to real-world web content. Additionally, its architecture is not suited to handling injection of data into the document source stream (as required by certain scripting methods -- namely document.write()).

I formulated this one really poorly as I actually meant something different. However, you did answer a question I had not yet asked. I'll return to this subject later.

I am looking for a small-footprint package and get nervous at the prospect of the introduction of a complete and/or standalone DOM component. This might make things easier.
I'm not sure I understand this. Are you saying that a standalone DOM component is a good thing or not? Note that a standalone HTML parser and core DOM implementation is likely to be smaller than a binary of libxml. Note that both the HTML parser and DOM implementation will be standalone libraries.

I meant that libxml seems suitable enough for an implied DOM model, i.e. it seems overdone to maintain a DOM tree separate from libxml. I meant using libxml for the DOM.

The two libraries I currently favour are SpiderMonkey (which is what I think you meant, not SeaMonkey ;) and libsee. However, JavaScript work is some way down the line, so a decision on this hasn't happened yet.

I just installed a newer version of SeaMonkey; I guess that was echoing in my head. I haven't investigated libsee but I did look at SpiderMonkey and it felt good. It has a compiler, interpreter and object handling with the objects you expect and hooks for DOM. I was surprised by the language capabilities. Seems that JS is more mature than I imagined. I was also looking into the parser/scanner. Having separate combos for JS and CSS seems silly. If you are looking for a lemon/re2c substitute, I don't know how well the SpiderMonkey scanner/parser can handle CSS. But as I said before, this has no high priority as I expect many unexpected nasties. If I have time to spare, I prefer to get NetSurf through CSS compliance testing.


Franz.
