---------- Forwarded message ----------
Date: Wed, 17 Oct 2007 21:42:38 +0200
From: Franz Korntner <[EMAIL PROTECTED]>
To: John-Mark Bell <[EMAIL PROTECTED]>
Subject: Re: Source scrubbing

John,

I'm not sure if this is a private reply or if it gets posted on the mailing list.
I currently don't have time to look at your patch (it'll probably be next week when I get time), but 10000 lines is most likely too big to evaluate in one go. Is there any possibility that you could break it up into more manageable chunks?
Nearly all patch snippets are independent, so you can slice it into chunks as large as you like. The smaller patch file is an extract containing the most worthwhile changes. The patch consists largely of added typecasts and layout changes that fix the scope of inner enums. It also contains lots of explicit casts from float to int to mark loss of precision. In some places I made signed/unsigned, int/long and const usage uniform. Other things I found I'll keep for later, so as not to make the patch too complicated.

What I did bump into:
- Some const nasties with regard to tree nodes. This is because the text fields are sometimes populated by const strings, sometimes by 'shared strings' and sometimes by free()able memory. This makes the code very complex and breaks the const attribute. I really suggest that when the node is allocated, you allocate a few extra bytes and copy the title field into the node itself. This simplifies memory management, lets you drop the related code, and lets you be more const-strict.
- The rendering coordinate system is limping on two legs (ints and floats), and it feels like it has hidden side effects. I also believe I found some places (in CSS) where sizes and counts are getting mixed. I am currently looking into separating the two.
- In some places enums are being used instead of bitfields, resulting in lots of warnings.
- Another example: in CSS there is struct css_border_width, which contains a css_length and a percent field. Technically a percentage is not a length, but it seems to be mutually exclusive with the css_length. Broadening the concept of css_length might reduce both code and storage size.

Finally I constructed autoconf/automake templates as I require certain functionality not delivered by the supplied makefile. I suggest you include these files so that package/distribution builders can choose which system to use.

Are you suggesting that the existing build system be replaced? If so, that's a non-starter as it's highly unlikely that autotools will ever work on RISC OS.
No, keep the build system! What I suggest is to include the autoconf/automake templates so that they can be activated on request. The templates by themselves are harmless and do not stand in the way or interfere with the build system. I need them because of my non-standard environment.
For the same reasons I suggest you include precompiled versions of lemon/re2c generated files.

Currently, these are available from http://netsurf.strcprstskrzkrk.co.uk/developer/ -- they're recreated automatically as needed. The CSS parser is due for a major overhaul at some point in the relatively near future. I currently have no idea whether this will have an impact on the use of lemon/re2c.
I also maintain distributions. In general it's a pain to recreate intermediate files, and it's annoying when their contents are not even affected by the build/hosting environment. I have even encountered massive problems reproducing older packages because the required tools were no longer reproducible (or only with difficulty). These files do not require build-time regeneration.
Good to know.
I have a good feeling that DOM functionality can easily be injected into the current HTML parser.
The current layout engine cannot cope with the document changing.

That depends; whilst libxml already produces a tree which is fairly close to the W3C DOM, its HTML parser is particularly non-robust to real-world web content. Additionally, its architecture is not suited to handling injection of data into the document source stream (as required by certain scripting methods -- namely document.write()).
I formulated this one really poorly, as I actually meant something different. However, you did answer a question I had not yet asked. I'll return to this subject later.
I am looking for a small-footprint package and get nervous at the prospect of a complete and/or standalone DOM component being introduced. This might make things easier.
I'm not sure I understand this. Are you saying that a standalone DOM component is a good thing or not? Note that a standalone HTML parser and core DOM implementation are likely to be smaller than a libxml binary. Also, both the HTML parser and the DOM implementation will be standalone libraries.
I meant that libxml seems suitable enough for an implied DOM model, i.e. it seems overkill to maintain a DOM tree separate from libxml's own tree. I meant using libxml for the DOM.
The two libraries I currently favour are SpiderMonkey (which is what I think you meant, not SeaMonkey ;) and libsee. However, JavaScript work is some way down the line, so a decision on this hasn't happened yet.
I just installed a newer version of SeaMonkey; I guess that was echoing in my head. I haven't investigated libsee, but I did look at SpiderMonkey and it felt good. It has a compiler, an interpreter, object handling with the objects you would expect, and hooks for the DOM. I was surprised by the language's capabilities; it seems that JS is more mature than I imagined. I was also looking into the parser/scanner. Having separate scanner/parser combinations for JS and CSS seems silly. If you are looking for a lemon/re2c substitute, I don't know how well the SpiderMonkey scanner/parser can handle CSS. But as I said before, this is not a high priority, as I expect many unexpected nasties. If I have time to spare, I would prefer to get NetSurf through CSS compliance testing.

Franz.
