Re: robustness of html error handling and plucker
Blake Winton <[EMAIL PROTECTED]> wrote:
> You say that as if it's an either/or choice. If Eugene wants
> to work on making plucker more robust, who are we to stop him? [...]

Oh yes, plucker's Free Software, so if any new developer wants to concentrate on that, then go ahead. I just didn't want anyone having illusions about the difficulty. Tidy is not a new project and is still nowhere near fixing 100% of the crap out there. It's already possible to use tidy before plucker, so I also wonder how "itchy" the task is.

I agree that calling tidy as a filter is probably the right way to do it with the least effort. I'd be disappointed if existing (very busy, as far as I can tell) developers stopped working on the wonderful things we're seeing and started with that, though. Just my opinion as a counterweight to the original request, that's all.

On the point from lower down the thread: emailing webmasters of dud pages is probably worth an option. I don't really believe in options, but this would be a useful trick that you wouldn't always want to use.

-- 
MJR http://mjr.towers.org.uk/ IM: [EMAIL PROTECTED]
This is my home web site. This for Jabber Messaging.
How's my writing? Let me know via any of my contact details.
___
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
RE: robustness of html error handling and plucker
> > Having said that, perhaps a better idea would be to have
> > the parser automatically call tidy, if it's available, so
> > that we could leverage other people's work.
> I would prefer that to be a selective option, not mandatory.

With an entry in the config file. Sounds great to me.

> That being said, anything that makes Plucker better, I'm all
> for it, but we shouldn't "allow" those webmasters to continue
> to make garbage, invalid HTML, and assume that we, the tool
> authors, will just compensate for it.

And yet, that's what has to happen, unless you have some way of forcing content-producers to care about random technical issues. For more perspectives on the topic, I present:
http://diveintomark.org/archives/2003/01/22/parse_at_all_costs.html
which is about parsing XML, which is supposed to be less of a tag soup than HTML (which almost everyone has given up any hope of imposing structure on).

Along those lines, perhaps when we hit an invalid html page, we can email the owner of that page (using some reasonably simple scheme to figure out the address to mail to) letting them know that their page has failed, and they should probably fix it... The more people who use Plucker, the more email they'll get, the quicker they'll fix it.

The previous suggestion was made in jest. Well, maybe half in jest. Something similar which allowed the user to choose to send the mail or not, and even edit the mail before sending might not be such a bad idea.

Later,
Blake.
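As a rough illustration of the half-serious mail idea above, here is a minimal Python sketch. It only drafts a message and hands it back so the user can read it, edit it, or throw it away; nothing is sent. The function name `draft_webmaster_mail`, the message wording, and the choice of `webmaster@` plus the host name (with a leading `www.` dropped) as the "reasonably simple scheme" are all assumptions made for this example, not anything Plucker does.

```python
from email.message import EmailMessage
from urllib.parse import urlsplit


def draft_webmaster_mail(page_url, problems, sender):
    """Build (but never send) a report about invalid HTML on page_url."""
    host = urlsplit(page_url).hostname
    if not host:
        raise ValueError("cannot work out a host name from %r" % page_url)
    # Assumption: mail for the site is handled at the domain without "www.".
    if host.startswith("www."):
        host = host[4:]

    msg = EmailMessage()
    msg["From"] = sender
    # Assumption: the site has a conventional "webmaster" mailbox.
    msg["To"] = "webmaster@" + host
    msg["Subject"] = "Invalid HTML on " + page_url

    lines = [
        "The following page could not be parsed cleanly:",
        "",
        "  " + page_url,
        "",
        "Problems found:",
    ]
    lines.extend("  - " + problem for problem in problems)
    msg.set_content("\n".join(lines) + "\n")
    # The caller decides whether to show, edit, send or discard the draft.
    return msg
```

Returning the draft rather than sending it keeps the decision with the user, which is the variant suggested at the end of the message.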
RE: robustness of html error handling and plucker
> Having said that, perhaps a better idea would be to have the parser
> automatically call tidy, if it's available, so that we could leverage
> other people's work.

I would prefer that to be a selective option, not mandatory.

That being said, anything that makes Plucker better, I'm all for it, but we shouldn't "allow" those webmasters to continue to make garbage, invalid HTML, and assume that we, the tool authors, will just compensate for it.

d.
RE: robustness of html error handling and plucker
> Eugene Y. Vasserman <[EMAIL PROTECTED]> wrote:
> > how feasible it is to have plucker handle obvious html
> > errors "intelligently".
> It's a compromise: I'd rather Plucker developers spent time
> on improving functionality for real web sites (i.e. ones that
> are valid and follow guidelines) than tried to square the
> circle and understand the unintelligible. Wouldn't you?

You say that as if it's an either/or choice. If Eugene wants to work on making plucker more robust, who are we to stop him? The beauty of Open Source software like plucker is that everyone is a developer, or at least can be. As long as the patches don't introduce bugs, or break other bits, or take too long to run, I don't think anyone with commit access will complain too heavily about them.

Having said that, perhaps a better idea would be to have the parser automatically call tidy, if it's available, so that we could leverage other people's work. Then Eugene could spend his time making tidy better, and more people would benefit from his labour.

Later,
Blake.
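To make the "call tidy, if it's available" suggestion concrete, here is a minimal Python sketch of such a filter. It is not Plucker's parser code: the function name `maybe_tidy` and the `use_tidy` switch (standing in for whatever opt-in setting the real tool might expose) are hypothetical, and the 30-second timeout is an arbitrary choice. The command-line options used (`-q`, `-utf8`, `--show-warnings`, `--force-output`) are standard HTML Tidy options.

```python
import shutil
import subprocess


def maybe_tidy(html, use_tidy=False, which=shutil.which):
    """Return html cleaned by tidy, or html unchanged if tidy cannot be used."""
    if not use_tidy:
        return html  # the filter is opt-in
    tidy = which("tidy")
    if tidy is None:
        return html  # tidy is not installed: fall back to the raw page
    try:
        result = subprocess.run(
            [tidy, "-q", "-utf8", "--show-warnings", "no",
             "--force-output", "yes"],
            input=html.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=30,  # arbitrary limit so a hung tidy cannot stall a run
        )
    except (OSError, subprocess.TimeoutExpired):
        return html
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    if result.returncode not in (0, 1, 2) or not result.stdout:
        return html
    return result.stdout.decode("utf-8", errors="replace")
```

Every failure path returns the original markup, so a missing or misbehaving tidy leaves the parser no worse off than it is without the filter.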
Re: robustness of html error handling and plucker
Eugene Y. Vasserman <[EMAIL PROTECTED]> wrote:
> [...] how feasible it is to have plucker handle obvious html errors
> "intelligently". [...]

It's rather difficult to detect how to handle these "obvious" errors. Normally, it means that the site authors are depending on some display logic error of particular (groups of) browser(s).

It is possible to locally mirror pages with a tool like wget and then use tidy to correct some errors, but even that is not infallible. The tidy developers have put a lot of time into working out how to do it, so Plucker probably won't exceed that. It will still fail on some sites.

It's a compromise: I'd rather Plucker developers spent time on improving functionality for real web sites (i.e. ones that are valid and follow guidelines) than tried to square the circle and understand the unintelligible. Wouldn't you?

-- 
MJR http://mjr.towers.org.uk/ IM: [EMAIL PROTECTED]
This is my home web site. This for Jabber Messaging.
How's my writing? Let me know via any of my contact details.
robustness of html error handling and plucker
This is a general discussion question:

I've found a number of things reported as bugs that are in fact the fault of invalid/faulty html on websites (see bugs 444, 366, etc.). I was wondering how feasible it is to have plucker handle obvious html errors "intelligently". Yes, this is pandering to those who cannot be bothered to write correct html, but it is also very helpful to users who might just not care that the site is at fault and blame the software.

I'm looking for input and thoughts, and this is not yet a suggestion. ;)

Thanks
eugene

-- 
Eugene "Gman" Vasserman
[EMAIL PROTECTED]
http://uranium.cataclasm.org/~eugene/