Re: robustness of html error handling and plucker

2003-01-30 Thread MJ Ray
Blake Winton <[EMAIL PROTECTED]> wrote:
> You say that as if it's an either/or choice.  If Eugene wants
> to work on making plucker more robust, who are we to stop him? [...]

Oh yes, plucker's Free Software, so if any new developer wants to
concentrate on that, then go ahead.  I just didn't want anyone having
illusions about the difficulty.  Tidy is not a new project and still is
nowhere near fixing 100% of the crap out there.  It's already possible to
use tidy before plucker, so I also wonder how "itchy" the task is.

I agree that calling tidy as a filter is probably the right way to do it
with the least effort.  I'd be disappointed if existing (very busy, as far
as I can tell) developers stopped working on the wonderful things we're
seeing and started with that, though.  Just my opinion as a counterweight to
the original request, that's all.

On the point from lower down the thread: emailing webmasters of dud pages is
probably worth an option.  I don't really believe in options, but this would
be a useful trick that you wouldn't want to use every time.
 
-- 
MJR   http://mjr.towers.org.uk/   IM: [EMAIL PROTECTED]
  This is my home web site.   This for Jabber Messaging.

How's my writing? Let me know via any of my contact details.

___
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev



RE: robustness of html error handling and plucker

2003-01-29 Thread Blake Winton
> > Having said that, perhaps a better idea would be to have
> > the parser automatically call tidy, if it's available, so
> > that we could leverage other people's work.
> I would prefer that to be a selective option, not mandatory.

With an entry in the config file.  Sounds great to me.
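
A rough sketch of what such an entry could look like and how it might be
read.  The option names use_tidy and tidy_path are made up for illustration;
nothing in this thread says Plucker has them, and the section used is only
an assumption.

```python
import configparser

# Hypothetical config fragment: these option names are illustrative only.
SAMPLE = """\
[DEFAULT]
use_tidy = yes
tidy_path = /usr/bin/tidy
"""


def tidy_settings(config_text):
    """Return (enabled, path) for the hypothetical tidy options."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["DEFAULT"]
    # Off unless the user asks for it, so the filter stays optional.
    enabled = section.getboolean("use_tidy", fallback=False)
    path = section.get("tidy_path", fallback="tidy")
    return enabled, path
```

Defaulting to "off" keeps the behaviour selective rather than mandatory, as
requested above.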

> That being said, anything that makes Plucker better, I'm all
> for it, but we shouldn't "allow" those webmasters to continue
> to make garbage, invalid HTML, and assume that we, the tool
> authors, will just compensate for it.

And yet, that's what has to happen, unless you have some way
of forcing content-producers to care about random technical
issues.  For more perspectives on the topic, I present:
http://diveintomark.org/archives/2003/01/22/parse_at_all_costs.html
which is about parsing XML, which is supposed to be less of
a tag soup than HTML (which almost everyone has given up any
hope of imposing structure on).

Along those lines, perhaps when we hit an invalid html page,
we can email the owner of that page (using some reasonably
simple scheme to figure out the address to mail to) letting
them know that their page has failed, and they should probably
fix it...  The more people who use Plucker, the more email
they'll get, the quicker they'll fix it.

The previous suggestion was made in jest.  Well, maybe half
in jest.  Something similar which allowed the user to choose
to send the mail or not, and even edit the mail before sending
might not be such a bad idea.
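
One possible "reasonably simple scheme", sketched under assumptions: guess
webmaster@ plus the host name (a conventional mailbox, which may well not
exist for a given site) and only draft the message, so the user can edit or
discard it.  The function name draft_report is hypothetical and nothing here
sends mail.

```python
from email.message import EmailMessage
from urllib.parse import urlsplit


def draft_report(page_url, sender):
    """Draft (but do not send) a note to the guessed owner of page_url."""
    host = urlsplit(page_url).hostname
    if not host:
        raise ValueError("no host name in URL: " + page_url)
    # Assumption: mail for www.example.org is handled at example.org.
    if host.startswith("www."):
        host = host[4:]
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "webmaster@" + host
    msg["Subject"] = "Invalid HTML on " + page_url
    msg.set_content(
        "Hello,\n\n"
        "The page at " + page_url + " could not be parsed as valid HTML.\n"
        "You may want to run it through an HTML validator.\n"
    )
    return msg
```

The returned message can be printed for review with str(msg) before the
user decides whether to send it.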

Later,
Blake.




RE: robustness of html error handling and plucker

2003-01-29 Thread David A. Desrosiers

> Having said that, perhaps a better idea would be to have the parser
> automatically call tidy, if it's available, so that we could leverage
> other people's work.

I would prefer that to be a selective option, not mandatory.

That being said, anything that makes Plucker better, I'm all for it,
but we shouldn't "allow" those webmasters to continue to make garbage,
invalid HTML, and assume that we, the tool authors, will just compensate for
it.

d.





RE: robustness of html error handling and plucker

2003-01-29 Thread Blake Winton
> Eugene Y. Vasserman <[EMAIL PROTECTED]> wrote:
> > how feasible it is to have plucker handle obvious html
> > errors "intelligently".
> It's a compromise: I'd rather Plucker developers spent time 
> on improving functionality for real web sites (ie ones that
> are valid and follow guidelines) than tried to square the
> circle and understand the unintelligible.  Wouldn't you?

You say that as if it's an either/or choice.  If Eugene wants
to work on making plucker more robust, who are we to stop him?
The beauty of Open Source software like plucker is that everyone
is a developer, or at least can be.

As long as the patches don't introduce bugs, or break other
bits, or take too long to run, I don't think anyone with
commit access will complain too heavily about them.

Having said that, perhaps a better idea would be to have the
parser automatically call tidy, if it's available, so that we
could leverage other people's work.  Then Eugene could spend
his time making tidy better, and more people would benefit from
his labour.
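
A minimal sketch of "call tidy, if it's available", assuming the parser has
the page as a string and that a tidy executable may or may not be on the
PATH.  The function name tidy_filter is made up here; this is not Plucker's
actual code.

```python
import shutil
import subprocess


def tidy_filter(html, tidy_cmd="tidy"):
    """Return html cleaned by HTML Tidy, or html unchanged if tidy is unusable."""
    # Skip quietly when the tidy executable is not installed.
    if shutil.which(tidy_cmd) is None:
        return html
    try:
        result = subprocess.run(
            # -q: quiet; -utf8: UTF-8 in and out;
            # --force-output yes: emit markup even when errors were found.
            [tidy_cmd, "-q", "-utf8", "--force-output", "yes"],
            input=html.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return html
    # tidy exits non-zero for warnings and errors, so the exit status is
    # ignored; fall back to the original page only if nothing came out.
    cleaned = result.stdout.decode("utf-8", errors="replace")
    return cleaned if cleaned.strip() else html
```

Falling back to the untouched page means a missing or failing tidy never
makes things worse than they are today.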

Later,
Blake.




Re: robustness of html error handling and plucker

2003-01-29 Thread MJ Ray
Eugene Y. Vasserman <[EMAIL PROTECTED]> wrote:
> [...] how feasible it is to have plucker handle obvious html errors
> "intelligently". [...]

It's rather difficult to detect how to handle these "obvious" errors.
Normally, such an error means that the site's authors are depending on a
display logic error in particular (groups of) browser(s).  It is possible to
mirror pages locally with a tool like wget and then use tidy to correct some
errors, but even that is not infallible.  The tidy developers have put a lot
of time into the problem, so Plucker probably won't do better than tidy.  It
will still fail on some sites.
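
As an illustration of the mirror-then-tidy workflow, here is a small sketch
that only builds the tidy command for each HTML file in an already mirrored
directory.  It assumes the mirror was fetched beforehand (for example with
wget --mirror) and that tidy is installed; the helper name tidy_commands is
hypothetical.

```python
import os


def tidy_commands(mirror_dir):
    """Build one in-place tidy command per HTML file under mirror_dir."""
    commands = []
    for root, _dirs, files in os.walk(mirror_dir):
        for name in sorted(files):
            if name.lower().endswith((".html", ".htm")):
                # -q: quiet; -m: modify the file in place.
                commands.append(["tidy", "-q", "-m", os.path.join(root, name)])
    return commands
```

Each command can then be run with subprocess.run(cmd) before pointing
Plucker at the local copy.  Pages that tidy cannot repair will still fail.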

It's a compromise: I'd rather Plucker developers spent time on improving
functionality for real web sites (ie ones that are valid and follow
guidelines) than tried to square the circle and understand the
unintelligible.  Wouldn't you?





robustness of html error handling and plucker

2003-01-29 Thread Eugene Y. Vasserman
This is a general discussion question:
I've found a number of things reported as bugs that are in fact the
fault of invalid/faulty html on websites (see bugs 444, 366, etc.). I was
wondering how feasible it is to have plucker handle obvious html errors
"intelligently". Yes, this is pandering to those who cannot be bothered
to write correct html, but it is also very helpful to users who might
just not care that the site is at fault and blame the software. I'm
looking for input and thoughts, and this is not yet a suggestion. ;)
Thanks
eugene

-- 
Eugene "Gman" Vasserman
[EMAIL PROTECTED]
http://uranium.cataclasm.org/~eugene/