In article <roy-df05da.11460324122...@news.panix.com>,
Roy Smith  <r...@panix.com> wrote:
>In article <rn%Bs.693798$nB6.605938@fx21.am4>,
> Alister <alister.w...@ntlworld.com> wrote:
>
>> Indeed, due to the poor quality of most websites, it is not possible to be
>> 100% accurate for all sites.
>>
>> Personally, I would start by checking the doctype and then the metadata, as
>> these should be quick and correct; I then use chardet only if these
>> fail to provide any result.
>
>I agree that checking the metadata is the right thing to do.  But, I
>wouldn't go so far as to assume it will always be correct.  There's a
>lot of crap out there with perfectly formed metadata which just happens
>to be wrong.
>
>Although it pains me greatly to quote Ronald Reagan as a source of
>wisdom, I have to admit he got it right with "Trust, but verify".  It's

Not surprisingly, as an actor, Reagan was as good as his script.
This one he got from Stalin.

>the only way to survive in the unicode world.  Write defensive code.
>Wrap try blocks around calls that might raise exceptions if the external
>data is borked w/r/t what the metadata claims it should be.

The way to go, of course.
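
For what it's worth, a minimal sketch of that "trust, but verify" idea,
using only the standard library. The function name, the fallback list and
the final latin-1 catch-all are my own assumptions for illustration, not
anything proposed earlier in the thread:

```python
def decode_trust_but_verify(raw, declared=None, fallbacks=("utf-8", "cp1252")):
    """Decode raw bytes, trying the declared encoding first.

    Returns a (text, encoding_used) tuple. The name and the fallback
    order are illustrative assumptions, not an established recipe.
    """
    candidates = ([declared] if declared else []) + list(fallbacks)
    for name in candidates:
        try:
            # Trust the label, but let the decoder verify it.
            return raw.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes contradict the label.
            # LookupError: the label is not a codec Python knows.
            continue
    # latin-1 maps every byte to a code point, so this cannot fail,
    # although the result may well be mojibake.
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    # Page claims UTF-8 but was actually saved as Latin-1.
    page = "café".encode("latin-1")
    text, used = decode_trust_but_verify(page, declared="utf-8")
    print(used, text)
```

Note that a decode which succeeds does not prove the label was right: a
single-byte codec such as latin-1 will happily accept UTF-8 bytes and hand
back garbage. The try block only catches the cases where the data is
outright impossible under the claimed encoding, so a detector such as
chardet can still be worth running when the result looks suspicious.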

Groetjes Albert
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

-- 
http://mail.python.org/mailman/listinfo/python-list
