2010/12/20 Doug Winter <doug.win...@isotoma.com>:
> On 20/12/10 17:53, Alec Battles wrote:
>>>>
>>>> I seem to remember that 'file' in Linux detects encodings, but it's
>>>> also a matter of calling it by the exact same name...
>>>
>>> There is no foolproof way of detecting encoding unfortunately - you just
>>> need to know what it is before you read the file.
>>
>> That's interesting. I wonder if there's a mathematical proof of the
>> 'undecidability' of text encodings.
>
> Hofstadter describes the problem in Gödel, Escher, Bach as the "Envelope
> Problem" IIRC - you need to have some idea of how to decode any message you
> are sent, and you even need to understand that it is a "message".
>
> UNIX manages the latter for us by providing a filename - but how to
> interpret the contents is entirely up to you.  It might be UTF-8, it might
> be a jpeg, it might be encrypted using AES.  You need to know what to expect
> to try and interpret the contents.
>
> I bet there is a name for this (although probably not a proof), but I don't
> know what it is ;)
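
A quick way to see why "it decoded without errors" proves nothing: the same
bytes can be valid under more than one encoding. A minimal Python 3 sketch
(the sample word and variable names are arbitrary, chosen for illustration):

```python
# The same byte sequence is valid under several encodings, so a
# decode that succeeds does not tell you which encoding was meant.
data = "café".encode("utf-8")  # b'caf\xc3\xa9'

as_utf8 = data.decode("utf-8")      # 'café'
as_latin1 = data.decode("latin-1")  # 'cafÃ©'
as_cp1252 = data.decode("cp1252")   # 'cafÃ©'

# Latin-1 maps every one of the 256 byte values to a character,
# so decoding *any* byte string with it never raises an error.
anything = bytes(range(256)).decode("latin-1")

print(as_utf8, as_latin1, as_cp1252, len(anything))
```

All three decodes succeed, and only outside knowledge tells you the first
one is what the writer meant.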

You could also apply heuristics for well-known message domains, but they
would produce false negatives or false positives. For example, you could
pattern-match on the initial contents of the file.
This problem reminds me of what we did several years ago on intrusion
detection systems.
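
Pattern matching on the first bytes of a file can be sketched as below, in
the spirit of what file(1) does with its magic database. The names
`SIGNATURES` and `sniff` are hypothetical; the byte signatures themselves are
the standard PNG, PDF, JPEG and gzip magic numbers and the Unicode byte order
marks. It is a heuristic and shows both failure modes: a UTF-16 LE file whose
first character is U+0000 starts with FF FE 00 00 and is reported as UTF-32 LE
(a false positive), and a UTF-8 prefix cut in the middle of a multi-byte
character is reported as "unknown" (a false negative).

```python
# Hypothetical heuristic: guess what a file holds from its first bytes.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"\xff\xfe\x00\x00", "utf-32-le text"),
    (b"\x00\x00\xfe\xff", "utf-32-be text"),
    (b"\xef\xbb\xbf", "utf-8 text (with BOM)"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\xff\xfe", "utf-16-le text"),
    (b"\xfe\xff", "utf-16-be text"),
    (b"\x1f\x8b", "gzip"),
]


def sniff(head: bytes) -> str:
    """Guess the content type of `head`, the first bytes of a file."""
    # Try longer signatures first, so the UTF-32 LE BOM (FF FE 00 00)
    # wins over the UTF-16 LE BOM (FF FE) that it starts with.
    for magic, label in sorted(SIGNATURES, key=lambda s: len(s[0]), reverse=True):
        if head.startswith(magic):
            return label
    # No signature matched: if the bytes are valid UTF-8 (which includes
    # plain ASCII), guess UTF-8 text. A prefix that ends mid-character
    # fails this check too, which is one source of false negatives.
    try:
        head.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8 text (guess)"


def sniff_file(path: str, n: int = 16) -> str:
    """Read the first `n` bytes of the file at `path` and sniff them."""
    with open(path, "rb") as f:
        return sniff(f.read(n))
```

Usage would be something like `sniff_file("photo.jpg")`, where the path is
just a placeholder.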
Best Regards,
Giorgio.
-- 
I want to be the ray of sunshine that wakes you every day
to make you breathe and live in me.
"Favola -Moda".
_______________________________________________
python-uk mailing list
python-uk@python.org
http://mail.python.org/mailman/listinfo/python-uk
