On 3/30/2011 6:39 PM, Toshio Kuratomi wrote:

Really, surrogates are a red herring to this whole issue.  The issue is that
the original code was trying to compare two different transformations of
byte sequences and expecting them to be equal.  Let's say that you have the
following byte value::
   b_test_value = b'\xa4\xaf'

This is something that's stored in a file or the filename of something on
a unix filesystem or stored in a database or any number of other things.
Now you want to compare that to another piece of data that you've read in
from somewhere outside of python.  You'd expect any of the following to
work::
   b_test_value == b_other_byte_value
   b_test_value.encode('utf-8', 'surrogateescape') == 
b_other_byte_value('utf-8', 'surrogateescape')
   b_test_value.encode('latin-1') == b_other_byte_value('latin-1')
   b_test_value.encode('euc_jp') == b_other_byte_value('euc_jp')

You wouldn't expect this to work::
   b_test_value.encode('latin-1') == b_other_byte_value('euc_jp')

Once you see that, you realize that the following is only a specific case of
the former, surrogateescape doesn't really matter::
   b_test_value.encode('utf-8', 'surrogateescape') == 
b_other_byte_value('euc_jp')

All the encodes above should be decodes instead. Aside from that. your point is correct, and not limited to CS. The whole art of disguise, for instance, is about effecting a transformation to falsely pass or fail an identity or equality comparison.

--
Terry Jan Reedy

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to