Re: PEP 249 Compliant error handling

MRAB Tue, 17 Oct 2017 13:05:53 -0700

On 2017-10-17 20:25, Israel Brewster wrote:

On Oct 17, 2017, at 10:35 AM, MRAB <[email protected]> wrote:
On 2017-10-17 18:26, Israel Brewster wrote:
I have written and maintain a PEP 249 compliant (hopefully) DB API for the 4D database, and I've run into a situation where corrupted string data from the database can cause the module to error out. Specifically, when decoding the string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 86-87: illegal UTF-16 surrogate" error. This makes sense, given that the string data got corrupted somehow, but the question is "what is the proper way to deal with this in the module?" Should I just throw an error on bad data? Or would it be better to set the errors parameter to something like "replace"? The former feels a bit more "proper" to me (there's an error here, so we throw an error), but leaves the end user dead in the water, with no way to retrieve *any* of the data (from that row at least, and perhaps any rows after it as well). The latter option sort of feels like sweeping the problem under the rug, but does at least leave an error character in the string to let them know there was an error, and will allow retrieval of any good data.
Of course, if this was in my own code I could decide on a case-by-case basis what the proper action is, but since this is a module that has to work in any situation, it's a bit more complicated.
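
For reference, the two options weighed above can be compared on a small made-up value. This is only a sketch: the bytes are constructed by hand (a lone low surrogate, U+DC00, between two valid runs) and are not taken from the 4D database.

```python
# Three valid characters, one lone low surrogate (U+DC00), three more valid characters.
raw = "abc".encode("utf-16-le") + b"\x00\xdc" + "def".encode("utf-16-le")

# errors="strict" (the default): decoding fails and the caller gets nothing.
try:
    raw.decode("utf-16-le")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    # exc.object holds the original bytes; exc.start/exc.end locate the bad span.
    print("strict:", exc.reason, "at bytes", exc.start, "to", exc.end - 1)

# errors="replace": the bad code unit becomes U+FFFD and the rest survives.
replaced = raw.decode("utf-16-le", errors="replace")
print("replace:", repr(replaced))
```

With "replace" the good text on both sides of the damage is kept, but the original bytes of the damaged code unit can no longer be recovered from the resulting string.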
If a particular text field is corrupted, then raising UnicodeDecodeError when trying to get the contents of that field as a Unicode string seems reasonable to me.
Is there a way to get the contents as a bytestring, or to get the contents with a different errors parameter, so that the user has the means to fix it (if it's fixable)?
That's certainly a possibility, if that behavior conforms to the DB API "standards". My concern on this front is that in my experience working with other PEP 249 modules (specifically psycopg2), I'm pretty sure that columns designated as type VARCHAR or TEXT are returned as strings (unicode in python 2, although that may have been a setting I used), not bytes. The other complication here is that the 4D database doesn't use the UTF-8 encoding typically found, but rather UTF-16LE, and I don't know how well this is documented. So not only is the bytes representation completely unintelligible for human consumption, I'm not sure the average end-user would know what decoding to use.
In the end though, the main thing in my mind is to maintain "standards" compatibility - I don't want to be returning bytes if all other DB API modules return strings, or vice versa for that matter. There may be some flexibility there, but as much as possible I want to conform to the majority/standard/whatever.

The average end-user might not know which encoding is being used, but providing a way to read the underlying bytes will give a more experienced user the means to investigate and possibly fix it: get the bytes, figure out what the string should be, update the field with the correctly decoded string using normal DB instructions.
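
A rough sketch of how that could look from the module's side, assuming a wrapper exception that carries the raw bytes. The names `TextDecodeError`, `decode_text` and the column name `notes` are made up for illustration and are not part of PEP 249 or of the 4D module; a real driver would probably derive such an exception from its own PEP 249 `DataError`.

```python
class TextDecodeError(Exception):
    """Hypothetical error that keeps the undecodable bytes available."""

    def __init__(self, column, raw, cause):
        super().__init__(f"cannot decode column {column!r}: {cause}")
        self.column = column
        self.raw = raw  # the bytes exactly as they came from the database


def decode_text(raw, column, errors="strict"):
    """Decode a UTF-16-LE text value; on failure, hand the raw bytes to the caller."""
    try:
        return raw.decode("utf-16-le", errors=errors)
    except UnicodeDecodeError as exc:
        raise TextDecodeError(column, raw, exc) from exc


# Usage: a hand-built corrupted value with a lone low surrogate in the middle.
corrupted = "abc".encode("utf-16-le") + b"\x00\xdc" + "def".encode("utf-16-le")
try:
    value = decode_text(corrupted, "notes")
except TextDecodeError as err:
    # An experienced user can inspect err.raw, work out what the text should
    # be, and write it back with an ordinary UPDATE through the cursor.
    value = err.raw.decode("utf-16-le", errors="replace")
```

The default stays strict, so text columns still come back as strings or fail loudly, and the bytes are only exposed on the error path.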

--
https://mail.python.org/mailman/listinfo/python-list
