[issue22549] bug in accessing bytes, inconsistent with normal strings and python 2.7
New submission from Kevin Hendricks:

Hi,

I am working on porting my ebook code from Python 2.7 to work with both Python 2.7 and Python 3.4 and have found the following inconsistency I think is a bug ...

KevinsiMac:~ kbhend$ python3
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> o = '123456789'
>>> o[-3]
'7'
>>> type(o[-3])
<class 'str'>
>>> type(o)
<class 'str'>

the above is what I expected but under python 3 for bytes you get the following instead:

>>> o = b'123456789'
>>> o[-3]
55
>>> type(o[-3])
<class 'int'>
>>> type(o)
<class 'bytes'>

When I compare this to Python 2.7 for both bytestrings and unicode I see the expected behaviour.

Python 2.7.7 (v2.7.7:f89216059edf, May 31 2014, 12:53:48)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> o = '123456789'
>>> o[-3]
'7'
>>> type(o[-3])
<type 'str'>
>>> type(o)
<type 'str'>
>>> o = u'123456789'
>>> o[-3]
u'7'
>>> type(o[-3])
<type 'unicode'>
>>> type(o)
<type 'unicode'>

I would consider this a bug as it makes it much harder to write python code that works on both python 2.7 and python 3.4

--
components: Interpreter Core
messages: 228348
nosy: kevinbhendricks
priority: normal
severity: normal
status: open
title: bug in accessing bytes, inconsistent with normal strings and python 2.7
type: behavior
versions: Python 2.7, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22549
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
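The difference reported above can be reproduced in a few lines. This is a minimal sketch that assumes only the behaviour shown in the session: indexing a bytes object gives an int on Python 3, while a length-one slice gives a one-element sequence on both Python 2.7 and Python 3, so slicing is one way to get the same result on both. The variable names are illustrative.

```python
data = b'123456789'

# Indexing: an int (55) on Python 3, the one-character str '7' on Python 2.7.
item = data[-3]

# Slicing: a length-one bytes object on Python 3 and a length-one str on
# Python 2.7, so a comparison against b'7' behaves the same on both.
piece = data[-3:-2]

print(repr(item))
print(piece == b'7')
```

The slice form costs a few extra characters at each call site but needs no version check.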
[issue22549] bug in accessing bytes, inconsistent with normal strings and python 2.7
Kevin Hendricks added the comment:

Thanks for letting me know this was expected behaviour. I see the same issue holds true while using:

    for c in b'0123456789':
        print(ord(c))

I ended up using slices nearly everyplace. Still ran into iterator issues. Horrible hack really.

I think I will spend some time reading the python dev archives to figure out how anyone could defend this approach.

FWIW, introducing a bytes class that works exactly like byte (non-unicode strings) in python 2.X but disallowing any automatic up-conversion to full unicode (like during concatenation), would have been a useful step.

I work on decoding binary formatted ebook files all of the time, and python 3's second class treatment of bytes makes no sense to me. Perfectly valid code can be written using only utf-8 and latin-1 encoded bytestrings with no need to upconvert to anything. It is practically impossible to support code like that in Python 3. Boggles the mind.

Thanks again for the fast response.

Kevin

--
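The loop quoted above fails on Python 3 because iterating over bytes already yields ints, and `ord()` rejects an int. The slice workaround mentioned in the comment can be wrapped once instead of repeated at every call site. A sketch only: `iter_bytes` is a hypothetical helper name, not something from the standard library.

```python
def iter_bytes(data):
    """Yield each byte of `data` as a length-one slice (hypothetical helper).

    On Python 3 each item is a length-one bytes object, on Python 2.7 a
    length-one str, so ord() accepts the items on both versions.
    """
    for i in range(len(data)):
        yield data[i:i + 1]


digits = b'0123456789'
codes = [ord(c) for c in iter_bytes(digits)]
print(codes)
```

On Python 3 alone the helper is unnecessary, since `list(digits)` already gives the integer values; it only earns its keep in code that must also run on 2.7.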
[issue10694] zipfile.py end of central directory detection not robust
Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

I have not looked at how other tools handle this. They could simply ignore what comes after a valid endrecdata is found, they could strip it out (truncate it) or make it into a final comment. I guess for simply unpacking a zip archive, all of these are equivalent (it goes unused).

But if you are copying a zip archive to another archive then ignoring it and truncating it may be safer in some sense (since you have no idea what this extra data is for and why it might be attached) but then you are not being faithful to the original but at the same time you do not want to create improper zip archives. If you change the extra data into a final comment, then at least none of the original data is actually lost (just moved slightly in the copied zip and protected as a comment) and could be recovered if it turns out to have been useful. With so many things using/making the zip archive format (jars, normal zips, epubs, etc) you never know what might have been left over at the end of the zip file and if it was important.

So I am not really sure how to deal with this properly. Also I know nothing about _EndRecData64 and if it needs to somehow be handled in a different way.

So someone who is very familiar with the code should look at this and tell us what is the right thing to do and even if the approach I took is correct (it works fine for me and I have taken to including my own zipfile.py in my own projects until this gets worked out) but it might not be the right thing to do.

As for a test case, I know nothing about that but will look at test_zipfile.py. I am a Mac OS X user/developer so all of my code is targeted to running on both Python 2.5 (Mac OS X 10.5.X) and Python 2.6 (Mac OS 10.6.X). Python 3.X and even Python 2.7 are not on my horizon and not even on my build machine (trying to get Mac OS X users to install either would be an experience in frustration). I simply looked at the source in Python 2.7 and Python 3.1.3 (from the official Python releases from python.org) to see that the problem still exists (and it does).

--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10694
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
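Of the three options weighed in the comment above, truncation is the simplest to show on raw bytes. The following is a sketch under stated assumptions, not code from zipfile.py: `strip_trailing_data` is a hypothetical helper, the record layout is the 22-byte fixed part of the end of central directory record, and, like the comment in zipfile.py, it assumes the magic number does not occur inside the comment or the trailing data.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b'PK\x05\x06'   # end of central directory magic number
EOCD_FORMAT = '<4s4H2LH'         # fixed part of the record
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)


def strip_trailing_data(raw):
    """Return `raw` cut off right after the declared archive comment."""
    start = raw.rfind(EOCD_SIGNATURE)
    if start < 0 or start + EOCD_SIZE > len(raw):
        raise ValueError('no complete end of central directory record')
    # The last field of the record is the declared comment length.
    comment_size = struct.unpack(EOCD_FORMAT, raw[start:start + EOCD_SIZE])[-1]
    end = start + EOCD_SIZE + comment_size
    if end > len(raw):
        raise ValueError('declared comment runs past the end of the data')
    return raw[:end]


# Build a small archive in memory and damage it the way the report describes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('hello.txt', 'hello')
original = buf.getvalue()
damaged = original + b'\r\n'

repaired = strip_trailing_data(damaged)
print(repaired == original)
```

Truncation discards the extra bytes, which is exactly the loss of fidelity the comment worries about when copying archives.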
[issue10694] zipfile.py end of central directory detection not robust
Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

Been programming on unix/vax and then linux since the mid 80s and on punch cards in the late 70s. Grew my first beard writing 8080 and Z80 assembler. All of that was over 30 years ago.

All I want to do is report a damn bug!

Then I get nudged for a test case (although how to recreate the bug was already described) so I add the two lines to create a test case.

Then I get nudged for a patch, so I give a patch even though there are many ways to deal with the issue.

Then I get nudged for patches for other branches, then I get nudged for official test_zipfile.py patches.

All of this **before** the damn owner has even bothered to look at it and say if he/she even wants it or if the patch is even correct.

I have my own working code for the epub ebook stuff I am working on so this issue no longer impacts me.

How did I go from a simple bug report to having to build python's latest checkout just to get someone to look at the bug.

You have got to be kidding me!

I am done here, do what you want with the bug.

--
[issue10694] zipfile.py end of central directory detection not robust
Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

Final patches against the trees make no sense as no developer has decided which way they want to actually handle the problem. My patch is only one way, and I note it may not be the way the owners of the code want.

Also, this patch is very straightforward (one hunk) and should apply to 2.6, 2.7, and 3.1 (although I have not tried it with 3.1) with only line offsets.

So if the owner of this code actually looks at the patch and the bug report, makes a decision on how they want to handle this issue, and likes the patch I have suggested, then I would be happy to diff it against whatever zipfile.py versions he/she wants.

--
[issue10694] zipfile.py end of central directory detection not robust
Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

Same problem exists in Python 3.1.3 and in later versions as well. Same patch should also work.

--
versions: +Python 3.1, Python 3.2
[issue10694] zipfile.py end of central directory detection not robust
Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

If you read the bug report it explains how to generate a testcase (i.e. append any data to the end of a zip archive).

Here it is as a step by step process:

1. simply take any working zip and call it testcase.zip

2. do the following:

    echo "\r\n" >> testcase.zip

If you run unzip -t on testcase.zip it will pass with flying colors and will properly unzip on every piece of zip software I have tried.

However if you try to use python to copy the zip archive to another zip archive:

    python ./zipfix.py testcase.zip junk.zip
    Error Occurred File is not a zip file

All because of the appended carriage return / linefeed at the end.

--
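The two steps above can also be scripted, which avoids depending on how a particular shell's `echo` treats `\r\n`. A sketch, assuming a throwaway archive is acceptable as the "working zip"; `make_testcase` and `can_open` are hypothetical helper names. Whether the archive is accepted depends on the Python version in use, so the script only reports the result.

```python
import os
import tempfile
import zipfile


def make_testcase(path):
    """Write a small valid zip to `path`, then append CR LF to it."""
    with zipfile.ZipFile(path, 'w') as zf:
        zf.writestr('hello.txt', 'hello')
    with open(path, 'ab') as f:
        f.write(b'\r\n')


def can_open(path):
    """Return True if zipfile accepts the archive, False if it rejects it."""
    try:
        with zipfile.ZipFile(path) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:  # spelled BadZipfile on Python 2
        return False


workdir = tempfile.mkdtemp()
testcase = os.path.join(workdir, 'testcase.zip')
make_testcase(testcase)
print(can_open(testcase))
```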
[issue10694] zipfile.py end of central directory detection not robust
Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

Here is one potential patch. It simply incorporates any non-comment extraneous data into a final comment so that nothing is lost.

Another solution, which might be safer, would be to truncate the zip archive immediately after the endrec is found if the extraneous data is not a properly formatted comment.

The right solution is obviously up to the developers of zipfile.py.

--- zipfile_orig.py 2010-12-14 10:23:58.0 -0500
+++ zipfile.py 2010-12-14 10:30:21.0 -0500
@@ -228,6 +228,13 @@
                 # structure present, so go look for it
                 return _EndRecData64(fpin, start - filesize, endrec)
             return endrec
+        else :
+            # be robust to non-comment extraneous data after endrec
+            # by making it a comment so that nothing is ever lost
+            endrec[_ECD_COMMENT_SIZE] = len(comment)
+            endrec.append(comment)
+            endrec.append(maxCommentStart + start)
+            return endrec
 
     # Unable to find a valid end of central directory structure
     return

--
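The same idea can be applied to the archive bytes themselves rather than to the in-memory record. The sketch below is not the patch above: `absorb_trailing_data` is a hypothetical helper that rewrites the two-byte comment-length field so that everything after the fixed record becomes the archive comment. It assumes the magic number does not occur in the trailing data and that the trailing data fits in the 16-bit length field.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b'PK\x05\x06'   # end of central directory magic number
EOCD_SIZE = 22                   # size of the fixed part of the record


def absorb_trailing_data(raw):
    """Return a copy of `raw` whose trailing bytes are declared as comment."""
    start = raw.rfind(EOCD_SIGNATURE)
    if start < 0 or start + EOCD_SIZE > len(raw):
        raise ValueError('no complete end of central directory record')
    comment = raw[start + EOCD_SIZE:]
    if len(comment) > 0xFFFF:
        raise ValueError('trailing data too long for an archive comment')
    # The comment length occupies the last two bytes of the fixed record.
    size_field = start + EOCD_SIZE - 2
    return raw[:size_field] + struct.pack('<H', len(comment)) + comment


buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('hello.txt', 'hello')
damaged = buf.getvalue() + b'\r\n'

fixed = absorb_trailing_data(damaged)
with zipfile.ZipFile(io.BytesIO(fixed)) as zf:
    print(repr(zf.comment), zf.namelist())
```

Nothing is discarded this way, which matches the stated goal of the patch, at the price of turning unknown bytes into a visible comment.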
[issue10694] zipfile.py end of central directory detection not robust
Changes by Kevin Hendricks kevin.hendri...@sympatico.ca:

--
keywords: +patch
versions: +Python 2.5, Python 2.6
Added file: http://bugs.python.org/file20040/more_robust.diff
[issue10694] zipfile.py end of central directory detection not robust
New submission from Kevin Hendricks kevin.hendri...@sympatico.ca:

The current version of zipfile.py is not robust to slight errors at the end of zip archives. Many file servers **improperly** append a new line to the end of files that do not have a new line when they are uploaded from a browser. This bug ends up adding 0x0d 0x0a to the end of the zip archive. This in turn makes zipfile.py eventually throw a "Not a zip file" exception when no other zip tools seem to have trouble with them. Even unzip -t passes these problem zip archives with flying colours.

I hate to have to extract and create my own zipfile.py script just to be robust to zip archives that are commonly found on the net and that are handled more robustly by other software.

So please consider changing the code from _EndRecData below to simply ignore any trailing data after the proper stringEndArchive and structEndArchive are found, instead of looking for the comment, verifying that it is properly formatted, and throwing an exception if it is not. Ignoring the comment seems to be more robust in this case as everything needed to unpack the zip archive has been found.

    # Either this is not a ZIP file, or it is a ZIP file with an archive
    # comment. Search the end of the file for the end of central directory
    # record signature. The comment is the last item in the ZIP file and may be
    # up to 64K long. It is assumed that the end of central directory magic
    # number does not appear in the comment.
    maxCommentStart = max(filesize - (1 << 16) - sizeEndCentDir, 0)
    fpin.seek(maxCommentStart, 0)
    data = fpin.read()
    start = data.rfind(stringEndArchive)
    if start >= 0:
        # found the magic number; attempt to unpack and interpret
        recData = data[start:start+sizeEndCentDir]
        endrec = list(struct.unpack(structEndArchive, recData))
        comment = data[start+sizeEndCentDir:]
        # check that comment length is correct
        if endrec[_ECD_COMMENT_SIZE] == len(comment):
            # Append the archive comment and start offset
            endrec.append(comment)
            endrec.append(maxCommentStart + start)
            if endrec[_ECD_OFFSET] == 0xffffffff:
                # There is apparently a Zip64 end of central directory
                # structure present, so go look for it
                return _EndRecData64(fpin, start - filesize, endrec)
            return endrec

This will in turn make the Python implementation of zipfile.py more robust to data improperly appended when some zip archives are uploaded or downloaded (similar to how other zip tools handle this issue).

Thank you for your time and consideration.

--
messages: 123891
nosy: KevinH
priority: normal
severity: normal
status: open
title: zipfile.py end of central directory detection not robust
type: behavior
versions: Python 2.7
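The check that rejects these archives compares the comment length declared in the end of central directory record with the number of bytes that actually follow the record. A small diagnostic makes the mismatch visible. This is a sketch: `comment_sizes` is a hypothetical helper, and it relies on the same assumption as the quoted code, namely that the magic number does not appear after the record.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b'PK\x05\x06'   # end of central directory magic number
EOCD_FORMAT = '<4s4H2LH'         # fixed part of the record
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)


def comment_sizes(raw):
    """Return (declared, actual) comment lengths for the archive bytes."""
    start = raw.rfind(EOCD_SIGNATURE)
    if start < 0 or start + EOCD_SIZE > len(raw):
        raise ValueError('no complete end of central directory record')
    # Declared length: the last field of the fixed record.
    declared = struct.unpack(EOCD_FORMAT, raw[start:start + EOCD_SIZE])[-1]
    # Actual length: every byte that follows the fixed record.
    actual = len(raw) - (start + EOCD_SIZE)
    return declared, actual


buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('hello.txt', 'hello')
original = buf.getvalue()
damaged = original + b'\r\n'

print(comment_sizes(original))
print(comment_sizes(damaged))
```

A strict equality test on the two numbers fails for the damaged archive, while accepting any `actual >= declared` and reading only `declared` bytes of comment is the "ignore trailing data" behaviour requested in the report.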