[issue22549] bug in accessing bytes, inconsistent with normal strings and python 2.7

2014-10-03 Thread Kevin Hendricks

New submission from Kevin Hendricks:

Hi,

I am working on porting my ebook code from Python 2.7 to work with both Python 
2.7 and Python 3.4 and have found the following inconsistency I think is a bug 
...

KevinsiMac:~ kbhend$ python3
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> o = '123456789'
>>> o[-3]
'7'
>>> type(o[-3])
<class 'str'>
>>> type(o)
<class 'str'>

the above is what I expected but under python 3 for bytes you get the following 
instead:

>>> o = b'123456789'
>>> o[-3]
55
>>> type(o[-3])
<class 'int'>
>>> type(o)
<class 'bytes'>


When I compare this to Python 2.7 for both bytestrings and unicode I see the 
expected behaviour. 

Python 2.7.7 (v2.7.7:f89216059edf, May 31 2014, 12:53:48) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> o = '123456789'
>>> o[-3]
'7'
>>> type(o[-3])
<type 'str'>
>>> type(o)
<type 'str'>

>>> o = u'123456789'
>>> o[-3]
u'7'
>>> type(o[-3])
<type 'unicode'>
>>> type(o)
<type 'unicode'>


I would consider this a bug as it makes it much harder to write python code 
that works on both python 2.7 and python 3.4

--
components: Interpreter Core
messages: 228348
nosy: kevinbhendricks
priority: normal
severity: normal
status: open
title: bug in accessing bytes, inconsistent with normal strings and python 2.7
type: behavior
versions: Python 2.7, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22549
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22549] bug in accessing bytes, inconsistent with normal strings and python 2.7

2014-10-03 Thread Kevin Hendricks

Kevin Hendricks added the comment:

Thanks for letting me know this was expected behaviour.  I see the same issue 
holds true while using:

for c in b'0123456789':
   print(ord(c))
 
I ended up using slices nearly everyplace.  Still ran into iterator issues.  
Horrible hack really.  
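
For readers hitting the same problem, here is a minimal sketch of the slice-based workaround mentioned above. It assumes the code only needs one-byte values; the helper name iter_bytes is made up for illustration and is not part of any library. A length-1 slice gives a bytestring on both Python 2.7 and Python 3, whereas plain indexing and plain iteration give ints on Python 3.

```python
data = b'123456789'

# A length-1 slice is a bytestring on both 2.7 and 3.x,
# unlike data[-3], which is an int on Python 3.
third_from_end = data[-3:-2]

# If integer values are what is wanted, bytearray indexing
# gives ints on both 2.7 and 3.x.
value = bytearray(data)[-3]


def iter_bytes(buf):
    """Yield each byte of buf as a length-1 bytestring (hypothetical helper)."""
    for i in range(len(buf)):
        yield buf[i:i + 1]


# ord() accepts a length-1 bytestring on both versions.
codes = [ord(c) for c in iter_bytes(b'0123456789')]
```

Whether this counts as a hack or as the intended idiom is a matter of taste; it is simply one way to keep a single code base running on both interpreters.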

I think I will spend some time reading the python dev archives to figure out 
how anyone could defend this approach.

FWIW, introducing a bytes class that works exactly like byte (non-unicode) 
strings in python 2.X, but disallowing any automatic up-conversion to full 
unicode (like during concatenation), would have been a useful step.  

I work on decoding binary formatted ebook files all of the time, and python 3's 
second class treatment of bytes makes no sense to me.  Perfectly valid code can 
be written using only utf-8 and latin-1 encoded bytestrings with no need to 
upconvert to anything.  It is practically impossible to support code like that 
in Python 3.

Boggles the mind.

Thanks again for the fast response.

Kevin

--




[issue10694] zipfile.py end of central directory detection not robust

2010-12-20 Thread Kevin Hendricks

Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

I have not looked at how other tools handle this.  They could simply ignore 
what comes after a valid endrecdata is found, they could strip it out (truncate 
it) or make it into a final comment.  I guess for simply unpacking a zip 
archive, all of these are equivalent (it goes unused).

But if you are copying a zip archive to another archive then ignoring it and 
truncating it may be safer in some sense (since you have no idea what this 
extra data is for and why it might be attached) but then you are not being 
faithful to the original but at the same time you do not want to create 
improper zip archives.  If you change the extra data into a final comment, then 
at least none of the original data is actually lost (just moved slightly in the 
copied zip and protected as a comment) and could be recovered if it turns out 
to have been useful.  With so many things using/making the zip archive format 
(jars, normal zips, epubs, etc) you never know what might have been left over 
at the end of the zip file and if it was important.

So I am not really sure how to deal with this properly.  Also I know nothing 
about _EndRecData64 and if it needs to somehow be handled in a different way.

So someone who is very familiar with the code should look at this and tell us 
what is the right thing to do and even if the approach I took is correct (it 
works fine for me and I have taken to including my own zipfile.py in my own 
projects until this gets worked out) but it might not be the right thing to do.

As for a test case, I know nothing about that but will look at test_zipfile.py. 
I am a Mac OS X user/developer so all of my code is targeted to running on 
both Python 2.5 (Mac OS X 10.5.X) and Python 2.6 (Mac OS X 10.6.X). Python 3.X 
and even Python 2.7 are not on my horizon and not even on my build machine 
(trying to get Mac OS X users to install either would be an experience in 
frustration). I simply looked at the source in Python 2.7 and Python 3.1.3 
(from the official Python releases from python.org) to see that the problem 
still exists (and it does).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10694
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10694] zipfile.py end of central directory detection not robust

2010-12-20 Thread Kevin Hendricks

Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

Been programming on unix/vax and then linux since the mid 80s and on punch 
cards in the late 70s.  Grew my first beard writing 8080 and Z80 assembler.  
All of that was over 30 years ago. 

All I want to do is report a damn bug!

Then I get nudged for a test case (although how to recreate the bug was already 
described) so I add the two lines to create a test case.  

Then I get nudged for a patch, so I give a patch even though there are many 
ways to deal with the issue.

Then I get nudged for patches for other branches, then I get nudged for 
official test_zipfile.py patches.

All of this **before** the damn owner has even bothered to look at it and say 
if he/she even wants it or if the patch is even correct.

I have my own working code for the epub ebook stuff I am working on so this 
issue no longer impacts me.

How did I go from a simple bug report to having to build python's latest 
checkout just to get someone to look at the bug?

You have got to be kidding me!

I am done here,  do what you want with the bug.

--




[issue10694] zipfile.py end of central directory detection not robust

2010-12-19 Thread Kevin Hendricks

Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

Final patches against the trees make no sense as no developer has decided which 
way they want to actually handle the problem. 

My patch is only one way and I note it may not be the way the owners of the 
code want.

Also, this patch is very straightforward (one hunk) and should apply to 2.6, 
2.7, and 3.1 (although I have not tried it with 3.1) with only line offsets.

So if the owner of this code actually looks at the patch and the bug report 
and makes a decision on how they want to handle this issue and they like the 
patch I have suggested, then I would be happy to diff it against whatever 
zipfile.py versions he/she wants.

--




[issue10694] zipfile.py end of central directory detection not robust

2010-12-18 Thread Kevin Hendricks

Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

Same problem exists in Python 3.1.3 and in later versions as well.

Same patch should also work.

--
versions: +Python 3.1, Python 3.2




[issue10694] zipfile.py end of central directory detection not robust

2010-12-14 Thread Kevin Hendricks

Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

If you read the bug report it explains how to generate a testcase (i.e. append 
any data to the end of a zip archive)

Here it is as a step by step process 

1. simply take any working zip and call it testcase.zip 

2. do the following:

echo "\r\n" >> testcase.zip

If you run unzip -t on testcase.zip it will pass with flying colors and will 
properly unzip on every piece of zip software I have tried.

However if you try to use python to copy the zip archive to another zip archive

python ./zipfix.py testcase.zip junk.zip
Error Occurred  File is not a zip file

All because of the appended carriage return / linefeed at the end.
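
The same test case can be sketched in Python without touching the file system. This is only an illustration under stated assumptions: it targets Python 3 (io.BytesIO, zipfile.BadZipFile), and the helper names make_testcase and opens_cleanly are made up here. It deliberately does not claim what any given zipfile.py version will answer for the padded archive; it only reports the result.

```python
import io
import zipfile


def make_testcase(trailing=b"\r\n"):
    """Build a small valid zip archive in memory and append `trailing` to it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue() + trailing


def opens_cleanly(data):
    """Return True if zipfile accepts the archive bytes, False if it rejects them."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


if __name__ == "__main__":
    print("clean archive accepted: ", opens_cleanly(make_testcase(b"")))
    print("padded archive accepted:", opens_cleanly(make_testcase()))
```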

--




[issue10694] zipfile.py end of central directory detection not robust

2010-12-14 Thread Kevin Hendricks

Kevin Hendricks kevin.hendri...@sympatico.ca added the comment:

Here is one potential patch.  It simply incorporates any non-comment extraneous 
data into a final comment so that nothing is lost.  Another solution that might 
be safer, would be to truncate the zip archive immediately after the endrec is 
found if the extraneous data is not a properly formatted comment.  The right 
solution is obviously up to the developers of zipfile.py

--- zipfile_orig.py 2010-12-14 10:23:58.0 -0500
+++ zipfile.py  2010-12-14 10:30:21.0 -0500
@@ -228,6 +228,13 @@
                 # structure present, so go look for it
                 return _EndRecData64(fpin, start - filesize, endrec)
             return endrec
+        else :
+            # be robust to non-comment extraneous data after endrec
+            # by making it a comment so that nothing is ever lost
+            endrec[_ECD_COMMENT_SIZE] = len(comment)
+            endrec.append(comment)
+            endrec.append(maxCommentStart + start)
+            return endrec
 
     # Unable to find a valid end of central directory structure
     return

--




[issue10694] zipfile.py end of central directory detection not robust

2010-12-14 Thread Kevin Hendricks

Changes by Kevin Hendricks kevin.hendri...@sympatico.ca:


--
keywords: +patch
versions: +Python 2.5, Python 2.6
Added file: http://bugs.python.org/file20040/more_robust.diff




[issue10694] zipfile.py end of central directory detection not robust

2010-12-13 Thread Kevin Hendricks

New submission from Kevin Hendricks kevin.hendri...@sympatico.ca:

The current version of zipfile.py is not robust to slight errors at the end of 
zip archives.  Many file servers **improperly** append a new line to the end of 
files that do not have a new line when they are uploaded from a browser.  This 
bug ends up adding 0x0d 0x0a to the end of the zip archive.  This in turn makes 
zipfile.py eventually throw a "Not a zip file" exception when no other zip 
tools seem to have trouble with them.  Even unzip -t passes these problem zip 
archives with flying colours.

I hate to have to extract and create my own zipfile.py script just to be robust 
to zip archives that are commonly found on the net and that are handled more 
robustly by other software.

So please consider changing this code from _EndRecData below to simply ignore 
any trailing data after the proper stringEndArchive and structEndArchive are 
found instead of looking for the comment and verifying if the comment is 
properly formatted and throwing an exception if not correct.  Ignoring the 
comment seems to be more robust in this case as everything needed to unpack 
the zip archive has been found.


    # Either this is not a ZIP file, or it is a ZIP file with an archive
    # comment.  Search the end of the file for the "end of central directory"
    # record signature. The comment is the last item in the ZIP file and may be
    # up to 64K long.  It is assumed that the "end of central directory" magic
    # number does not appear in the comment.
    maxCommentStart = max(filesize - (1 << 16) - sizeEndCentDir, 0)
    fpin.seek(maxCommentStart, 0)
    data = fpin.read()
    start = data.rfind(stringEndArchive)
    if start >= 0:
        # found the magic number; attempt to unpack and interpret
        recData = data[start:start+sizeEndCentDir]
        endrec = list(struct.unpack(structEndArchive, recData))
        comment = data[start+sizeEndCentDir:]
        # check that comment length is correct
        if endrec[_ECD_COMMENT_SIZE] == len(comment):
            # Append the archive comment and start offset
            endrec.append(comment)
            endrec.append(maxCommentStart + start)
            if endrec[_ECD_OFFSET] == 0xffffffff:
                # There is apparently a Zip64 end of central directory
                # structure present, so go look for it
                return _EndRecData64(fpin, start - filesize, endrec)
            return endrec


This will in turn make the Python implementation of zipfile.py more robust to 
data improperly appended when some zip archives are uploaded or downloaded 
(similar to how other zip tools handle this issue).
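
A minimal standalone sketch of the behaviour being requested, not a patch to zipfile.py: locate the end of central directory record, trust the comment length stored in it, and treat whatever follows as ignorable trailing data. The function name find_end_record and the upper-case constant names are invented for this example; the signature and struct format are the same values zipfile.py uses for stringEndArchive and structEndArchive. Zip64 archives are not handled here.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"               # same bytes as stringEndArchive
EOCD_FORMAT = "<4s4H2LH"                     # same layout as structEndArchive
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)     # 22 bytes
COMMENT_SIZE_INDEX = 7                       # last field: archive comment length


def find_end_record(data):
    """Return (fields, comment, trailing) for zip bytes, or None if no record.

    The comment is taken to be exactly as long as the record says it is;
    any bytes after it are returned separately instead of causing a failure.
    """
    start = data.rfind(EOCD_SIGNATURE)
    if start < 0:
        return None
    record = data[start:start + EOCD_SIZE]
    if len(record) != EOCD_SIZE:
        return None  # truncated record
    fields = list(struct.unpack(EOCD_FORMAT, record))
    comment_end = start + EOCD_SIZE + fields[COMMENT_SIZE_INDEX]
    comment = data[start + EOCD_SIZE:comment_end]
    trailing = data[comment_end:]
    return fields, comment, trailing


# Usage: a valid archive with a stray CR/LF appended, as in the report.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
fields, comment, trailing = find_end_record(buf.getvalue() + b"\r\n")
```

With this approach the stray bytes end up in trailing, and the caller can choose to drop them or preserve them as a comment, which are the two options discussed in this thread.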

Thank you for your time and consideration.

--
messages: 123891
nosy: KevinH
priority: normal
severity: normal
status: open
title: zipfile.py end of central directory detection not robust
type: behavior
versions: Python 2.7
