On 27/06/2006 6:39 AM, Mike Orr wrote:
> Tim Peters wrote:
>> [EP <[EMAIL PROTECTED]>]
>>> This inquiry may either turn out to be about the suitability of the
>>> SHA-1 (160 bit digest) for file identification, the sha function in
>>> Python ... or about some error in my script
>> It's your script.  Always open binary files in binary mode.  It's a
>> disaster on Windows if you don't (if you open a file in text mode on
>> Windows, the OS pretends that EOF occurs at the first instance of byte
>> chr(26) -- this is an ancient Windows behavior that made an odd kind
>> of sense in the mists of history, and has persisted in worship of
>> Backward Compatibility despite that the original reason for it went
>> away _long_ ago).
>
> On a semi-related note, I have a database on Linux that imports from a
> Macintosh CSV file.  The 'csv' module says to always open files in
> binary mode, but this didn't work in my case: I had to open it as 'rU'
> (text with universal newlines) or 'csv' misparsed it.  I'd like the
> program to be portable to Windows and Mac.  Is there a way around this?
> Will I really burn in hell for using 'rU'?

Yes, you will burn in hell for using any old kludge that gets results (by
accident) instead of reading the manual to find a principled solution:

"""
lineterminator
    The string used to terminate lines in the CSV file. It defaults to '\r\n'.
"""

In the case of a Mac CSV file, '\r' is probably required.

You will burn in hell for asking questions w/o supplying sufficient
information, like
(a) repr(first few lines of your Mac CSV file)
(b) what was the result from the csv module ("didn't work" doesn't cut it).

>
> What was the odd bit of sense?  I know you end console input by typing
> ctrl-Z, but I thought it was just like Unix ctrl-D which ends the input
> but doesn't actually insert that character.
>

Pace timbot, the "ancient Windows behavior" was inherited via MS-DOS from
CP/M. Sectors on disk were 128 bytes. File sizes were recorded as numbers of
sectors, not numbers of bytes. That left no way to tell where the text
stopped within the final sector, so the convention was that the end of a
text file was indicated by ^Z. You are correct, modern software shouldn't
and usually doesn't gratuitously write ^Z to files, but there is some
software out there that still does, hence the preservation of the
convention on reading.

More importantly for CSV files, the data may contain *embedded* CRs and LFs
that the users had in their spreadsheet file. Reading that with "r" or "rU"
will certainly result in "didn't work".

HTH,
John
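
P.S. A minimal sketch of the embedded CR/LF point, in case it helps. It assumes a current Python 3, where the csv documentation asks for the file to be opened with newline='' rather than in binary mode; the effect is the same in spirit, since the csv reader gets to see the line endings untranslated. The sample data and the names SAMPLE and read_rows are made up for illustration and are not taken from the file discussed in this thread.

```python
import csv
import os
import tempfile

# Hypothetical sample: classic Mac line endings ('\r') between records,
# plus a quoted field that contains an embedded LF.
SAMPLE = b'id,note\r1,"line one\nline two"\r2,plain\r'


def read_rows(path):
    """Return all rows of the CSV file at `path` as lists of strings."""
    # newline='' passes line endings through untranslated, so the csv
    # reader can tell record terminators from CRs/LFs inside quoted fields.
    with open(path, newline='') as f:
        return list(csv.reader(f))


# Write the sample in binary mode so the bytes land on disk unchanged.
fd, path = tempfile.mkstemp(suffix='.csv')
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(SAMPLE)
    rows = read_rows(path)
finally:
    os.remove(path)

for row in rows:
    print(repr(row))
```

With this sample the reader should return three rows, and the LF inside the quoted field should survive as part of the second row's data instead of splitting the record.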