[Tutor] reading binary file on windows and linux

2010-05-09 Thread Jan Jansen
Hello,

I've got some trouble reading binary files with struct.unpack on Windows.
According to the documentation of the binary file's content, at the
beginning there are some simple bytes (labeled as 'UChar: 8-bit unsigned
byte'). Within those bytes there's a sequence to check the file's sanity.
The sequence is (in ASCII C notation):
 
\n
\r
\n
 
I've downloaded the file from the same website on two machines. One runs
Windows 7 64-bit, the other is a virtual Linux machine. The trouble is that
on Linux everything is fine, but on Windows the carriage return does not
appear when reading the file with struct.unpack.

The file sizes on Linux and Windows are exactly the same, and my script
determines the file sizes correctly on both platforms (according to the
OS). When I open the file on Windows in an editor and display the
whitespace, the linefeed and carriage return are shown as expected.

The code I'm using to check the first 80 bytes of the file is:

import struct
import sys

with open(sys.argv[1]) as source:
    size = struct.calcsize('80B')
    raw_data = struct.unpack('80B', source.read(size))
    for i, data in enumerate(raw_data):
        print i, data, chr(data)
    source.seek(0, 2)
    print source.tell()


Any suggestions are highly appreciated.

Cheers,

Jan
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] reading binary file on windows and linux

2010-05-09 Thread Adam Bark
On 9 May 2010 18:33, Jan Jansen knack...@googlemail.com wrote:

 Hello,

 I've got some trouble reading binary files with struct.unpack on windows.
 [...]


I'd guess that it's because newline in Windows is \r\n and in Linux it's
just \n. If you read the file as binary rather than text then it should work
the same on both platforms, i.e. use:
open(sys.argv[1], 'rb')

HTH,
Adam.


Re: [Tutor] reading binary file on windows and linux

2010-05-09 Thread Hugo Arts
On Sun, May 9, 2010 at 7:33 PM, Jan Jansen knack...@googlemail.com wrote:
 Hello,

 I've got some trouble reading binary files with struct.unpack on windows.
 [...]

 The code I'm using to check the first 80 bytes of the file is:

 import struct
 import sys

 with open(sys.argv[1]) as source:
     size = struct.calcsize('80B')
     raw_data = struct.unpack('80B', source.read(size))
     for i, data in enumerate(raw_data):
         print i, data, chr(data)
     source.seek(0, 2)
     print source.tell()


Since the file is binary, you should use the 'b' mode when opening it:

with open(sys.argv[1], 'rb') as source:

otherwise, the file will open in text mode, which converts newline
characters to/from a platform-specific representation when reading or
writing. On Windows, that representation is \r\n, meaning that this
sequence is converted to just \n when you read from the file. That is
why the carriage return disappears.
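
The translation described above can be reproduced without a Windows machine. The following is only a sketch, assuming Python 3, where text mode with the default newline=None applies this \r\n to \n translation on every platform when reading. The sample bytes and the name read_both_ways are made up for illustration and are not taken from the file format in question.

```python
import os
import tempfile

# Hypothetical 10-byte header containing a \n \r \n check sequence.
SAMPLE = b"HDR\n\r\n END"


def read_both_ways(data):
    """Write data to a temp file, then read it back in text and binary mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        # Text mode with universal newlines: "\r\n" collapses to "\n".
        with open(path, "r", encoding="latin-1", newline=None) as text_file:
            as_text = text_file.read()
        # Binary mode: the bytes come back exactly as stored.
        with open(path, "rb") as binary_file:
            as_bytes = binary_file.read()
    finally:
        os.remove(path)
    return as_text, as_bytes


as_text, as_bytes = read_both_ways(SAMPLE)
print(len(as_text), repr(as_text))    # the carriage return is gone
print(len(as_bytes), repr(as_bytes))  # all 10 bytes are intact
```

The text-mode result is one character shorter than the file on disk, which matches the symptom in the original question: the file size is right, but the \r never reaches struct.unpack.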

Hugo


Re: [Tutor] reading binary file on windows and linux

2010-05-09 Thread Steven D'Aprano
On Mon, 10 May 2010 03:33:51 am Jan Jansen wrote:
 Hello,

 I've got some trouble reading binary files with struct.unpack on
 windows. 
[...] 
 The code I'm using to check the first 80 bytes of the file is:

 import struct
 import sys

 with open(sys.argv[1]) as source:

You're opening the file in text mode. On Linux, there's no difference,
but on Windows, text mode translates each \r\n line ending to \n as you
read. You need to open the file in binary mode:

open(sys.argv[1], 'rb')
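
For reference, here is one way the whole script might look with that change applied. It is a sketch rather than the original poster's code: it assumes Python 3 (the original uses Python 2 print statements), and the function name dump_header and its count parameter are hypothetical.

```python
import struct


def dump_header(path, count=80):
    """Return (values, file_size): the first `count` bytes as ints, plus the size."""
    fmt = "%dB" % count                 # e.g. '80B': 80 unsigned bytes
    size = struct.calcsize(fmt)
    with open(path, "rb") as source:    # 'rb' disables newline translation
        chunk = source.read(size)
        if len(chunk) < size:
            raise ValueError("file is shorter than %d bytes" % size)
        values = struct.unpack(fmt, chunk)
        source.seek(0, 2)               # jump to the end of the file
        file_size = source.tell()
    return values, file_size
```

Calling dump_header(sys.argv[1]) and printing each value next to repr(chr(value)) gives the same listing as the original script, with 13 (the carriage return) showing up on Windows as well as on Linux.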



-- 
Steven D'Aprano


Re: [Tutor] reading binary file on windows and linux

2010-05-09 Thread spir ☣
On Sun, 9 May 2010 19:33:51 +0200
Jan Jansen knack...@googlemail.com wrote:

 Hello,
 
 I've got some trouble reading binary files with struct.unpack on windows.
 [...]

I guess (but am not 100% sure, because I never use 'b') the issue will be
solved using:

   with open(sys.argv[1], 'rb') as source:

The reason is that by default files are opened in read ('r') and text mode.
In text mode, whatever character sequence a given OS uses as its line
separator ('\r\n' under Windows) is silently converted by Python to a
canonical form made of the single '\n' (char 0x0a). So, in your case, in the
header sub-sequence '\r'+'\n' you lose the '\r'.
In so-called binary mode ('b'), Python does not perform this replacement,
so you get the raw byte sequence.
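
To make the effect on the sanity check concrete, here is a small sketch, assuming Python 3. The marker value, the sample bytes and the names header_looks_sane and check_file are all hypothetical; the real position and surroundings of the sequence depend on the file format's documentation.

```python
# Hypothetical marker: the \n \r \n run mentioned in the question.
MARKER = b"\n\r\n"


def header_looks_sane(header, marker=MARKER):
    """Return True if the marker survives intact in the raw header bytes."""
    return marker in header


def check_file(path, header_size=80):
    """Read the header in binary mode and test it for the marker."""
    with open(path, "rb") as source:
        return header_looks_sane(source.read(header_size))


raw = b"FMT\n\r\n rest"                 # header as read in binary mode
mangled = raw.replace(b"\r\n", b"\n")   # what text-mode translation leaves behind
print(header_looks_sane(raw))       # True
print(header_looks_sane(mangled))   # False
```

The second call fails only because the '\r' was dropped, which is why a check like this can pass on Linux and fail on Windows when the file is opened in text mode.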

Hope I'm right on this and it helps.


Denis


vit esse estrany ☣

spir.wikidot.com