On Wed, Sep 2, 2009 at 13:28, Gökhan Sever <gokhanse...@gmail.com> wrote:
> Put the reference manual in:
>
> http://drop.io/1plh5rt
>
> First few pages describe the data format they use.

Ah. The fields are *not* delimited by a fixed value. Regexes are no help to you for pulling out the information you need, except perhaps later to parse the text fields. I think you are also getting spurious results because your regex matches things inside data fields.

Instead, you have a header containing the length of the data field, followed by the data field itself. Create a structured dtype that corresponds to the DataDir struct on page 15. Note that "unsigned int" there is actually a numpy.uint16, not a uint32.

    dt = np.dtype([('tagNumber', np.uint16),
                   ('dataOffset', np.uint16),
                   ('numberBytes', np.uint16),
                   ('samples', np.uint16),
                   ('bytesPerSample', np.uint16),
                   ('type', np.uint8),
                   ('param1', np.uint8),
                   ('param2', np.uint8),
                   ('param3', np.uint8),
                   ('address', np.uint16)])

Now read dt.itemsize bytes from the file and use

    header = np.fromstring(f.read(dt.itemsize), dt)[0]

to get a record object that corresponds to the header. Use the dataOffset and numberBytes fields to extract the actual data bytes from the file. For example, if we go to the second header field:

    In [28]: f.seek(dt.itemsize, 0)

    In [29]: header = np.fromstring(f.read(dt.itemsize), dt)[0]

    In [30]: header
    Out[30]: (65530, 100, 8, 1, 8, 255, 0, 0, 0, 43605)

    In [31]: f.seek(header['dataOffset'], 0)

    In [32]: f.read(header['numberBytes'])
    Out[32]: 'prj.300\x00'

There are still some semantic issues you need to work out. There are multiple "buffers" per file, and the dataOffsets are relative to the start of the buffer, not the file.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
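
A minimal sketch of the header-then-data reading described in the message above, with the buffer-relative offset taken into account. Several things here are assumptions and not from the manual: the little-endian byte order, the helper name read_field, its buffer_start parameter, and the synthetic one-header buffer used in place of a real data file. It uses np.frombuffer because binary-mode np.fromstring is deprecated in current NumPy.

```python
import io

import numpy as np

# Header layout from the message above (DataDir struct).
# Little-endian byte order is an assumption, not taken from the manual.
DATA_DIR = np.dtype([
    ('tagNumber', '<u2'),
    ('dataOffset', '<u2'),
    ('numberBytes', '<u2'),
    ('samples', '<u2'),
    ('bytesPerSample', '<u2'),
    ('type', 'u1'),
    ('param1', 'u1'),
    ('param2', 'u1'),
    ('param3', 'u1'),
    ('address', '<u2'),
])


def read_field(f, index, buffer_start=0):
    """Return (header, data) for the index-th header of one buffer.

    Hypothetical helper: buffer_start is the file position where the
    buffer begins, which the caller has to determine separately.
    """
    f.seek(buffer_start + index * DATA_DIR.itemsize)
    header = np.frombuffer(f.read(DATA_DIR.itemsize), dtype=DATA_DIR)[0]
    # dataOffset counts from the start of the buffer, not the file.
    f.seek(buffer_start + int(header['dataOffset']))
    data = f.read(int(header['numberBytes']))
    return header, data


# Synthetic buffer with a single header, made up for illustration only.
raw_header = np.zeros(1, dtype=DATA_DIR)
raw_header['tagNumber'] = 65530
raw_header['dataOffset'] = DATA_DIR.itemsize  # data follows the header
raw_header['numberBytes'] = 8
payload = b'prj.300\x00'
f = io.BytesIO(raw_header.tobytes() + payload)

hdr, data = read_field(f, 0)
print(hdr['tagNumber'], data)
```

For a buffer that does not begin at the start of the file, pass its starting position as buffer_start so that both the header seek and the data seek are shifted by the same amount.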