On Wed, Sep 2, 2009 at 1:58 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Wed, Sep 2, 2009 at 13:28, Gökhan Sever<gokhanse...@gmail.com> wrote:
> > Put the reference manual in:
> >
> > http://drop.io/1plh5rt
> >
> > First few pages describe the data format they use.
>
> Ah. The fields are *not* delimited by a fixed value. Regexes are no
> help to you for pulling out the information you need, except perhaps
> later to parse the text fields. I think you are also getting spurious
> results because your regex matches things inside data fields.
>
> Instead, you have a header containing the length of the data field
> followed by the data field. Create a structured dtype that corresponds
> to the DataDir struct on page 15. Note that "unsigned int" there is
> actually a numpy.uint16, not a uint32.
>
> dt = np.dtype([('tagNumber', np.uint16), ('dataOffset', np.uint16),
> ('numberBytes', np.uint16), ('samples', np.uint16), ('bytesPerSample',
> np.uint16), ('type', np.uint8), ('param1', np.uint8), ('param2',
> np.uint8), ('param3', np.uint8), ('address', np.uint16)])
>
> Now read dt.itemsize bytes from the file and use
>
> header = fromstring(f.read(dt.itemsize), dt)[0]
>
> to get a record object that corresponds to the header. Use the
> dataOffset and numberBytes fields to extract the actual data bytes
> from the file.
>
> For example, if we go to the second header field:
>
> In [28]: f.seek(dt.itemsize,0)
>
> In [29]: header = np.fromstring(f.read(dt.itemsize), dt)[0]
>
> In [30]: header
> Out[30]: (65530, 100, 8, 1, 8, 255, 0, 0, 0, 43605)
>
> In [31]: f.seek(header['dataOffset'], 0)
>
> In [32]: f.read(header['numberBytes'])
> Out[32]: 'prj.300\x00'
>
>
> There are still some semantic issues you need to work out, still.
> There are multiple "buffers" per file, and the dataOffsets are
> relative to the start of the buffer, not the file.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
> -- Umberto Eco
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

Robert,

You must have thrown a couple of RTFMs while replying to my emails :) I usually start with a trial-and-error approach and don't give up unless I hit a hurdle early on, which in this case was the unsuccessful regex attempt. On the bright side, I have learnt the basics of regular expressions and realized how powerful they can be for a text parsing task.

Enough prattle, below is what I am working on:

So far I have been able to extract the file names and the data associated with those names (except in the cases with multiple buffers per file). However, I am not reading the time increments correctly: I should see the time ticks advance by 1 second from one time segment to the next, but every read returns the same first time value.

Furthermore, I still can't figure out how to write the main loop (range(500) is just a dummy number that lets me process the whole binary data). I don't know yet how to make the range generic so that it works for a similar binary file of any size.
import numpy as np
import struct

f = open('test.sea', 'rb')

# 16-byte directory entry, following the DataDir struct in the manual
dt = np.dtype([('tagNumber', np.uint16), ('dataOffset', np.uint16),
               ('numberBytes', np.uint16), ('samples', np.uint16),
               ('bytesPerSample', np.uint16), ('type', np.uint8),
               ('param1', np.uint8), ('param2', np.uint8),
               ('param3', np.uint8), ('address', np.uint16)])

start = 0   # file position where the current buffer begins
ct = 0

for i in range(500):
    header = np.fromstring(f.read(dt.itemsize), dt)[0]
    if header['tagNumber'] == 65530:
        # file name entry: read the data, then return to the next header
        loc = f.tell()
        f.seek(start + header['dataOffset'])
        f.read(header['numberBytes'])
        f.seek(loc)
    elif header['tagNumber'] == 65531:
        # last entry of the buffer: the next buffer starts after its data
        loc = f.tell()
        f.seek(start + header['dataOffset'])
        f.read(header['numberBytes'])
        start = f.tell()
    elif header['tagNumber'] == 0:
        # time entry: unpack the first nine 16-bit integers
        loc = f.tell()
        f.seek(start + header['dataOffset'])
        print(f.tell())
        k = f.read(header['numberBytes'])
        print(struct.unpack('9h', k[:18]))
        f.seek(loc)
        ct += 1

--
Gökhan
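One possible way to drop the hard-coded range(500) is to keep reading headers until f.read() returns fewer bytes than one full header. The sketch below is only an illustration under assumptions taken from the loop above: tag 65531 closes a buffer's directory, and the next buffer starts right after that entry's data. It uses struct.Struct with the same 16-byte layout in place of the NumPy dtype so that it runs with the standard library alone. The names iter_entries and make_buffer, and the synthetic two-buffer "file", are hypothetical and made up for the demonstration; they are not part of the data format or of any library.

```python
import io
import struct

# Same layout as the 16-byte header dtype: five uint16, four uint8, one uint16.
# '=' selects native byte order without padding (assumed to match the dtype).
HEADER = struct.Struct('=5H4BH')

LAST_TAG = 65531  # assumption: this entry ends the directory of a buffer


def iter_entries(f):
    """Yield (tag, data) for every directory entry until the file runs out."""
    start = 0  # file position where the current buffer begins
    while True:
        raw = f.read(HEADER.size)
        if len(raw) < HEADER.size:
            break  # short read means end of file, so no fixed range is needed
        tag, offset, nbytes = HEADER.unpack(raw)[:3]
        loc = f.tell()
        f.seek(start + offset)  # dataOffset is relative to the buffer start
        data = f.read(nbytes)
        if tag == LAST_TAG:
            start = f.tell()  # the next buffer begins right after this data
        else:
            f.seek(loc)  # return to the next header of the same buffer
        yield tag, data


def make_buffer(second):
    """Build one synthetic buffer: a time entry followed by a closing entry."""
    time_data = struct.pack('=9h', 2009, 9, 2, 13, 58, second, 0, 0, 0)
    dir_size = 2 * HEADER.size  # two headers precede the data
    h_time = HEADER.pack(0, dir_size, len(time_data), 1, len(time_data),
                         0, 0, 0, 0, 0)
    h_last = HEADER.pack(LAST_TAG, dir_size + len(time_data), 4, 1, 4,
                         0, 0, 0, 0, 0)
    return h_time + h_last + time_data + b'\x00' * 4


# Two made-up buffers whose time records differ only in the seconds field.
fake = io.BytesIO(make_buffer(0) + make_buffer(1))
seconds = [struct.unpack('=9h', data[:18])[5]
           for tag, data in iter_entries(fake) if tag == 0]
print(seconds)  # [0, 1]
```

With a real file, iter_entries(open('test.sea', 'rb')) would take the place of the BytesIO object, provided the assumptions about tag 65531 and the buffer boundaries hold for that file.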