Chris, David, and Ingo,

Thank you for your replies today. This is great information and I'll try working through some of these suggestions ASAP; however, I juggle this work (PhD in Ocean Sciences) with my primary work as a technician for HF radar. So the data that I'm processing is scientific. I can do this in Matlab, but the IO in Matlab is woefully slow and I need to go through thousands of these files, hence my switch to Perl and PDL. Since I knew a bit of Perl for file organisation, I thought that converting my code from Matlab to Perl (with PDL) would be straightforward ... it's not, at least for me!

@Chris,

Starting off, I'm confused about $tmpl and what it should look like. I have read through perlpacktut and feel even stupider now. Actually, I understood the words but do not know how to apply them to this particular type of file ... I have tried unpacking just the first few bytes, and even that was poking holes in the dark. I do realise $tmpl should be something like 'n+xn+xn ... ', but I'm lacking something fundamental here, because what I implement from the documentation doesn't produce the desired result. So if pack/unpack is my friend, it is a very distant friendship!

Getting at the header information is critical to extracting the data, and I think I now understand NDims and Dims. Once past the hurdle of the header, I think I'll try both the for loop and the vectorised version of extracting the data. Cutting out portions of the data via slice will be the next step, as I'll be doing a peak search in the data.

Thanks for your help.

Kind Regards.
--------------------------------------------
Daniel Atwater
Australian Coastal Ocean Radar Network
James Cook University
P: +61(0)7 4781 4184
M: +61(0)4 2991 4545
E: [email protected]

On 27 June 2011 21:03, chm <[email protected]> wrote:

> On 6/26/2011 8:01 PM, Chris Marshall wrote:
>
>> I can't help you with specifics (the description is
>> a bit sketchy) but I can suggest a few things.
>>
>> (1) PDL::IO::FlexRaw is well suited for reading
>>     binary data files of multidimensional arrays
>>     but for mixed-type headers (like a struct in
>>     C) pack/unpack is your friend.
>>
>> (2) Read the header part into a byte pdl of the
>>     appropriate size (assuming you know how big
>>     it is). E.g.,
>>
>>     $hdr = readflex(FH, [{Type=>'byte',NDims=>1,Dims=>[$hdrsize]}]);
>
> This should be either \*FH (a reference to a file
> handle/typeglob) or a handle from IO::File->new().
>
>> Now you have $hdr as a $hdrsize piddle of bytes.
>> You can access the bytes in the piddle using the get_dataref
>> method which returns a perl ref to the pdl data as a string
>> which you can use with unpack to extract any needed
>> fields:
>>
>>     @fields = unpack $tmpl, ${ $hdr->get_dataref };
>>
>> where $tmpl is the pack/unpack template for the
>> header data you have.
>>
>> (3) Now you can read the piddle data using the info
>>     in the header. Since you appear to have an
>>     array of structures, you need to loop over the
>>     nRangeCells:
>>
>>     for (my $i=0; $i<nRangeCells; $i++) {
>>        @data = readflex(FH, [
>>           { Type=>'float', NDims=>1, Dims=>[nDopplerCells] },
>>           { Type=>'float', NDims=>1, Dims=>[nDopplerCells] },
>>           { Type=>'float', NDims=>1, Dims=>[nDopplerCells] },
>>           { Type=>'float', NDims=>2, Dims=>[2,nDopplerCells] },
>>           { Type=>'float', NDims=>2, Dims=>[2,nDopplerCells] },
>>           { Type=>'float', NDims=>2, Dims=>[2,nDopplerCells] } ]);
>>        # do something with @data here
>>        # you'll have to handle any special cases as well
>>     }
>>
>> Also, note the use of Type=>'float' with [2,nDopplerCells]
>> rather than Type=>'complex' with [nDopplerCells] since PDL
>> doesn't have a native C complex data type.
>
> For more performance (and perhaps clarity), you
> could take advantage of the fact that all of the
> data appears to be in chunks of 'float' assuming
> the complex data is single precision. You could
> replace the entire loop with a single read and
> use PDL slicing operations to rearrange the data.
> E.g., the above example could become:
>
>   $data = readflex($fh,
>      [ { Type=>'float',
>          NDims=>3,
>          Dims=>[nDopplerCells,9,nRangeCells] } ] );
>
> where 9 is 1+1+1+2+2+2 and you could slice out
> the first complex chunk for $i==7 as
>
>   $data(:,3:4,(7))->clump(2)->splitdim(0,2)
>
> where in the (untested) code above the clump
> and splitdim are used to make the slice dims
> match the actual data dims. It might be simpler
> to read a 1-dim piddle of 'float' and slice from
> that instead....
>
>> (4) Whatever you have for NDims should be the same as
>>     the number of elements of your Dims=>[] array ref
>>     so your use of an 80 dimensional array is not
>>     consistent with the single dimension you specify.
>>     NOTE: I've never seen data with more than several
>>     dimensions. If you have 80, then something may
>>     be suspect...
>>
>> Hope this helps,
>> Chris
>>
>> On 6/26/2011 6:20 PM, dpath2o wrote:
>>
>>> Dear PDL-List,
>>>
>>> I have a binary file that I need to extract information. Before
>>> coming to PDL::IO::FlexRaw I had been trying (w/o success) to get
>>> this file read with Perl functions read/unpack. This was proving
>>> tedious -- like picking up freshly caught sardines on stainless
>>> steel with your elbows. Possibly that was foreseeable, nonetheless,
>>> I scooped the Perl monks with a query and they pointed out the
>>> obvious, ``why not use PDL::IO::FlexRaw''. Alas, this module seems
>>> rife with possibility for slurping what I need into piddles and
>>> then doing what I want with the piddles. Unfortunately I cannot
>>> make out the format of the header file. Does it take signed and
>>> unsigned integers (16,32,64 bit) ? How about IEEE compliant floats?
>>>
>>> More directly, can someone that knows FlexRaw have a look below at
>>> the outline of the format for the binary files that I have and
>>> suggest a header template?
>>>
>>> I have tried the following, but this is not exactly right, and I'm
>>> a little in the dark on how to proceed given the documentation
>>> (FlexRaw) doesn't outline a general case for the header file,
>>> rather just an example for Fortran 77, which my binary files are
>>> not ...
>>> my attempt:
>>> my $header = [
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>   { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>   { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>   { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>   { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>   { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>   { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>   { Type => 'byte', NDims => 80, Dims => [ 512 ] }
>>> ];
>>> my @dat = readflex($file,$header);
>>>
>>> And now the complete description of the file:
>>> # HEADER:
>>> # Each File has two major sections. A Header section and a Data
>>> # section. The Header section is as follows:
>>> # - The header is expandable. Each newer version also contains the
>>> #   information used by the older version.
>>> # - When reading a CrossSpectra file that is a newer version than
>>> #   you expect then use the Extent field to skip to the beginning
>>> #   of the cross spectra data.
>>> # - The following Header description is a set of data fields in
>>> #   order where each field description is a value type with implied
>>> #   size, followed by the field name, and followed by the field's
>>> #   description.
>>> # * Note. If version is 3 or less, then nRangeCells=31,
>>> #   nDopplerCells=512, nFirstRangeCell=1
>>> #
>>> # Version 1:
>>> # * SInt16 -> nCsaFileVersion -> File Version 1 to latest. (If
>>> #   greater than 32, it's probably not a spectra file.)
>>> # * UInt32 -> nDateTime -> TimeStamp. Seconds from Jan 1,1904 local
>>> #   computer time at site. The timestamp for CSQ files represents
>>> #   the start time of the data (nCsaKind = 1). The timestamp for CSS
>>> #   and CSA files is the center time of the data (nCsaKind = 2).
>>> # * SInt32 -> nV1Extent -> Header Bytes extension (Version 4 is +62
>>> #   Bytes Till Data)
>>> #
>>> # Version 2:
>>> # * SInt16 -> nCsKind -> Type of CrossSpectra Data. 1 is self
>>> #   spectra for all used channels, followed by cross spectra.
>>> #   Timestamp is start time of data. 2 is self spectra for all used
>>> #   channels, followed by cross spectra, followed by quality data.
>>> #   Timestamp is center time of data.
>>> # * SInt32 -> nV2Extent -> Header Bytes extension (Version 4 is +56
>>> #   Bytes Till Data)
>>> #
>>> # Version 3:
>>> # * Char4 -> nSiteCodeName -> Four character site code 'site'
>>> # * SInt32 -> nV3Extent -> Header Bytes extension (Version 4 is +48
>>> #   Bytes Till Data)
>>> #
>>> # Version 4:
>>> # * SInt32 -> nCoverageMinutes -> Coverage Time in minutes for the
>>> #   data. 'CSQ' is normally 5 minutes (4.5 rounded). 'CSS' is
>>> #   normally 15 minutes average. 'CSA' is normally 60 minutes
>>> #   average.
>>> # * SInt32 -> bDeletedSource -> Was the 'CSQ' deleted by CSPro
>>> #   after reading.
>>> # * SInt32 -> bOverrideSourceInfo -> If not zero, CSPro used its
>>> #   own preferences to override the source 'CSQ' spectra sweep
>>> #   settings.
>>> # * Float -> fStartFreqMHz -> Transmit Start Freq in MHz
>>> # * Float -> fRepFreqHz -> Transmit Sweep Rate in Hz
>>> # * Float -> fBandwidthKHz -> Transmit Sweep bandwidth in kHz
>>> # * SInt32 -> bSweepUp -> Transmit Sweep Freq direction is up if
>>> #   non zero, else down. NOTE: CenterFreq is
>>> #   fStartFreqMHz + fBandwidthKHz/2 * -2^(bSweepUp==0)
>>> # * SInt32 -> nDopplerCells -> Number of Doppler Cells (nominally
>>> #   512)
>>> # * SInt32 -> nRangeCells -> Number of RangeCells (nominally 32 for
>>> #   'CSQ', 31 for 'CSS' & 'CSA')
>>> # * SInt32 -> nFirstRangeCell -> Index of First Range Cell in data
>>> #   from zero at the receiver. 'CSQ' files nominally use zero. 'CSS'
>>> #   or 'CSA' files nominally use one because CSPro cuts off the
>>> #   first range cell as meaningless.
>>> # * Float -> fRangeCellDistKm -> Distance between range cells in
>>> #   kilometers.
>>> # * SInt32 -> nV4Extent -> Header Bytes extension (Version 4 is +0
>>> #   Bytes Till Data)
>>> #
>>> # Version 5:
>>> # * SInt32 -> nOutputInterval -> The Output Interval in Minutes.
>>> # * Char4 -> nCreatorTypeCode -> The creator application type code.
>>> # * Char4 -> nCreatorVersion -> The creator application version.
>>> # * SInt32 -> nActiveChannels -> Number of active antennas
>>> # * SInt32 -> nSpectraChannels -> Number antenna used in cross
>>> #   spectra
>>> # * UInt32 -> nActiveChannelBits -> Bit indicator of which antennas
>>> #   are in use msb is ant#1 to lsb #32
>>> # * SInt32 -> nV5Extent -> Header Bytes extension (Version 5 is +0
>>> #   Bytes Till Data) If zero then cross spectra data follows, but if
>>> #   this file were version 6 or greater then the nV5Extent would
>>> #   tell you how many more bytes the version 6 and greater uses
>>> #   until the data.
>>> #
>>> # DATA:
>>> # The data section is a multi-dimensional array of self and cross
>>> # spectra data.
>>> # Repeat For 1 to nRangeCells:
>>> # * Float[nDopplerCells] Antenna1 voltage squared amplitude self
>>> #   spectra.
>>> # * Float[nDopplerCells] Antenna2 voltage squared amplitude self
>>> #   spectra.
>>> # * Float[nDopplerCells] Antenna3 voltage squared amplitude self
>>> #   spectra.
>>> #   (Warning: Some Antenna3 amplitude values may be negative to
>>> #   indicate noise or interference at those doppler bins. These
>>> #   negative values should be absoluted before use.)
>>> # * Complex[nDopplerCells] Antenna 1 to Antenna 2 cross spectra.
>>> # * Complex[nDopplerCells] Antenna 1 to Antenna 3 cross spectra.
>>> # * Complex[nDopplerCells] Antenna 2 to Antenna 3 cross spectra.
>>> # if nCsaKind is 2 then also read or skip
>>> # * Float[nDopplerCells] Quality array from zero to one in value.
>>> # End Repeat
>>> #
>>> # Note: To convert self spectra to dBm use:
>>> # 10*log10(abs(voltagesquared)) - (-40. + 5.8)
>>> # The -40. is conversion loss in the receiver and +5.8 is
>>> # processing computational gain.
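
As a possible starting point for $tmpl, the Version 1 to 4 header fields in the quoted description can be mapped one-to-one onto pack/unpack codes. The sketch below is untested against real CrossSpectra files and assumes big-endian byte order, which the description does not state. The @fields table, parse_header(), and the synthetic values are illustrative only. The field sizes do agree with the listed extents (nV1Extent of 62 = 6 + 8 + 48 bytes), so no padding between fields is assumed.

```perl
use strict;
use warnings;
use 5.010;    # the '>' byte-order modifier needs Perl 5.10 or later

# Field names follow the quoted format description (Versions 1 to 4).
# ASSUMPTION: big-endian byte order; swap '>' for '<' and 'N' for 'V'
# if the files turn out to be little-endian.
my @fields = (
    [ nCsaFileVersion     => 's>' ],    # SInt16
    [ nDateTime           => 'N'  ],    # UInt32, seconds since 1 Jan 1904
    [ nV1Extent           => 'l>' ],    # SInt32
    [ nCsKind             => 's>' ],    # SInt16
    [ nV2Extent           => 'l>' ],    # SInt32
    [ nSiteCodeName       => 'a4' ],    # Char4
    [ nV3Extent           => 'l>' ],    # SInt32
    [ nCoverageMinutes    => 'l>' ],
    [ bDeletedSource      => 'l>' ],
    [ bOverrideSourceInfo => 'l>' ],
    [ fStartFreqMHz       => 'f>' ],    # Float (single precision)
    [ fRepFreqHz          => 'f>' ],
    [ fBandwidthKHz       => 'f>' ],
    [ bSweepUp            => 'l>' ],
    [ nDopplerCells       => 'l>' ],
    [ nRangeCells         => 'l>' ],
    [ nFirstRangeCell     => 'l>' ],
    [ fRangeCellDistKm    => 'f>' ],
    [ nV4Extent           => 'l>' ],
);

my @names = map { $_->[0] } @fields;
my $tmpl  = join ' ', map { $_->[1] } @fields;

# Size in bytes of the Version 1 to 4 header implied by the template.
my $hdrsize = length( pack( $tmpl, (0) x @fields ) );

# Hypothetical helper: turn raw header bytes into a hash keyed by field name.
sub parse_header {
    my ($bytes) = @_;
    my %h;
    @h{@names} = unpack( $tmpl, $bytes );
    return \%h;
}

# Synthetic header for demonstration only (made-up values, not real data).
my $bytes = pack( $tmpl,
    4, 3391891200, 62, 2, 56, 'TEST', 48,
    15, 0, 0, 8.5, 2.0, 50.0, 1, 512, 31, 1, 3.0, 0 );

my $h = parse_header($bytes);
printf "site %s: %d range cells x %d Doppler cells, header is %d bytes\n",
    $h->{nSiteCodeName}, $h->{nRangeCells}, $h->{nDopplerCells}, $hdrsize;
```

With a real file, one would open it, call binmode on the handle, read($fh, my $bytes, $hdrsize), and pass $bytes to parse_header(). For files newer than version 4, nV4Extent should give the number of further header bytes to skip before the data starts.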
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
