On Mon, Jun 27, 2011 at 9:01 AM, dpath2o <[email protected]> wrote:
> Chris, David, and Ingo
>
> ... I can do this in Matlab, however, the IO in Matlab is woefully
> slow and I need to go through thousands of these files, hence my
> switch to Perl and PDL. Since I knew a bit of Perl for file organisation
> I thought that converting my code from Matlab to Perl (with PDL)
> would be straightforward ... it's not, at least, for me!
If you have Matlab code to read these files, it should
be possible to transcribe it to Perl/PDL by mapping each
call to the corresponding IO function. Please keep track
of where you get stuck or where something is not clearly
documented/discoverable.
> @Chris, Starting off, I'm confused on $tmpl and what this should look like.
> I have read through perlpacktut and feel even stupider now. Actually, I
> understood the words but do not know how to apply it to this particular type
> of file ... I have tried just unpacking the first few bytes and just this
> was poking holes in the dark. I do realise $tmpl = 'n+xn+xn ... ' should be
> something like this, but I'm lacking something fundamental here because the
> documentation and what I implement don't produce the desired result.
I suggest working out the unpack arguments interactively
in the pdl2 (or perldl) shell. For example, when I need
to build a template string, I start by reading in some
example data, then repeat the unpack call with different
template strings, printing the output each time until I
get what I want. E.g.,
pdl> $fh = IO::File->new('datafile')
pdl> { local $/; $file = <$fh>; }
pdl> p $file
This is from the
datafile. Your
data would need
to be unpacked..
pdl> $tmpl = 'S S C C S' # i.e., ushort, ushort, uchar, uchar, ushort
pdl> @hdr = unpack $tmpl, $file
pdl> print "@hdr"
If your datatypes match existing PDL ones, you
can use readflex to read the header directly as the
example from David shows.
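
As a concrete starting point for $tmpl, here is a rough sketch
that follows the Version 1 to 4 field list in your description.
I am assuming the files are big-endian (hence the '>' modifiers,
which need perl 5.10 or later); that is a guess based on your 'n'
attempt, not something the format description states. The pack
call only builds a made-up header so the example runs without a
data file, and every value in it is invented for illustration.

```perl
use strict;
use warnings;

# ASSUMPTION: big-endian byte order ('>' modifier, perl 5.10+).
# Field order follows the Version 1-4 header description in this thread.
my @names = qw(
    nCsaFileVersion nDateTime nV1Extent
    nCsKind nV2Extent
    nSiteCodeName nV3Extent
    nCoverageMinutes bDeletedSource bOverrideSourceInfo
    fStartFreqMHz fRepFreqHz fBandwidthKHz
    bSweepUp nDopplerCells nRangeCells nFirstRangeCell
    fRangeCellDistKm nV4Extent
);
my $tmpl = 's> L> l> s> l> a4 l> l> l> l> f> f> f> l> l> l> l> f> l>';

# Invented Version 4 header, packed here only so the sketch is runnable.
my $raw = pack $tmpl,
    4, 3_391_891_200, 62,   2, 56,   'TEST', 48,
    15, 0, 0,   13.5, 2.0, 50.0,   1, 512, 31, 1,   3.0, 0;

# Unpack into a hash keyed by field name.
my %hdr;
@hdr{@names} = unpack $tmpl, $raw;

# nV1Extent counts the header bytes that follow it, so the data start
# at 10 (the size of the three Version 1 fields) + nV1Extent.
my $data_offset = 10 + $hdr{nV1Extent};

# nDateTime counts seconds from Jan 1, 1904; the Unix epoch starts
# 2_082_844_800 seconds later. The value is site-local time, so gmtime
# prints the site's wall-clock time without a further zone shift.
my $unix_time = $hdr{nDateTime} - 2_082_844_800;

printf "version %d, site %s, %d x %d cells, data at byte %d\n",
    @hdr{qw(nCsaFileVersion nSiteCodeName nDopplerCells nRangeCells)},
    $data_offset;
print scalar(gmtime $unix_time), "\n";
```

With a real file you would read the first 72 bytes (or fewer for
older versions) into $raw instead of packing them, and check
nCsaFileVersion before trusting the later fields.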
Good luck,
Chris
> --------------------------------------------
> Daniel Atwater
> Australian Coastal Ocean Radar Network
> James Cook University
> P: +61(0)7 4781 4184
> M: +61(0)4 2991 4545
> E: [email protected]
>
>
> On 27 June 2011 21:03, chm <[email protected]> wrote:
>>
>> On 6/26/2011 8:01 PM, Chris Marshall wrote:
>>>
>>> I can't help you with specifics (the description is
>>> a bit sketchy) but I can suggest a few things.
>>>
>>> (1) PDL::IO::FlexRaw is well suited for reading
>>> binary data files of multidimensional arrays
>>> but for mixed-type headers (like a struct in
>>> C) pack/unpack is your friend.
>>>
>>> (2) Read the header part into a byte pdl of the
>>> appropriate size (assuming you know how big
>>> it is). E.g.,
>>>
>>> $hdr = readflex(FH, [{Type=>'byte',NDims=>1,Dims=>[$hdrsize]}]);
>>
>> This should be either \*FH (a reference to a file
>> handle/typeglob) or a handle from IO::File->new().
>>
>>> Now you have $hdr as a $hdrsize piddle of bytes. You
>>> can access the bytes in the piddle using the get_dataref
>>> method which returns a perl ref to the pdl data as a string
>>> which you can use with unpack to extract any needed
>>> fields:
>>>
>>> @fields = unpack $tmpl, ${ $hdr->get_dataref };
>>>
>>> where $tmpl is the pack/unpack template for the
>>> header data you have.
>>>
>>> (3) Now you can read the piddle data using the info
>>> in the header. Since you appear to have an
>>> array of structures, you need to loop over the
>>> nRangeCells:
>>>
>>> # $nRangeCells and $nDopplerCells come from the unpacked header
>>> for (my $i=0; $i<$nRangeCells; $i++) {
>>>   @data = readflex(\*FH, [
>>>     { Type=>'float', NDims=>1, Dims=>[$nDopplerCells] },
>>>     { Type=>'float', NDims=>1, Dims=>[$nDopplerCells] },
>>>     { Type=>'float', NDims=>1, Dims=>[$nDopplerCells] },
>>>     { Type=>'float', NDims=>2, Dims=>[2,$nDopplerCells] },
>>>     { Type=>'float', NDims=>2, Dims=>[2,$nDopplerCells] },
>>>     { Type=>'float', NDims=>2, Dims=>[2,$nDopplerCells] } ]);
>>>   # do something with @data here
>>>   # you'll have to handle any special cases as well
>>> }
>>>
>>> Also, note the use of Type=>'float' with [2,nDopplerCells]
>>> rather than Type=>'complex' with [nDopplerCells] since PDL
>>> doesn't have a native C complex data type.
>>
>> For more performance (and perhaps clarity), you
>> could take advantage of the fact that all of the
>> data appears to be in chunks of 'float' assuming
>> the complex data is single precision. You could
>> replace the entire loop with a single read and
>> use PDL slicing operations to rearrange the data.
>> E.g., the above example could become:
>>
>> $data = readflex($fh,
>>   [ { Type=>'float',
>>       NDims=>3,
>>       Dims=>[$nDopplerCells,9,$nRangeCells] } ] );
>>
>> where 9 is 1+1+1+2+2+2 and you could slice out
>> the first complex chunk for $i==7 as
>>
>> $data(:,3:4,(7))->clump(2)->splitdim(0,2)
>>
>> where in the (untested) code above the clump
>> and splitdim are used to make the slice dims
>> match the actual data dims. It might be simpler
>> to read a 1-dim piddle of 'float' and slice from
>> that instead....
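
To make the "read one flat run of floats and slice from that" idea
concrete, here is a small core-Perl sketch (no PDL) of where each
block sits in the flat data. The sizes are tiny and invented, the
floats are assumed to be big-endian, and I assume there is no
quality array, so each range cell holds 9 * nDopplerCells floats.
The helper name cross12 is made up for this example.

```perl
use strict;
use warnings;

my ($nDopplerCells, $nRangeCells) = (4, 2);   # tiny illustrative sizes
my $per_cell = 9 * $nDopplerCells;            # floats per range cell (1+1+1+2+2+2)

# Fake data section: float number k holds the value k, so offsets are
# easy to check by eye. ASSUMPTION: big-endian single precision.
my $raw = pack 'f>*', 0 .. $per_cell * $nRangeCells - 1;
my @f   = unpack 'f>*', $raw;

# Return the [re, im] pairs of the Antenna 1 to Antenna 2 cross
# spectrum for range cell $i (counted from zero).
sub cross12 {
    my ($i) = @_;
    my $start = $i * $per_cell + 3 * $nDopplerCells;   # skip 3 self spectra
    my @flat  = @f[ $start .. $start + 2 * $nDopplerCells - 1 ];
    return map { [ @flat[ 2 * $_, 2 * $_ + 1 ] ] } 0 .. $nDopplerCells - 1;
}

my @c = cross12(1);
print "first pair of range cell 1: @{ $c[0] }\n";
```

The same offsets apply to a 1-dim piddle of floats; only the
slicing syntax changes.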
>>
>>> (4) Whatever you have for NDims should be the same as
>>> the number of elements of your Dims=>[] array ref
>>> so your use of an 80 dimensional array is not
>>> consistent with the single dimension you specify.
>>> NOTE: I've never seen data with more than several
>>> dimensions. If you have 80, then something may
>>> be suspect...
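
A quick way to catch this kind of mismatch before calling readflex
is to compare NDims with the number of elements in Dims for each
header entry. A minimal sketch in plain Perl; the helper name
bad_entries and the three sample entries are made up for the
example:

```perl
use strict;
use warnings;

# Return the indices of FlexRaw-style header entries whose NDims does
# not equal the number of elements in Dims.
sub bad_entries {
    my ($header) = @_;
    return grep { $header->[$_]{NDims} != @{ $header->[$_]{Dims} } }
           0 .. $#$header;
}

my $header = [
    { Type => 'byte',  NDims => 1,  Dims => [4] },         # consistent
    { Type => 'byte',  NDims => 80, Dims => [512] },       # 80 vs 1 element
    { Type => 'float', NDims => 2,  Dims => [ 2, 512 ] },  # consistent
];
my @bad = bad_entries($header);
print "inconsistent entries: @bad\n";
```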
>>>
>>> Hope this helps,
>>> Chris
>>>
>>> On 6/26/2011 6:20 PM, dpath2o wrote:
>>>>
>>>> Dear PDL-List,
>>>>
>>>> I have a binary file from which I need to extract information. Before
>>>> coming to PDL::IO::FlexRaw I had been trying (w/o success) to get this
>>>> file read with the Perl functions read/unpack. This was proving tedious
>>>> -- like picking up freshly caught sardines on stainless steel with your
>>>> elbows. Possibly that was foreseeable; nonetheless, I scooped the Perl
>>>> monks with a query and they pointed out the obvious, ``why not use
>>>> PDL::IO::FlexRaw''. Alas, this module seems rife with possibility for
>>>> slurping what I need into piddles and then doing what I want with the
>>>> piddles. Unfortunately I cannot make out the format of the header file.
>>>> Does it take signed and unsigned integers (16,32,64 bit)? How about
>>>> IEEE compliant floats?
>>>>
>>>> More directly, can someone that knows FlexRaw have a look below at the
>>>> outline of the format for the binary files that I have and suggest a
>>>> header template?
>>>>
>>>> I have tried the following, but this is not exactly right, and I'm a
>>>> little in the dark on how to proceed given the documentation (FlexRaw)
>>>> doesn't outline a general case for the header file, rather just an
>>>> example for Fortran 77, which my binary files are not ... my attempt:
>>>> my $header = [
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 1, Dims => [ 4 ] },
>>>> { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>> { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>> { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>> { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>> { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>> { Type => 'byte', NDims => 80, Dims => [ 512 ] },
>>>> { Type => 'byte', NDims => 80, Dims => [ 512 ] }
>>>> ];
>>>> my @dat = readflex($file,$header);
>>>>
>>>> And now the complete description of the file:
>>>> # HEADER:
>>>> # Each File has two major sections: a Header section and a Data section.
>>>> # The Header section is as follows:
>>>> # - The header is expandable. Each newer version also contains the
>>>> #   information used by the older versions.
>>>> # - When reading a CrossSpectra file that is a newer version than you
>>>> #   expect, use the Extent field to skip to the beginning of the cross
>>>> #   spectra data.
>>>> # - The following Header description is a set of data fields in order,
>>>> #   where each field description is a value type with implied size,
>>>> #   followed by the field name, and followed by the field's description.
>>>> # * Note. If version is 3 or less, then nRangeCells=31,
>>>> #   nDopplerCells=512, nFirstRangeCell=1
>>>> #
>>>> # Version 1:
>>>> # * SInt16 -> nCsaFileVersion -> File Version 1 to latest. (If greater
>>>> #   than 32, it's probably not a spectra file.)
>>>> # * UInt32 -> nDateTime -> TimeStamp. Seconds from Jan 1, 1904 local
>>>> #   computer time at site. The timestamp for CSQ files represents the
>>>> #   start time of the data (nCsaKind = 1). The timestamp for CSS and CSA
>>>> #   files is the center time of the data (nCsaKind = 2).
>>>> # * SInt32 -> nV1Extent -> Header Bytes extension (Version 4 is +62
>>>> #   Bytes Till Data)
>>>> #
>>>> # Version 2:
>>>> # * SInt16 -> nCsKind -> Type of CrossSpectra Data. 1 is self spectra
>>>> #   for all used channels, followed by cross spectra. Timestamp is start
>>>> #   time of data. 2 is self spectra for all used channels, followed by
>>>> #   cross spectra, followed by quality data. Timestamp is center time of
>>>> #   data.
>>>> # * SInt32 -> nV2Extent -> Header Bytes extension (Version 4 is +56
>>>> #   Bytes Till Data)
>>>> #
>>>> # Version 3:
>>>> # * Char4 -> nSiteCodeName -> Four character site code 'site'
>>>> # * SInt32 -> nV3Extent -> Header Bytes extension (Version 4 is +48
>>>> #   Bytes Till Data)
>>>> #
>>>> # Version 4:
>>>> # * SInt32 -> nCoverageMinutes -> Coverage Time in minutes for the
>>>> #   data. 'CSQ' is normally 5 minutes (4.5 rounded). 'CSS' is normally
>>>> #   15 minutes average. 'CSA' is normally 60 minutes average.
>>>> # * SInt32 -> bDeletedSource -> Was the 'CSQ' deleted by CSPro after
>>>> #   reading.
>>>> # * SInt32 -> bOverrideSourceInfo -> If not zero, CSPro used its own
>>>> #   preferences to override the source 'CSQ' spectra sweep settings.
>>>> # * Float -> fStartFreqMHz -> Transmit Start Freq in MHz
>>>> # * Float -> fRepFreqHz -> Transmit Sweep Rate in Hz
>>>> # * Float -> fBandwidthKHz -> Transmit Sweep bandwidth in kHz
>>>> # * SInt32 -> bSweepUp -> Transmit Sweep Freq direction is up if non
>>>> #   zero, else down. NOTE: CenterFreq is fStartFreqMHz + fBandwidthKHz/2 *
>>>> #   -2^(bSweepUp==0)
>>>> # * SInt32 -> nDopplerCells -> Number of Doppler Cells (nominally 512)
>>>> # * SInt32 -> nRangeCells -> Number of RangeCells (nominally 32 for
>>>> #   'CSQ', 31 for 'CSS' & 'CSA')
>>>> # * SInt32 -> nFirstRangeCell -> Index of First Range Cell in data from
>>>> #   zero at the receiver. 'CSQ' files nominally use zero. 'CSS' or 'CSA'
>>>> #   files nominally use one because CSPro cuts off the first range cell
>>>> #   as meaningless.
>>>> # * Float -> fRangeCellDistKm -> Distance between range cells in
>>>> #   kilometers.
>>>> # * SInt32 -> nV4Extent -> Header Bytes extension (Version 4 is +0
>>>> #   Bytes Till Data)
>>>> #
>>>> # Version 5:
>>>> # * SInt32 -> nOutputInterval -> The Output Interval in Minutes.
>>>> # * Char4 -> nCreatorTypeCode -> The creator application type code.
>>>> # * Char4 -> nCreatorVersion -> The creator application version.
>>>> # * SInt32 -> nActiveChannels -> Number of active antennas
>>>> # * SInt32 -> nSpectraChannels -> Number of antennas used in cross
>>>> #   spectra
>>>> # * UInt32 -> nActiveChannelBits -> Bit indicator of which antennas are
>>>> #   in use, msb is ant#1 to lsb #32
>>>> # * SInt32 -> nV5Extent -> Header Bytes extension (Version 5 is +0
>>>> #   Bytes Till Data). If zero then cross spectra data follows, but if
>>>> #   this file were version 6 or greater then the nV5Extent would tell you
>>>> #   how many more bytes the version 6 and greater uses until the data.
>>>> #
>>>> # DATA:
>>>> # The data section is a multi-dimensional array of self and cross
>>>> # spectra data.
>>>> # Repeat For 1 to nRangeCells:
>>>> # * Float[nDopplerCells] Antenna1 voltage squared amplitude self
>>>> #   spectra.
>>>> # * Float[nDopplerCells] Antenna2 voltage squared amplitude self
>>>> #   spectra.
>>>> # * Float[nDopplerCells] Antenna3 voltage squared amplitude self
>>>> #   spectra.
>>>> #   (Warning: Some Antenna3 amplitude values may be negative to
>>>> #   indicate noise or interference at those doppler bins. Take the
>>>> #   absolute value of these before use.)
>>>> # * Complex[nDopplerCells] Antenna 1 to Antenna 2 cross spectra.
>>>> # * Complex[nDopplerCells] Antenna 1 to Antenna 3 cross spectra.
>>>> # * Complex[nDopplerCells] Antenna 2 to Antenna 3 cross spectra.
>>>> # if nCsaKind is 2 then also read or skip
>>>> # * Float[nDopplerCells] Quality array from zero to one in value.
>>>> # End Repeat
>>>> #
>>>> # Note: To convert self spectra to dBm use:
>>>> # 10*log10(abs(voltagesquared)) - (-40. + 5.8)
>>>> # The -40. is conversion loss in the receiver and +5.8 is processing
>>>> # computational gain.
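
The dBm note above translates directly into a few lines of Perl.
This is only a sketch of the quoted formula; the function name
self_to_dbm is made up, and returning nothing for a zero input is
my own choice since log10(0) is undefined.

```perl
use strict;
use warnings;

# Convert a self-spectra value (voltage squared) to dBm with the formula
# quoted in the file description: 10*log10(abs(v)) - (-40 + 5.8).
sub self_to_dbm {
    my ($v) = @_;
    return if $v == 0;                           # log10(0) is undefined
    return 10 * log(abs($v)) / log(10) - (-40 + 5.8);
}

# Negative inputs (flagged Antenna3 values) go through abs() first.
printf "%.1f dBm\n", self_to_dbm(-1e-10);
```

With PDL the same expression can be applied to a whole piddle at
once using its elementwise abs and log10.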
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Perldl mailing list
>>>> [email protected]
>>>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl