Ah, I should have mentioned this. Personally I work on Macs (Leopard) 
and PCs (XP Pro and XP Pro x64). Even though the PCs do have Cygwin, 
I'm trying to make this code portable. So I want to avoid such things as 
sed, perl, etc.

I want to do this in R, even if processing is a bit slower. Eventually, 
I'll hide the code in a class, so the code can be a bit complex.
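
A minimal base-R sketch of that idea, using only read.table and substr so 
it should not depend on the platform. Everything here is an assumption 
rather than a tested solution for the full BLS files: read_la is a made-up 
name, the file is assumed to have a header row with the five field names 
from the original question, a seasonal code of "S" is assumed to mean 
seasonally adjusted, and year is left as an integer rather than a Date.

```r
# Hypothetical helper (not part of any package): read a tab-delimited
# BLS-style file, then split series_id into its fixed-width subfields.
read_la <- function(file) {
  raw <- read.table(file, sep = "\t", header = TRUE, strip.white = TRUE,
                    colClasses = "character", quote = "",
                    comment.char = "", fill = TRUE)
  id <- raw$series_id
  data.frame(
    survey         = substr(id, 1, 2),
    seasonal       = substr(id, 3, 3) == "S",  # assumption: "S" = adjusted
    area_type_code = factor(substr(id, 4, 5)),
    area_code      = factor(substr(id, 6, 11)),
    measure_code   = factor(substr(id, 12, 13)),
    year           = as.integer(raw$year),     # left as integer, not Date
    period         = factor(raw$period),
    value          = as.numeric(raw$value),
    footnote_codes = raw$footnote_codes,
    stringsAsFactors = FALSE
  )
}

# Try it on the example record from the original question.
sample_lines <- c(
  "series_id\tyear\tperiod\tvalue\tfootnote_codes",
  "LASPS36040003\t1990\tM01\t8.8\tL"
)
la <- read_la(textConnection(sample_lines))
str(la)
```

The file is still read in a single pass; the subfields are then cut out of 
series_id in memory, which avoids mixing read.fwf with the tab-delimited 
fields.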

     Marsh Feldman

On 3/2/2010 12:29 PM, Chidambaram Annamalai wrote:
> I tried to shoehorn the read.* functions into matching both the
> fixed width and the variable width fields in the data, but I don't
> see an obvious way to do it. (read.fwf reads fixed width data
> properly, but the rest of the fields must be processed separately --
> maybe insert NULL stubs in the remaining fields and fill them in
> later?)
>
> One way is to sidestep the entire issue and convert the structured 
> data you have into a csv
> file using sed (usually available on most *nix systems) with 
> something like this:
>
> sed -r 's/^(..)(.)(..)(.{6})(..)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)/\1,\2,\3,\4,\5,\6,\7,\8,\9/' data | less
>
> Check that the output looks right, then redirect it to a .csv file
> and read that file directly in R using read.csv
>
> If that does not satisfy you maybe the R Wizards on the list might be 
> able to point you to a
> native R way of doing this possibly using scan? I'm not sure though.
>
> Hope this helps,
> Chillu
>
> On Tue, Mar 2, 2010 at 9:42 PM, Marshall Feldman <ma...@uri.edu 
> <mailto:ma...@uri.edu>> wrote:
>
>     Hello R wizards,
>
>     What is the best way to read a data file containing both
>     fixed-width and
>     tab-delimited fields? (More detail follows.)
>
>     Details:
>     The U.S. Bureau of Labor Statistics provides local area unemployment
>     statistics at ftp://ftp.bls.gov/pub/time.series/la/, and the data are
>     documented in the file la.txt
>     <ftp://ftp.bls.gov/pub/time.series/la/la.txt>. Each data file has five
>     tab-delimited fields:
>
>        * series_id
>        * year
>        * period (codes for things like quarter or month of year)
>        * value
>        * footnote_codes
>
>     The series_id consists of five fixed-width subfields (length in
>     parentheses):
>
>        * survey abbreviation (2)
>        * seasonal code (1)
>        * area type code (2)
>        * area code (6)
>        * measure code (2)
>
>     So an example record might be:
>
>     LASPS36040003   1990    M01     8.8     L
>
>     I want to read in the data in one pass and convert them to a data
>     frame with the following columns (actual name, class in parentheses):
>
>        Survey abbreviation (survey, character)
>        Seasonal (seasonal, logical seasonal=T)
>        Area type (area_type_code, factor)
>        Area (area_code, factor)
>        Measure (measure_code, factor)
>        Year (year, Date)
>        Period (period, factor)
>        Value (value, numeric)
>        Footnote (footnote_codes, character but see note)
>
>     (Regarding the Footnote, I have to look at the data more. If there's
>     just one code per record, this will be a factor; if there are
>     multiple,
>     it will either be character or a list. For now I'm making it only
>     character.)
>
>     Currently I can read the data just fine using read.table, but this
>     makes
>     series_id the first variable. I want to break out the subfields as
>     separate columns.
>
>     Any suggestions?
>
>     Thanks.
>         Marsh Feldman
>
>
>     ______________________________________________
>     R-help@r-project.org <mailto:R-help@r-project.org> mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-help
>     PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>     and provide commented, minimal, self-contained, reproducible code.
>
>

-- 
Dr. Marshall Feldman, PhD
Director of Research and Academic Affairs
Center for Urban Studies and Research
The University of Rhode Island
email: marsh @ uri .edu (remove spaces)


      Contact Information:


        Kingston:

202 Hart House
Charles T. Schmidt Labor Research Center
The University of Rhode Island
36 Upper College Road
Kingston, RI 02881-0815
tel. (401) 874-5953
fax: (401) 874-5511


        Providence:

206E Shepard Building
URI Feinstein Providence Campus
80 Washington Street
Providence, RI 02903-1819
tel. (401) 277-5218
fax: (401) 277-5464
