Re: [R] Reading data file with both fixed and tab-delimited fields

Chidambaram Annamalai Tue, 02 Mar 2010 09:30:33 -0800

I tried to shoehorn the read.* functions into matching both the fixed-width
and the variable-width fields in the data, but I couldn't see an obvious way
to do it. (read.fwf reads fixed-width data properly, but the rest of the
fields must be processed separately. Maybe insert NULL stubs in the
remaining fields and fill them in later?)


One way is to sidestep the entire issue and convert the structured data you
have into a csv file using sed (usually available on most *nix systems) with
something like so:

sed -r 's/^(..)(.)(..)(.{6})(..)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)/\1,\2,\3,\4,\5,\6,\7,\8,\9/' data | less

Check that the output looks right, then redirect it to a file (replace
"| less" with "> data.csv") and use the resulting .csv file directly in
R using read.csv
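
As a rough sketch of that pipeline on the single example record from the
original post: the file names sample.txt and data.csv are made up for
illustration, and I'm assuming the fields are separated by tabs (possibly
padded with spaces). It uses POSIX character classes and -E instead of \t and
-r, which should work with both GNU and BSD sed.

```shell
# Hypothetical file names: sample.txt (input) and data.csv (output).
# One record in the layout described in the post: a 13-character
# series_id followed by four whitespace-separated fields.
printf 'LASPS36040003\t1990\tM01\t8.8\tL\n' > sample.txt

# Split series_id into its 2/1/2/6/2-character subfields, then replace
# each run of whitespace before the remaining four fields with a comma.
# The last field (footnote_codes) is allowed to be empty.
sed -E 's/^(..)(.)(..)(.{6})(..)[[:space:]]+([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]*([^[:space:]]*)[[:space:]]*$/\1,\2,\3,\4,\5,\6,\7,\8,\9/' sample.txt > data.csv

cat data.csv
```

If the real file starts with a header line, that line would need to be
skipped or rewritten first, since the pattern would otherwise split it like a
data record. On the R side, something like read.csv("data.csv", header=FALSE,
colClasses="character") is probably a safer starting point than the defaults,
because codes such as 03 would lose their leading zero if read as numbers.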

If that does not satisfy you, maybe the R wizards on the list can point you
to a native R way of doing this, possibly using scan? I'm not sure though.

Hope this helps,
Chillu

On Tue, Mar 2, 2010 at 9:42 PM, Marshall Feldman <ma...@uri.edu> wrote:

> Hello R wizards,
>
> What is the best way to read a data file containing both fixed-width and
> tab-delimited fields? (More detail follows.)
>
> Details:
> The U.S. Bureau of Labor Statistics provides local area unemployment
> statistics at ftp://ftp.bls.gov/pub/time.series/la/, and the data are
> documented in the file la.txt
> <ftp://ftp.bls.gov/pub/time.series/la/la.txt>. Each data file has five
> tab-delimited fields:
>
>    * series_id
>    * year
>    * period (codes for things like quarter or month of year)
>    * value
>    * footnote_codes
>
> The series_id consists of five fixed-width subfields (length in
> parentheses):
>
>    * survey abbreviation (2)
>    * seasonal code (1)
>    * area type code (2)
>    * area code (6)
>    * measure code (2)
>
> So an example record might be:
>
> LASPS36040003   1990    M01     8.8     L
>
> I want to read in the data in one pass and convert them to a data frame
> with the following columns (actual name, class in parentheses):
>
>    Survey abbreviation (survey, character)
>    Seasonal (seasonal, logical seasonal=T)
>    Area type (area_type_code, factor)
>    Area (area_code, factor)
>    Measure (measure_code, factor)
>    Year (year, Date)
>    Period (period, factor)
>    Value (value, numeric)
>    Footnote (footnote_codes, character but see note)
>
> (Regarding the Footnote, I have to look at the data more. If there's
> just one code per record, this will be a factor; if there are multiple,
> it will either be character or a list. For now I'm making it only
> character.)
>
> Currently I can read the data just fine using read.table, but this makes
> series_id the first variable. I want to break out the subfields as
> separate columns.
>
> Any suggestions?
>
> Thanks.
>     Marsh Feldman
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

