On Jun 7, 2011, at 3:55 PM, Abraham Mathew wrote:

> I'm running R 2.13 on Ubuntu 10.10
>
> I have a data set which is comprised of character strings.
>
> site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt')
>
> dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499")
> dat
>
> I want to loop through the data and construct a data frame with the zip
> code, state abbreviation, and city name in separate columns. Given the
> size of this data set, I was wondering if there was an efficient way to
> get the desired results.
>
> Thanks
> Abraham

Since the original text file is a CSV file (without a header), just use:

> system.time(DF <- read.csv("http://www.census.gov/tiger/tms/gazetteer/zips.txt",
+                            header = FALSE))
   user  system elapsed
  0.385   0.033   1.832

> str(DF)
'data.frame':   29470 obs. of  8 variables:
 $ V1: int  1 1 1 1 1 1 1 1 1 1 ...
 $ V2: int  35004 35005 35006 35007 35010 35014 35016 35019 35020 35023 ...
 $ V3: Factor w/ 51 levels "AK","AL","AR",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ V4: Factor w/ 16698 levels "02821","04465",..: 150 168 180 7710 10434 348 547 812 1250 7044 ...
 $ V5: num  86.5 87 87.2 86.8 86 ...
 $ V6: num  33.6 33.6 33.4 33.2 32.9 ...
 $ V7: int  6055 10616 3205 14218 19942 3062 13650 1781 40549 39677 ...
 $ V8: num  0.001499 0.002627 0.000793 0.003519 0.004935 ...

> head(DF)
  V1    V2 V3         V4       V5       V6    V7       V8
1  1 35004 AL      ACMAR 86.51557 33.58413  6055 0.001499
2  1 35005 AL ADAMSVILLE 86.95973 33.58844 10616 0.002627
3  1 35006 AL      ADGER 87.16746 33.43428  3205 0.000793
4  1 35007 AL   KEYSTONE 86.81286 33.23687 14218 0.003519
5  1 35010 AL   NEW SITE 85.95109 32.94145 19942 0.004935
6  1 35014 AL     ALPINE 86.20893 33.33116  3062 0.000758

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
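
As a follow-up to the read.csv() approach above, here is a minimal sketch of how the three requested columns (zip code, state abbreviation, city name) might be pulled out. Note that in the str() output the "01" in the first field was read as the integer 1, so any code with a leading zero loses it unless the column is read as character. The sketch below works offline on two sample lines laid out like the `dat` string in the question; `sample_lines`, `zips`, and the column names "zip", "state", "city" are illustrative names, not something from the original thread, and the second sample row is copied from the head(DF) printout (so its values are rounded as printed).

```r
# Two sample lines in the layout of the question's `dat` string.
# Row 1 is the string from the question; row 2 is taken from the
# head(DF) printout and is only illustrative.
sample_lines <- c(
  "01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499",
  "01, 35005, AL, ADAMSVILLE, 86.95973, 33.58844, 10616, 0.002627"
)

# Read the lines as CSV without a header.
# colClasses keeps the first four fields as character, so leading zeros
# ("01") survive and no factors are created; strip.white drops the blank
# that follows each comma in these sample lines.
con <- textConnection(sample_lines)
dat <- read.csv(con,
                header = FALSE,
                strip.white = TRUE,
                colClasses = c("character", "character", "character",
                               "character", "numeric", "numeric",
                               "integer", "numeric"))

# Keep only zip code (V2), state abbreviation (V3) and city name (V4),
# and give them descriptive (assumed) names.
zips <- dat[, c("V2", "V3", "V4")]
names(zips) <- c("zip", "state", "city")

str(zips)
```

To apply the same idea to the full file, the textConnection() call would presumably be replaced by the URL used above; whether strip.white is needed there depends on how the file itself is laid out, which is not checked here.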