You can always read a portion of the file, fix it up, and then write it out. For large files, I read in 10,000 lines at a time, fix them up, write them out, and then go back and process the next batch of lines. You haven't shown us a sample of your input/output or how you are processing it. Depending on what type of preprocessing needs to be done to the data, Perl is also an option, but most things I used to use Perl for I can do within R these days.
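A minimal sketch of that batch approach, under some assumptions: the input is whitespace-delimited with no header row and no blank lines, every column should stay as text, and the names `process_in_chunks` and `fix_up` are made up for illustration (`fix_up` stands in for whatever pre-processing you actually need). It reads from an open connection so that each call to `readLines` picks up where the previous one stopped.

```r
# Read a large file in fixed-size batches, keep every column as character
# so long IDs are never converted to floating point, and append each
# processed batch to a CSV file.
process_in_chunks <- function(infile, outfile, chunk_size = 10000L,
                              fix_up = identity) {
  con <- file(infile, open = "r")
  on.exit(close(con))
  first <- TRUE
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break            # end of file reached
    chunk <- read.table(text = lines, colClasses = "character")
    chunk <- fix_up(chunk)                    # placeholder for real processing
    # Write the header only with the first batch, then append.
    write.table(chunk, outfile, sep = ",", quote = FALSE,
                row.names = FALSE, col.names = first, append = !first)
    first <- FALSE
  }
  invisible(NULL)
}

# Small self-contained check using temporary files.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".csv")
writeLines(rep("1234567890123456789012 987654321234567898765432", 25), infile)
process_in_chunks(infile, outfile, chunk_size = 10L)
result <- read.csv(outfile, colClasses = "character")
```

Only one batch is held in memory at a time, so the RAM needed depends on `chunk_size` rather than on the size of the whole file.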
Here is an example of reading in your IDs:

> x <- read.table(textConnection("1234567890123456789012 987654321234567898765432 98765432123456789876543
+ 1234567890123456789012 987654321234567898765432 98765432123456789876543
+ 1234567890123456789012 987654321234567898765432 98765432123456789876543
+ 1234567890123456789012 987654321234567898765432 98765432123456789876543
+ 1234567890123456789012 987654321234567898765432 98765432123456789876543
+ 1234567890123456789012 987654321234567898765432 98765432123456789876543
+ 1234567890123456789012 987654321234567898765432 98765432123456789876543")
+     , colClasses = rep('character', 3))
> closeAllConnections()
> str(x)
'data.frame':   7 obs. of  3 variables:
 $ V1: chr  "1234567890123456789012" "1234567890123456789012" "1234567890123456789012" "1234567890123456789012" ...
 $ V2: chr  "987654321234567898765432" "987654321234567898765432" "987654321234567898765432" "987654321234567898765432" ...
 $ V3: chr  "98765432123456789876543" "98765432123456789876543" "98765432123456789876543" "98765432123456789876543" ...
> x
                      V1                       V2                      V3
1 1234567890123456789012 987654321234567898765432 98765432123456789876543
2 1234567890123456789012 987654321234567898765432 98765432123456789876543
3 1234567890123456789012 987654321234567898765432 98765432123456789876543
4 1234567890123456789012 987654321234567898765432 98765432123456789876543
5 1234567890123456789012 987654321234567898765432 98765432123456789876543
6 1234567890123456789012 987654321234567898765432 98765432123456789876543
7 1234567890123456789012 987654321234567898765432 98765432123456789876543

On Mon, Oct 25, 2010 at 4:41 AM, ZeMajik <zema...@gmail.com> wrote:
> Thanks Jim, but I still got the problem that the pre-processing becomes way
> too computationally expensive. R seems to handle characters and factors much
> much worse than numeric IDs. I don't have enough RAM to even write the file
> when they are viewed as chars instead of numeric values!
>
> Anyone have any other ideas?
> Is it not possible to tell R not to rewrite
> upon import? It wouldn't matter if it only would write the correct IDs to
> the exported csv file, but it exports the abbreviated version which is of no
> use.
>
> Mike
>
> On Sat, Oct 23, 2010 at 3:56 AM, jim holtman <jholt...@gmail.com> wrote:
>>
>> Your best bet is to make sure that you read the IDs in as characters.
>> If they are being read in as floating point numbers, then there is
>> only 15 digits of accuracy, so if you have IDs 18-22 digits, you will
>> be missing data. So if you are using read.table, then look at
>> colClasses to see how to do this.
>>
>> Provide a subset of your data and the statements that you are using to
>> read in the data.
>>
>> On Fri, Oct 22, 2010 at 1:15 PM, ZeMajik <zema...@gmail.com> wrote:
>> > Hey,
>> >
>> > I'm using R as a pre-processor for a large dataset with IDs which are
>> > numeric (but has no numeric meaning so can be seen as factors).
>> > I do some data formating and then write it out to a csv file.
>> >
>> > However the problem is that the IDs are very long, 18-22 chars long more
>> > precisely. R is constantly rewriting these IDs to the abbreviated +eX which
>> > hinders me from exporting the data to the csv since the IDs are no longer
>> > intact.
>> > I've tried telling R that ID column is a factor, but this results in two
>> > problems: 1) Since I have millions of rows and R is slower handling factors
>> > than numbers my comp can't run the process in any kind of reasonable time.
>> > and 2) Some IDs STILL seem to be rewritten somehow. The second point made me
>> > believe that perhaps R is rewriting upon import?
>> >
>> > Does anyone have any tips on how to solve this problem?
>> >
>> > Thanks,
>> > Mike

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.