Use 'count.fields' to determine which lines have 6 and 7 fields in them. Then use 'readLines' to read in the entire file, and then use the data from count.fields to write the lines out to separate files:
x <- count.fields(...)      # number of fields on each line
input <- readLines(..)      # the lines themselves
writeLines(input[x == 6], con = '6fields.csv')
writeLines(input[x == 7], con = '7fields.csv')

On Tue, Mar 20, 2012 at 11:43 AM, Ashish Agarwal <ashish.agarw...@gmail.com> wrote:
> The file is 20 MB and has 2 million rows.
> I understand that I have two different formats: 6 columns and 7 columns.
> How do I read chunks into different files by using scan with modified
> skip and nlines parameters?
>
> On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL <petr.pi...@precheza.cz> wrote:
>>
>> I would follow Jim's suggestion:
>> nFields <- count.fields(fileName, sep = ',')
>> Count the fields and read chunks into different files by using scan with
>> modified skip and nlines parameters. However, if only a few lines
>> differ, it would be better to correct them manually in a
>> suitable editor.
>>
>> Writing an all-purpose function for reading any kind of
>> corrupted/nonstandard file seems to me worthwhile only if you expect to
>> read such files many times.
>>
>> Regards
>> Petr
>>
>>>
>>> On Sat, Mar 17, 2012 at 4:54 AM, jim holtman <jholt...@gmail.com> wrote:
>>> > Here is a solution that looks for the line with 7 elements and inserts
>>> > the quotes:
>>> >
>>> >
>>> >> fileName <- '/temp/text.txt'
>>> >> input <- readLines(fileName)
>>> >> # count the fields to find 7
>>> >> nFields <- count.fields(fileName, sep = ',')
>>> >> # now fix the data
>>> >> for (i in which(nFields == 7)){
>>> > + # split on comma
>>> > + z <- strsplit(input[i], ',')[[1]]
>>> > + input[i] <- paste(z[1], z[2]
>>> > + , paste('"', z[3], ',', z[4], '"', sep = '') # put quotes around fields 3 and 4
>>> > + , z[5], z[6], z[7], sep = ','
>>> > + )
>>> > + }
>>> >>
>>> >> # now read in the data
>>> >> result <- read.table(textConnection(input), sep = ',')
>>> >>
>>> >> result
>>> > V1 V2 V3 V4 V5 V6
>>> > 1 1968 21 0
>>> > 2 Boston 1968 13 0
>>> > 3 Boston 1968 18 0
>>> > 4 Chicago 1967 44 0
>>> > 5 Providence 1968 17 0
>>> > 6 Providence 1969 48 0
>>> > 7 Binky 1968 24 0
>>> > 8 Chicago 1968 23 0
>>> > 9 Dally 1968 7 0
>>> > 10 Raleigh, North Carol 1968 25 0
>>> > 11 Addy ABC-Dogs Stars-W8.1 Providence 1968 38 0
>>> > 12 DEF_REQPRF/ Dartmouth 1967 31 1
>>> > 13 PL 1967 38 1
>>> > 14 XY PopatLal 1967 5 1
>>> > 15 XY PopatLal 1967 6 8
>>> > 16 XY PopatLal 1967 7 7
>>> > 17 XY PopatLal 1967 9 1
>>> > 18 XY PopatLal 1967 10 1
>>> > 19 XY PopatLal 1967 13 1
>>> > 20 XY PopatLal Boston 1967 6 1
>>> > 21 XY PopatLal Boston 1967 7 11
>>> > 22 XY PopatLal Boston 1967 9 2
>>> > 23 XY PopatLal Boston 1967 10 3
>>> > 24 XY PopatLal Boston 1967 7 2
>>> >>
>>> >
>>> >
>>> > On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
>>> > <ashish.agarw...@gmail.com> wrote:
>>> >> I have a file with 5000 records, and editing that file is not easy.
>>> >> Is there any way to read line 10 differently to account for the
>>> >> change in the third field?
>>> >>
>>> >> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers <ehl...@ucalgary.ca> wrote:
>>> >>> On 2012-03-16 10:48, Ashish Agarwal wrote:
>>> >>>>
>>> >>>> Line 10 has City and State, separated by a comma. How can I
>>> >>>> read line 10 differently from the other lines?
>>> >>>
>>> >>>
>>> >>> Edit the file and put quotes around the city-state combination:
>>> >>> "Raleigh, North Carol"
>>> >>>
>>> >>
>>> >> ______________________________________________
>>> >> R-help@r-project.org mailing list
>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> >> and provide commented, minimal, self-contained, reproducible code.
>>> >
>>> >
>>> >
>>> > --
>>> > Jim Holtman
>>> > Data Munger Guru
>>> >
>>> > What is the problem that you are trying to solve?
>>> > Tell me what you want to do, not how you want to do it.
>>>
>>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
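Below is a runnable sketch of the count.fields()/readLines() split suggested at the top of this thread. It assumes a comma-separated file.

By default, count.fields() skips blank lines and treats '#' as a comment character. readLines(), by contrast, returns every physical line, so the two vectors can fall out of step. Passing blank.lines.skip = FALSE and comment.char = "" should keep them aligned. The function name split_by_field_count(), the sample rows and the output file names are hypothetical.

```r
# Sketch: write the lines of a comma-separated file into one file per
# field count (e.g. 6fields.csv, 7fields.csv) next to the input file.
# split_by_field_count() and the sample rows below are hypothetical.
split_by_field_count <- function(path, counts = c(6, 7), sep = ",") {
  # Count every physical line so the result lines up with readLines()
  nf <- count.fields(path, sep = sep,
                     blank.lines.skip = FALSE, comment.char = "")
  lines <- readLines(path)
  stopifnot(length(nf) == length(lines))
  out <- character(0)
  for (k in counts) {
    target <- file.path(dirname(path), paste0(k, "fields.csv"))
    # which() drops NA counts (e.g. a line with an unbalanced quote)
    writeLines(lines[which(nf == k)], con = target)
    out[as.character(k)] <- target
  }
  out
}

sample_file <- tempfile(fileext = ".csv")
writeLines(c("a,b,Boston,1968,13,0",
             "a,b,Raleigh, North Carol,1968,25,0",
             "a,b,Chicago,1967,44,0"), sample_file)
files <- split_by_field_count(sample_file)
readLines(files[["7"]])
```

Indexing with which() rather than a bare logical vector means that lines for which count.fields() returns NA are dropped. They are not written out as the string "NA".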
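Below is a sketch of the scan() route Petr describes. It assumes every line has a known field count, meaning no blank lines and no unbalanced quotes. rle() groups consecutive lines that share a count, and each run is then read with the skip and nlines arguments of scan(). read_runs_with_scan() and the sample data are hypothetical.

```r
# Sketch of the scan() route: group consecutive lines that share a field
# count with rle(), then read each run with scan(skip = , nlines = ).
# Assumes no blank lines and no unbalanced quotes (no NA field counts).
# read_runs_with_scan() and the sample rows below are hypothetical.
read_runs_with_scan <- function(path, sep = ",") {
  nf <- count.fields(path, sep = sep, comment.char = "")
  runs <- rle(nf)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  pieces <- vector("list", length(runs$lengths))
  for (j in seq_along(pieces)) {
    vals <- scan(path, what = "", sep = sep, skip = starts[j] - 1,
                 nlines = runs$lengths[j], quiet = TRUE)
    # scan() returns one flat vector; reshape to one row per input line
    pieces[[j]] <- matrix(vals, ncol = runs$values[j], byrow = TRUE)
  }
  pieces
}

sample_file <- tempfile(fileext = ".csv")
writeLines(c("x,1,Boston,1968,13,0",
             "x,2,Boston,1968,18,0",
             "x,3,Raleigh, North Carol,1968,25,0",
             "x,4,Chicago,1968,23,0"), sample_file)
pieces <- read_runs_with_scan(sample_file)
sapply(pieces, ncol)   # [1] 6 7 6
```

scan() reopens the file for every run when given a file name, so each call re-reads everything above its skip point. On a 2-million-row file with many alternating runs, writing the lines out once by field count is likely to be cheaper.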
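Below is a vectorised alternative to the for() loop in Jim's 17 March message. Like that loop, it assumes the extra comma always falls between fields 3 and 4 of a 7-field line. It counts commas instead of calling count.fields(), so it also assumes the lines contain no quoted fields yet. quote_fields_3_4() and the sample lines are hypothetical.

```r
# Sketch: quote fields 3 and 4 of every 7-field line in one sub() call.
# Assumes the stray comma is always between fields 3 and 4 and that the
# lines contain no quoted fields yet (commas are counted naively).
quote_fields_3_4 <- function(lines) {
  n_commas <- nchar(gsub("[^,]", "", lines))
  bad <- n_commas == 6                        # 6 commas = 7 fields
  # (f1,f2,)(f3,f4)(,rest)  ->  f1,f2,"f3,f4",rest
  lines[bad] <- sub('^([^,]*,[^,]*,)([^,]*,[^,]*)(,.*)$', '\\1"\\2"\\3',
                    lines[bad])
  lines
}

raw <- c("a,b,Boston,1968,13,0",
         "a,b,Raleigh, North Carol,1968,25,0")
fixed <- quote_fields_3_4(raw)
result <- read.table(text = fixed, sep = ",", stringsAsFactors = FALSE)
result$V3
```

The rewritten line has the same shape as the one built by the paste() calls in the loop, so read.table() sees six fields on every line.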