Given x <- count.fields(..), could you please help with the following?

1. How do I create a vector containing the line numbers of the original file where x == 6?
2. How do I read only the nth line of an input file into a data vector, with the first three fields read as strings, the 4th as categorical (a factor), and the 5th and 6th as numeric with width 10?
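A minimal sketch for both questions, assuming a comma-separated file; the sample lines and the temporary file below are made up for illustration. For question 1, which() turns the logical test x == 6 into positions. count.fields() skips blank lines by default, so those positions equal file line numbers only if the file has no blank lines. For question 2, scan() with skip = n - 1 and nlines = 1 reads just that one line, and the what list makes the first four fields character and the last two numeric; the 4th field is then converted to a factor.

```r
# Hypothetical sample file; replace fileName with the path to your own data.
fileName <- tempfile(fileext = ".csv")
writeLines(c(
  "a,b,c,X,1.5,2",
  "a,b,c,d,Y,3,4",
  "e,f,g,Z,10,20"
), fileName)

# 1. Line numbers of the lines that have exactly 6 fields.
#    count.fields() skips blank lines by default, so this assumes none are present.
x <- count.fields(fileName, sep = ",")
lines6 <- which(x == 6)
lines6
# [1] 1 3

# 2. Read only the nth line: skip the n - 1 lines before it and read one line.
#    Fields 1-4 are read as character, fields 5-6 as numeric.
n <- 3
row_n <- scan(fileName, what = list("", "", "", "", 0, 0), sep = ",",
              skip = n - 1, nlines = 1, quiet = TRUE)
row_n[[4]] <- factor(row_n[[4]])  # treat the 4th field as categorical
str(row_n)
```

If "width 10" means fixed-width columns rather than comma-separated ones, read.fwf() (with its widths, skip and n arguments) would be the analogue, but the widths of the first four fields are not given in the question, so none are assumed here.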
On Tue, Mar 20, 2012 at 9:37 PM, jim holtman <jholt...@gmail.com> wrote:
> Use 'count.fields' to determine which lines have 6 and 7 fields in them.
>
> Then use 'readLines' to read in the entire file, and then use the data
> from count.fields to write out to separate files:
>
> x <- count.fields(...)
> input <- readLines(..)
> writeLines(input[x == 6], con = '6fields.csv')
> writeLines(input[x == 7], con = '7fields.csv')
>
> On Tue, Mar 20, 2012 at 11:43 AM, Ashish Agarwal
> <ashish.agarw...@gmail.com> wrote:
>> The file is 20 MB with 2 million rows.
>> I understand that I have two different formats: 6 columns and 7 columns.
>> How do I read chunks into different files by using scan with modified
>> skip and nlines parameters?
>>
>> On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL <petr.pi...@precheza.cz> wrote:
>>>
>>> I would follow Jim's suggestion:
>>> nFields <- count.fields(fileName, sep = ',')
>>> Count the fields and read chunks into different files by using scan with
>>> modified skip and nlines parameters. However, if only a few lines
>>> differ, it would be better to correct those lines manually in
>>> a suitable editor.
>>>
>>> Writing an all-purpose function for reading any kind of
>>> corrupted/nonstandard file seems worthwhile to me only if you expect to
>>> read such files many times.
>>>
>>> Regards
>>> Petr
>>>
>>>
>>>>
>>>> On Sat, Mar 17, 2012 at 4:54 AM, jim holtman <jholt...@gmail.com> wrote:
>>>> > Here is a solution that looks for the lines with 7 elements and inserts
>>>> > the quotes:
>>>> >
>>>> >
>>>> >> fileName <- '/temp/text.txt'
>>>> >> input <- readLines(fileName)
>>>> >> # count the fields to find 7
>>>> >> nFields <- count.fields(fileName, sep = ',')
>>>> >> # now fix the data
>>>> >> for (i in which(nFields == 7)){
>>>> > + # split on comma
>>>> > + z <- strsplit(input[i], ',')[[1]]
>>>> > + input[i] <- paste(z[1], z[2]
>>>> > + , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
>>>> > + , z[5], z[6], z[7], sep = ','
>>>> > + )
>>>> > + }
>>>> >>
>>>> >> # now read in the data
>>>> >> result <- read.table(textConnection(input), sep = ',')
>>>> >>
>>>> >> result
>>>> > V1 V2 V3 V4 V5 V6
>>>> > 1 1968 21 0
>>>> > 2 Boston 1968 13 0
>>>> > 3 Boston 1968 18 0
>>>> > 4 Chicago 1967 44 0
>>>> > 5 Providence 1968 17 0
>>>> > 6 Providence 1969 48 0
>>>> > 7 Binky 1968 24 0
>>>> > 8 Chicago 1968 23 0
>>>> > 9 Dally 1968 7 0
>>>> > 10 Raleigh, North Carol 1968 25 0
>>>> > 11 Addy ABC-Dogs Stars-W8.1 Providence 1968 38 0
>>>> > 12 DEF_REQPRF/ Dartmouth 1967 31 1
>>>> > 13 PL 1967 38 1
>>>> > 14 XY PopatLal 1967 5 1
>>>> > 15 XY PopatLal 1967 6 8
>>>> > 16 XY PopatLal 1967 7 7
>>>> > 17 XY PopatLal 1967 9 1
>>>> > 18 XY PopatLal 1967 10 1
>>>> > 19 XY PopatLal 1967 13 1
>>>> > 20 XY PopatLal Boston 1967 6 1
>>>> > 21 XY PopatLal Boston 1967 7 11
>>>> > 22 XY PopatLal Boston 1967 9 2
>>>> > 23 XY PopatLal Boston 1967 10 3
>>>> > 24 XY PopatLal Boston 1967 7 2
>>>> >>
>>>> >
>>>> >
>>>> > On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
>>>> > <ashish.agarw...@gmail.com> wrote:
>>>> >> I have a file with 5000 records, and editing that file is not easy.
>>>> >> Is there any way to read line 10 differently to account for the
>>>> >> change in the third field?
>>>> >>
>>>> >> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers <ehl...@ucalgary.ca> wrote:
>>>> >>> On 2012-03-16 10:48, Ashish Agarwal wrote:
>>>> >>>>
>>>> >>>> Line 10 has a city and a state, and they are separated by a comma.
>>>> >>>> How can I read line 10 differently from the other lines?
>>>> >>>
>>>> >>>
>>>> >>> Edit the file and put quotes around the city-state combination:
>>>> >>> "Raleigh, North Carol"
>>>> >>>
>>>> >>
>>>> >> ______________________________________________
>>>> >> R-help@r-project.org mailing list
>>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> >> and provide commented, minimal, self-contained, reproducible code.
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Jim Holtman
>>>> > Data Munger Guru
>>>> >
>>>> > What is the problem that you are trying to solve?
>>>> > Tell me what you want to do, not how you want to do it.
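Jim's readLines()/writeLines() split can be checked end to end on a small file. This is a sketch, assuming a comma-separated file with no blank lines, so that the positions returned by count.fields() line up with the elements returned by readLines(); the sample lines and the temporary output files are hypothetical stand-ins for '6fields.csv' and '7fields.csv'. Note that writeLines() takes the output file name or connection as con; it has no file argument.

```r
# Hypothetical sample file: line 2 has an unquoted comma inside its 3rd field.
fileName <- tempfile(fileext = ".csv")
writeLines(c(
  "1,A,Boston,1968,13,0",
  "2,B,Raleigh, North Carol,1968,25,0",
  "3,C,Chicago,1967,44,0"
), fileName)

# Field count per line, and the raw lines themselves.
x <- count.fields(fileName, sep = ",")
input <- readLines(fileName)

# Write each group of lines to its own file (stand-ins for 6fields.csv / 7fields.csv).
out6 <- tempfile(fileext = ".csv")
out7 <- tempfile(fileext = ".csv")
writeLines(input[x == 6], con = out6)
writeLines(input[x == 7], con = out7)

# Each file now has a consistent number of columns and reads cleanly.
d6 <- read.table(out6, sep = ",")
d7 <- read.table(out7, sep = ",")
```

If one of the groups is empty, writeLines() writes an empty file and read.table() stops with an error on it, so checking any(x == 7) first is a cheap safeguard.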
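Petr's suggestion (scan() with changing skip and nlines) can be made concrete by finding runs of consecutive lines that share the same field count with rle(). The sketch below is one way to do it, not code from the thread; it assumes a comma-separated file without blank lines, uses made-up sample data, and reads every field as character (convert afterwards, e.g. with type.convert(), if needed).

```r
# Hypothetical sample file: line 3 has an extra, unquoted comma.
fileName <- tempfile(fileext = ".csv")
writeLines(c(
  "1,A,Boston,1968,13,0",
  "2,B,Boston,1968,18,0",
  "3,C,Raleigh, North Carol,1968,25,0",
  "4,D,Chicago,1967,44,0"
), fileName)

nFields <- count.fields(fileName, sep = ",")

# Runs of consecutive lines sharing the same number of fields.
runs   <- rle(nFields)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Read each run with scan(): skip the lines before it, then read exactly
# runs$lengths[k] lines, with one named character field per column.
chunks <- vector("list", length(runs$lengths))
for (k in seq_along(chunks)) {
  nf <- runs$values[k]
  what <- setNames(as.list(rep("", nf)), paste0("V", seq_len(nf)))
  chunks[[k]] <- as.data.frame(
    scan(fileName, what = what, sep = ",", skip = starts[k] - 1,
         nlines = runs$lengths[k], quiet = TRUE),
    stringsAsFactors = FALSE)
}

# Stack all chunks that have 6 fields into one data frame.
six <- do.call(rbind, chunks[runs$values == 6])
```

Each scan() call reopens the file and skips from the top, so with many short runs in a 2-million-line file this gets slow; when the odd lines are scattered rather than grouped in a few blocks, the readLines() split is the simpler route.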
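Jim's quote-inserting loop can also be written without a loop, using a single sub() call over the 7-field lines. This is an alternative sketch, not Jim's code. Like his version, it assumes that the stray comma always sits inside the 3rd field, and it additionally assumes that the affected lines contain no quote characters yet. The sample data is hypothetical.

```r
# Hypothetical sample file: line 2 has an unquoted comma inside its 3rd field.
fileName <- tempfile(fileext = ".csv")
writeLines(c(
  "1,A,Boston,1968,13,0",
  "2,B,Raleigh, North Carol,1968,25,0",
  "3,C,Chicago,1967,44,0"
), fileName)

input   <- readLines(fileName)
nFields <- count.fields(fileName, sep = ",")
bad     <- nFields == 7

# Capture fields 1-2 (with their commas), then fields 3-4, then the rest,
# and wrap the middle group in double quotes.
input[bad] <- sub('^([^,]*,[^,]*,)([^,]*,[^,]*)(,.*)$', '\\1"\\2"\\3', input[bad])

# read.table() honours the double quotes, so every line now has 6 fields.
result <- read.table(text = input, sep = ",")
```

Because the regular expression leaves the rest of the line untouched, it also avoids strsplit() dropping a trailing empty field, which would otherwise show up as NA in the pasted line.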