Re: [R] Problem reading mixed CSV file
Need to fix up the file having 6 and 7 columns to be read as 6 columns only. Here is the working. Can somebody please let me know how do I maintain the order in which rows were read and append the two files into one:

> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
LL1532Ap,ABC# Depot-A+,China,1971,17,1
LL1532Ap,ABC# Depot-A+,China,1971,33,1
LL1532Ap,ABC# Depot-A+,,1971,16,2
LL1532Ap,ABC# Depot-A+,,1971,17,1
LL1532Ap,ABC# Depot-A+,HongKong, Asia,1971,22,1
LL1532Ap,ABC# Depot-A+,HongKong, Asia,1971,49,1
LL1532Ap,ABC# Depot-A+,,1971,20,1
LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
LL1532Ap,ABC# Depot-A+,,1971,33,1
LL1532Ap,ABC# Depot-A+,Kazakhstan, Asia,1973,15,1
LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",",comment.char = "")
 [1] 6 6 6 6 6 6 7 7 6 6 6 7 6 6 6 6
> filName <- "temp.csv"
> nFields <- count.fields(filName, sep = ',', comment.char = "")
> input <- readLines(filName)
> writeLines(input[nFields == 6], con = (file = "6fields.csv"))
> writeLines(input[nFields == 7], con = (file = "7fields.csv"))
> filName <- "7fields.csv"
> length(count.fields(filName, sep = ',', comment.char = "")) -> nFields2
> input <- readLines(filName)
> for (i in 1:nFields2){
    strsplit(input[i],",")[[1]] -> z
    # sep = '' so that no extra spaces end up inside the quoted field
    paste (z[1], z[2], paste('"',z[3],',',z[4],'"',sep =''), z[5],z[6],z[7],sep = ',') -> input[i]
  }
> # comment.char = "" is needed here too, because the data contain '#'
> result <- read.table(textConnection(input), sep = ',', comment.char = "")

Need the output to look like

Sno ID       Title         Location         Year  x y
1   LL1532Ap ABC# Depot-A+                  1971  8 2
2   LL1532Ap ABC# Depot-A+ Bhutan           1971  6 1
3   LL1532Ap ABC# Depot-A+ China            1971 17 1
4   LL1532Ap ABC# Depot-A+ China            1971 33 1
5   LL1532Ap ABC# Depot-A+                  1971 16 2
6   LL1532Ap ABC# Depot-A+                  1971 17 1
7   LL1532Ap ABC# Depot-A+ HongKong, Asia   1971 22 1
8   LL1532Ap ABC# Depot-A+ HongKong, Asia   1971 49 1
9   LL1532Ap ABC# Depot-A+                  1971 20 1
10  LL1532Ap ABC# Depot-A+ Kazakhstan       1971 27 1
11  LL1532Ap ABC# Depot-A+                  1971 33 1
12  LL1532Ap ABC# Depot-A+ Kazakhstan, Asia 1973 15 1
13  LL1532Ap ABC# Depot-A+ Romania-Europe   1971 10 1
14  LL1532Ap ABC# Depot-A+ Romania-Europe   1973  4 1
15  LL1532Ap ABC# Depot-A+ Sanchez-America  1973  9 1
16  LL1532An ABC# Depot-A-                  1971  8 2

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
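One way to keep the original row order is to avoid splitting the file at all: repair the 7-field lines in place and read everything in one pass. The following is only a minimal sketch, assuming the stray comma always falls inside the third field. The inline sample lines stand in for temp.csv, and the column names are taken from the desired output above.

```r
# Sample lines standing in for temp.csv
# (assumption: the extra comma is always inside field 3)
input <- c(
  "LL1532Ap,ABC# Depot-A+,,1971,8,2",
  "LL1532Ap,ABC# Depot-A+,HongKong, Asia,1971,22,1",
  "LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1",
  "LL1532Ap,ABC# Depot-A+,Kazakhstan, Asia,1973,15,1"
)

# Count fields per line; comment.char = "" stops '#' from truncating lines
tc <- textConnection(input)
nFields <- count.fields(tc, sep = ",", comment.char = "")
close(tc)

# Quote fields 3 and 4 together on the 7-field lines; positions do not change
for (i in which(nFields == 7)) {
  z <- strsplit(input[i], ",", fixed = TRUE)[[1]]
  input[i] <- paste(z[1], z[2], paste0('"', z[3], ",", z[4], '"'),
                    z[5], z[6], z[7], sep = ",")
}

# Read all lines at once, so the rows stay in file order
result <- read.table(text = input, sep = ",", comment.char = "",
                     col.names = c("ID", "Title", "Location", "Year", "x", "y"),
                     stringsAsFactors = FALSE)
result$Sno <- seq_len(nrow(result))  # original line number
print(result)
```

Because nothing is written to separate 6-field and 7-field files, there is nothing to append afterwards, and Sno is simply the line number in the input.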
Re: [R] Problem reading mixed CSV file
thanks. works very well.

On Mon, Mar 26, 2012 at 12:12 PM, Berend Hasselman wrote:
>
> On 26-03-2012, at 08:40, Ashish Agarwal wrote:
>
> > comment.char = NULL does not work.
> > Is there any way to make it NULL rather than having a specific character
> > like '%'?
>
> Why don't you try something?
>
> comment.char=""
>
> looks quite obvious and worth a try.
>
> Berend
Re: [R] Problem reading mixed CSV file
On 26-03-2012, at 08:33, Ashish Agarwal wrote:

> OMG.
> I think it uses comment character # as default in the argument.
> comment.char = "#"
> How do I turn it off?

???

How about comment.char="%" for example?

Berend
Re: [R] Problem reading mixed CSV file
On 26-03-2012, at 08:40, Ashish Agarwal wrote:

> comment.char = NULL does not work.
> Is there any way to make it NULL rather than having a specific character like
> '%'?

Why don't you try something?

comment.char=""

looks quite obvious and worth a try.

Berend
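As a small illustration of the suggestion above, the sketch below reads two of the sample lines with read.table, once with the default comment character and once with comment handling switched off. The variable names are made up for the example. Note that read.csv already defaults to comment.char = "", whereas read.table and count.fields default to "#".

```r
# Two sample records containing '#'
lines <- c("LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1",
           "LL1532Ap,ABC# Depot-A+,China,1971,17,1")

# Default: everything from '#' to the end of the line is dropped
with_default <- read.table(text = lines, sep = ",", stringsAsFactors = FALSE)

# comment.char = "" disables comment handling, so all six fields survive
with_off <- read.table(text = lines, sep = ",", comment.char = "",
                       stringsAsFactors = FALSE)

print(dim(with_default))
print(dim(with_off))
```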
Re: [R] Problem reading mixed CSV file
comment.char = NULL does not work.
Is there any way to make it NULL rather than having a specific character like '%'?

On Mon, Mar 26, 2012 at 12:06 PM, Berend Hasselman wrote:
>
> > comment.char = "#"
> >
> > How do I turn it off?
>
> ???
>
> How about comment.char="%" for example?
>
> Berend
Re: [R] Problem reading mixed CSV file
On 26-03-2012, at 08:26, Berend Hasselman wrote:

>
> On 26-03-2012, at 08:16, Ashish Agarwal wrote:
>
>> Why does the output in the following say 2 and not 6?
>>
>>> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
>> + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
>> + LL1532Ap,ABC# Depot-A+,China,1971,17,1
>> + LL1532Ap,ABC# Depot-A+,China,1971,33,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
>> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
>> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
>> + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
>> + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
>> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>
>
> Have you done
>
> ?count.fields
>
> and read what it says about the argument "sep" and the default?
>
> So if a comma is the separator what value would you give sep?

Sorry I should have had a closer look at what you had done.
But still ?count.fields should have given you a pointer.
Look at what it says in the entry for argument "comment.char".
You have a character # in your text. Set comment.char to something other than # .

Berend
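To see why the count comes out as 2, it can help to look at what the parser actually keeps from one of these lines. The sketch below uses scan, with comment.char set to "#" explicitly to mimic the count.fields default. The variable names are illustrative only.

```r
line <- "LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1"

# With '#' treated as a comment character, the rest of the line is discarded
seen_default <- scan(text = line, what = "", sep = ",",
                     comment.char = "#", quiet = TRUE)

# With comment handling off, every comma-separated field is kept
seen_all <- scan(text = line, what = "", sep = ",",
                 comment.char = "", quiet = TRUE)

print(seen_default)
print(seen_all)
```

Only the text before the first # survives in the first call, which leaves the two fields that count.fields reported.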
Re: [R] Problem reading mixed CSV file
OMG.
I think it uses comment character # as default in the argument.
comment.char = "#"
How do I turn it off?

On Mon, Mar 26, 2012 at 11:56 AM, Berend Hasselman wrote:
>
> On 26-03-2012, at 08:16, Ashish Agarwal wrote:
>
> > Why does the output in the following say 2 and not 6?
> >
> >> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
> > + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
> > + LL1532Ap,ABC# Depot-A+,China,1971,17,1
> > + LL1532Ap,ABC# Depot-A+,China,1971,33,1
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
> > + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
> > + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
> > + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
> > + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
> > [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> >>
>
> Have you done
>
> ?count.fields
>
> and read what it says about the argument "sep" and the default?
>
> So if a comma is the separator what value would you give sep?
>
> Berend
Re: [R] Problem reading mixed CSV file
On 26-03-2012, at 08:16, Ashish Agarwal wrote:

> Why does the output in the following say 2 and not 6?
>
>> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
> + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
> + LL1532Ap,ABC# Depot-A+,China,1971,17,1
> + LL1532Ap,ABC# Depot-A+,China,1971,33,1
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
> + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
> + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>

Have you done

?count.fields

and read what it says about the argument "sep" and the default?

So if a comma is the separator what value would you give sep?

Berend
Re: [R] Problem reading mixed CSV file
Why does the output in the following say 2 and not 6?

> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
+ LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
+ LL1532Ap,ABC# Depot-A+,China,1971,17,1
+ LL1532Ap,ABC# Depot-A+,China,1971,33,1
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
+ LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
+ LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
+ LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
+ LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
 [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>

On Fri, Mar 16, 2012 at 10:59 PM, David Winsemius wrote:
> Looks like an encoding mismatch. You have not offered the requested
> information about your setup so further comment would all be guesswork.
> But you can perhaps educate yourself by reading:
>
> ?Encoding
>
> And line ten has 7 elements.
>
> > count.fields(textConnection(",,,1968,21,0
> + ,,Boston,1968,13,0
> + ,,Boston,1968,18,0
> + ,,Chicago,1967,44,0
> + ,,Providence,1968,17,0
> + ,,Providence,1969,48,0
> + ,,Binky,1968,24,0
> + ,,Chicago,1968,23,0
> + ,,Dally,1968,7,0
> + ,,Raleigh, North Carol,1968,25,0
> + Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
> + DEF_REQPRF/,,Dartmouth,1967,31,1
> + PL,,,1967,38,1
> + XY,PopatLal,,1967,5,1
> + XY,PopatLal,,1967,6,8
> + XY,PopatLal,,1967,7,7
> + XY,PopatLal,,1967,9,1
> + XY,PopatLal,,1967,10,1
> + XY,PopatLal,,1967,13,1
> + XY,PopatLal,Boston,1967,6,1
> + XY,PopatLal,Boston,1967,7,11
> + XY,PopatLal,Boston,1967,9,2
> + XY,PopatLal,Boston,1967,10,3
> + XY,PopatLal,Boston,1967,7,2"),sep=",")
>  [1] 6 6 6 6 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6
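The observation that line ten has 7 elements can be automated for a larger file. The sketch below flags every line whose field count differs from the most common count. The three sample lines and the variable names are illustrative only; with a real file, count.fields would be given the file name directly.

```r
# A few sample lines; the second one has an unquoted comma inside a field
lines <- c(",,Dally,1968,7,0",
           ",,Raleigh, North Carol,1968,25,0",
           "PL,,,1967,38,1")

tc <- textConnection(lines)
n <- count.fields(tc, sep = ",", comment.char = "")
close(tc)

# Treat the most frequent field count as the expected one
expected <- as.integer(names(which.max(table(n))))

# Line numbers that deviate, together with their content
odd <- which(n != expected)
print(data.frame(line = odd, fields = n[odd], text = lines[odd]))
```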
Re: [R] Problem reading mixed CSV file
On Tue, Mar 20, 2012 at 3:17 PM, Ashish Agarwal wrote:
> Given x<- count.fields(..) could you pls help in following:
> 1. how to create a data vector with data being line numbers of original file
> where x==6?

That is what the expression:

writeLines(input[x == 6], con = '6fields.csv')

is doing. 'x == 6' is a logical vector with TRUE in the position of each line that has 6 fields in it, so it extracts only the lines with 6 fields and writes them to the output file. You probably need to read the section on "indexing" in the "Intro to R" manual.

> 2. what is the way to read only the nth line (only) of an input file into a
> data vector with first three attributes to be read as string, 4th
> being categorical, 5th and 6th being numeric with width 10?

You might want to give an example of what the line looks like. I would use 'readLines' to read in the file and then I could index to the 'nth' line and parse it using 'strsplit' or 'regexpr' depending on its complexity. This would depend on the format of the line, which has not been provided.

>
> On Tue, Mar 20, 2012 at 9:37 PM, jim holtman wrote:
>> use 'count.fields' to determine which lines have 6 and 7 fields in them.
>>
>> then use 'readLines' to read in the entire file and then use the data
>> from count.fields to write out to separate files:
>>
>> x <- count.fields(...)
>> input <- readLines(..)
>> writeLines(input[x == 6], con = '6fields.csv')
>> writeLines(input[x==7], con = '7fields.csv')
>>
>> On Tue, Mar 20, 2012 at 11:43 AM, Ashish Agarwal
>> wrote:
>>> The file is 20MB having 2 Million rows.
>>> I understand that I have two different formats - 6 columns and 7 columns.
>>> How do I read chunks to different files by using scan with modifying
>>> skip and nlines parameters?
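The two questions answered above can be sketched in a few lines. This is only an illustration under stated assumptions: the sample lines stand in for the real file, n and the list element names are made up, and the "width 10" requirement is treated as a display concern and left out.

```r
# Sample lines standing in for the result of readLines() on the real file
input <- c("XY,PopatLal,Boston,1967,6,1",
           ",,Raleigh, North Carol,1968,25,0",
           "PL,,,1967,38,1")

tc <- textConnection(input)
x <- count.fields(tc, sep = ",", comment.char = "")
close(tc)

# 1. Line numbers of the lines with 6 fields: which() turns the logical
#    vector x == 6 into positions
six <- which(x == 6)

# 2. Parse only the nth line: strings for the first three fields, a factor
#    for the 4th, numerics for the 5th and 6th
n <- 3
fields <- strsplit(input[n], ",", fixed = TRUE)[[1]]
rec <- list(a = fields[1], b = fields[2], c = fields[3],
            year = factor(fields[4]),
            x = as.numeric(fields[5]),
            y = as.numeric(fields[6]))

print(six)
str(rec)
```

One caveat: strsplit drops a trailing empty field, so a line that ends in a comma would need separate handling.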
Re: [R] Problem reading mixed CSV file
Given x<- count.fields(..) could you pls help in following:
1. how to create a data vector with data being line numbers of original file where x==6?
2. what is the way to read only the nth line (only) of an input file into a data vector with first three attributes to be read as string, 4th being categorical, 5th and 6th being numeric with width 10?

On Tue, Mar 20, 2012 at 9:37 PM, jim holtman wrote:
> use 'count.fields' to determine which lines have 6 and 7 fields in them.
>
> then use 'readLines' to read in the entire file and then use the data
> from count.fields to write out to separate files:
>
> x <- count.fields(...)
> input <- readLines(..)
> writeLines(input[x == 6], con = '6fields.csv')
> writeLines(input[x==7], con = '7fields.csv')
>
> On Tue, Mar 20, 2012 at 11:43 AM, Ashish Agarwal
> wrote:
>> The file is 20MB having 2 Million rows.
>> I understand that I have two different formats - 6 columns and 7 columns.
>> How do I read chunks to different files by using scan with modifying
>> skip and nlines parameters?
>>
>> On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL wrote:
>>>
>>> I would follow Jim's suggestion,
>>> nFields <- count.fields(fileName, sep = ',')
>>> count fields and read chunks to different files by using scan with
>>> modifying skip and nlines parameters. However if there are only a few lines
>>> which differ it would be better to correct those few lines manually in
>>> some suitable editor.
>>>
>>> Elaborating omnipotent function for reading any kind of
>>> corrupted/nonstandard files seems to me suited only if you expect to read
>>> such files many times.
Re: [R] Problem reading mixed CSV file
use 'count.fields' to determine which lines have 6 and 7 fields in them.

then use 'readLines' to read in the entire file and then use the data from count.fields to write out to separate files:

x <- count.fields(...)
input <- readLines(..)
writeLines(input[x == 6], con = '6fields.csv')
writeLines(input[x==7], con = '7fields.csv')

On Tue, Mar 20, 2012 at 11:43 AM, Ashish Agarwal wrote:
> The file is 20MB having 2 Million rows.
> I understand that I have two different formats - 6 columns and 7 columns.
> How do I read chunks to different files by using scan with modifying
> skip and nlines parameters?
>
> On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL wrote:
>>
>> I would follow Jim's suggestion,
>> nFields <- count.fields(fileName, sep = ',')
>> count fields and read chunks to different files by using scan with
>> modifying skip and nlines parameters. However if there are only a few lines
>> which differ it would be better to correct those few lines manually in
>> some suitable editor.
>>
>> Elaborating omnipotent function for reading any kind of
>> corrupted/nonstandard files seems to me suited only if you expect to read
>> such files many times.
Re: [R] Problem reading mixed CSV file
The file is 20MB having 2 Million rows.
I understand that I have two different formats - 6 columns and 7 columns.
How do I read chunks to different files by using scan with modifying skip and nlines parameters?

On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL wrote:
>
> I would follow Jim's suggestion,
> nFields <- count.fields(fileName, sep = ',')
> count fields and read chunks to different files by using scan with
> modifying skip and nlines parameters. However if there are only a few lines
> which differ it would be better to correct those few lines manually in
> some suitable editor.
>
> Elaborating omnipotent function for reading any kind of
> corrupted/nonstandard files seems to me suited only if you expect to read
> such files many times.
>
> Regards
> Petr
Re: [R] Problem reading mixed CSV file
Hi

> This is quite a CPU consuming process. My system got hung up for the
> big file I have.
>
> Within the for loop that you have suggested, can't I have a case
> statement for different value of nfields to be read and specify what
> format does the variable needs to be read?
> something like
> case
> # input format for 6 fields
> when nFields == 6
> read.csv as string, string, string, numeric, numeric, numeric into dataframe1
> #input format for 7 fields
> when nFields == 7
> read.csv as string, string, string, string, numeric, numeric, numeric
> into dataframe2
> end case
> # Output the two dataframes via some way of tracking the original line
> numbers of the input file - similar to _N_ in SAS
> . Dataframe1 to be outputed as it is while in dataframe2,
> concatenating the 3rd and the 4th strings.
>
> Could you please help with the format for the above?

I would follow Jim's suggestion,

nFields <- count.fields(fileName, sep = ',')

count fields and read chunks to different files by using scan with modifying skip and nlines parameters. However if there are only a few lines which differ it would be better to correct those few lines manually in some suitable editor.

Elaborating omnipotent function for reading any kind of corrupted/nonstandard files seems to me suited only if you expect to read such files many times.

Regards
Petr
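The idea of reading chunks with scan by varying skip and nlines can be sketched as follows. This is a rough illustration, not a tuned solution: the temporary file and its four lines stand in for the real data, and runs, starts and chunks are names invented for the example. Consecutive lines with the same field count are grouped with rle, and each group is read in one scan call.

```r
# Write a small stand-in file (the real file name would be used instead)
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,Boston,1967,6,1",
             "a,b,Chicago,1967,7,2",
             "a,b,Raleigh, North Carol,1968,25,0",
             "a,b,Dally,1968,7,0"), path)

nFields <- count.fields(path, sep = ",", comment.char = "")

# Runs of consecutive lines sharing a field count
runs   <- rle(nFields)
starts <- cumsum(c(0, head(runs$lengths, -1)))  # lines to skip before each run

# Read each run separately; every chunk becomes a character matrix
chunks <- lapply(seq_along(runs$lengths), function(k) {
  vals <- scan(path, what = "", sep = ",", skip = starts[k],
               nlines = runs$lengths[k], quiet = TRUE)
  matrix(vals, ncol = runs$values[k], byrow = TRUE)
})

print(lapply(chunks, dim))
unlink(path)
```

Each scan call reads the file from the top again, so this is only reasonable when the irregular lines are rare; with many alternating runs in a file of 2 million rows, repairing the lines in memory is likely to be faster.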
Re: [R] Problem reading mixed CSV file
How big is the file? In the example I sent I waa using 'textConnection' to reread the input. If the file is large, this can be slow. You will have better luck writing the converted data outmto a temporarynfile and reading it right back in. I am not such exactly what you are asking. You can crate output file names based on the input file name. What is it you want to do with the 'case' statement? Sent from my iPad On Mar 19, 2012, at 2:46, Ashish Agarwal wrote: > This is quite a CPu consuming process. My system got hung up for the > big file I have. > > Within the for loop that you have suggested, can't I have a case > statement for different value of nfields to be read and specify what > format does the variable needs to be read? > something like > case > # input format for 6 fields > when nFields == 6 > read.csv as string, string, string, numeric, numeric, numeric into dataframe1 > #input format for 7 fields > when nFields == 7 > read.csv as string, string, string, string, numeric, numeric, numeric > into dataframe2 > end case > # Output the two dataframes via some way of tracking the original line > numbers of the input file - similar to _N_ in SAS > . Dataframe1 to be outputed as it is while in dataframe2, > concatenating the 3rd and the 4th strings. > > Could you please help with the format for the above? 
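The temporary-file route suggested in this reply might look like the sketch below. It assumes the lines have already had their quotes inserted; the three sample lines, the names `fixed` and `tmp`, and the `colClasses` choice are illustrative and do not come from the thread.

```r
# Stand-in for the corrected lines (in practice: the 'input' vector after the fix)
fixed <- c(',,Dally,1968,7,0',
           ',,"Raleigh, North Carol",1968,25,0',
           'XY,PopatLal,Boston,1967,6,1')

# Write the corrected lines to a temporary file instead of a textConnection
tmp <- tempfile(fileext = ".csv")
writeLines(fixed, tmp)

# Read the file straight back in; declaring the column types up front
# saves read.table from having to guess them
result <- read.table(tmp, sep = ",", quote = "\"", comment.char = "",
                     colClasses = c("character", "character", "character",
                                    "numeric", "numeric", "numeric"),
                     stringsAsFactors = FALSE)

# Remove the temporary file once the data is in memory
unlink(tmp)

print(result)
```

Whether this is noticeably faster than `textConnection` will depend on the size of the file, so it is worth timing both on the real data.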
Re: [R] Problem reading mixed CSV file
This is quite a CPU-consuming process. My system hung on the big file I have.

Within the for loop that you have suggested, can't I have a case statement for the different values of nFields, specifying the format in which each variable needs to be read? Something like:

case
  # input format for 6 fields
  when nFields == 6
    read.csv as string, string, string, numeric, numeric, numeric into dataframe1
  # input format for 7 fields
  when nFields == 7
    read.csv as string, string, string, string, numeric, numeric, numeric into dataframe2
end case
# Output the two dataframes via some way of tracking the original line
# numbers of the input file, similar to _N_ in SAS.
# dataframe1 is to be output as it is, while in dataframe2 the 3rd and
# the 4th strings are concatenated.

Could you please help with the format for the above?

On Sat, Mar 17, 2012 at 4:54 AM, jim holtman wrote:
> Here is a solution that looks for the line with 7 elements and inserts
> the quotes:
>
>> fileName <- '/temp/text.txt'
>> input <- readLines(fileName)
>> # count the fields to find 7
>> nFields <- count.fields(fileName, sep = ',')
>> # now fix the data
>> for (i in which(nFields == 7)){
> +     # split on comma
> +     z <- strsplit(input[i], ',')[[1]]
> +     input[i] <- paste(z[1], z[2]
> +         , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
> +         , z[5], z[6], z[7], sep = ','
> +         )
> + }
>>
>> # now read in the data
>> result <- read.table(textConnection(input), sep = ',')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
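R has no direct equivalent of the 'case' block sketched in this message, but the same effect can be had by splitting the lines on their field count. The following is only a sketch under assumptions: the three sample lines stand in for the real file, and the helper `read_group` and the column name `line` (playing the role of SAS's `_N_`) are made up for illustration.

```r
# Stand-in for readLines(fileName)
input <- c(",,Dally,1968,7,0",
           ",,Raleigh, North Carol,1968,25,0",
           "XY,PopatLal,Boston,1967,6,1")

# Field count per line; quote and comment handling are switched off so that
# quotes or '#' inside the data cannot change the count
con <- textConnection(input)
nFields <- count.fields(con, sep = ",", quote = "", comment.char = "")
close(con)

# Hypothetical helper: read a group of lines with an explicit format
read_group <- function(lines, classes) {
  read.table(text = lines, sep = ",", quote = "", comment.char = "",
             colClasses = classes, stringsAsFactors = FALSE)
}

# "when nFields == 6": three strings, three numbers
six <- read_group(input[nFields == 6],
                  c(rep("character", 3), rep("numeric", 3)))
six$line <- which(nFields == 6)        # original line numbers

# "when nFields == 7": four strings, three numbers
seven <- read_group(input[nFields == 7],
                    c(rep("character", 4), rep("numeric", 3)))

# Concatenate the 3rd and 4th strings so both groups share one layout
seven_fixed <- data.frame(V1 = seven$V1, V2 = seven$V2,
                          V3 = paste(seven$V3, seven$V4, sep = ","),
                          V4 = seven$V5, V5 = seven$V6, V6 = seven$V7,
                          line = which(nFields == 7),
                          stringsAsFactors = FALSE)

# Stack the two groups and restore the original order of the file
result <- rbind(six, seven_fixed)
result <- result[order(result$line), ]
rownames(result) <- NULL

print(result)
```

Because each group is read in a single call, no per-line loop is involved; the `line` column can be dropped once the rows are back in order.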
Re: [R] Problem reading mixed CSV file
Here is a solution that looks for the line with 7 elements and inserts the quotes:

> fileName <- '/temp/text.txt'
> input <- readLines(fileName)
> # count the fields to find 7
> nFields <- count.fields(fileName, sep = ',')
> # now fix the data
> for (i in which(nFields == 7)){
+     # split on comma
+     z <- strsplit(input[i], ',')[[1]]
+     input[i] <- paste(z[1], z[2]
+         , paste('"', z[3], ',', z[4], '"', sep = '')  # put on quotes
+         , z[5], z[6], z[7], sep = ','
+         )
+ }
>
> # now read in the data
> result <- read.table(textConnection(input), sep = ',')
>
> result
                         V1       V2                   V3   V4 V5 V6
1                                                         1968 21  0
2                                                  Boston 1968 13  0
3                                                  Boston 1968 18  0
4                                                 Chicago 1967 44  0
5                                              Providence 1968 17  0
6                                              Providence 1969 48  0
7                                                   Binky 1968 24  0
8                                                 Chicago 1968 23  0
9                                                   Dally 1968  7  0
10                                   Raleigh, North Carol 1968 25  0
11 Addy ABC-Dogs Stars-W8.1                    Providence 1968 38  0
12              DEF_REQPRF/                     Dartmouth 1967 31  1
13                       PL                               1967 38  1
14                       XY PopatLal                      1967  5  1
15                       XY PopatLal                      1967  6  8
16                       XY PopatLal                      1967  7  7
17                       XY PopatLal                      1967  9  1
18                       XY PopatLal                      1967 10  1
19                       XY PopatLal                      1967 13  1
20                       XY PopatLal               Boston 1967  6  1
21                       XY PopatLal               Boston 1967  7 11
22                       XY PopatLal               Boston 1967  9  2
23                       XY PopatLal               Boston 1967 10  3
24                       XY PopatLal               Boston 1967  7  2
>

On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal wrote:
> I have a file that is 5000 records and to edit that file is not easy.
> Is there any way to line 10 differently to account for changes in the
> third field?

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
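For a large file, the loop in this message could probably be replaced by one vectorised `sub()` call. This is a sketch rather than the method posted in the thread: it assumes the extra comma always falls between the third and fourth fields, and the regular expression and sample lines are illustrative.

```r
# Stand-in for readLines(fileName)
input <- c(",,Dally,1968,7,0",
           ",,Raleigh, North Carol,1968,25,0",
           "XY,PopatLal,Boston,1967,6,1")

# Find the lines with 7 fields
con <- textConnection(input)
nFields <- count.fields(con, sep = ",", quote = "", comment.char = "")
close(con)
seven <- nFields == 7

# Group 1: first two fields with their commas
# Group 2: fields 3 and 4, which together form the city-state value
# Group 3: the rest of the line
# The replacement wraps group 2 in double quotes
input[seven] <- sub("^([^,]*,[^,]*,)([^,]*,[^,]*)(,.*)$",
                    "\\1\"\\2\"\\3", input[seven])

# Every line now parses as 6 fields
result <- read.table(text = input, sep = ",", quote = "\"",
                     comment.char = "", stringsAsFactors = FALSE)

print(result)
```

Since the lines are modified in place inside `input`, the rows keep the order they had in the file.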
Re: [R] Problem reading mixed CSV file
I have a file that is 5000 records, and editing that file is not easy. Is there any way to read line 10 differently to account for changes in the third field?

On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers wrote:
> On 2012-03-16 10:48, Ashish Agarwal wrote:
>>
>> Line 10 has City and State that too separated by comma. For line 10
>> how can I read differently as compared to the other lines?
>
> Edit the file and put quotes around the city-state combination:
> "Raleigh, North Carol"
Re: [R] Problem reading mixed CSV file
On 2012-03-16 10:48, Ashish Agarwal wrote:
> Line 10 has City and State that too separated by comma. For line 10
> how can I read differently as compared to the other lines?

Edit the file and put quotes around the city-state combination:

"Raleigh, North Carol"

Also: always run count.fields() on your files before importing.

Peter Ehlers
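The advice to run count.fields() before importing could be turned into a quick check like the sketch below. The sample lines, the temporary file, and the expected count of 6 are assumptions for illustration. Note that count.fields() treats '#' as a comment character by default, so `comment.char = ""` is passed in case the data contains that character.

```r
# Write a few sample lines to a temporary file standing in for the real CSV
lines <- c(",,Dally,1968,7,0",
           ",,Raleigh, North Carol,1968,25,0",
           "XY,PopatLal,Boston,1967,6,1")
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

# Number of fields on every line of the file
nFields <- count.fields(tmp, sep = ",", quote = "", comment.char = "")

# Tabulate the counts: more than one distinct value means ragged lines
print(table(nFields))

# Line numbers that do not have the expected 6 fields
bad <- which(nFields != 6)
print(bad)

unlink(tmp)
```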
Re: [R] Problem reading mixed CSV file
Line 10 has City and State, and those too are separated by a comma. How can I read line 10 differently from the other lines?

On Fri, Mar 16, 2012 at 10:59 PM, David Winsemius wrote:
> Looks like an encoding mismatch. You have not offered the requested
> information about your setup so further comment would all be guesswork. But
> you can perhaps educate yourself by reading:
>
> ?Encoding
>
> And line ten has 7 elements.
Re: [R] Problem reading mixed CSV file
On Mar 16, 2012, at 1:11 PM, Ashish Agarwal wrote:

> I want to import this CSV file into R.
>
> I tried using scan and read.table but results are not visible :(
>
> > scan("D:/data/temp.csv",list("","","",0,0,0),sep=",") ->x
> Read 51 records
> > x
> [[1]]
>  [1] "ÿþ" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [16] "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [31] "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [46] "" "" "" "" "" ""
>
> > read.table("D:/data/temp.csv",header=F,sep=",") ->x
> > x
>   V1 V2
> 1 ÿþ NA
> 2    NA
> 3    NA
> 4    NA
>
> Can somebody please help in importing this CSV file?

Looks like an encoding mismatch. You have not offered the requested information about your setup, so further comment would all be guesswork. But you can perhaps educate yourself by reading:

?Encoding

And line ten has 7 elements.

> count.fields(textConnection(",,,1968,21,0
+ ,,Boston,1968,13,0
+ ,,Boston,1968,18,0
+ ,,Chicago,1967,44,0
+ ,,Providence,1968,17,0
+ ,,Providence,1969,48,0
+ ,,Binky,1968,24,0
+ ,,Chicago,1968,23,0
+ ,,Dally,1968,7,0
+ ,,Raleigh, North Carol,1968,25,0
+ Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
+ DEF_REQPRF/,,Dartmouth,1967,31,1
+ PL,,,1967,38,1
+ XY,PopatLal,,1967,5,1
+ XY,PopatLal,,1967,6,8
+ XY,PopatLal,,1967,7,7
+ XY,PopatLal,,1967,9,1
+ XY,PopatLal,,1967,10,1
+ XY,PopatLal,,1967,13,1
+ XY,PopatLal,Boston,1967,6,1
+ XY,PopatLal,Boston,1967,7,11
+ XY,PopatLal,Boston,1967,9,2
+ XY,PopatLal,Boston,1967,10,3
+ XY,PopatLal,Boston,1967,7,2"),sep=",")
 [1] 6 6 6 6 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6

David Winsemius, MD
West Hartford, CT
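The leading "ÿþ" is how the bytes FF FE, the UTF-16 little-endian byte-order mark, appear when read as Latin-1, so one plausible reading of the encoding mismatch is that the file was saved as UTF-16LE. The thread does not confirm this, so the sketch below is an assumption: it writes a small UTF-16LE file of its own (the path `tmp` and the two sample lines are illustrative) and reads it back by declaring the encoding.

```r
# Create a small UTF-16LE file to stand in for the poster's temp.csv
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "w", encoding = "UTF-16LE")
writeLines(c(",,Boston,1968,13,0",
             ",,Chicago,1967,44,0"), con)
close(con)

# Declare the file's encoding so R converts it while reading
x <- read.table(tmp, header = FALSE, sep = ",",
                fileEncoding = "UTF-16LE", stringsAsFactors = FALSE)

unlink(tmp)
print(x)
```

If the real file turns out to use a different encoding, the same `fileEncoding` argument applies with that encoding's name; `iconvlist()` shows the names available on a given system.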
Re: [R] Problem reading mixed CSV file
I want to import this CSV file into R.

The CSV file is

,,,1968,21,0
,,Boston,1968,13,0
,,Boston,1968,18,0
,,Chicago,1967,44,0
,,Providence,1968,17,0
,,Providence,1969,48,0
,,Binky,1968,24,0
,,Chicago,1968,23,0
,,Dally,1968,7,0
,,Raleigh, North Carol,1968,25,0
Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
DEF_REQPRF/,,Dartmouth,1967,31,1
PL,,,1967,38,1
XY,PopatLal,,1967,5,1
XY,PopatLal,,1967,6,8
XY,PopatLal,,1967,7,7
XY,PopatLal,,1967,9,1
XY,PopatLal,,1967,10,1
XY,PopatLal,,1967,13,1
XY,PopatLal,Boston,1967,6,1
XY,PopatLal,Boston,1967,7,11
XY,PopatLal,Boston,1967,9,2
XY,PopatLal,Boston,1967,10,3
XY,PopatLal,Boston,1967,7,2

I tried using scan and read.table but results are not visible :(

> scan("D:/data/temp.csv",list("","","",0,0,0),sep=",") ->x
Read 51 records
> x
[[1]]
 [1] "ÿþ" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[16] "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[31] "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[46] "" "" "" "" "" ""

> read.table("D:/data/temp.csv",header=F,sep=",") ->x
> x
  V1 V2
1 ÿþ NA
2    NA
3    NA
4    NA

Can somebody please help in importing this CSV file?
Re: [R] Problem reading mixed CSV file
What do you mean by "Mixed"? If a field has a comma, then it is supposed to be enclosed in quotes. You could preprocess the file looking for cases where there are more fields than there are supposed to be, and if they are always in the same place, you could enclose them in quotes and then reprocess.

You would really have to show what the file looks like for the different "mixed" cases to get a good answer to your question. And of course, R can do it, if we knew what it was we are supposed to do. So at least provide commented, minimal, self-contained, reproducible code and data.

On Fri, Mar 16, 2012 at 7:03 AM, Ashish Agarwal wrote:
> I am having trouble reading this CSV file in R. There are six attributes
> that I need to read - CVar1, CVar2, Location, Year, Nvar3, Nvar4. Can
> somebody help in reading this file?
> On line 10 it has city and state separated by comma. I had been a user of
> SAS where I can use different format to read in for this line. Can I do
> this in R too?

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.