Re: [R] Problem reading mixed CSV file

2012-03-26 Thread Ashish Agarwal
I need to fix up a file whose rows have either 6 or 7 columns so that it
is read as 6 columns only. Here is what I have so far. Can somebody please
tell me how to maintain the order in which the rows were read and append
the two files into one?

> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
LL1532Ap,ABC# Depot-A+,China,1971,17,1
LL1532Ap,ABC# Depot-A+,China,1971,33,1
LL1532Ap,ABC# Depot-A+,,1971,16,2
LL1532Ap,ABC# Depot-A+,,1971,17,1
LL1532Ap,ABC# Depot-A+,HongKong, Asia,1971,22,1
LL1532Ap,ABC# Depot-A+,HongKong, Asia,1971,49,1
LL1532Ap,ABC# Depot-A+,,1971,20,1
LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
LL1532Ap,ABC# Depot-A+,,1971,33,1
LL1532Ap,ABC# Depot-A+,Kazakhstan, Asia,1973,15,1
LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",",comment.char = "")
 [1] 6 6 6 6 6 6 7 7 6 6 6 7 6 6 6 6

> filName <- "temp.csv"
> nFields <- count.fields(filName, sep = ',', comment.char = "")
> input <- readLines(filName)
> writeLines(input[nFields == 6], con = "6fields.csv")
> writeLines(input[nFields == 7], con = "7fields.csv")

> filName <- "7fields.csv"
> length(count.fields(filName, sep = ',', comment.char = "")) -> nFields2
> input <- readLines(filName)
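> for (i in seq_len(nFields2)){
 strsplit(input[i],",")[[1]] -> z
 # sep = '' so that no spaces are inserted around the quoted field
 paste (z[1], z[2], paste('"',z[3],',',z[4],'"',sep = ''),
z[5],z[6],z[7],sep = ',') -> input[i]
}
> # comment.char = "" keeps read.table from treating '#' as a comment
> result <- read.table(textConnection(input), sep = ',', comment.char = "")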

Need the output to look like
Sno ID       Title         Location         Year x  y
1   LL1532Ap ABC# Depot-A+                  1971 8  2
2   LL1532Ap ABC# Depot-A+ Bhutan           1971 6  1
3   LL1532Ap ABC# Depot-A+ China            1971 17 1
4   LL1532Ap ABC# Depot-A+ China            1971 33 1
5   LL1532Ap ABC# Depot-A+                  1971 16 2
6   LL1532Ap ABC# Depot-A+                  1971 17 1
7   LL1532Ap ABC# Depot-A+ HongKong, Asia   1971 22 1
8   LL1532Ap ABC# Depot-A+ HongKong, Asia   1971 49 1
9   LL1532Ap ABC# Depot-A+                  1971 20 1
10  LL1532Ap ABC# Depot-A+ Kazakhstan       1971 27 1
11  LL1532Ap ABC# Depot-A+                  1971 33 1
12  LL1532Ap ABC# Depot-A+ Kazakhstan, Asia 1973 15 1
13  LL1532Ap ABC# Depot-A+ Romania-Europe   1971 10 1
14  LL1532Ap ABC# Depot-A+ Romania-Europe   1973 4  1
15  LL1532Ap ABC# Depot-A+ Sanchez-America  1973 9  1
16  LL1532An ABC# Depot-A-                  1971 8  2
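
One possible way to keep the original row order is to skip the two intermediate files and repair the 7-field lines in place, along the lines of Jim Holtman's earlier loop. The following is only a sketch: it assumes the extra comma always sits inside the third field, and it uses a small made-up vector `lines` in place of temp.csv. The column names are taken from the desired output above.

```r
# Hypothetical sample standing in for the contents of temp.csv
lines <- c(
  "LL1532Ap,ABC# Depot-A+,,1971,8,2",
  "LL1532Ap,ABC# Depot-A+,HongKong, Asia,1971,22,1",
  "LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1",
  "LL1532Ap,ABC# Depot-A+,Kazakhstan, Asia,1973,15,1"
)

# Count fields per line; comment.char = "" stops '#' from truncating lines
nFields <- count.fields(textConnection(lines), sep = ",", comment.char = "")

# Repair only the 7-field lines; every line stays at its original position
for (i in which(nFields == 7)) {
  z <- strsplit(lines[i], ",", fixed = TRUE)[[1]]
  # Assumption: fields 3 and 4 together form the location
  location <- paste0('"', z[3], ",", z[4], '"')
  lines[i] <- paste(z[1], z[2], location, z[5], z[6], z[7], sep = ",")
}

# Read the repaired lines; '#' must be switched off here as well
result <- read.table(text = lines, sep = ",", comment.char = "",
                     stringsAsFactors = FALSE,
                     col.names = c("ID", "Title", "Location", "Year", "x", "y"))
result$Sno <- seq_len(nrow(result))  # original line number
```

Because nothing is split off into separate files, there is no appending step and the row order of the input is kept automatically.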

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Ashish Agarwal
Thanks. Works very well.

On Mon, Mar 26, 2012 at 12:12 PM, Berend Hasselman  wrote:

>
> On 26-03-2012, at 08:40, Ashish Agarwal wrote:
>
> > comment.char = NULL does not work.
> > Is there any way to make it NULL rather than having a specific character
> like '%'?
> >
> Why don't you try something?
>
> comment.char=""
>
> looks quite obvious and worth a try.
>
> Berend
>
>
>



Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Berend Hasselman

On 26-03-2012, at 08:33, Ashish Agarwal wrote:

> OMG.
>  
> I think it uses comment character # as default in the argument.
>  
> comment.char = "#"
>  
> How do I turn it off?
???

How about comment.char="%" for example?

Berend



Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Berend Hasselman

On 26-03-2012, at 08:40, Ashish Agarwal wrote:

> comment.char = NULL does not work.
> Is there any way to make it NULL rather than having a specific character like 
> '%'?
> 
Why don't you try something?

comment.char=""

looks quite obvious and worth a try.

Berend



Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Ashish Agarwal
comment.char = NULL does not work.
Is there any way to make it NULL rather than having a specific character
like '%'?


On Mon, Mar 26, 2012 at 12:06 PM, Berend Hasselman  wrote:

>
> > comment.char = "#"
> >
> > How do I turn it off?
> ???
>
> How about comment.char="%" for example?
>
> Berend
>
>
>



Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Berend Hasselman

On 26-03-2012, at 08:26, Berend Hasselman wrote:

> 
> On 26-03-2012, at 08:16, Ashish Agarwal wrote:
> 
>> Why does the output in the following say 2 and not 6?
>> 
>>> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
>> + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
>> + LL1532Ap,ABC# Depot-A+,China,1971,17,1
>> + LL1532Ap,ABC# Depot-A+,China,1971,33,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
>> + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
>> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
>> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
>> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
>> + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
>> + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
>> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>> 
>> 
> 
> Have you done
> 
> ?count.fields
> 
> and read what it says about the argument "sep" and the default?
> 
> So if a comma is the separator what value would you give sep?
Sorry, I should have had a closer look at what you had done.

But ?count.fields should still have given you a pointer.

Look at what it says in the entry for the argument "comment.char".

You have a # character in your text.
Set comment.char to something other than #.

Berend
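
A small sketch of the effect described above, using the first line of the posted data. The variable names `line`, `n_default` and `n_fixed` are made up for illustration.

```r
line <- "LL1532Ap,ABC# Depot-A+,,1971,8,2"

# Default comment.char = "#": everything from '#' onward is dropped,
# so only "LL1532Ap" and "ABC" are left to be counted
n_default <- count.fields(textConnection(line), sep = ",")

# comment.char = "" turns comment handling off, so all fields are counted
n_fixed <- count.fields(textConnection(line), sep = ",", comment.char = "")
```

Here `n_default` is 2 and `n_fixed` is 6, matching the counts reported in this thread.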



Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Ashish Agarwal
OMG.

I think it uses comment character # as default in the argument.

comment.char = "#"

How do I turn it off?

On Mon, Mar 26, 2012 at 11:56 AM, Berend Hasselman  wrote:

>
> On 26-03-2012, at 08:16, Ashish Agarwal wrote:
>
> > Why does the output in the following say 2 and not 6?
> >
> >> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
> > + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
> > + LL1532Ap,ABC# Depot-A+,China,1971,17,1
> > + LL1532Ap,ABC# Depot-A+,China,1971,33,1
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
> > + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
> > + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
> > + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
> > + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
> > + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
> > + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
> > [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> >>
> >
>
>
> Have you done
>
> ?count.fields
>
> and read what it says about the argument "sep" and the default?
>
> So if a comma is the separator what value would you give sep?
>
> Berend
>
>
>



Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Berend Hasselman

On 26-03-2012, at 08:16, Ashish Agarwal wrote:

> Why does the output in the following say 2 and not 6?
> 
>> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
> + LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
> + LL1532Ap,ABC# Depot-A+,China,1971,17,1
> + LL1532Ap,ABC# Depot-A+,China,1971,33,1
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
> + LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
> + LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
> + LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
> + LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
> + LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>> 
> 


Have you done

?count.fields

and read what it says about the argument "sep" and the default?

So if a comma is the separator what value would you give sep?

Berend



Re: [R] Problem reading mixed CSV file

2012-03-25 Thread Ashish Agarwal
Why does the output in the following say 2 and not 6?

> count.fields(textConnection("LL1532Ap,ABC# Depot-A+,,1971,8,2
+ LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1
+ LL1532Ap,ABC# Depot-A+,China,1971,17,1
+ LL1532Ap,ABC# Depot-A+,China,1971,33,1
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,16,2
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,17,1
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,22,1
+ LL1532Ap,ABC# Depot-A+,HongKong,1971,49,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,20,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,27,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1971,33,1
+ LL1532Ap,ABC# Depot-A+,Kazakhstan,1973,15,1
+ LL1532Ap,ABC# Depot-A+,Romania-Europe,1971,10,1
+ LL1532Ap,ABC# Depot-A+,Romania-Europe,1973,4,1
+ LL1532Ap,ABC# Depot-A+,Sanchez-America,1973,9,1
+ LL1532An,ABC# Depot-A-,,1971,8,2"),sep=",")
 [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>




On Fri, Mar 16, 2012 at 10:59 PM, David Winsemius wrote:
Looks like an encoding mismatch. You have not offered the requested
information about you setup so further comment would all be guesswork. But
you can perhaps educate yourself by reading:

?Encoding

And line ten has 7 elements.

> count.fields(textConnection(",,,1968,21,0
+ ,,Boston,1968,13,0
+ ,,Boston,1968,18,0
+ ,,Chicago,1967,44,0
+ ,,Providence,1968,17,0
+ ,,Providence,1969,48,0
+ ,,Binky,1968,24,0
+ ,,Chicago,1968,23,0
+ ,,Dally,1968,7,0
+ ,,Raleigh, North Carol,1968,25,0
+ Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
+ DEF_REQPRF/,,Dartmouth,1967,31,1
+ PL,,,1967,38,1
+ XY,PopatLal,,1967,5,1
+ XY,PopatLal,,1967,6,8
+ XY,PopatLal,,1967,7,7
+ XY,PopatLal,,1967,9,1
+ XY,PopatLal,,1967,10,1
+ XY,PopatLal,,1967,13,1
+ XY,PopatLal,Boston,1967,6,1
+ XY,PopatLal,Boston,1967,7,11
+ XY,PopatLal,Boston,1967,9,2
+ XY,PopatLal,Boston,1967,10,3
+ XY,PopatLal,Boston,1967,7,2"),sep=",")
 [1] 6 6 6 6 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6
>
>
>



Re: [R] Problem reading mixed CSV file

2012-03-20 Thread jim holtman
On Tue, Mar 20, 2012 at 3:17 PM, Ashish Agarwal
 wrote:
> Given x<- count.fields(..) could you pls help in following:
> 1. how to create a data vector with data being line numbers of original file
> where x==6?

That is what the expression:

 writeLines(input[x == 6], con = '6fields.csv')

is doing.  'x == 6' is a logical vector with TRUE in the position of
the line that has 6 fields in it, so it is only extracting the lines
with 6 fields and writing them to the output file.  You probably need
to read the section on "indexing" in the "Intro to R" manual.
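
If the line numbers themselves are wanted rather than the lines, which() can be applied to the same logical vector. A small sketch with made-up field counts (the vector `x` below is hypothetical, not taken from the real file):

```r
# Hypothetical result of count.fields() for a five-line file
x <- c(6, 6, 7, 6, 7)

is6   <- x == 6          # logical vector: TRUE where the line has 6 fields
rows6 <- which(x == 6)   # line numbers of the 6-field lines
rows7 <- which(x == 7)   # line numbers of the 7-field lines
```

With these counts, `rows6` holds 1, 2 and 4 and `rows7` holds 3 and 5.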


> 2. what is the way to read only the nth line (only) of an input file into a
> data vector with first three attributes to be read as string, 4th
> being categorical, 5th and 6th being numeric with width 10?

You might want to give an example of what the line looks like.  I would
use 'readLines' to read in the file and then I could index to the
'nth' line and parse it using 'strsplit' or 'regexpr' depending on its
complexity.  This would depend on the format of the line, which has not
been provided.
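
As an illustration of reading a single line with fixed types, here is a sketch using scan() with skip and nlines. The file contents, the column names and the choice of line 2 are all assumptions made up for the example, since the real line format was not posted.

```r
# Hypothetical file contents written to a temporary file for illustration
path <- tempfile(fileext = ".csv")
writeLines(c("LL1532Ap,ABC# Depot-A+,Bhutan,1971,6,1",
             "LL1532Ap,ABC# Depot-A+,China,1971,17,1",
             "LL1532Ap,ABC# Depot-A+,China,1971,33,1"), path)

n <- 2  # the line to read

# skip = n - 1 and nlines = 1 read exactly the nth line;
# 'what' gives one template value per field ("" = string, 0 = numeric)
rec <- scan(path, sep = ",", skip = n - 1, nlines = 1, comment.char = "",
            what = list(ID = "", Title = "", Location = "",
                        Year = "", x = 0, y = 0),
            quiet = TRUE)

rec$Year <- factor(rec$Year)  # treat the 4th field as categorical
```

A display width such as 10 is a formatting matter rather than a storage one; something like formatC(rec$x, width = 10) could be used when printing.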


>
>
> On Tue, Mar 20, 2012 at 9:37 PM, jim holtman  wrote:
>> Use 'count.fields' to determine which lines have 6 and 7 fields in them.
>>
>> Then use 'readLines' to read in the entire file and use the data
>> from count.fields to write the lines out to separate files:
>>
>> x <- count.fields(...)
>> input <- readLines(...)
>> writeLines(input[x == 6], con = '6fields.csv')
>> writeLines(input[x == 7], con = '7fields.csv')
>>
>> On Tue, Mar 20, 2012 at 11:43 AM, Ashish Agarwal
>>  wrote:
>>> The file is 20 MB with 2 million rows.
>>> I understand that I have two different formats: 6 columns and 7 columns.
>>> How do I read chunks to different files by using scan with modifying
>>> skip and nlines parameters?
>>>
>>> On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL 
>>> wrote:

>>>> I would follow Jim's suggestion: count the fields with
>>>> nFields <- count.fields(fileName, sep = ',')
>>>> and read chunks into different files by using scan with modified
>>>> skip and nlines parameters. However, if only a few lines differ,
>>>> it would be better to correct those few lines manually in
>>>> some suitable editor.
>>>>
>>>> Writing an omnipotent function for reading any kind of
>>>> corrupted/nonstandard file seems to me worthwhile only if you expect to read
>>>> such files many times.
>>>>
>>>> Regards
>>>> Petr


>
>
>
> On Sat, Mar 17, 2012 at 4:54 AM, jim holtman 
> wrote:
> > Here is a solution that looks for the line with 7 elements and
> > inserts
> > the quotes:
> >
> >
> >> fileName <- '/temp/text.txt'
> >> input <- readLines(fileName)
> >> # count the fields to find 7
> >> nFields <- count.fields(fileName, sep = ',')
> >> # now fix the data
> >> for (i in which(nFields == 7)){
> > +     # split on comma
> > +     z <- strsplit(input[i], ',')[[1]]
> > +     input[i] <- paste(z[1], z[2]
> > +         , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
> > +         , z[5], z[6], z[7], sep = ','
> > +         )
> > + }
> >>
> >> # now read in the data
> >> result <- read.table(textConnection(input), sep = ',')
> >>
> >>         result
> >                         V1       V2                   V3   V4 V5 V6
> > 1                                                         1968 21  0
> > 2                                                  Boston 1968 13  0
> > 3                                                  Boston 1968 18  0
> > 4                                                 Chicago 1967 44  0
> > 5                                              Providence 1968 17  0
> > 6                                              Providence 1969 48  0
> > 7                                                   Binky 1968 24  0
> > 8                                                 Chicago 1968 23  0
> > 9                                                   Dally 1968  7  0
> > 10                                   Raleigh, North Carol 1968 25  0
> > 11 Addy ABC-Dogs Stars-W8.1                    Providence 1968 38  0
> > 12              DEF_REQPRF/                     Dartmouth 1967 31  1
> > 13                       PL                               1967 38  1
> > 14                       XY PopatLal                      1967  5  1
> > 15                       XY PopatLal                      1967  6  8
> > 16                       XY PopatLal                      1967  7  7
> > 17                       XY PopatLal                      1967  9  1
> > 18                       XY PopatLal                      1967 10  1
> > 19                       XY PopatLal                      1967 13  1
> > 20                       XY PopatLal               Boston 1967  6  1
> > 21                       XY PopatLal               Boston 1967  7 11
> > 22                       XY 

Re: [R] Problem reading mixed CSV file

2012-03-20 Thread Ashish Agarwal
Given x<- count.fields(..) could you pls help in following:
1. how to create a data vector with data being line numbers of original
file where x==6?
2. what is the way to read only the nth line (only) of an input file into a
data vector with first three attributes to be read as string, 4th
being categorical, 5th and 6th being numeric with width 10?


On Tue, Mar 20, 2012 at 9:37 PM, jim holtman  wrote:
> Use 'count.fields' to determine which lines have 6 and 7 fields in them.
>
> Then use 'readLines' to read in the entire file and use the data
> from count.fields to write the lines out to separate files:
>
> x <- count.fields(...)
> input <- readLines(...)
> writeLines(input[x == 6], con = '6fields.csv')
> writeLines(input[x == 7], con = '7fields.csv')
>
> On Tue, Mar 20, 2012 at 11:43 AM, Ashish Agarwal
>  wrote:
>> The file is 20 MB with 2 million rows.
>> I understand that I have two different formats: 6 columns and 7 columns.
>> How do I read chunks to different files by using scan with modifying
>> skip and nlines parameters?
>>
>> On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL 
wrote:
>>>
>>> I would follow Jim's suggestion: count the fields with
>>> nFields <- count.fields(fileName, sep = ',')
>>> and read chunks into different files by using scan with modified
>>> skip and nlines parameters. However, if only a few lines differ,
>>> it would be better to correct those few lines manually in
>>> some suitable editor.
>>>
>>> Writing an omnipotent function for reading any kind of
>>> corrupted/nonstandard file seems to me worthwhile only if you expect to read
>>> such files many times.
>>>
>>> Regards
>>> Petr
>>>
>>>



 On Sat, Mar 17, 2012 at 4:54 AM, jim holtman 
wrote:
 > Here is a solution that looks for the line with 7 elements and
inserts
 > the quotes:
 >
 >
 >> fileName <- '/temp/text.txt'
 >> input <- readLines(fileName)
 >> # count the fields to find 7
 >> nFields <- count.fields(fileName, sep = ',')
 >> # now fix the data
 >> for (i in which(nFields == 7)){
 > + # split on comma
 > + z <- strsplit(input[i], ',')[[1]]
 > + input[i] <- paste(z[1], z[2]
 > + , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
 > + , z[5], z[6], z[7], sep = ','
 > + )
 > + }
 >>
 >> # now read in the data
 >> result <- read.table(textConnection(input), sep = ',')
 >>
 >> result
 >                         V1       V2                   V3   V4 V5 V6
 > 1                                                         1968 21  0
 > 2                                                  Boston 1968 13  0
 > 3                                                  Boston 1968 18  0
 > 4                                                 Chicago 1967 44  0
 > 5                                              Providence 1968 17  0
 > 6                                              Providence 1969 48  0
 > 7                                                   Binky 1968 24  0
 > 8                                                 Chicago 1968 23  0
 > 9                                                   Dally 1968  7  0
 > 10                                   Raleigh, North Carol 1968 25  0
 > 11 Addy ABC-Dogs Stars-W8.1                    Providence 1968 38  0
 > 12              DEF_REQPRF/                     Dartmouth 1967 31  1
 > 13                       PL                               1967 38  1
 > 14                       XY PopatLal                      1967  5  1
 > 15                       XY PopatLal                      1967  6  8
 > 16                       XY PopatLal                      1967  7  7
 > 17                       XY PopatLal                      1967  9  1
 > 18                       XY PopatLal                      1967 10  1
 > 19                       XY PopatLal                      1967 13  1
 > 20                       XY PopatLal               Boston 1967  6  1
 > 21                       XY PopatLal               Boston 1967  7 11
 > 22                       XY PopatLal               Boston 1967  9  2
 > 23                       XY PopatLal               Boston 1967 10  3
 > 24                       XY PopatLal               Boston 1967  7  2
 >>
 >
 >
 > On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
 >  wrote:
 >> I have a file with 5000 records, and editing that file is not easy.
 >> Is there any way to read line 10 differently to account for changes in the
 >> third field?
 >>
 >> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers 
>>> wrote:
 >>> On 2012-03-16 10:48, Ashish Agarwal wrote:
 
  Line 10 has City and State that too separated by comma. For line
10
  how can I read differently as compared to the other lines?
 >>>
 >>>
 >>> Edit the file and put quotes around the city-state combination:

Re: [R] Problem reading mixed CSV file

2012-03-20 Thread jim holtman
Use 'count.fields' to determine which lines have 6 and 7 fields in them.

Then use 'readLines' to read in the entire file and use the data
from count.fields to write the lines out to separate files:

x <- count.fields(...)
input <- readLines(...)
writeLines(input[x == 6], con = '6fields.csv')
writeLines(input[x == 7], con = '7fields.csv')

On Tue, Mar 20, 2012 at 11:43 AM, Ashish Agarwal
 wrote:
> The file is 20 MB with 2 million rows.
> I understand that I have two different formats: 6 columns and 7 columns.
> How do I read chunks to different files by using scan with modifying
> skip and nlines parameters?
>
> On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL  wrote:
>>
>> I would follow Jim's suggestion: count the fields with
>> nFields <- count.fields(fileName, sep = ',')
>> and read chunks into different files by using scan with modified
>> skip and nlines parameters. However, if only a few lines differ,
>> it would be better to correct those few lines manually in
>> some suitable editor.
>>
>> Writing an omnipotent function for reading any kind of
>> corrupted/nonstandard file seems to me worthwhile only if you expect to read
>> such files many times.
>>
>> Regards
>> Petr
>>
>>
>>>
>>>
>>>
>>> On Sat, Mar 17, 2012 at 4:54 AM, jim holtman  wrote:
>>> > Here is a solution that looks for the line with 7 elements and inserts
>>> > the quotes:
>>> >
>>> >
>>> >> fileName <- '/temp/text.txt'
>>> >> input <- readLines(fileName)
>>> >> # count the fields to find 7
>>> >> nFields <- count.fields(fileName, sep = ',')
>>> >> # now fix the data
>>> >> for (i in which(nFields == 7)){
>>> > +     # split on comma
>>> > +     z <- strsplit(input[i], ',')[[1]]
>>> > +     input[i] <- paste(z[1], z[2]
>>> > +         , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
>>> > +         , z[5], z[6], z[7], sep = ','
>>> > +         )
>>> > + }
>>> >>
>>> >> # now read in the data
>>> >> result <- read.table(textConnection(input), sep = ',')
>>> >>
>>> >>         result
>>> >                         V1       V2                   V3   V4 V5 V6
>>> > 1                                                         1968 21  0
>>> > 2                                                  Boston 1968 13  0
>>> > 3                                                  Boston 1968 18  0
>>> > 4                                                 Chicago 1967 44  0
>>> > 5                                              Providence 1968 17  0
>>> > 6                                              Providence 1969 48  0
>>> > 7                                                   Binky 1968 24  0
>>> > 8                                                 Chicago 1968 23  0
>>> > 9                                                   Dally 1968  7  0
>>> > 10                                   Raleigh, North Carol 1968 25  0
>>> > 11 Addy ABC-Dogs Stars-W8.1                    Providence 1968 38  0
>>> > 12              DEF_REQPRF/                     Dartmouth 1967 31  1
>>> > 13                       PL                               1967 38  1
>>> > 14                       XY PopatLal                      1967  5  1
>>> > 15                       XY PopatLal                      1967  6  8
>>> > 16                       XY PopatLal                      1967  7  7
>>> > 17                       XY PopatLal                      1967  9  1
>>> > 18                       XY PopatLal                      1967 10  1
>>> > 19                       XY PopatLal                      1967 13  1
>>> > 20                       XY PopatLal               Boston 1967  6  1
>>> > 21                       XY PopatLal               Boston 1967  7 11
>>> > 22                       XY PopatLal               Boston 1967  9  2
>>> > 23                       XY PopatLal               Boston 1967 10  3
>>> > 24                       XY PopatLal               Boston 1967  7  2
>>> >>
>>> >
>>> >
>>> > On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
>>> >  wrote:
>>> >> I have a file with 5000 records, and editing that file is not easy.
>>> >> Is there any way to read line 10 differently to account for changes in the
>>> >> third field?
>>> >>
>>> >> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers 
>> wrote:
>>> >>> On 2012-03-16 10:48, Ashish Agarwal wrote:
>>> 
>>>  Line 10 has City and State that too separated by comma. For line 10
>>>  how can I read differently as compared to the other lines?
>>> >>>
>>> >>>
>>> >>> Edit the file and put quotes around the city-state combination:
>>> >>>  "Raleigh, North Carol"
>>> >>>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Jim Holtman
>>> > Data Munger Guru
>>> >
>>> > What is the problem that you are trying to solve?
>>> > Tell me what you want to d

Re: [R] Problem reading mixed CSV file

2012-03-20 Thread Ashish Agarwal
The file is 20 MB with 2 million rows.
I understand that I have two different formats: 6 columns and 7 columns.
How do I read chunks to different files by using scan with modifying
skip and nlines parameters?

On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL  wrote:
>
> I would follow Jim's suggestion: count the fields with
> nFields <- count.fields(fileName, sep = ',')
> and read chunks into different files by using scan with modified
> skip and nlines parameters. However, if only a few lines differ,
> it would be better to correct those few lines manually in
> some suitable editor.
>
> Writing an omnipotent function for reading any kind of
> corrupted/nonstandard file seems to me worthwhile only if you expect to read
> such files many times.
>
> Regards
> Petr
>
>
>>
>>
>>
>> On Sat, Mar 17, 2012 at 4:54 AM, jim holtman  wrote:
>> > Here is a solution that looks for the line with 7 elements and inserts
>> > the quotes:
>> >
>> >
>> >> fileName <- '/temp/text.txt'
>> >> input <- readLines(fileName)
>> >> # count the fields to find 7
>> >> nFields <- count.fields(fileName, sep = ',')
>> >> # now fix the data
>> >> for (i in which(nFields == 7)){
>> > +     # split on comma
>> > +     z <- strsplit(input[i], ',')[[1]]
>> > +     input[i] <- paste(z[1], z[2]
>> > +         , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
>> > +         , z[5], z[6], z[7], sep = ','
>> > +         )
>> > + }
>> >>
>> >> # now read in the data
>> >> result <- read.table(textConnection(input), sep = ',')
>> >>
>> >>         result
>> >                         V1       V2                   V3   V4 V5 V6
>> > 1                                                         1968 21  0
>> > 2                                                  Boston 1968 13  0
>> > 3                                                  Boston 1968 18  0
>> > 4                                                 Chicago 1967 44  0
>> > 5                                              Providence 1968 17  0
>> > 6                                              Providence 1969 48  0
>> > 7                                                   Binky 1968 24  0
>> > 8                                                 Chicago 1968 23  0
>> > 9                                                   Dally 1968  7  0
>> > 10                                   Raleigh, North Carol 1968 25  0
>> > 11 Addy ABC-Dogs Stars-W8.1                    Providence 1968 38  0
>> > 12              DEF_REQPRF/                     Dartmouth 1967 31  1
>> > 13                       PL                               1967 38  1
>> > 14                       XY PopatLal                      1967  5  1
>> > 15                       XY PopatLal                      1967  6  8
>> > 16                       XY PopatLal                      1967  7  7
>> > 17                       XY PopatLal                      1967  9  1
>> > 18                       XY PopatLal                      1967 10  1
>> > 19                       XY PopatLal                      1967 13  1
>> > 20                       XY PopatLal               Boston 1967  6  1
>> > 21                       XY PopatLal               Boston 1967  7 11
>> > 22                       XY PopatLal               Boston 1967  9  2
>> > 23                       XY PopatLal               Boston 1967 10  3
>> > 24                       XY PopatLal               Boston 1967  7  2
>> >>
>> >
>> >
>> > On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
>> >  wrote:
>> >> I have a file with 5000 records, and editing that file is not easy.
>> >> Is there any way to read line 10 differently to account for changes in the
>> >> third field?
>> >>
>> >> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers 
> wrote:
>> >>> On 2012-03-16 10:48, Ashish Agarwal wrote:
>> 
>>  Line 10 has City and State that too separated by comma. For line 10
>>  how can I read differently as compared to the other lines?
>> >>>
>> >>>
>> >>> Edit the file and put quotes around the city-state combination:
>> >>>  "Raleigh, North Carol"
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Jim Holtman
>> > Data Munger Guru
>> >
>> > What is the problem that you are trying to solve?
>> > Tell me what you want to do, not how you want to do it.
>>

Re: [R] Problem reading mixed CSV file

2012-03-19 Thread Petr PIKAL
Hi
 
> This is quite a CPU-consuming process. My system hung on the
> big file I have.
> 
> Within the for loop that you have suggested, can't I have a case
> statement for different value of nfields to be read and specify what
> format does the variable needs to be read?
> something like
> case
> # input format for 6 fields
> when nFields == 6
> read.csv as string, string, string, numeric, numeric, numeric into 
dataframe1
> #input format for 7 fields
> when nFields == 7
> read.csv as string, string, string, string, numeric, numeric, numeric
> into dataframe2
> end case
> # Output the two dataframes via some way of tracking the original line
> numbers of the input file - similar to _N_ in SAS
> . Dataframe1 to be outputed as it is while in dataframe2,
> concatenating the 3rd and the 4th strings.
> 
> Could you please help with the format for the above?

I would follow Jim's suggestion: count the fields with
nFields <- count.fields(fileName, sep = ',')
and read chunks into different files by using scan with modified
skip and nlines parameters. However, if only a few lines differ,
it would be better to correct those few lines manually in
some suitable editor.

Writing an omnipotent function for reading any kind of
corrupted/nonstandard file seems to me worthwhile only if you expect to read
such files many times.

Regards
Petr



Re: [R] Problem reading mixed CSV file

2012-03-19 Thread Jim Holtman
How big is the file? In the example I sent I was using 'textConnection' to 
reread the input.  If the file is large, this can be slow.  You will have 
better luck writing the converted data out to a temporary file and reading it 
right back in.
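
A minimal sketch of the temporary-file route, assuming the lines have already been repaired; the two sample lines are invented, and passing colClasses is an extra assumption that spares read.table from guessing the column types:

```r
# Hypothetical, already repaired lines (the city/state field is quoted)
input <- c(',,Boston,1968,13,0',
           ',,"Raleigh, North Carol",1968,25,0')

tmp <- tempfile(fileext = ".csv")   # name for the temporary file
writeLines(input, tmp)              # write the converted data out

# read it right back in from the file instead of a textConnection
result <- read.table(tmp, sep = ",", quote = "\"", comment.char = "",
                     colClasses = c(rep("character", 3), rep("numeric", 3)))
unlink(tmp)                         # remove the temporary file
```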

I am not sure exactly what you are asking.  You can create output file names 
based on the input file name.  What is it you want to do with the 'case' 
statement?

Sent from my iPad

On Mar 19, 2012, at 2:46, Ashish Agarwal  wrote:

> This is quite a CPu consuming process. My system got hung up for the
> big file I have.
> 
> Within the for loop that you have suggested, can't I have a case
> statement for different value of nfields to be read and specify what
> format does the variable needs to be read?
> something like
> case
> # input format for 6 fields
> when nFields == 6
> read.csv as string, string, string, numeric, numeric, numeric into dataframe1
> #input format for 7 fields
> when nFields == 7
> read.csv as string, string, string, string, numeric, numeric, numeric
> into dataframe2
> end case
> # Output the two dataframes via some way of tracking the original line
> numbers of the input file - similar to _N_ in SAS
> . Dataframe1 to be outputed as it is while in dataframe2,
> concatenating the 3rd and the 4th strings.
> 
> Could you please help with the format for the above?

Re: [R] Problem reading mixed CSV file

2012-03-18 Thread Ashish Agarwal
This is quite a CPU-consuming process. My system hung on the
big file I have.

Within the for loop that you have suggested, can't I have a case
statement for the different values of nFields, specifying the
format in which each variable needs to be read?
Something like:
case
# input format for 6 fields
when nFields == 6
read.csv as string, string, string, numeric, numeric, numeric into dataframe1
#input format for 7 fields
when nFields == 7
read.csv as string, string, string, string, numeric, numeric, numeric
into dataframe2
end case
# Output the two dataframes with some way of tracking the original line
numbers of the input file - similar to _N_ in SAS.
Dataframe1 is to be output as it is, while in dataframe2 the 3rd and
the 4th strings are concatenated.

Could you please help with the format for the above?
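
One way to get the effect of the case statement, as a sketch only: split the line indices by field count, read each group with its own colClasses, and carry the original line number along in an extra column. The sample lines, the helper readGroup and the column name line are made up for illustration, and the sketch assumes both groups are non-empty:

```r
# Hypothetical sample lines standing in for the real file
lines <- c(",,Dally,1968,7,0",
           ",,Raleigh, North Carol,1968,25,0",
           "XY,PopatLal,Boston,1967,6,1")

tc <- textConnection(lines)
nFields <- count.fields(tc, sep = ",", quote = "", comment.char = "")
close(tc)

# read one group of lines with its own input format
readGroup <- function(idx, classes) {
  df <- read.table(text = lines[idx], sep = ",", quote = "",
                   comment.char = "", colClasses = classes,
                   stringsAsFactors = FALSE)
  df$line <- idx   # original line numbers, like _N_ in SAS
  df
}

# "when nFields == 6": string, string, string, numeric, numeric, numeric
df1 <- readGroup(which(nFields == 6),
                 c(rep("character", 3), rep("numeric", 3)))
# "when nFields == 7": four strings, then three numerics
df2 <- readGroup(which(nFields == 7),
                 c(rep("character", 4), rep("numeric", 3)))

# concatenate the 3rd and 4th strings, then drop the spare column
df2$V3 <- paste(df2$V3, df2$V4, sep = ",")
df2 <- df2[, c("V1", "V2", "V3", "V5", "V6", "V7", "line")]
names(df2) <- names(df1)

# append the two data frames and restore the original row order
result <- rbind(df1, df2)
result <- result[order(result$line), ]
```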



On Sat, Mar 17, 2012 at 4:54 AM, jim holtman  wrote:
> Here is a solution that looks for the line with 7 elements and inserts
> the quotes:
>
>
>> fileName <- '/temp/text.txt'
>> input <- readLines(fileName)
>> # count the fields to find 7
>> nFields <- count.fields(fileName, sep = ',')
>> # now fix the data
>> for (i in which(nFields == 7)){
> +     # split on comma
> +     z <- strsplit(input[i], ',')[[1]]
> +     input[i] <- paste(z[1], z[2]
> +         , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
> +         , z[5], z[6], z[7], sep = ','
> +         )
> + }
>>
>> # now read in the data
>> result <- read.table(textConnection(input), sep = ',')
>>
>>         result
>                         V1       V2                   V3   V4 V5 V6
> 1                                                         1968 21  0
> 2                                                  Boston 1968 13  0
> 3                                                  Boston 1968 18  0
> 4                                                 Chicago 1967 44  0
> 5                                              Providence 1968 17  0
> 6                                              Providence 1969 48  0
> 7                                                   Binky 1968 24  0
> 8                                                 Chicago 1968 23  0
> 9                                                   Dally 1968  7  0
> 10                                   Raleigh, North Carol 1968 25  0
> 11 Addy ABC-Dogs Stars-W8.1                    Providence 1968 38  0
> 12              DEF_REQPRF/                     Dartmouth 1967 31  1
> 13                       PL                               1967 38  1
> 14                       XY PopatLal                      1967  5  1
> 15                       XY PopatLal                      1967  6  8
> 16                       XY PopatLal                      1967  7  7
> 17                       XY PopatLal                      1967  9  1
> 18                       XY PopatLal                      1967 10  1
> 19                       XY PopatLal                      1967 13  1
> 20                       XY PopatLal               Boston 1967  6  1
> 21                       XY PopatLal               Boston 1967  7 11
> 22                       XY PopatLal               Boston 1967  9  2
> 23                       XY PopatLal               Boston 1967 10  3
> 24                       XY PopatLal               Boston 1967  7  2
>>
>
>
> On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
>  wrote:
>> I have a file that is 5000 records and to edit that file is not easy.
>> Is there any way to line 10 differently to account for changes in the
>> third field?
>>
>> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers  wrote:
>>> On 2012-03-16 10:48, Ashish Agarwal wrote:

>>>> Line 10 has City and State that too separated by comma. For line 10
>>>> how can I read differently as compared to the other lines?
>>>
>>>
>>> Edit the file and put quotes around the city-state combination:
>>>  "Raleigh, North Carol"
>>>
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem reading mixed CSV file

2012-03-16 Thread jim holtman
Here is a solution that looks for the line with 7 elements and inserts
the quotes:


> fileName <- '/temp/text.txt'
> input <- readLines(fileName)
> # count the fields to find 7
> nFields <- count.fields(fileName, sep = ',')
> # now fix the data
> for (i in which(nFields == 7)){
+ # split on comma
+ z <- strsplit(input[i], ',')[[1]]
+ input[i] <- paste(z[1], z[2]
+ , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
+ , z[5], z[6], z[7], sep = ','
+ )
+ }
>
> # now read in the data
> result <- read.table(textConnection(input), sep = ',')
>
> result
                        V1       V2                   V3   V4 V5 V6
1                                                         1968 21  0
2                                                  Boston 1968 13  0
3                                                  Boston 1968 18  0
4                                                 Chicago 1967 44  0
5                                              Providence 1968 17  0
6                                              Providence 1969 48  0
7                                                   Binky 1968 24  0
8                                                 Chicago 1968 23  0
9                                                   Dally 1968  7  0
10                                   Raleigh, North Carol 1968 25  0
11 Addy ABC-Dogs Stars-W8.1                    Providence 1968 38  0
12              DEF_REQPRF/                     Dartmouth 1967 31  1
13                       PL                               1967 38  1
14                       XY PopatLal                      1967  5  1
15                       XY PopatLal                      1967  6  8
16                       XY PopatLal                      1967  7  7
17                       XY PopatLal                      1967  9  1
18                       XY PopatLal                      1967 10  1
19                       XY PopatLal                      1967 13  1
20                       XY PopatLal               Boston 1967  6  1
21                       XY PopatLal               Boston 1967  7 11
22                       XY PopatLal               Boston 1967  9  2
23                       XY PopatLal               Boston 1967 10  3
24                       XY PopatLal               Boston 1967  7  2
>
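
For a larger file, the same repair can probably be done without a loop. The sketch below is an alternative to the code above, not part of it: a single vectorised sub() call puts the quotes around the 3rd and 4th fields of every 7-field line. The sample lines and the name bad are made up for illustration.

```r
# Hypothetical sample lines standing in for the real file
input <- c(",,Dally,1968,7,0",
           ",,Raleigh, North Carol,1968,25,0",
           "XY,PopatLal,Boston,1967,6,1")

tc <- textConnection(input)
nFields <- count.fields(tc, sep = ",", quote = "", comment.char = "")
close(tc)

bad <- nFields == 7
# group 1 = fields 1 and 2, group 2 = fields 3 and 4 (with their comma);
# the replacement wraps group 2 in double quotes
input[bad] <- sub("^([^,]*,[^,]*),([^,]*,[^,]*),",
                  "\\1,\"\\2\",", input[bad])

# now read in the data
result <- read.table(text = input, sep = ",", quote = "\"",
                     comment.char = "", stringsAsFactors = FALSE)
```

The pattern only handles the case seen here, where the extra comma always falls between the 3rd and 4th fields.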


On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
 wrote:
> I have a file that is 5000 records and to edit that file is not easy.
> Is there any way to line 10 differently to account for changes in the
> third field?
>
> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers  wrote:
>> On 2012-03-16 10:48, Ashish Agarwal wrote:
>>>
>>> Line 10 has City and State that too separated by comma. For line 10
>>> how can I read differently as compared to the other lines?
>>
>>
>> Edit the file and put quotes around the city-state combination:
>>  "Raleigh, North Carol"
>>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Problem reading mixed CSV file

2012-03-16 Thread Ashish Agarwal
I have a file with 5000 records, and editing that file is not easy.
Is there any way to read line 10 differently to account for changes in the
third field?

On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers  wrote:
> On 2012-03-16 10:48, Ashish Agarwal wrote:
>>
>> Line 10 has City and State that too separated by comma. For line 10
>> how can I read differently as compared to the other lines?
>
>
> Edit the file and put quotes around the city-state combination:
>  "Raleigh, North Carol"
>



Re: [R] Problem reading mixed CSV file

2012-03-16 Thread Peter Ehlers

On 2012-03-16 10:48, Ashish Agarwal wrote:
> Line 10 has City and State that too separated by comma. For line 10
> how can I read differently as compared to the other lines?


Edit the file and put quotes around the city-state combination:
 "Raleigh, North Carol"

Also: always run count.fields() on your files before importing.
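
A short sketch of such a check, assuming a small stand-in file; the sample lines are made up, and quote = "" and comment.char = "" are assumptions chosen so that apostrophes or '#' characters in the data do not change the counts:

```r
# Hypothetical sample data written to a temporary file
lines <- c(",,Dally,1968,7,0",
           ",,Raleigh, North Carol,1968,25,0",
           "XY,PopatLal,Boston,1967,6,1")
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

nFields <- count.fields(tmp, sep = ",", quote = "", comment.char = "")
table(nFields)               # how many lines have each field count
odd <- which(nFields != 6)   # line numbers that need attention
unlink(tmp)
```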

Peter Ehlers





Re: [R] Problem reading mixed CSV file

2012-03-16 Thread Ashish Agarwal
Line 10 has both City and State, and those too are separated by a comma.
For line 10, how can I read it differently from the other lines?

On Fri, Mar 16, 2012 at 10:59 PM, David Winsemius
 wrote:
>
> On Mar 16, 2012, at 1:11 PM, Ashish Agarwal wrote:
>
>> I want to import this CSV file into R.
>>
>> The CSV file is
>>
>> ,,,1968,21,0
>> ,,Boston,1968,13,0
>> ,,Boston,1968,18,0
>> ,,Chicago,1967,44,0
>> ,,Providence,1968,17,0
>> ,,Providence,1969,48,0
>> ,,Binky,1968,24,0
>> ,,Chicago,1968,23,0
>> ,,Dally,1968,7,0
>> ,,Raleigh, North Carol,1968,25,0
>> Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
>> DEF_REQPRF/,,Dartmouth,1967,31,1
>> PL,,,1967,38,1
>> XY,PopatLal,,1967,5,1
>> XY,PopatLal,,1967,6,8
>> XY,PopatLal,,1967,7,7
>> XY,PopatLal,,1967,9,1
>> XY,PopatLal,,1967,10,1
>> XY,PopatLal,,1967,13,1
>> XY,PopatLal,Boston,1967,6,1
>> XY,PopatLal,Boston,1967,7,11
>> XY,PopatLal,Boston,1967,9,2
>> XY,PopatLal,Boston,1967,10,3
>> XY,PopatLal,Boston,1967,7,2
>>
>> I tried using scan and read.table but results are not visible :(
>>
>>> scan("D:/data/temp.csv",list("","","",0,0,0),sep=",") ->x
>>
>> Read 51 records
>>>
>>> x
>>
>> [[1]]
>>  [1] "ÿþ" ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
>> [16] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
>> [31] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
>> [46] ""   ""   ""   ""   ""   ""
>> 
>>
>>> read.table("D:/data/temp.csv",header=F,sep=",") ->x
>>> x
>>
>>   V1 V2
>> 1   ÿþ NA
>> 2      NA
>> 3      NA
>> 4      NA
>>
>> Can somebody please help in importing this CSV file?
>
>
> Looks like an encoding mismatch. You have not offered the requested
> information about you setup so further comment would all be guesswork. But
> you can perhaps educate yourself by reading:
>
> ?Encoding
>
> And line ten has 7 elements.
>
>> count.fields(textConnection(",,,1968,21,0
> + ,,Boston,1968,13,0
> + ,,Boston,1968,18,0
> + ,,Chicago,1967,44,0
> + ,,Providence,1968,17,0
> + ,,Providence,1969,48,0
> + ,,Binky,1968,24,0
> + ,,Chicago,1968,23,0
> + ,,Dally,1968,7,0
> + ,,Raleigh, North Carol,1968,25,0
> + Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
> + DEF_REQPRF/,,Dartmouth,1967,31,1
> + PL,,,1967,38,1
> + XY,PopatLal,,1967,5,1
> + XY,PopatLal,,1967,6,8
> + XY,PopatLal,,1967,7,7
> + XY,PopatLal,,1967,9,1
> + XY,PopatLal,,1967,10,1
> + XY,PopatLal,,1967,13,1
> + XY,PopatLal,Boston,1967,6,1
> + XY,PopatLal,Boston,1967,7,11
> + XY,PopatLal,Boston,1967,9,2
> + XY,PopatLal,Boston,1967,10,3
> + XY,PopatLal,Boston,1967,7,2"),sep=",")
>  [1] 6 6 6 6 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6
>
>
>>
>
>
> David Winsemius, MD
> West Hartford, CT
>



Re: [R] Problem reading mixed CSV file

2012-03-16 Thread David Winsemius


On Mar 16, 2012, at 1:11 PM, Ashish Agarwal wrote:

> I want to import this CSV file into R.
>
> The CSV file is
>
> ,,,1968,21,0
> ,,Boston,1968,13,0
> ,,Boston,1968,18,0
> ,,Chicago,1967,44,0
> ,,Providence,1968,17,0
> ,,Providence,1969,48,0
> ,,Binky,1968,24,0
> ,,Chicago,1968,23,0
> ,,Dally,1968,7,0
> ,,Raleigh, North Carol,1968,25,0
> Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
> DEF_REQPRF/,,Dartmouth,1967,31,1
> PL,,,1967,38,1
> XY,PopatLal,,1967,5,1
> XY,PopatLal,,1967,6,8
> XY,PopatLal,,1967,7,7
> XY,PopatLal,,1967,9,1
> XY,PopatLal,,1967,10,1
> XY,PopatLal,,1967,13,1
> XY,PopatLal,Boston,1967,6,1
> XY,PopatLal,Boston,1967,7,11
> XY,PopatLal,Boston,1967,9,2
> XY,PopatLal,Boston,1967,10,3
> XY,PopatLal,Boston,1967,7,2
>
> I tried using scan and read.table but results are not visible :(
>
>> scan("D:/data/temp.csv",list("","","",0,0,0),sep=",") ->x
> Read 51 records
>> x
> [[1]]
>  [1] "ÿþ" ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
> [16] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
> [31] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
> [46] ""   ""   ""   ""   ""   ""
>
>> read.table("D:/data/temp.csv",header=F,sep=",") ->x
>> x
>   V1 V2
> 1   ÿþ NA
> 2      NA
> 3      NA
> 4      NA
>
> Can somebody please help in importing this CSV file?


Looks like an encoding mismatch. You have not offered the requested
information about your setup, so further comment would all be guesswork.
But you can perhaps educate yourself by reading:


?Encoding
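
As a guess only: the leading "ÿþ" is how the bytes FF FE, the UTF-16 little-endian byte-order mark, display in a single-byte encoding, so the file may have been saved as "Unicode" text on Windows. The sketch below builds a small stand-in file that way (the sample lines are made up) and reads it with fileEncoding = "UTF-16LE"; whether that matches the real temp.csv is an assumption.

```r
# Build a small UTF-16LE file with a byte-order mark as a stand-in
txt <- ",,Boston,1968,13,0\n,,Chicago,1967,44,0\n"
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xff, 0xfe)), con)   # the BOM, displayed as "ÿþ"
writeBin(iconv(txt, "UTF-8", "UTF-16LE", toRaw = TRUE)[[1]], con)
close(con)

# Tell read.table how the file is encoded
x <- read.table(tmp, header = FALSE, sep = ",",
                fileEncoding = "UTF-16LE", stringsAsFactors = FALSE)
unlink(tmp)
```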

And line ten has 7 elements.

> count.fields(textConnection(",,,1968,21,0
+ ,,Boston,1968,13,0
+ ,,Boston,1968,18,0
+ ,,Chicago,1967,44,0
+ ,,Providence,1968,17,0
+ ,,Providence,1969,48,0
+ ,,Binky,1968,24,0
+ ,,Chicago,1968,23,0
+ ,,Dally,1968,7,0
+ ,,Raleigh, North Carol,1968,25,0
+ Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
+ DEF_REQPRF/,,Dartmouth,1967,31,1
+ PL,,,1967,38,1
+ XY,PopatLal,,1967,5,1
+ XY,PopatLal,,1967,6,8
+ XY,PopatLal,,1967,7,7
+ XY,PopatLal,,1967,9,1
+ XY,PopatLal,,1967,10,1
+ XY,PopatLal,,1967,13,1
+ XY,PopatLal,Boston,1967,6,1
+ XY,PopatLal,Boston,1967,7,11
+ XY,PopatLal,Boston,1967,9,2
+ XY,PopatLal,Boston,1967,10,3
+ XY,PopatLal,Boston,1967,7,2"),sep=",")
 [1] 6 6 6 6 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6





David Winsemius, MD
West Hartford, CT



Re: [R] Problem reading mixed CSV file

2012-03-16 Thread Ashish Agarwal
I want to import this CSV file into R.

The CSV file is

,,,1968,21,0
,,Boston,1968,13,0
,,Boston,1968,18,0
,,Chicago,1967,44,0
,,Providence,1968,17,0
,,Providence,1969,48,0
,,Binky,1968,24,0
,,Chicago,1968,23,0
,,Dally,1968,7,0
,,Raleigh, North Carol,1968,25,0
Addy ABC-Dogs Stars-W8.1,,Providence,1968,38,0
DEF_REQPRF/,,Dartmouth,1967,31,1
PL,,,1967,38,1
XY,PopatLal,,1967,5,1
XY,PopatLal,,1967,6,8
XY,PopatLal,,1967,7,7
XY,PopatLal,,1967,9,1
XY,PopatLal,,1967,10,1
XY,PopatLal,,1967,13,1
XY,PopatLal,Boston,1967,6,1
XY,PopatLal,Boston,1967,7,11
XY,PopatLal,Boston,1967,9,2
XY,PopatLal,Boston,1967,10,3
XY,PopatLal,Boston,1967,7,2

I tried using scan and read.table but results are not visible :(

> scan("D:/data/temp.csv",list("","","",0,0,0),sep=",") ->x
Read 51 records
> x
[[1]]
 [1] "ÿþ" ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
[16] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
[31] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
[46] ""   ""   ""   ""   ""   ""


> read.table("D:/data/temp.csv",header=F,sep=",") ->x
> x
  V1 V2
1   ÿþ NA
2      NA
3      NA
4      NA

Can somebody please help in importing this CSV file?



Re: [R] Problem reading mixed CSV file

2012-03-16 Thread jim holtman
What do you mean by "mixed"?  If a field contains a comma, then it is
supposed to be enclosed in quotes.  You could preprocess the file
looking for cases where there are more fields than there are
supposed to be, and if they are always in the same place, you could
enclose them in quotes and then reprocess.  You would really have to
show what the file looks like for the different "mixed" cases to get a
good answer to your question.  And of course, R can do it, if we knew
what it was we are supposed to do.

So at least  provide commented, minimal, self-contained, reproducible
code and data.

On Fri, Mar 16, 2012 at 7:03 AM, Ashish Agarwal
 wrote:
> I am having trouble reading this CSV file in R. There are six attributes
> that I need to read  - CVar1, CVar2, Location, Year, Nvar3, Nvar4. Can
> somebody help in reading this file?
> On line 10 it has city and state separated by comma. I had been a user of
> SAS where I can use different format to read in for this line. Can I do
> this in R too?
>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
