Re: [R] Problem reading mixed CSV file

Ashish Agarwal Tue, 20 Mar 2012 12:22:34 -0700

Given x <- count.fields(..), could you please help with the following:
1. How do I create a vector of the line numbers of the original file
where x == 6?
2. How do I read only the nth line of an input file into a data vector,
with the first three attributes read as strings, the 4th as categorical,
and the 5th and 6th as numeric with width 10?
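
A minimal sketch of both steps, assuming a comma-separated file. The sample data, the temporary file, and the names `path`, `six_lines`, and `read_nth` are made up for illustration and do not come from the thread. Note that count.fields skips blank lines by default, so the sketch passes blank.lines.skip = FALSE to keep positions in x aligned with physical line numbers. "Width 10" is treated here as a display matter only, which is an assumption about what was meant.

```r
# Hypothetical sample file: lines 1, 2 and 4 have 6 fields, line 3 has 7.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "XY,PopatLal,Boston,1967,6,1",
  "PL,,Chicago,1968,23,0",
  "AB,Addy,Raleigh, North Carol,1968,25,0",
  "XY,PopatLal,Boston,1967,10.5,3"
), path)

# blank.lines.skip = FALSE keeps one count per physical line, so positions
# in x are line numbers of the original file.
# quote = "" avoids miscounting when text fields contain apostrophes.
x <- count.fields(path, sep = ",", quote = "", blank.lines.skip = FALSE)

# 1. Line numbers of the original file where the field count is 6.
six_lines <- which(x == 6)

# 2. Read only line n: three strings, one factor, two numerics.
read_nth <- function(path, n) {
  read.table(path, sep = ",", skip = n - 1, nrows = 1, header = FALSE,
             quote = "", stringsAsFactors = FALSE,
             colClasses = c("character", "character", "character",
                            "factor", "numeric", "numeric"))
}

rec <- read_nth(path, six_lines[3])

# Numerics are stored at full precision; a width of 10 only affects display.
formatC(rec$V5, width = 10, format = "f", digits = 2)
```

For a 2-million-row file, calling read_nth repeatedly rescans the file from the top each time, so reading all lines once with readLines and indexing the result is likely cheaper when many lines are needed.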



On Tue, Mar 20, 2012 at 9:37 PM, jim holtman <jholt...@gmail.com> wrote:
> Use 'count.fields' to determine which lines have 6 and 7 fields in them.
>
> Then use 'readLines' to read in the entire file and use the data
> from count.fields to write out to separate files:
>
> x <- count.fields(...)
> input <- readLines(..)
> writeLines(input[x == 6], con = '6fields.csv')
> writeLines(input[x == 7], con = '7fields.csv')
>
> On Tue, Mar 20, 2012 at 11:43 AM, Ashish Agarwal
> <ashish.agarw...@gmail.com> wrote:
>> The file is 20 MB and has 2 million rows.
>> I understand that I have two different formats: 6 columns and 7 columns.
>> How do I read chunks into different files by using scan and modifying
>> the skip and nlines parameters?
>>
>> On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL <petr.pi...@precheza.cz> wrote:
>>>
>>> I would follow Jim's suggestion:
>>> nFields <- count.fields(fileName, sep = ',')
>>> Count the fields and read chunks into different files by using scan and
>>> modifying the skip and nlines parameters. However, if only a few lines
>>> differ, it would be better to correct those few lines manually in
>>> a suitable editor.
>>>
>>> Writing an all-purpose function for reading any kind of
>>> corrupted/nonstandard file seems worthwhile to me only if you expect to
>>> read such files many times.
>>>
>>> Regards
>>> Petr
>>>
>>>
>>>>
>>>>
>>>>
>>>> On Sat, Mar 17, 2012 at 4:54 AM, jim holtman <jholt...@gmail.com> wrote:
>>>> > Here is a solution that looks for the line with 7 elements and inserts
>>>> > the quotes:
>>>> >
>>>> >
>>>> >> fileName <- '/temp/text.txt'
>>>> >> input <- readLines(fileName)
>>>> >> # count the fields to find 7
>>>> >> nFields <- count.fields(fileName, sep = ',')
>>>> >> # now fix the data
>>>> >> for (i in which(nFields == 7)){
>>>> > +     # split on comma
>>>> > +     z <- strsplit(input[i], ',')[[1]]
>>>> > +     input[i] <- paste(z[1], z[2]
>>>> > +         , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
>>>> > +         , z[5], z[6], z[7], sep = ','
>>>> > +         )
>>>> > + }
>>>> >>
>>>> >> # now read in the data
>>>> >> result <- read.table(textConnection(input), sep = ',')
>>>> >>
>>>> >>         result
>>>> >                         V1       V2                   V3   V4 V5 V6
>>>> > 1                                                         1968 21  0
>>>> > 2                                                  Boston 1968 13  0
>>>> > 3                                                  Boston 1968 18  0
>>>> > 4                                                 Chicago 1967 44  0
>>>> > 5                                              Providence 1968 17  0
>>>> > 6                                              Providence 1969 48  0
>>>> > 7                                                   Binky 1968 24  0
>>>> > 8                                                 Chicago 1968 23  0
>>>> > 9                                                   Dally 1968  7  0
>>>> > 10                                   Raleigh, North Carol 1968 25  0
>>>> > 11 Addy ABC-Dogs Stars-W8.1                    Providence 1968 38  0
>>>> > 12              DEF_REQPRF/                     Dartmouth 1967 31  1
>>>> > 13                       PL                               1967 38  1
>>>> > 14                       XY PopatLal                      1967  5  1
>>>> > 15                       XY PopatLal                      1967  6  8
>>>> > 16                       XY PopatLal                      1967  7  7
>>>> > 17                       XY PopatLal                      1967  9  1
>>>> > 18                       XY PopatLal                      1967 10  1
>>>> > 19                       XY PopatLal                      1967 13  1
>>>> > 20                       XY PopatLal               Boston 1967  6  1
>>>> > 21                       XY PopatLal               Boston 1967  7 11
>>>> > 22                       XY PopatLal               Boston 1967  9  2
>>>> > 23                       XY PopatLal               Boston 1967 10  3
>>>> > 24                       XY PopatLal               Boston 1967  7  2
>>>> >>
>>>> >
>>>> >
>>>> > On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
>>>> > <ashish.agarw...@gmail.com> wrote:
>>>> >> I have a file with 5000 records, and editing that file is not easy.
>>>> >> Is there any way to read line 10 differently to account for the change
>>>> >> in the third field?
>>>> >>
>>>> >> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers <ehl...@ucalgary.ca> wrote:
>>>> >>> On 2012-03-16 10:48, Ashish Agarwal wrote:
>>>> >>>>
>>>> >>>> Line 10 has City and State, which are also separated by a comma. How
>>>> >>>> can I read line 10 differently from the other lines?
>>>> >>>
>>>> >>>
>>>> >>> Edit the file and put quotes around the city-state combination:
>>>> >>>  "Raleigh, North Carol"
>>>> >>>
>>>> >>
>>>> >> ______________________________________________
>>>> >> R-help@r-project.org mailing list
>>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> >> and provide commented, minimal, self-contained, reproducible code.
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Jim Holtman
>>>> > Data Munger Guru
>>>> >
>>>> > What is the problem that you are trying to solve?
>>>> > Tell me what you want to do, not how you want to do it.
>>>>
>>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
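
The count.fields plus readLines approach quoted above can be sketched end to end as follows. This is only an illustration under stated assumptions: the sample records, the temporary paths, and the names `in_path`, `out_dir`, and `keep` are hypothetical. The sketch passes blank.lines.skip = FALSE because count.fields drops blank lines by default while readLines keeps them, which would otherwise leave the two vectors with different lengths.

```r
# Hypothetical input: a blank line and one 7-field record among 6-field ones.
in_path <- tempfile(fileext = ".csv")
writeLines(c(
  "XY,PopatLal,Boston,1967,6,1",
  "",
  "AB,Addy,Raleigh, North Carol,1968,25,0",
  "PL,,Chicago,1968,23,0"
), in_path)

input <- readLines(in_path)
x <- count.fields(in_path, sep = ",", quote = "", blank.lines.skip = FALSE)

# The logical subsetting below is only valid if both vectors line up.
stopifnot(length(x) == length(input))

out_dir <- tempdir()
for (n in c(6, 7)) {
  # Guard against NA counts before comparing.
  keep <- !is.na(x) & x == n
  # writeLines() takes its destination as `con`.
  writeLines(input[keep], con = file.path(out_dir, paste0(n, "fields.csv")))
}
```

Each output file then has a uniform number of fields and can be read separately with read.table or scan.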


