Hi:

Thanks to Jakson Aquino, who showed me how to do a proper text substitution,
we have a way out. It also turns out that the last line of the data file was
missing its final numeric field, so I inserted NA| there before calling
readLines(). His (correct) code is at the bottom of the mail.

The first two lines of code below are courtesy of Jakson. Afterward, I tried
to shape the result into a data frame for export as a flat file. There's an
interesting lesson to be (re)learned in the process, so bear with me.

Input file file1.txt (revised):
1|7|30| "dog"
2|6|25| ""cat"
3|4|20|""
4|5| 56| "mouse"
5|3|56| ""horse"
6|56|NA| ""

x <- readLines("file1.txt")
y <- sub('""(.)', '"\\1', x)
d <- do.call(rbind, strsplit(y, split = '\\|'))
d <- as.data.frame(d)
d
  V1 V2  V3       V4
1  1  7  30    "dog"
2  2  6  25    "cat"
3  3  4  20       ""
4  4  5  56  "mouse"
5  5  3  56  "horse"
6  6 56  NA       ""
> str(d)
'data.frame':   6 obs. of  4 variables:
 $ V1: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6
 $ V2: Factor w/ 6 levels "3","4","5","56",..: 6 5 2 3 1 4
 $ V3: Factor w/ 6 levels " 56","20","25",..: 4 3 2 1 5 6
 $ V4: Factor w/ 6 levels " \"\""," \"cat\"",..: 3 2 6 5 4 1

Everything is a factor, as expected: as.data.frame() converts the character
columns of the matrix to factors by default. Now convert the factors to
numeric and character and write out to a file.

d$V1 <- as.numeric(d$V1)
d$V2 <- as.numeric(d$V2)
d$V3 <- as.numeric(d$V3)
d$V4 <- as.character(d$V4)
d
  V1 V2 V3       V4
1  1  6  4    "dog"
2  2  5  3    "cat"
3  3  2  2       ""
4  4  3  1  "mouse"
5  5  1  5  "horse"
6  6  4  6       ""

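Oopsie. as.numeric() applied to a factor returns its internal integer codes,
not the original values, so V2 and V3 are now wrong (V1 only looks right
because its codes happen to coincide with its values). The FAQ 7.10 trap...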
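
To see the trap in isolation, here is a minimal sketch on a toy factor that
mirrors V2. The names f, codes, by_char and by_lev are made up for
illustration and are not part of the session above.

```r
# Toy factor holding the same strings as V2 (illustrative only).
f <- factor(c("7", "6", "4", "5", "3", "56"))

# Levels are sorted as strings, so "56" lands between "5" and "6".
levels(f)

codes   <- as.numeric(f)                 # internal integer codes
by_char <- as.numeric(as.character(f))   # original values, via character
by_lev  <- as.numeric(levels(f))[f]      # the form FAQ 7.10 recommends

codes
by_char
by_lev
```

The last two forms give the same answer; the levels() version converts each
distinct level once rather than every element, which matters only for long
vectors.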

# Back to the drawing board.
d <- do.call(rbind, strsplit(y, split = '\\|'))
d <- as.data.frame(d)
d1 <- d
d1$V1 <- as.numeric(as.character(d1$V1))
d1$V2 <- as.numeric(as.character(d1$V2))
d1$V3 <- as.numeric(as.character(d1$V3))
d1$V4 <- as.character(d1$V4)

> d1
  V1 V2  V3       V4
1  1  7  30    "dog"
2  2  6  25    "cat"
3  3  4  20       ""
4  4  5  56  "mouse"
5  5  3  56  "horse"
6  6 56  NA       ""

Much better. Let's double check that we're OK.
str(d1)
'data.frame':   6 obs. of  4 variables:
 $ V1: num  1 2 3 4 5 6
 $ V2: num  7 6 4 5 3 56
 $ V3: num  30 25 20 56 56 NA
 $ V4: chr  " \"dog\"" " \"cat\"" "\"\"" " \"mouse\"" ...

# NOW write it out...
write.table(d1, file = 'file3.dat', quote = FALSE)   # looks good

And that's why FAQ 7.10 is written the way it is.
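
One way to sidestep the factor detour altogether, sketched under the
assumption that the cleaned lines look like the revised file1.txt above, is
to ask as.data.frame() not to create factors in the first place. The names m
and d2 are made up for this sketch, and y is typed in by hand so the block
runs without the file.

```r
# The cleaned lines, entered directly instead of read from file1.txt.
y <- c('1|7|30| "dog"', '2|6|25| "cat"', '3|4|20|""',
       '4|5| 56| "mouse"', '5|3|56| "horse"', '6|56|NA| ""')

m  <- do.call(rbind, strsplit(y, split = "\\|"))
d2 <- as.data.frame(m, stringsAsFactors = FALSE)   # columns stay character

# as.numeric() on character data parses the text itself;
# the string "NA" becomes a missing value.
d2[1:3] <- lapply(d2[1:3], as.numeric)
str(d2)
```

With character columns there are no integer codes to trip over, so a single
as.numeric() per column is enough.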

If one is happy with y (the same lines, with each doubled opening quote
collapsed to a single one), then Jakson's final line is sufficient:
writeLines(y, "file2.txt")
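
As an aside, here is a small sketch of why the pattern needs the extra
character after the two quotes. The vectors x, naive and fixed are
illustrative, built from three of the lines in file1.txt.

```r
x <- c('2|6|25| ""cat"', '3|4|20|""', '6|56|NA| ""')

# Naive replacement: collapses every doubled quote, including the
# legitimate empty strings at the end of a line.
naive <- gsub('""', '"', x)

# Jakson's pattern: two quotes followed by at least one more character.
# An empty string at the end of a line has nothing after it, so it is
# left alone.
fixed <- sub('""(.)', '"\\1', x)

naive
fixed
```

Note that sub() changes only the first match on each line, which is enough
here because each line has at most one quoted field.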


Dennis

On Sun, Sep 12, 2010 at 5:05 PM, Jakson A. Aquino <jaksonaqu...@gmail.com> wrote:

> On Sun, Sep 12, 2010 at 7:27 PM, Dennis Murphy <djmu...@gmail.com> wrote:
> > Hi:
> >
> > On Sun, Sep 12, 2010 at 1:05 PM, Wil M Contreras Arbaje <
> > wil.contre...@gmail.com> wrote:
> >
> >> While you are looking for a solution within R, it might be simpler to open
> >> your text file in almost any free text editor (Notepad++, Textwrangler,
> >> Smultron, vim come to mind), and do Replace all "' for ".
> >
> >
> > There's one problem with that solution: if the character string at the end
> > of the line is blank (i.e., ""), then your suggestion will leave one double
> > quote at the end of a line. Not good. What is needed is a gsub that takes
> > two double quotes plus a wild card character and replaces it with one double
> > quote and a wild card character. If you have an editor that can do that, let
> > me know...seriously. I suspect emacs can do this, but none of the basic
> > editors I know have that capability.
> >
> > Dennis
> >
> >
> >>
> >>
> >> On Sep 12, 2010, at 3:58 PM, jim holtman wrote:
> >>
> >>  You can use the 'gsub' command to remove the quote marks.  You could
> >>> readLines/writeLines the file to clean it up with gsub before using
> >>> read.table on it so it can all be done within R.
> >>>
> >>> On Sun, Sep 12, 2010 at 1:58 PM, Eva Nordstrom <eva.nordst...@yahoo.com> wrote:
> >>>
> >>>> I am using read.table to import a text file within R.
> >>>>
> >>>> There are several "errors" in my text file.  An "extra" quotation mark
> >>>> has
> >>>> inadvertently been included within a few text fields.
> >>>>
> >>>>
> >>>> e.g. for a pipe (|) delimited text file, I have something similar to
> >>>> this:
> >>>>
> >>>> 1|7|30| "dog"
> >>>> 2|6|25| ""cat"
> >>>> 3|4|20|""
> >>>> 4|5| 56| "mouse"
> >>>> 5|3|56| ""horse"
> >>>> 6|56| ""
>
> x <- readLines("file1.txt")
> y <- sub('""(.)', '"\\1', x)
> writeLines(y, "file2.txt")
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
