Gabor wrote:
>Assuming that the problem is that your input file has 
>additional embedded characters added by the data base
>program you could try extracting just the text using
>the UNIX strings program:
>
>   strings myfile.csv > myfile.txt

Spencer wrote:
>"strsplit" can break character strings into single 
>characters, and "%in%" can be used to classify them.

The first suggestion helped me identify and remove
some of the embedded characters, namely "^K".  Many more remained
hidden.
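
For anyone hunting similar hidden characters from within R, here is a
minimal sketch of Spencer's strsplit / %in% idea.  The sample string and
the set of "expected" characters are made up for illustration; they are
not taken from my data.

```r
# Hypothetical sample: "\v" is the vertical tab (Ctrl-K, displayed as ^K)
x <- "abc\vdef\tghi"

# Break the string into single characters
chars <- strsplit(x, "")[[1]]

# Characters considered normal here; this set is an assumption
expected <- c(letters, LETTERS, 0:9, " ", ",", ".")

# Keep only the characters that are not in the expected set
hidden <- chars[!(chars %in% expected)]

# Show the hidden characters as integer code points
codes <- utf8ToInt(paste(hidden, collapse = ""))
print(codes)
```

Printing the code points rather than the characters themselves makes
otherwise invisible characters easy to look up.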

The second suggestion gave me the idea of
splitting the string on whitespace first, to see whether the
embedded-character problem would go away along with the "blank"
spaces.  It did.  In the snippet below, x is the character variable
I am trying to process:

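      # Split on runs of whitespace, then rejoin with single spaces
      str.vec <- strsplit(x, "\\s+", perl=TRUE)[[1]]
      if(length(str.vec) > 0) {
        x <- paste(str.vec, collapse=" ")
        # Trim any leading or trailing whitespace left over
        x <- gsub("^\\s+", "", x, perl=TRUE)
        x <- gsub("\\s+$", "", x, perl=TRUE)
      }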

There were no problems in processing x thereafter.
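
The same cleanup could probably be wrapped in a small function and
applied to a whole character vector at once.  This is only a sketch: the
name squish is hypothetical, and it uses gsub throughout instead of
strsplit and paste, so it is a variation on the snippet above rather
than the exact code I ran.

```r
# Hypothetical helper: collapse whitespace runs and trim both ends
squish <- function(x) {
  # Replace every run of whitespace with a single space
  x <- gsub("\\s+", " ", x, perl = TRUE)
  # Remove the single leading or trailing space that may remain
  gsub("^ | $", "", x, perl = TRUE)
}

# Made-up input with spaces, a tab, and a carriage return / newline
cleaned <- squish(c("  a \t b\r\n", "c"))
print(cleaned)
```

Because gsub is vectorized, this version works on every element of a
character vector without an explicit loop.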

Thank you, gentlemen.

Scott Waichler
