jim holtman wrote:
> How do you define a carriage return in the middle of a line if a carriage
> return is also used to delimit a line?  One of the things you can do is to
> use 'count.fields' to determine the number of fields in each line.  For
> those lines that are not the right length, you could combine them together
> with a 'paste' command when you write them out.
>
> On 3/7/07, Walter R. Paczkowski <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>> I'm hoping someone has a suggestion for handling a simple problem.  A
>> client gave me a comma separated value file (call it x.csv) that has
>> an id and name and address for about 25,000 people (25,000 records).
>> I used read.table to read it, but then discovered that there are stray
>> carriage returns on several records.  This plays havoc with read.table
>> since it starts a new input line when it sees the carriage return.  In
>> short, the read is all wrong.
>> I thought I could write a simple function to parse a line and write it
>> back out, character by character.  If a carriage return is found, it
>> would simply be ignored on the writing back out part.  But how do I
>> identify a carriage return?  What is the code or symbol?  Is there any
>> easier way to rid the file of carriage returns in the middle of the
>> input lines?
>> Any help is appreciated.
>> Walt Paczkowski
>>

Probably using Windows with a CR/LF newline.  You can have carriage
returns (Ctrl-M - ASCII 13) or line feeds (Ctrl-J - ASCII 10) embedded
in lines.  You can probably just write a function in C or something that
reads characters, checks whether the current character and the previous
one form a CR/LF pair, and throws out any CR or LF that is not part of
such a pair, along with any other troublesome byte.  (I once had to trace
null characters that were embedded in files - they didn't show up on the
display, but clobbered the file reads.)
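For what it's worth, here is a rough sketch of that kind of filter, written in Python rather than C purely for brevity.  It rests on an assumption I can't check from here: that every genuine end of record in the file is a CR/LF pair, so any CR or LF standing on its own is a stray.  The names strip_stray_newlines and clean_file, and the default of also dropping NUL bytes, are mine for illustration and not anything from R.

```python
def strip_stray_newlines(data: bytes, drop: bytes = b"\x00") -> bytes:
    """Keep CR/LF pairs; drop lone CR, lone LF, and any byte listed in `drop`.

    Assumption: real record terminators are always CR (13) followed by LF (10).
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        # A CR immediately followed by LF is treated as a genuine line ending.
        if b == 13 and i + 1 < n and data[i + 1] == 10:
            out += b"\r\n"
            i += 2
            continue
        # A CR or LF on its own, or any other unwanted byte, is skipped.
        if b in (13, 10) or b in drop:
            i += 1
            continue
        out.append(b)
        i += 1
    return bytes(out)


def clean_file(src: str, dst: str) -> None:
    """Read `src` in binary mode, filter it, and write the result to `dst`."""
    with open(src, "rb") as fin:
        cleaned = strip_stray_newlines(fin.read())
    with open(dst, "wb") as fout:
        fout.write(cleaned)
```

Something like clean_file("x.csv", "x_clean.csv") would then give a file to hand to read.table (the output name is just a placeholder).  Note that the stray byte is simply deleted, so the text on either side of it is joined directly.  If the file turns out to use bare LF line endings instead, this would glue every record together, so it is worth checking the line endings first.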
Jim