[R] read.table() with \t as separator: all other programs report equal fields in each row, but read.table() returns an unequal row length error

2011-03-16 Thread Yong Wang
hi, list

R is undoubtedly my favorite statistics tool; however, the data
input part has long been a pain. Most of the data I have to deal with
are irregular and contain special characters.

Recently I got a tab-delimited file. read.table(filename, sep="\t")
keeps returning errors that certain rows do not have the expected
number of elements, while other programs such as perl, python, and awk
all report equal row lengths when \t is used as the separator.

I scouted through the problematic rows. Sometimes it is because a row
contains a #, so I went back and specified comment.char = "".
Then some other problem turns up, and for some rows I simply can't
figure out what the problem is.
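
A minimal way to locate the offending rows (a sketch; "mydata.txt" is a
placeholder for the actual file) is count.fields(), which reports how many
fields R sees on each line under the same separator, quote, and comment
settings:

    ## count tab-separated fields per line, with quote and comment handling disabled
    n <- count.fields("mydata.txt", sep = "\t", quote = "", comment.char = "")
    table(n)                ## distribution of field counts across the file
    which(n != median(n))   ## line numbers whose field count deviates from the typical one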

Can any guru suggest how to save this pain now and in the future? Is
CSV a safer format? Or can anyone tell me the fundamental principles I
must bear in mind when doing preliminary data processing with other
programs such as perl, so that the output can be readily fed into R?

best

yong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.table() with \t as separator: all other programs report equal fields in each row, but read.table() returns an unequal row length error

2011-03-16 Thread Peter Dalgaard

On Mar 16, 2011, at 17:37, Yong Wang wrote:


A couple of other things can get messed up, e.g. quote symbols. Does 
read.delim()/read.delim2() perhaps work better?  

With CSV, you generally get the same sort of issues, just with "," instead
of \t.
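
For example (a sketch; "mydata.txt" is again a placeholder, and header = TRUE
assumes the file has a header line), read.delim() already uses tab as the
separator and disables comment handling, and switching quoting off as well
makes R split lines the same way perl and awk do:

    ## read.delim() defaults to sep = "\t" and comment.char = ""
    dat <- read.delim("mydata.txt", quote = "")

    ## equivalent read.table() call with quoting and comments switched off
    dat <- read.table("mydata.txt", header = TRUE, sep = "\t",
                      quote = "", comment.char = "")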


 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
