You should be able to figure it out if you just print out the four factor levels that read.table() missed. The main differences are that read.table() includes ' in its default quote= argument and that it recognizes # as a comment character (and therefore discards it and everything after it on that line):
    setdiff(levels(dfcsv$Var), levels(dftxt$Var))

The base function is read.table() and it includes the following defaults:

    quote="\"'", comment.char="#"

Functions read.csv() and read.delim() call read.table() but change those defaults to

    quote="\"", comment.char=""

David

From: SH [mailto:empti...@gmail.com]
Sent: Wednesday, August 21, 2013 10:14 AM
To: dcarl...@tamu.edu; peter dalgaard
Cc: r-help
Subject: Re: [R] data import: strange experience

Thanks Peter. It works with read.delim.

David: Thanks for your comments. To answer your questions: I don't have 'NA' values, and everything is balanced. The number of missing levels was 4, and it happened only to those four levels. Yes, there are embedded commas and some other characters (e.g., '-', spaces, some weird characters in the middle of names, etc.). I can send you sample data if you are willing to take a look. Even though using 'read.delim' works, I am still curious about what caused the problem and about potential problems that I may be missing.

Thanks again,

SH

On Wed, Aug 21, 2013 at 10:58 AM, David Carlson <dcarl...@tamu.edu> wrote:

This is not really enough information to diagnose the problem. What are the missing factor levels? Were the missing levels combined with another level, or do you have missing values (NA) for those observations? Do the extra factor levels include embedded commas? There are differences between read.table and read.csv in the default quote= and comment.char= arguments.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of SH
Sent: Wednesday, August 21, 2013 9:36 AM
To: r-help@r-project.org
Subject: [R] data import: strange experience

Dear List:

I had a strange experience importing data. I wonder whether any of you has had the same problem before, and I would greatly appreciate your suggestions.
The original data set is in Excel format. Here is a brief summary of the procedure I followed:

1. I saved the original Excel data in csv and txt formats, separately.
2. I imported the two files using the following code. There were no error messages.

    dftxt = read.table('df.txt',header=T, sep='\t')
    dfcsv = read.csv('df.csv',header=T, sep=',')

3. When I checked the data with 'str', I found that the factor levels of one variable differed between the two data frames. dftxt had fewer levels than dfcsv (48 vs 52).
4. So I checked the 'df.txt' file and found that the missing levels were still there, i.e., there is no problem in the text file.

I suspect that something happened when I imported it into R. Since there were no errors when importing the file into R, I have no idea where to start to fix it. Do you have any suggestions?

Thank you very much in advance,

SH

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
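
For anyone finding this thread later, here is a minimal sketch of the behaviour David describes at the top. It does not use SH's data, which was never posted: the file contents, the column names (id, Var) and the level names are all made up for illustration. stringsAsFactors = TRUE is set explicitly as an assumption about the reader's R version, because newer versions of R no longer convert strings to factors by default. The awkward rows are placed after the first few lines, and the affected column is last, so that the file still imports without an error, as in the original report.

```r
# Hypothetical sample data (NOT the poster's file): a tab-delimited file
# whose last column holds names containing ' and #.
rows <- c(
  "id\tVar",
  "1\talpha",
  "2\tbeta",
  "3\tgamma",
  "4\tdelta",
  "5\tO'Brien",
  "6\tD'Angelo",
  "7\tLot #7",
  "8\tomega"
)
path <- tempfile(fileext = ".txt")
writeLines(rows, path)

# read.table() defaults: quote = "\"'", comment.char = "#"
dftxt <- read.table(path, header = TRUE, sep = "\t",
                    stringsAsFactors = TRUE)

# read.delim() defaults: quote = "\"", comment.char = ""
dfdelim <- read.delim(path, stringsAsFactors = TRUE)

# read.table() with the two defaults overridden by hand
dffix <- read.table(path, header = TRUE, sep = "\t",
                    quote = "\"", comment.char = "",
                    stringsAsFactors = TRUE)

# Levels that read.delim() sees but plain read.table() does not
print(setdiff(levels(dfdelim$Var), levels(dftxt$Var)))

# The overridden call should agree with read.delim()
print(identical(levels(dffix$Var), levels(dfdelim$Var)))
```

If the setdiff() output on your own data shows levels containing ' or #, passing quote="\"" and comment.char="" to read.table(), or simply switching to read.delim(), should bring the missing levels back.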