Re: [R] data import: strange experience

2013-08-22 Thread jwd
On Wed, 21 Aug 2013 10:35:53 -0400
SH empti...@gmail.com wrote:

It looks like your problem has already been answered. However, as a
rule of thumb, any time you see a peculiarity like this you should look
for minor variations between what you expected to export and what Excel
really exported as delimited text.  Occasionally there will be a
space or other character (,'/$,- etc.) that may be handled as
a signal or ignored by the importing program but not by Excel.  Usually
Excel works as expected, but it is a good idea to examine the text
file(s) in an editor like Notepad in Windows or Kate in Linux if you
encounter an oddity. BTW, there are better choices than Notepad for
Windows, and I would recommend one with column-selection abilities for
work on delimited data files.
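
As a complement to inspecting the file in an editor, the same check can
be sketched in R. This is only an illustration: the helper name
find_suspect_lines, the default pattern, and the small file written here
are made up, not taken from the original data.

```r
# Hypothetical helper: list the lines of a delimited text file that
# contain characters read.table() treats specially by default (' " #).
find_suspect_lines <- function(path, pattern = "['\"#]") {
  lines <- readLines(path, warn = FALSE)
  hits <- grep(pattern, lines)
  data.frame(line = hits, text = lines[hits], stringsAsFactors = FALSE)
}

# Made-up file standing in for the text export from Excel
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tAlpha", "2\tO'Brien", "3\tUnit #7"), tmp)

suspects <- find_suspect_lines(tmp)
print(suspects)
```

The pattern can be widened to whatever characters look suspicious in a
given export, for example trailing spaces or non-ASCII characters.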

jwdougherty

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data import: strange experience

2013-08-21 Thread SH
Dear List:

I had a strange experience importing data.  I wonder if any of you
have had the same problem before, and I would greatly appreciate your
suggestions.

The original data set is in Excel format.

Here is a brief summary of the procedure I followed:
1. I saved the original Excel data in csv and txt formats, separately.
2. I imported the two files using the following code.  There were no
error messages.
dftxt = read.table('df.txt',header=T, sep='\t')
dfcsv = read.csv('df.csv',header=T, sep=',')
3. When I checked the data with 'str', I found that the factor levels
of a variable differed between the two data frames.
dftxt had fewer levels than dfcsv (48 vs 52).
4. So I checked the 'df.txt' file and found that the missing levels
were still there, i.e., there is no problem in the text file.  I suspect
that something happened when I imported it into R.

Since there were no errors in importing the file into R, I have no
idea where to start to fix it.  Do you have any suggestions?

Thank you very much in advance,

SH



Re: [R] data import: strange experience

2013-08-21 Thread Sarah Goslee
Hi,

We don't know anything about your data or your file, so it's utterly
impossible to offer useful suggestions.

The very best thing you can do is condense your problem into a
reproducible example, with fake data if necessary. Otherwise you're
limited by the ability of the list to guess what you're looking at,
and our track record with that is spotty.
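
One way to build such an example is to write a tiny made-up file from
within R and post the lines that recreate it. The sketch below assumes
nothing about the real data: the column names and values are invented
for illustration.

```r
# Made-up data standing in for the real spreadsheet
fake <- data.frame(id = 1:4,
                   name = c("Alpha", "Beta-1", "O'Brien", "Unit #7"),
                   stringsAsFactors = FALSE)

# Write it as tab-delimited text, roughly as a spreadsheet export would
tmp <- tempfile(fileext = ".txt")
write.table(fake, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# dput() prints R code that recreates the raw lines exactly, so readers
# can paste it into writeLines() and rebuild the same file
lines <- readLines(tmp)
dput(lines)
```

Posting the dput() output together with the two import commands gives
the list everything needed to reproduce the behaviour.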


Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] data import: strange experience

2013-08-21 Thread David Carlson
This is not really enough information to diagnose the problem.
What are the missing factor levels? Were the missing levels
combined with another level or do you have missing values (NA)
for those observations? Do the extra factor levels include
embedded commas? There are differences between read.table and
read.csv in the default quote= and comment.char= arguments.
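
These questions can be checked directly in R. The following is a sketch
on a made-up tab-delimited file; the column name site and the values are
hypothetical, and the file only illustrates how a '#' inside a field can
merge levels without dropping rows or creating NAs.

```r
# Made-up file: two values differ only after a '#'
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tsite", "1\tAlpha", "2\tBeta",
             "3\tUnit #7", "4\tUnit #8"), tmp)

dftxt   <- read.table(tmp, header = TRUE, sep = "\t",
                      stringsAsFactors = TRUE)
dfdelim <- read.delim(tmp, header = TRUE, stringsAsFactors = TRUE)

# Were rows lost, or NAs introduced?
print(c(rows_table = nrow(dftxt), rows_delim = nrow(dfdelim)))
print(colSums(is.na(dftxt)))

# How many levels does each import see, and which ones are absent?
print(c(levels_table = nlevels(dftxt$site),
        levels_delim = nlevels(dfdelim$site)))
missing_levels <- setdiff(levels(dfdelim$site), levels(dftxt$site))
print(missing_levels)

# Number of fields per line as seen with the read.table() defaults
print(table(count.fields(tmp, sep = "\t")))
```

If the row counts agree and there are no NAs, the missing levels were
most likely truncated or merged into another level rather than dropped.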

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352




Re: [R] data import: strange experience

2013-08-21 Thread SH
Hi Sarah,

Thanks for the prompt feedback.  I knew it would be very vague without
an example.  However, I only used two commands to import the data and had
no 'apparent' errors.  The original data have about 19000 obs and I was
able to reduce them to about 3200.  I wonder if I can attach the data
file (size: 109K) to my email.

Best,

Steve




Re: [R] data import: strange experience

2013-08-21 Thread SH
Thanks Peter.  It works with read.delim.

David: Thanks for your comments.  To answer your questions: I don't have
any 'NA' values and the data are all balanced.  Four levels were missing,
and the problem affected only those four levels.  Yes, there are commas
embedded, as well as some other characters (e.g., '-', space, some weird
characters in the middle of names, etc.).  I can send you sample data if
you are willing to take a look.  Even though using 'read.delim' works, I
am still curious about what caused the problem and about potential
problems that I may be missing.

Thanks again,

SH







Re: [R] data import: strange experience

2013-08-21 Thread David Carlson
You should be able to figure it out if you just print out the
four factor levels that read.table() missed. The main
differences are that read.table() includes ' in the quote=
argument and it recognizes # as a comment (and therefore
discards it and everything after it):

setdiff(levels(dfcsv$Var), levels(dftxt$Var))

The base function is read.table() and it includes the following
defaults:

quote="\"'", comment.char="#"

Functions read.csv() and read.delim() call read.table() but
change those defaults to

quote="\"", comment.char=""
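
As a rough illustration of the point about defaults, read.table() can be
given the read.delim() settings by hand. The file below is made up (the
column name site and the values are hypothetical); it contains both an
apostrophe and a '#' inside fields.

```r
# Made-up tab-delimited file with ' and # inside the values
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tsite", "1\tAlpha", "2\tO'Brien",
             "3\tUnit #7", "4\tD'Arcy"), tmp)

# read.table() with the quote= and comment.char= values that
# read.delim() uses, so ' and # are treated as ordinary characters
by_hand <- read.table(tmp, header = TRUE, sep = "\t",
                      quote = "\"", comment.char = "",
                      stringsAsFactors = TRUE)

# read.delim() applies the same two settings by default
via_delim <- read.delim(tmp, stringsAsFactors = TRUE)

print(levels(by_hand$site))
print(isTRUE(all.equal(by_hand, via_delim)))
```

Spelling the two arguments out in the read.table() call makes the intent
explicit when a file is known to contain apostrophes or '#' in its
fields.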

David

