Re: [R] When creating a data frame with data.frame() transforms integers into factors

2013-05-26 Thread Bert Gunter
1. Please always cc. the list; do not reply just to me.

2.  OK, I see. I ERRED. Had you cc'ed the list, someone might have
pointed this out. The correct example reproduces what you saw.

z <- sample(1:10,30,rep=TRUE)
table(z)
w <- data.frame(table(z))
w

    z Freq
1   1    2
2   2    3
3   3    1
4   4    3
5   5    5
6   6    3
7   7    5
8   8    4
9   9    1
10 10    3

sapply(w,class)
        z      Freq
 "factor" "integer"

This is exactly what is expected and documented.  See ?table. So the
question is: What do you expect?  table() produces an array whose
cross-classifying factors are the dimensions. data.frame converts this
into a data frame. Perhaps the following will help clarify:

 z <- data.frame(fac1 = sample(LETTERS[1:3],10,rep=TRUE),
                 fac2 = sample(c("j","k"),10,rep=TRUE))
 z
   fac1 fac2
1     A    k
2     B    k
3     C    k
4     C    k
5     B    k
6     C    k
7     C    k
8     A    j
9     A    j
10    C    j

 table(z)

    fac2
fac1 j k
   A 2 1
   B 0 2
   C 1 4

 data.frame(table(z))

  fac1 fac2 Freq
1    A    j    2
2    B    j    0
3    C    j    1
4    A    k    1
5    B    k    2
6    C    k    4

 table(z['fac1'])

A B C
3 2 5

 data.frame(table(z['fac1']))
  Var1 Freq
1    A    3
2    B    2
3    C    5
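
If the counted values are needed as integers again, one possible approach is to convert the factor column back after the data frame has been built. The following is only a minimal sketch: the seed and the name w_int are assumptions made for illustration, and it presumes every factor level is a valid integer string.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
z <- sample(1:10, 30, replace = TRUE)
w <- data.frame(table(z))   # w$z is a factor, w$Freq is an integer

# Go through as.character() first: calling as.integer() directly on a
# factor returns the internal level codes, not the original values.
w_int <- w                  # w_int is a hypothetical name
w_int$z <- as.integer(as.character(w$z))

sapply(w_int, class)
```

The conversion happens after table() has done its counting, so the counts in Freq are unaffected.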

Cheers,
Bert

On Sat, May 25, 2013 at 6:54 PM, António Camacho toin...@gmail.com wrote:
 Hello Bert
 Thanks for your prompt reply.
 I tried your example and it worked without a problem.

 But what I want is to create a data frame from the output of the function
 table(), so in your example I tried sapply(data.frame(tbl),class) and the
 output was z --> factor and Freq --> integer.
 What is happening in the table() function that is transforming the integers
 in z into values with labels?
 When I do names(tbl), it returns each value of z as a name.

 I read the manual for  [  but I didn't understand it completely. I have to
 read the Introduction to R more carefully.

 I also tried using [, [[ and $ for the extraction of the values from
 the 'posts' column, but the problem persisted.

 As I said, this code was taken from an example on a web page. I contacted
 the author and he confirmed that the code worked on his machine, which was
 running R 2.15.1.
 Maybe something changed in data.frame() between versions?

 I really don't understand what I am doing wrong.

 António

 On 2013/05/26, at 01:44, Bert Gunter wrote:

 Huh?

 z <- sample(1:10,30,rep=TRUE)
 tbl <- table(z)
 tbl

 z
  1  2  3  4  5  6  7  8  9 10
  4  3  2  6  3  3  2  2  2  3

 data.frame(z)

z
 1   5
 2   2
 3   4
 4   1
 5   6
 6   4
 7  10
 8   4
 9   3
 10  8
 11 10
 12  4
 13  3
 14  9
 15  2
 16  2
 17  6
 18  1
 19  4
 20  7
 21  9
 22 10
 23  7
 24  5
 25  5
 26  6
 27  8
 28  1
 29  1
 30  4

 sapply(data.frame(z),class)

         z
 "integer"

 Your error: you used df['posts']  . You should have used df[,'posts'] .

 The former is a data frame. The latter is a vector. Read the
 Introduction to R tutorial or ?[ if you don't understand why.

 -- Bert

 -- Bert

 On Sat, May 25, 2013 at 12:36 PM, António Camacho toin...@gmail.com
 wrote:

 Hello


 I am a novice in R and I was learning how to make a scatter plot with R using
 an example from a website.

 My setup is an iMac with Mac OS X 10.8.3 and R 3.0.1, default install,
 without additional packages loaded.

 I created a .csv file in vim with the following content:
 userID,user,posts
 1,user1,581
 2,user2,281
 3,user3,196
 4,user4,150
 5,user5,282
 6,user6,184
 7,user7,90
 8,user8,74
 9,user9,45
 10,user10,20
 11,user11,3
 12,user12,1
 13,user13,345
 14,user14,123

 I imported the file into R using: 'df <- read.csv('file.csv')'
 To confirm the data types I did: 'sapply(df, class)'
 That returns userID --> integer; user --> factor; posts --> integer.
 Then I tried to create another data frame with the number of posts and its
 frequencies, so I did: 'postFreqCount <- data.frame(table(df['posts']))'
 This gives me the postFreqCount data frame with two columns, one called
 'Var1' that has the number of posts each user made, and another column,
 'Freq', with the frequency of each number of posts.
 The problem is that if I do: 'sapply(postFreqCount['Var1'],class)' it
 returns factor.
 So the data.frame() function transformed a variable that was integer
 (posts) into a variable (Var1) that has the same values but is a factor.
 I want to know how to prevent this from happening. How do I keep the
 values from being transformed from integer to factor?

 Thank you for your help

 António

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 

Re: [R] When creating a data frame with data.frame() transforms integers into factors

2013-05-26 Thread António Brito Camacho
Hello Bert.

I didn't reply to the list because I forgot. I hit "reply" instead of "reply
all".

Thanks for your example.
I understand now that I was trying to do something that didn't make sense, and
that was why it failed.
I should have used a histogram to graph the frequency of each number
of 'posts' instead of going the convoluted way around and trying to do a
scatterplot.
I now understand that table() treats each distinct value of the variable as a
factor level and counts how many times it shows up. It makes sense that these
levels are kept as a factor in the data frame, because they are not a
quantity, but a label representing the number.
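
The histogram idea mentioned above might look roughly like the sketch below. The posts values are the ones from the CSV in the original post. The plot title and axis label are assumptions added for illustration, and the default binning of hist() is used because the thread does not say which bins were wanted.

```r
# 'posts' column from the CSV shown in the original post
posts <- c(581L, 281L, 196L, 150L, 282L, 184L, 90L, 74L, 45L, 20L,
           3L, 1L, 345L, 123L)

# hist() bins the numeric values itself, so no table()/factor step is needed.
# The title and axis label below are hypothetical choices.
h <- hist(posts, main = "Posts per user", xlab = "Number of posts")

h$breaks  # bin boundaries chosen by hist()
h$counts  # number of users falling into each bin
```

Since posts stays an integer vector throughout, the question of factors never comes up with this approach.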

Thanks for the help. Problem solved.

António Brito Camacho



Re: [R] When creating a data frame with data.frame() transforms integers into factors

2013-05-25 Thread Thomas Stewart
Antonio-

What exactly do you want as output?  You stated you wanted a scatter plot,
but which variable do you want on the X axis and which variable do you want
on the Y axis?

-tgs




Re: [R] When creating a data frame with data.frame() transforms integers into factors

2013-05-25 Thread Bert Gunter
Huh?

 z <- sample(1:10,30,rep=TRUE)
 tbl <- table(z)
 tbl
z
 1  2  3  4  5  6  7  8  9 10
 4  3  2  6  3  3  2  2  2  3
 data.frame(z)
    z
1   5
2   2
3   4
4   1
5   6
6   4
7  10
8   4
9   3
10  8
11 10
12  4
13  3
14  9
15  2
16  2
17  6
18  1
19  4
20  7
21  9
22 10
23  7
24  5
25  5
26  6
27  8
28  1
29  1
30  4
 sapply(data.frame(z),class)
        z
"integer"

Your error: you used df['posts']  . You should have used df[,'posts'] .

The former is a data frame. The latter is a vector. Read the
Introduction to R tutorial or ?[ if you don't understand why.
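
The difference between the two forms of indexing can be checked directly. This is a small sketch, assuming a data frame built by hand from the first four rows of the CSV in the original post rather than read from file.csv:

```r
# First four rows of the CSV from the original post, typed in by hand
df <- data.frame(userID = 1:4,
                 user   = c("user1", "user2", "user3", "user4"),
                 posts  = c(581L, 281L, 196L, 150L))

class(df["posts"])    # one index, single bracket: a one-column data frame
class(df[, "posts"])  # row/column indexing: the underlying integer vector
class(df[["posts"]])  # double bracket: also the integer vector
class(df$posts)       # $ extraction: also the integer vector
```

As the follow-up messages in this thread show, the choice of extraction does not change what data.frame(table(...)) returns: the counted values still end up in a factor column.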

-- Bert

-- Bert



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
