Re: [R] When creating a data frame with data.frame() transforms integers into factors
1. Please always cc the list; do not reply just to me.

2. OK, I see. I ERRED. Had you cc'ed the list, someone might have pointed this out. The correct example reproduces what you saw:

> z <- sample(1:10, 30, rep = TRUE)
> table(z)
> w <- data.frame(table(z))
> w
    z Freq
1   1    2
2   2    3
3   3    1
4   4    3
5   5    5
6   6    3
7   7    5
8   8    4
9   9    1
10 10    3
> sapply(w, class)
      z    Freq
 factor integer

This is exactly what is expected and documented. See ?table. So the question is: what do you expect? table() produces an array whose dimensions are the cross-classifying factors, and data.frame() converts this into a data frame with one factor column per dimension plus a Freq column of counts. Perhaps the following will help clarify:

> z <- data.frame(fac1 = sample(LETTERS[1:3], 10, rep = TRUE),
+                 fac2 = sample(c("j", "k"), 10, rep = TRUE))
> z
   fac1 fac2
1     A    k
2     B    k
3     C    k
4     C    k
5     B    k
6     C    k
7     C    k
8     A    j
9     A    j
10    C    j
> table(z)
    fac2
fac1 j k
   A 2 1
   B 0 2
   C 1 4
> data.frame(table(z))
  fac1 fac2 Freq
1    A    j    2
2    B    j    0
3    C    j    1
4    A    k    1
5    B    k    2
6    C    k    4
> table(z['fac1'])
A B C
3 2 5
> data.frame(table(z['fac1']))
  Var1 Freq
1    A    3
2    B    2
3    C    5

Cheers,
Bert

On Sat, May 25, 2013 at 6:54 PM, António Camacho toin...@gmail.com wrote:

Hello Bert,

Thanks for your prompt reply. I tried your example and it worked without a problem. But what I want is to create a data frame from the output of the function table(), so in your example I tried sapply(data.frame(tbl), class) and the output was z -- factor and Freq -- integer. What is happening in the table() function that is transforming the integers in z into values with labels? When I do names(tbl), it returns each value of z as a name.

I read the manual for [ but I didn't understand it completely. I have to read An Introduction to R more carefully. I also tried using [, [[ and $ to extract the values from the 'posts' column, but the problem persisted.

Like I said, this code was taken from an example on a webpage. I contacted the author and he confirmed that the code worked on his machine, which was running R 2.15.1. Maybe something changed in data.frame() between versions?
I really don't understand what I am doing wrong.

António

On 2013/05/26, at 01:44, Bert Gunter wrote:

Huh?

> z <- sample(1:10, 30, rep = TRUE)
> tbl <- table(z)
> tbl
z
 1  2  3  4  5  6  7  8  9 10
 4  3  2  6  3  3  2  2  2  3
> data.frame(z)
    z
1   5
2   2
3   4
4   1
5   6
6   4
7  10
8   4
9   3
10  8
11 10
12  4
13  3
14  9
15  2
16  2
17  6
18  1
19  4
20  7
21  9
22 10
23  7
24  5
25  5
26  6
27  8
28  1
29  1
30  4
> sapply(data.frame(z), class)
      z
integer

Your error: you used df['posts']. You should have used df[,'posts']. The former is a data frame; the latter is a vector. Read the Introduction to R tutorial or ?"[" if you don't understand why.

-- Bert

On Sat, May 25, 2013 at 12:36 PM, António Camacho toin...@gmail.com wrote:

Hello,

I am a novice to R and I was learning how to do a scatter plot with R using an example from a website. My setup is an iMac with Mac OS X 10.8.3, with R 3.0.1, default install, without additional packages loaded.

I created a .csv file in vim with the following content:

userID,user,posts
1,user1,581
2,user2,281
3,user3,196
4,user4,150
5,user5,282
6,user6,184
7,user7,90
8,user8,74
9,user9,45
10,user10,20
11,user11,3
12,user12,1
13,user13,345
14,user14,123

I imported the file into R using:

df <- read.csv('file.csv')

To confirm the data types I did:

sapply(df, class)

which returns userID -- integer; user -- factor; posts -- integer.

Then I tried to create another data frame with the number of posts and their frequencies, so I did:

postFreqCount <- data.frame(table(df['posts']))

This gives me the postFreqCount data frame with two columns: one called 'Var1' that has the number of posts each user did, and another column 'Freq' with the frequency of each number of posts. The problem is that if I do:

sapply(postFreqCount['Var1'], class)

it returns factor. So the data.frame() function transformed a variable that was integer (posts) into a variable (Var1) that has the same values but is a factor. I want to know how to prevent this from happening.
How do I keep the values from being transformed from integer to factor?

Thank you for your help,
António

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
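The two points argued above can be checked in one short script. This is a minimal sketch, assuming the 14 rows of António's file.csv are typed in directly instead of being read with read.csv() (the user column is left out for brevity). It shows that df['posts'] and df[, 'posts'] really do differ in class, that data.frame(table(...)) still yields a factor column either way, and one way to get the integers back. The as.integer(as.character(...)) conversion is the idiom the R FAQ (question 7.10) gives for turning a factor into numbers.

```r
# Hypothetical stand-in for df <- read.csv('file.csv'): the 14 rows from
# António's post typed in directly (user column omitted for brevity).
df <- data.frame(
  userID = 1:14,
  posts  = c(581L, 281L, 196L, 150L, 282L, 184L, 90L, 74L,
             45L, 20L, 3L, 1L, 345L, 123L)
)

# Single brackets return a one-column data frame;
# [, 'posts'] and [['posts']] return the integer vector itself.
class(df['posts'])     # "data.frame"
class(df[, 'posts'])   # "integer"
class(df[['posts']])   # "integer"

# Either way, table() keeps the distinct values as character labels
# (its dimnames), and data.frame() turns them into a factor column.
postFreqCount <- data.frame(table(df[, 'posts']))
class(postFreqCount$Var1)   # "factor"

# Recover the numbers from the labels. Calling as.integer() directly on
# the factor would return the internal level codes 1, 2, 3, ... instead.
postFreqCount$Var1 <- as.integer(as.character(postFreqCount$Var1))
class(postFreqCount$Var1)   # "integer"
```

Going through as.character() matters here: the factor's internal codes run from 1 to 14, while its labels run from 1 to 581.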
Re: [R] When creating a data frame with data.frame() transforms integers into factors
Hello Bert.

I didn't reply to the list because I forgot: I hit Reply instead of Reply All.

Thanks for your example. I now understand that I was trying to do something that didn't make sense, and that was why it failed. I should have used a histogram to graph the frequency of each number of 'posts' instead of going the convoluted way around and trying to do a scatterplot.

I now understand that table() treats each distinct value of the variable as a factor level and counts how many times it shows up. It makes sense that these values are then stored as a factor (with the numbers as text labels) in the data frame, because they are not a quantity, but the representation of the number.

Thanks for the help. Problem solved.

António Brito Camacho

On 26/05/2013, at 15:00, Bert Gunter gunter.ber...@gene.com wrote:
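The histogram approach António settles on can be sketched in a few lines. This assumes the same 14 post counts from file.csv typed in directly; the plot title and axis label are illustrative choices, not from the thread. Because every value occurs once, a frequency table is all ones, whereas hist() groups the values into bins (its default breaks = "Sturges" suggests the number of bins) and returns the bins it drew.

```r
# The 14 'posts' values from António's file.csv, typed in for illustration.
posts <- c(581L, 281L, 196L, 150L, 282L, 184L, 90L, 74L,
           45L, 20L, 3L, 1L, 345L, 123L)

# Every value is distinct, so the frequency table only reports counts of 1.
table(posts)

# hist() bins the numeric values instead, draws the plot,
# and returns the breaks and counts it used.
h <- hist(posts, main = "Posts per user", xlab = "Number of posts")
h$breaks  # bin boundaries
h$counts  # number of users in each bin
```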
Re: [R] When creating a data frame with data.frame() transforms integers into factors
Antonio-

What exactly do you want as output? You stated you wanted a scatter plot, but which variable do you want on the X axis and which variable do you want on the Y axis?

-tgs

On Sat, May 25, 2013 at 3:36 PM, António Camacho toin...@gmail.com wrote:
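The question about axes has two plausible readings for this data, and a short sketch can draw both. It assumes the userID and posts columns from António's file.csv are typed in directly; the name postFreq is hypothetical. The first plot puts each user on the X axis; the second puts the number of posts on X and how many users have that count on Y, converting the table's character names back to integers so the X axis is numeric.

```r
# Hypothetical stand-in for df <- read.csv('file.csv'): the 14 rows from
# António's post, typed in so the sketch runs on its own.
df <- data.frame(
  userID = 1:14,
  posts  = c(581L, 281L, 196L, 150L, 282L, 184L, 90L, 74L,
             45L, 20L, 3L, 1L, 345L, 123L)
)

# Reading 1: one point per user, userID on X and posts on Y.
plot(posts ~ userID, data = df, xlab = "userID", ylab = "posts")

# Reading 2: number of posts on X, how many users have that count on Y.
# The table's names hold the distinct values as text, so convert them back.
counts   <- table(df$posts)
postFreq <- data.frame(posts = as.integer(names(counts)),
                       Freq  = as.vector(counts))
plot(Freq ~ posts, data = postFreq, xlab = "posts", ylab = "Freq")
```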