Re: [R] help sub setting data frame
Hi Sean,

Comment in line below.

On Thu, Oct 22, 2009 at 5:39 PM, Sean MacEachern sean.mace...@gmail.com wrote:
> Hi,
>
> I'm running into a problem subsetting a data frame that I have never
> encountered before:
>
>     dim(chkPd)
>     [1] 3213    6
>
>     df = head(chkPd)
>     df
>              PN     WB   Sire    Dam  MG SEX
>     601    1001 715349  61710  61702  67   F
>     969  1001_1 511092 616253 615037 168   F
>     986  1002_1 511082 616253 623905 168   F
>     667    1003 715617  61817  61441  67   F
>     1361 1003_1 510711 635246 627321 168   F
>     754    1004 715272  62356  61380  67   F
>
>     dfb = chkPd[df$PN,]
>     dfb
>                 PN     WB   Sire    Dam  MG SEX
>     1001    2114_1 510944 616294 614865 168   M
>     NA          NA     NA     NA     NA  NA  NA
>     NA.1        NA     NA     NA     NA  NA  NA
>     1003    1130_1 510950 616294 619694 168   F
>     NA.2        NA     NA     NA     NA  NA  NA
>     1004 2221-SHR2 510952 616294 619694 168   M
>
> I'm not sure why I'm getting this behaviour? By sub-setting the original
> data frame by PN I seem to be pulling out row numbers? Therefore I am only
> getting results where PN is less than the dimensions of the original data
> frame and of course nothing where PN has _ in the id. I have also tried
> using subset but haven't had any luck with that either.

That is the documented behavior as far as I can tell. See ?[.data.frame

Maybe my brain is going soft at the end of a long day, but I can't tell what you're trying to do. Can you clarify?

-Ista

>     dfb = subset(chkPd, PN==df$PN)
>     Warning message:
>     In PN == df$PN :
>       longer object length is not a multiple of shorter object length
>
> I wasn't aware that both the larger data frame had to be a multiple of the
> object you were sub-setting. In any case I would appreciate any insight
> into what I may be doing wrong.
>
> Cheers,
>
> Sean
>
>     sessionInfo()
>     R version 2.9.1 (2009-06-26)
>     i386-apple-darwin8.11.1
>
>     locale:
>     en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
>     attached base packages:
>     [1] splines stats graphics grDevices utils datasets methods base

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
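
The two symptoms in this message can be reproduced on a small scale. The sketch below is only an illustration: `chk` and `ids` are hypothetical stand-ins for chkPd and df$PN, and it assumes PN is stored as a character vector, which the thread does not state. Under that assumption, a character vector in the row position of `[` is looked up in the row names rather than in the PN column, and `==` compares element by element with recycling rather than testing membership.

```r
# Hypothetical stand-in for chkPd; row names deliberately differ from PN.
chk <- data.frame(
  PN  = c("1001", "1001_1", "3", "1003"),
  SEX = c("F", "F", "M", "F"),
  stringsAsFactors = FALSE
)
rownames(chk) <- c("601", "969", "1001", "3")

# Hypothetical ids to look up; both occur in chk$PN.
ids <- c("1001_1", "1001")

# Character values in the row position are matched against row names,
# so "1001_1" finds nothing and "1001" finds the row *named* "1001".
by_rowname <- chk[ids, ]
print(by_rowname)

# `==` pairs elements by position and recycles the shorter vector.
# Here the ids are present but in a different order, so nothing matches.
elementwise <- chk$PN == ids
print(elementwise)
```

When the longer length is not a multiple of the shorter one, the same comparison also emits the "longer object length is not a multiple of shorter object length" warning quoted above.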
Re: [R] help sub setting data frame
Hi Ista,

I think I'm suffering long dayitis myself. You are probably right. I don't use subset that often. I typically use brackets to subset dataframes. Essentially what I am trying to do is take my original dataframe (chkPd) and subset it using a smaller dataframe with some matching PN IDs. They are only a few hundred rows different in size so subset wouldn't be appropriate here. I'm just struggling to figure out what's going wrong in my first example. For instance if I try:

    df = data.frame('id'=c(1,2,3,4),'res'=c(10,10,20,20))
    dfb=df[1:2]
    dfc = df[dfb$id,]

I get something along the lines of what I'd expect where my new dataframe is a subset of the original based on the matching ids I specified in dfb$id. Is that wrong in my first example?

Cheers,

Sean

On Thu, Oct 22, 2009 at 4:55 PM, Ista Zahn istaz...@gmail.com wrote:
> That is the documented behavior as far as I can tell. See ?[.data.frame
>
> Maybe my brain is going soft at the end of a long day, but I can't tell
> what you're trying to do. Can you clarify?
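
One possible reading of why the toy example in this message seems to behave: the ids 1 to 4 happen to equal the row positions, so indexing by id and indexing by position give the same rows. The sketch below assumes the first two rows were intended (`df[1:2, ]`; without the comma, `df[1:2]` selects the first two columns instead), and `shuffled` is a hypothetical variation added only to break the coincidence.

```r
df <- data.frame(id = c(1, 2, 3, 4), res = c(10, 10, 20, 20))

# Assumption: the first two rows were meant, hence the comma.
dfb <- df[1:2, ]

# Numeric values in the row position are row positions, not ids.
# It looks right only because id i sits in row i.
dfc <- df[dfb$id, ]
print(dfc)

# Hypothetical variation: reverse the rows so ids and positions disagree.
shuffled <- df[c(4, 3, 2, 1), ]
by_position <- shuffled[dfb$id, ]   # still takes positions 1 and 2
print(by_position)                  # these rows hold ids 4 and 3, not 1 and 2
```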
Re: [R] help sub setting data frame
Is this what you want?

    df = data.frame('id'=c(1:100),'res'=c(1001:1100))
    dfb = df[1:10,]
    dfc = df[df$id %in% dfb$id,]

Still not sure, but that's my best guess. Going back to your original data you can try

    dfb = chkPd[chkPd$PN %in% df$PN,]

Hope it helps,
Ista

On Thu, Oct 22, 2009 at 6:10 PM, Sean MacEachern sean.mace...@gmail.com wrote:
> Essentially what I am trying to do is take my original dataframe (chkPd)
> and subset it using a smaller dataframe with some matching PN IDs.
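
A small sketch of what the `%in%` suggestion does, with `match()` shown as an alternative when the rows should come back in the order of the ids. The names `chk` and `ids` are hypothetical stand-ins for chkPd and df$PN, and the values are invented for illustration.

```r
# Hypothetical stand-in for chkPd.
chk <- data.frame(
  PN = c("1004", "1001_1", "2000", "1001"),
  MG = c(67, 168, 67, 67),
  stringsAsFactors = FALSE
)

# Hypothetical ids; "9999" does not occur in chk$PN.
ids <- c("1001", "1001_1", "9999")

# %in% yields one TRUE/FALSE per row of chk, so no recycling is involved
# and rows come back in the order they appear in chk.
keep <- chk$PN %in% ids
in_frame_order <- chk[keep, ]
print(in_frame_order)

# match() gives the row position of each id (NA when absent), which
# returns rows in the order of ids once the NAs are dropped.
pos <- match(ids, chk$PN)
in_id_order <- chk[pos[!is.na(pos)], ]
print(in_id_order)
```

Which of the two fits depends on whether the order of the lookup ids matters; for a plain "keep the matching rows" filter, the `%in%` form from the reply is enough.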
Re: [R] help sub setting data frame
Works perfectly! Thanks to all who responded.

Sean

On Thu, Oct 22, 2009 at 6:24 PM, Ista Zahn istaz...@gmail.com wrote:
> Going back to your original data you can try
>
>     dfb = chkPd[chkPd$PN %in% df$PN,]