Re: [R] Problem Creating Partial Dependence Plot
Jane Shevtsov wrote:
> I am trying to use the plotmo package to generate a partial dependence plot for a CART model created with rpart. When running plotmo, I get
> Error: get.plotmo.y returned the wrong length (got 204938 expected 205000).
> The rpart predict function does indeed return 204938 results, but plotmo is supposed to be able to handle NAs in rpart models. What might I do about this?

It's because you have NAs in y; plotmo only allows them in x (and then only with rpart models). The plotmo help page is admittedly not clear on this and I will update it in due course.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
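One way to act on this advice is to drop the rows with a missing response before fitting, so that the fitted values and the response have the same length. The following is only a sketch on simulated data: the data frame `dat`, its columns and the sizes are made up for illustration, and the plotmo call is left commented out because it assumes the plotmo package is installed.

```r
library(rpart)

set.seed(1)
# Made-up data: 200 rows, with NAs in both the response and one predictor
dat <- data.frame(x1 = rnorm(200), x2 = runif(200))
dat$y <- dat$x1 + rnorm(200, sd = 0.1)
dat$y[sample(200, 10)]  <- NA   # NAs in y: these are the problem
dat$x2[sample(200, 10)] <- NA   # NAs in x: rpart itself copes with these

# Keep only the rows where the response is observed
clean <- dat[!is.na(dat$y), ]

fit <- rpart(y ~ x1 + x2, data = clean)

# Fitted values now line up with the rows of 'clean'
length(predict(fit)) == nrow(clean)

# plotmo::plotmo(fit)   # assumes plotmo is installed
```

With the original data, rpart silently drops the rows whose response is missing, which is why predict() returned fewer values than there were rows.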
[R] Scatter plot selection points
Hi all,

I'd like to do a scatterplot where some of the values, out of a subset, are plotted differently in color and shape. I've worked around the following code but I don't manage to make it right. Any help greatly appreciated!

# My data
dd <- iris
iris$Code <- 1:150

# A selection of my data I'd like to plot differently
subset <- subset(iris, iris$Sepal.Width > 5)
sel <- as.character(subset$Code)  # I think the problems start already here :)

# Plotting doesn't work
plot(iris$Sepal.Length ~ iris$Sepal.Widith,
     col=ifelse(iris$Code==sel, "red", "black")
     pch=ifelse(iris$Code==sel, 17, 1))
Re: [R] Dataframe: Average cells of two rows and replace them with one row
Hi

Please do not use HTML formatting in your posts. It does not bring any advantage.

See inline.

From: Verena Weinbir [mailto:vwein...@gmail.com]
Sent: Thursday, May 29, 2014 3:33 PM
To: PIKAL Petr
Subject: Re: [R] Dataframe: Average cells of two rows and replace them with one row

> Hey,
> Thank you for your reply! I've attached some sample data. When I tried your code it gave me the error message, that arguments must have same

Why did you attach data? Using dput is preferable. When I tried to read your data it had some flaw with the number of items in row 13 (and probably others); Excel is not famous for keeping the same formatting across versions.

test <- read.table("clipboard", header=T, na.string="NA", dec=",")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 13 did not have 25 elements

So I read only lines 1:10.

test <- read.table("clipboard", header=T, na.string="NA", dec=",")

This results in a data frame with two factor variables, Author and Test. BTW there is no variable "Name" in your data.

str(test)
'data.frame': 10 obs. of 25 variables:
 $ Author  : Factor w/ 4 levels "Beck","Joll",..: 2 2 2 2 1 1 1 1 3 4
 $ Year    : int 2006 2006 2006 2006 1988 1988 1988 1988 2004 2004
 $ Number  : int 720 720 720 720 33 41 41 41 19 26
 $ NumberA : int 344 344 344 344 5 6 6 6 9 12
 $ NumberB : int 376 376 376 376 28 35 35 35 10 14
 $ Age     : num 15 15 15 15 25.5 NA NA NA 37.4 37.2
 $ AgeA    : int NA NA NA NA 27 NA NA NA NA NA
 $ AgeB    : int NA NA NA NA 24 NA NA NA NA NA
 $ Test    : Factor w/ 2 levels "green","red": 2 2 2 2 1 1 1 1 1 1
 $ ScoreA  : num 64.8 63 64.7 60.6 61 ...
 $ ScoreAdv: num 9.96 9.96 9.96 9.96 20.64 ...
 $ ScoreB  : num 75.5 73.4 74.6 69.2 70.8 ...
 $ ScoreBdv: num 9.04 9.04 9.04 9.04 16.36 ...
 $ Sub1    : logi NA NA NA NA NA NA ...
 $ Sub2    : logi NA NA NA NA NA NA ...
 $ Sub3    : logi NA NA NA NA NA NA ...
 $ Sub4    : logi NA NA NA NA NA NA ...
 $ Sub5    : logi NA NA NA NA NA NA ...
 $ Sub6    : logi NA NA NA NA NA NA ...
 $ Sub7    : logi NA NA NA NA NA NA ...
 $ Sub8    : logi NA NA NA NA NA NA ...
 $ Sub8.1  : logi NA NA NA NA NA NA ...
 $ Sub10   : logi NA NA NA NA NA NA ...
 $ yi      : num 1.124 1.092 1.04 0.903 0.515 ...
 $ vi      : num 0.00643 0.00638 0.0063 0.00612 0.23337 ...

Here is the output from dput, which you can use to check whether my data are the same as yours (that is why dput is preferable).

dput(test)
structure(list(
  Author = structure(c(3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 4L, 5L, 2L), .Label = c("Beck", "Con", "Joll", "Per(a)", "Per(b)"), class = "factor"),
  Year = c(2006L, 2006L, 2006L, 2006L, 1988L, 1988L, 1988L, 1988L, 2004L, 2004L, 2012L),
  Number = c(720L, 720L, 720L, 720L, 33L, 41L, 41L, 41L, 19L, 26L, 312L),
  NumberA = c(344L, 344L, 344L, 344L, 5L, 6L, 6L, 6L, 9L, 12L, 156L),
  NumberB = c(376L, 376L, 376L, 376L, 28L, 35L, 35L, 35L, 10L, 14L, 156L),
  Age = c(15, 15, 15, 15, 25.5, NA, NA, NA, 37.4, 37.2, 37.25),
  AgeA = c(NA, NA, NA, NA, 27, NA, NA, NA, NA, NA, 38.3),
  AgeB = c(NA, NA, NA, NA, 24, NA, NA, NA, NA, NA, 36.2),
  Test = structure(c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("blue", "green", "red"), class = "factor"),
  ScoreA = c(64.8, 63, 64.7, 60.6, 61, 60.66, 58.5, 61.66, 87.58, 91.2, 0.26),
  ScoreAdv = c(9.955, 9.955, 9.955, 9.955, 20.64, 19.38, 20.35, 19.44, 16.79, 15.6, 0.27),
  ScoreB = c(75.5, 73.4, 74.6, 69.2, 70.83, 70.34, 70.91, 71.19, 98.08, 86.87, 0.3),
  ScoreBdv = c(9.043, 9.043, 9.043, 9.043, 16.36, 17.78, 18.23, 18.93, 16.35, 15.73, 0.26),
  Sub1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  Sub2 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  Sub3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  Sub4 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  Sub5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  Sub6 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  Sub7 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  Sub8 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  Sub8.1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  Sub10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  yi = c(1.12396298138735, 1.0924560079, 1.03992836595652, 0.90337211588142, 0.514940166844419, 0.510422808437657, 0.629923007603453, 0.487074464117519, 0.605177248294008, -0.26766583881062, 0.150551071047105),
  vi = c(0.0064268782069221, 0.00637821308397186, 0.00630017975096319, 0.00611528303580472, 0.233373905723904, 0.212826775760406, 0.211924228535386, 0.222536036643126, 0.224889816220824, 0.158901797586393, 0.0128772400934118)),
  .Names = c("Author", "Year", "Number", "NumberA", "NumberB", "Age", "AgeA", "AgeB", "Test", "ScoreA", "ScoreAdv", "ScoreB", "ScoreBdv", "Sub1", "Sub2", "Sub3", "Sub4", "Sub5", "Sub6", "Sub7", "Sub8", "Sub8.1", "Sub10", "yi", "vi"),
  class = "data.frame", row.names = c(NA, -11L))

I can use aggregate without problems.

test.ag <- aggregate(test[,-1], list(test[,1]), mean, na.rm=T)

Here is the result.

dput(test.ag)
structure(list(Group.1 = structure(1:5, .Label = c("Beck", "Con", "Joll", "Per(a)", "Per(b)"), class = "factor"), Year = c(1988,
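The aggregate() call in this message can be tried on a much smaller data frame. This is only an illustrative sketch: `toy` and its values are made up (loosely modelled on a few columns of the data above), and the grouping column is named `Author` in the `by` list so that the result does not get the default name `Group.1`.

```r
# Made-up miniature of the data: several rows per author
toy <- data.frame(
  Author = c("Beck", "Beck", "Joll", "Joll", "Con"),
  ScoreA = c(61, 60.66, 64.8, 63, 0.26),
  Age    = c(25.5, NA, 15, 15, 37.25)
)

# Average every non-grouping column within each author;
# na.rm = TRUE is passed on to mean() so a single NA does not wipe out a group
toy.ag <- aggregate(toy[, -1], by = list(Author = toy[, 1]),
                    FUN = mean, na.rm = TRUE)
toy.ag
```

Each author ends up with exactly one row holding the column means, which is the "replace two rows with one averaged row" operation asked about in the subject line.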
Re: [R] partykit ctree: minbucket and case weights
Amber Dawn Nolder wrote 2014-05-28 23:16:
> Hello,
> I am an R novice, and I am using the partykit package to create regression trees. I used the following to generate the trees:
>
> ctree(y~x1+x2+x3+x4, data=my_data, control=ctree_control(testtype = "Bonferroni", mincriterion = 0.90, minsplit = 12, minbucket = 4, majority = TRUE))
>
> I thought that minbucket set the minimum value for the sum of weights in each terminal node, and that each case weight is 1, unless otherwise specified. In which case, the sum of case weights in a node should equal the number of cases (n) in that node. However, I sometimes obtain a tree with a terminal node that contains fewer than 4 cases.

I do agree that the tree below looks suspicious. You may have found a bug. But you didn't provide commented, minimal, self-contained, reproducible code, i.e., we're missing your 'my_data' object, and therefore we cannot reproduce this easily. Can you please provide us with the output from 'dput(my_data)'?

> My data set has a total of 36 cases. The dependent and all independent variables are continuous data. Variables x1 and x2 contain missing (NA) values.

I tried a few other data sets and there the results seem to come out OK (even after inducing NAs).

> Could someone please explain why I am getting these results?

Probably. But you need to provide a reproducible example and the details obtained by 'sessionInfo()'. As per the posting guide, since this is a contributed package you should first contact its maintainer (Torsten Hothorn, CC'd) and only post here if you get no reply. Did you try contacting Torsten?

> Am I mistaken about the value of case weights or about the use of minbucket to restrict the size of a terminal node?

I don't think you're mistaken, since '?ctree_control' says:

minbucket: the minimum sum of weights in a terminal node.

Henric

> This is an example of the output:
>
> Model formula:
> y ~ x1 + x2 + x3 + x4
>
> Fitted party:
> [1] root
> |   [2] x4 <= 30: 0.927 (n = 17, err = 1.1)
> |   [3] x4 > 30
> |   |   [4] x2 <= 43: 0.472 (n = 8, err = 0.4)
> |   |   [5] x2 > 43
> |   |   |   [6] x3 <= 0.4: 0.282 (n = 3, err = 0.0)
> |   |   |   [7] x3 > 0.4: 0.020 (n = 8, err = 0.0)
>
> Number of inner nodes:    3
> Number of terminal nodes: 4
>
> Many thanks!
> Amber Nolder
> Graduate Student
> Indiana University of Pennsylvania
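For readers unsure what the requested dput() output is good for, here is a small sketch of the round trip. The `my_data` below is a made-up stand-in (three rows, two columns), not the poster's data; the point is only that the text printed by dput() can be pasted into an email and evaluated by someone else to get the same object back.

```r
# Hypothetical stand-in for the poster's data, including an NA
my_data <- data.frame(y = c(0.9, 0.5, 0.3), x1 = c(10, NA, 42))

# What you would paste into the email: the text printed by dput()
txt <- paste(capture.output(dput(my_data)), collapse = "\n")
cat(txt, "\n")

# What a reader of the email does with it: evaluate the text to rebuild the object
restored <- eval(parse(text = txt))
all.equal(restored, my_data)

# Version details of R and of the attached packages, also worth including
sessionInfo()
```

Unlike an attached spreadsheet, the dput() text carries column types, factor levels and NAs along with the values.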
Re: [R] Scatter plot selection points
Hi

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Beatriz
Sent: Friday, May 30, 2014 9:37 AM
To: R Help
Subject: [R] Scatter plot selection points

> Hi all,
> I'd like to do a scatterplot where some of the values, out of a subset, are plotted differently in color and shape. I've worked around the following code but I don't manage to make it right. Any help greatly appreciated!
>
> # My data
> dd <- iris
> iris$Code <- 1:150
>
> # A selection of my data I'd like to plot differently
> subset <- subset(iris, iris$Sepal.Width > 5)

max(iris$Sepal.Width)
[1] 4.4

No values out of subset. So I changed the threshold.

iris$code <- iris$Sepal.Width > 3.5

> sel <- as.character(subset$Code)  # I think the problems start already here :)
>
> # Plotting doesn't work
> plot(iris$Sepal.Length ~ iris$Sepal.Widith,
>      col=ifelse(iris$Code==sel, "red", "black")
>      pch=ifelse(iris$Code==sel, 17, 1))

Overcomplicated.

plot(iris$Sepal.Length ~ iris$Sepal.Width, col=c("red", "black")[iris$code+1], pch=c(17, 1)[iris$code+1])

Regards
Petr

This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients. If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system. If you are not the intended recipient of this e-mail, you are not authorized to use, disseminate, copy or disclose this e-mail in any manner. The sender of this e-mail shall not be liable for any possible damage caused by modifications of the e-mail or by delay with transfer of the email.
In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately accept such offer; The sender of this e-mail (offer) excludes any acceptance of the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into any contracts on behalf of the company except for cases in which he/she is expressly authorized to do so in writing, and such authorization or power of attorney is submitted to the recipient or the person represented by the recipient, or the existence of such authorization is known to the recipient of the person represented by the recipient.
Re: [R] Scatter plot selection points
Hi Petr,

Thanks for your email; however, I cannot make the code work.

Also, I quite like the ifelse approach. I find it very clean.

Cheers
Re: [R] Scatter plot selection points
Hi

-----Original Message-----
From: Beatriz [mailto:aguitatie...@hotmail.com]
Sent: Friday, May 30, 2014 10:08 AM
To: PIKAL Petr; R Help
Subject: Re: [R] Scatter plot selection points

> Hi Petr,
> Thanks for your email however, I cannot make the code work.

Errors? What code did you try?

iris$code <- iris$Sepal.Width > 3.5
plot(iris$Sepal.Length ~ iris$Sepal.Width, col=c("red","black")[iris$code+1], pch=c(17, 1)[iris$code+1])

works for me without any problem.

> Also, I quite like the ifelse approach. I find it very clean.

Yes. What I meant is that your subset approach is complicated. You can use ifelse if you like.

plot(iris$Sepal.Length ~ iris$Sepal.Width, col=ifelse(iris$code, "black", "red"), pch=ifelse(iris$code, 1, 17))

Regards
Petr
Re: [R] Multiple regression in R
Hello,

lm() is designed to work with data.frames, not with matrices. You can change your code to something like

dat <- data.frame(price, pred1 = c(5,6,3,4,5), pred2 = c(2,1,8,5,6))
fit <- lm(price ~ pred1 + pred2, data = dat)

and then use the fitted model to do predictions. You don't have to give the new values in a matrix; you can give them as vectors in a data.frame.

predict(fit, data.frame(pred1 = 1:3, pred2 = 3:5))

Hope this helps,

Rui Barradas

On 29-05-2014 21:38, Safiye Celik wrote:
> I want to perform a multiple regression in R and make predictions based on the trained model. Below is an example code I am using:
>
> price = c(10,18,18,11,17)
> predictors = cbind(c(5,6,3,4,5),c(2,1,8,5,6))
> predict(lm(price ~ predictors), data.frame(predictors=matrix(c(3,5),nrow=1)))
>
> So, based on the 2-variate regression model trained by 5 samples, I want to make a prediction for the test data point where the first variate is 3 and the second variate is 5. But I get a warning from the above code saying that 'newdata' had 1 rows but variable(s) found have 5 rows. How can I correct the above code?
>
> The code below works fine where I give the variables separately to the model formula. But since I will have hundreds of variates, I have to give them in a matrix, since it would be unfeasible to append hundreds of columns using the + sign.
>
> price = c(10,18,18,11,17)
> predictor1 = c(5,6,3,4,5)
> predictor2 = c(2,1,8,5,6)
> predict(lm(price ~ predictor1 + predictor2), data.frame(predictor1=3,predictor2=5))
>
> Thanks in advance!
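For the "hundreds of variates" concern in the question, one possible sketch is to name the matrix columns, convert the matrix to a data.frame, and let the formula `price ~ .` stand for "all other columns", so that no predictor has to be typed by hand. The column names `pred1`, `pred2` and the helper object `new_x` are made up for this illustration; this is not code from the thread.

```r
price <- c(10, 18, 18, 11, 17)
predictors <- cbind(c(5, 6, 3, 4, 5), c(2, 1, 8, 5, 6))

# Give the matrix columns names; these become the variable names in the model
colnames(predictors) <- paste0("pred", seq_len(ncol(predictors)))

# One data.frame holding the response and every predictor column
dat <- data.frame(price = price, predictors)

# "." means: use every column of 'dat' except the response
fit <- lm(price ~ ., data = dat)

# New data must use the same column names as the training predictors
new_x <- matrix(c(3, 5), nrow = 1,
                dimnames = list(NULL, colnames(predictors)))
pred <- predict(fit, newdata = as.data.frame(new_x))
pred
```

The prediction is the same as the one from the two-predictor formula written out with +, and the approach scales to any number of columns in `predictors`.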
Re: [R] Scatter plot selection points
Hi Petr,

Initially your code didn't work because 'Code' wasn't in uppercase. It works now! :)

The only thing is that I wanted in red the codes > 3.5.

Optional code:

sel <- iris[iris$Sepal.Width > 3.5, "Code"]
plot(iris$Sepal.Length ~ iris$Sepal.Width, col=ifelse(iris$Code %in% sel, "red", "black"), pch=ifelse(iris$Code %in% sel, 17, 1))

Cheers
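The switch from `==` to `%in%` is what makes the ifelse() version work. A small sketch with made-up vectors (`codes` and `sel` are illustrative, not the iris columns) shows the difference: `==` compares position by position and recycles the shorter vector, while `%in%` tests membership.

```r
codes <- 1:10        # stand-in for an ID column
sel   <- c(3, 7, 8)  # IDs we want to highlight

# '==' recycles 'sel' as 3,7,8,3,7,8,... and compares element by element.
# R also warns that the lengths do not match, so the warning is silenced here.
eq <- suppressWarnings(codes == sel)
eq

# '%in%' asks, for each code, whether it appears anywhere in 'sel'
inn <- codes %in% sel
inn
which(inn)
```

In this toy case `==` finds no matches at all, whereas `%in%` flags exactly the three selected IDs.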
Re: [R] Scatter plot selection points
Hi

My code worked. Your code did not work because you were not aware that R distinguishes between upper and lower case :)

-----Original Message-----
From: Beatriz [mailto:aguitatie...@hotmail.com]
Sent: Friday, May 30, 2014 12:21 PM
To: PIKAL Petr; R Help
Subject: Re: [R] Scatter plot selection points

> Hi Petr,
>
> Initially your code didn't work because 'Code' wasn't in uppercase. It works now! :)
> The only thing is that I wanted the codes > 3.5 in red.

If you insist on a separate object for colouring purposes you can have it.

    sel <- iris$Sepal.Width > 3.5
    plot(iris$Sepal.Length ~ iris$Sepal.Width,
         col = c("red", "black")[sel + 1], pch = c(17, 1)[sel + 1])

I still consider my code simpler and easier to read and understand. sel is logical TRUE/FALSE, which can be used as 1/0 in calculations. So c("red", "black")[sel + 1] selects red if sel is FALSE and black if sel is TRUE; the same applies to the selection of pch. If you want to change the colouring, just swap red and black: c("black", "red")[sel + 1]

Regards
Petr

> Optional code:
>
>     sel <- iris[iris$Sepal.Width > 3.5, "Code"]
>     plot(iris$Sepal.Length ~ iris$Sepal.Width,
>          col = ifelse(iris$Code %in% sel, "red", "black"),
>          pch = ifelse(iris$Code %in% sel, 17, 1))
>
> Cheers

On 30/05/2014 17:38, PIKAL Petr wrote:

Hi

-----Original Message-----
From: Beatriz [mailto:aguitatie...@hotmail.com]
Sent: Friday, May 30, 2014 10:08 AM
To: PIKAL Petr; R Help
Subject: Re: [R] Scatter plot selection points

> Hi Petr,
>
> Thanks for your email; however, I cannot make the code work.

Errors? What code did you try?

    iris$code <- iris$Sepal.Width > 3.5
    plot(iris$Sepal.Length ~ iris$Sepal.Width,
         col = c("red", "black")[iris$code + 1], pch = c(17, 1)[iris$code + 1])

works for me without any problem.

> Also, I quite like the ifelse approach. I find it very clean.

Yes. What I meant is that your subset approach is complicated. You can use ifelse if you like.

    plot(iris$Sepal.Length ~ iris$Sepal.Width,
         col = ifelse(iris$code, "black", "red"),
         pch = ifelse(iris$code, 1, 17))

Regards
Petr

> Cheers

On 30/05/2014 15:57, PIKAL Petr wrote:

Hi

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Beatriz
Sent: Friday, May 30, 2014 9:37 AM
To: R Help
Subject: [R] Scatter plot selection points

> Hi all,
>
> I'd like to do a scatterplot where some of the values, out of a subset, are plotted differently in color and shape. I've worked around the following code but I don't manage to make it right. Any help greatly appreciated!
>
>     # My data
>     dd <- iris
>     iris$Code <- 1:150
>     # A selection of my data I'd like to plot differently
>     subset <- subset(iris, iris$Sepal.Width > 5)

    max(iris$Sepal.Width)
    [1] 4.4

No values fall in that subset, so I changed the threshold.

    iris$code <- iris$Sepal.Width > 3.5

>     sel <- as.character(subset$Code)  # I think the problems start already here :)
>     # Plotting doesn't work
>     plot(iris$Sepal.Length ~ iris$Sepal.Widith,
>          col=ifelse(iris$Code==sel, "red", "black")
>          pch=ifelse(iris$Code==sel, 17, 1))

Overcomplicated.

    plot(iris$Sepal.Length ~ iris$Sepal.Width,
         col = c("red", "black")[iris$code + 1], pch = c(17, 1)[iris$code + 1])

Regards
Petr
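For readers following the thread, here is a minimal sketch of why the original `iris$Code == sel` comparison misbehaves while `%in%` does what was intended. It assumes the `Code` column from the original post and the 3.5 threshold Petr switched to; `dat`, `sel_codes`, `by_equal`, and `by_in` are illustrative names, not from the thread.

```r
# Sketch: `==` compares element by element (recycling the shorter vector),
# whereas `%in%` tests membership in a set of codes.
dat <- iris
dat$Code <- seq_len(nrow(dat))

# Codes of the rows to highlight (Sepal.Width above 3.5)
sel_codes <- dat$Code[dat$Sepal.Width > 3.5]

# `==` lines sel_codes up against dat$Code position by position,
# so any matches are positional accidents
by_equal <- suppressWarnings(dat$Code == sel_codes)

# `%in%` asks, for each code, whether it occurs anywhere in sel_codes
by_in <- dat$Code %in% sel_codes

sum(by_equal)  # generally not the number of selected rows
sum(by_in)     # equals length(sel_codes)

plot(dat$Sepal.Length ~ dat$Sepal.Width,
     col = ifelse(by_in, "red", "black"),
     pch = ifelse(by_in, 17, 1))
```

Since `by_in` is the same logical vector as `dat$Sepal.Width > 3.5`, the separate code vector is optional, which is the point Petr makes above.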
Re: [R] Smoothed HR for interaction term in coxph model
Please include example data in the future. Perhaps the following is useful.

(1) Your model is redundant. The * produces both main effects and the interaction, so I removed the main effects from your call.
(2) For my simulated data, the df=0 option chose a model that resulted in a singular fit. I selected a smoother spline (df=2).
(3) The two plots at the end show (a) the risk (exp(linear predictor)) for combinations of CONTINUOUS and DICHOTOMOUS and (b) a ratio (risk for A vs risk for B), which I think is what you wanted.

Chris

    library(survival)
    set.seed(20140530)
    nn <- 1000
    datanew2 <- data.frame(my.surv = Surv(rexp(nn)),
                           DICHOTOMOUS = factor(rep(c("A", "B"), nn/2)),
                           CONTINUOUS = rnorm(nn))
    #surv.fit <- coxph(my.surv ~ pspline(CONTINUOUS, df=0) + factor(DICHOTOMOUS) + pspline(CONTINUOUS, df=0) * factor(DICHOTOMOUS), data=datanew2)
    #surv.fit <- coxph(my.surv ~ pspline(CONTINUOUS, df=0) * factor(DICHOTOMOUS), data=datanew2)
    surv.fit <- coxph(my.surv ~ pspline(CONTINUOUS, df=2) * factor(DICHOTOMOUS), data=datanew2)
    surv.fit
    xseq <- seq(-3, 3, length=100)
    predictions <- matrix(predict(surv.fit,
                                  newdata=expand.grid(CONTINUOUS=xseq, DICHOTOMOUS=factor(c("A", "B"))),
                                  type="risk"),
                          ncol=2)
    matplot(predictions, type="l")
    plot(xseq, predictions[,1]/predictions[,2], type="l",
         ylab="Hazard Ratio of Event (A vs B)", xlab="CONTINUOUS")

-----Original Message-----
From: Lynn Dunsire [mailto:l...@contrastconsultancy.com]
Sent: Thursday, May 29, 2014 6:03 AM
To: r-help@r-project.org
Subject: [R] Smoothed HR for interaction term in coxph model

Hello R-help members,

I have a dataset with 2 treatments and want to assess the effect of a continuous covariate on the hazard ratio between treatment A and B. I want a smoothed interaction term, which I have modelled with the following code:

    surv.fit <- coxph(my.surv ~ pspline(CONTINUOUS, df=0) + factor(DICHOTOMOUS) + pspline(CONTINUOUS, df=0)*factor(DICHOTOMOUS), data = datanew2)

and consequently I would like to obtain a smoothed plot of the hazard ratio between treatment A and B on the y-axis with the continuous covariate on the x-axis. As termplot ignores interaction terms, I was wondering if anyone has seen anything like this before and can advise on the best way to do it.

Many thanks in advance for any help that you can offer,

Kind regards,
Lynn
[R] uGARCHspec function
Hello,

I am trying to re-estimate parameters and standard errors in a mean regression equation by simultaneously running a GARCH(1,1) variance equation. This should be relatively straightforward, but I cannot for the life of me get it to work. This has taken up several weeks of my life already.

My mean equation is this:

    dlm2A1A <- lm(A1.0-rSP_RI.0~rSP_RI.0+inter.0+interdev.1+inter.1+interdev.2+inter.2+interdev.3+inter.3+interdev.4

I'm fairly sure I need to use the ugarchspec function and have adopted this:

    ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1),
                                     submodel = NULL, external.regressors = NULL,
                                     variance.targeting = FALSE),
               mean.model = list(armaOrder = c(1, 1), include.mean = TRUE,
                                 archm = TRUE, archpow = 1, arfima = FALSE,
                                 external.regressors = NULL, archex = FALSE),
               distribution.model = "norm", start.pars = list(), fixed.pars = list())

How to correctly input the mean equation in the 'external.regressors=' parameter is beyond me. Also, I don't know where my actual data fits in here, as this is just a framework without any identification of the data.

Can anyone help?

Drew
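A possible starting point, offered as a sketch rather than a definitive answer: in the rugarch package the regressors of the mean equation are passed as a numeric matrix through `external.regressors` inside `mean.model`, and the response series is supplied later to `ugarchfit()`. The data below are simulated and the names `y`, `X`, `spec`, and `fit` are hypothetical stand-ins for the poster's variables.

```r
# Sketch only: simulated data, hypothetical names (y, X, spec, fit).
if (requireNamespace("rugarch", quietly = TRUE)) {
  library(rugarch)
  set.seed(1)
  n <- 500
  X <- cbind(x1 = rnorm(n), x2 = rnorm(n))       # stand-ins for the regressors
  y <- 0.1 + 0.5 * X[, "x1"] - 0.2 * X[, "x2"] + rnorm(n)

  spec <- ugarchspec(
    variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
    mean.model = list(armaOrder = c(0, 0), include.mean = TRUE,
                      external.regressors = X),  # regressors enter as a matrix
    distribution.model = "norm")

  fit <- ugarchfit(spec = spec, data = y)        # the response is supplied here
  print(coef(fit))
}
```

Setting `armaOrder = c(0, 0)` and leaving out `archm` keeps the mean equation a plain regression; that is an assumption about what was intended, since the posted spec also requests ARMA(1,1) and ARCH-in-mean terms.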
[R] Difference in coefficients in Cox proportional hazard estimates between R and Stata, why?
Dear R users,

Hi, thank you so much for your help in advance.

I have been using Stata but am new to R. For my paper revision using Aalen's survival analysis, I need to use R, as the command including Aalen's survival seems to be available in R (32-bit, version 3.1.0 (2014-04-10)) but less ready to be used in Stata (version 13/SE).

To make sure that I can do the basics, I have fitted logistic regression and Cox proportional hazard regression using R and Stata. The data I used were from the UCLA textbook example page for R: http://www.ats.ucla.edu/stat/r/examples/asa/asa_ch1_r.htm. I used this in Stata too.

When I fitted logistic regression as below, the estimates were exactly the same between R and Stata.

Example using logistic regression

R:

    logistic1 <- glm(censor ~ age + drug, data=, family = binomial)
    summary(logistic1)
    exp(cbind(OR=coef(logistic1), confint(logistic1)))

                       OR      2.5 %    97.5 %
    (Intercept) 1.0373731 0.06358296 16.797896
    age         1.0436805 0.96801933  1.131233
    drug        0.7192149 0.26042635  1.937502

Stata:

    logistic censor age i.drug

                  OR  CI_lower  CI_upper
    age   | 1.043681  .9662388  1.127329
    drug  |  .719215  .2665194  1.940835
    _cons | 1.037373   .065847   16.3431

However, when I fitted Cox proportional hazard regression, there were some discrepancies in the coefficients (and exponentiated hazard ratios).

Example using Cox proportional hazard regression

R:

    cox1 <- coxph(Surv(time, censor) ~ drug + age, data=)
    summary(cox1)

    Call:
    coxph(formula = Surv(time, censor) ~ drug + age, data = )

      n= 100, number of events= 80

            coef exp(coef) se(coef)     z Pr(>|z|)
    drug 1.01670   2.76405  0.25622 3.968 7.24e-05 ***
    age  0.09714   1.10202  0.01864 5.211 1.87e-07 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         exp(coef) exp(-coef) lower .95 upper .95
    drug     2.764     0.3618     1.673     4.567
    age      1.102     0.9074     1.062     1.143

    Concordance= 0.711  (se = 0.042 )
    Rsquare= 0.324   (max possible= 0.997 )
    Likelihood ratio test= 39.13  on 2 df,   p=3.182e-09
    Wald test            = 36.13  on 2 df,   p=1.431e-08
    Score (logrank) test = 38.39  on 2 df,   p=4.602e-09

Stata:

    stset time, f(censor)
    stcox drug age

    _t   | Haz. Ratio  Std. Err.     z   P>|z|   [95% Conf. Interval]
    drug |   2.563531   .6550089  3.68   0.000    1.55363    4.229893
    age  |   1.095852     .02026  4.95   0.000   1.056854    1.136289

The HR estimate for drug was 2.76 from R, but 2.56 from Stata.

I searched the internet for an explanation, but could not find any.

In parametric survival regression with an exponential distribution, R's and Stata's coefficients had opposite signs while the absolute values were exactly the same (i.e. say 0.08 for Stata and -0.08 for R). I suspected something like this (http://www.theanalysisfactor.com/ordinal-logistic-regression-mystery/) was going on, but for Cox proportional hazard regression I could not find any resource to help me.

I would highly appreciate it if anyone could explain this to me, or suggest a resource that I can read.

Thank you so much for your help.

Best,
Ayako
Re: [R] Problem with sample session
On May 29, 2014 9:45 PM, Stephen Meskin actu...@umbc.edu wrote:

> Thanks Greg for your response. Is there a work around?

A work around for what?

> Of course this begs the question as to why attach is part of the sample session in App. A of the introductory manual.

Because people find it convenient.

> All the commands are directly from App. A. Is it possible the configuration of R on my computer is not in accord with acceptable practice? I.e. could my configuration be set so that attach works as App. A intends?

As far as I can see attach is working exactly as intended. What is the perceived problem?

> If not, then App. A needs to be changed to replace attach. If so, then App. A needs to provide instructions on the appropriate configuration of R for newbies.

I personally agree that the demonstration of the attach function should be removed from the manual, but you've stated the case much too strongly. No configuration is required, and the attach example is working as intended.

Best,
Ista

> Stephen Meskin
> Sent from my iPad

On May 29, 2014, at 1:06 PM, Greg Snow 538...@gmail.com wrote:

This is a warning and in your case would not be a problem, but it is good to think about, and it is the reason why it is suggested that you avoid using attach and be very careful when you do use it.

What is happening is that you first created a vector named 'x' in your global workspace. You then created a data frame that contains a column that is a copy of 'x' and is also named 'x', and the data frame also has another column named 'y'. You then later attached the data frame to the search list (if you run the 'search()' command you will see your search list). This is convenient in that you can now access 'y' by typing its name instead of something like 'dummy$y', but what happens if you just type x? The issue is that there are 2 objects on your search path with that same name. For your example it will not matter much because they have the same value, but what if you run a command like 'x <- 3'? Now you will see a single value instead of a vector of length 20, which can lead to hard-to-find errors. This is why R tries to be helpful by warning you that there are multiple objects named 'x' and therefore you may not be accessing the one that you think.

If you use attach without being careful it is possible to plot (or regress or ...) one variable from one dataset against another variable from a completely unrelated dataset and end up with meaningless results. So, if you use attach, be careful.

You may also want to look at the following functions for help with dealing with these issues: conflicts, find, get, with, within

On Wed, May 28, 2014 at 11:55 PM, Stephen Meskin actu...@umbc.edu wrote:

While following the suggestion in the manual An Introduction to R to begin with Appendix A, I ran into the problem shown below about 3/4 of the way down the 1st page of App. A. After using the function attach, I did not get visible columns in the data frame as indicated, but rather the puzzling message emphasized below.

I am running R version 3.1.0 (2014-04-10) using Windows XP. Thanks in advance for your help.

    x <- 1:20
    w <- 1 + sqrt(x)/2
    dummy <- data.frame(x=x, y=x+rnorm(x)*w)
    dummy
      x        y
    1 1 2.885347
    ...
    fm <- lm(y ~ x, data=dummy)
    summary(fm)
    Call: ...
    fm1 <- lm(y ~ x, data=dummy, weight=1/w^2)
    summary(fm1)
    Call: ...
    attach(dummy)
    The following object is masked _by_ .GlobalEnv:

        x

--
Stephen A Meskin, PhD, FSA, MAAA
Adjunct Assistant Professor of Mathematics, UMBC

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
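To make Greg's point concrete, the following sketch reproduces the masking situation and then changes the global `x`, as in his 'x <- 3' example. The variable names follow the Appendix A session; `len_global`, `len_y`, and `masked` are illustrative names added here.

```r
x <- 1:20
w <- 1 + sqrt(x) / 2
dummy <- data.frame(x = x, y = x + rnorm(x) * w)

x <- 3                      # the global x no longer matches dummy$x
attach(dummy)               # R reports that x is masked by .GlobalEnv

len_global <- length(x)     # 1: the global x is found first on the search path
len_y      <- length(y)     # 20: y exists only in the attached data frame

# conflicts() lists names that occur more than once on the search path
masked <- conflicts(detail = TRUE)
print(masked$dummy)

detach(dummy)               # always undo attach() when finished
```

The message is informational: nothing is broken, but any later use of `x` silently refers to the global object rather than the data-frame column.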
[R] using model coefficients on a new data set
Hello again R People:

I fit an ARIMA model on a particular data set with x_1, ..., x_n. Point x_(n+1) becomes available. I now want to produce a forecast without updating the model. Is there a way to do that within R, or do I need to write my own function, please?

Thanks!

Sincerely,
erin

--
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
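One way this is sometimes done in base R, sketched here under assumptions: refit `arima()` on the extended series with every coefficient pinned through the `fixed` argument, so that nothing is re-estimated and only the filtering is redone. The AR(1) order, the simulated series, and the names `x_old`, `x_all`, `fit`, `refit`, and `fc` are all hypothetical.

```r
set.seed(123)
series <- arima.sim(model = list(ar = 0.6), n = 201)
x_old <- series[1:200]        # x_1, ..., x_n used for estimation
x_all <- series[1:201]        # the same data plus the new point x_(n+1)

fit <- arima(x_old, order = c(1, 0, 0), method = "ML")

# Run the model over the extended series with all coefficients held fixed,
# so the optimiser has nothing left to estimate.
refit <- arima(x_all, order = c(1, 0, 0), method = "ML",
               fixed = coef(fit), transform.pars = FALSE)

fc <- predict(refit, n.ahead = 1)   # one-step forecast beyond x_(n+1)
fc$pred
```

The coefficients of `refit` equal those of `fit`, though the innovation variance is recomputed from the longer series. The forecast package's `Arima()` also accepts a previously fitted model through its `model` argument, which may be more convenient if that package is available.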
[R] rJava fail
    R version 3.1.0 (2014-04-10) -- "Spring Dance"
    Copyright (C) 2014 The R Foundation for Statistical Computing
    Platform: i386-w64-mingw32/i386 (32-bit)

    library(rJava)
    Error : .onLoad failed in loadNamespace() for 'rJava', details:
      call: dirname(this$RuntimeLib)
      error: a character vector argument expected
    Error: package or namespace load failed for 'rJava'

Things used to work on R 3.0.1 but suddenly stopped. I installed the new R and new packages. Then I started downgrading Java. I went from Java 7 to Java 6 update 16 and still no luck.

Please advise which Java I need and whether any paths need to be modified. Thank you.

Stephen B
[R] Computer requirements to run R on huge datasets
Dear R users,

I am writing to ask your advice with regard to the computer requirements (RAM, architecture, processor, hard drive) in order to run R smoothly on large datasets. I will be running commands with many bootstrap replications (2000) on the datasets of 10 firms.

Thank you in advance for your suggestions.

Best regards,
Magdalena
Re: [R] Computer requirements to run R on huge datasets
You have given information related to the number of rows that will be involved, but have offered nothing about the number of columns. That is okay though... you should attempt your algorithms on progressively larger datasets to gauge how your problem scales, use your operating system to observe how much memory is involved, and extrapolate. You can also rent time on cloud servers such as Amazon offers.

Any minimum number we tell you could turn out to be insufficient when you start exploring your large data sets... it is better for you to make your own estimate and safety margin so you don't blame us when it turns out to run slowly or choke completely.

Also, please stop posting in HTML format, as requested by the Posting Guide.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
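The "try progressively larger datasets" advice can be sketched as follows. This is only an illustration on simulated data: `boot_mean()` is a hypothetical stand-in for the real bootstrap analysis, and the sizes are arbitrary.

```r
# Sketch: time and size a bootstrap on growing datasets (all names hypothetical)
boot_mean <- function(d, B = 200) {
  replicate(B, mean(d$value[sample.int(nrow(d), replace = TRUE)]))
}

sizes <- c(1e3, 1e4, 1e5)
profile <- do.call(rbind, lapply(sizes, function(n) {
  d <- data.frame(id = seq_len(n), value = rnorm(n))
  elapsed <- system.time(boot_mean(d))[["elapsed"]]
  data.frame(rows = n,
             megabytes = as.numeric(object.size(d)) / 1024^2,
             seconds = elapsed)
}))
print(profile)
```

Plotting `seconds` and `megabytes` against `rows` gives a rough basis for extrapolating to the full data; `object.size()` only measures the data itself, so peak memory during the analysis will typically be higher.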
Re: [R] Difference in coefficients in Cox proportional hazard estimates between R and Stata, why?
In the Cox regression case, the probable explanation is that you have ties in your data; Stata and coxph may have different defaults for handling ties. Read the manuals!

The difference in sign in the other cases is simply due to different definitions of the models. I am sure it is well documented in relevant manuals.

Göran
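To illustrate Göran's point about ties: `coxph()` uses the Efron approximation by default, while Stata's `stcox` defaults to Breslow. The sketch below uses the `lung` data shipped with the survival package rather than the poster's dataset, so the numbers are not comparable with those in the thread; `fit_efron` and `fit_breslow` are illustrative names.

```r
library(survival)

# `lung` ships with survival and contains tied event times
fit_efron   <- coxph(Surv(time, status) ~ age + sex, data = lung, ties = "efron")
fit_breslow <- coxph(Surv(time, status) ~ age + sex, data = lung, ties = "breslow")

# Hazard ratios side by side; they differ slightly because of the ties
round(cbind(efron = exp(coef(fit_efron)), breslow = exp(coef(fit_breslow))), 4)
```

Refitting the original model with `ties = "breslow"` would be the natural check of whether this accounts for the 2.76 versus 2.56 discrepancy.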
[R] accessing C code from a base package within a function
Hello yet again.

I have written a small function which calls a couple of the C programs from the stats base package. It's actually a modification of the arima function. However, when I try to run it, it says that the C program is not found.

Any suggestions would be much appreciated.

Windows 7, R version 3.0.2

Thanks,
Erin

--
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
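A likely cause, sketched under assumptions: the C entry points used by `arima()` are registered as internal objects of the stats namespace (with names such as `C_ARIMA_transPars`), so a modified copy that lives in the global environment cannot see them. One workaround is to evaluate the copy inside the namespace. The names `outside` and `my_arima` are hypothetical, and these internals are not part of R's public API, so they may change between versions.

```r
# A copy of arima() evaluated outside the stats namespace cannot find the
# registered C symbols, which are internal to stats.
outside <- stats::arima
environment(outside) <- globalenv()
err <- tryCatch(outside(lh, order = c(1, 0, 0)), error = function(e) e)
print(conditionMessage(err))

# Pointing the function's environment back at the namespace restores access.
my_arima <- outside                     # imagine the body has been edited here
environment(my_arima) <- asNamespace("stats")
fit <- my_arima(lh, order = c(1, 0, 0))
print(fit)
```

An alternative with the same effect is to refer to each symbol explicitly, for example `stats:::C_ARIMA_transPars`, inside the modified function.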
Re: [R] rJava fail
On May 30, 2014, at 9:55 AM, Bond, Stephen stephen.b...@cibc.com wrote:

> Please, advise which Java I need and if any paths need to be modified.

Please make sure that your Java architecture matches your R architecture and then re-install the matching Java (i.e. both have to be 32-bit or both have to be 64-bit - you cannot mix/match). It seems that there is a problem with your Java registry entries. The version is irrelevant - any Java version 1.4 or higher should work.

Please direct further questions to the stats-rosuda-devel mailing list for rJava.

Thanks,
Simon
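A quick way to see which architecture the running R session uses, so that a matching Java can be installed, is sketched below. It uses only base R; whether rJava consults `JAVA_HOME` or the Windows registry first is an assumption not confirmed in this thread.

```r
r_arch    <- R.version$arch                    # e.g. "i386" or "x86_64"
r_bits    <- 8 * .Machine$sizeof.pointer       # 32 or 64
java_home <- Sys.getenv("JAVA_HOME", unset = NA)

cat("R architecture:", r_arch, "(", r_bits, "bit )\n")
cat("JAVA_HOME:", if (is.na(java_home)) "<not set>" else java_home, "\n")
```

If R reports 32 bit, the installed Java must also be a 32-bit build, and likewise for 64 bit.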
Re: [R] Problem with sample session
If you pay attention and are careful not to use any variable names that conflict then you do not need a work around (and the conflicts function can help you see if there are any conflicts that you may need to worry about).

Probably the best work around is to use the with or within function instead of attaching. For a couple of quick commands these work great and I prefer them to using attach. But sometimes, for a long sequence of commands, attach is much more convenient and is fine to use as long as you recognize the potential dangers and are careful.

On Thu, May 29, 2014 at 3:56 PM, Stephen Meskin actu...@umbc.edu wrote:

> Thanks Greg for your response. Is there a work around?

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
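The `with` and `within` alternatives Greg mentions might look like this for the Appendix A data. This is a minimal sketch; `slope`, `dummy2`, and `resid_y` are names invented for the illustration.

```r
set.seed(42)
x <- 1:20
w <- 1 + sqrt(x) / 2
dummy <- data.frame(x = x, y = x + rnorm(x) * w)

# with(): evaluate one expression using the data frame's columns
slope <- with(dummy, coef(lm(y ~ x))[["x"]])

# within(): add or modify columns and return a modified copy
dummy2 <- within(dummy, resid_y <- y - x)

# Modelling functions usually take a data argument, which also avoids attach()
fm <- lm(y ~ x, data = dummy)
```

In all three cases the column lookup is limited to a single call, so nothing is left on the search path to mask or be masked.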
[R] Gui for R-Script
Hi,

I'm just looking into creating a GUI for my R-Script. Is it possible to create a gui for the script and send it to somebody? Maybe as a .exe for example

Thanks

--
Shane
Re: [R] Gui for R-Script
There are several options for creating GUIs, depending on how much control you want and how much work you are willing to put in.

One simple option is the tkexamp function in the TeachingDemos package. This approach would require whoever receives your script to have R running, but then they could just run your script and have a GUI to change parameters and see the results.

Another option is shiny from the RStudio developers. This can be used in 2 ways: you can either send a script that the user would then run on their own machine (they would need R and the packages installed), or you can set up a server and send the user the URL. In the latter case R would only be installed on the server and the user would only need web access.

I don't know of any options that would produce an .exe, since any R code does need access to an implementation of R, and it seems a bit of overkill to package all of R with each sample script that you want to run.

On Fri, May 30, 2014 at 10:48 AM, Shane Carey careys...@gmail.com wrote:

> Is it possible to create a gui for the script and send it to somebody? Maybe as a .exe for example

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
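For the shiny route, a very small app could look roughly like the sketch below. It assumes the shiny package is installed (the code is skipped otherwise), and the input name `n`, the output name `hist`, and the histogram itself are placeholders for whatever the real script computes.

```r
if (requireNamespace("shiny", quietly = TRUE)) {
  library(shiny)

  # User interface: one slider and one plot
  ui <- fluidPage(
    sliderInput("n", "Sample size", min = 10, max = 1000, value = 100),
    plotOutput("hist")
  )

  # Server logic: redraw the plot whenever the slider changes
  server <- function(input, output) {
    output$hist <- renderPlot(hist(rnorm(input$n), main = "Simulated sample"))
  }

  app <- shinyApp(ui = ui, server = server)
  # runApp(app)   # uncomment to launch the app in a browser
}
```

The recipient would still need R and shiny installed to run this locally, which matches the first of the two shiny options described above.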
Re: [R] Gui for R-Script
Possible, yes, anything is possible, but if your goal is to easily hide R from the users then you will probably not find the project worth the effort required.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
[R] missing return
Hi R users,

I have a problem with missing data when I calculate returns in a time series. My data contain many stock codes, and I have arranged them as panel data. I have not found a solution. For example:

    CODE  DATE  RETURN
    A     2008  NA
    A     2009  0.25
    A     2010  0.4
    A     2011  0.3
    B     2008  NA
    B     2009  0.35
    B     2010  0.15
    B     2011  0.20

Please give me some ideas for solving this problem so that I can start step 2: running the time series to demonstrate the market's efficiency.

Thanks!
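One common way to handle this, sketched on the example table from the post: the first year of each stock presumably has no return because there is no earlier price, so those rows can be dropped or skipped when summarising. The names `returns`, `complete`, and `mean_by_code` are illustrative.

```r
returns <- data.frame(
  CODE   = rep(c("A", "B"), each = 4),
  DATE   = rep(2008:2011, times = 2),
  RETURN = c(NA, 0.25, 0.40, 0.30, NA, 0.35, 0.15, 0.20)
)

# Option 1: drop the rows with a missing return
complete <- returns[!is.na(returns$RETURN), ]

# Option 2: keep all rows but skip missing values when summarising
mean_by_code <- tapply(returns$RETURN, returns$CODE, mean, na.rm = TRUE)
print(mean_by_code)
```

Which option is appropriate depends on the later analysis; many modelling functions also accept `na.action = na.omit` to the same effect.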
[R] EpiX Analytics - Quantitative Risk Analysis with R Course
There are still a few places available to attend the following course:

Quantitative Risk Analysis with R
Fort Collins, Colorado, USA
June 16-19, 2014

This 4-day course will cover the core principles of quantitative risk analysis and the most important risk modeling principles and techniques. The course will be taught using the R statistical language but the lessons apply equally well to other modeling environments.

The focus of the course is on how to conduct accurate and effective quantitative risk analyses, including best practices of risk modeling, selecting the appropriate distribution, using data and expert opinion, and avoiding common mistakes. The course will also cover essential probability and statistics theory and various stochastic processes to provide the participants with a solid understanding of quantitative risk analysis.

For additional information and to register please visit our website at: http://www.epixanalytics.com/Quantitative-Risk-Analysis-with-R.html

To register by phone or for any questions please contact:
Barbara O'Neill - bone...@epixanalytics.com
Ph: 1-303-440-8524

EpiX Analytics
1643 Spruce Street, Boulder, CO, 80302, USA
www.EpiXAnalytics.com | bone...@epixanalytics.com
[R] RODBC and PostgreSQL problems
Dear All,

I am trying for the first time to run SQL queries against a remote PostgreSQL database via RODBC. I am able to establish a connection just fine, as shown by getting results back from the sqlTables(), sqlColumns() and sqlPrimaryKeys() functions in RODBC. However, when I try to run a SQL query using the sqlQuery() function I get

[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare \n '"

What am I doing wrong? Here are the relevant snips from the R console. What's puzzling is that tblCeramicWare is recognized as an argument to sqlColumns() and sqlPrimaryKeys(), but NOT in sqlQuery().

Thanks for any pointers.

best, Fraser

library(RODBC)
# connect to DAACS and assign a name (DRCch) to the connection
DRCch <- odbcConnect("postgreSQL35W", case = "nochange", uid = "XX", pwd = "XX")

# list the tables that are available
sqlTables(DRCch, tableType = "TABLE")
   TABLE_QUALIFIER TABLE_OWNER        TABLE_NAME TABLE_TYPE REMARKS
1 daacs-production      public      TempSTPTable      TABLE
2 daacs-production      public        activities      TABLE
3 daacs-production      public          articles      TABLE
4 daacs-production      public schema_migrations      TABLE
5 daacs-production      public     tblACDistance      TABLE
6 daacs-production      public    tblArtifactBox      TABLE
7 daacs-production      public  tblArtifactImage      TABLE
8 daacs-production      public     tblBasicColor      TABLE
9 daacs-production      public           tblBead      TABLE

sqlColumns(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME PRECISION LENGTH SCALE RADIX NULLABLE
1 daacs-production      public tblCeramicWare      WareID         4      int4        10      4     0    10        0
2 daacs-production      public tblCeramicWare        Ware        -9   varchar        50    100    NA    NA        1
  REMARKS                         COLUMN_DEF SQL_DATA_TYPE SQL_DATETIME_SUB CHAR_OCTET_LENGTH ORDINAL_POSITION
1         nextval('global_id_seq'::regclass)             4               NA                -1                1
2                                         NA            -9               NA               100                2
  IS_NULLABLE DISPLAY_SIZE FIELD_TYPE AUTO_INCREMENT PHYSICAL NUMBER TABLE OID BASE TYPEID TYPMOD
1          NA           11         23              1               1     27441           0     -1
2          NA           50       1043              0               2     27441           0     50

sqlPrimaryKeys(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME KEY_SEQ             PK_NAME
1 daacs-production      public tblCeramicWare      WareID       1 tblCeramicWare_pkey

sqlQuery(DRCch, paste("
 SELECT *
 FROM tblCeramicWare
 "))
[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare \n '"

Fraser D. Neiman
Department of Archaeology, Monticello
(434) 984 9812
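The lower-case name in the error message ("tblceramicware") fits PostgreSQL's rule that unquoted identifiers are folded to lower case, while the metadata functions receive the name as a plain string. A minimal sketch of a query that keeps the mixed-case name by double-quoting it inside the SQL text; it assumes DRCch is the open channel from the session above and has not been tested against that database.

```r
# Double quotes inside the SQL string preserve the mixed-case table name;
# single quotes delimit the R string so no escaping is needed.
query <- 'SELECT * FROM "tblCeramicWare"'

# Assumption: DRCch is the RODBC channel opened earlier with odbcConnect().
# The guard lets this snippet run harmlessly when no connection exists.
if (exists("DRCch")) {
  ware <- RODBC::sqlQuery(DRCch, query)
}
```

If the quoted form works, the same quoting would be needed for any mixed-case column names used in later queries.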
Re: [R] Problem with sample session
Greg, Ista, (or anyone else),

Let me take one last run at this problem. Consider the following extract from the Appendix A text:

x <- 1:20
w <- 1+sqrt(x)/2
dummy <- data.frame(x=x, y=x+rnorm(x)*w)
fm <- lm(y~x, data=dummy)
fm1 <- lm(y~x, data=dummy, weight=1/w^2)
attach(dummy)
    Make the columns in the data frame visible as variables.
The following object is masked _by_ .GlobalEnv:
    x

In the above I have included only one comment, "Make the columns ... visible as variables." from the text and only one response, "The following object is masked by ... : x".

Stuff I don't understand:

1. The purpose of attach seems to be to make x and y visible, but I can already see them by entering the command dummy, even after the warning. So what does attach do?

2. What I would like to see is a table with 1st column x; 2nd column y; 3rd and 4th columns predicted ys from fm and fm1; plus possibly columns of residuals and other stuff. Such tables don't seem to be available according to the discussion in ?lm.

3. The warning about attach seems to say that there is an x in the Global Environment that will mask the x that I am using. But that is not happening in what I see. If I enter x after the warning, I still get 1,2,3, as before. What is the problem?

4. If I place the above R-script in a folder other than the R-console that comes up when I first open R, will that obviate the attach problem?

Stephen A Meskin, PhD, FSA, MAAA
Adjunct Assistant Professor of Mathematics, UMBC

Most people give you an anticipatory grin when you mention a statistic, frown doubtingly when you mention the plural statistics, and grunt and groan in a gurgle when you mention a statistics course.

On 5/30/2014 12:20 PM, Greg Snow wrote: If you pay attention and are careful not to use any variables names that conflict then you do not need a work around (and the conflicts function can help you see if there are any conflicts that you may need to worry about).
Probably the best work around is to use the with or within function instead of attaching. For a couple of quick commands these work great and I prefer them to using attach. But, sometimes for a long sequence of commands attach is much more convenient and is fine to use as long as you recognize the potential dangers and are careful. On Thu, May 29, 2014 at 3:56 PM, Stephen Meskin actu...@umbc.edu wrote: Thanks Greg for your response. Is there a work around? Of course this begs the question as to Why is attach part of the sample session in App. A of the introductory manual? All the commands are directly from App. A. Is it possible the configuration of R on my computer is not in accord with acceptable practice? I.e. Could my configuration be set so that attach works as App. A intends? If not then App. A needs to be changed to replace attach. If so, then App. A needs to provide instructing on appropriate configuration of R for newbies. Stephen Meskin Sent from my iPad On May 29, 2014, at 1:06 PM, Greg Snow 538...@gmail.com wrote: This is a warning and in your case would not be a problem, but it is good to think about and the reason why it is suggested that you avoid using attach and be very careful when you do use attach. What is happening is that you first created a vector named 'x' in your global workspace, you then create a data frame that contains a column that is a copy of 'x' that is also named 'x' and the data frame also has another column named 'y'. You then later attach the data frame to the search list (if you run the 'search()' command you will see your search list). This is convenient in that you can now access 'y' by typing its name instead of something like 'dummy$y', but what happens if you just type x? The issue is that there are 2 objects on your search path with that same name. 
For your example it will not matter much because they have the same value, but what if you run a command like 'x <- 3'? Now you will see a single value instead of a vector of length 20, which can lead to hard-to-find errors. This is why R tries to be helpful by warning you that there are multiple objects named 'x' and therefore you may not be accessing the one that you think. If you use attach without being careful it is possible to plot (or regress or ...) one variable from one dataset against another variable from a completely unrelated dataset and end up with meaningless results. So, if you use attach, be careful. You may also want to look at the following functions for help with dealing with these issues: conflicts, find, get, with, within

On Wed, May 28, 2014 at 11:55 PM, Stephen Meskin actu...@umbc.edu wrote: While following the suggestion in the manual An Introduction to R to begin with Appendix A, I ran into the problem shown below about 3/4 of the way down the 1st page of App. A. After using the function attach, I did not get visible columns in the data frame as
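The with and within functions recommended in the quoted reply can be sketched as follows, using the data frame from the Appendix A extract. The names xy_cor, dummy2, ydiff and masked are made up for this illustration, and set.seed is added only to make the random draw repeatable.

```r
set.seed(1)  # assumption: fixed seed so the example is repeatable
x <- 1:20
w <- 1 + sqrt(x) / 2
dummy <- data.frame(x = x, y = x + rnorm(x) * w)

# with(): evaluate a single expression using the columns of dummy,
# without putting anything on the search path.
xy_cor <- with(dummy, cor(x, y))

# within(): return a modified copy of dummy with a new column added.
dummy2 <- within(dummy, ydiff <- y - x)

# conflicts(): list names that are defined in more than one place
# on the search path (useful if attach() has been used).
masked <- conflicts(detail = TRUE)
```

Because neither call changes the search path, there is nothing to detach afterwards and no masking message is produced.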
Re: [R] Problem with sample session
Hi Stephen,

See in line.

On Fri, May 30, 2014 at 4:18 PM, Stephen Meskin actu...@umbc.edu wrote:

> Greg, Ista, (or anyone else),
> Let me take one last run at this problem. Consider the following extract from the Appendix A text:
>
> x <- 1:20
> w <- 1+sqrt(x)/2
> dummy <- data.frame(x=x, y=x+rnorm(x)*w)
> fm <- lm(y~x, data=dummy)
> fm1 <- lm(y~x, data=dummy, weight=1/w^2)
> attach(dummy)
>     Make the columns in the data frame visible as variables.
> The following object is masked _by_ .GlobalEnv:
>     x
>
> In the above I have included only one comment, "Make the columns ... visible as variables." from the text and only one response, "The following object is masked by ... : x".
>
> Stuff I don't understand:
> The purpose of attach seems to be to make x and y visible but I can already see them by entering the command dummy even after the warning. So what does attach do?

It makes them visible in the sense that you can refer to them without referring to dummy: try

rm(list=ls()) ## delete everything from your workspace
dummy <- data.frame(x=1:20) # data.frame containing x
dummy$w <- 1+sqrt(dummy$x)/2 # add w column to dummy
dummy$y <- dummy$x + dummy$x + rnorm(dummy$x) * dummy$w # add y column
# x is not available
x
#Error: object 'x' not found
#...except as an element of dummy
dummy$x
attach(dummy) # Make the columns in the data frame visible as variables.
x # x is now available as x, as well as dummy$x

> What I would like to see is a table with 1st column x; 2nd column y; 3rd and 4th columns predicted ys from fm and fm1; plus possibly columns of residuals and other stuff. Such tables don't seem to be available according to the discussion in ?lm.

In R it is common to calculate things only as you need them. The predicted and residual values are not calculated by lm, but after the fact by predict.lm() and residuals.lm(). For example:

fm <- lm(y~x, data=dummy)
fm1 <- lm(y~x, data=dummy, weight=1/w^2)
dummy$yhat.fm <- predict(fm)
dummy$yhat.fm1 <- predict(fm1)
dummy$yresid <- residuals(fm1)
dummy

> The warning about attach seems to say that there is an x in the Global Environment that will mask the x that I am using. But that is not happening in what I see.

Yes it is.

> If I enter x after the warning, I still get 1,2,3, as before. What is the problem?

That 1,2,3 ... you are seeing comes from the x that was defined earlier, by x <- 1:20, _not_ from the x in dummy nor the x attached from dummy. Try this:

rm(list=ls())
x <- 1:5
w <- 1+sqrt(x)/2
dummy <- data.frame(x=x, y=x+rnorm(x)*w)
attach(dummy)

You now have three x variables: the one created with x <- 1:5 (in the global environment), the one in dummy, and the attached one copied from dummy. This makes it easy to become confused about which x you are getting. Consider:

x <- 1 ## changes x in the global workspace, but not dummy$x nor the attached copy of dummy$x
dummy$x <- 2 # changes dummy$x but not the attached copy of dummy$x
x # this is the x in the global environment
# [1] 1
rm(x)
x # this is the attached copy of x
#[1] 1 2 3 4 5
dummy$x # this is the x in dummy
# [1] 2 2 2 2 2

> If I place the above R-script in a folder other than the R-console that comes up when I first open R will that obviate the attach problem.

I'm not following this one...

Best,
Ista
[R] Calculating the focal mean of a raster using an annulus
Hello esteemed R experts,

I am attempting to use the 'focal' function in the raster package to calculate the mean of an annulus (as opposed to a focal mean of a circle or square). From what I can tell, this requires me to generate a weights matrix and use this matrix in the focal function. The problem, however, is that there are edge effects because the weights do not get readjusted along the boundary/edge of my raster. Therefore, the focal means of the annulus near the boundary of my raster are lower than would be expected. Code that reproduces the problem is below. Please help if you can.

Thank you,
Sean Parks

library(raster)
the.raster <- raster(matrix(round(rep(seq(1,5, 0.0001), 250)), nrow=250, ncol=250))

###
# Create annulus weights matrix
# There is actually an error in the weights matrix that I will figure out later
# Bonus points if you can identify and fix it
# This error does not affect the overall issue that I am attempting to address
# Please don't [publicly] make fun of me for this clunky code
###
w <- focalWeight(the.raster, d=0.025, type='circle')
w.rev <- w
for (i in 1:ncol(w)) {
  col <- w[,i]
  num.recs <- length(which(w != 0))
  first.rec <- match(1/num.recs, col)
  last.rec <- length(col) - (first.rec - 1)
  non.recs <- seq(1:length(col))
  non.recs <- non.recs[-c(first.rec, last.rec)]
  col[non.recs] <- 0
  w.rev[,i] <- col
}
count <- length(which(w.rev != 0))
w.rev[w.rev != 0] <- 1/count
###
# End of creating weights matrix
###

# Now make and view the annulus raster
# Note the edge effects in the top and bottom of the plot
annulus.raster <- round(focal(the.raster, w=w.rev, na.rm=T, pad=T))
plot(annulus.raster)
# END
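One general way to avoid this kind of edge effect is to divide the sum of the annulus cells that actually fall inside the grid by the number of such cells, rather than by a fixed count. The sketch below illustrates that idea in base R on a plain matrix; it is not raster::focal, and the function name annulus_mean and its radius arguments (measured in cells) are assumptions made for this example.

```r
# Edge-aware annulus mean on a plain matrix (illustrative sketch only).
# r_in and r_out are the inner and outer radii in cell units.
annulus_mean <- function(m, r_in, r_out) {
  nr <- nrow(m)
  nc <- ncol(m)
  k <- ceiling(r_out)

  # All row/column offsets whose distance from the centre lies in the annulus.
  offs <- expand.grid(dr = -k:k, dc = -k:k)
  d <- sqrt(offs$dr^2 + offs$dc^2)
  offs <- offs[d >= r_in & d <= r_out, ]

  total <- matrix(0, nr, nc)  # running sum of neighbour values
  count <- matrix(0, nr, nc)  # number of neighbours that contributed
  rows <- row(m)
  cols <- col(m)

  for (i in seq_len(nrow(offs))) {
    rr <- rows + offs$dr[i]
    cc <- cols + offs$dc[i]
    # Keep only neighbours that fall inside the matrix.
    inside <- which(rr >= 1 & rr <= nr & cc >= 1 & cc <= nc)
    vals <- m[cbind(rr[inside], cc[inside])]
    ok <- !is.na(vals)
    total[inside[ok]] <- total[inside[ok]] + vals[ok]
    count[inside[ok]] <- count[inside[ok]] + 1
  }

  # Dividing by the per-cell count renormalises the mean near the edges.
  total / count
}

# On a constant surface the annulus mean should be constant, edges included.
flat <- matrix(3, nrow = 20, ncol = 20)
edge_ok <- annulus_mean(flat, r_in = 2, r_out = 4)
```

Building the annulus from distances also sidesteps the column-by-column construction of the weights matrix used in the post.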
Re: [R] Problem with sample session
1. If you create a variable x or w or dummy, it is stored in the current environment (at the console, the global environment). You can refer to it by the name x or w or dummy. If you create a column x in the data frame dummy, you can refer to it as dummy$x or dummy[["x"]]. That is a different x than the variable x in your current environment. If you execute attach(dummy), a copy of the data frame is placed on the search path just after the global environment, and you can then refer to a column of dummy by its bare name as long as no variable of the same name exists in the global environment. When you refer to a name, R searches the chain of environments in order: the global environment first, then the attached dummy, then the attached packages. So w, which is not a column of dummy, is found in the global environment, and so is the global x, which masks the attached column x.

2. The R lm function is not a swiss army knife... there are other functions to obtain those additional columns... in this case the predict and residuals functions would get that data. For example,

dummy[ , "y.fm" ] <- predict( fm )
dummy[ , "resid.fm" ] <- residuals( fm )

Read ?lm and pay attention to the "See Also" and "Examples" sections.

3. Until you use the detach function, a bare x refers to the attached column only if there is no x in your global environment; here there is one, which is why you still see 1, 2, 3, .... After detaching, you will have to use dummy$x to see the column.

4. The chain of variable environments exists only within the RAM used by R, and has nothing to do with the disk directory structure. That was a concept from S (so I was told), not from R.

5. You are overdue to read the Posting Guide mentioned in the footer. In there, among other things, is advice to post in plain text. HTML tends to be corrupted when the mailing list strips the HTML, so we may not see what you think we see.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
[R] converting a data.frame into a different table
Hi,

I have a matrix of 4.5Kx4.5K elements with column and row names. I need to convert this matrix into a table, where one column is the name of the row for the element, the second column is the name of the column for the same element, and the third column is the element itself.

The way I do it at the moment is with a double for-loop. This way, though, it takes ages for the loop to finish. I was wondering whether there is a faster way of doing the same conversion.

This is how I am doing it now:

my.df <- data.frame()
for (i in 1:(nrow(out5.df)-1)){
  for (j in i:ncol(out5.df)) {
    #print(paste(" I am at position: row-", i, " and col-", j, sep=""))
    a <- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j], Value=out5.df[i,j])
    my.df <- rbind(my.df, a)
  }
}

This is an example of the data I have:

  1           2           3           4           5           6           7
1 FBgn0037249 FBpp0312226 FBtr0346646 FBgn0266186 FBpp0312219 FBtr0346639 FBgn0010100
2 FBgn0036389 FBpp0312225 FBtr0346645 FBgn0037894 FBpp0312218 FBtr0346638 FBgn0026577
3 FBgn0014002 FBpp0312224 FBtr0346644 FBgn0025712 FBpp0312183 FBtr0346593 FBpp0312178
4 FBgn0034201 FBpp0312223 FBtr0346643 FBgn0025712 FBpp0312182 FBtr0346592 FBpp0312177
5 FBgn0029860 FBpp031     FBtr0346642 FBgn0261597 FBpp0312181 FBtr0346591 FBtr0346587
6 FBgn0028526 FBpp0312221 FBtr0346641 FBgn0263050 FBpp0312180 FBtr0346589 FBtr0346586
7 FBgn0003486 FBpp0312220 FBtr0346640 FBgn0263051 FBpp0312179 FBtr0346588 FBpp0312219

What I would like to get at the end is something like this:

my.df
   start start.1 Value
1  1     X1      FBgn0037249
2  1     X2      FBpp0312226
3  1     X3      FBtr0346646
4  1     X4      FBgn0266186
5  1     X5      FBpp0312219
6  1     X6      FBtr0346639
7  1     X7      FBgn0010100
8  2     X2      FBpp0312225
9  2     X3      FBtr0346645
10 2     X4      FBgn0037894
11 2     X5      FBpp0312218
12 2     X6      FBtr0346638
13 2     X7      FBgn0026577
14 3     X3      FBtr0346644
15 3     X4      FBgn0025712
16 3     X5      FBpp0312183
17 3     X6      FBtr0346593
18 3     X7      FBpp0312178
19 4     X4      FBgn0025712
20 4     X5      FBpp0312182
21 4     X6      FBtr0346592
22 4     X7      FBpp0312177
23 5     X5      FBpp0312181
24 5     X6      FBtr0346591
25 5     X7      FBtr0346587
26 6     X6      FBtr0346589
27 6     X7      FBtr0346586

So I would like to know if there is a better way of doing it than a double for loop.

thanks
Assa
Re: [R] converting a data.frame into a different table
library(reshape2) # you probably need to install reshape2 before this works
?melt

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
Re: [R] converting a data.frame into a different table
On May 30, 2014, at 3:07 PM, Assa Yeroslaviz wrote:

> Hi, I have a matrix of 4.5Kx4.5K elements with column- and row names. I need to convert this matrix into a table, where one column is the name of the row for the element, the second column is the name of the column for the same element and the third column is the element itself.

In R a table object is just a matrix with a class of "table", and there is a really kewl function to do exactly what you ask for on objects with class "table", so try this:

class(out5.df) <- "table"
my.df <- as.data.frame(out5.df)

> The way I do it at the moment is with a double for-loop. With this way though it takes ages for the loop to finish. I was wondering whether there is a faster way of doing the same conversion.
>
> this is an example for the data I have:

I would have tested this if it had been offered using the output of dput().

?dput

out5.df <- matrix(1:30,5,6)
colnames(out5.df) <- letters[1:6]
rownames(out5.df) <- LETTERS[1:5]
class(out5.df) <- "table"
my.df <- as.data.frame(out5.df)
my.df
  Var1 Var2 Freq
1    A    a    1
2    B    a    2
3    C    a    3
4    D    a    4
5    E    a    5
6    A    b    6
...snipped the rest

-- 
David.

David Winsemius
Alameda, CA, USA
Re: [R] converting a data.frame into a different table
Hi,

You may try:

## Assuming the dataset is a matrix
mat <- structure(c("FBgn0037249", "FBgn0036389", "FBgn0014002", "FBgn0034201",
  "FBgn0029860", "FBgn0028526", "FBgn0003486", "FBpp0312226", "FBpp0312225",
  "FBpp0312224", "FBpp0312223", "FBpp031", "FBpp0312221", "FBpp0312220",
  "FBtr0346646", "FBtr0346645", "FBtr0346644", "FBtr0346643", "FBtr0346642",
  "FBtr0346641", "FBtr0346640", "FBgn0266186", "FBgn0037894", "FBgn0025712",
  "FBgn0025712", "FBgn0261597", "FBgn0263050", "FBgn0263051", "FBpp0312219",
  "FBpp0312218", "FBpp0312183", "FBpp0312182", "FBpp0312181", "FBpp0312180",
  "FBpp0312179", "FBtr0346639", "FBtr0346638", "FBtr0346593", "FBtr0346592",
  "FBtr0346591", "FBtr0346589", "FBtr0346588", "FBgn0010100", "FBgn0026577",
  "FBpp0312178", "FBpp0312177", "FBtr0346587", "FBtr0346586", "FBpp0312219"),
  .Dim = c(7L, 7L),
  .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7"),
                   c("1", "2", "3", "4", "5", "6", "7")))

res <- data.frame(start=rownames(mat)[col(mat)], start.1=colnames(mat)[row(mat)], Value=c(t(mat)))

## Comparing the speed with other methods:
### For easy comparison across methods, converted the columns to factors
fun1 <- function(mat) {
  start <- rownames(mat)[col(mat)]
  start.1 <- paste0("X", colnames(mat)[row(mat)])
  Value <- c(t(mat))
  data.frame(start = factor(start, levels = unique(start)),
             start.1 = factor(start.1, levels = unique(start.1)),
             Value)
}

fun2 <- function(mat) {
  colnames(mat) <- paste0("X", colnames(mat))
  my.df <- setNames(as.data.frame.table(mat), c("start", "start.1", "Value"))
  my.df <- my.df[with(my.df, order(start, start.1)), ]
  row.names(my.df) <- 1:nrow(my.df)
  my.df
}

library(reshape2)
fun3 <- function(mat) {
  colnames(mat) <- paste0("X", colnames(mat))
  my.df <- transform(setNames(melt(mat), c("start", "start.1", "Value")), start = as.factor(start))
  my.df <- my.df[with(my.df, order(start, start.1)), ]
  row.names(my.df) <- 1:nrow(my.df)
  my.df
}

set.seed(481)
mat1 <- matrix(sample(mat, 4.5e3*4.5e3, replace=TRUE), ncol=4.5e3, dimnames=list(1:4.5e3, 1:4.5e3))

system.time(res1 <- fun1(mat1))
#   user  system elapsed
#  7.914   0.836   8.750
system.time(res2 <- fun2(mat1))
#   user  system elapsed
# 28.257   1.336  29.578
system.time(res3 <- fun3(mat1))
#   user  system elapsed
# 27.213   1.027  28.224

identical(res1,res2)
#[1] TRUE
identical(res1,res3)
#[1] TRUE

A.K.
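The loop in the original question only visits cells on or above the diagonal (j starts at i) and stops one row before the last, which gives the 27 rows of the desired output for a 7x7 matrix. A minimal base R sketch that reproduces that selection without a loop is given below; the function name upper_to_long and the toy matrix are made up for illustration, and the approach has not been timed on a 4.5K x 4.5K matrix.

```r
# Convert the cells on or above the diagonal to long format,
# skipping the last row as the loop in the question does.
upper_to_long <- function(mat) {
  keep <- upper.tri(mat, diag = TRUE)
  keep[nrow(mat), ] <- FALSE            # the loop runs i only to nrow - 1

  # Row/column positions of the kept cells, ordered row by row.
  idx <- which(keep, arr.ind = TRUE)
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

  data.frame(start   = rownames(mat)[idx[, "row"]],
             start.1 = paste0("X", colnames(mat)[idx[, "col"]]),
             Value   = mat[idx],
             stringsAsFactors = FALSE)
}

# Toy 4x4 character matrix (hypothetical data, filled column by column).
toy <- matrix(paste0("v", 1:16), nrow = 4, ncol = 4,
              dimnames = list(as.character(1:4), as.character(1:4)))
long <- upper_to_long(toy)
```

Indexing the matrix with the two-column position matrix extracts all selected values in one vectorised step, which is what replaces the repeated rbind calls.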