[R] hgu133plus2hsentrezgprobe library
Hello R community, I am processing raw Affymetrix CEL files using the Michigan custom CDF library hgu133plus2hsentrezgprobe. I have been looking for documentation on the functions it contains; I am specifically interested in converting probe names to gene symbols. Does anybody know where I can find it? Thanks a lot! Eleni R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Transfer R workspace on another PC
Hi all, and thanks for your answers. This is my first attempt at this kind of transfer, and both machines are 32-bit. My data is 92.3 MB, and I did not try restarting. However, Steve, you are right: I have not installed the same packages on both computers. I have also not used the 'session' package. I will try both and let you know. Once again, thanks a lot for your help! Eleni On Wed, Mar 10, 2010 at 5:34 AM, Khanh Nguyen kngu...@cs.umb.edu wrote: I don't have an answer, but I suggest the 'session' package. I use it to move my workspace around and have never had any problem. -k
Re: [R] Transfer R workspace on another PC
It worked after installing the proper packages! Thanks a lot! Eleni On Wed, Mar 10, 2010 at 10:48 AM, Petr PIKAL petr.pi...@precheza.cz wrote: Hi, I suppose your error message continued by naming some package that you have installed on your office computer but not at home. Try installing the necessary packages and then opening the workspace again. Regards, Petr
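Petr's diagnosis suggests a simple routine: record which packages exist on the source machine and check for missing ones before loading the workspace. The sketch below uses base R only; the helper missing_packages() and the file names are hypothetical, and tempdir() stands in for the USB drive. It is one possible approach, not a feature of R or of the 'session' package.

```r
# Hypothetical helper: packages recorded on the source machine but absent here
missing_packages <- function(needed, installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

# --- On the office PC ---
usb      <- tempdir()                         # stand-in for the USB drive path
ws_file  <- file.path(usb, "office.RData")
pkg_file <- file.path(usb, "office_pkgs.rds")
x <- 1:10                                     # example object in the workspace
save(list = ls(), file = ws_file)             # save every object in the current environment
saveRDS(rownames(installed.packages()), pkg_file)

# --- On the laptop ---
todo <- missing_packages(readRDS(pkg_file))
if (length(todo) > 0) {
  # Install these first (e.g. install.packages(todo)), then load the workspace
  message("Missing packages: ", paste(todo, collapse = ", "))
}
restored <- new.env()
load(ws_file, envir = restored)               # load into a separate environment
```

Loading into a fresh environment rather than the global one keeps a failed or partial restore from mixing with objects already in the session.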
[R] Transfer R workspace on another PC
Hi list! I recently tried to take my office work home, meaning that I tried to transfer my ... .RData workspace from my PC to my laptop. The office PC runs Windows XP and my laptop runs Windows Vista. I saved the workspace on the office PC and copied it to a USB drive. When I tried to open it on my laptop, I got this error: Fatal Error: Unable to restore saved data in .RData. Both computers have R 2.9.0. Could anybody explain why this happens and how I can solve it? Thanks a lot! Eleni
[R] sum of list elements
Dear list, I am having some difficulty manipulating list elements. More specifically, I am performing SVM regression and have a list of lists called pred.svm. The elements of the inner lists are 3D arrays, so I have pred.svm[[i]][[j]], with 1 <= i <= 5 and 1 <= j <= 20. For a fixed i, I want to sum one specific array element across all j. Mathematically speaking, I want to calculate W as:

    W = pred.svm[[i]][[1]][1,2,5] + pred.svm[[i]][[2]][1,2,5] + pred.svm[[i]][[3]][1,2,5] + ... + pred.svm[[i]][[20]][1,2,5]

I have tried to apply the lapply() function, but it seems that its arguments can only be vector elements of a list... Do I need to convert the array data to vector data? Any advice would be very welcome! Thanks a lot, Eleni
Re: [R] sum of list elements
Thank you, Dimitris! I have 3D arrays of the same dimensions, so Reduce worked... Best, Eleni On Thu, Mar 4, 2010 at 5:13 PM, Dimitris Rizopoulos d.rizopou...@erasmusmc.nl wrote: Do these lists contain 3D arrays of the same dimensions? If yes, then you could use

    Reduce(`+`, pred.svm[[i]])[1, 2, 5]

otherwise a for-loop will also be clear and efficient, e.g.,

    W <- pred.svm[[i]][[1]][1, 2, 5]
    for (j in 2:20) {
      W <- W + pred.svm[[i]][[j]][1, 2, 5]
    }

I hope it helps. Best, Dimitris -- Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014
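To compare Dimitris's suggestions side by side, the sketch below builds a hypothetical pred.svm with the same nesting (5 lists of 20 arrays; the 2 x 3 x 6 array shape is an assumption) and computes W three ways. It also shows that lapply()/sapply() accept list elements of any type, arrays included, so no conversion to vectors is needed.

```r
set.seed(1)
# Hypothetical stand-in for pred.svm: 5 lists, each holding 20 arrays of dim 2 x 3 x 6
pred.svm <- lapply(1:5, function(i)
  lapply(1:20, function(j) array(rnorm(36), dim = c(2, 3, 6))))

i <- 1
# 1. Reduce(): element-wise sum of the 20 arrays, then pick cell [1, 2, 5]
W_reduce <- Reduce(`+`, pred.svm[[i]])[1, 2, 5]

# 2. sapply(): pull cell [1, 2, 5] out of each array, then sum the 20 numbers;
#    this also works if the arrays differ in size, as long as each has that cell
W_sapply <- sum(sapply(pred.svm[[i]], function(a) a[1, 2, 5]))

# 3. Explicit loop over j, as in Dimitris's reply
W_loop <- pred.svm[[i]][[1]][1, 2, 5]
for (j in 2:20) W_loop <- W_loop + pred.svm[[i]][[j]][1, 2, 5]
```

The sapply() route only ever touches one cell per array, whereas Reduce() first adds the full arrays; for large arrays the former does less work.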
Re: [R] Ridge regression
Hello again and Happy 2010! I was looking back at this email because I need to do some additional processing now. If I take coef(ans), I get n+1 coefficients. I guess that coef(ans)[1] is the constant term... Do I need to add it when I calculate the estimated value of the outcome? For example, let's say that I have divided my data into training data and test data, and I have the corresponding observed try_values and tey_values (the real values for the samples in the training set and the test set, respectively). Here is my code:

    library(MASS)
    ridge.test = lm.ridge(tey_values ~ tedata, lambda)
    est <- list()
    yest <- numeric()
    for (i in 1:length(tey_values)) {
      est[[i]] = coef(ridge.test)[-1] * tedata[i, ]
      yest[i] = sum(est[[i]]) + coef(ridge.test)[1]
    }

On Wed, Dec 2, 2009 at 8:22 PM, Ravi Varadhan rvarad...@jhmi.edu wrote: The help page clearly states that ans$coef is not on the original scale and is for use by the coef method. You can also see that ans$scales gives the scales used in the computation of ans$coef. So, to get coefficients on the original scale, you can either use coef(ans) or divide ans$coef by ans$scales.

    library(MASS)
    X1 <- runif(20)
    X2 <- runif(20)
    Y <- 2 * X1 - 2 * X2 + rnorm(20, sd = 0.1)
    lam <- 10
    ans1 <- lm.ridge(Y ~ X1 + X2, lambda = lam)
    all.equal(ans1$coef / ans1$scales, coef(ans1)[2:3])

Hope this helps, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health, Division of Geriatric Medicine and Gerontology, Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ravi Varadhan Sent: Wednesday, December 02, 2009 12:25 PM To: 'David Winsemius'; 'Eleni Christodoulou' Cc: r-help@r-project.org Subject: Re: [R] Ridge regression

You are right that ans$coef and coef(ans) are different in ridge regression, where `ans' is the object from lm.ridge. It is coef(ans) that yields the coefficients on the original scale; ans$coef holds the coefficients of the version with X scaled and Y centred. Here is an example that illustrates the workings of ridge regression. First let us create some data:

    require(MASS)
    X1 <- runif(20)
    X2 <- runif(20)
    Y <- 2 * X1 - 2 * X2 + rnorm(20, sd = 0.1)
    lam <- 10
    ans1 <- lm.ridge(Y ~ X1 + X2, lambda = lam)
    ans1$coef
    coef(ans1)  # Note that these two are different
    # Now let us scale the variables X1 and X2 and center Y
    cY <- scale(Y, scale = FALSE)
    n <- length(Y)
    sX1 <- scale(X1) * sqrt(n/(n-1))
    sX2 <- scale(X2) * sqrt(n/(n-1))
    ans2 <- lm.ridge(cY ~ sX1 + sX2, lambda = lam)
    ans2$coef
    coef(ans2)
    # Now see that the coefficients of sX1 and sX2 are the same.
    # This is the connection!
    # Armed with this insight, we now compare ans1$coef with the scaled coefficients
    ans1$coef
    c(coef(ans1)[2] * sd(X1), coef(ans1)[3] * sd(X2)) * sqrt((n-1)/n)
    # Now they are the same!

I hope this is clear. Best, Ravi.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius Sent: Wednesday, December 02, 2009 11:04 AM To: Eleni Christodoulou Cc: r-help@r-project.org Subject: Re: [R] Ridge regression

On Dec 2, 2009, at 10:42 AM, Eleni Christodoulou wrote: Dear list, I have a couple of questions concerning ridge regression. I am using the lm.ridge(...) function to fit a model to my microarray data, i.e. model = lm.ridge(...). I retrieve some coefficients and some scales for each gene. First of all, I would like to ask: the real coefficients of the model are not included in the first element of the output but in the result of coef(model), am I right?

Not exactly. coef(model) extracts the coefficients from the model, but in the example I created following the help page, the coefficients do happen to be in the first element of the model, e.g.: long.rr
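Ravi's point can be checked numerically, including the intercept. The sketch below (simulated data; MASS ships with R as a recommended package) rebuilds coef(ans) from the components ans$coef, ans$scales, ans$xm and ans$ym. The intercept formula ym - sum(xm * slope) is an assumption about how lm.ridge centres the data, which the check at the end confirms rather than a quotation from the documentation.

```r
library(MASS)  # provides lm.ridge()

set.seed(1)
X1 <- runif(20)
X2 <- runif(20)
Y  <- 2 * X1 - 2 * X2 + rnorm(20, sd = 0.1)
ans <- lm.ridge(Y ~ X1 + X2, lambda = 10)

# Slopes on the original X scale: undo the column scaling
slopes <- ans$coef / ans$scales

# Intercept implied by centring: mean of Y minus the slopes times the X means
intercept <- ans$ym - sum(ans$xm * slopes)

# Should match coef(ans) = c(intercept, slope for X1, slope for X2)
manual <- c(intercept, slopes)
```

This is also why coef(ans) returns one more value than ans$coef: the intercept is not penalised and is recovered afterwards from the means.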
Re: [R] Ridge regression
I am sorry, I pressed the send button by accident before completing my e-mail. yest holds the estimated values according to the ridge model. Is the way I calculate them correct? Or should I drop the +coef(ridge.test)[1] term? Thanks a lot! Eleni
Re: [R] Ridge regression
Thanks a lot! Eleni On Fri, Jan 8, 2010 at 6:35 PM, Ravi Varadhan rvarad...@jhmi.edu wrote: Yes, you need to include the intercept term when you predict the model-based response. This is what you need:

    ridge.test = lm.ridge(tey_values ~ tedata, lambda)
    yest <- drop(cbind(1, tedata) %*% coef(ridge.test))

Hope this helps, Ravi.
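Ravi's one-liner and the loop from the earlier message compute the same predictions. The sketch below uses simulated data; the names trdata, try_values, tedata and tey_values follow the thread, while the data, the 15/15 split and lambda = 1 are assumptions. Unlike the code in the thread, it fits on the training rows and predicts the test rows, so the test predictions are out-of-sample.

```r
library(MASS)  # provides lm.ridge()

set.seed(42)
# Hypothetical data: 30 samples x 3 predictors, split into training and test halves
X <- matrix(rnorm(90), ncol = 3)
y <- drop(1 + X %*% c(2, -1, 0.5) + rnorm(30, sd = 0.1))
train <- 1:15
trdata <- X[train, ];  try_values <- y[train]
tedata <- X[-train, ]; tey_values <- y[-train]

fit <- lm.ridge(try_values ~ trdata, lambda = 1)
b <- coef(fit)                      # b[1] is the intercept, b[-1] the slopes

# Vectorised prediction, as in Ravi's reply: prepend a column of 1s
yest <- drop(cbind(1, tedata) %*% b)

# Row-by-row loop, mirroring the original code (intercept added)
yest_loop <- numeric(nrow(tedata))
for (i in seq_len(nrow(tedata))) {
  yest_loop[i] <- sum(b[-1] * tedata[i, ]) + b[1]
}
```

The column of 1s multiplies the intercept, so dropping +coef(...)[1] from the loop would be equivalent to forcing the intercept to zero.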
[R] lasso regression coefficients
Dear list, I have been trying to apply a simple lasso regression to a 10-element vector, just to see how the method works before applying it to larger datasets. I first create an input vector x:

    x = rnorm(10)

I add some noise:

    noise = runif(n = 10, min = -0.1, max = 0.1)

and I create a simple linear model that calculates my output vector y:

    y = 2*x + 1 + noise

I then do:

    library(lars)
    my_data <- data.matrix(x)
    model = lars(my_data, y, type = 'lasso')

I then calculate the coefficients (type = "coefficients") based on the created model:

    preds = predict.lars(model, type = "coefficients")
    est <- numeric(10)
    for (i in 1:10) {
      est[i] = preds$coef[2]*x[i]
    }
    y.estimated = est + 1 + noise

Then I apply the same function, predict.lars, but this time with type = "fit":

    preds2 = predict.lars(model, my_data)

When I compare y.estimated with preds2$fit[,2], I see that they are not equal. Here are the returned results:

    y.estimated:    2.855597  1.259374  1.673388  1.625999  0.337993 -1.672998 -1.055416  2.423278  4.092116 -1.595545
    preds2$fit[,2]: 2.9120115 1.1790466 1.7452670 1.7239429 0.2893512 -1.6682459 -1.1500982 2.4364527 4.1511509 -1.6098748

I think they should be equal... Does anyone have an explanation for that? Thanks a lot for your time! Eleni C.
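One way to see why the two vectors can differ: a fitted value from a centred model uses an estimated intercept, mean(y) - b * mean(x), rather than the true intercept 1 plus the known noise. The base-R sketch below does not call lars; it assumes, as the lars predict method appears to do, that fitted values are built from the stored means of x and y. The slope b = 1.8 is a hypothetical shrunken value, not one produced by lars.

```r
set.seed(3)
x <- rnorm(10)
noise <- runif(n = 10, min = -0.1, max = 0.1)
y <- 2 * x + 1 + noise

b <- 1.8  # hypothetical shrunken lasso slope at some step (not taken from lars)

# Reconstruction used in the question: true intercept plus the known noise
y_question <- b * x + 1 + noise

# Fitted values of a centred model: intercept estimated as mean(y) - b * mean(x)
a <- mean(y) - b * mean(x)
y_centred <- a + b * x
```

The centred fit passes through (mean(x), mean(y)) and contains no noise term, whereas y_question reuses the noise that the model never sees, so the two agree only in special cases.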
Re: [R] SVM regression
Thank you very much! Eleni On Fri, Dec 11, 2009 at 7:19 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote: Hi Eleni, On Dec 11, 2009, at 12:04 PM, Eleni Christodoulou wrote: I would expect that svm$residuals = y_obs - svm$fitted. However, this does not happen... Does anyone have any idea why? This is actually what's happening. The $residuals reported in the model are against your *scaled* y-vector. So, with your data:

    R> m <- svm(x, y)
    R> all(scale(y) - predict(m, x) == m$residuals)
    [1] TRUE

-steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
[R] SVM regression
Dear R users, I am trying to apply SVM regression to a set of microarray data, using the function svm() from the e1071 package. Can anyone tell me what the residuals value represents? I have some observed values y_obs for the parameter that I want to estimate, and I would expect that svm$residuals = y_obs - svm$fitted. However, this does not happen... Does anyone have any idea why? Thanks a lot! Eleni C.
[R] Ridge regression
Dear list, I have a couple of questions concerning ridge regression. I am using the lm.ridge(...) function to fit a model to my microarray data, i.e. model = lm.ridge(...). I retrieve some coefficients and some scales for each gene. First of all, I would like to ask: the real coefficients of the model are not included in the first element of the output but in the result of coef(model), am I right? Moreover, what does the scales element represent? What is its connection with the coefficients? The R help file is not very informative for me... Thank you very much in advance, Eleni Christodoulou
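On the scales question, a quick numerical check suggests what the element holds. In the sketch below (simulated two-"gene" data; the matrix X, its column names and lambda = 5 are hypothetical), model$scales matches the standard deviation of each centred predictor computed with divisor n rather than n - 1. That is why Ravi's example in this thread rescales scale() output by sqrt(n/(n-1)).

```r
library(MASS)  # provides lm.ridge()

set.seed(2)
# Hypothetical expression matrix: 20 samples x 2 genes
X <- matrix(runif(40), ncol = 2, dimnames = list(NULL, c("g1", "g2")))
y <- drop(X %*% c(1.5, -0.5)) + rnorm(20, sd = 0.1)
model <- lm.ridge(y ~ X, lambda = 5)

# Standard deviation with divisor n (not n - 1) of each centred column
centred <- sweep(X, 2, colMeans(X))
pop_sd  <- sqrt(colMeans(centred^2))

# Equivalent: the usual sd() rescaled by sqrt((n - 1) / n)
n <- nrow(X)
sd_rescaled <- apply(X, 2, sd) * sqrt((n - 1) / n)
```

Dividing model$coef by these scales puts the slopes back on the original gene-expression scale.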
[R] help with linear model
Dear list, I have been trying for a week to fit a simple linear model to my data. I have looked through previous posts but haven't found anything relevant to my problem. I guess it is something simple... I just cannot see it. I have the following data frame, named data, which is a subset of a microarray experiment. The columns are the samples and the rows are the probes. I bound a first row, called norm, which represents the estimated output. I want to create a linear model that shows the relationship between the gene expressions (rows) and the output (norm). This is what data looks like:

                 GSM276723.CEL GSM276724.CEL GSM276725.CEL GSM276726.CEL
    norm              0.897000          0.59      0.683000      0.949000
    206427_s_at       5.387205      6.036506      8.824783     10.864122
    205338_s_at       6.454779     13.143095      6.123212     12.726562
    209848_s_at       6.703062      7.783330     12.175654      9.339651
    205694_at         5.894131      5.794516     12.876555     11.534664
    201909_at        12.616538     12.913255     12.275182     12.767743
    208894_at        13.049286      9.317874     12.873516     13.527182
    216512_s_at       6.324789     12.783791      6.216932     12.013404
    205337_at         6.175940     12.158796      6.117519     12.041078
    201850_at         6.633013      6.465900      6.535434      7.749985
    210982_s_at      12.444791      8.597388     12.197696     12.963449

                 GSM276727.CEL GSM276728.CEL GSM276729.CEL GSM276731.CEL
    norm              0.302000      0.597000          0.27          0.53
    206427_s_at       5.690357      8.014055     13.034753      5.493977
    205338_s_at       5.757048      7.706341     13.258410      5.562588
    209848_s_at       6.461028      7.036515     13.633649      5.874098
    205694_at         5.519552      5.297107      6.498811      5.146150
    201909_at        12.814454     11.592632      6.594229      6.650796
    208894_at        13.835359     13.028096      5.839909      6.045578
    216512_s_at       6.033096      7.273650     12.669054      5.946932
    205337_at         5.879028      7.381713     12.633829      5.379559
    201850_at         9.684397      6.560014      8.523229      6.573052
    210982_s_at      13.342729     12.470517      5.903681      5.658115

                 GSM276732.CEL GSM276735.CEL GSM276736.CEL GSM276737.CEL
    norm               0.43400      0.647000      0.113000          1.00
    206427_s_at       12.80257      5.645002      6.519554     13.572480
    205338_s_at       13.38057      5.804107     11.090690     14.024922
    209848_s_at       13.27718      6.490851      9.784199     14.101162
    205694_at         11.37717      5.802105      7.944963     14.060492
    201909_at         13.24126     12.263899     12.578315      6.443491
    208894_at         12.29916      7.563361      9.971493      7.094214
    216512_s_at       13.00303      5.905789     10.512761     13.647573
    205337_at         12.63560      5.430138     10.707242     13.020312
    201850_at         12.71874      6.275480      6.987962     12.354580
    210982_s_at       11.53559      7.225199      9.322706      6.617615

                 GSM276738.CEL GSM276739.CEL GSM276740.CEL GSM276742.CEL
    norm               0.35700      0.967000      0.823000          1.00
    206427_s_at       13.33764     13.607918     13.190551     12.387189
    205338_s_at       13.65492     12.812950     12.237476     12.912605
    209848_s_at       13.48525     13.435389     13.851347     12.540495
    205694_at          7.70928     10.045331     13.391456     11.103841
    201909_at         12.47093     11.937344      6.631023      7.160071
    208894_at         12.20508      8.892181      6.478889      5.927860
    216512_s_at       13.42313     12.151691     11.620552     12.341763
    205337_at         12.67544     12.036528     11.641203     12.275845
    201850_at         11.85481     13.172666     12.964316     12.156142
    210982_s_at       11.49940      8.380404      6.121762      5.921634

                 GSM276743.CEL GSM276744.CEL GSM276745.CEL GSM276747.CEL
    norm              0.899000      0.927000      0.754000      0.437000
    206427_s_at      12.665097     12.604673     11.446630     13.000295
    205338_s_at      13.261141     12.448096     13.185698     12.510952
    209848_s_at      13.396711     13.882529     13.040600     12.984137
    205694_at        10.888474      7.094063      8.630120     12.321685
    201909_at        12.100560      6.666787     12.330600      6.572282
    208894_at         7.741437      8.348155     10.106442      6.009902
    216512_s_at      12.830373     11.504074     12.300163     11.525958
    205337_at        12.264569     11.676281     11.940917     11.618351
    201850_at        11.055564     12.202366      7.327056     12.853055
    210982_s_at       7.285289      8.129298      9.577032      5.924993

                 GSM276748.CEL GSM276752.CEL GSM276754.CEL GSM276756.CEL
    norm              0.321000          0.62      0.155000      0.946000
    206427_s_at       9.081283     11.446978      8.191261     13.192507
    205338_s_at      13.737773     13.698520     12.983830     10.948681
    209848_s_at      13.234025     12.956672     10.644642
Re: [R] help with linear model
Thank you all for your replies. I have tried transposing my data and before but I did not mention it because I was getting the same error. In the present case though it worked because I put lm1=lm(*norm~*.,data=t(data)) instead of lm1=lm(*fm1*, data=t(data)) where *fm1=norm~cols...* I actually didn't know that there exists such a difference between norm~cols and norm~. I wonder why... Thank you all again! Best, Eleni On Mon, Oct 26, 2009 at 12:24 PM, Petr PIKAL petr.pi...@precheza.cz wrote: Hi r-help-boun...@r-project.org napsal dne 26.10.2009 10:48:51: Dear list, I have been searching for a week to fit a simple linear model to my data. I have looked into the previous posts but I haven't found anything relevant to my problem. I guess it is something simple...I just cannot see it. I have the following data frame, named data, which is a subset of a microarray experiment. The columns are the samples and the rows are the probes. I binded the first line, called norm, which represents the estimated output. I want to create a linear model which shows the relationship between the gene expressions (rows) and the output (norm). 
> data
            GSM276723.CEL GSM276724.CEL GSM276725.CEL GSM276726.CEL
norm             0.897000          0.59      0.683000      0.949000
206427_s_at      5.387205      6.036506      8.824783     10.864122
205338_s_at      6.454779     13.143095      6.123212     12.726562
209848_s_at      6.703062      7.783330     12.175654      9.339651
205694_at        5.894131      5.794516     12.876555     11.534664
201909_at       12.616538     12.913255     12.275182     12.767743
208894_at       13.049286      9.317874     12.873516     13.527182
216512_s_at      6.324789     12.783791      6.216932     12.013404
205337_at        6.175940     12.158796      6.117519     12.041078
201850_at        6.633013      6.465900      6.535434      7.749985
210982_s_at     12.444791      8.597388     12.197696     12.963449

            GSM276727.CEL GSM276728.CEL GSM276729.CEL GSM276731.CEL
norm             0.302000      0.597000          0.27          0.53
206427_s_at      5.690357      8.014055     13.034753      5.493977
205338_s_at      5.757048      7.706341     13.258410      5.562588
209848_s_at      6.461028      7.036515     13.633649      5.874098
205694_at        5.519552      5.297107      6.498811      5.146150
201909_at       12.814454     11.592632      6.594229      6.650796
208894_at       13.835359     13.028096      5.839909      6.045578
216512_s_at      6.033096      7.273650     12.669054      5.946932
205337_at        5.879028      7.381713     12.633829      5.379559
201850_at        9.684397      6.560014      8.523229      6.573052
210982_s_at     13.342729     12.470517      5.903681      5.658115

            GSM276732.CEL GSM276735.CEL GSM276736.CEL GSM276737.CEL
norm              0.43400      0.647000      0.113000          1.00
206427_s_at      12.80257      5.645002      6.519554     13.572480
205338_s_at      13.38057      5.804107     11.090690     14.024922
209848_s_at      13.27718      6.490851      9.784199     14.101162
205694_at        11.37717      5.802105      7.944963     14.060492
201909_at        13.24126     12.263899     12.578315      6.443491
208894_at        12.29916      7.563361      9.971493      7.094214
216512_s_at      13.00303      5.905789     10.512761     13.647573
205337_at        12.63560      5.430138     10.707242     13.020312
201850_at        12.71874      6.275480      6.987962     12.354580
210982_s_at      11.53559      7.225199      9.322706      6.617615

            GSM276738.CEL GSM276739.CEL GSM276740.CEL GSM276742.CEL
norm              0.35700      0.967000      0.823000          1.00
206427_s_at      13.33764     13.607918     13.190551     12.387189
205338_s_at      13.65492     12.812950     12.237476     12.912605
209848_s_at      13.48525     13.435389     13.851347     12.540495
205694_at         7.70928     10.045331     13.391456     11.103841
201909_at        12.47093     11.937344      6.631023      7.160071
208894_at        12.20508      8.892181      6.478889      5.927860
216512_s_at      13.42313     12.151691     11.620552     12.341763
205337_at        12.67544     12.036528     11.641203     12.275845
201850_at        11.85481     13.172666     12.964316     12.156142
210982_s_at      11.49940      8.380404      6.121762      5.921634

            GSM276743.CEL GSM276744.CEL GSM276745.CEL GSM276747.CEL
norm             0.899000      0.927000      0.754000      0.437000
206427_s_at     12.665097     12.604673     11.446630     13.000295
205338_s_at     13.261141     12.448096     13.185698     12.510952
209848_s_at     13.396711     13.882529     13.040600     12.984137
205694_at       10.888474      7.094063      8.630120     12.321685
201909_at       12.100560
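There is one plausible explanation for the difference, assuming `cols` was produced by pasting the probe IDs into a formula string. The post does not show how it was built. Probe IDs such as 206427_s_at start with a digit, so they are not syntactic R names. A pasted formula containing them therefore does not parse, whereas `norm ~ .` quotes such names internally.

The small R sketch below uses a few values from the table above. The object names `dat`, `fit_dot`, `fit_pasted` and `probes` are illustrative only.

```r
# A few samples from the table above, transposed: one row per sample,
# 'norm' is the response and the probe IDs are column names.
dat <- data.frame(
  norm          = c(0.897, 0.59, 0.683, 0.949, 0.302, 0.597, 0.27, 0.53),
  "206427_s_at" = c(5.387205, 6.036506, 8.824783, 10.864122,
                    5.690357, 8.014055, 13.034753, 5.493977),
  "205338_s_at" = c(6.454779, 13.143095, 6.123212, 12.726562,
                    5.757048, 7.706341, 13.258410, 5.562588),
  check.names   = FALSE   # keep the probe IDs exactly as written
)

# 'norm ~ .' uses every other column and quotes awkward names itself
fit_dot <- lm(norm ~ ., data = dat)

# A formula pasted from bare probe IDs does not parse, because a name
# such as 206427_s_at starts with a digit
bad <- tryCatch(as.formula("norm ~ 206427_s_at + 205338_s_at"),
                error = function(e) e)

# Wrapping each ID in backquotes makes the pasted formula valid
probes     <- setdiff(names(dat), "norm")
fm1        <- as.formula(paste("norm ~", paste0("`", probes, "`", collapse = " + ")))
fit_pasted <- lm(fm1, data = dat)
```

Keeping `norm ~ .` or backquoting avoids renaming the probes. Calling make.names() on the column names is another option, at the cost of IDs such as X206427_s_at.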
Re: [R] samr result
Yes, here it is:

samr.obj <- samr(data, resp.type = "Two class unpaired", nperms = 100, center.arrays = TRUE)

where data is a matrix of microarray gene expressions with genes as rows and tissues as columns. With center.arrays = TRUE the data matrix is normalized so that each column has median 0. I would like to retrieve the new normalized matrix, but it seems that it is not returned by samr. If you have any idea how I can get this transformed matrix, I would be glad to hear it! Thanks again, E.

On Tue, Jun 10, 2008 at 11:30 PM, Richardson, Patrick [EMAIL PROTECTED] wrote: Could you post your code so we can see what you are trying to do? Thanks, Patrick

From: [EMAIL PROTECTED] [EMAIL PROTECTED] On Behalf Of Eleni Christodoulou [EMAIL PROTECTED] Sent: Tuesday, June 10, 2008 11:20 AM To: r-help@r-project.org Subject: [R] samr result
Hello list! I have a problem trying to perform a SAM analysis using the function samr from the samr package. I have set the option center.arrays = TRUE in order to scale all the experiments to median 0. I would like to retrieve the scaled data, but it seems that samr does not return it... Does anyone have any idea on this? Thanks a lot!!! E.
[R] samr result
Hello list! I have a problem trying to perform a SAM analysis using the function samr from the samr package. I have set the option center.arrays = TRUE in order to scale all the experiments to median 0. I would like to retrieve the scaled data, but it seems that samr does not return it... Does anyone have any idea on this? Thanks a lot!!! E.
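Since samr does not appear to return the centered matrix, one workaround is to compute it directly. This is a minimal sketch, assuming center.arrays only subtracts each array's (column's) median, as described in this thread. Whether samr applies any further transformation internally is not confirmed here, and `x` is a made-up matrix.

```r
# Made-up expression matrix: genes in rows, arrays (samples) in columns
set.seed(3)
x <- matrix(rnorm(20, mean = 8), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("array", 1:4)))

# Subtract each column's median so that every array has median 0
col_medians <- apply(x, 2, median)
x_centered  <- sweep(x, 2, col_medians, FUN = "-")

apply(x_centered, 2, median)   # all (numerically) zero
```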
[R] which question
Hello list, I was trying to select a column of a data frame using the which command. More precisely, I was selecting rows of the data frame with which and then displaying a certain column of them. The command that I was using is:

sequence = mydata[which(human[, 3] %in% genes.sam.names), 9]

In the above command, mydata is my data frame and 9 is the column which I want to display. The rest are just other variables that I use. The which command is supposed to retrieve the rows of interest. The rows are retrieved correctly. However, if column 9 is NA for a given row, the corresponding element of column 10 is displayed instead. How can I fix that? Thank you very much, Eleni
Re: [R] which question
An example is:

symbol = human[which(human[, 3] %in% genes.sam.names), 8]

The data human and genes.sam.names are attached. The result of the above command is:

> symbol
[1] CCL18 MARCO SYT13
[4] FOXC1 CDH3
[7] CA12 CELSR1 NM_018440
[10] MICROTUBULE-ASSOCIATED NM_015529 ESR1
[13] PHGDH GABRP LGMN
[16] MMP9 BMP7 KLF5
[19] RIPK2 GATA3 NM_032023
[22] TRIM2 CCND1 MMP12
[25] LDHB AF493978 SOD2
[28] SOD2 SOD2 NME5
[31] STC2 RBP1 ROPN1
[34] RDH10 KRTHB1 SLPI
[37] BBOX1 FOXA1 NM_005669
[40] MCCC2 CHI3L1 GSTM3
[43] LPIN1 DSC2 FADS2
[46] ELF5 CYP1B1 LMO4
[49] AL035297 NM_152398 AB018342
[52] PIK3R1 NFKBIE MLZE
[55] NFIB NM_052997 NM_006023
[58] CPB1 CXCL13 CBR3
[61] NM_017527 FABP7 DACH
[64] IFI27 ACOX2 CXCL11
[67] UGP2 CLDN4 M12740
[70] IGKC IGKC CLECSF12
[73] AY069977 HOXB2 SOX11
[76] NM_017422 TLR2
[79] CKS1B BC017946 APOBEC3B
[82] HLA-DRB1 HLA-DQB1
[85] CCL13 C4orf7
[88] NM_173552 21345
Levels: (2 (32 (55.11 (AIB-1) (ALU (CAK1) (CAP4) (CASPASE ... ZYX

As you can see, apart from gene symbols, which are what I need, RefSeq IDs are also retrieved... Thanks a lot, Eleni

On Fri, Jun 6, 2008 at 1:23 PM, Dieter Menne [EMAIL PROTECTED] wrote: Eleni Christodoulou elenichri at gmail.com writes: I was trying to select a column of a data frame using the *which* command. I was actually selecting the rows of the data frame using *which, *and then displayed a certain column of it. The command that I was using is: sequence=*mydata*[*which*(human[,3] %in% genes.sam.names),*9*]
Please provide a running example. The *mydata* are difficult to read. Dieter
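A value from column 10 appearing wherever column 9 is empty is the typical symptom of a particular reading mistake. It happens when a tab-delimited file with empty fields is read with read.table()'s default separator, which treats any run of white space as a single delimiter. Whether `human` was read this way is an assumption. The sketch below reproduces the effect on a made-up three-column file: GATA3 is taken from the output above, while ACC1 and ACC2 are placeholder accessions.

```r
# Made-up tab-delimited file; the second row has an empty 'symbol' field
txt <- paste("id\tsymbol\trefseq",
             "p1\tGATA3\tACC1",
             "p2\t\tACC2",
             sep = "\n")

# Default sep = "": runs of white space count as one separator, so the
# empty field disappears and ACC2 slides left into the 'symbol' column
shifted <- read.table(text = txt, header = TRUE, fill = TRUE,
                      stringsAsFactors = FALSE)

# Splitting on single tabs keeps the empty field in place;
# na.strings = "" turns it into NA
by_tab <- read.table(text = txt, header = TRUE, sep = "\t",
                     na.strings = "", stringsAsFactors = FALSE)

shifted$symbol   # "GATA3" "ACC2"
by_tab$symbol    # "GATA3" NA
```

If this is the cause, re-reading the file with sep = "\t" (or with read.delim()) should keep the columns aligned, and the which() call can stay as it is.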
[R] oligo ids
Dear list, I have a set of human oligo IDs (H26022 H22025 H34703 H20442 H25719 H300018350) which I want to map to Ensembl or RefSeq. I am sure R has a function to do that. I downloaded the {oligo} package and tried to use the probeNames function. Although the factor of oligo IDs is an object (as the argument to probeNames should be), I get the following error:

> probeNames(significant_genes[,1])
Error in function (classes, fdef, mtable) : unable to find an inherited method for function "probeNames", for signature "factor"

where significant_genes is the factor with the oligo IDs. Could anyone help me with the format I should use in order to apply probeNames? Or, if someone has any other function in mind which can do the mapping, I would be really grateful to hear it. Thank you all, Eleni
Re: [R] [BioC] oligo ids
Thanks Sean! Your reply was very helpful. I have already got almost what I wanted. I have some NA values, but I will see whether I can resolve them through the literature or an external tool. Best Regards, Eleni

On Mon, May 19, 2008 at 6:07 PM, Sean Davis [EMAIL PROTECTED] wrote:
On Mon, May 19, 2008 at 10:47 AM, Eleni Christodoulou [EMAIL PROTECTED] wrote:
Dear list, I have a set of human oligo IDs (H26022 H22025 H34703 H20442 H25719 H300018350) which I want to map to Ensembl or RefSeq. I am sure R has a function to do that. I downloaded the {oligo} package and tried to use the probeNames function. Although the factor of oligo IDs is an object (as the argument to probeNames should be), I get the following error:

> probeNames(significant_genes[,1])
Error in function (classes, fdef, mtable) : unable to find an inherited method for function "probeNames", for signature "factor"

where significant_genes is the factor with the oligo IDs. Could anyone help me with the format I should use in order to apply probeNames? Or, if someone has any other function in mind which can do the mapping, I would be really grateful to hear it.

Hi, Eleni. The probeNames() function is not applicable here, unfortunately. You are asking a question related to annotating your array, so you need an annotation package. I think the IDs that you specified are Qiagen (Operon) IDs, so the place to look is the annotation package associated with the Qiagen arrays. Assuming that you are using R 2.7.0 (you are, correct?), you can do:

source('http://bioconductor.org/biocLite.R')
biocLite('hguqiagenv3.db')
library(hguqiagenv3.db)
mget(c('H26022', 'H22025'), hguqiagenv3REFSEQ)

The last command will return a list of mappings between those two oligo IDs and RefSeq. Typing:

hguqiagenv3()

will tell you the other annotation sources available for your Qiagen chip. Ensembl mappings are available, as are a bunch of other mappings. Let us know if you have more questions. Sean
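Sean's mget() call returns a list with one element per oligo ID. The base-R sketch below shows how such a result can be flattened into a table and the unmapped IDs listed. It uses a plain environment as a hypothetical stand-in for the hguqiagenv3REFSEQ map, and the accession values are placeholders. Whether the annotation package's own mget() method accepts ifnotfound in exactly the same way should be checked in its help page.

```r
# Hypothetical stand-in for an annotation map: an environment keyed by
# oligo ID. The values are placeholders, not real RefSeq accessions.
toy_map <- list2env(list(H26022 = "RefSeq_A",
                         H22025 = c("RefSeq_B", "RefSeq_C")))

ids  <- c("H26022", "H22025", "H34703")   # H34703 is deliberately absent
hits <- mget(ids, envir = toy_map, ifnotfound = list(NA))

# Join several accessions with ";" and keep missing mappings as NA
collapse_ids <- function(v) {
  if (all(is.na(v))) NA_character_ else paste(v, collapse = ";")
}

mapped <- data.frame(oligo  = ids,
                     refseq = vapply(hits, collapse_ids, character(1)),
                     stringsAsFactors = FALSE)
unmapped <- mapped$oligo[is.na(mapped$refseq)]
```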
Re: [R] Significance analysis of Microarrays (SAM)
Thanks Martin, I also posted the question on the Bioconductor list, but I have had no reply yet. In the meantime I found out that instead of writing

d = list(data.matrix2, y, censored)

I should name the arguments:

d = list(x = data.matrix2, y = y, censoring.status = censored)

Strange, huh? Anyway, it solves the problem. Thanks once again, Eleni

On Tue, May 6, 2008 at 6:43 PM, Martin Morgan [EMAIL PROTECTED] wrote: Hi Eleni -- Although samr is not a Bioconductor package, you might have more luck asking on the Bioconductor mailing list, http://bioconductor.org. The obvious place to start, and probably you have already done this, is to ensure that the class of the objects passed to the function agree with the classes described on the function help page. Martin

Eleni Christodoulou [EMAIL PROTECTED] writes: Dear list, I am trying to perform a significance analysis of a microarray experiment with survival data using the {samr} package. I have a matrix containing my data, which has 17816 rows corresponding to genes and 286 columns corresponding to samples. The name of this matrix is data.matrix2. Some of the first values of this matrix are:

> data.matrix2[1:3,1:5]
     GSM36777  GSM36778 GSM36779 GSM36780 GSM36781
[1,] 1.009274 1.0740659 1.048540 1.015946 1.022650
[2,] 1.007992 0.8768410 0.962442 1.111742 1.121150
[3,] 0.981853 0.9606492 1.024987 1.053302 1.063408

I also have the time at which each patient sample was examined for relapse. This information is in the vector y, which has length 286 and is given in months. For example:

> y[1:5]
[1] 101 118   9 106  37

Finally, I have a variable censored, which is 1 if the patient had relapsed at the examined time and 0 if not. For example:

> censored[1:5]
[1] 0 0 1 0 1

I am trying to perform the following SAM analysis:

d = list(data.matrix2, y, censored)
samr.obj = samr(d, resp.type = "Survival", nperms = 20)

When I run the above commands I get the error:

Error in check.format(y, resp.type = resp.type, censoring.status = censoring.status) : Error in input response data: response type Survival specified; error in censoring indicator
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

I really cannot understand what is wrong with my code. Could anyone please help me with this? Thank you all, Eleni

-- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793
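The original error and the is.na() warning are consistent with samr looking up the components of `d` by name (x, y, censoring.status). In an unnamed list, a lookup such as `d$censoring.status` simply returns NULL. The base-R illustration below uses made-up miniature objects; only the first five values of y and censored come from the post.

```r
# Made-up miniature versions of the objects in the post
set.seed(11)
data.matrix2 <- matrix(rnorm(4 * 5, mean = 1, sd = 0.05), nrow = 4)  # 4 genes x 5 samples
y        <- c(101, 118, 9, 106, 37)   # months to relapse / last follow-up
censored <- c(0, 0, 1, 0, 1)          # 1 = relapsed

d_unnamed <- list(data.matrix2, y, censored)
d_named   <- list(x = data.matrix2, y = y, censoring.status = censored)

# Components of an unnamed list cannot be found by name
is.null(d_unnamed$censoring.status)   # TRUE
names(d_named)                        # "x" "y" "censoring.status"
```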
[R] Significance analysis of Microarrays (SAM)
Dear list, I am trying to perform a significance analysis of a microarray experiment with survival data using the {samr} package. I have a matrix containing my data, which has 17816 rows corresponding to genes and 286 columns corresponding to samples. The name of this matrix is data.matrix2. Some of the first values of this matrix are:

> data.matrix2[1:3,1:5]
     GSM36777  GSM36778 GSM36779 GSM36780 GSM36781
[1,] 1.009274 1.0740659 1.048540 1.015946 1.022650
[2,] 1.007992 0.8768410 0.962442 1.111742 1.121150
[3,] 0.981853 0.9606492 1.024987 1.053302 1.063408

I also have the time at which each patient sample was examined for relapse. This information is in the vector y, which has length 286 and is given in months. For example:

> y[1:5]
[1] 101 118   9 106  37

Finally, I have a variable censored, which is 1 if the patient had relapsed at the examined time and 0 if not. For example:

> censored[1:5]
[1] 0 0 1 0 1

I am trying to perform the following SAM analysis:

d = list(data.matrix2, y, censored)
samr.obj = samr(d, resp.type = "Survival", nperms = 20)

When I run the above commands I get the error:

Error in check.format(y, resp.type = resp.type, censoring.status = censoring.status) : Error in input response data: response type Survival specified; error in censoring indicator
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

I really cannot understand what is wrong with my code. Could anyone please help me with this? Thank you all, Eleni
[R] sensitivity analysis
Hello list, I am performing a sensitivity analysis using the package ROCR, and for this I am using the class prediction. My question is: could anyone tell me what the vector cutoffs in the result represents? Thank you all, Eleni
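The ROCR documentation describes the cutoffs slot of a prediction object as the score thresholds at which the confusion-matrix counts (tp, fp, tn, fn) are tabulated. These are each distinct predicted score, plus Inf, at which nothing is classified as positive. The base-R sketch below illustrates that idea with made-up scores and labels. It is independent of ROCR's own implementation.

```r
# Made-up predicted scores and true classes (1 = positive)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    0,   1,   0)

# Candidate cutoffs: Inf (nothing called positive), then every distinct score
cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))

# At each cutoff, cases with score >= cutoff are predicted positive
tpr <- sapply(cutoffs, function(k) mean(scores[labels == 1] >= k))
fpr <- sapply(cutoffs, function(k) mean(scores[labels == 0] >= k))

roc <- data.frame(cutoff = cutoffs, tpr = tpr, fpr = fpr)
roc
```

Each row is one point of the ROC curve. Lowering the cutoff can only move the point up or to the right.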
[R] ROC analysis
Hello list, I am trying to perform ROC analysis and compute the AUC in order to validate my results. I use the package ROCR. I would like to compute the AUC not up to the cutoff found by performance, but up to another cutoff that I calculate. How could I change the following command in order to get what I want?

perform = performance(pred, measure = "auc", x.measure = "cutoff")

where pred is a prediction object. Thank you very much, Eleni
Re: [R] ROC analysis
Richard, thanks, I think it will work. I will calculate the cutoff value and then, from the prediction object, find the fpr that corresponds to it and pass it as an argument to performance. I will keep you informed. Eleni

On Wed, Mar 19, 2008 at 11:51 AM, Richard Pearson [EMAIL PROTECTED] wrote: Eleni, does the fpr.stop argument do what you want? This is described in ?performance under the details of the auc measure. Try, e.g.

perform = performance(pred, measure = "auc", fpr.stop = 0.5)

Richard.

Eleni Christodoulou wrote: Hello list, I am trying to perform ROC analysis and compute the AUC in order to validate my results. I use the package ROCR. I would like to compute the AUC not up to the cutoff found by performance, but up to another cutoff that I calculate. How could I change the following command in order to get what I want?

perform = performance(pred, measure = "auc", x.measure = "cutoff")

where pred is a prediction object. Thank you very much, Eleni
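Because fpr.stop is expressed as a false positive rate, a score cutoff first has to be converted into the FPR it produces, which is what Eleni plans above. The base-R sketch below covers both steps and computes a trapezoidal partial AUC from (fpr, tpr) points. The helper names `partial_auc` and `fpr_at_cutoff` are hypothetical, and ROCR's own handling of fpr.stop may differ in detail.

```r
# Area under an ROC curve given as (fpr, tpr) points, from FPR = 0 up to
# fpr_stop, by the trapezoidal rule; a segment crossing fpr_stop is cut there.
partial_auc <- function(fpr, tpr, fpr_stop = 1) {
  o   <- order(fpr, tpr)
  fpr <- fpr[o]
  tpr <- tpr[o]
  area <- 0
  for (i in seq_len(length(fpr) - 1)) {
    x0 <- fpr[i]; x1 <- fpr[i + 1]
    y0 <- tpr[i]; y1 <- tpr[i + 1]
    if (x0 >= fpr_stop) break
    if (x1 > fpr_stop) {
      # interpolate TPR linearly at the stopping point
      y1 <- y0 + (y1 - y0) * (fpr_stop - x0) / (x1 - x0)
      x1 <- fpr_stop
    }
    area <- area + (x1 - x0) * (y0 + y1) / 2
  }
  area
}

# FPR reached when every case scoring at least 'cutoff' is called positive
fpr_at_cutoff <- function(scores, labels, cutoff) {
  mean(scores[labels == 0] >= cutoff)
}

fpr <- c(0, 0, 0.5, 1)
tpr <- c(0, 0.5, 1, 1)
partial_auc(fpr, tpr)         # full AUC
partial_auc(fpr, tpr, 0.5)    # area up to FPR = 0.5
```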
[R] t.test p-Value
Hello list, I am trying to apply the paired t.test between diseased and non-diseased patients to identify genes that are more highly expressed in one condition than in the other. In order to retrieve the genes that are more highly expressed in the positive disease state, I do:

p.values <- c()
for (i in 1:length(Significant[, 1])) {
  p.values[i] <- try(t.test(positive[i, ], negative[i, ], alternative = "greater")$p.value)
}
which(p.values < 0.01)

where Significant is my matrix of genes and their expression in tumors, and positive and negative are subsets of this matrix. When p < 0.01, I reject the null hypothesis and accept the alternative one, that gene expression is greater in positive than in negative. I assume I must be doing something wrong, because the heatmap that I get with the genes that pass the p-value filter is wrong. Could anyone help me with this? Thanks a lot, Eleni
Re: [R] t.test p-Value
I am sorry, the test is unpaired... But my question remains. Thanks, Eleni

On Wed, Mar 5, 2008 at 2:33 PM, Eleni Christodoulou [EMAIL PROTECTED] wrote: Hello list, I am trying to apply the paired t.test between diseased and non-diseased patients to identify genes that are more highly expressed in one condition than in the other. In order to retrieve the genes that are more highly expressed in the positive disease state, I do:

p.values <- c()
for (i in 1:length(Significant[, 1])) {
  p.values[i] <- try(t.test(positive[i, ], negative[i, ], alternative = "greater")$p.value)
}
which(p.values < 0.01)

where Significant is my matrix of genes and their expression in tumors, and positive and negative are subsets of this matrix. When p < 0.01, I reject the null hypothesis and accept the alternative one, that gene expression is greater in positive than in negative. I assume I must be doing something wrong, because the heatmap that I get with the genes that pass the p-value filter is wrong. Could anyone help me with this? Thanks a lot, Eleni
Re: [R] t.test p-Value
On Wed, Mar 5, 2008 at 2:05 PM, ian white [EMAIL PROTECTED] wrote: Don't you need to make some allowance for multiple testing? E.g. to get an experiment-wise significance level of 0.01 you need

which(p.values < very small number)

where the very small number is approximately 0.01/(total number of genes).

On Wed, 2008-03-05 at 14:38 +0200, Eleni Christodoulou wrote: I am sorry, the test is unpaired... But my question remains. Thanks, Eleni

On Wed, Mar 5, 2008 at 2:33 PM, Eleni Christodoulou [EMAIL PROTECTED] wrote: Hello list, I am trying to apply the paired t.test between diseased and non-diseased patients to identify genes that are more highly expressed in one condition than in the other. In order to retrieve the genes that are more highly expressed in the positive disease state, I do:

p.values <- c()
for (i in 1:length(Significant[, 1])) {
  p.values[i] <- try(t.test(positive[i, ], negative[i, ], alternative = "greater")$p.value)
}
which(p.values < 0.01)

where Significant is my matrix of genes and their expression in tumors, and positive and negative are subsets of this matrix. When p < 0.01, I reject the null hypothesis and accept the alternative one, that gene expression is greater in positive than in negative. I assume I must be doing something wrong, because the heatmap that I get with the genes that pass the p-value filter is wrong. Could anyone help me with this? Thanks a lot, Eleni
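The sketch below combines the points raised in this thread on simulated data:

- the `<` in which(p.values < 0.01);
- an unpaired one-sided test;
- ian white's multiple-testing allowance via p.adjust(). Bonferroni corresponds to his 0.01/number-of-genes threshold, and BH is a less conservative alternative.

It also replaces try() with tryCatch(). An error string returned by try() would turn p.values into a character vector, making `p.values < 0.01` a text comparison. Whether that happened here is unknown. The matrices `positive` and `negative` below are made up.

```r
set.seed(42)
n_genes  <- 200
# Made-up expression matrices: genes in rows, patients in columns
positive <- matrix(rnorm(n_genes * 6), nrow = n_genes)   # 6 diseased patients
negative <- matrix(rnorm(n_genes * 5), nrow = n_genes)   # 5 non-diseased patients
positive[1:10, ] <- positive[1:10, ] + 3                 # 10 genes shifted upwards

# One-sided unpaired (Welch) t-test per gene; a failed test gives NA
# instead of an error string, so p.values stays numeric
p.values <- vapply(seq_len(n_genes), function(i) {
  tryCatch(t.test(positive[i, ], negative[i, ], alternative = "greater")$p.value,
           error = function(e) NA_real_)
}, numeric(1))

raw_hits  <- which(p.values < 0.01)
# Bonferroni: adjusted p < 0.01 amounts to raw p < 0.01 / n_genes
bonf_hits <- which(p.adjust(p.values, method = "bonferroni") < 0.01)
# Benjamini-Hochberg controls the false discovery rate instead
bh_hits   <- which(p.adjust(p.values, method = "BH") < 0.01)
```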
[R] Cox model+ROCR
Dear list, I am trying to build a Cox model and then perform ROC analysis in order to identify genes that are correlated with breast cancer. I calculate the hazard score taking into account different numbers of genes and their coefficients, because I am trying to find the best predictive number of genes. The values I get range from around 1 (with few genes included) up to the order of e+80 (with many genes included). I am using the prediction method from the ROCR package, which takes as arguments the calculated scores and the true class labels. I really don't know what to compare my values with, because the only data that I have available are the time to relapse or last follow-up (months) and the relapse status (1 = TRUE, 0 = FALSE) of the patients. I have never performed ROC analysis before and I am a bit lost... Any help with this is really very welcome! Thank you all, Eleni
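The sketch below shows one common way to set this up, not necessarily the approach of any particular paper. It uses the Cox linear predictor (the sum of coefficients times expression) as the score: exp() of it ranks patients identically but can overflow. It then defines the true class as relapse within a fixed horizon and drops patients censored before that horizon. The data, coefficients and the 60-month horizon are all assumptions.

```r
set.seed(5)
expr    <- matrix(rnorm(8 * 3), nrow = 8)       # 8 patients x 3 genes (made up)
beta    <- c(0.8, -0.5, 1.2)                     # hypothetical Cox coefficients
time    <- c(101, 118, 9, 106, 37, 45, 70, 20)   # months to relapse / last follow-up
relapse <- c(0, 0, 1, 0, 1, 0, 1, 0)             # 1 = relapsed

# Linear predictor (log relative hazard); exp(risk) gives the same ranking
# but becomes astronomically large when many genes are summed
risk <- drop(expr %*% beta)

# Binary truth: relapse within the horizon; patients censored before the
# horizon have unknown status and are left out
horizon <- 60
known   <- relapse == 1 | time >= horizon
truth   <- as.integer(relapse == 1 & time <= horizon)[known]
score   <- risk[known]
```

`score` and `truth` could then be passed to ROCR as prediction(score, truth).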
[R] Van't Veer paper on breast cancer
Hello all, I am working at the FORTH institute in Crete, and for a long time now I have been trying to reproduce the results of the paper "Gene expression profiling predicts clinical outcome of breast cancer" by Van't Veer et al., published in Nature, vol. 415, 31 January 2002. http://www.nature.com/nature/journal/v415/n6871/full/415530a.html I am facing some difficulties in building the classifier, and I was wondering if someone else has worked on it and could give me some help. Thank you all, Eleni
[R] Kaplan Meier function
Hi all, I am trying to draw a Kaplan-Meier curve, and I found online that Kaplan-Meier estimates are computed with a function called km in the event package. Is there an update for that? When I choose to install packages in R, there is no package called event, even though I have selected all the repositories. Thanks in advance, Eleni
Re: [R] Kaplan Meier function
Thank you all for the replies! Eleni

On Thu, Feb 14, 2008 at 4:10 PM, Dimitris Rizopoulos [EMAIL PROTECTED] wrote: check function survfit() in package survival. Best, Dimitris

Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm

- Original Message - From: Eleni Christodoulou [EMAIL PROTECTED] To: r-help@r-project.org Sent: Thursday, February 14, 2008 2:50 PM Subject: [R] Kaplan Meier function
Hi all, I am trying to draw a Kaplan-Meier curve, and I found online that Kaplan-Meier estimates are computed with a function called km in the event package. Is there an update for that? When I choose to install packages in R, there is no package called event, even though I have selected all the repositories. Thanks in advance, Eleni
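Here is a minimal survfit() example along the lines Dimitris suggests, with made-up follow-up times.

```r
library(survival)   # ships with R as a recommended package

# Made-up follow-up times (months) and event indicator (1 = relapse)
km_data <- data.frame(time   = c(5, 8, 12, 12, 20),
                      status = c(1, 0, 1, 1, 0))

fit <- survfit(Surv(time, status) ~ 1, data = km_data)
summary(fit)   # Kaplan-Meier estimate at each event time

if (interactive()) {
  plot(fit, xlab = "Months", ylab = "Relapse-free proportion")
}
```

Here the estimate drops to 0.8 after the relapse at 5 months. It then drops to 0.8 × 1/3 after the two relapses at 12 months, when three patients remain at risk.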
Re: [R] Cox model
Hmm... I see. I think I will try the univariate analysis nonetheless... I intend to get the p-value for each gene and select the most significant ones. I have seen this in several papers. Best Regards, Eleni

On Feb 13, 2008 2:59 PM, Terry Therneau [EMAIL PROTECTED] wrote: What you appear to want are all of the univariate models. You can get this with a loop (and patience - it won't be fast).

library(survival)
ngene <- ncol(genes)
coefmat <- matrix(0., nrow=ngene, ncol=2)
for (i in 1:ngene) {
    tempfit <- coxph(Surv(time, relapse) ~ genes[,i])
    coefmat[i,] <- c(tempfit$coef, sqrt(tempfit$var))
}

However, the fact that R can do this for you does not mean it is a good idea. In fact, doing all of the univariate tests for a microarray has been shown by many people to be a very bad idea. There are several approaches to deal with the key issues, which you should research before going forward. Terry Therneau
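This sketch builds on Terry's loop. It turns each coefficient and standard error into a Wald p-value and adjusts for the number of genes tested, here with the BH method (an assumption; see his caveats above). The data are simulated.

```r
library(survival)

set.seed(7)
n     <- 40
ngene <- 5
genes <- matrix(rnorm(n * ngene), nrow = n,
                dimnames = list(NULL, paste0("gene", seq_len(ngene))))
time    <- rexp(n, rate = 0.05)   # simulated follow-up times
relapse <- rbinom(n, 1, 0.6)      # simulated relapse indicator

coefmat <- matrix(NA_real_, nrow = ngene, ncol = 2,
                  dimnames = list(colnames(genes), c("coef", "se")))
for (i in seq_len(ngene)) {
  tempfit <- coxph(Surv(time, relapse) ~ genes[, i])
  coefmat[i, ] <- c(coef(tempfit), sqrt(diag(vcov(tempfit))))
}

# Wald z statistic and two-sided p-value per gene, then a multiple-testing
# adjustment across genes
z     <- coefmat[, "coef"] / coefmat[, "se"]
p     <- 2 * pnorm(-abs(z))
p.adj <- p.adjust(p, method = "BH")
```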
Re: [R] dimnames
What if you just removed the first column from your matrix:

XX <- XX[, 2:length(XX[1, ])]

so that you have a new matrix without the first column, and saved this one to a file? Regards, Eleni

On Feb 13, 2008 3:06 PM, Roberto Olivares Hernandez [EMAIL PROTECTED] wrote: Hi, I used the write.table function to save data in a txt file, and this is the output:

V1 V2 V3 V4
1 YAL005C 21 14 11
2 YAL007C 2 1 4
3 YAL012W 8 16 3
4 YAL016W 24 23 23
5 YAL019W 3 3 2
6 YAL020C 2 4 2
7 YAL021C 7 5 5
8 YAL022C 3 1 2

but I need to remove the dimnames (first column). I tried to use the dimnames function to remove them and then save the file, but the output is still the same. These are the command lines:

XX #matrix
dimnames(XX) <- NULL
write.table(XX, "XX.txt", quote = FALSE, sep = "\t")

Thanks in advance Roberto
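The unwanted first column may be the 1, 2, 3, ... row index that write.table() adds; the header V1 V2 V3 V4 has one field fewer than the rows. If so, it can be suppressed with row.names = FALSE instead of dropping data. The sketch below uses a made-up data frame shaped like Roberto's.

```r
# Made-up data frame shaped like the output in the question
XX <- data.frame(V1 = c("YAL005C", "YAL007C", "YAL012W"),
                 V2 = c(21, 2, 8), V3 = c(14, 1, 16), V4 = c(11, 4, 3))

out <- tempfile(fileext = ".txt")
# row.names = FALSE drops the 1, 2, 3, ... index column;
# col.names = FALSE also drops the V1 V2 V3 V4 header line
write.table(XX, out, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
lines <- readLines(out)
lines
```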
Re: [R] Cox model
Hi David, the problem is that I need all these regressors. I need a coefficient for every one of them, so that I can rank them according to that coefficient. Thanks, Eleni

On Feb 12, 2008 4:54 PM, [EMAIL PROTECTED] wrote: Hi Eleni, I am not an expert in R or statistics, but in my opinion you have too many regressors compared to the number of observations, and that might be the reason why you get the error. Others might say better, but as far as I know, with only 80 observations it is a good idea to first filter your list of variables down to a few tens. HTH David

Hello R-community, for a week now I have been struggling with the implementation of a Cox model in R. I have 80 cancer patients, so 80 time measurements and 80 relapse-or-not measurements (the censoring indicator: 1 if relapsed over the examined period, 0 if not). My microarray data contain around 18000 genes, so I have the expressions of 18000 genes in each of the 80 tumors (an 80*18000 matrix). I would like to build a Cox model in order to retrieve the most significant genes (according to the p-value). The command that I am using is:

test1 <- list(time, relapse, genes)
coxph(Surv(time, relapse) ~ genes, test1)

where time is a vector of size 80 containing the times, relapse is a vector of size 80 containing the relapse values, and genes is an 80*18000 matrix. When I give the coxph command I get an error saying that R cannot allocate a vector of size 2.7Mb (in Windows). I also tried Linux, and there I get an error that the maximum memory has been reached. I increased the memory by starting R with the command:

R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M

I think it cannot get better than that, because if I try for example max-vsize=300 the memory capacity is stored as NA. Does anyone have any idea why this happens and how I can overcome it? I would be really grateful if you could help! It has been bothering me a lot! Thank you all, Eleni
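The sketch below shows the kind of filtering David suggests: it keeps the most variable genes before fitting a multivariable Cox model. Using variance as the criterion, k = 3, and the simulated data are all assumptions. With 80 patients, the number of genes kept would need to stay small.

```r
library(survival)

set.seed(9)
n_pat  <- 40
n_gene <- 50
# Simulated expression matrix (patients x genes) with gene-specific spread
genes <- matrix(rnorm(n_pat * n_gene,
                      sd = rep(runif(n_gene, 0.2, 2), each = n_pat)),
                nrow = n_pat,
                dimnames = list(NULL, paste0("g", seq_len(n_gene))))
time    <- rexp(n_pat, rate = 0.03)
relapse <- rbinom(n_pat, 1, 0.5)

# Keep the k genes with the largest variance across patients; this filter
# does not look at the outcome
k   <- 3
top <- order(apply(genes, 2, var), decreasing = TRUE)[seq_len(k)]

fit <- coxph(Surv(time, relapse) ~ genes[, top])
summary(fit)
```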
[R] Cox model
Hello R-community, for a week now I have been struggling with the implementation of a Cox model in R. I have 80 cancer patients, so 80 time measurements and 80 relapse-or-not measurements (the censoring indicator: 1 if relapsed over the examined period, 0 if not). My microarray data contain around 18000 genes, so I have the expressions of 18000 genes in each of the 80 tumors (an 80*18000 matrix). I would like to build a Cox model in order to retrieve the most significant genes (according to the p-value). The command that I am using is:

test1 <- list(time, relapse, genes)
coxph(Surv(time, relapse) ~ genes, test1)

where time is a vector of size 80 containing the times, relapse is a vector of size 80 containing the relapse values, and genes is an 80*18000 matrix. When I give the coxph command I get an error saying that R cannot allocate a vector of size 2.7Mb (in Windows). I also tried Linux, and there I get an error that the maximum memory has been reached. I increased the memory by starting R with the command:

R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M

I think it cannot get better than that, because if I try for example max-vsize=300 the memory capacity is stored as NA. Does anyone have any idea why this happens and how I can overcome it? I would be really grateful if you could help! It has been bothering me a lot! Thank you all, Eleni
[R] Memory problem?
Hello R users, I am trying to run a Cox model to predict relapse in 80 cancer tumors, taking into account the expression of 17000 genes. The data are large and I get an error: Cannot allocate vector of 2.4 Mb. I increased memory.limit to 4000 (the largest value my computer supports), but I still get the error because of other big variables that I have in the workspace. Does anyone know how to overcome this problem? Many thanks in advance, Eleni
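When other large variables are filling the workspace, one option is to find out which objects take the most memory, remove the ones that are no longer needed with rm(), and then call gc() so R can reuse the space. A minimal sketch, assuming two hypothetical objects (`big_matrix` and `small_vec`) in place of the real workspace contents:

```r
# Hypothetical objects standing in for the large variables in the workspace
big_matrix <- matrix(0, nrow = 1000, ncol = 1000)  # about 8 MB of doubles
small_vec  <- 1:10

# Size of every workspace object in bytes, listed largest first
sizes <- vapply(ls(), function(nm) as.numeric(object.size(get(nm))), numeric(1))
print(sort(sizes, decreasing = TRUE))

# Remove what is no longer needed, then run the garbage collector
rm(big_matrix)
invisible(gc())
```

Saving large intermediate objects with save() before removing them is a way to keep them available without holding them in memory.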
[R] select repositories under linux
Hi all, I am trying to install the package GEOquery on Unix. I have downloaded the standard version of R, and this package is not available from the default repository. I know that I can select repositories under Windows, but I don't know how to do it on Unix. Does anyone have any idea on this? Thank you in advance, Eleni
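setRepositories() is not limited to Windows: in a terminal session on Linux/Unix it shows a numbered text menu, and its `ind` argument lets it run non-interactively. GEOquery is a Bioconductor package, so the Bioconductor software repository has to be enabled. A minimal sketch, assuming the standard repositories table shipped with R, in which CRAN and BioCsoft are the first two entries (check `file.path(R.home("etc"), "repositories")` on your own system, since the indices are an assumption):

```r
# Option 1 (interactive session): choose repositories from a text menu
if (interactive()) setRepositories()

# Option 2 (non-interactive): select by index.
# Assumption: 1 = CRAN, 2 = BioCsoft (Bioconductor software) in the local table
setRepositories(ind = c(1, 2))
print(getOption("repos"))

# With BioCsoft enabled, the package can then be installed as usual:
# install.packages("GEOquery")
```

On current R versions the Bioconductor project recommends installing its packages with `BiocManager::install("GEOquery")`, using the BiocManager package from CRAN, instead of selecting repositories by hand.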
Re: [R] Clustering
Thank you very much! I had misunderstood, it's true... On Nov 28, 2007 6:28 PM, Birgit Lemcke [EMAIL PROTECTED] wrote: Hello Eleni, as far as I understood and used agnes(), the method argument determines only the clustering method. If you use diss=TRUE, the distances should be taken from the distance matrix. Birgit On 28.11.2007 at 12:18, Eleni Christodoulou wrote: Hello all! I am performing some clustering analysis on microarray data using agnes{cluster}, and I have created my own dissimilarity matrix based on a distance measure other than Euclidean, Manhattan, etc. My question is: if I choose, for example, method = "complete", how are the distances between the elements calculated? Are they taken from the dissimilarity matrix I have provided as the first argument? clust.complete.agnes <- agnes(as.dist(D), diss = TRUE, method = "complete") Thank you very much, Eleni Birgit Lemcke Institut für Systematische Botanik Zollikerstrasse 107 CH-8008 Zürich Switzerland Ph: +41 (0)44 634 8351 [EMAIL PROTECTED]
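Birgit's answer can be checked directly: with diss = TRUE, agnes() reads the dissimilarities from its first argument, and method only selects the linkage rule applied to them. For complete linkage, the lowest merge height should then equal the smallest entry of D, and the highest merge height the largest entry. A minimal sketch, assuming a hypothetical D built as 1 minus the Pearson correlation between 5 simulated samples (not the measure used in the original post):

```r
library(cluster)  # recommended package shipped with R

# Five simulated samples, three measurements each (hypothetical data)
set.seed(42)
x <- matrix(rnorm(5 * 3), nrow = 5,
            dimnames = list(paste0("s", 1:5), NULL))

# Hypothetical custom dissimilarity: 1 - Pearson correlation between samples
D <- 1 - cor(t(x))

# diss = TRUE: the first argument already holds the dissimilarities
clust.complete.agnes <- agnes(as.dist(D), diss = TRUE, method = "complete")

# Merge heights are in the units of D (reported in banner order, not sorted)
print(range(clust.complete.agnes$height))
print(range(as.dist(D)))
```

The two printed ranges coincide, showing that the merge heights come straight from the supplied matrix rather than from a Euclidean or Manhattan recomputation.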
[R] Clustering
Hello all! I am performing some clustering analysis on microarray data using agnes{cluster}, and I have created my own dissimilarity matrix based on a distance measure other than Euclidean, Manhattan, etc. My question is: if I choose, for example, method = "complete", how are the distances between the elements calculated? Are they taken from the dissimilarity matrix I have provided as the first argument? clust.complete.agnes <- agnes(as.dist(D), diss = TRUE, method = "complete") Thank you very much, Eleni
Re: [R] NA values
Yes, thanks a lot! It works fine! Eleni On Nov 21, 2007 2:03 PM, Ted Harding [EMAIL PROTECTED] wrote: On 21-Nov-07 11:15:32, Eleni Christodoulou wrote: Hi all! I am new to R and I would like to ask you the following question: How can I substitute the NA values with 0 in a data frame? I cannot find a command to check if a value is NA... Thank you very much! Eleni As has been said, is.na() is the function which determines whether something has value NA (result=TRUE) or not (result=FALSE). is.na() will work nicely with data frames (also, of course, with structures such as vectors, matrices and arrays). Example:

dummy <- data.frame(X1=c(101,102,103,104,NA,106),
                    X2=c(201,202,203,NA,205,206))
dummy
#   X1  X2
#1 101 201
#2 102 202
#3 103 203
#4 104  NA
#5  NA 205
#6 106 206
dummy[is.na(dummy)] <- 0
dummy
#   X1  X2
#1 101 201
#2 102 202
#3 103 203
#4 104   0
#5   0 205
#6 106 206

Hoping this makes it clear! Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 21-Nov-07 Time: 12:03:33 -- XFMail --