Re: [R] help needed with printing multiple arguments as vectors, not matrices
Awesome! Thanks so much for your help. I am able to get the format I was looking for. On Mon, Jun 24, 2013 at 2:50 PM, Adams, Jean wrote: > When sharing data, use dput() to output the data in a way that R-Help > readers can easily use. > > Give this code a try and see if it helps. > > Jean > > > > # read in the example data > mat1 <- structure(list("GENE SYMBOL" = c("ADCK2", "ADCK3", "ADCK4", > "ADCK5", > "ADRBK1", "ADRBK2", "AKT1", "AKT2", "AKT3", "ALK"), Sample.A1.A0SK.01 = > c(0L, > 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A1.A0SO.01 = c(0L, > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A1.A0SP.01 = c(0L, > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A04P.01 = c(0L, > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A04Q.01 = c(0L, > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A04U.01 = c(0L, > 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), Sample.A2.A0CL.01 = c(0L, > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A0CM.01 = c(0L, > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A0D0.01 = c(0L, > 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L), Sample.A2.A0D2.01 = c(1L, > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A0ST.01 = c(0L, > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A0SX.01 = c(0L, > 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L), Sample.A2.A0T0.01 = c(0L, > 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("GENE SYMBOL", > "Sample-A1-A0SK-01", "Sample-A1-A0SO-01", "Sample-A1-A0SP-01", > "Sample-A2-A04P-01", "Sample-A2-A04Q-01", "Sample-A2-A04U-01", > "Sample-A2-A0CL-01", "Sample-A2-A0CM-01", "Sample-A2-A0D0-01", > "Sample-A2-A0D2-01", "Sample-A2-A0ST-01", "Sample-A2-A0SX-01", > "Sample-A2-A0T0-01"), class = "data.frame", row.names = c(NA, -10L)) > mat2 <- mat1 > > fish <- function(x, y) { > n00 = sum((1-x)*(1-y)) > n01 = sum((1-x)*y) > n10 = sum(x*(1-y)) > n11 = sum(x*y) > a = matrix(c(n00, n01, n10, n11), nrow=2) > pval = fisher.test(a)$p.value > return(c(n00, n01, n10, n11, pval)) > } > > results <- data.frame(matrix(NA, nrow=dim(mat1)[1]*dim(mat2)[1], ncol=7, > dimnames=list(NULL, c("gene1", "gene2", "n00", "n01", "n10", "n11", > "pval" > count <- 0 > for(i in 1:dim(mat1)[1]) { > for(j in 1:dim(mat2)[1]) { > count <- count + 1 > results[count, "gene1"] <- mat1$"GENE SYMBOL"[i] > results[count, "gene2"] <- mat2$"GENE SYMBOL"[j] > results[count, 3:7] <- fish(mat1[i, -1], mat2[j, -1]) > }} > > > > On Mon, Jun 24, 2013 at 10:35 AM, Angel Russo wrote: > >> Sample data is as follows (for simplicity assume mat1 and mat2 are the >> same matrices). Also attached as an excel file. >> >> I want to get the pairwise interaction fischer test results. Not just the >> pvalues but also want to wrote the n00, n01, n10 and n11 in the file as: >> >> >> ADCK2_mat1 ADCK3_mat2 n00 n01 n10 n11 pvalue >> >> >> >> ADCK2_mat1 ADCK4_mat2 n00 n01 n10 n11 pvalue >> >> GENE SYMBOL Sample-A1-A0SK-01 Sample-A1-A0SO-01 Sample-A1-A0SP-01 >> Sample-A2-A04P-01 Sample-A2-A04Q-01 Sample-A2-A04U-01 Sample-A2-A0CL-01 >> Sample-A2-A0CM-01 Sample-A2-A0D0-01 Sample-A2-A0D2-01 Sample-A2-A0ST-01 >> Sample-A2-A0SX-01 Sample-A2-A0T0-01 ADCK2 0 0 0 0 0 0 0 0 0 1 0 0 0 >> ADCK3 0 0 0 0 0 1 0 0 1 0 0 1 0 ADCK4 0 0 0 0 0 0 0 0 0 0 0 1 0 ADCK5 1 >> 0 0 0 0 0 0 0 0 0 0 0 1 ADRBK1 0 0 0 0 0 0 0 0 0 0 0 0 0 ADRBK2 0 0 0 0 >> 0 1 0 0 0 0 0 0 0 AKT1 0 0 0 0 0 0 0 0 0 0 0 0 0 AKT2 0 0 0 0 0 0 0 0 0 >> 0 0 1 0 AKT3 0 0 0 0 0 0 0 0 1 0 0 1 0 ALK 0 0 0 0 0 0 0 0 0 0 0 0 0 >> >> >> On Mon, Jun 24, 2013 at 11:03 AM, Adams, Jean wrote: >> >>> Could you provide an example of mat1 and mat2 as well as an example of >>> the output you would like from them. >>> >>> Your code is very difficult to follow as written. It will be easier for >>> readers of the list to interpret if you use carriage returns rather than >>> semi colons, for example ... >>> >>> fish <- function(x, y) { >>> n00 = sum((1-x)*(1-y)) >>> n01 = sum((1-x)*y) >>> n10 = sum(x*(1-y)) >>> n11 = sum(x*y) >>> a = matrix(c(n00, n01, n10, n11), nrow=2) >>> pval = fisher.test(a)$p.value >>> return(pval) >>> } >>> >>> Jean >>> >>> >>> On Mon, Jun 24, 2013
Re: [R] help needed with printing multiple arguments as vectors, not matrices
Sample data is as follows (for simplicity assume mat1 and mat2 are the same matrices). Also attached as an excel file. I want to get the pairwise interaction fischer test results. Not just the pvalues but also want to wrote the n00, n01, n10 and n11 in the file as: ADCK2_mat1 ADCK3_mat2 n00 n01 n10 n11 pvalue ADCK2_mat1 ADCK4_mat2 n00 n01 n10 n11 pvalue GENE SYMBOL Sample-A1-A0SK-01 Sample-A1-A0SO-01 Sample-A1-A0SP-01 Sample-A2-A04P-01 Sample-A2-A04Q-01 Sample-A2-A04U-01 Sample-A2-A0CL-01 Sample-A2-A0CM-01 Sample-A2-A0D0-01 Sample-A2-A0D2-01 Sample-A2-A0ST-01 Sample-A2-A0SX-01 Sample-A2-A0T0-01 ADCK2 0 0 0 0 0 0 0 0 0 1 0 0 0 ADCK3 0 0 0 0 0 1 0 0 1 0 0 1 0 ADCK4 0 0 0 0 0 0 0 0 0 0 0 1 0 ADCK5 1 0 0 0 0 0 0 0 0 0 0 0 1 ADRBK1 0 0 0 0 0 0 0 0 0 0 0 0 0 ADRBK2 0 0 0 0 0 1 0 0 0 0 0 0 0 AKT1 0 0 0 0 0 0 0 0 0 0 0 0 0 AKT2 0 0 0 0 0 0 0 0 0 0 0 1 0 AKT3 0 0 0 0 0 0 0 0 1 0 0 1 0 ALK 0 0 0 0 0 0 0 0 0 0 0 0 0 On Mon, Jun 24, 2013 at 11:03 AM, Adams, Jean wrote: > Could you provide an example of mat1 and mat2 as well as an example of the > output you would like from them. > > Your code is very difficult to follow as written. It will be easier for > readers of the list to interpret if you use carriage returns rather than > semi colons, for example ... > > fish <- function(x, y) { > n00 = sum((1-x)*(1-y)) > n01 = sum((1-x)*y) > n10 = sum(x*(1-y)) > n11 = sum(x*y) > a = matrix(c(n00, n01, n10, n11), nrow=2) > pval = fisher.test(a)$p.value > return(pval) > } > > Jean > > > On Mon, Jun 24, 2013 at 6:09 AM, Angel Russo wrote: > >> ** >> >> I am using the following way to get p-values from fiser exact test. >> However, I do need to print for each pair the values "n00, n01, n10, n11". >> How can I print that as a table and not a matrix as below along with the >> p-value? Any help will be greatly appreciated >> >> fish <- function(y, x) {n00 = sum((1-x)*(1-y)); n01 = sum((1-x)*y); >> n10 = sum(x*(1-y)); n11 = sum(x*y); a = matrix(c(n00, n01, n10, n11), >> nrow = 2); pval = fisher.test(a)$p.value; return(pval);} >> >> chiArray <- function(x) { apply(mat1, 1, fish, x); } >> sapply(1:nrow(mat2), function(j){chiArray(mat2[j, ]);}); >> chisq.cna.mut.test <- sapply(1:nrow(mat2), function(j){chiArray(mat2[j, >> ]);}); >> >> I want output to be: >> >> name1_mat1 name1_mat2 n00 n01 n10 n11 pvalue >> >> >> >> name1_mat1 name2_mat2 n00 n01 n10 n11 pvalue >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > GENE SYMBOL Sample-A1-A0SK-01 Sample-A1-A0SO-01 Sample-A1-A0SP-01 Sample-A2-A04P-01 Sample-A2-A04Q-01 Sample-A2-A04U-01 Sample-A2-A0CL-01 Sample-A2-A0CM-01 Sample-A2-A0D0-01 Sample-A2-A0D2-01 Sample-A2-A0ST-01 Sample-A2-A0SX-01 Sample-A2-A0T0-01 ADCK2 0 0 0 0 0 0 0 0 0 1 0 0 0 ADCK3 0 0 0 0 0 1 0 0 1 0 0 1 0 ADCK4 0 0 0 0 0 0 0 0 0 0 0 1 0 ADCK5 1 0 0 0 0 0 0 0 0 0 0 0 1 ADRBK1 0 0 0 0 0 0 0 0 0 0 0 0 0 ADRBK2 0 0 0 0 0 1 0 0 0 0 0 0 0 AKT10 0 0 0 0 0 0 0 0 0 0 0 0 AKT20 0 0 0 0 0 0 0 0 0 0 1 0 AKT30 0 0 0 0 0 0 0 1 0 0 1 0 ALK 0 0 0 0 0 0 0 0 0 0 0 0 0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help needed with printing multiple arguments as vectors, not matrices
** I am using the following way to get p-values from fiser exact test. However, I do need to print for each pair the values "n00, n01, n10, n11". How can I print that as a table and not a matrix as below along with the p-value? Any help will be greatly appreciated fish <- function(y, x) {n00 = sum((1-x)*(1-y)); n01 = sum((1-x)*y); n10 = sum(x*(1-y)); n11 = sum(x*y); a = matrix(c(n00, n01, n10, n11), nrow = 2); pval = fisher.test(a)$p.value; return(pval);} chiArray <- function(x) { apply(mat1, 1, fish, x); } sapply(1:nrow(mat2), function(j){chiArray(mat2[j, ]);}); chisq.cna.mut.test <- sapply(1:nrow(mat2), function(j){chiArray(mat2[j, ]);}); I want output to be: name1_mat1 name1_mat2 n00 n01 n10 n11 pvalue name1_mat1 name2_mat2 n00 n01 n10 n11 pvalue [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problem with geom_point in ggplot using a different column
I want to draw boxplot where the geom_points are displayed based on "ERBB2.MUT" subset and they should be displayed in the right box (based both on the "ERBB2.2064" field and "ERBB2_Status"). However, given my command I currently only see "red" points corresponding to "MUT" subset in one straight line corresponding to only "ERBB2.2064" stratification on x-axis. It dosen't take into account the "ERBB2.Status" stratification. Can anyone help me? Call ERBB2|2064 ERBB2_Status ERBB2-MUT A 7.214E-01 CHANGE MUT B -4.208E-02 NEUTRAL MUT D 1.080E+00 NEUTRAL MUT C 2.347E-01 NEUTRAL MUT ggplot(data=testdata, aes(x=Call, y=ERBB2.2064)) + geom_boxplot(aes(fill=ERBB2_Status),width=0.8)+theme_bw()+geom_point(data=subset(testdata,ERBB2.MUT=="MUT"),aes(shape=Call,color="Red")) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Query regarding SVD of binary matrix:
Hello, I have a binary matrix of 80k sets (sets comprising of combination of cities) by 885 cities (dimension = 80k x 885). For matrix, 1 means city is a part of the set and 0 means the city is not part of the set. Sets are rows and cities are columns (city.test). I want to do feature reduction to only keep important sets (most likely 2-10 sets of city combinations) and the associated cities. So I chose SVD and I am following these steps but not sure how to go about the next step. Could anyone help with this? s <- svd(city.test) D <- diag(s$d) d2 <- (s$d)^2 ratio <- cumsum(d2/dum(d2)) # proportion of total variance from 885 PCs. and looking at the plots, I see about first ~10 or 20 PCs explain the most variation (Please see attatched plot). How do I use this to extract the most relevant sets from my original matrix? COuld you please help. A friend of mine recommended plotting: rowSums(abs(s$u*s$d)) and choosing only the highest magnitude sets. I didn't understand the significance of it. Most probably, it reflects that only the first PC contributes the most, hence we only care about rowsum(abs(u*d)). Is this correct? Thanks. variance-cities.pdf Description: Adobe PDF document __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Assigning genes to CBS segmented output:
Thanks co much Martin for your bug reply. I will follow up with it. Best, Angel On Tue, Oct 4, 2011 at 7:35 PM, Martin Morgan wrote: > On 10/04/2011 02:44 PM, Angel Russo wrote: > >> Hi All, >> >> I have an CBS segmentation algorithm output for 10 tumor samples each from >> 2 >> different tumors. >> >> Now, I am in an urgent need to assign gene (followed by all genes present) >> that belong to a particular segment after I removed all the CNVs from >> segment data. The format of the data is: >> >> Sample Chromosome Start End Num_Probes Segment_Mean >> Sample1A-TA 1 51598 76187 15 -1.115 >> > > Hi Angel -- In Bioconductor > > http://bioconductor.org > > for some model organism create a data frame of known Entrez genes and their > begin / end locations. Start by installing necessary software and data > packages > > > source('http://bioconductor.**org/biocLite.R<http://bioconductor.org/biocLite.R> > ") > biocLite(c('org.Hs.eg.db', "GenomicRanges')) > > then load the library with annotations about genic coordinates > > library(org.Hs.eg.db) > anno = merge(toTable(org.Hs.egCHRLOC)**, toTable(org.Hs.egCHRLOCEND)) > > leading to > > > head(anno) >gene_id Chromosome start_location end_location > 1 1 1 -243666483 -244006553 > 2 1 1 -243666483 -244006553 > 3 1 1 -243651534 -244006553 > 4 1 1 -243651534 -244006553 > 5 18586 X 49217770 49223847 > 6 18586 X 49217770 49332715 > > For the simple question 'which genes are located on chromosome A starting > at X and going to Y' you could > > subset(geno, Chromosome=="A" & > abs(start_location) > X & > abs(end_location) < Y) > > This could also be done through the 'biomaRt' package or GenomicFeatures / > TxDb packages. To get this for many segments filter 'anno' to remove funky > genes, e.g., those that have negative length(!) > > idx = with(anno, abs(start_location) > abs(end_location)) > anno = anno[!idx,] > > manipulate this to a GRanges object; > > library(GenomicRanges) > gr = with(anno, GRanges(Chromosome, >IRanges(abs(start_location), abs(end_location)), >names=gene_id)) > > convert your CBS result into a GRanges > > seg = with(CBS, GRanges(Chromosome, IRanges(Start, End))) > > then find overlaps > > olap = findOverlaps(gr, seg) > > the 'gr' is called the 'query', 'seg' is called the 'subject'. > queryHits(olap) and subjectHits(olap) give equal-length vectors describing > which queries overlap which subjects. You could group gene names by segment > with > > split(names(gr)[queryHits(**olap)], subjectHits(olap)) > > An important issue is to use the same genome build for annotations as you > used for segmentation. Hope that helps / provides some hints for getting > from A to B. > > Martin > > >> Could anyone suggest an R library or code or method that I can quickly use >> to get the genes assigned to CBS output. >> >> Thanks so much, >> Angel >> >>[[alternative HTML version deleted]] >> >> __** >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >> PLEASE do read the posting guide http://www.R-project.org/** >> posting-guide.html <http://www.R-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible code. >> > > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Assigning genes to CBS segmented output:
Hi All, I have an CBS segmentation algorithm output for 10 tumor samples each from 2 different tumors. Now, I am in an urgent need to assign gene (followed by all genes present) that belong to a particular segment after I removed all the CNVs from segment data. The format of the data is: Sample Chromosome Start End Num_Probes Segment_Mean Sample1A-TA 1 51598 76187 15 -1.115 Could anyone suggest an R library or code or method that I can quickly use to get the genes assigned to CBS output. Thanks so much, Angel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Hazard ratio calculation and KM plot p-value:
I have two questions: 1) Can anyone provide me with a reference regarding calculation of Hazard ratio for two groups of data. How is it being manually calculated with an example. Unlike median time ratio which is the ratio of median times in two groups, at what time is the hazard ratio calculation done? 2) In kaplan-meier statistics of stratifying into two groups, a p-value is often calculated (e.g. log-rank p-value). p-value is the test is how significant the separation is between two groups compared to random. What is "random" in kaplan-meier statistics. How is p-value calculated? Thanks for everyone's time to read and hopefully respond as well. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] pairwise p-values in KM plots?
How can I compute pairwise p-values in Kaplan-meier plots for three or more groups? bin.1<-cut(score,c(-1000,-1,1,1000),c("low","intermediate","high")) I use "km.coxph.plot" currently which reports one p-value. Thanks very much. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using pre-calculated coefficients and LP in coxph()?
Thanks very much Dimitrius and David. I want to compute CPE using pre-calculated beta-model and linear.predictor using the following code. I hope it the code is OK. Let me know. I am also doing some sanity checks. testc$x <- scores testc$y <- Surv(testdata$time,testdata$status) testfit <- coxph(y ~ x, testc, iter.max=0, init=1) cpe=phcpe(testfit) Thanks very much again. - On Sun, Mar 13, 2011 at 2:28 PM, David Winsemius wrote: > > On Mar 13, 2011, at 1:32 PM, Dimitris Rizopoulos wrote: > > probably you want to use the 'init' argument and 'iter.max' >> control-argument of coxph(). For example, for the Lung dataset, we fix the >> coefficients of age and ph.karno at 0.05 and -0.05, respectively: >> >> library(survival) >> >> coxph(Surv(time, status) ~ age + ph.karno, data = lung, >> init = c(0.05, -0.05), iter.max = 0) >> > > >> >> I hope it helps. >> >> Best, >> Dimitris >> >> >> On 3/13/2011 6:08 PM, Angel Russo wrote: >> >>> I need to force a coxph() function in R to use a pre-calculated set of >>> beta >>> coefficients of a gene signature consisting of xx genes and the gene >>> expression is also provided of those xx genes. >>> >> > I would have guessed (and that is all one can do without an example and > better description of what the setting and goal might be) that the use of > the offset capablity in coxph might be needed. > > -- > David. > > >>> If I try to use "coxph()" function in R using just the gene expression >>> data >>> alone, the beta coefficients and coxph$linear.predictors will change and >>> I >>> need to use the pre-calcuated linear predictor not re-computed using >>> coxph() >>> function. The reason is I need to compute a quantity that uses as it's >>> input >>> the coxph() output but I need this output to be pre-calculated >>> beta-coefficients and linear.predictor. >>> >>> Any one can show me how to do this in R? >>> >>> Thanks a lot. >>> >> > David Winsemius, MD > West Hartford, CT > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] using pre-calculated coefficients and LP in coxph()?
I need to force a coxph() function in R to use a pre-calculated set of beta coefficients of a gene signature consisting of xx genes and the gene expression is also provided of those xx genes. If I try to use "coxph()" function in R using just the gene expression data alone, the beta coefficients and coxph$linear.predictors will change and I need to use the pre-calcuated linear predictor not re-computed using coxph() function. The reason is I need to compute a quantity that uses as it's input the coxph() output but I need this output to be pre-calculated beta-coefficients and linear.predictor. Any one can show me how to do this in R? Thanks a lot. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ROC from R-SVM?
Hi Max and Andrew, Thanks so much for your reply. Indeed I found your link last night using steps shown below. My first question is if the following two steps are right and AUC is 51% as shown below. My seond question is that currently I am using cost parameter=1 (the default in R-SVM; http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html). To improve the AUC from ROC curve, can I can optimize the SVM cost function (instead of keeping it fixed a 1) to get the mimimum LOO error in training cross.validation and then draw the ROC from decision.values as Step2 below using a cost parameter that gave mimimum cross.validation error in the training data. Is that right? Many thanks. -- *Step 1: For obtaining ROC curve of test data I turned on "prob=T" option:* > svmres.prob <- svm(traindx[,resrsvm$SelInd], as.factor(traindy), decision.values = TRUE) > svmpred.prob <- predict(svmres.prob, testdx[,resrsvm$SelInd], decision.values = TRUE) > print(confusionMatrix(svmpred.prob,testdy)) Confusion Matrix and Statistics Reference Prediction Resistant Sensitive Resistant 513 Sensitive3788 Accuracy : 0.6503 *Step 2: Actual ROC plot command using output from above and plot attached as well as pdf (I am assuming the following says the AUC is 51.4):* > library(ROCR) > svm.roc <- prediction(attributes(svmpred.prob)$decision.values, testdy) > svm.auc <- performance(svm.roc, 'tpr', 'fpr') > aucsvm <- performance(svm.roc, 'auc') > pdf(file="roc_curve_rsvm_decval.pdf") > plot(svm.auc) > print(str(aucsvm)) > print(str(aucsvm)) Formal class 'performance' [package "ROCR"] with 6 slots ..@ x.name : chr "None" ..@ y.name : chr "Area under the ROC curve" ..@ alpha.name : chr "none" ..@ x.values: list() ..@ y.values:List of 1 .. ..$ : num 0.514 ..@ alpha.values: list() --- On Tue, Feb 22, 2011 at 4:23 PM, Andrew Ziem wrote: > In addition's to Max's suggestion about caret, look at ROCR which > visualizes ROC charts for any binary classifier. I have an example of > e1071::SVN and ROCR here > > > https://heuristically.wordpress.com/2009/12/23/compare-performance-machine-learning-classifiers-r/ > > > > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] > On Behalf Of Angel Russo > Sent: Monday, February 21, 2011 3:34 PM > To: r-help@r-project.org > Subject: [R] ROC from R-SVM? > > *Hi, > > *Does anyone know how can I show an *ROC curve for R-SVM*? I understand in > R-SVM we are not optimizing over SVM cost parameter. Any example ROC for > R-SVM code or guidance can be really useful. > > Thanks, Angel. > > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] why no "probabilities" from svm.predict?
> library(ROCR) > library(e1071) svmres.prob <- svm(traindx, traindy, probability=TRUE) svmpred.prob <- predict(svmres.prob, testdx, probability = TRUE, decision.values = TRUE, type="prob") > print(length(attr(svmpred.prob, "probabilities"))) [1] 0 > print(attr(svmpred.prob, "probabilities")) NULL > print(attributes(svmpred.prob)$decision.values) / 1 -0.54976883 2 -0.19255459 3 -0.05190025 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ROC from R-SVM?
*Hi, *Does anyone know how can I show an *ROC curve for R-SVM*? I understand in R-SVM we are not optimizing over SVM cost parameter. Any example ROC for R-SVM code or guidance can be really useful. Thanks, Angel. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error in R-SVM help:
Greetings: I am trying to use your R code for R-SVM as follows. Why it dosen't print the LOO.error and the list of features? http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html My training data as follows contains 142 cases and 264 features. instead I get en error as below "invalid 'digits' argument". Is a working example available anywhere using R-code of R-SVM? Thanks very much. > source("R-SVM.R") > data<-ReadSVMdata("train.txt") > len<-length(data$y) > print(data$y) [1] -1 -1 1 -1 -1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 1 -1 -1 1 -1 1 1 1 1 [26] 1 1 1 1 -1 1 -1 1 1 -1 -1 1 -1 -1 1 -1 -1 -1 1 1 1 1 -1 -1 -1 [51] -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 1 [76] -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 1 -1 [101] -1 1 -1 -1 -1 -1 -1 1 -1 1 1 -1 -1 -1 -1 1 1 -1 -1 1 -1 -1 -1 -1 -1 [126] 1 -1 1 -1 1 1 -1 -1 -1 1 1 1 -1 -1 -1 -1 1 > print(dim(data$x)) [1] 142 264 > x<-data$x > y<-data$y > ladder<-CreatLadder(len) > RSVMsum <- RSVM(x,y,ladder,"LOO") > SummaryRSVM(RSVMsum) *Error in print.default("MinCV error of", min(RSVMres$Error), "at", MinLevel, : invalid 'digits' argument Calls: SummaryRSVM -> print -> print.default Execution halted* I only modified the ReadSVMdata function ReadSVMdata <- function(filename) { dd <- read.table( filename, header=TRUE,sep="\t") x <- as.matrix( dd[, 14:(ncol(dd))] ) class <- as.numeric(dd[,2]) class[class==2] <- -1 y <- class ret <- list(x=x, y=y) } and un-commented this line "print("MinCV error of", min(RSVMres$Error), "at", MinLevel, "genes" )" in function Summary(RSVM). [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] list of features from svmpath?
How can I get the list of non-zero features from svmpath at any given lambda? All I get is following information and information about what features were selected. In Iris example, we have 4 features and 60 cases. In my own example which is 200cases by 300 features, I can't figure out how to print the list of non-zero (beta) or features from svmpath. Any help? Thanks. > solution <- svmpath(as.matrix(iris[1:60,1:4]), buffer, type="class", trace=FALSE) > names(solution) [1] "alpha""alpha0" "lambda" "alpha00" "Error""SumEps" [7] "Size.Elbow" "Elbow""Moveto" "Movefrom" "Obs.step" "Step" [13] "kernel" "param.kernel" "x""y""call" > solution$alpha0 [1] 130.785000 108.685000 106.21 86.13 85.485000 71.176837 68.197549 67.660686 [9] 60.847899 58.442887 53.721098 53.544146 44.61 44.525517 44.415343 39.709048 [17] 38.832267 36.940401 33.800161 28.115484 25.859545 25.734545 24.818367 19.666905 [25] 18.833000 17.768614 17.219663 16.737715 15.503339 14.717148 8.586288 > solution$Obs.step [1] 27 53 27 53 32 51 32 51 26 55 6 55 57 6 44 57 59 44 19 44 59 52 19 44 21 52 56 21 52 42 26 [32] 56 45 52 54 42 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] hazard.ratio command:
Hi All, In absence of any reply, I am posting a slightly modified question. What is "x" in hazard.ratio in the command below? example(hazard.ratio) binscores<-cut(scores.train,c(-1000,-1,1,1000),c("low","intermediate","high")) * hazard.ratio(x=?, surv.time=train$ProgFreeSurv, surv.event=train$PfSStatus, strat=binscores) * Input to "x" is ("x a vector of risk predictions"), what does it mean? I have computed scores (binscore) above from a 100-gene signature I have derived from an expression data. I need to compute hazard rations of the groups (low and high) defined by me using "cut" command above. Thanks, Angel. -- I am interested in calculating hazard ratio of the stratified groups defined by me. library(survcomp) binscores <- cut(scores.train,c(-1000,-1,1,1000),c("low","intermediate","high")) dd <- data.frame("surv.time"=OS, "surv.event"= status, "strat"=binscores) km.coxph.plot(formula.s=Surv(surv.time, surv.event) ~ strat) How do I calculate hazard ratio between the groups defined by me as above? Thanks very much, Angel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] hazard ratio between my groups
HI, I am interested in calculating hazard ratio of the stratified groups defined by me. library(survcomp) binscores <- cut(scores.train,c(-1000,-1,1,1000),c("low","intermediate","high")) dd <- data.frame("surv.time"=OS, "surv.event"= status, "strat"=binscores) km.coxph.plot(formula.s=Surv(surv.time, surv.event) ~ strat) How do I calculate hazard ratio between the groups defined by me as above? Thanks very much, Angel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Urgent help requested using survfit(individual=T):
Hello: I would like to obtain probability of an event for one single patient as a function of time (from survfit.coxph) object, as I want to find what is the probability of an event say at 1 month and what is the probability of an event at 80 months and compare. So I tried the following but it fails miserably. I looked at some old posts but could not figure out the solution. Here's what I did where for one single patient, the answer is "NULL". Could anyone help me with the problem? R.app GUI 1.34 (5589 Leopard build 32-bit) > library(glmpath) *> dataall <- list(x=lung.data$x[1:130,], time=lung.data$time[1:130], status=lung.data$status[1:130])* * > fit.a <- coxpath(dataall) > testall <- list(x=lung.data$x[131:137,], time=lung.data$time[131:137], status=lung.data$status[131:137]) > testpred <- predict(fit.a, testall, s=0.99, type='coxph', mode='lambda.fraction') > testpred Call: coxph(formula = Surv(time, status) ~ x, method = object$method) coef exp(coef) se(coef) zp karno -0.00756 0.992 0.0364 -0.208 0.84 Likelihood ratio test=0.89 on 1 df, p=0.344 n= 7 > newd1 <- list(testall$x[1,]) > survtest <- survfit(testpred,newdata=newd1,individual=T) > survtest Call: survfit(formula = testpred, newdata = newd1, individual = T) records n.max n.start events median 0.95LCL 0.95UCL 7 0 0 0 0 0 0 > summary(survtest) Call: survfit(formula = testpred, newdata = newd1, individual = T) time n.risk n.event survival std.err lower 95% CI upper 95% CI > survtest$survival NULL * [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] survfit
Hello R helpers: *My first message didn't pass trough filter so here it's again* I would like to obtain probability of an event for one single patient as a function of time (from survfit.coxph) object, as I want to find what is the probability of an event say at 1 month and what is the probability of an event at 80 months and compare. So I tried the following but it fails miserably. I looked at some old posts but could not figure out the solution. Here's what I did where for one single patient, the answer is "NULL". Could anyone help me with the problem? R.app GUI 1.34 (5589 Leopard build 32-bit) > library(glmpath) *> dataall <- list(x=lung.data$x[1:130,], time=lung.data$time[1:130], status=lung.data$status[1:130])* * > fit.a <- coxpath(dataall) > testall <- list(x=lung.data$x[131:137,], time=lung.data$time[131:137], status=lung.data$status[131:137]) > testpred <- predict(fit.a, testall, s=0.99, type='coxph', mode='lambda.fraction') > testpred Call: coxph(formula = Surv(time, status) ~ x, method = object$method) coef exp(coef) se(coef) zp karno -0.00756 0.992 0.0364 -0.208 0.84 Likelihood ratio test=0.89 on 1 df, p=0.344 n= 7 > newd1 <- list(testall$x[1,]) > survtest <- survfit(testpred,newdata=newd1,individual=T) > survtest Call: survfit(formula = testpred, newdata = newd1, individual = T) records n.max n.start events median 0.95LCL 0.95UCL 7 0 0 0 0 0 0 > summary(survtest) Call: survfit(formula = testpred, newdata = newd1, individual = T) time n.risk n.event survival std.err lower 95% CI upper 95% CI > survtest$survival NULL *** [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.