Re: [R] help needed with printing multiple arguments as vectors, not matrices

2013-06-24 Thread Angel Russo
Awesome! Thanks so much for your help. I am able to get the format I was
looking for.


On Mon, Jun 24, 2013 at 2:50 PM, Adams, Jean  wrote:

> When sharing data, use dput() to output the data in a way that R-Help
> readers can easily use.
>
> Give this code a try and see if it helps.
>
> Jean
>
>
>
> # read in the example data
> mat1 <- structure(list("GENE SYMBOL" = c("ADCK2", "ADCK3", "ADCK4",
> "ADCK5",
>  "ADRBK1", "ADRBK2", "AKT1", "AKT2", "AKT3", "ALK"), Sample.A1.A0SK.01 =
> c(0L,
> 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A1.A0SO.01 = c(0L,
>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A1.A0SP.01 = c(0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A04P.01 = c(0L,
>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A04Q.01 = c(0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A04U.01 = c(0L,
>  1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), Sample.A2.A0CL.01 = c(0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A0CM.01 = c(0L,
>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A0D0.01 = c(0L,
> 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L), Sample.A2.A0D2.01 = c(1L,
>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A0ST.01 = c(0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Sample.A2.A0SX.01 = c(0L,
>  1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L), Sample.A2.A0T0.01 = c(0L,
> 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("GENE SYMBOL",
>  "Sample-A1-A0SK-01", "Sample-A1-A0SO-01", "Sample-A1-A0SP-01",
> "Sample-A2-A04P-01", "Sample-A2-A04Q-01", "Sample-A2-A04U-01",
>  "Sample-A2-A0CL-01", "Sample-A2-A0CM-01", "Sample-A2-A0D0-01",
> "Sample-A2-A0D2-01", "Sample-A2-A0ST-01", "Sample-A2-A0SX-01",
>  "Sample-A2-A0T0-01"), class = "data.frame", row.names = c(NA, -10L))
> mat2 <- mat1
>
> fish <- function(x, y) {
>  n00 = sum((1-x)*(1-y))
> n01 = sum((1-x)*y)
> n10 = sum(x*(1-y))
>  n11 = sum(x*y)
> a = matrix(c(n00, n01, n10, n11), nrow=2)
> pval = fisher.test(a)$p.value
>  return(c(n00, n01, n10, n11, pval))
> }
>
> results <- data.frame(matrix(NA, nrow=dim(mat1)[1]*dim(mat2)[1], ncol=7,
> dimnames=list(NULL, c("gene1", "gene2", "n00", "n01", "n10", "n11",
> "pval"
> count <- 0
> for(i in 1:dim(mat1)[1]) {
> for(j in 1:dim(mat2)[1]) {
> count <- count + 1
> results[count, "gene1"] <- mat1$"GENE SYMBOL"[i]
>  results[count, "gene2"] <- mat2$"GENE SYMBOL"[j]
> results[count, 3:7] <- fish(mat1[i, -1], mat2[j, -1])
>  }}
>
>
>
> On Mon, Jun 24, 2013 at 10:35 AM, Angel Russo wrote:
>
>> Sample data is as follows (for simplicity assume mat1 and mat2 are the
>> same matrices). Also attached as an excel file.
>>
>> I want to get the pairwise interaction fischer test results. Not just the
>> pvalues but also want to wrote the n00, n01, n10 and n11 in the file as:
>>
>>
>> ADCK2_mat1 ADCK3_mat2 n00 n01 n10 n11 pvalue
>>
>>
>>
>> ADCK2_mat1 ADCK4_mat2 n00 n01 n10 n11 pvalue
>>
>>  GENE SYMBOL Sample-A1-A0SK-01 Sample-A1-A0SO-01 Sample-A1-A0SP-01
>> Sample-A2-A04P-01 Sample-A2-A04Q-01 Sample-A2-A04U-01 Sample-A2-A0CL-01
>> Sample-A2-A0CM-01 Sample-A2-A0D0-01 Sample-A2-A0D2-01 Sample-A2-A0ST-01
>> Sample-A2-A0SX-01 Sample-A2-A0T0-01  ADCK2 0 0 0 0 0 0 0 0 0 1 0 0 0
>> ADCK3 0 0 0 0 0 1 0 0 1 0 0 1 0  ADCK4 0 0 0 0 0 0 0 0 0 0 0 1 0  ADCK5 1
>> 0 0 0 0 0 0 0 0 0 0 0 1  ADRBK1 0 0 0 0 0 0 0 0 0 0 0 0 0  ADRBK2 0 0 0 0
>> 0 1 0 0 0 0 0 0 0  AKT1 0 0 0 0 0 0 0 0 0 0 0 0 0  AKT2 0 0 0 0 0 0 0 0 0
>> 0 0 1 0  AKT3 0 0 0 0 0 0 0 0 1 0 0 1 0  ALK 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>>
>> On Mon, Jun 24, 2013 at 11:03 AM, Adams, Jean  wrote:
>>
>>> Could you provide an example of mat1 and mat2 as well as an example of
>>> the output you would like from them.
>>>
>>> Your code is very difficult to follow as written.  It will be easier for
>>> readers of the list to interpret if you use carriage returns rather than
>>> semi colons, for example ...
>>>
>>> fish <- function(x, y) {
>>> n00 = sum((1-x)*(1-y))
>>> n01 = sum((1-x)*y)
>>>  n10 = sum(x*(1-y))
>>>  n11 = sum(x*y)
>>> a = matrix(c(n00, n01, n10, n11), nrow=2)
>>> pval = fisher.test(a)$p.value
>>>  return(pval)
>>> }
>>>
>>> Jean
>>>
>>>
>>> On Mon, Jun 24, 2013 

Re: [R] help needed with printing multiple arguments as vectors, not matrices

2013-06-24 Thread Angel Russo
Sample data is as follows (for simplicity assume mat1 and mat2 are the same
matrices). Also attached as an excel file.

I want to get the pairwise interaction fischer test results. Not just the
pvalues but also want to wrote the n00, n01, n10 and n11 in the file as:


ADCK2_mat1 ADCK3_mat2 n00 n01 n10 n11 pvalue



ADCK2_mat1 ADCK4_mat2 n00 n01 n10 n11 pvalue

 GENE SYMBOL Sample-A1-A0SK-01 Sample-A1-A0SO-01 Sample-A1-A0SP-01
Sample-A2-A04P-01 Sample-A2-A04Q-01 Sample-A2-A04U-01 Sample-A2-A0CL-01
Sample-A2-A0CM-01 Sample-A2-A0D0-01 Sample-A2-A0D2-01 Sample-A2-A0ST-01
Sample-A2-A0SX-01 Sample-A2-A0T0-01  ADCK2 0 0 0 0 0 0 0 0 0 1 0 0 0  ADCK3
0 0 0 0 0 1 0 0 1 0 0 1 0  ADCK4 0 0 0 0 0 0 0 0 0 0 0 1 0  ADCK5 1 0 0 0 0
0 0 0 0 0 0 0 1  ADRBK1 0 0 0 0 0 0 0 0 0 0 0 0 0  ADRBK2 0 0 0 0 0 1 0 0 0
0 0 0 0  AKT1 0 0 0 0 0 0 0 0 0 0 0 0 0  AKT2 0 0 0 0 0 0 0 0 0 0 0 1 0
AKT3 0 0 0 0 0 0 0 0 1 0 0 1 0  ALK 0 0 0 0 0 0 0 0 0 0 0 0 0


On Mon, Jun 24, 2013 at 11:03 AM, Adams, Jean  wrote:

> Could you provide an example of mat1 and mat2 as well as an example of the
> output you would like from them.
>
> Your code is very difficult to follow as written.  It will be easier for
> readers of the list to interpret if you use carriage returns rather than
> semi colons, for example ...
>
> fish <- function(x, y) {
> n00 = sum((1-x)*(1-y))
> n01 = sum((1-x)*y)
> n10 = sum(x*(1-y))
>  n11 = sum(x*y)
> a = matrix(c(n00, n01, n10, n11), nrow=2)
> pval = fisher.test(a)$p.value
>  return(pval)
> }
>
> Jean
>
>
> On Mon, Jun 24, 2013 at 6:09 AM, Angel Russo wrote:
>
>> **
>>
>> I am using the following way to get p-values from fiser exact test.
>> However, I do need to print for each pair the values "n00, n01, n10, n11".
>> How can I print that as a table and not a matrix as below along with the
>> p-value? Any help will be greatly appreciated
>>
>>  fish <- function(y, x) {n00 = sum((1-x)*(1-y)); n01 = sum((1-x)*y);
>> n10 = sum(x*(1-y)); n11 = sum(x*y); a = matrix(c(n00, n01, n10, n11),
>> nrow = 2); pval = fisher.test(a)$p.value; return(pval);}
>>
>>  chiArray <- function(x) {  apply(mat1, 1, fish, x); }
>>  sapply(1:nrow(mat2), function(j){chiArray(mat2[j, ]);});
>>  chisq.cna.mut.test <- sapply(1:nrow(mat2), function(j){chiArray(mat2[j,
>> ]);});
>>
>> I want output to be:
>>
>> name1_mat1 name1_mat2 n00 n01 n10 n11 pvalue
>>
>>
>>
>> name1_mat1 name2_mat2 n00 n01 n10 n11 pvalue
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
GENE SYMBOL Sample-A1-A0SK-01   Sample-A1-A0SO-01   
Sample-A1-A0SP-01   Sample-A2-A04P-01   Sample-A2-A04Q-01   
Sample-A2-A04U-01   Sample-A2-A0CL-01   Sample-A2-A0CM-01   
Sample-A2-A0D0-01   Sample-A2-A0D2-01   Sample-A2-A0ST-01   
Sample-A2-A0SX-01   Sample-A2-A0T0-01
ADCK2   0   0   0   0   0   0   0   0   0   
1   0   0   0
ADCK3   0   0   0   0   0   1   0   0   1   
0   0   1   0
ADCK4   0   0   0   0   0   0   0   0   0   
0   0   1   0
ADCK5   1   0   0   0   0   0   0   0   0   
0   0   0   1
ADRBK1  0   0   0   0   0   0   0   0   0   
0   0   0   0
ADRBK2  0   0   0   0   0   1   0   0   0   
0   0   0   0
AKT10   0   0   0   0   0   0   0   0   
0   0   0   0
AKT20   0   0   0   0   0   0   0   0   
0   0   1   0
AKT30   0   0   0   0   0   0   0   1   
0   0   1   0
ALK 0   0   0   0   0   0   0   0   0   
0   0   0   0
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] help needed with printing multiple arguments as vectors, not matrices

2013-06-24 Thread Angel Russo
**

I am using the following way to get p-values from fiser exact test.
However, I do need to print for each pair the values "n00, n01, n10, n11".
How can I print that as a table and not a matrix as below along with the
p-value? Any help will be greatly appreciated

 fish <- function(y, x) {n00 = sum((1-x)*(1-y)); n01 = sum((1-x)*y);
n10 = sum(x*(1-y)); n11 = sum(x*y); a = matrix(c(n00, n01, n10, n11),
nrow = 2); pval = fisher.test(a)$p.value; return(pval);}

 chiArray <- function(x) {  apply(mat1, 1, fish, x); }
 sapply(1:nrow(mat2), function(j){chiArray(mat2[j, ]);});
 chisq.cna.mut.test <- sapply(1:nrow(mat2), function(j){chiArray(mat2[j, ]);});

I want output to be:

name1_mat1 name1_mat2 n00 n01 n10 n11 pvalue



name1_mat1 name2_mat2 n00 n01 n10 n11 pvalue

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem with geom_point in ggplot using a different column

2013-04-25 Thread Angel Russo
I want to draw boxplot where the geom_points are displayed based on
"ERBB2.MUT" subset and they should be displayed in the right box (based
both on the "ERBB2.2064" field and "ERBB2_Status").

However, given my command I currently only see "red" points corresponding
to "MUT" subset in one straight line corresponding to only "ERBB2.2064"
stratification on x-axis. It dosen't take into account the "ERBB2.Status"
stratification. Can anyone help me?

Call ERBB2|2064 ERBB2_Status ERBB2-MUT
A 7.214E-01 CHANGE MUT
B -4.208E-02 NEUTRAL MUT
D 1.080E+00 NEUTRAL MUT
C 2.347E-01 NEUTRAL MUT

ggplot(data=testdata, aes(x=Call, y=ERBB2.2064)) +
geom_boxplot(aes(fill=ERBB2_Status),width=0.8)+theme_bw()+geom_point(data=subset(testdata,ERBB2.MUT=="MUT"),aes(shape=Call,color="Red"))

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Query regarding SVD of binary matrix:

2012-06-07 Thread Angel Russo
Hello,

I have a binary matrix of 80k sets (sets comprising of combination of
cities) by 885 cities
(dimension = 80k x 885). For matrix, 1 means city is a part of the set and
0 means the city is not part of the set.

Sets are rows and cities are columns (city.test).

I want to do feature reduction to only keep important sets (most likely
2-10 sets of city combinations) and the associated cities. So I chose SVD
and I am following these steps but not sure how to go about the next step.
Could anyone help with this?

s <- svd(city.test)
D <- diag(s$d)
d2 <- (s$d)^2
ratio <- cumsum(d2/dum(d2))   # proportion of total variance from 885 PCs.

and looking at the plots, I see about first ~10 or 20 PCs explain the most
variation (Please see attatched plot). How do I use this to extract the
most relevant sets from my original matrix? COuld you please help.

A friend of mine recommended plotting: rowSums(abs(s$u*s$d)) and choosing
only the highest magnitude sets. I didn't understand the significance of
it. Most probably, it reflects that only the first PC contributes the most,
hence we only care about rowsum(abs(u*d)). Is this correct?

Thanks.


variance-cities.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Assigning genes to CBS segmented output:

2011-10-04 Thread Angel Russo
Thanks co much Martin for your bug reply.

I will follow up with it.

Best,
Angel

On Tue, Oct 4, 2011 at 7:35 PM, Martin Morgan  wrote:

> On 10/04/2011 02:44 PM, Angel Russo wrote:
>
>> Hi All,
>>
>> I have an CBS segmentation algorithm output for 10 tumor samples each from
>> 2
>> different tumors.
>>
>> Now, I am in an urgent need to assign gene (followed by all genes present)
>> that belong to a particular segment after I removed all the CNVs from
>> segment data. The format of the data is:
>>
>> Sample  Chromosome  Start   End Num_Probes  Segment_Mean
>> Sample1A-TA  1   51598 76187   15  -1.115
>>
>
> Hi Angel -- In Bioconductor
>
>  http://bioconductor.org
>
> for some model organism create a data frame of known Entrez genes and their
> begin / end locations. Start by installing necessary software and data
> packages
>
>  
> source('http://bioconductor.**org/biocLite.R<http://bioconductor.org/biocLite.R>
> ")
>  biocLite(c('org.Hs.eg.db', "GenomicRanges'))
>
> then load the library with annotations about genic coordinates
>
>  library(org.Hs.eg.db)
>  anno = merge(toTable(org.Hs.egCHRLOC)**, toTable(org.Hs.egCHRLOCEND))
>
> leading to
>
> > head(anno)
>gene_id Chromosome start_location end_location
> 1 1  1 -243666483   -244006553
> 2 1  1 -243666483   -244006553
> 3 1  1 -243651534   -244006553
> 4 1  1 -243651534   -244006553
> 5 18586  X   49217770 49223847
> 6 18586  X   49217770 49332715
>
> For the simple question 'which genes are located on chromosome A starting
> at X and going to Y' you could
>
>  subset(geno, Chromosome=="A" &
> abs(start_location) > X &
>   abs(end_location) < Y)
>
> This could also be done through the 'biomaRt' package or GenomicFeatures /
> TxDb packages. To get this for many segments filter 'anno' to remove funky
> genes, e.g., those that have negative length(!)
>
>  idx = with(anno, abs(start_location) > abs(end_location))
>  anno = anno[!idx,]
>
> manipulate this to a GRanges object;
>
>  library(GenomicRanges)
>  gr = with(anno, GRanges(Chromosome,
>IRanges(abs(start_location), abs(end_location)),
>names=gene_id))
>
> convert your CBS result into a GRanges
>
>  seg = with(CBS, GRanges(Chromosome, IRanges(Start, End)))
>
> then find overlaps
>
>  olap = findOverlaps(gr, seg)
>
> the 'gr' is called the 'query', 'seg' is called the 'subject'.
> queryHits(olap) and subjectHits(olap) give equal-length vectors describing
> which queries overlap which subjects. You could group gene names by segment
> with
>
>  split(names(gr)[queryHits(**olap)], subjectHits(olap))
>
> An important issue is to use the same genome build for annotations as you
> used for segmentation. Hope that helps / provides some hints for getting
> from A to B.
>
> Martin
>
>
>> Could anyone suggest an R library or code or method that I can quickly use
>> to get the genes assigned to CBS output.
>>
>> Thanks so much,
>> Angel
>>
>>[[alternative HTML version deleted]]
>>
>> __**
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help>
>> PLEASE do read the posting guide http://www.R-project.org/**
>> posting-guide.html <http://www.R-project.org/posting-guide.html>
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Assigning genes to CBS segmented output:

2011-10-04 Thread Angel Russo
Hi All,

I have an CBS segmentation algorithm output for 10 tumor samples each from 2
different tumors.

Now, I am in an urgent need to assign gene (followed by all genes present)
that belong to a particular segment after I removed all the CNVs from
segment data. The format of the data is:

Sample  Chromosome  Start   End Num_Probes  Segment_Mean
Sample1A-TA  1   51598   76187   15  -1.115

Could anyone suggest an R library or code or method that I can quickly use
to get the genes assigned to CBS output.

Thanks so much,
Angel

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Hazard ratio calculation and KM plot p-value:

2011-04-05 Thread Angel Russo
I have two questions:

1) Can anyone provide me with a reference regarding calculation of Hazard
ratio for two groups of data. How is it being manually calculated with an
example. Unlike median time ratio which is the ratio of median times in two
groups, at what time is the hazard ratio calculation done?

2) In kaplan-meier statistics of stratifying into two groups, a p-value is
often calculated (e.g. log-rank p-value). p-value is the test is how
significant the separation is between two groups compared to random. What is
"random" in kaplan-meier statistics. How is p-value calculated?

Thanks for everyone's time to read and hopefully respond as well.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pairwise p-values in KM plots?

2011-03-13 Thread Angel Russo
How can I compute pairwise p-values in Kaplan-meier plots for three or more
groups?

 bin.1<-cut(score,c(-1000,-1,1,1000),c("low","intermediate","high"))

I use "km.coxph.plot" currently which reports one p-value.

Thanks very much.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using pre-calculated coefficients and LP in coxph()?

2011-03-13 Thread Angel Russo
Thanks very much Dimitrius and David.

I want to compute CPE using pre-calculated beta-model and linear.predictor
using the following code. I hope it the code is OK. Let me know. I am also
doing some sanity checks.

testc$x <- scores
testc$y <- Surv(testdata$time,testdata$status)
testfit <- coxph(y ~ x, testc, iter.max=0, init=1)
cpe=phcpe(testfit)

Thanks very much again.

-

On Sun, Mar 13, 2011 at 2:28 PM, David Winsemius wrote:

>
> On Mar 13, 2011, at 1:32 PM, Dimitris Rizopoulos wrote:
>
>  probably you want to use the 'init' argument and 'iter.max'
>> control-argument of coxph(). For example, for the Lung dataset, we fix the
>> coefficients of age and ph.karno at 0.05 and -0.05, respectively:
>>
>> library(survival)
>>
>> coxph(Surv(time, status) ~ age + ph.karno, data = lung,
>>   init = c(0.05, -0.05), iter.max = 0)
>>
>
>
>>
>> I hope it helps.
>>
>> Best,
>> Dimitris
>>
>>
>> On 3/13/2011 6:08 PM, Angel Russo wrote:
>>
>>> I need to force a coxph() function in R to use a pre-calculated set of
>>> beta
>>> coefficients of a gene signature consisting of xx genes and the gene
>>> expression is also provided of those xx genes.
>>>
>>
> I would have guessed (and that is all one can do without an example and
> better description of what the setting and goal might be) that the use of
> the offset capablity in coxph might be needed.
>
> --
> David.
>
>
>>> If I try to use "coxph()" function in R using just the gene expression
>>> data
>>> alone, the beta coefficients and coxph$linear.predictors will change and
>>> I
>>> need to use the pre-calcuated linear predictor not re-computed using
>>> coxph()
>>> function. The reason is I need to compute a quantity that uses as it's
>>> input
>>> the coxph() output but I need this output to be pre-calculated
>>> beta-coefficients and linear.predictor.
>>>
>>> Any one can show me how to do this in R?
>>>
>>> Thanks a lot.
>>>
>>
> David Winsemius, MD
> West Hartford, CT
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] using pre-calculated coefficients and LP in coxph()?

2011-03-13 Thread Angel Russo
I need to force a coxph() function in R to use a pre-calculated set of beta
coefficients of a gene signature consisting of xx genes and the gene
expression is also provided of those xx genes.

If I try to use "coxph()" function in R using just the gene expression data
alone, the beta coefficients and coxph$linear.predictors will change and I
need to use the pre-calcuated linear predictor not re-computed using coxph()
function. The reason is I need to compute a quantity that uses as it's input
the coxph() output but I need this output to be pre-calculated
beta-coefficients and linear.predictor.

Any one can show me how to do this in R?

Thanks a lot.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ROC from R-SVM?

2011-02-22 Thread Angel Russo
Hi Max and Andrew,

Thanks so much for your reply. Indeed I found your link last night using
steps shown below.

My first question is if the following two steps are right and AUC is 51% as
shown below.

My seond question is that currently I am using cost parameter=1 (the default
in R-SVM; http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html). To
improve the AUC from ROC curve, can I can optimize the SVM cost function
(instead of keeping it fixed a 1) to get the mimimum LOO error in training
cross.validation and then draw the ROC from decision.values as Step2 below
using a cost parameter that gave mimimum cross.validation error in the
training data. Is that right?

Many thanks.
--

*Step 1: For obtaining ROC curve of test data I turned on "prob=T" option:*

> svmres.prob <- svm(traindx[,resrsvm$SelInd], as.factor(traindy),
decision.values = TRUE)
> svmpred.prob <- predict(svmres.prob, testdx[,resrsvm$SelInd],
decision.values = TRUE)
> print(confusionMatrix(svmpred.prob,testdy))
Confusion Matrix and Statistics

   Reference
Prediction  Resistant Sensitive
  Resistant 513
  Sensitive3788

 Accuracy : 0.6503


*Step 2: Actual ROC plot command using output from above and plot attached
as well as pdf (I am assuming the following says the AUC is 51.4):*
> library(ROCR)
> svm.roc <- prediction(attributes(svmpred.prob)$decision.values, testdy)
> svm.auc <- performance(svm.roc, 'tpr', 'fpr')
> aucsvm <- performance(svm.roc, 'auc')
> pdf(file="roc_curve_rsvm_decval.pdf")
> plot(svm.auc)
> print(str(aucsvm))
> print(str(aucsvm))
Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name  : chr "None"
  ..@ y.name  : chr "Area under the ROC curve"
  ..@ alpha.name  : chr "none"
  ..@ x.values: list()
  ..@ y.values:List of 1
  .. ..$ : num 0.514
  ..@ alpha.values: list()

---



On Tue, Feb 22, 2011 at 4:23 PM, Andrew Ziem  wrote:

> In addition's to Max's suggestion about caret, look at ROCR which
> visualizes ROC charts for any binary classifier.  I have an example of
> e1071::SVN and ROCR here
>
>
> https://heuristically.wordpress.com/2009/12/23/compare-performance-machine-learning-classifiers-r/
>
>
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Angel Russo
> Sent: Monday, February 21, 2011 3:34 PM
> To: r-help@r-project.org
> Subject: [R] ROC from R-SVM?
>
> *Hi,
>
> *Does anyone know how can I show an *ROC curve for R-SVM*? I understand in
> R-SVM we are not optimizing over SVM cost parameter. Any example ROC for
> R-SVM code or guidance can be really useful.
>
> Thanks, Angel.
>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] why no "probabilities" from svm.predict?

2011-02-21 Thread Angel Russo
> library(ROCR)
> library(e1071)
svmres.prob <- svm(traindx, traindy, probability=TRUE)
svmpred.prob <- predict(svmres.prob, testdx, probability = TRUE,
decision.values = TRUE, type="prob")

> print(length(attr(svmpred.prob, "probabilities")))
[1] 0
> print(attr(svmpred.prob, "probabilities"))
NULL
> print(attributes(svmpred.prob)$decision.values)
  /
1   -0.54976883
2   -0.19255459
3   -0.05190025

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ROC from R-SVM?

2011-02-21 Thread Angel Russo
*Hi,

*Does anyone know how can I show an *ROC curve for R-SVM*? I understand in
R-SVM we are not optimizing over SVM cost parameter. Any example ROC for
R-SVM code or guidance can be really useful.

Thanks, Angel.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in R-SVM help:

2011-02-20 Thread Angel Russo
Greetings:

I am trying to use your R code for R-SVM as follows. Why it dosen't print
the LOO.error and the list of features?

http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html

My training data as follows contains 142 cases and 264 features. instead I
get en error as below "invalid 'digits' argument". Is a working example
available anywhere using R-code of R-SVM?

Thanks very much.

> source("R-SVM.R")
> data<-ReadSVMdata("train.txt")
> len<-length(data$y)
> print(data$y)
  [1] -1 -1  1 -1 -1  1  1  1 -1 -1 -1 -1 -1 -1  1  1  1 -1 -1  1 -1  1  1
1  1
 [26]  1  1  1  1 -1  1 -1  1  1 -1 -1  1 -1 -1  1 -1 -1 -1  1  1  1  1 -1
-1 -1
 [51] -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1  1 -1 -1 -1 -1  1
-1  1
 [76] -1 -1  1 -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1  1
1 -1
[101] -1  1 -1 -1 -1 -1 -1  1 -1  1  1 -1 -1 -1 -1  1  1 -1 -1  1 -1 -1 -1
-1 -1
[126]  1 -1  1 -1  1  1 -1 -1 -1  1  1  1 -1 -1 -1 -1  1
> print(dim(data$x))
[1] 142 264
> x<-data$x
> y<-data$y
> ladder<-CreatLadder(len)
> RSVMsum <- RSVM(x,y,ladder,"LOO")
> SummaryRSVM(RSVMsum)
*Error in print.default("MinCV error of", min(RSVMres$Error), "at",
MinLevel,  :
  invalid 'digits' argument
Calls: SummaryRSVM -> print -> print.default
Execution halted*



I only modified the ReadSVMdata function

ReadSVMdata <- function(filename)
{
dd <- read.table( filename, header=TRUE,sep="\t")
x <- as.matrix( dd[, 14:(ncol(dd))] )
class <- as.numeric(dd[,2])
class[class==2] <- -1

y <- class
ret <- list(x=x, y=y)
}

and un-commented this line "print("MinCV error of", min(RSVMres$Error),
"at", MinLevel, "genes" )" in function Summary(RSVM).

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] list of features from svmpath?

2011-02-20 Thread Angel Russo
How can I get the list of non-zero features from svmpath at any given
lambda? All I get is following information and information about what
features were selected. In Iris example, we have 4 features and 60 cases. In
my own example which is 200cases by 300 features, I can't figure out how to
print the list of non-zero (beta) or features from svmpath.

Any help? Thanks.




> solution <- svmpath(as.matrix(iris[1:60,1:4]), buffer, type="class",
trace=FALSE)

> names(solution)
 [1] "alpha""alpha0"   "lambda"   "alpha00"
"Error""SumEps"
 [7] "Size.Elbow"   "Elbow""Moveto"   "Movefrom"
"Obs.step" "Step"
[13] "kernel"   "param.kernel" "x""y""call"


> solution$alpha0
 [1] 130.785000 108.685000 106.21  86.13  85.485000  71.176837
68.197549  67.660686
 [9]  60.847899  58.442887  53.721098  53.544146  44.61  44.525517
44.415343  39.709048
[17]  38.832267  36.940401  33.800161  28.115484  25.859545  25.734545
24.818367  19.666905
[25]  18.833000  17.768614  17.219663  16.737715  15.503339  14.717148
8.586288
> solution$Obs.step
 [1] 27 53 27 53 32 51 32 51 26 55  6 55 57  6 44 57 59 44 19 44 59 52 19 44
21 52 56 21 52 42 26
[32] 56 45 52 54 42

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] hazard.ratio command:

2011-02-14 Thread Angel Russo
Hi All,

In absence of any reply, I am posting a slightly modified question. What is
"x" in hazard.ratio in the command below?

example(hazard.ratio)

binscores<-cut(scores.train,c(-1000,-1,1,1000),c("low","intermediate","high"))

*

hazard.ratio(x=?, surv.time=train$ProgFreeSurv,
surv.event=train$PfSStatus, strat=binscores)

*

Input to "x" is ("x a vector of risk predictions"), what does it mean?
I have computed scores (binscore) above from a 100-gene signature I
have derived from an expression data. I need to compute hazard rations
of the groups (low and high) defined by me using "cut" command above.

Thanks, Angel.

--

I am interested in calculating hazard ratio of the stratified groups defined
by me.

library(survcomp)
binscores <-
cut(scores.train,c(-1000,-1,1,1000),c("low","intermediate","high"))

dd <- data.frame("surv.time"=OS, "surv.event"= status, "strat"=binscores)
km.coxph.plot(formula.s=Surv(surv.time, surv.event) ~ strat)

How do I calculate hazard ratio between the groups defined by me as above?
Thanks very much,
Angel

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] hazard ratio between my groups

2011-02-10 Thread Angel Russo
HI,

I am interested in calculating hazard ratio of the stratified groups defined
by me.

library(survcomp)
binscores <-
cut(scores.train,c(-1000,-1,1,1000),c("low","intermediate","high"))

dd <- data.frame("surv.time"=OS, "surv.event"= status, "strat"=binscores)
km.coxph.plot(formula.s=Surv(surv.time, surv.event) ~ strat)

How do I calculate hazard ratio between the groups defined by me as above?
Thanks very much,
Angel

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Urgent help requested using survfit(individual=T):

2010-12-14 Thread Angel Russo
Hello:

I would like to obtain probability of an event for one single patient as a
function of time (from survfit.coxph) object, as I want to find what is the
probability of an event say at 1 month and what is the probability of an
event at 80 months and compare. So I tried the following but it fails
miserably. I looked at some old posts but could not figure out the solution.
Here's what I did where for one single patient, the answer is "NULL". Could
anyone help me with the problem?

R.app GUI 1.34 (5589 Leopard build 32-bit)

> library(glmpath)
*> dataall <- list(x=lung.data$x[1:130,], time=lung.data$time[1:130],
status=lung.data$status[1:130])*
*

> fit.a <- coxpath(dataall)

> testall <- list(x=lung.data$x[131:137,], time=lung.data$time[131:137],
status=lung.data$status[131:137])

> testpred <- predict(fit.a, testall, s=0.99, type='coxph',
mode='lambda.fraction')


> testpred

Call:

coxph(formula = Surv(time, status) ~ x, method = object$method)

  coef exp(coef) se(coef)  zp

karno -0.00756 0.992   0.0364 -0.208 0.84

Likelihood ratio test=0.89  on 1 df, p=0.344  n= 7

> newd1 <- list(testall$x[1,])

> survtest <- survfit(testpred,newdata=newd1,individual=T)

> survtest

Call: survfit(formula = testpred, newdata = newd1, individual = T)


records   n.max n.start  events  median 0.95LCL 0.95UCL

  7   0   0   0   0   0   0

> summary(survtest)

Call: survfit(formula = testpred, newdata = newd1, individual = T)

  time n.risk n.event survival std.err lower 95% CI upper 95% CI

  > survtest$survival



NULL


*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survfit

2010-12-14 Thread Angel Russo
Hello R helpers:

*My first message didn't pass trough filter so here it's again*

I would like to obtain probability of an event for one single patient as a
function of time (from survfit.coxph) object, as I want to find what is the
probability of an event say at 1 month and what is the probability of an
event at 80 months and compare. So I tried the following but it fails
miserably. I looked at some old posts but could not figure out the solution.
Here's what I did where for one single patient, the answer is "NULL". Could
anyone help me with the problem?

R.app GUI 1.34 (5589 Leopard build 32-bit)

> library(glmpath)
*> dataall <- list(x=lung.data$x[1:130,], time=lung.data$time[1:130],
status=lung.data$status[1:130])*
*

> fit.a <- coxpath(dataall)

> testall <- list(x=lung.data$x[131:137,], time=lung.data$time[131:137],
status=lung.data$status[131:137])

> testpred <- predict(fit.a, testall, s=0.99, type='coxph',
mode='lambda.fraction')


> testpred

Call:

coxph(formula = Surv(time, status) ~ x, method = object$method)

  coef exp(coef) se(coef)  zp

karno -0.00756 0.992   0.0364 -0.208 0.84

Likelihood ratio test=0.89  on 1 df, p=0.344  n= 7

> newd1 <- list(testall$x[1,])

> survtest <- survfit(testpred,newdata=newd1,individual=T)

> survtest

Call: survfit(formula = testpred, newdata = newd1, individual = T)


records   n.max n.start  events  median 0.95LCL 0.95UCL

  7   0   0   0   0   0   0

> summary(survtest)

Call: survfit(formula = testpred, newdata = newd1, individual = T)

time n.risk n.event survival std.err lower 95% CI upper 95% CI

> survtest$survival




NULL


***

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.