[R] reading a dense file of binary numbers

2011-10-18 Thread Brian Tsai
Hi all,

I have a file in the following format that I want to read into a matrix:

010101001110101
10101001010
01001010010
...

It has no headers or row names.

I tried to use read.table(), but it doesn't allow me to specify an empty
string as the column separator (specifying sep='' means whitespace for that
function).  read.fwf() doesn't seem appropriate either.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] background normalization in rma() in the affy package

2011-10-16 Thread Brian Tsai
Hi,

I was looking into the documentation for the rma() function in the affy
package, and was trying to figure out how exactly the background
normalization is done.  I read all three papers cited in the rma()
documentation, but the most detailed explanation I could find was in Irizarry
et al., 2003, where they state that they compute

B(PM_{ijn})  = E[s_{ijn}  | PM_{ijn}]

where s_{ijn} is assumed to be exponential, and bg_{ijn} is normal.

I still don't understand what value is being computed here, nor am I
clear on what the correction looks like, i.e. if s_{ijn} is an
exponentially distributed random variable, how does bg_{ijn} fit into this?

thanks!
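
A rough sketch of what that expectation looks like, assuming the additive model PM = bg + s with s ~ Exp(rate alpha) and bg ~ N(mu, sigma^2) independent of s. Under that assumption, s given PM is a normal with mean a = PM - mu - alpha * sigma^2 and standard deviation sigma, truncated to s > 0, so its mean has a closed form. The function name and the parameter values below are hypothetical, the upper truncation (bg > 0) is ignored, and this is not taken from the affy source.

```r
# E[s | PM = pm] when PM = bg + s, s ~ Exp(rate = alpha), bg ~ N(mu, sigma^2).
# Given PM, s is N(a, sigma^2) truncated to s > 0, with
# a = pm - mu - alpha * sigma^2 (obtained by completing the square).
expected_signal <- function(pm, mu, sigma, alpha) {
  a <- pm - mu - alpha * sigma^2
  # Mean of a normal truncated below at 0.
  a + sigma * dnorm(a / sigma) / pnorm(a / sigma)
}

# Hypothetical parameter values; in practice mu, sigma and alpha
# would be estimated from the observed PM intensities.
pm <- c(90, 120, 200, 1000)
expected_signal(pm, mu = 100, sigma = 20, alpha = 0.01)
```

So bg enters through mu and sigma: the corrected value is roughly PM minus the mean background for bright probes, and it stays positive even when PM is below mu.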



[R] p values in coxph()

2011-09-22 Thread Brian Tsai
Hi,

I'm interested in building a Cox PH model for survival modeling, using 2
covariates (x1 and x2).   x1 represents a 'baseline' covariate, whereas x2
represents a 'new' covariate, and my goal is to figure out whether x2 adds
significant predictive information over x1.

Ideally, I could get a p-value for doing this.  Originally, I thought of
doing some kind of likelihood ratio test (LRT), where I measure the
(partial) likelihood of the model with just x1, then with x1 and x2, so that
it becomes an LRT with 1 degree of freedom.  But when I use the summary()
function for coxph(), I get the following output (shown at the bottom).

I have two questions:

1) What exactly do the p-values in the Pr(>|z|) column represent?  I
understand that the coefficients have standard errors, etc., but I'm not
sure how the p-value there is calculated.

2) At the bottom, where it shows the results of an LRT with 2 df, I don't
quite understand what model the ratio is being tested against.  If the
current model has two variables (x1 and x2), and those are the extra degrees
of freedom, then the baseline should have 0 variables, but is that really
a Cox model?

thanks for any help.

Brian


> summary(coxph(Surv(myTime,Event)~x1+x2))
Call:
coxph(formula = Surv(myTime, Event) ~ x1 + x2)

  n= 211

      coef exp(coef) se(coef)     z Pr(>|z|)
x1 0.03594   1.03660  0.17738 0.203  0.83942
x2 0.53829   1.71308  0.17775 3.028  0.00246 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

   exp(coef) exp(-coef) lower .95 upper .95
x1     1.037     0.9647    0.7322     1.468
x2     1.713     0.5837    1.2091     2.427

Rsquare= 0.111   (max possible= 0.975 )
Likelihood ratio test= 21.95  on 2 df,   p=1.714e-05
Wald test= 20.29  on 2 df,   p=3.924e-05
Score (logrank) test = 22.46  on 2 df,   p=1.328e-05
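
A sketch of how these numbers can be reproduced, using simulated stand-in data (the variable names mirror the post, the data are made up). It assumes the per-coefficient p-values are two-sided Wald tests and that the overall LRT compares against the model with all coefficients fixed at 0, which is still a valid partial likelihood. As a check against the posted output, 2 * pnorm(-3.028) is about 0.0025.

```r
library(survival)  # recommended package, shipped with standard R installs

# Simulated stand-in data (hypothetical); names mirror the post.
set.seed(1)
n <- 211
x1 <- rnorm(n)
x2 <- rnorm(n)
myTime <- rexp(n, rate = exp(0.5 * x2))
Event <- rbinom(n, 1, 0.8)

fit1 <- coxph(Surv(myTime, Event) ~ x1)
fit2 <- coxph(Surv(myTime, Event) ~ x1 + x2)

# (1) Wald test per coefficient: z = coef / se(coef), two-sided normal p.
z <- coef(fit2) / sqrt(diag(vcov(fit2)))
p_wald <- 2 * pnorm(-abs(z))

# (2) Overall 2-df LRT: fitted model vs. all coefficients equal to 0.
#     fit2$loglik holds c(log-likelihood at 0, log-likelihood at the fit).
lrt_overall <- 2 * (fit2$loglik[2] - fit2$loglik[1])

# The 1-df LRT from the question: does x2 add to x1?
lrt_x2 <- 2 * (fit2$loglik[2] - fit1$loglik[2])
p_x2 <- pchisq(lrt_x2, df = 1, lower.tail = FALSE)

# The same nested comparison, as reported by the survival package.
anova(fit1, fit2)
```

With a single added covariate, the Wald p-value for x2 and the 1-df LRT p-value usually agree closely, but they are different tests.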



[R] R sign test for censored data

2011-07-16 Thread Brian Tsai
Does anyone know of a test implemented in R that works like a sign test
for a difference of medians, but that can handle censored data?

Thanks!



[R] function to compute pvalue for comparing two ROC curves

2011-07-05 Thread Brian Tsai
Hi,

I'm looking to compare two area-under-ROC (AUC) values for different
classifiers on the same data.  Is there an R function to do this?  Thanks!



[R] comparing hazard ratios

2011-07-02 Thread Brian Tsai
Hi,

I'm looking for a package to compare two hazard ratios (and assess
statistical significance) obtained from two different predictive models.  I
know of the hr.comp2 function from the survcomp package, but was wondering
if there are any other packages out there.

Thanks!



Re: [R] BFGS versus L-BFGS-B

2011-02-25 Thread Brian Tsai
Hi John,

Thanks so much for the informative reply!  I'm currently trying to optimize
~10,000 parameters simultaneously.  When I compare the memory usage of
L-BFGS-B and BFGS with all default input parameters, L-BFGS-B uses only
about 1/7 of the memory.  I'm a bit surprised that it isn't a lot less, but
BFGS is definitely converging a lot more slowly.

My other question is that L-BFGS-B is returning 'non-finite' errors with
respect to the gradient function I'm supplying.  Again, all the parameters
I'm optimizing need to be non-negative (so I'm optimizing the log of the
parameters), but the gradient at some point divides by each parameter, so
when some of the parameters go to 0, the gradient becomes infinite.  Do you
(or anyone else) have any suggestions for how to prevent this?  Is the only
way to force the parameters to be larger than some number close to 0
(e.g. 1e-10), or to modify the gradient function to set the entry of
small parameters to 0?

Thanks!

Brian.
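
One thing that may help with the non-finite gradient, sketched on a toy problem: if the optimizer works on y = log(x), the chain rule gives dg/dy = x * df/dx, so a 1/x term in df/dx is multiplied away and the gradient in y stays finite. The objective f(x) = sum(x - target * log(x)) and the vector `target` are hypothetical, chosen only because the gradient divides by x. The alternative of a small positive lower bound in L-BFGS-B is shown alongside.

```r
# Hypothetical objective whose gradient divides by the parameter:
# f(x) = sum(x - target * log(x)), minimized at x = target.
target <- c(0.5, 2, 10)
f      <- function(x) sum(x - target * log(x))
grad_x <- function(x) 1 - target / x        # blows up as x -> 0

# Option A: keep x as the parameter, bounded away from 0.
fit_box <- optim(rep(1, 3), f, grad_x, method = "L-BFGS-B", lower = 1e-10)

# Option B: optimize y = log(x). Chain rule: dg/dy = x * df/dx,
# which simplifies to x - target here and never divides by x.
f_log    <- function(y) f(exp(y))
grad_log <- function(y) exp(y) - target
fit_log  <- optim(rep(0, 3), f_log, grad_log, method = "BFGS")

fit_box$par        # estimates on the original scale
exp(fit_log$par)   # back-transformed estimates
```

Whether the simplification in option B carries over depends on the real objective, but writing the gradient with respect to the log-parameters directly is worth checking before clamping anything.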


On Fri, Feb 25, 2011 at 10:51 AM, Prof. John C Nash nas...@uottawa.ca wrote:

 There are considerable differences between the algorithms. And BFGS is an
 unfortunate nomenclature, since there are so many variants that are VERY
 different. It was called "variable metric" in my book, from which the code
 was derived, and that code was from Roger Fletcher's Fortran VM code based
 on his 1970 paper. L-BFGS-B is a later and more complicated algorithm with
 some pretty nice properties. The code is much larger.

 Re: "less memory" -- this will depend on the number of parameters, but to
 my knowledge there are no good benchmark studies of memory and performance.
 Perhaps someone wants to propose one for Google Summer of Code (see
 http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2011 ).

 The optimx package can call Rvmmin, which has box constraints (also Rcgmin,
 which is intended for very low memory). Also several other methods with box
 constraints, including L-BFGS-B. Worth a try if you are seeking a method
 for multiple production runs. Unfortunately, we seem to have some CRAN
 check errors on Solaris and some old releases -- platforms I do not have --
 so it may be a few days or more until we sort out the issues, which seem to
 be related to alignment of the underlying packages for which optimx is a
 wrapper.

 Use of transformation can be very effective. But again, I don't think there
 are good studies on whether use of box constraints or transformations is
 better and when. Another project, which I have made some tentative
 beginnings to carry out. Collaborations welcome.

 Best,

 JN




[R] BFGS versus L-BFGS-B

2011-02-24 Thread Brian Tsai
Hi all,

I'm trying to figure out what the effective differences between BFGS and
L-BFGS-B are, besides the obvious ones: L-BFGS-B should use a lot less
memory, and the user can provide box constraints.

1) Why would you ever want to use BFGS, if L-BFGS-B does the same thing but
uses less memory?

2) If I'm optimizing with respect to a variable x that must be non-negative,
a common approach is to do a change of variables x = exp(y), and optimize
unconstrained with respect to y.  Is optimization using box constraints on
x likely to produce as good a result as unconstrained optimization on y?

- Brian.
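
A small illustration of question 2, on a made-up quadratic whose constrained optimum lies on the boundary for one coordinate. It suggests one practical difference: with box constraints the solution can sit exactly at x = 0, while with x = exp(y) it can only approach 0 as y decreases. The objective and starting values are hypothetical, and one toy problem says nothing general about which approach is better.

```r
# Hypothetical objective: unconstrained minimum at (-1, 2), so with
# x >= 0 the constrained minimum is at (0, 2).
f  <- function(x) sum((x - c(-1, 2))^2)
gr <- function(x) 2 * (x - c(-1, 2))

# Box constraints on x directly.
box <- optim(c(1, 1), f, gr, method = "L-BFGS-B", lower = 0)

# Change of variables x = exp(y), unconstrained in y.
f_y  <- function(y) f(exp(y))
gr_y <- function(y) gr(exp(y)) * exp(y)   # chain rule
unc  <- optim(c(0, 0), f_y, gr_y, method = "BFGS")

box$par        # first coordinate on the bound
exp(unc$par)   # first coordinate small but strictly positive
```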



[R] cv.glmnet errors

2011-02-17 Thread Brian Tsai
Hi,

I am trying to do multinomial regression using the glmnet package, but the
following gives me an error (for no reason apparent to me):

library(glmnet)
cv.glmnet(x = matrix(c(1,2,3,4,5,6,1,2,3,4,5,6), nrow = 6),
          y = as.factor(c(1,2,1,2,3,3)),
          family = 'multinomial', alpha = 0.5, nfolds = 2)

The error I get is:
Error in if (outlist$msg != "Unknown error") return(outlist) :
  argument is of length zero


If I change the number of folds to 1, I get a segfault:
 *** caught segfault ***
address 0x0, cause 'memory not mapped'

Traceback:
 1: .Fortran("lognet", parm = alpha, nobs, nvars, nc, as.double(x), y,
        offset, jd, vp, ne, nx, nlam, flmin, ulam, thresh, isd, maxit, kopt,
        lmu = integer(1), a0 = double(nlam * nc), ca = double(nx * nlam * nc),
        ia = integer(nx), nin = integer(nlam), nulldev = double(1),
        dev = double(nlam), alm = double(nlam), nlp = integer(1),
        jerr = integer(1), PACKAGE = "glmnet")
 2: lognet(x, is.sparse, ix, jx, y, weights, offset, type, alpha, nobs,
        nvars, jd, vp, ne, nx, nlam, flmin, ulam, thresh, isd, vnames, maxit,
        HessianExact, family)
 3: glmnet(x[!which, ], y_sub, lambda = lambda, offset = offset_sub,
        weights = weights[!which], ...)
 4: cv.glmnet(x = matrix(c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6), nrow = 6),
        y = as.factor(c(1, 2, 1, 2, 3, 3)), family = "multinomial",
        alpha = 0.5, nfolds = 1)

Possible actions:



any ideas?


Brian.



[R] logsumexp function in R

2011-02-16 Thread Brian Tsai
Hi,

I was wondering if anyone has implemented a numerically stable function to
compute log(sum(exp(x))) ?

Thanks!

Brian
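
For reference, the usual trick is to subtract the maximum before exponentiating, so the largest term becomes exp(0) = 1 and nothing overflows. A minimal sketch follows; the function name `logsumexp` is just a label chosen here, not an existing base R function.

```r
# log(sum(exp(x))) computed stably by factoring out max(x):
# log(sum(exp(x))) = m + log(sum(exp(x - m))), with m = max(x).
logsumexp <- function(x) {
  m <- max(x)
  # If m is -Inf (all terms are zero) or +Inf, the answer is m itself,
  # and x - m would produce NaN.
  if (!is.finite(m)) return(m)
  m + log(sum(exp(x - m)))
}

logsumexp(c(1000, 1000))      # finite: 1000 + log(2)
log(sum(exp(c(1000, 1000))))  # naive version overflows to Inf
```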



[R] ggplot2 errorbarh

2010-12-12 Thread Brian Tsai
Hi,

I'm having problems using the 'width' aesthetic attribute for
geom_errorbarh.   This is the same problem reported earlier here, but I'll
try to describe the problem more clearly:

http://www.mail-archive.com/r-help@r-project.org/msg62371.html

The 'width' attribute is supposed to set the height of the endpoints of the
whiskers in geom_errorbarh and, correspondingly, the width of the endpoints
of the whiskers in geom_errorbar.

The width attribute works fine in geom_errorbar (it sets the width of the
whiskers), but it has no effect in geom_errorbarh (the height of the
whiskers is the same no matter what value).  Does anyone else still have
this problem?

Also, I'm not looking for the 'size' attribute, which seems to scale the
line thickness of the entire errorbar.

Thanks!

Brian.



Re: [R] ggplot2 errorbarh

2010-12-12 Thread Brian Tsai
Hi Ben,

Indeed this is what I wanted, thanks.  'height' does make more sense; I
guess I was just reading the ggplot2 documentation directly, which still
refers to 'width'.


Brian


On Sun, Dec 12, 2010 at 5:12 PM, Ben Bolker bbol...@gmail.com wrote:


   Looking at the answers in the previous thread, I think the commenters
 are trying to answer your question but may be missing the point.

  I assume you have horizontal errorbars that look like this

  |--*---|

 and you want them to look like this:

  |      |
  |--*---|
  |      |

 In this case, it would be the height aesthetic rather than the
 width aesthetic that you would want to modify the length of the
 end-caps on the bar.

  If this is *not* what you want, it would help if you could somehow
 draw a picture of your desired graph (e.g. manipulate the graph you
 have in an image editor, or use ASCII art as above) ...



[R] drawing dot plots with size, shape affecting dot characteristics

2010-08-12 Thread Brian Tsai
Hi all,

I'm interested in doing a dot plot where *both* the size and color (more
specifically, shade of grey) change with the associated value.

I've found examples online for ggplot2 where you can scale the size of the
dot with a value:

http://had.co.nz/ggplot2/graphics/6a053f23cf5bdfe5155ab53d345a5e0b.png

Or scale the color with the value:

http://had.co.nz/ggplot2/graphics/b17bf93530ff6695afb366e65677c17f.png

both of which are from here:
http://had.co.nz/ggplot2/geom_point.html

but I've been playing around with ggplot2 and couldn't figure out how to do
both at the same time.  Ideally I want the size to increase with the value,
and the shade of grey to get lighter with increasing value.


Any help's appreciated, thanks!

Brian
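
One way this might be done, as a sketch: map the same variable to both the size and colour aesthetics, and use a grey-to-grey colour gradient so the shade lightens as the value grows. The data frame and its column names are made up for illustration.

```r
library(ggplot2)

# Made-up example data; 'value' drives both size and shade.
set.seed(42)
df <- data.frame(x = runif(20), y = runif(20), value = runif(20, 1, 10))

p <- ggplot(df, aes(x = x, y = y, size = value, colour = value)) +
  geom_point() +
  # Dark grey for low values, light grey for high values.
  scale_colour_gradient(low = "grey20", high = "grey85")

print(p)
```

Swapping the `low` and `high` colours reverses the direction of the shading.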



[R] glmnet - choosing the number of features

2010-07-07 Thread Brian Tsai
Hi,

I am trying to use the glmnet package to do some simple feature selection.
However, I would ideally like to be able to specify the number of features
to return (the glmnet package, as far as I can tell, only allows
specification of a regularization parameter, lambda, that in turn returns a
model with a specific number of non-zero features).

Is there a straightforward way of calculating the lambda value that will
return a specific number of features? I realize there is a range of lambdas
that should give a certain number of non-zero features, but is there an easy
way of figuring out what this range is?

Thanks!

Brian
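
A sketch of one way to look this up from a fitted path: a glmnet fit stores the lambda grid in `fit$lambda` and the matching number of non-zero coefficients in `fit$df`, so the lambdas giving exactly k features can be read off directly. The simulated data and the helper name `lambdas_for_k` are hypothetical, and the answer is only as fine as the lambda grid that was fitted.

```r
library(glmnet)

# Simulated stand-in data: 5 informative features out of 20.
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:5] %*% c(5, 4, 3, 2, 1)) + rnorm(n)

fit <- glmnet(x, y)   # fits the whole lambda path at once

# fit$df[i] is the number of non-zero coefficients at fit$lambda[i].
lambdas_for_k <- function(fit, k) fit$lambda[fit$df == k]

lam3 <- lambdas_for_k(fit, 3)
if (length(lam3) > 0) range(lam3)   # grid lambdas giving exactly 3 features
```

If no grid value gives exactly k features, the result is empty; refitting with a denser grid (a larger `nlambda`) is one thing to try.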



[R] glmpath crossvalidation

2010-06-03 Thread Brian Tsai
Hi all,

I'm relatively new to using R, and have been trying to fit an L1
regularization path using coxpath from the glmpath package.

I'm interested in using a cross-validation framework, where I cross-validate
on a training set to select the lambda that achieves the lowest error, then
use that value of lambda on the entire training set, before applying the
model to a test set.  This seems to entail somehow using cv.coxpath,
inspecting the cv.error attribute, then using the corresponding lambda in
coxpath.


However, the lambda values in cv.coxpath are defined in terms of fractions
(fractions of the largest value that lambda can sensibly take), whereas it
doesn't seem like you can specify lambda with respect to its largest value
in coxpath.


Any ideas?  Thanks!

Brian.
