Re: [R] How to colorize the panel backgrounds of pairs()?

2012-03-02 Thread Marius Hofert
Dear Ilai,

I tried to also adjust the diagonal panels. However, the variable names are not
positioned correctly anymore. Do you know a solution?

Cheers,

Marius

count - 0
mypanel - function(x, y, ...){
   count - count+1
   bg - if(count %in% c(1,4,9,12)) #FDFF65 else transparent
   ll - par(usr)
   rect(ll[1], ll[3], ll[2], ll[4], col=bg)
   points(x, y, cex=0.5)
}

mydiag.panel - function(x, ...){
   ll - par(usr)
   rect(ll[1], ll[3], ll[2], ll[4], col=#FDFF65)
}

U - matrix(runif(4*500), ncol=4)
pairs(U, panel=mypanel, diag.panel=mydiag.panel)


Marius Hofert marius.hof...@math.ethz.ch writes:

 Indeed, precisely what I was looking for. Many thanks, Ilai.

 ilai ke...@math.montana.edu writes:

 par('bg') is not what you are looking for - it will set the bg of the
 whole graphic device, not panels. I think you want:
 count - 0
 mypanel - function(x, y, ...){
count - count+1
ll- par('usr')
if(count %in% c(1,4,9,12)) bg- #FDFF65
else bg- 'transparent'
rect(ll[1],ll[3],ll[2],ll[4],col=bg)
points(x, y, cex=0.5)
 }

 Cheers

 On Thu, Mar 1, 2012 at 4:49 PM, Marius Hofert
 marius.hof...@math.ethz.ch wrote:
 Dear expeRts,

 I would like to colorize the backgrounds of a pairs plot according to the
 respective panel number. Here is what I tried (without success):

 count - 0
 mypanel - function(x, y, ...){
    count - count+1
    bg. - if(count %in% c(1,4,9,12)) #FDFF65 else NA
    points(x, y, cex=0.5, bg=bg)
 }

 U - matrix(runif(4*500), ncol=4)
 pairs(U, panel=mypanel)

 I also tried to set par(bg=bg.) before the call to points(), but that didn't
 work either. The only thing I found is that bg= can be used to fill 
 certain
 plot symbols, but I would like to colorize the background of each panel, not
 the drawn circles.

 Cheers,

 Marius

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parameterization of Inverse Wishart distribution available in MCMCpack and bayesm libraries

2012-03-02 Thread peter dalgaard

On Mar 2, 2012, at 05:55 , ilai wrote:

 What do you make of the following from ?riwish
 
   riwish(v, S)
 snip
   v: Degrees of freedom (scalar).
 
 does a m/2 parameterization yield a scalar for, say, 3 dof ?


Er, yes (scalar does not imply integer)

As a general matter:

1. This is the Open Source world, you can read the actual function and see what 
it does. It might even say so in the comments.

2. You can investigate empirically -- the moments are known as a function of 
the parameters (check Wikipedia), so how about simulating a few thousand 
matrices and looking at the means and variances. 

(I don't do IW on a daily basis, but AFAICT, the two parametrizations have 
roughly the same mean, but a factor of two between variances, so it should be 
fairly easy to spot whether it is one or the other.)

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rscript example

2012-03-02 Thread Petr Savicky
On Thu, Mar 01, 2012 at 11:18:33PM -0800, statquant2 wrote:
 Hey Petr,
 ok I was thinking that R would handle the split by itself.
 I guess using eval we can even make arg1=val1 being executed by R.

Hi.

For executing the assignments, try myRscript.R containing

  args - commandArgs(TRUE);
  argmat - sapply(strsplit(args, =), identity)
 
  for (i in seq.int(length=ncol(argmat))) {
  assign(argmat[1, i], argmat[2, i])
  }

  # available variables
  print(ls()) 

  # print variables arg1, arg2, arg3
  print(arg1)
  print(arg2)
  print(arg3)

Then the command line

  ./myRscript.R arg1=aa arg2=22 arg3=cc

yields

  [1] argmat args   arg1   arg2   arg3   i 
  [1] aa
  [1] 22
  [1] cc

Another option for setting some variables before executing an R script
is to have two scripts, a shell script and an R script, containing

shell script callR.sh

  #!/bin/bash
  
  PATH_TO_R/bin/R --vanilla  EEE
  args - c($2,$3,$4,$5)
  source($1)
  EEE

a trivial R script tst.R

  print(args)

Then,

  ./callR.sh tst.R aa 22 cc

leads to

  ...
  ...
  Type 'demo()' for some demos, 'help()' for on-line help, or
  'help.start()' for an HTML browser interface to help.
  Type 'q()' to quit R.
  
   args - c(aa,22,cc,)
   source(tst.R)
  [1] aa 22 cc   
   

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.table issue with #

2012-03-02 Thread Jim Holtman
use the 'comment.char' parameter of read.table

Sent from my iPad

On Mar 1, 2012, at 17:51, Rui Barradas rui1...@sapo.pt wrote:

 Hello,
 
 
 The problem is that I get a the following error bacause anything after the
 # is ignored.
 
 Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, 
 :
  line 6 did not have 500 elements
 
 R thinks that line 6 has only 2 elements because of the #.
 
 
 Use 'readLines' instead, followed by 'strsplit'.
 In the example below the separator is a space.
 
 tc - textConnection(
 yes yes yes yes yes
 yes yes yes yes yes
 yes yes # yes yes
 )
 #x - read.table(tc)   # same error: line 3 did not have 5 elements
 x - readLines(tc)
 close(tc)
 strsplit(x,  )
 
 Hope this helps,
 
 Rui Barradas
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/read-table-issue-with-tp4436554p4436737.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Why do my regular expressions require a double escape \\ to get a literal??

2012-03-02 Thread Roey Angel

Hi,
I was recently misfortunate enough to have to use regular expressions to 
sort out some data in R.
I'm working on a data file which contains taxonomical data of bacteria 
in hierarchical order.

A sample of this file can be generated using:

tax.data - read.table(header=F, con - textConnection('
G9SS7BA01D15EC  Bacteria(100)Cyanobacteria(84)unclassified
G9SS7BA01C9UIRBacteria(100)Proteobacteria(94)
Alphaproteobacteria(89)
G9SS7BA01CM00DBacteria(100)Proteobacteria(99)
Alphaproteobacteria(99)

'))
close(con)

What I try to do is to remove the parenthesis and the number inside 
(which could contain a decimal point)
I assumed that the following command would solve it, but instead I got 
an error.


tax.data - as.data.frame(apply(tax.data, 2, function(x) 
gsub('\(.*\)','',x)))

Error: '\(' is an unrecognized escape in character string starting \(

And it doesn't matter if I use perl = TRUE or not.
To solve it I need to use a double escape sign '\\' before opening and 
closing the parenthesis:


tax.data - as.data.frame(apply(tax.data, 2, function(x) 
gsub('\\(.*\\)','',x)))


This yields the desired result but I wonder why it does that?
No other regular expression system I'm used to (e.g. Perl, Shell) works 
like that.


I'm using R 2.14 (but also R 2.10) and I get the same results on Ubuntu 
and win XP.


I'd appreciate any explanation.

Thanks in advance,
baffled Roey

--
Dr. Roey Angel

Max-Planck-Institute for Terrestrial Microbiology
Karl-von-Frisch-Strasse 10
D-35043 Marburg, Germany

Office: +49 (0)6421/178-832
Mobile: +49 (0)176/612-785-88

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 回复: Bayesian Hidden Markov Models

2012-03-02 Thread monkeylan
Dear Oscar,
 
Thanks for your help.It's so nice of you to explain this package to me.
 
Best Regards,
 
James LAN

发件人: Oscar Rueda [via R] ml-node+s789695n4431468...@n4.nabble.com
收件人: monkeylan lanjin...@yahoo.com.cn 
发送日期: 2012年2月29日, 星期三, 下午 9:21
主题: Re: Bayesian Hidden Markov Models


Dear James, 

The distances are normalized between zero and 1, so in your case all of them 
will be zero. You can check that with 

 res$Dist.for.model 

And do 

 Q.NH(summary(res)[[1]]$beta, x=0) 

To obtain the common transition matrix. 

Cheers, 
Oscar 


On 29/2/12 03:59, monkeylan [hidden email] wrote: 


 Dear Oscar, 
   
 I am extremely grateful to your help and detailed explanation of the use of 
 RJaCGH package. 
 But, when runing the sample codes you listed, another issue I am a little 
 confused is as following: 
 After runing summary(res), I have got the estimation of the random matrix 
 Beta: 
 
 Parameters of the transition functions: 
        Normal  Gain 
 Normal  0.000 4.258 
 Gain    2.001 0.000 
   
 But, the transition probabilty matrix Q based on the aboving Beta is more 
 concerned in my modeling. 
 Here, I am not sure how can I get the  matrix Q. I did try the Q.NH 
 functions.However, Shoud I set the distance parameter x be 1 or 0? I am not 
 sure. 
   
  If 1( according to my own understanding), the following result seems not 
 reseanable. 
   
 tran-matrix(c(0,2.001,4.528,0),2,2) 
 Q.NH(beta=tran, x=1) 
      [,1] [,2] 
 [1,]  0.5  0.5 
 [2,]  0.5  0.5 
   
 Many thanks for your further help and time. 
   
 James Allan 
 
 --- 12年2月28日,周二, Oscar Rueda [via R] 
 [hidden email] 写道: 
 
 
 发件人: Oscar Rueda [via R] [hidden email] 
 主题: Re: Bayesian Hidden Markov Models 
 收件人: monkeylan [hidden email] 
 日期: 2012年2月28日,周二,下午7:02 
 
 
 Dear James, 
 
 Basically you just need the values (y) and the positions (in your case it 
 would be the index of the times series). The chromosome argument does not 
 apply to your case so it can be a vector of ones. 
 If the positions are at the same distance between (equally spaced) then the 
 model will be homogeneous. 
 
 So for example something like this would be enough: 
 library(RJaCGH) 
 y - c(rnorm(100,0,1), rnorm(20, 2, 1), rnorm(50, 0, 1)) 
 Pos - 1:length(y) 
 Chrom - rep(1, length(y)) 
 res - RJaCGH(y=y, Pos=Pos, Chrom=Chrom) 
 summary(res) 
 
 However, it uses a Reversible Jump algorithm and therefore jumps between 
 models with different hidden states. I would suggest you take a look at the 
 vignette that comes with the package or the paper that is referenced there 
 for specific details of the model it fits. 
 
 
 Hope it helps, 
 Oscar 
   
 
 
 On 28/2/12 04:52, monkeylan [hidden email] wrote: 
 
 
 Dear Doctor Oscar, 
   
 Sorry for not noticing that you are the author of the RJaCGH package. 
 
 But I noticed that hidden Markov model in your package is with 
 non-homogeneous 
 transition probabilities. Here in my work, the HMM is just a first-order 
 homogeneous Markov chain, i.e. the  transition  matrix is constant. 
   
 So, Could you please tell me how can I adjust the R functions in your 
 package 
 to implement my analysis? 
   
 Best Regards, 
   
 James Allan 
 
 
 --- 12年2月27日,周一, Oscar Rueda [via R] 
 [hidden email] 写道: 
 
 
 发件人: Oscar Rueda [via R] [hidden email] 
 主题: Re: Bayesian Hidden Markov Models 
 收件人: monkeylan [hidden email] 
 日期: 2012年2月27日,周一,下午6:05 
 
 
 Dear James, 
 Although designed for the analysis of copy number CGH microarrays, RJaCGH 
 uses a Bayesian HMM model. 
 
 Cheers, 
 Oscar 
 
 
 On 27/2/12 08:32, monkeylan [hidden email] wrote: 
 
 
 Dear R buddies, 
 
 Recently, I attempt to model the US/RMB Exchange rate log-return time 
 series 
 with a *Hidden Markov model (first order Markov Chain  mixed Normal 
 distributions). * 
 
 I have applied the RHmm package to accomplish this task, but the results 
 are 
 not so satisfying. 
 So, I would like to try a *Bayesian method *for the parameter estimation of 
 the Hidden Markov model. 
 
 Could anyone kindly tell me which R package can perform Bayesian estimation 
 of the model? 
 
 Many thanks for your help and time. 
 
 Best Regards, 
 James Allan 
 
 
 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/Bayesian-Hidden-Markov-Models-tp4423946p442394
 6 
 . 
 
 html 
 Sent from the R help mailing list archive at Nabble.com. 
 
 __ 
 [hidden email] mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code. 
 Oscar M. Rueda, PhD. 
 Postdoctoral Research Fellow, Breast Cancer Functional Genomics. 
 Cancer Research UK Cambridge Research Institute. 
 Li Ka Shing Centre, Robinson Way. 
 

Re: [R] Delete rows from data.frame matching a certain criteria

2012-03-02 Thread Petr PIKAL
Hi

my favourite would be

test$v[which(test$pattern==1)]-NA

Regards
Petr


 Hi,
 
 On Mar 1, 2012, at 12:38 PM, Sarah Goslee wrote:
 
  Hi,
  
  On Thu, Mar 1, 2012 at 11:11 AM, mails mails00...@gmail.com wrote:
  Hello,
  
  
  consider the following data.frame:
  
  test - data.frame(n = c(1,2,3,4,5), v = c(6,5,7,5,3), pattern =
  c(1,1,NA,1,NA))
  
 
  snip 
 
  So basically the result should look like this:
  test
   n v pattern
  1  1 NA   1
  2  2 NA  1
  3  3 7  NA
  4  4 NA   1
  5  5 3  NA
  
  So far, I solved it by creating subsets and using merge but it turns 
out to
  be super slow. Is there a way to do that
  with the apply function?
  
  Far too much work. What about:
  
  test$v - ifelse(test$pattern == 1, NA, v)
  test
   n  v pattern
  1 1 NA   1
  2 2 NA   1
  3 3 NA  NA
  4 4 NA   1
  5 5 NA  NA
 
 Actually that doesn't work because of those pesky missing values. You 
need
 
 test -  transform(test, v = ifelse(pattern == 1  !is.na(pattern), NA, 
v))
 
 Best,
 Ista
 
  
  
  -- 
  Sarah Goslee
  http://www.functionaldiversity.org
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Vector errors and missing values

2012-03-02 Thread Petr PIKAL
Hi

 
 Hi,
 
 I am trying to run two Non-Gaussian regressions: logistic and probit. I 
am
 receiving two different errors when I try to run these regressions and I 
am
 not sure what they mean or how to fix my syntax.
 
 Here is the logistic regression error:
 
 Error in family$linkfun(mustart) : 
   Argument mu must be a nonempty numeric vector
 
 Here is the probit regression error:
 
 Error in pmax(eta, -thresh) : cannot mix 0-length vectors with others

Without any code and structure of your data you probably does not get much 
help. See the bottom of any email. 

PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Probably somehow your data are not as you expect. What 

str(yourdata) tells you? Is it in compliance with function you use? 

Regards
Petr


 
 The dataset that I am using has some missing data. R puts NA values in 
place
 of the missing values. I am not sure if this is what is causing my 
vector
 problems or not. I have tried to use the 'data=na.omit(DataMiss)' in my 
glm
 as well as the command: 'na.action=na.exclude', but I am not sure if I 
am
 using the correct syntax because it is not working. 
 
 Any help would be most appreciated.
 
 --
 View this message in context: 
http://r.789695.n4.nabble.com/Vector-errors-
 and-missing-values-tp4437306p4437306.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why do my regular expressions require a double escape \\ to get a literal??

2012-03-02 Thread Berend Hasselman

On 02-03-2012, at 09:36, Roey Angel wrote:

 Hi,
 I was recently misfortunate enough to have to use regular expressions to sort 
 out some data in R.
 I'm working on a data file which contains taxonomical data of bacteria in 
 hierarchical order.
 A sample of this file can be generated using:
 
 tax.data - read.table(header=F, con - textConnection('
 G9SS7BA01D15EC  Bacteria(100)Cyanobacteria(84)unclassified
 G9SS7BA01C9UIRBacteria(100)Proteobacteria(94)
 Alphaproteobacteria(89)
 G9SS7BA01CM00DBacteria(100)Proteobacteria(99)
 Alphaproteobacteria(99)
 '))
 close(con)
 
 What I try to do is to remove the parenthesis and the number inside (which 
 could contain a decimal point)
 I assumed that the following command would solve it, but instead I got an 
 error.
 
 tax.data - as.data.frame(apply(tax.data, 2, function(x) gsub('\(.*\)','',x)))
 Error: '\(' is an unrecognized escape in character string starting \(
 
 And it doesn't matter if I use perl = TRUE or not.
 To solve it I need to use a double escape sign '\\' before opening and 
 closing the parenthesis:
 
 tax.data - as.data.frame(apply(tax.data, 2, function(x) 
 gsub('\\(.*\\)','',x)))
 
 This yields the desired result but I wonder why it does that?
 No other regular expression system I'm used to (e.g. Perl, Shell) works like 
 that.
 
 I'm using R 2.14 (but also R 2.10) and I get the same results on Ubuntu and 
 win XP.
 
 I'd appreciate any explanation.

Section Character vectors in the R Intro manual.

?Quotes

The regular expression is provided as a string to gsub. In strings there are 
escape sequences.
To get the \ as a single \ to the regular expression parser it has to be \-ed 
in the string stage: \\

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tobit Fixed Effects

2012-03-02 Thread Arne Henningsen
Dear Felipe

On 29 September 2011 14:10, Arne Henningsen
arne.henning...@googlemail.com wrote:
 Hi Felipe

 On 25 September 2011 00:16, Felipe Nunes felipnu...@gmail.com wrote:
 Hi Arne,
 my problem persists. I am still using censReg [version - 0.5-7] to run a
 random effects model in my data (50,000 cases), but I always get the
 message.
 tob7 - censReg(transfers.cap ~ pt.pt + psdb.pt + pt.opp + pt.coa + psdb.coa
 + pib.cap + transfers.cap.lag + pib.cap + ifdm + log(populat) +
 mayor.vot.per + log(bol.fam.per+0.01) + factor(uf.name) + factor(year) - 1,
 left=0, right=Inf, method=BHHH, nGHQ=8, iterlim=1, data = pdata2)
 Error in maxNRCompute(fn = logLikAttr, fnOrig = fn, gradOrig = grad,
 hessOrig = hess,  :
   NA in the initial gradient
 If I sent you my data set, could you try to help me? I have been struggling
 with that for two months now.

 Thanks for sending me your data set. With it, I was able to figure
 out, where the NAs in the (initial) gradients come from: when
 calculating the derivatives of the standard normal density function [d
 dnorm(x) / d x = - dnorm(x) * x], values for x that are larger than
 approximately 40 (in absolute terms) result in so small values (in
 absolute terms) that R rounds them to zero. Later, these derivatives
 are multiplied by some other values and then the logarithm is taken
 ... and multiplying any number by zero and taking the logarithms gives
 not a finite number :-(

 When *densities* of the standard normal distribution become too small,
 one can use dnorm(x,log=TRUE) and store the logarithm of the small
 number, which is much larger (in absolute terms) than the density and
 hence, is not rounded to zero. However, in the case of the
 *derivative* of the standard normal density function, this is more
 complicated as log( d dnorm(x) / d x ) =  log( dnorm(x) ) + log( - x )
 is not defined if x is positive. I will try to solve this problem by
 case distinction (x0 vs. x0). Or does anybody know a better
 solution?

Finally(!), I have implemented this solution in the censReg() package.
Some initial tests (including your model and data) show that the
revised calculation of the gradient of the random effects panel data
model for censored dependent variables is much more robust to rounding
errors. The improved version of the censReg package is not yet via
CRAN, but it is available at R-Forge:

https://r-forge.r-project.org/R/?group_id=256

If you have further questions or feedback regarding the censReg
package, please use a forum or tracker on the R-Forge site of the
sampleSelection project:

https://r-forge.r-project.org/projects/sampleselection/

Best wishes from Copenhagen,
Arne

-- 
Arne Henningsen
http://www.arne-henningsen.name

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] speed up merge

2012-03-02 Thread Ben quant
Hello,

I have a nasty loop that I have to do 11877 times. The only thing that
slows it down really is this merge:

xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

Any ideas on how to speed it up? The output can't change materially (it
works), but I'd like it to go faster. I'm looking at getting around the
loop (not shown), but I'm trying to speed up the merge first. I'll post
regarding the loop if nothing comes of this post.

Here is some information on what type of stuff is going into the merge:

 class(ua_rd)
[1] matrix
 dim(ua_rd)
[1] 20  2
 head(ua_rd)
   AName  rt_date
2007-03-31 14066.580078125 2007-04-26
2007-06-30 14717   2007-07-19
2007-09-30 15528   2007-10-25
2007-12-31 17609   2008-01-24
2008-03-31 17168   2008-04-24
2008-06-30 17681   2008-07-17
 class(dt)
[1] character
 length(dt)
[1] 1799
 dt[1:10]
 [1] 2007-03-31 2007-04-01 2007-04-02 2007-04-03 2007-04-04
2007-04-05 2007-04-06 2007-04-07
 [9] 2007-04-08 2007-04-09

thanks,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cloning R Installation Across Multiple Computers

2012-03-02 Thread Tom Hopper
I would like to set up identical R installations, with the same packages,
on multiple computers and with minimal interaction by users. Ideally, I
would like to have an installation script that the user can just run that
will set up everything, including R itself and base packages.

Standard packages would need to include ggplot2 and its dependencies, Rcmdr
and its dependencies and some Rcmdr plug-ins. Best of all would be to have
the script include JGR and Deducer in the installation. This would be set
up on Windows XP 32-bit, and all packages have to install from local .zip
files; R cannot get to the package servers through our firewall.

I could do the setup once, manually, on one machine, but I'm not sure if I
can simply copy the R installation directory to other computers and have it
still work. I don't think that the Windows registry wouldn't be configured
if I did it this way, and I'm not sure that JGR would be correctly
installed. Another approach would to have the user install R, then copy the
R directory tree from another computer that has been set up, and finally
JGR would have to be installed by the user. I'd like to roll this all into
one, simple step that the users don't have to worry about screwing up.

I recall that there has been some discussion of this on r-help in the past,
but I seem to be using the wrong search terms as I cannot find anything.
Any suggestions for, or pointers to, solutions will be much appreciated.

Thank you,

Tom Hopper

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up merge

2012-03-02 Thread kees duineveld
Hi Ben,

It seems you merge a matrix and a vector. As far as I understand the
first thing merge does is convert these to data.frame. Is it possible
to make the preceding steps give data frames?

Regards,
Kees

On Fri, Mar 2, 2012 at 11:24 AM, Ben quant ccqu...@gmail.com wrote:

 Hello,

 I have a nasty loop that I have to do 11877 times. The only thing that
 slows it down really is this merge:

 xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

 Any ideas on how to speed it up? The output can't change materially (it
 works), but I'd like it to go faster. I'm looking at getting around the
 loop (not shown), but I'm trying to speed up the merge first. I'll post
 regarding the loop if nothing comes of this post.

 Here is some information on what type of stuff is going into the merge:

  class(ua_rd)
 [1] matrix
  dim(ua_rd)
 [1] 20  2
  head(ua_rd)
                   AName              rt_date
 2007-03-31 14066.580078125 2007-04-26
 2007-06-30 14717           2007-07-19
 2007-09-30 15528           2007-10-25
 2007-12-31 17609           2008-01-24
 2008-03-31 17168           2008-04-24
 2008-06-30 17681           2008-07-17
  class(dt)
 [1] character
  length(dt)
 [1] 1799
  dt[1:10]
  [1] 2007-03-31 2007-04-01 2007-04-02 2007-04-03 2007-04-04
 2007-04-05 2007-04-06 2007-04-07
  [9] 2007-04-08 2007-04-09

 thanks,

 Ben

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up merge

2012-03-02 Thread Hans Ekbrand
On Fri, Mar 02, 2012 at 03:24:20AM -0700, Ben quant wrote:
 Hello,
 
 I have a nasty loop that I have to do 11877 times. 

Are you completely sure about that? I often find my self avoiding
loops-by-row by constructing vectors of which rows that fullfil a
condition, and then creating new vectors out of that vector. If you
elaborate on the problem, perhaps we could find a way to avoid the
loops altogether?

Mostly as a note to self, I wrote
http://code.cjb.net/vectors-instead-of-loop.html, it might be
understood by others too, but I'm not sure.

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cloning R Installation Across Multiple Computers

2012-03-02 Thread Duncan Murdoch

On 12-03-02 5:32 AM, Tom Hopper wrote:

I would like to set up identical R installations, with the same packages,
on multiple computers and with minimal interaction by users. Ideally, I
would like to have an installation script that the user can just run that
will set up everything, including R itself and base packages.

Standard packages would need to include ggplot2 and its dependencies, Rcmdr
and its dependencies and some Rcmdr plug-ins. Best of all would be to have
the script include JGR and Deducer in the installation. This would be set
up on Windows XP 32-bit, and all packages have to install from local .zip
files; R cannot get to the package servers through our firewall.

I could do the setup once, manually, on one machine, but I'm not sure if I
can simply copy the R installation directory to other computers and have it
still work. I don't think that the Windows registry wouldn't be configured
if I did it this way, and I'm not sure that JGR would be correctly
installed. Another approach would to have the user install R, then copy the
R directory tree from another computer that has been set up, and finally
JGR would have to be installed by the user. I'd like to roll this all into
one, simple step that the users don't have to worry about screwing up.

I recall that there has been some discussion of this on r-help in the past,
but I seem to be using the wrong search terms as I cannot find anything.
Any suggestions for, or pointers to, solutions will be much appreciated.



See the Installation and Administration manual, in particular the 
section (3.1.8, I think) on building the Inno Setup installer.


You may even find the MSI installer more convenient, but that code is 
tested less, and is unsupported.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Call the Standard Error and t-test probability in linear regression

2012-03-02 Thread IOANNA
Hello, 

I run a linear regression I get the summary, e.g.:

 summary(lm.r) 
   Call: 
   lm(formula = signal ~ conc) 
   Residuals: 
  12 3  4  5  
 0.4  -1.0   1.6  -1.80.8  
   Coefficients: 
  Estimate Std.  Errort valuePr(|t|) 
 (Intercept)  3.61.232882.92 0.0615 .   
 conc  1.94000 0.05033   38.54  3.84e-05 *** 
   --- 
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
   Residual standard error: 1.592 on 3 degrees of freedom 
   Multiple R-Squared: 0.998,  Adjusted R-squared: 0.9973  
   F-statistic:  1486 on 1 and 3 DF,  p-value: 3.842e-0

I would like to call the probability of the t-test only in order to use it
separately. For example I 'd like to get: 
Pr-3.84e-05 

Similarly I want to call the standard error of the parameters and the
function:
SEconc-0.05033

I don't know how to do this. Any help?

Regards, 
Ioanna

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Call the Standard Error and t-test probability in linear regression

2012-03-02 Thread Ted Harding
On 02-Mar-2012 IOANNA wrote:
 Hello, 
 I run a linear regression I get the summary, e.g.:
 
  
Call: 
lm(formula = signal ~ conc) 
Residuals: 
   12 3  4  5  
  0.4  -1.0   1.6  -1.80.8  
Coefficients: 
   Estimate Std.  Errort valuePr(|t|) 
  (Intercept)  3.61.232882.92 0.0615 .   
  conc  1.94000 0.05033   38.54  3.84e-05 *** 
--- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
Residual standard error: 1.592 on 3 degrees of freedom 
Multiple R-Squared: 0.998,  Adjusted R-squared: 0.9973  
F-statistic:  1486 on 1 and 3 DF,  p-value: 3.842e-0
 
 I would like to call the probability of the t-test only in order
 to use it separately. For example I 'd like to get: 
 Pr-3.84e-05 
 
 Similarly I want to call the standard error of the parameters
 and the function:
 SEconc-0.05033
 
 I don't know how to do this. Any help?
 
 Regards, 
 Ioanna

Hi Ioanna,
If you look at '?summary.lm' and read the section Value,
you will see that the returned value is a list with several
components, one of which is:

  coefficients: a p x 4 matrix with columns for the
estimated coefficient, its standard error,
t-statistic and corresponding (two-sided) p-value.
Aliased coefficients are omitted.

This is effectively as displayed by summary(lm...)).
So your 

  Coefficients: 
   Estimate  Std. Error  t value  Pr(|t|) 
   (Intercept)  3.6 1.23288 2.920.0615 .   
  conc  1.94000 0.0503338.54  3.84e-05 *** 

(apart from the significance codes . and ***) are the
elements in this p=2 x 4 matrix.

Hence

  summary(lm.r)$coef

would give the full 4x4 matrix (you can abbreviate coefficients
to coef), and so

  summary(lm.r)$coef[2,4]

will give you the P-value for conc, and

  summary(lm.r)$coef[2,2]

will give the SE of the estimate of conc. And so on.

Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 02-Mar-2012  Time: 13:09:03
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 3d surface plot (ideally using rgl package)?

2012-03-02 Thread R. Michael Weylandt
It is by no means clear what the peaks function does or if it has
any R equivalent, but perhaps looking at demo(rgl) will get you
started. After that, you should probably show what you've tried (at
least as far as replicating the calculation aspects).

Michael

On Fri, Mar 2, 2012 at 1:32 AM, e-mail athula.herath
athula.her...@ntlworld.com wrote:
 Dear HelpeRs,

 I would be grateful for anybody who might help to produce the following
 plot (the code for matlab is below) using the rgl package of R?

 [t,r] = meshgrid(linspace(0,2*pi,361),linspace(-4,4,101));
 [x,y] = pol2cart(t,r);
 P = peaks(x,y);
 figure('color','white');
 polarplot3d(P,'colordata',gradient(P));
 view([-18 72]);

 You may see the code and plot output at:

 https://plus.google.com/u/0/109789409461372488563/posts/YfcrsMhkjyf

 Many Thanks

 A.

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Call the Standard Error and t-test probability in linear regression

2012-03-02 Thread Martin Maechler
 On 02-Mar-2012 IOANNA wrote:
  Hello, 
  I run a linear regression I get the summary, e.g.:
  
   
 Call: 
 lm(formula = signal ~ conc) 
 Residuals: 
12 3  4  5  
   0.4  -1.0   1.6  -1.80.8  
 Coefficients: 
Estimate Std.  Errort valuePr(|t|) 
   (Intercept)  3.61.232882.92 0.0615 .   
   conc  1.94000 0.05033   38.54  3.84e-05 *** 
 --- 
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
 Residual standard error: 1.592 on 3 degrees of freedom 
 Multiple R-Squared: 0.998,  Adjusted R-squared: 0.9973  
 F-statistic:  1486 on 1 and 3 DF,  p-value: 3.842e-0
  
  I would like to call the probability of the t-test only in order
  to use it separately. For example I 'd like to get: 
  Pr-3.84e-05 
  
  Similarly I want to call the standard error of the parameters
  and the function:
  SEconc-0.05033
  
  I don't know how to do this. Any help?
  
  Regards, 
  Ioanna

 Hi Ioanna,
 If you look at '?summary.lm' and read the section Value,
 you will see that the returned value is a list with several
 components, one of which is:

   coefficients: a p x 4 matrix with columns for the
 estimated coefficient, its standard error,
 t-statistic and corresponding (two-sided) p-value.
 Aliased coefficients are omitted.

 This is effectively as displayed by summary(lm...)).
 So your 

   Coefficients: 
Estimate  Std. Error  t value  Pr(|t|) 
(Intercept)  3.6 1.23288 2.920.0615 .   
   conc  1.94000 0.0503338.54  3.84e-05 *** 

 (apart from the significance codes . and ***) are the
 elements in this p=2 x 4 matrix.

indeed.

 Hence

   summary(lm.r)$coef

 would give the full 4x4 matrix (you can abbreviate coefficients
 to coef), 

Yes.  In the present case, I tell all my students (:= everyone
who wants to learn .. ;-) to use

  coef(summary(lm.r))
instead of
  summary(lm.r)$coef

because using the coef() [or coefficients()] method will work
with many more models and situations, notably also for cases
where the (summary of the) fitted model is not a list (but e.g. a S4 class,
reference class, ..), or does not store these in exactly this location.

Martin

 and so

   summary(lm.r)$coef[2,4]

 will give you the P-value for conc, and

   summary(lm.r)$coef[2,2]

 will give the SE of the estimate of conc. And so on.

 Ted.

 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 02-Mar-2012  Time: 13:09:03
 This message was sent by XFMail

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Vector errors and missing values

2012-03-02 Thread jahughes81
Hi Petr!

Thank you for responding to my post.

I checked out all my variables in the way you suggested and they are all in
integer form, but here are many missing values in some of my vectors,
denoted with NA.

So, they are in the correct form, I am just wondering if there is something
else I need to do to the NAs to be able to run my regressions. Because as it
is now, R may not be taking the NAs into account and that would cause a
mismatch in vector length. But I don't think that it is expected that all
your predictor variables and your outcome variable be the same length. I
would imagine that is often not the case for most people when they collect
data.

--
Jessica

--
View this message in context: 
http://r.789695.n4.nabble.com/Vector-errors-and-missing-values-tp4437306p4438415.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change or copy to another the names of models

2012-03-02 Thread Waldir de Carvalho Junior
Hi help-list
I try to better explain my problem.
My problem is below. For each cycle of (n) I need to save the model
(model.mlp), because the neuralnet is unique and when I had choose the best
architecture of the neuralnet, and need the neuralnet (the model.mlp) that
has already been trainned.
I can´t train again, because the result will be not the same.
I thought that I can put a line at the end of the cycles to copy the
model.mlp to another name and save it as a model for each one of the
cycle.
I hope I have better explain my problem.
thanks in advance
##

# built a model  mlp (neuralnet) and training it changing the number of
hidden neurons (1 a 30)
for (n in 1:30) {
print(n)
model.mlp -  mlp(padroes$inputsTrain, padroes$targetsTrain, size = n,
initFunc=Randomize_Weights,
  initFuncParams=c(-0.5, 0.5), learnFunc=Std_Backpropagation,
learnFuncParams = c(0.2, 0.1),
  updateFunc=Topological_Order, hiddenActFunc=Act_Logistic,
shufflePatterns=TRUE, linOut=TRUE,
  maxit = 3000, inputsTest = padroes$inputsTest, targetsTest =
padroes$targetsTest)
}



2012/3/1 Waldir de Carvalho Junior waldi...@gmail.com

 Hi
 I would like to know how I can change the name of a model for each
 trainning cycle of a model.
 I work with the RSNNS package and to build a neural network, I used :
 for (i in 5:30) 
 model_ANN - mlp(X, Y, size=n,) # where size is the number of neurons
 in the hidden layer
 but I need to save each time that the model that is build (the end of each
 cycle), e.g., when i = 5, I need to save the model with a especific name,
 when i = 6, also I need to save the model with another name
 How i am doing, i am saving only the last model with n = 30 and with the
 name model_ANN
 Question
 how can I change the name of the model (model_ANN) at each end of cycle of
 i values?
 I have already tryied copy, save, rename,.. but unsuccessfully
 thanks for any answer


-- 

Waldir de Carvalho Junior
Pedologia/Pedologue
Pesquisador/Chercheur
Embrapa Solos/INRA - UMR- LISAH

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Statistical Histograms in R

2012-03-02 Thread SMcG
Hi, 

I'm wondering if anybody could possibly help me? 
I have a table with 5 tab-delimited columns. Each column has 'e-value'
scores for 5 different proteins. 

I'd like to plot a distribution curve using hist() for the 5 different
proteins and show the 5 distribution curves on the same graph in different
colours. 
In the case, E-values will be the X-axis and frequency will be the Y-axis. 

Is this at all possible? 

Thanks. 

--
View this message in context: 
http://r.789695.n4.nabble.com/Statistical-Histograms-in-R-tp4438336p4438336.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plotting standard deviation of multivariate normal distribution (preferred in rgl package)

2012-03-02 Thread drflxms
Dear R colleagues,

for a statistics tutorial I want to develop a nice 3d-graphic of the
well known target comparison/analogy of accuracy and precision (see i.e.
http://en.wikipedia.org/wiki/Accuracy_and_precision for a simple hand
made 2d graphic).

The code for a really beautiful graphic is already provided as
demo(bivar) in the rgl package (for a picture see i.e
http://rgl.neoscientists.org/gallery.shtml right upper corner).

Now I'd like to add the standard deviation to the 3d plot. Unfortunately
I couldn't figure out how to do that.

What I did so far is:

Assuming you have executed the code of the demo mentioned above you can
plot a 2d scatter plot of the data (the shots on the target) via a simple

plot(x,y)

and add a contour plot via

par(new=TRUE)
contour(denobj, nlevels=8)

So the contour levels are about what I am looking for but...

1. how to make contour plot display exactly standard deviations as
contours and not arbitrary levels?

2. how to project the standard deviation contours on the 3d surface?

Probably there are (as always) lots of different solutions as well.
Id' appreciate any kind of help very much!

Greetings from Munich, Felix

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change or copy to another the names of models

2012-03-02 Thread kees duineveld
Hi Waldir

I think this is easier via an lappy()

lapply(1:30, function(x)  mlp(...your settings here, including size=x...) )

Regards,
Kees

On Fri, Mar 2, 2012 at 2:36 PM, Waldir de Carvalho Junior
waldi...@gmail.com wrote:
 Hi help-list
 I try to better explain my problem.
 My problem is below. For each cycle of (n) I need to save the model
 (model.mlp), because the neuralnet is unique and when I had choose the best
 architecture of the neuralnet, and need the neuralnet (the model.mlp) that
 has already been trainned.
 I can´t train again, because the result will be not the same.
 I thought that I can put a line at the end of the cycles to copy the
 model.mlp to another name and save it as a model for each one of the
 cycle.
 I hope I have better explain my problem.
 thanks in advance
 ##

 # built a model  mlp (neuralnet) and training it changing the number of
 hidden neurons (1 a 30)
 for (n in 1:30) {
 print(n)
 model.mlp -  mlp(padroes$inputsTrain, padroes$targetsTrain, size = n,
 initFunc=Randomize_Weights,
              initFuncParams=c(-0.5, 0.5), learnFunc=Std_Backpropagation,
 learnFuncParams = c(0.2, 0.1),
              updateFunc=Topological_Order, hiddenActFunc=Act_Logistic,
 shufflePatterns=TRUE, linOut=TRUE,
              maxit = 3000, inputsTest = padroes$inputsTest, targetsTest =
 padroes$targetsTest)
 }
 


 2012/3/1 Waldir de Carvalho Junior waldi...@gmail.com

 Hi
 I would like to know how I can change the name of a model for each
 trainning cycle of a model.
 I work with the RSNNS package and to build a neural network, I used :
 for (i in 5:30) 
 model_ANN - mlp(X, Y, size=n,) # where size is the number of neurons
 in the hidden layer
 but I need to save each time that the model that is build (the end of each
 cycle), e.g., when i = 5, I need to save the model with a especific name,
 when i = 6, also I need to save the model with another name
 How i am doing, i am saving only the last model with n = 30 and with the
 name model_ANN
 Question
 how can I change the name of the model (model_ANN) at each end of cycle of
 i values?
 I have already tryied copy, save, rename,.. but unsuccessfully
 thanks for any answer


 --
 
 Waldir de Carvalho Junior
 Pedologia/Pedologue
 Pesquisador/Chercheur
 Embrapa Solos/INRA - UMR- LISAH

        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why do my regular expressions require a double escape \\ to get a literal??

2012-03-02 Thread Jeff Newmiller
Roey, you imply that this is unusual in implementations of regex, yet some of 
the oldest applications using regex out there are sed or awk, where extra 
quoting is so common that some people don't recognize regex patterns that are 
missing this extra level of quoting. Sigh.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Berend Hasselman b...@xs4all.nl wrote:


On 02-03-2012, at 09:36, Roey Angel wrote:

 Hi,
 I was recently misfortunate enough to have to use regular expressions
to sort out some data in R.
 I'm working on a data file which contains taxonomical data of
bacteria in hierarchical order.
 A sample of this file can be generated using:
 
 tax.data - read.table(header=F, con - textConnection('
 G9SS7BA01D15EC  Bacteria(100)Cyanobacteria(84)unclassified
 G9SS7BA01C9UIRBacteria(100)Proteobacteria(94)   
Alphaproteobacteria(89)
 G9SS7BA01CM00DBacteria(100)Proteobacteria(99)   
Alphaproteobacteria(99)
 '))
 close(con)
 
 What I try to do is to remove the parenthesis and the number inside
(which could contain a decimal point)
 I assumed that the following command would solve it, but instead I
got an error.
 
 tax.data - as.data.frame(apply(tax.data, 2, function(x)
gsub('\(.*\)','',x)))
 Error: '\(' is an unrecognized escape in character string starting
\(
 
 And it doesn't matter if I use perl = TRUE or not.
 To solve it I need to use a double escape sign '\\' before opening
and closing the parenthesis:
 
 tax.data - as.data.frame(apply(tax.data, 2, function(x)
gsub('\\(.*\\)','',x)))
 
 This yields the desired result but I wonder why it does that?
 No other regular expression system I'm used to (e.g. Perl, Shell)
works like that.
 
 I'm using R 2.14 (but also R 2.10) and I get the same results on
Ubuntu and win XP.
 
 I'd appreciate any explanation.

Section Character vectors in the R Intro manual.

?Quotes

The regular expression is provided as a string to gsub. In strings
there are escape sequences.
To get the \ as a single \ to the regular expression parser it has to
be \-ed in the string stage: \\

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Statistical Histograms in R

2012-03-02 Thread ONKELINX, Thierry
Have a look at http://had.co.nz/ggplot2/stat_density.html You'll find some 
exampled and the code to generate them.


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie  Kwaliteitszorg / team Biometrics  Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey


-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens 
SMcG
Verzonden: vrijdag 2 maart 2012 13:49
Aan: r-help@r-project.org
Onderwerp: [R] Statistical Histograms in R

Hi, 

I'm wondering if anybody could possibly help me? 
I have a table with 5 tab-delimited columns. Each column has 'e-value'
scores for 5 different proteins. 

I'd like to plot a distribution curve using hist() for the 5 different proteins 
and show the 5 distribution curves on the same graph in different colours. 
In the case, E-values will be the X-axis and frequency will be the Y-axis. 

Is this at all possible? 

Thanks. 

--
View this message in context: 
http://r.789695.n4.nabble.com/Statistical-Histograms-in-R-tp4438336p4438336.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.table issue with #

2012-03-02 Thread Sarah Goslee
The # is the default comment character in read.table(), but that can
easily be changed:

 tc - textConnection(
+ yes yes yes yes yes
+ yes yes yes yes yes
+ yes yes # yes yes
+ )
 x - read.table(tc, comment.char=)
 x
   V1  V2  V3  V4  V5
1 yes yes yes yes yes
2 yes yes yes yes yes
3 yes yes   # yes yes

There's insufficient context here to know if that was actually
the original problem, but is an alternate solution for what Rui
proposed.

Sarah

On Thu, Mar 1, 2012 at 5:51 PM, Rui Barradas rui1...@sapo.pt wrote:
 Hello,


 The problem is that I get a the following error bacause anything after the
 # is ignored.

 Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
 :
   line 6 did not have 500 elements

 R thinks that line 6 has only 2 elements because of the #.


 Use 'readLines' instead, followed by 'strsplit'.
 In the example below the separator is a space.

 tc - textConnection(
 yes yes yes yes yes
 yes yes yes yes yes
 yes yes # yes yes
 )
 #x - read.table(tc)   # same error: line 3 did not have 5 elements
 x - readLines(tc)
 close(tc)
 strsplit(x,  )

 Hope this helps,

 Rui Barradas



-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Vector errors and missing values

2012-03-02 Thread Petr PIKAL
Hi
 
 Hi Petr!
 
 Thank you for responding to my post.
 
 I checked out all my variables in the way you suggested and they are all 
in
 integer form, but here are many missing values in some of my vectors,
 denoted with NA.
 
 So, they are in the correct form, I am just wondering if there is 
something
 else I need to do to the NAs to be able to run my regressions. Because 
as it
 is now, R may not be taking the NAs into account and that would cause a

No. R functions usually react to NA with correct and in help pages 
specified ways. I do not know which functions you used and how. You still 
fail to provide any reasonable info.

 mismatch in vector length. But I don't think that it is expected that 
all
 your predictor variables and your outcome variable be the same length. I

Without context I recall that you used some kind of regression. How do you 
suppose any regression or model can be executed if variables have 
different length?

 would imagine that is often not the case for most people when they 
collect
 data.

That is why there are missing values used. And regressins can usually 
handle missing values based on na.action parameter value smoothly. At 
least for me and for thousands of R users for more than 10 years. 
Therefore I suspect that problem is either in your data or in the way you 
use them.

But if you fail to provide some code and data you stay alone for resolving 
your problems.

Regards
Petr

 
 --
 Jessica
 
 --
 View this message in context: 
http://r.789695.n4.nabble.com/Vector-errors-
 and-missing-values-tp4437306p4438415.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why do my regular expressions require a double escape \\ to get a literal??

2012-03-02 Thread Berend Hasselman

On 02-03-2012, at 14:13, Roey Angel wrote:

 Hi Bernard, thanks for the quick reply.
 Of course, I understand that an escape is needed because parenthesis are 
 reserved symbols in regular expressions.
 My problem is that if I just use \( I get the error:
 
 Error: '\(' is an unrecognized escape in character string starting \(
 
 so in order to get a literal ( I need to use \\(
 which is odd cause I've never encountered that in any other language and also 
 all the R manuals dont mention that.
 

It is not odd as the previous poster has already mentioned.

I have encountered this (e.g. awk).

You need the \\ because the expression between tour quotes is interpreted twice:
once and first as a character string (in which \( is illegal but \\ is legal) 
and then as a regular expression in which you want to match a literal ( and ) 
which must be escaped in the regular expression since they are meta characters.

If you don't like doing that (the \\) use this instead

as.data.frame(apply(tax.data, 2, function(x) gsub('[(].*[)]','',x)))

i.e. put the ( and ) in a character class.

Berend



 On 02-03-2012, at 09:36, Roey Angel wrote:
 
 Hi,
 I was recently misfortunate enough to have to use regular expressions to 
 sort out some data in R.
 I'm working on a data file which contains taxonomical data of bacteria in 
 hierarchical order.
 A sample of this file can be generated using:
 
 tax.data- read.table(header=F, con- textConnection('
 G9SS7BA01D15EC  Bacteria(100)Cyanobacteria(84)unclassified
 G9SS7BA01C9UIRBacteria(100)Proteobacteria(94)
 Alphaproteobacteria(89)
 G9SS7BA01CM00DBacteria(100)Proteobacteria(99)
 Alphaproteobacteria(99)
 '))
 close(con)
 
 What I try to do is to remove the parenthesis and the number inside (which 
 could contain a decimal point)
 I assumed that the following command would solve it, but instead I got an 
 error.
 
 tax.data- as.data.frame(apply(tax.data, 2, function(x) 
 gsub('\(.*\)','',x)))
 Error: '\(' is an unrecognized escape in character string starting \(
 
 And it doesn't matter if I use perl = TRUE or not.
 To solve it I need to use a double escape sign '\\' before opening and 
 closing the parenthesis:
 
 tax.data- as.data.frame(apply(tax.data, 2, function(x) 
 gsub('\\(.*\\)','',x)))
 
 This yields the desired result but I wonder why it does that?
 No other regular expression system I'm used to (e.g. Perl, Shell) works 
 like that.
 
 I'm using R 2.14 (but also R 2.10) and I get the same results on Ubuntu and 
 win XP.
 
 I'd appreciate any explanation.
 Section Character vectors in the R Intro manual.
 
 ?Quotes
 
 The regular expression is provided as a string to gsub. In strings there are 
 escape sequences.
 To get the \ as a single \ to the regular expression parser it has to be 
 \-ed in the string stage: \\
 
 Berend
 
 
 angel.vcf

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Memory issue. XXXX

2012-03-02 Thread Dan Abner
Hi everyone,

Any ideas on troubleshooting this memory issue:

 d1-read.csv(arrears.csv)
Error: cannot allocate vector of size 77.3 Mb
In addition: Warning messages:
1: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)


Thanks!

Dan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory issue. XXXX

2012-03-02 Thread Sarah Goslee
Let's see...

You could delete objects from your R session.
You could buy more RAM.
You could see help(memory.size).
You could try googling to see how others have dealt with memory
management in R, a process which turns up useful information like
this: http://www.r-bloggers.com/memory-management-in-r-a-few-tips-and-tricks/

You could provide the information on your system requested in the posting guide.

Sarah

On Fri, Mar 2, 2012 at 9:57 AM, Dan Abner dan.abne...@gmail.com wrote:
 Hi everyone,

 Any ideas on troubleshooting this memory issue:

 d1-read.csv(arrears.csv)
 Error: cannot allocate vector of size 77.3 Mb
 In addition: Warning messages:
 1: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
 2: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
 3: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
 4: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)


 Thanks!

 Dan


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] solnp Hessian problem in some datasets not in others

2012-03-02 Thread Diogo Alagador

Dear all,

Sorry to insist in this, but I am passing really bad times trying to  
solve the problem. Just to remember you:


I am tryng to solve a nonlinear optimization probel using the solnp function.
I have different datasets. For the smaller I get full solutions, for
the bigger I got an error message stating:


Iter: 1 fn: 101.8017 Pars:  0.21000 0.21000 0.21000 0.21000
0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
0.21000 0.21000 0.21000 0.21000

solnp-- Solution not reliableProblem Inverting Hessian.
Warning messages:
1: In p0 * vscale[(neq + 2):(nc + np + 1)] :
longer object length is not a multiple of shorter object length
2: In cbind(temp, funv) :
number of rows of result is not a multiple of vector length (arg
1) 


Anyone knows what may be the reason? Just remembering that the same
problem runs OK for smaller datasets.

Thanks in advance,

Diogo André
Portugal

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subseting a data frame

2012-03-02 Thread nathalie

HI,
this is my problem I want to subset this file df, using only  unique 
df$exon printing the line once even if  df$exon appear several times:


unique(df$exon) will show me the unique exons
If I try to print only the unique exon lines
with df[unique(df$exon),] -this doesn't print only the unique ones :(

could you help?
thanks
Nat




exon size  chr start   end
413077 ChrX_133594175_133594368_HPRT1  193 ChrX 133594175 133594368
413270 ChrX_133594183_133594368_HPRT1  185 ChrX 133594183 133594368
413455 ChrX_133594381_133594565_HPRT1  184 ChrX 133594381 133594565
413639 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
413745 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
413851 ChrX_133607404_133607495_HPRT1   91 ChrX 133607404 133607495
413942 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
414125 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
414308 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
414373 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
414438 ChrX_133620692_133620696_HPRT14 ChrX 133620692 133620696
414442 ChrX_133624218_133624235_HPRT1   17 ChrX 133624218 133624235



--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] solnp Hessian problem in some datasets not in others

2012-03-02 Thread Berend Hasselman

On 02-03-2012, at 16:12, Diogo Alagador wrote:

 Dear all,
 
 Sorry to insist in this, but I am passing really bad times trying to solve 
 the problem. Just to remember you:
 
 I am tryng to solve a nonlinear optimization probel using the solnp function.
 I have different datasets. For the smaller I get full solutions, for
 the bigger I got an error message stating:
 
 
 Iter: 1 fn: 101.8017 Pars:  0.21000 0.21000 0.21000 0.21000
 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000
 0.21000 0.21000 0.21000 0.21000
 
 solnp-- Solution not reliableProblem Inverting Hessian.
 Warning messages:
 1: In p0 * vscale[(neq + 2):(nc + np + 1)] :
longer object length is not a multiple of shorter object length
 2: In cbind(temp, funv) :
number of rows of result is not a multiple of vector length (arg
 1) 
 
 
 Anyone knows what may be the reason? Just remembering that the same
 problem runs OK for smaller datasets.

No.
You have not provided enough information.
I know nothing about solnp but I am quite prepared to investigate.
But from the warning message I would guess that [(neq + 2):(nc + np + 1)] is 
simply incorrect in your specific case.

But no data or no description of your data, no function, no reproducible 
example  === No help.
You should really try to be more informative.

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with segmented package

2012-03-02 Thread Filoche
Thank you Vito for your help.

Works very nice.

Have a nice day,
Phil

--
View this message in context: 
http://r.789695.n4.nabble.com/Help-with-segmented-package-tp4435550p4438589.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Vector errors and missing values

2012-03-02 Thread jahughes81
Here is my code:


##Centering predictors###
verbal.ability_C - verbal.ability - mean(verbal.ability)
children_C - children - mean(children)
age_C - age - mean(age)
education_C - education - mean(education)
work.from.home.frequency_C - work.from.home.frequency -
mean(work.from.home.frequency)
religious.orientation_C - religious.orientation -
mean(religious.orientation)
political.orientation_C - political.orientation -
mean(political.orientation)
sexual.orientation_C - sexual.orientation -mean(sexual.orientation)

## Logistic Regression###
logistic.model - glm( fire.communist.teacher ~ age_C + sex + children_C +
currently.married + religious.orientation_C + political.orientation_C,
binomial(logit) )
summary( logistic.model )
exp( coefficients( logistic.model ) )

###Probit/Binomial Regression###

install.packages(MASS)
library(MASS)

probit.model - polr( as.factor(verbal.ability) ~ education_C + children_C +
currently.married + work.from.home.frequency_C, method=probit) 
summary( probit.model)

Here is the output with I look at my data using the str(my.data) command:

'data.frame':   2044 obs. of  13 variables:
 $ sexual.orientation  : int  -1 -1 NA NA NA NA NA -1 NA NA ...
 $ political.orientation   : int  5 5 6 0 3 6 4 5 6 NA ...
 $ religious.orientation   : int  4 1 4 4 4 1 2 4 4 4 ...
 $ weekly.hours.on.internet: int  3 20 NA NA NA NA NA NA NA 0 ...
 $ verbal.ability  : int  6 9 NA 3 NA NA NA 8 NA NA ...
 $ work.from.home.frequency: int  3 4 NA NA NA NA 1 NA 1 1 ...
 $ fire.communist.teacher  : int  NA NA 1 NA 0 NA 0 0 0 1 ...
 $ currently.married   : int  -1 -1 -1 -1 1 -1 -1 -1 1 -1 ...
 $ children: int  0 0 3 5 8 2 1 1 3 2 ...
 $ education   : int  16 16 8 10 0 6 16 15 14 14 ...
 $ partnrs5: int  6 5 -1 99 -1 -1 -1 0 -1 -1 ...
 $ age : int  31 23 71 82 78 40 46 80 31 99 ...
 $ sex : int  1 -1 -1 -1 -1 1 -1 -1 -1 -1 ...

I tried using the na.action command by putting right after the 
'binomial(logit)' syntax, but it didn't work. I am not sure if I am using it
properly though.

So, I have tried this syntax to deal with the missing data:

logistic.model - glm( fire.communist.teacher ~ age_C + sex + children_C +
currently.married + religious.orientation_C + political.orientation_C,
binomial(logit), na.action=na.exclude )

as well as:

logistic.model - glm( fire.communist.teacher ~ age_C + sex + children_C +
currently.married + religious.orientation_C + political.orientation_C,
binomial(logit), na.action=na.exclude, data=na.omit(DataMiss) )



--
View this message in context: 
http://r.789695.n4.nabble.com/Vector-errors-and-missing-values-tp4437306p4438678.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] acf() plot of matrix cuts y-axis labels

2012-03-02 Thread Folkes, Michael
Hello all,
I found a funny problem with y-axis labels when plotting acf(matrix) -
the labels are too close to one of the margins and cut in half.
Here's the problem:

test-matrix(rnorm(200),ncol=4)
acf(test)

This doesn't fix the problem:
test-matrix(rnorm(200),ncol=4)
par(mar=c(3,3,2,0.2),oma=c(0,0,0,0))
acf(test)

This does fix the margin. I understand why, but not sure why ONLY this
will work?
test-matrix(rnorm(200),ncol=4)
acf(test,mar=c(3,3,2,0.2),oma=c(0,0,0,0))

__
Win xp sp3, 
 R.version.string
[1] R version 2.14.1 (2011-12-22)
 


Thanks
Michael Folkes



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to colorize the panel backgrounds of pairs()?

2012-03-02 Thread Marius Hofert
Okay, one simply has to use label.pos=0.5 in pairs() to get the correct 
behavior.


On 2012-03-02, at 09:10 , Marius Hofert wrote:

 Dear Ilai,
 
 I tried to also adjust the diagonal panels. However, the variable names are 
 not
 positioned correctly anymore. Do you know a solution?
 
 Cheers,
 
 Marius
 
 count - 0
 mypanel - function(x, y, ...){
   count - count+1
   bg - if(count %in% c(1,4,9,12)) #FDFF65 else transparent
   ll - par(usr)
   rect(ll[1], ll[3], ll[2], ll[4], col=bg)
   points(x, y, cex=0.5)
 }
 
 mydiag.panel - function(x, ...){
   ll - par(usr)
   rect(ll[1], ll[3], ll[2], ll[4], col=#FDFF65)
 }
 
 U - matrix(runif(4*500), ncol=4)
 pairs(U, panel=mypanel, diag.panel=mydiag.panel)
 
 
 Marius Hofert marius.hof...@math.ethz.ch writes:
 
 Indeed, precisely what I was looking for. Many thanks, Ilai.
 
 ilai ke...@math.montana.edu writes:
 
 par('bg') is not what you are looking for - it will set the bg of the
 whole graphic device, not panels. I think you want:
 count - 0
 mypanel - function(x, y, ...){
   count - count+1
   ll- par('usr')
   if(count %in% c(1,4,9,12)) bg- #FDFF65
   else bg- 'transparent'
   rect(ll[1],ll[3],ll[2],ll[4],col=bg)
   points(x, y, cex=0.5)
 }
 
 Cheers
 
 On Thu, Mar 1, 2012 at 4:49 PM, Marius Hofert
 marius.hof...@math.ethz.ch wrote:
 Dear expeRts,
 
 I would like to colorize the backgrounds of a pairs plot according to the
 respective panel number. Here is what I tried (without success):
 
 count - 0
 mypanel - function(x, y, ...){
count - count+1
bg. - if(count %in% c(1,4,9,12)) #FDFF65 else NA
points(x, y, cex=0.5, bg=bg)
 }
 
 U - matrix(runif(4*500), ncol=4)
 pairs(U, panel=mypanel)
 
 I also tried to set par(bg=bg.) before the call to points(), but that 
 didn't
 work either. The only thing I found is that bg= can be used to fill 
 certain
 plot symbols, but I would like to colorize the background of each panel, 
 not
 the drawn circles.
 
 Cheers,
 
 Marius
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subseting a data frame

2012-03-02 Thread R. Michael Weylandt michael.weyla...@gmail.com
I believe you want the duplicated() function.

Michael

On Mar 2, 2012, at 10:19 AM, nathalie n...@sanger.ac.uk wrote:

 HI,
 this is my problem I want to subset this file df, using only  unique df$exon 
 printing the line once even if  df$exon appear several times:
 
 unique(df$exon) will show me the unique exons
 If I try to print only the unique exon lines
 with df[unique(df$exon),] -this doesn't print only the unique ones :(
 
 could you help?
 thanks
 Nat
 
 
 
 
exon size  chr start   end
 413077 ChrX_133594175_133594368_HPRT1  193 ChrX 133594175 133594368
 413270 ChrX_133594183_133594368_HPRT1  185 ChrX 133594183 133594368
 413455 ChrX_133594381_133594565_HPRT1  184 ChrX 133594381 133594565
 413639 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
 413745 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
 413851 ChrX_133607404_133607495_HPRT1   91 ChrX 133607404 133607495
 413942 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
 414125 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
 414308 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
 414373 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
 414438 ChrX_133620692_133620696_HPRT14 ChrX 133620692 133620696
 414442 ChrX_133624218_133624235_HPRT1   17 ChrX 133624218 133624235
 
 
 
 -- 
 The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a 
 charity registered in England with number 1021457 and a company registered in 
 England with number 2742969, whose registered office is 215 Euston Road, 
 London, NW1 2BE.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up merge

2012-03-02 Thread Ben quant
I'm not sure. I'm still looking into it. Its pretty involved, so I asked
the simplest answer first (the merge question).

I'll reply back with a mock-up/sample that is testable under a more
appropriate subject line. Probably this weekend.

Regards,

Ben


On Fri, Mar 2, 2012 at 4:37 AM, Hans Ekbrand h...@sociologi.cjb.net wrote:

 On Fri, Mar 02, 2012 at 03:24:20AM -0700, Ben quant wrote:
  Hello,
 
  I have a nasty loop that I have to do 11877 times.

 Are you completely sure about that? I often find my self avoiding
 loops-by-row by constructing vectors of which rows that fullfil a
 condition, and then creating new vectors out of that vector. If you
 elaborate on the problem, perhaps we could find a way to avoid the
 loops altogether?

 Mostly as a note to self, I wrote
 http://code.cjb.net/vectors-instead-of-loop.html, it might be
 understood by others too, but I'm not sure.

 --
 Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parameterization of Inverse Wishart distribution available in MCMCpack and bayesm libraries

2012-03-02 Thread ilai
On Fri, Mar 2, 2012 at 1:22 AM, peter dalgaard pda...@gmail.com wrote:


 Er, yes (scalar does not imply integer)

Dough! awkward... Sorry Shantanu.

I've added
cat('###\n # ',substr(fortunes::fortune(90)$quote,1,146),'\n ### \n')
To .First in my Rhelp directory.
Hope that helps (me).





 As a general matter:

 1. This is the Open Source world, you can read the actual function and see 
 what it does. It might even say so in the comments.

 2. You can investigate empirically -- the moments are known as a function of 
 the parameters (check Wikipedia), so how about simulating a few thousand 
 matrices and looking at the means and variances.

 (I don't do IW on a daily basis, but AFAICT, the two parametrizations have 
 roughly the same mean, but a factor of two between variances, so it should be 
 fairly easy to spot whether it is one or the other.)

 --
 Peter Dalgaard, Professor,
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com









__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] acf() plot of matrix cuts y-axis labels

2012-03-02 Thread Duncan Murdoch

On 02/03/2012 11:40 AM, Folkes, Michael wrote:

Hello all,
I found a funny problem with y-axis labels when plotting acf(matrix) -
the labels are too close to one of the margins and cut in half.
Here's the problem:

test-matrix(rnorm(200),ncol=4)
acf(test)

This doesn't fix the problem:
test-matrix(rnorm(200),ncol=4)
par(mar=c(3,3,2,0.2),oma=c(0,0,0,0))
acf(test)

This does fix the margin. I understand why, but not sure why ONLY this
will work?
test-matrix(rnorm(200),ncol=4)
acf(test,mar=c(3,3,2,0.2),oma=c(0,0,0,0))


acf uses plot.acf to do the plotting.  If you read ?plot.acf, you'll see 
how it comes up with acf settings:  the global ones are overridden for 
data like yours.


You might want to do some experimenting and suggest better defaults for 
plot.acf.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subseting a data frame

2012-03-02 Thread R. Michael Weylandt michael.weyla...@gmail.com
Please always cc the list for archival/threading reasons. 

Sort answer is that unique() gives the unique elements rather than something 
you should subset by, like a set of logical indices or row numbers. 

Note that in general unique(x) == x[!duplicated(x)] I'd imagine there are cases 
where this breaks down but I can't assemble one off the top of my head. 

Michael

On Mar 2, 2012, at 12:13 PM, nathalie n...@sanger.ac.uk wrote:

 thanks
 why unique doesn't work here??
 I believe you want the duplicated() function.
 
 Michael
 
 On Mar 2, 2012, at 10:19 AM, nathalien...@sanger.ac.uk  wrote:
 
 HI,
 this is my problem I want to subset this file df, using only  unique 
 df$exon printing the line once even if  df$exon appear several times:
 
 unique(df$exon) will show me the unique exons
 If I try to print only the unique exon lines
 with df[unique(df$exon),] -this doesn't print only the unique ones :(
 
 could you help?
 thanks
 Nat
 
 
 
 
exon size  chr start   end
 413077 ChrX_133594175_133594368_HPRT1  193 ChrX 133594175 133594368
 413270 ChrX_133594183_133594368_HPRT1  185 ChrX 133594183 133594368
 413455 ChrX_133594381_133594565_HPRT1  184 ChrX 133594381 133594565
 413639 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
 413745 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
 413851 ChrX_133607404_133607495_HPRT1   91 ChrX 133607404 133607495
 413942 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
 414125 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
 414308 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
 414373 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
 414438 ChrX_133620692_133620696_HPRT14 ChrX 133620692 133620696
 414442 ChrX_133624218_133624235_HPRT1   17 ChrX 133624218 133624235
 
 
 
 -- 
 The Wellcome Trust Sanger Institute is operated by Genome Research Limited, 
 a charity registered in England with number 1021457 and a company 
 registered in England with number 2742969, whose registered office is 215 
 Euston Road, London, NW1 2BE.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 -- 
 The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a 
 charity registered in England with number 1021457 and a company registered in 
 England with number 2742969, whose registered office is 215 Euston Road, 
 London, NW1 2BE. 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] acf() plot of matrix cuts y-axis labels

2012-03-02 Thread Folkes, Michael
Thanks for the advice.  I guess I should have read the acf help page
more thoroughly to appreciate the role of plot.acf().  Typical.
Thanks Duncan. 

-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Sent: March 2, 2012 9:53 AM
To: Folkes, Michael
Cc: r-help@r-project.org
Subject: Re: [R] acf() plot of matrix cuts y-axis labels

On 02/03/2012 11:40 AM, Folkes, Michael wrote:
 Hello all,
 I found a funny problem with y-axis labels when plotting acf(matrix) -

 the labels are too close to one of the margins and cut in half.
 Here's the problem:

 test-matrix(rnorm(200),ncol=4)
 acf(test)

 This doesn't fix the problem:
 test-matrix(rnorm(200),ncol=4)
 par(mar=c(3,3,2,0.2),oma=c(0,0,0,0))
 acf(test)

 This does fix the margin. I understand why, but not sure why ONLY this

 will work?
 test-matrix(rnorm(200),ncol=4)
 acf(test,mar=c(3,3,2,0.2),oma=c(0,0,0,0))

acf uses plot.acf to do the plotting.  If you read ?plot.acf, you'll see
how it comes up with acf settings:  the global ones are overridden for
data like yours.

You might want to do some experimenting and suggest better defaults for
plot.acf.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MNP and exclusion restriction

2012-03-02 Thread saqlain raza
Hi,

I am running the MNP package in R. The model runs well. There are actually 4 
choices and 4th is considered as base category. I got the result of all 19 
covariates for all 3 model choices. What I want to do with the result is to 
eliminate all the covariates from one model choice except constant and to 
eliminate only one covariate from other model choice except constant and the 
third model choice remains with all the 19 covariates and constant (in other 
words, Exclusion restriction). I am working on a data to test for the existence 
of complementarity. For this I need the results as I mentioned above.

Model Choice:B1               Model Choice: B2                                  
         Model Choice:B3
Constant only                 Constant + 18 covariates                          
        Constant + 19 covariates
Blue color shows the result what I want. 

Waiting for the good suggestions 
Saqlain RAZA
PhD Student (saqlain.r...@toulouse.inra.fr)
IODA/AGIR (www.toulouse.inra.fr/agir)
France.
Mobile: (+33) 06 18 37 63 52
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Why predicted values are fewer that the real?

2012-03-02 Thread labbig
Hi

i am running a glm model family Gamma(link=log) trying to predict a vector
of 1554 (real) values 

Using predict() i got a vector of 950 predicted values instead of 1554.
The predictions are good though
The model doesnt take account of negative values and NAs which are only 121
values.

Any clue?

Thank

--
View this message in context: 
http://r.789695.n4.nabble.com/Why-predicted-values-are-fewer-that-the-real-tp4438912p4438912.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] subseting a data frame

2012-03-02 Thread Rui Barradas
Hello,

 HI,
 this is my problem I want to subset this file df, using only  unique
 df$exon printing the line once even if  df$exon appear several times:
 
 unique(df$exon) will show me the unique exons
 If I try to print only the unique exon lines
 with df[unique(df$exon),] -this doesn't print only the unique ones :(
 

Try

inx - match(unique(df$exon), df$exon)
df[inx, ]


Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/subseting-a-data-frame-tp4438745p4438922.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] reshaping

2012-03-02 Thread robgriffin247
Hello,
I have a large data set which I am trying to get in to a long/narrow format.
I have given an example below of how I want my data to look before and
after... any ideas for an easy way to do this?

*###Start With this...
*set.seed(1)
a=rnorm(10)
b=rnorm(10)
c=rnorm(10)
d=rnorm(10)
e=rnorm(10)
f=rnorm(10)
g=rnorm(10)
h=rnorm(10)
G=c(1,2,3,4,5,6,7,8,9,10)
test=matrix(c(G,a,b,c,d,e,f,g,h),ncol=9)
colnames(test)=c(G,a,b,c,d,e,f,g,h)
*test*

### WHERE... 
  # a-d = male. e-h  sex = female
  # a,c,e,g  replicate = 1
  # a+b  experimental line = L1, c+d =L2

*### Which Becomes This...
*z2=as.numeric(c(seq(1:10),seq(1:10),seq(1:10),seq(1:10)))
sex=c(m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,
 
m,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f)
rep=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
line=c(A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A1,A2,A3,A4,A5
,A6   ,A7   ,A8   ,A9   ,A10  ,A1   ,A2   ,A3   ,A4   ,A5   
,A6   ,A7
,A8   ,A9   ,A10)
r=as.numeric(c(rnorm(10),rnorm(10),rnorm(10),rnorm(10)))
test2=matrix(c(z2,sex,rep,line,r),ncol=5)
colnames(test2)=c(G,Sex,Rep,Line,Res)
*test2*

--
View this message in context: 
http://r.789695.n4.nabble.com/reshaping-tp4439182p4439182.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] projection problem in sp package

2012-03-02 Thread uday
I am plotting some data using sp package
library (sp)
library(maps) 
data.aggm # data 
# Define standard projection
ll   - CRS(+proj=longlat +datum=WGS84)
# convert to a SpatialPointsDataFrame
xy   - cbind(data.aggm[,1], data.aggm[,2])
ch4.spPoints - SpatialPointsDataFrame(coords=xy,
data=data.frame(data.aggm[,3]), proj4string=ll)

after running this I am getting error is 

Error in validityMethod ( as (object,superClass)):
Geographical CRS given to non -comformant data


does any body know how to fix this error ?

Cheers 

--
View this message in context: 
http://r.789695.n4.nabble.com/projection-problem-in-sp-package-tp4439317p4439317.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why predicted values are fewer that the real?

2012-03-02 Thread Sarah Goslee
Hi,

On Fri, Mar 2, 2012 at 11:19 AM, labbig lkaim...@windowslive.com wrote:
 Hi

 i am running a glm model family Gamma(link=log) trying to predict a vector
 of 1554 (real) values

 Using predict() i got a vector of 950 predicted values instead of 1554.
 The predictions are good though
 The model doesnt take account of negative values and NAs which are only 121
 values.

 Any clue?

To have a clue, we need far more information than you've provided, possibly
even the reproducible example requested in the posting guide.

 Thank

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Why-predicted-values-are-fewer-that-the-real-tp4438912p4438912.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tobit Fixed Effects

2012-03-02 Thread Felipe Nunes
Hi Arne,

thanks for the improvements in the package. I'm using it right now and it's
working very well.

Best,

*Felipe Nunes*
CAPES/Fulbright Fellow
PhD Student Political Science - UCLA
Web: felipenunes.bol.ucla.edu



On Fri, Mar 2, 2012 at 2:13 AM, Arne Henningsen 
arne.henning...@googlemail.com wrote:

 Dear Felipe

 On 29 September 2011 14:10, Arne Henningsen
 arne.henning...@googlemail.com wrote:
  Hi Felipe
 
  On 25 September 2011 00:16, Felipe Nunes felipnu...@gmail.com wrote:
  Hi Arne,
  my problem persists. I am still using censReg [version - 0.5-7] to run a
  random effects model in my data (50,000 cases), but I always get the
  message.
  tob7 - censReg(transfers.cap ~ pt.pt + psdb.pt + pt.opp + pt.coa +
 psdb.coa
  + pib.cap + transfers.cap.lag + pib.cap + ifdm + log(populat) +
  mayor.vot.per + log(bol.fam.per+0.01) + factor(uf.name) + factor(year)
 - 1,
  left=0, right=Inf, method=BHHH, nGHQ=8, iterlim=1, data = pdata2)
  Error in maxNRCompute(fn = logLikAttr, fnOrig = fn, gradOrig = grad,
  hessOrig = hess,  :
NA in the initial gradient
  If I sent you my data set, could you try to help me? I have been
 struggling
  with that for two months now.
 
  Thanks for sending me your data set. With it, I was able to figure
  out, where the NAs in the (initial) gradients come from: when
  calculating the derivatives of the standard normal density function [d
  dnorm(x) / d x = - dnorm(x) * x], values for x that are larger than
  approximately 40 (in absolute terms) result in so small values (in
  absolute terms) that R rounds them to zero. Later, these derivatives
  are multiplied by some other values and then the logarithm is taken
  ... and multiplying any number by zero and taking the logarithms gives
  not a finite number :-(
 
  When *densities* of the standard normal distribution become too small,
  one can use dnorm(x,log=TRUE) and store the logarithm of the small
  number, which is much larger (in absolute terms) than the density and
  hence, is not rounded to zero. However, in the case of the
  *derivative* of the standard normal density function, this is more
  complicated as log( d dnorm(x) / d x ) =  log( dnorm(x) ) + log( - x )
  is not defined if x is positive. I will try to solve this problem by
  case distinction (x0 vs. x0). Or does anybody know a better
  solution?

 Finally(!), I have implemented this solution in the censReg() package.
 Some initial tests (including your model and data) show that the
 revised calculation of the gradient of the random effects panel data
 model for censored dependent variables is much more robust to rounding
 errors. The improved version of the censReg package is not yet via
 CRAN, but it is available at R-Forge:

 https://r-forge.r-project.org/R/?group_id=256

 If you have further questions or feedback regarding the censReg
 package, please use a forum or tracker on the R-Forge site of the
 sampleSelection project:

 https://r-forge.r-project.org/projects/sampleselection/

 Best wishes from Copenhagen,
 Arne

 --
 Arne Henningsen
 http://www.arne-henningsen.name


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up merge

2012-03-02 Thread jim holtman
One way to speed up the merge is not to use merge.  You can use 'match' to
find matching indices and then manually.

Does this do what you want:

 ua - read.table(text = '  AName  rt_date
+ 2007-03-31 14066.580078125 2007-04-01
+ 2007-06-30 14717   2007-04-03
+ 2007-09-30 15528   2007-10-25
+ 2007-12-31 17609   2008-04-06
+ 2008-03-31 17168   2008-04-24
+ 2008-06-30 17681   2008-04-09', header = TRUE, as.is = TRUE)

 dt - c( 2007-03-31 ,2007-04-01 ,2007-04-02, 2007-04-03
,2007-04-04,
+ 2007-04-05 ,2007-04-06 ,2007-04-07,
+ 2007-04-08, 2007-04-09)

 # find matching values in ua
 indx - match(dt, ua$rt_date)

 # create new result matrix
 xx1 - cbind(dt, ua[indx,])
 rownames(xx1) - NULL  # delete funny names
 xx1
   dtANamert_date
1  2007-03-31   NA   NA
2  2007-04-01 14066.58 2007-04-01
3  2007-04-02   NA   NA
4  2007-04-03 14717.00 2007-04-03
5  2007-04-04   NA   NA
6  2007-04-05   NA   NA
7  2007-04-06   NA   NA
8  2007-04-07   NA   NA
9  2007-04-08   NA   NA
10 2007-04-09   NA   NA



On Fri, Mar 2, 2012 at 5:24 AM, Ben quant ccqu...@gmail.com wrote:

 Hello,

 I have a nasty loop that I have to do 11877 times. The only thing that
 slows it down really is this merge:

 xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

 Any ideas on how to speed it up? The output can't change materially (it
 works), but I'd like it to go faster. I'm looking at getting around the
 loop (not shown), but I'm trying to speed up the merge first. I'll post
 regarding the loop if nothing comes of this post.

 Here is some information on what type of stuff is going into the merge:

  class(ua_rd)
 [1] matrix
  dim(ua_rd)
 [1] 20  2
  head(ua_rd)
   AName  rt_date
 2007-03-31 14066.580078125 2007-04-26
 2007-06-30 14717   2007-07-19
 2007-09-30 15528   2007-10-25
 2007-12-31 17609   2008-01-24
 2008-03-31 17168   2008-04-24
 2008-06-30 17681   2008-07-17
  class(dt)
 [1] character
  length(dt)
 [1] 1799
  dt[1:10]
  [1] 2007-03-31 2007-04-01 2007-04-02 2007-04-03 2007-04-04
 2007-04-05 2007-04-06 2007-04-07
  [9] 2007-04-08 2007-04-09

 thanks,

 Ben

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Spacing of text does not match spacing of bars in barplot

2012-03-02 Thread jon waterhouse
I have a very standard barplot. My labels are too long to be printed
horizontally under each bar, so I am using text to put the labels on a 45
degree slant. 

However, the labels are spaced more narrowly than the bars, so on an 8
vertical bar plot, the end of the eighth label is lined up with the seventh
bar.

Preferably I don't want to do every text label separately (I'm having this
problem on all the graphs where I'm using text()

Using Windows and version 2.14.1

X2sum - c(42.6,  3.6,  1.8,  3.9, 12.1, 14.3, 14.6 ,28.4)
X2.labels - c(No earnings, Less than $5000/year, $5K to $10K , $10K
to $15K  , $ 15K to $20K ,   $20K to $25K  ,   $25K to $30K  
,  Over $30K   )  

barplot(X2sum)
text(1:8, par(usr)[3] - 0.5, srt = 45, adj = 1, labels =X2.labels, xpd =
TRUE)

Thanks,

Jon


--
View this message in context: 
http://r.789695.n4.nabble.com/Spacing-of-text-does-not-match-spacing-of-bars-in-barplot-tp4439635p4439635.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why predicted values are fewer that the real?

2012-03-02 Thread Richard M. Heiberger
try using glm(...,na.action=na.exclude)

See ?na.exclude

for the explanation

On Fri, Mar 2, 2012 at 11:19 AM, labbig lkaim...@windowslive.com wrote:

 Hi

 i am running a glm model family Gamma(link=log) trying to predict a vector
 of 1554 (real) values

 Using predict() i got a vector of 950 predicted values instead of 1554.
 The predictions are good though
 The model doesnt take account of negative values and NAs which are only 121
 values.

 Any clue?

 Thank

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Why-predicted-values-are-fewer-that-the-real-tp4438912p4438912.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why predicted values are fewer that the real?

2012-03-02 Thread David Winsemius


On Mar 2, 2012, at 11:19 AM, labbig wrote:


Hi

i am running a glm model family Gamma(link=log) trying to predict a  
vector

of 1554 (real) values

Using predict() i got a vector of 950 predicted values instead of  
1554.

The predictions are good though
The model doesnt take account of negative values and NAs which are  
only 121

values.


Taking the logs of negative numbers?


Any clue?


I see a couple of possibilities.


View this message in context: 
http://r.789695.n4.nabble.com/Why-predicted-values-are-fewer-that-the-real-tp4438912p4438912.html
Sent from the R help mailing list archive at Nabble.com.


--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spacing of text does not match spacing of bars in barplot

2012-03-02 Thread William Dunlap
The return value of barplot contains the locations
of the bars that it just drew.  Use that instead of
1:8 when you draw the text:

  barCenters - barplot(X2sum)
  text(barCenters, par(usr)[3] - 0.5, srt = 45, adj = 1, labels =X2.labels, 
  xpd = TRUE)

Look at help(barplot) for details.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of jon waterhouse
 Sent: Friday, March 02, 2012 11:52 AM
 To: r-help@r-project.org
 Subject: [R] Spacing of text does not match spacing of bars in barplot
 
 I have a very standard barplot. My labels are too long to be printed
 horizontally under each bar, so I am using text to put the labels on a 45
 degree slant.
 
 However, the labels are spaced more narrowly than the bars, so on an 8
 vertical bar plot, the end of the eighth label is lined up with the seventh
 bar.
 
 Preferably I don't want to do every text label separately (I'm having this
 problem on all the graphs where I'm using text()
 
 Using Windows and version 2.14.1
 
 X2sum - c(42.6,  3.6,  1.8,  3.9, 12.1, 14.3, 14.6 ,28.4)
 X2.labels - c(No earnings, Less than $5000/year, $5K to $10K , $10K
 to $15K  , $ 15K to $20K ,   $20K to $25K  ,   $25K to $30K
 ,  Over $30K   )
 
 barplot(X2sum)
 text(1:8, par(usr)[3] - 0.5, srt = 45, adj = 1, labels =X2.labels, xpd =
 TRUE)
 
 Thanks,
 
 Jon
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Spacing-of-text-does-not-match-spacing-of-
 bars-in-barplot-tp4439635p4439635.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spacing of text does not match spacing of bars in barplot

2012-03-02 Thread Marc Schwartz
On Mar 2, 2012, at 1:52 PM, jon waterhouse wrote:

 I have a very standard barplot. My labels are too long to be printed
 horizontally under each bar, so I am using text to put the labels on a 45
 degree slant. 
 
 However, the labels are spaced more narrowly than the bars, so on an 8
 vertical bar plot, the end of the eighth label is lined up with the seventh
 bar.
 
 Preferably I don't want to do every text label separately (I'm having this
 problem on all the graphs where I'm using text()
 
 Using Windows and version 2.14.1
 
 X2sum - c(42.6,  3.6,  1.8,  3.9, 12.1, 14.3, 14.6 ,28.4)
 X2.labels - c(No earnings, Less than $5000/year, $5K to $10K , $10K
 to $15K  , $ 15K to $20K ,   $20K to $25K  ,   $25K to $30K  
 ,  Over $30K   )  
 
 barplot(X2sum)
 text(1:8, par(usr)[3] - 0.5, srt = 45, adj = 1, labels =X2.labels, xpd =
 TRUE)
 
 Thanks,
 
 Jon


Read ?barplot and take note of the Value section:

A numeric vector (or matrix, when beside = TRUE), say mp, giving the 
coordinates of all the bar midpoints drawn, useful for adding to the graph.

If beside is true, use colMeans(mp) for the midpoints of each group of bars, 
see example.


Thus:

  mp - barplot(X2sum)

  text(mp, ...)

You correctly read the R FAQ on the matter, but that example uses plot() rather 
than barplot(). The midpoints of the bars are not at integer values.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Computing line= for mtext

2012-03-02 Thread Frank Harrell
Hi Rich and Peter,

What I am trying to do is the right-justify a vector of numbers to the right
of the y-axis so that the leftmost digit of all of the numbers is one
character to the right of the axis line.  axis() plots tick marks and
left-justifies the numbers.

Peter's idea:
-
Since you're setting your right margin to 5, why 
not just 

mtext(s, side=4, las=1, at=5, adj=1, line = 5, cex=1) 
mtext(s, side=4, las=1, at=7, adj=1, line = 5,  cex=2) 

i.e. set the line argument to the right margin? 
-
comes close to what I'm trying to do but I haven't found out how to compute
line from the maximum width over all the strings in the vector being
plotted vertically, and I'm not sure it scales properly for different cex=. 
strwidth(s, units='inches')/par('cin')[1] doesn't seem to be a complete
solution.

Thanks
Frank


Richard M. Heiberger wrote
 
 Frank,
 
 this is it.  It uses Peter's idea.
 
  plot(1:10)
  axis(side=2, 1:10, las=1, line=-31.5, lwd=0)
  axis(side=4, 1:10, las=1, labels=FALSE)
 
 Rich
 
 
 On Thu, Mar 1, 2012 at 6:52 PM, Frank Harrell lt;f.harrell@gt;wrote:
 
 Rich's pointers deals with lattice/grid graphics.  Does anyone have a
 solution for base graphics?
 Thanks
 Frank

 Richard M. Heiberger wrote
 
  Frank,
 
  This can be done directly with a variant of the panel.axis function.
  See function panel.axis.right in the HH package.  This was provided for
 me
  by David Winsemius in response to my query on this list in October 2011
  https://stat.ethz.ch/pipermail/r-help/2011-October/292806.html
 
  The email thread also includes comments by Deepayan Sarkar and Paul
  Murrell.
 
  Rich
 
  On Wed, Feb 29, 2012 at 8:48 AM, Frank Harrell lt;f.harrell@gt;wrote:
 
  I want to right-justify a vector of numbers in the right margin of a
  low-level plot.  For this I need to compute the line parameter to give
 to
  mtext.  Is this the correct scalable calculation?
 
  par(mar=c(4,3,1,5)); plot(1:20)
  s - 'abcde'; w=strwidth(s, units='inches')/par('cin')[1]
  mtext(s, side=4, las=1, at=5, adj=1, line=w-.5, cex=1)
  mtext(s, side=4, las=1, at=7, adj=1, line=2*(w-.5), cex=2)
 
  Thanks
  Frank
 
  -
  Frank Harrell
  Department of Biostatistics, Vanderbilt University
  --
  View this message in context:
 
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4431554.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgtlt;http://www.r-project.org/posting-guide.htmlamp;lt;http://www.r-project.org/posting-guide.htmlamp;gtgt;
 ;
  and provide commented, minimal, self-contained, reproducible code.
 
 
[[alternative HTML version deleted]]
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgt;
  and provide commented, minimal, self-contained, reproducible code.
 


 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4436923.html
  Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgt;
 and provide commented, minimal, self-contained, reproducible code.

 
   [[alternative HTML version deleted]]
 
 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4439703.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reshaping

2012-03-02 Thread Ista Zahn
Hi,

I *think* this is what you want...

On Fri, Mar 2, 2012 at 12:29 PM, robgriffin247
robgriffin...@hotmail.com wrote:
 Hello,
 I have a large data set which I am trying to get in to a long/narrow format.
 I have given an example below of how I want my data to look before and
 after... any ideas for an easy way to do this?

 *###Start With this...
 *set.seed(1)
 a=rnorm(10)
 b=rnorm(10)
 c=rnorm(10)
 d=rnorm(10)
 e=rnorm(10)
 f=rnorm(10)
 g=rnorm(10)
 h=rnorm(10)
 G=c(1,2,3,4,5,6,7,8,9,10)
 test=matrix(c(G,a,b,c,d,e,f,g,h),ncol=9)
 colnames(test)=c(G,a,b,c,d,e,f,g,h)
 *test*


library(reshape2)
test.m - melt(as.data.frame(test), id.vars=G)

 ### WHERE...
  # a-d = male. e-h  sex = female

test.m$sex - ifelse(as.character(test.m$variable) %in% letters[1:4],
male, female)

  # a,c,e,g  replicate = 1

test.m$replicate - ifelse(as.character(test.m$variable) %in% c(a,
c, e, g), 1, 2)

  # a+b  experimental line = L1, c+d =L2

test.m$line - paste(test.m$variable, test.m$G, sep = )

HTH,
Ista


 *### Which Becomes This...
 *z2=as.numeric(c(seq(1:10),seq(1:10),seq(1:10),seq(1:10)))
 sex=c(m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,m,

 m,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f)
 rep=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
 line=c(A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A1,A2,A3,A4,A5
 ,A6   ,A7   ,A8   ,A9   ,A10  ,A1   ,A2   ,A3   ,A4   ,A5 
   ,A6   ,A7
 ,A8   ,A9   ,A10)
 r=as.numeric(c(rnorm(10),rnorm(10),rnorm(10),rnorm(10)))
 test2=matrix(c(z2,sex,rep,line,r),ncol=5)
 colnames(test2)=c(G,Sex,Rep,Line,Res)
 *test2*

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/reshaping-tp4439182p4439182.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Computing line= for mtext

2012-03-02 Thread ilai
On Fri, Mar 2, 2012 at 1:17 PM, Frank Harrell f.harr...@vanderbilt.edu wrote:
 Hi Rich and Peter,

 What I am trying to do is the right-justify a vector of numbers to the right
 of the y-axis so that the leftmost digit of all of the numbers is one
 character to the right of the axis line.  axis() plots tick marks and

No it doesn't (not always)

 left-justifies the numbers.

For axis(4). If you really want right justify on the right hand side,
you could use axis(2) with negative line numbers (device size
dependent). Or you could try something like

plot(1:20)
axis(2,at=seq(1,20,4),labels=T,tick=F,las=T,pos=c(22.25,1))

Hope this is getting there (after only 6 messages...)

Elai


 Peter's idea:
 -
 Since you're setting your right margin to 5, why
 not just

 mtext(s, side=4, las=1, at=5, adj=1, line = 5, cex=1)
 mtext(s, side=4, las=1, at=7, adj=1, line = 5,  cex=2)

 i.e. set the line argument to the right margin?
 -
 comes close to what I'm trying to do but I haven't found out how to compute
 line from the maximum width over all the strings in the vector being
 plotted vertically, and I'm not sure it scales properly for different cex=.
 strwidth(s, units='inches')/par('cin')[1] doesn't seem to be a complete
 solution.

 Thanks
 Frank


 Richard M. Heiberger wrote

 Frank,

 this is it.  It uses Peter's idea.

  plot(1:10)
  axis(side=2, 1:10, las=1, line=-31.5, lwd=0)
  axis(side=4, 1:10, las=1, labels=FALSE)

 Rich


 On Thu, Mar 1, 2012 at 6:52 PM, Frank Harrell lt;f.harrell@gt;wrote:

 Rich's pointers deals with lattice/grid graphics.  Does anyone have a
 solution for base graphics?
 Thanks
 Frank

 Richard M. Heiberger wrote
 
  Frank,
 
  This can be done directly with a variant of the panel.axis function.
  See function panel.axis.right in the HH package.  This was provided for
 me
  by David Winsemius in response to my query on this list in October 2011
  https://stat.ethz.ch/pipermail/r-help/2011-October/292806.html
 
  The email thread also includes comments by Deepayan Sarkar and Paul
  Murrell.
 
  Rich
 
  On Wed, Feb 29, 2012 at 8:48 AM, Frank Harrell lt;f.harrell@gt;wrote:
 
  I want to right-justify a vector of numbers in the right margin of a
  low-level plot.  For this I need to compute the line parameter to give
 to
  mtext.  Is this the correct scalable calculation?
 
  par(mar=c(4,3,1,5)); plot(1:20)
  s - 'abcde'; w=strwidth(s, units='inches')/par('cin')[1]
  mtext(s, side=4, las=1, at=5, adj=1, line=w-.5, cex=1)
  mtext(s, side=4, las=1, at=7, adj=1, line=2*(w-.5), cex=2)
 
  Thanks
  Frank
 
  -
  Frank Harrell
  Department of Biostatistics, Vanderbilt University
  --
  View this message in context:
 
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4431554.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgtlt;http://www.r-project.org/posting-guide.htmlamp;lt;http://www.r-project.org/posting-guide.htmlamp;gtgt;
 ;
  and provide commented, minimal, self-contained, reproducible code.
 
 
        [[alternative HTML version deleted]]
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgt;
  and provide commented, minimal, self-contained, reproducible code.
 


 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4436923.html
  Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgt;
 and provide commented, minimal, self-contained, reproducible code.


       [[alternative HTML version deleted]]

 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4439703.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, 

Re: [R] speed up merge

2012-03-02 Thread Ben quant
I'll have to give this a try this weekend. Thank you!

ben

On Fri, Mar 2, 2012 at 12:07 PM, jim holtman jholt...@gmail.com wrote:

 One way to speed up the merge is not to use merge.  You can use 'match' to
 find matching indices and then manually.

 Does this do what you want:

  ua - read.table(text = '  AName  rt_date
 + 2007-03-31 14066.580078125 2007-04-01
 + 2007-06-30 14717   2007-04-03
 + 2007-09-30 15528   2007-10-25
 + 2007-12-31 17609   2008-04-06
 + 2008-03-31 17168   2008-04-24
 + 2008-06-30 17681   2008-04-09', header = TRUE, as.is = TRUE)
 
  dt - c( 2007-03-31 ,2007-04-01 ,2007-04-02, 2007-04-03
 ,2007-04-04,
 + 2007-04-05 ,2007-04-06 ,2007-04-07,
 + 2007-04-08, 2007-04-09)
 
  # find matching values in ua
  indx - match(dt, ua$rt_date)
 
  # create new result matrix
  xx1 - cbind(dt, ua[indx,])
  rownames(xx1) - NULL  # delete funny names
  xx1
dtANamert_date
 1  2007-03-31   NA   NA
 2  2007-04-01 14066.58 2007-04-01
 3  2007-04-02   NA   NA
 4  2007-04-03 14717.00 2007-04-03
 5  2007-04-04   NA   NA
 6  2007-04-05   NA   NA
 7  2007-04-06   NA   NA
 8  2007-04-07   NA   NA
 9  2007-04-08   NA   NA
 10 2007-04-09   NA   NA
 


 On Fri, Mar 2, 2012 at 5:24 AM, Ben quant ccqu...@gmail.com wrote:

 Hello,

 I have a nasty loop that I have to do 11877 times. The only thing that
 slows it down really is this merge:

 xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

 Any ideas on how to speed it up? The output can't change materially (it
 works), but I'd like it to go faster. I'm looking at getting around the
 loop (not shown), but I'm trying to speed up the merge first. I'll post
 regarding the loop if nothing comes of this post.

 Here is some information on what type of stuff is going into the merge:

  class(ua_rd)
 [1] matrix
  dim(ua_rd)
 [1] 20  2
  head(ua_rd)
   AName  rt_date
 2007-03-31 14066.580078125 2007-04-26
 2007-06-30 14717   2007-07-19
 2007-09-30 15528   2007-10-25
 2007-12-31 17609   2008-01-24
 2008-03-31 17168   2008-04-24
 2008-06-30 17681   2008-07-17
  class(dt)
 [1] character
  length(dt)
 [1] 1799
  dt[1:10]
  [1] 2007-03-31 2007-04-01 2007-04-02 2007-04-03 2007-04-04
 2007-04-05 2007-04-06 2007-04-07
  [9] 2007-04-08 2007-04-09

 thanks,

 Ben

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculation of standard error for a function

2012-03-02 Thread Jun Shen
Dear list,

If I know the standard error for k1 and k2, is there anything I can call in
R to calculate the standard error of k1/k2? Thanks.

Jun

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculation of standard error for a function

2012-03-02 Thread David Winsemius


On Mar 2, 2012, at 4:47 PM, Jun Shen wrote:


Dear list,

If I know the standard error for k1 and k2, is there anything I can  
call in

R to calculate the standard error of k1/k2? Thanks.


This does not appear to be a well-posed question yet, and it is  
arguably more a statistics question than a coding question. Perhaps if  
you posted it at:

http://stats.stackexchange.com/

 . with more detail about the background  for this question  
(especially _how_ you know these bits of information, but also what  
might be the purpose of this effort) you might get a more complete  
answer.


--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Noob question - Identity argument within aggregate function?

2012-03-02 Thread knavero
aggregate(z, identity, mean)
  1   2   3   4   5 
1.0 3.0 5.0 6.0 7.5 
 aggregate(z, mean)
Error: length(time(x)) == length(by[[1]]) is not TRUE

Can someone help me understand the error above and why identity is
necessary to satisfy the error




--
View this message in context: 
http://r.789695.n4.nabble.com/Noob-question-Identity-argument-within-aggregate-function-tp4439806p4439806.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] times series trellis plot

2012-03-02 Thread sluedtke
Dear List, 

I am struggling with the trellis graphic. A similar problem was mentioned
here:

http://r.789695.n4.nabble.com/R-How-can-you-get-N-replicates-of-a-multi-screen-multivariate-time-series-plot-td811850.html


I do have 2 time series data sets. The 2 time series differ in some orders
of magnitude. I managed to plot them into 1 graph, but, since the data is
that different, on of the data set appears as a line only, well almost. So I
would need to set a second y axis before, that is scaled to the second data
set. 

I know how to do it with the usual plot.window routine or something similar,
but not with the trellis graphic of the lattice package. 

Thanks in advance!

stefan


--
View this message in context: 
http://r.789695.n4.nabble.com/times-series-trellis-plot-tp4440032p4440032.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ?Syntax on Taking differential on both sides of the equation in 'R'

2012-03-02 Thread Anil A. Panackal MD

Hi,I am using package deSolve to run some ordinary differential equations 
(ODE) as part of a mathematical modeling project.  I have solved for the 
following equilibrium states: Seq1-a*(1-Neq1)/(f*Veq1+m+d)
Ceq1-(f*Seq1*Veq1+g*Ieq1+r*(1-Neq1)-b1*Veq1*Ieq1)/(b2+m+d+g)
Ieq1-(-b2*Ceq1)-r*(1-Neq1)/(b1*Veq1-g-u)
Veq1-o*(Ceq1+Ieq1)/e  I want to take the differential of both sides of the 
equation and then solve for the inverse of the first as follows (the parameter 
values are made up): library(deSolve)
rkMethod(rk45dp7)
CSIeq1-function(t,yeq1,pars) {
with (as.list(c(yeq1,pars)),{
Neq1-Seq1+Ceq1+Ieq1
dSeq1-d[a*(1-Neq1)/(f*Veq1+m+d)]
dCeq1-d[(f*Seq1*Veq1+g*Ieq1+r*(1-Neq1)-b1*Veq1*Ieq1)/(b2+m+d+g)]
dIeq1-d[(-b2*Ceq1)-r*(1-Neq1)/(b1*Veq1-g-u)]
dVeq1-d[o*(Ceq1+Ieq1)/e]
return(list(c(Seq1,Ceq1,Ieq1,Veq1),Neq1))
})
}pars - c(a=0.1, 
m=0.0005, 
u=0.5, 
b1=0.7, 
b2=0.2, 
f=0.01, 
g=0.4,  
r=0.3, 
o=20, 
e=90, 
d=0.01)
# initial conditions
yeq1 - c(Seq1=0.99,Ceq1=0.01,Ieq1=0,Veq1=1)
t -seq(0,365, by=50)
o1 - ode(yeq1, t, CSIeq1, pars, method = rkMethod(rk45dp7))1/o1$Seq1
*** The output gives the 
following error message so I am wondering about the syntax of the differentials 
such as dSeq1-d[a*(1-Neq1)/(f*Veq1+m+d)]:
  library(deSolve)
 rkMethod(rk45dp7)
$ID
[1] rk45dp7$varstep
[1] TRUE$FSAL
[1] TRUE$A
   [,1]   [,2]  [,3]   [,4]   [,5]  [,6]
[1,] 0.   0.00 0.000  0.000  0.000 0.000
[2,] 0.2000   0.00 0.000  0.000  0.000 0.000
[3,] 0.0750   0.225000 0.000  0.000  0.000 0.000
[4,] 0.9778  -3.73 3.556  0.000  0.000 0.000
[5,] 2.95259869 -11.595793 9.8228929 -0.2908093  0.000 0.000
[6,] 2.84627525 -10.757576 8.9064227  0.2784091 -0.2735313 0.000
[7,] 0.09114583   0.00 0.4492363  0.6510417 -0.3223762 0.1309524$b1
[1]  0.08991319  0.  0.45348907  0.61406250 -0.27151238  0.08904762
[7]  0.0250$b2
[1]  0.09114583  0.  0.44923630  0.65104167 -0.32237618  0.13095238
[7]  0.$c
[1] 0.000 0.200 0.300 0.800 0.889 1.000 1.000$d
[1] -1.127018  0.00  2.675424 -5.685527  3.521932 -1.767281  
2.382469$densetype
[1] 1$stage
[1] 7$Qerr
[1] 4attr(,class)
[1] list rkMethod
 CSIeq1-function(t,yeq1,pars) {
+ with (as.list(c(yeq1,pars)),{
+ Neq1-Seq1+Ceq1+Ieq1
+ dSeq1-d[a*(1-Neq1)/(f*Veq1+m+d)]
+ dCeq1-d[(f*Seq1*Veq1+g*Ieq1+r*(1-Neq1)-b1*Veq1*Ieq1)/(b2+m+d+g)]
+ dIeq1-d[(-b2*Ceq1)-r*(1-Neq1)/(b1*Veq1-g-u)]
+ dVeq1-d[o*(Ceq1+Ieq1)/e]
+ return(list(c(Seq1,Ceq1,Ieq1,Veq1),Neq1))
+ })
+ }
 
 pars - c(a=0.1, 
m=0.0005, 
u=0.5, 
b1=0.7, 
b2=0.2, 
f=0.01, 
g=0.4,  
r=0.3, 
o=20, 
e=90, 
d=0.01)
 # initial conditions
 yeq1 - c(Seq1=0.99,Ceq1=0.01,Ieq1=0,Veq1=1)
 t -seq(0,365, by=50)
 o1 - ode(yeq1, t, CSIeq1, pars, method = rkMethod(rk45dp7))
There were 50 or more warnings (use warnings() to see the first 50)
 
 1/o1$Seq1
Error in o1$Seq1 : $ operator is invalid for atomic vectors
 warnings()
Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
  Any thoughts or suggestions would 
be appreciated.  If you run the program with only the differential on the left, 
it runs just fine.  Do I need to use a different 'R' package?
-- AAP
 
Anil A. Panackal, MD, ScM, FACP

Confidential E-Mail: This e-mail is intended only for the person or entity to 
which it is addressed, and may contain information that is privileged, 
confidential, or otherwise protected from disclosure. Dissemination, 
distribution, or copying of this e-mail or the information herein by anyone 
other than the intended recipient is prohibited. If you have received this 
e-mail in error, please notify the sender by reply e-mail, and destroy the 
original message and all copies.  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Count matches in pmatch

2012-03-02 Thread apricum
Hi,
I need to find number of occurence of each word from one string in other
string. So I need a function, which is similar to pmatch, but returns not
references, but number of matches.  Is there any function like this? If no,
that is the way to calculate what I need?

--
View this message in context: 
http://r.789695.n4.nabble.com/Count-matches-in-pmatch-tp4439722p4439722.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pasting several things

2012-03-02 Thread chuck.01
I have this type of format:

structure(list(day = 19, C1 = structure(1L, .Label = c(, C1
), class = factor), C2 = structure(2L, .Label = c(, C2), class =
factor), 
C3 = structure(1L, .Label = c(, C3), class = factor), 
Q1 = structure(2L, .Label = c(, Q1), class = factor), 
Q2 = structure(2L, .Label = c(, Q2), class = factor), 
Q3 = structure(1L, .Label = c(, Q3), class = factor)), .Names =
c(day, 
C1, C2, C3, Q1, Q2, Q3), row.names = 8, class = data.frame)
 

and want something like this:

 19 -C2 _Q1_Q2  

any ideas?  obviously I could use paste() and get this.  Keep in mind I have
many of these and the presence of C1, C2, ... etc will vary.


Thanks. 

--
View this message in context: 
http://r.789695.n4.nabble.com/pasting-several-things-tp4439770p4439770.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Noob question - Identity argument within aggregate function?

2012-03-02 Thread Sarah Goslee
Hi,

On Fri, Mar 2, 2012 at 3:51 PM, knavero knav...@gmail.com wrote:
aggregate(z, identity, mean)
  1   2   3   4   5
 1.0 3.0 5.0 6.0 7.5
 aggregate(z, mean)
 Error: length(time(x)) == length(by[[1]]) is not TRUE

 Can someone help me understand the error above and why identity is
 necessary to satisfy the error

We can tell you to read ?aggregate and look at the order of the arguments.
We can point out that aggregate(z, mean) is not the same as mean(z), and
wonder what you are trying to accomplish with the former.

We can ask for a reproducible example, and some idea of what you are
trying to do.

Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Noob question - Identity argument within aggregate function?

2012-03-02 Thread David Winsemius


On Mar 2, 2012, at 3:51 PM, knavero wrote:


aggregate(z, identity, mean)

 1   2   3   4   5
1.0 3.0 5.0 6.0 7.5

aggregate(z, mean)

Error: length(time(x)) == length(by[[1]]) is not TRUE


As generally happens when you call a function and fail to provide  
enough arguments to fill up its formals list.




Can someone help me understand the error above and why identity is
necessary to satisfy the error


Well on my machine it throws an error, probably because you failed to  
provide the requested code to create the objects you were working on.  
Is 'z' so sort of special classed object for which there is an  
aggregate method? Is 'identity' a list as expected by  
aggregate.default or aggregate.data.frame? It would be an unfortunate  
choice of an object name, since there is a function with that nam.


--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory issue. XXXX

2012-03-02 Thread steven mosher
1. How much RAM do you have (looks like 2GB ) . If you have more than 2GB
then you can allocate
more memory with memory.size()

2. If you have 2GB or less then you have a couple options

a) make sure your session is clean of unnecessary objects.
b) Dont read in all the data if you dont need to ( see colClasses  to
control this )
c) use the bigmemory package or ff package
d) buy more RAM


On Fri, Mar 2, 2012 at 6:57 AM, Dan Abner dan.abne...@gmail.com wrote:

 Hi everyone,

 Any ideas on troubleshooting this memory issue:

  d1-read.csv(arrears.csv)
 Error: cannot allocate vector of size 77.3 Mb
 In addition: Warning messages:
 1: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
 2: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
 3: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
 4: In class(data) - data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)


 Thanks!

 Dan

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Greg Snow
Try sending your clients a data set (data frame, table, etc) as an MS
Access data table instead.  They can still view the data as a table,
but will have to go to much more effort to mess up the data, more
likely they will do proper edits without messing anything up (mixing
characters in with numbers, have more sexes than your biology teacher
told you about, add extra lines at top or bottom that makes reading
back into R more difficult, etc.)

I have had a few clients that I talked into using MS Access from the
start to enter their data, there was often a bit of resistance at
first, but once they tried it and went through the process of
designing the database up front they ended up thanking me and believed
that the entire data entry process was easier and quicker than had the
used excel as they originally planned.

Access is still part of MS office, so they don't need to learn R or in
any way break their chains from being prisoners of bill, but they will
be more productive in more ways than just interfacing with you.

Access (databases in general) force you to plan things out and do the
correct thing from the start.  It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote:
 But there are some important reasons to use Excel.  In my work there
 are a lot of people that I have to send the equivalent of a data.frame
 to who want to look at the data and possibly slice/dice the data
 differently and then send back to me updates.  These folks do not know
 how to use R, but do have Microsoft Office installed on their
 computers and know how to use the different products.

 I have been very successful in conveying what I am doing for them by
 communicating via Excel spreadsheets.  It is also an important medium
 in dealing with some international companies who provide data via
 Excel and expect responses back via Excel.

 When dealing with data in a tabular form, Excel does provide a way for
 a majority of the people I work with to understand the data.  Yes,
 there are problems with some of the ways that people use Excel, and
 yes I have had to invest time in scrubbing some of the data that I get
 from them, but if I did not, then I would probably not have a job
 working for them.  I use R exclusively for the analysis that I do, but
 find it convenient to use Excel to provide a communication mechanism
 to the majority of the non-R users that I have to deal with.  It is a
 convenient work-around because I would never get them to invest the
 time to learn R.

 So in the real world these is a need to Excel and we are not going to
 cause it to go away; we have to learn how to live with it, and from my
 standpoint, it has definitely benefited me in being able to
 communicate with my users and continuing to provide them with results
 that they are happy with.  They refer to letting me work my magic on
 the data; all they know is they see the result via Excel and in the
 background R is doing the heavy lifting that they do not have to know
 about.

 On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:
 On 01/03/12 04:43, John Kane wrote:

 (mydata- as.factor(c(1,2,3, 2, 5, 2)))
 str(mydata)

 newdata- as.character(mydata)

 newdata[newdata==2]- 0
 newdata- as.numeric(newdata)
 str(newdata)

 We really need to keep Excel (and other spreadsheets) out of peoples
 hands.


 Amen, bro'!!!

    cheers,

        Rolf Turner

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Statistical Histograms in R

2012-03-02 Thread Jim Lemon

On 03/02/2012 11:49 PM, SMcG wrote:

Hi,

I'm wondering if anybody could possibly help me? I have a table with
5 tab-delimited columns. Each column has 'e-value' scores for 5
different proteins.

I'd like to plot a distribution curve using hist() for the 5
different proteins and show the 5 distribution curves on the same
graph in different colours. In the case, E-values will be the X-axis
and frequency will be the Y-axis.

Is this at all possible?


Hi SMcG,
Have a look at the last example for the barp function (plotrix). This
may be what you want.

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] convert list to text file

2012-03-02 Thread Greg Snow
Or

lapply(LIST, cat, file='outtext.txt', append=TRUE)

On Thu, Mar 1, 2012 at 6:20 AM, R. Michael Weylandt
michael.weyla...@gmail.com wrote:
 Perhaps something like

 sink(outtext.txt)
 lapply(LIST, print)
 sink()

 You could replace print with cat and friends if you wanted more
 detailed control over the look of the output.

 Michael

 On Thu, Mar 1, 2012 at 5:28 AM,  t.galesl...@ebh.umcn.nl wrote:
 Dear R users,

 Is it possible to write the following list to a text-file?

 List:

 [[1]]
 [1] 500

 [[2]]
 [1] 1

 [[3]]
    [,1] [,2] [,3] [,4] [,5]
 FID    1    2    3    4    5
 Var    2    0    2    1    1

 I would like to have the textfile look like this:

 500
 1
 FID 1 2 3 4 5
 Var 2 0 2 1 1

 Thank you very much in advance for your help!

 Kind regards,

 Tessel Galesloot
 Department of Epidemiology, Biostatistics and HTA (133)
 Radboud University Nijmegen Medical Centre



 Het UMC St Radboud staat geregistreerd bij de Kamer van Koophandel in het 
 handelsregister onder nummer 41055629.
 The Radboud University Nijmegen Medical Centre is listed in the Commercial 
 Register of the Chamber of Commerce under file number 41055629.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Computing line= for mtext

2012-03-02 Thread Frank Harrell
Thanks Elai.  axis(2) looks like a good approach.   I think the way to solve
for the pos= argument is to use:

usr - par('usr'); plt - par('plt')
usr[2] + (usr[2] - usr[1])/(plt[2] - plt[1]) *  (1 - plt[2])

I think pos should have only one element.

Thanks for your help,
Frank


ilai-2 wrote
 
 On Fri, Mar 2, 2012 at 1:17 PM, Frank Harrell lt;f.harrell@gt; wrote:
 Hi Rich and Peter,

 What I am trying to do is the right-justify a vector of numbers to the
 right
 of the y-axis so that the leftmost digit of all of the numbers is one
 character to the right of the axis line.  axis() plots tick marks and
 
 No it doesn't (not always)
 
 left-justifies the numbers.
 
 For axis(4). If you really want right justify on the right hand side,
 you could use axis(2) with negative line numbers (device size
 dependent). Or you could try something like
 
 plot(1:20)
 axis(2,at=seq(1,20,4),labels=T,tick=F,las=T,pos=c(22.25,1))
 
 Hope this is getting there (after only 6 messages...)
 
 Elai
 

 Peter's idea:
 -
 Since you're setting your right margin to 5, why
 not just

 mtext(s, side=4, las=1, at=5, adj=1, line = 5, cex=1)
 mtext(s, side=4, las=1, at=7, adj=1, line = 5,  cex=2)

 i.e. set the line argument to the right margin?
 -
 comes close to what I'm trying to do but I haven't found out how to
 compute
 line from the maximum width over all the strings in the vector being
 plotted vertically, and I'm not sure it scales properly for different
 cex=.
 strwidth(s, units='inches')/par('cin')[1] doesn't seem to be a complete
 solution.

 Thanks
 Frank


 Richard M. Heiberger wrote

 Frank,

 this is it.  It uses Peter's idea.

  plot(1:10)
  axis(side=2, 1:10, las=1, line=-31.5, lwd=0)
  axis(side=4, 1:10, las=1, labels=FALSE)

 Rich


 On Thu, Mar 1, 2012 at 6:52 PM, Frank Harrell lt;f.harrell@gt;wrote:

 Rich's pointers deals with lattice/grid graphics.  Does anyone have a
 solution for base graphics?
 Thanks
 Frank

 Richard M. Heiberger wrote
 
  Frank,
 
  This can be done directly with a variant of the panel.axis function.
  See function panel.axis.right in the HH package.  This was provided
 for
 me
  by David Winsemius in response to my query on this list in October
 2011
  https://stat.ethz.ch/pipermail/r-help/2011-October/292806.html
 
  The email thread also includes comments by Deepayan Sarkar and Paul
  Murrell.
 
  Rich
 
  On Wed, Feb 29, 2012 at 8:48 AM, Frank Harrell
 lt;f.harrell@gt;wrote:
 
  I want to right-justify a vector of numbers in the right margin of a
  low-level plot.  For this I need to compute the line parameter to
 give
 to
  mtext.  Is this the correct scalable calculation?
 
  par(mar=c(4,3,1,5)); plot(1:20)
  s - 'abcde'; w=strwidth(s, units='inches')/par('cin')[1]
  mtext(s, side=4, las=1, at=5, adj=1, line=w-.5, cex=1)
  mtext(s, side=4, las=1, at=7, adj=1, line=2*(w-.5), cex=2)
 
  Thanks
  Frank
 
  -
  Frank Harrell
  Department of Biostatistics, Vanderbilt University
  --
  View this message in context:
 
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4431554.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgtlt;http://www.r-project.org/posting-guide.htmlamp;lt;http://www.r-project.org/posting-guide.htmlamp;gtgt;
 ;
  and provide commented, minimal, self-contained, reproducible code.
 
 
        [[alternative HTML version deleted]]
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgt;
  and provide commented, minimal, self-contained, reproducible code.
 


 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4436923.html
  Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgt;
 and provide commented, minimal, self-contained, reproducible code.


       [[alternative HTML version deleted]]

 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 

Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Jim Lemon
Unfortunately, a lot of people who use MS Office don't have or know how 
to use MS Access. Where I work now (as in the past) I have to tie 
someone to their chair, give them a few pokes with the cattle prod and 
then show them that a CSV file will load straight into Excel before I 
can convince them that they can use such a heretical data format. You 
don't want to know what I have to do to convince them that they can view 
my listings in HTML.


Jim

PS - Always give them a _copy_ of the CSV file.

On 03/03/2012 10:41 AM, Greg Snow wrote:

Try sending your clients a data set (data frame, table, etc) as an MS
Access data table instead.  They can still view the data as a table,
but will have to go to much more effort to mess up the data, more
likely they will do proper edits without messing anything up (mixing
characters in with numbers, have more sexes than your biology teacher
told you about, add extra lines at top or bottom that makes reading
back into R more difficult, etc.)

I have had a few clients that I talked into using MS Access from the
start to enter their data, there was often a bit of resistance at
first, but once they tried it and went through the process of
designing the database up front they ended up thanking me and believed
that the entire data entry process was easier and quicker than had the
used excel as they originally planned.

Access is still part of MS office, so they don't need to learn R or in
any way break their chains from being prisoners of bill, but they will
be more productive in more ways than just interfacing with you.

Access (databases in general) force you to plan things out and do the
correct thing from the start.  It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtmanjholt...@gmail.com  wrote:

But there are some important reasons to use Excel.  In my work there
are a lot of people that I have to send the equivalent of a data.frame
to who want to look at the data and possibly slice/dice the data
differently and then send back to me updates.  These folks do not know
how to use R, but do have Microsoft Office installed on their
computers and know how to use the different products.

I have been very successful in conveying what I am doing for them by
communicating via Excel spreadsheets.  It is also an important medium
in dealing with some international companies who provide data via
Excel and expect responses back via Excel.

When dealing with data in a tabular form, Excel does provide a way for
a majority of the people I work with to understand the data.  Yes,
there are problems with some of the ways that people use Excel, and
yes I have had to invest time in scrubbing some of the data that I get
from them, but if I did not, then I would probably not have a job
working for them.  I use R exclusively for the analysis that I do, but
find it convenient to use Excel to provide a communication mechanism
to the majority of the non-R users that I have to deal with.  It is a
convenient work-around because I would never get them to invest the
time to learn R.

So in the real world these is a need to Excel and we are not going to
cause it to go away; we have to learn how to live with it, and from my
standpoint, it has definitely benefited me in being able to
communicate with my users and continuing to provide them with results
that they are happy with.  They refer to letting me work my magic on
the data; all they know is they see the result via Excel and in the
background R is doing the heavy lifting that they do not have to know
about.

On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turnerrolf.tur...@xtra.co.nz  wrote:

On 01/03/12 04:43, John Kane wrote:


(mydata- as.factor(c(1,2,3, 2, 5, 2)))
str(mydata)

newdata- as.character(mydata)

newdata[newdata==2]- 0
newdata- as.numeric(newdata)
str(newdata)

We really need to keep Excel (and other spreadsheets) out of peoples
hands.



Amen, bro'!!!

cheers,

Rolf Turner



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculation of standard error for a function

2012-03-02 Thread Duncan Murdoch

On 12-03-02 4:47 PM, Jun Shen wrote:

Dear list,

If I know the standard error for k1 and k2, is there anything I can call in
R to calculate the standard error of k1/k2? Thanks.


No, because it depends on the joint distribution of k1 and k2.  Even if 
you knew they were independent, that would not be sufficient (though you 
could use the delta method to get an approximation in that case; look it 
up).


Duncan Murdoch



Jun

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Computing line= for mtext

2012-03-02 Thread baptiste auguie
Hi,

If you're going to use different text sizes and convert between units,
it might be easier to do the calculations with grid.

par(mar=c(1,1,1,5))
plot(1:10)
labels = c(1, 2, 10, 123, 3.141592653589, 1.2, 2)
sizes = c(1, 1, 2, 1, 0.4, 1, 3) # cex of individual labels

## pure base graphics
max_width_base = do.call(max, mapply(function(l, size) strwidth(l,
cex=size, units = inches), l=labels, size=sizes, SIMPLIFY=FALSE))

## calculations with grid graphics
max_width_grid = grid::convertUnit(do.call(max, mapply(function(l,
size) grid::grobWidth(grid::textGrob(l, gp=grid::gpar(cex=size))),
l=labels, size=sizes, SIMPLIFY=FALSE)), in, valueOnly=TRUE)

all.equal(max_width_base, max_width_grid)

## add one line
final_width = grid::convertUnit( grid::unit(max_width_base,in) +
 grid::unit(1,lines),
lines, valueOnly=TRUE)

mapply(function(l, size, ii) mtext(l, side=4, at=ii, las=1, adj=1,
line=final_width, cex=size), l=labels, size=sizes,
ii=seq_along(labels)) - b.quiet


HTH,

b.

On 3 March 2012 09:17, Frank Harrell f.harr...@vanderbilt.edu wrote:
 Hi Rich and Peter,

 What I am trying to do is the right-justify a vector of numbers to the right
 of the y-axis so that the leftmost digit of all of the numbers is one
 character to the right of the axis line.  axis() plots tick marks and
 left-justifies the numbers.

 Peter's idea:
 -
 Since you're setting your right margin to 5, why
 not just

 mtext(s, side=4, las=1, at=5, adj=1, line = 5, cex=1)
 mtext(s, side=4, las=1, at=7, adj=1, line = 5,  cex=2)

 i.e. set the line argument to the right margin?
 -
 comes close to what I'm trying to do but I haven't found out how to compute
 line from the maximum width over all the strings in the vector being
 plotted vertically, and I'm not sure it scales properly for different cex=.
 strwidth(s, units='inches')/par('cin')[1] doesn't seem to be a complete
 solution.

 Thanks
 Frank


 Richard M. Heiberger wrote

 Frank,

 this is it.  It uses Peter's idea.

  plot(1:10)
  axis(side=2, 1:10, las=1, line=-31.5, lwd=0)
  axis(side=4, 1:10, las=1, labels=FALSE)

 Rich


 On Thu, Mar 1, 2012 at 6:52 PM, Frank Harrell lt;f.harrell@gt;wrote:

 Rich's pointers deals with lattice/grid graphics.  Does anyone have a
 solution for base graphics?
 Thanks
 Frank

 Richard M. Heiberger wrote
 
  Frank,
 
  This can be done directly with a variant of the panel.axis function.
  See function panel.axis.right in the HH package.  This was provided for
 me
  by David Winsemius in response to my query on this list in October 2011
  https://stat.ethz.ch/pipermail/r-help/2011-October/292806.html
 
  The email thread also includes comments by Deepayan Sarkar and Paul
  Murrell.
 
  Rich
 
  On Wed, Feb 29, 2012 at 8:48 AM, Frank Harrell lt;f.harrell@gt;wrote:
 
  I want to right-justify a vector of numbers in the right margin of a
  low-level plot.  For this I need to compute the line parameter to give
 to
  mtext.  Is this the correct scalable calculation?
 
  par(mar=c(4,3,1,5)); plot(1:20)
  s - 'abcde'; w=strwidth(s, units='inches')/par('cin')[1]
  mtext(s, side=4, las=1, at=5, adj=1, line=w-.5, cex=1)
  mtext(s, side=4, las=1, at=7, adj=1, line=2*(w-.5), cex=2)
 
  Thanks
  Frank
 
  -
  Frank Harrell
  Department of Biostatistics, Vanderbilt University
  --
  View this message in context:
 
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4431554.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgtlt;http://www.r-project.org/posting-guide.htmlamp;lt;http://www.r-project.org/posting-guide.htmlamp;gtgt;
 ;
  and provide commented, minimal, self-contained, reproducible code.
 
 
        [[alternative HTML version deleted]]
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgt;
  and provide commented, minimal, self-contained, reproducible code.
 


 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4436923.html
  Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgt;
 and provide commented, minimal, self-contained, reproducible code.


       [[alternative HTML version deleted]]

 __
 R-help@ mailing list
 

Re: [R] fridays date to date

2012-03-02 Thread Greg Snow
If you know that your first date is a Friday then you can use seq with
by=7 day, then you don't need to post filter the vector.

On Thu, Mar 1, 2012 at 1:40 PM, Ben quant ccqu...@gmail.com wrote:
 Great thanks!

 ben

 On Thu, Mar 1, 2012 at 1:30 PM, Marc Schwartz marc_schwa...@me.com wrote:

 On Mar 1, 2012, at 2:02 PM, Ben quant wrote:

  Hello,
 
  How do I get the dates of all Fridays between two dates?
 
  thanks,
 
  Ben


 Days - seq(from = as.Date(2012-03-01),
            to = as.Date(2012-07-31),
            by = day)

  str(Days)
  Date[1:153], format: 2012-03-01 2012-03-02 2012-03-03 2012-03-04
 ...

 # See ?weekdays

  Days[weekdays(Days) == Friday]
  [1] 2012-03-02 2012-03-09 2012-03-16 2012-03-23 2012-03-30
  [6] 2012-04-06 2012-04-13 2012-04-20 2012-04-27 2012-05-04
 [11] 2012-05-11 2012-05-18 2012-05-25 2012-06-01 2012-06-08
 [16] 2012-06-15 2012-06-22 2012-06-29 2012-07-06 2012-07-13
 [21] 2012-07-20 2012-07-27

 HTH,

 Marc Schwartz



        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Connecting points on a line with arcs/curves

2012-03-02 Thread Greg Snow
?xspline

On Thu, Mar 1, 2012 at 8:15 AM, hendersi ir...@cam.ac.uk wrote:

 Hello,

 I have a spreadsheet of pairs of coordinates and I would like to plot a line
 along which curves/arcs connect each pair of coordinates. The aim is to
 visualise the pattern of point connections.

 Thanks! Ian

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Connecting-points-on-a-line-with-arcs-curves-tp4435247p4435247.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with sum function

2012-03-02 Thread Greg Snow
Others explained why it happens, but you might want to look at the
zapsmall function for one way to deal with it.

On Thu, Mar 1, 2012 at 2:49 PM, Mark A. Albins kamoko...@gmail.com wrote:
 Hi!

 I'm running R version 2.13.0 (2011-04-13)
 Platform: i386-pc-mingw32/i386 (32-bit)

 When i type in the command:

 sum(c(-0.2, 0.8, 0.8, -3.2, 1.8))

 R returns the value:

 -5.551115e-17

 Why doesn't R return zero in this case?  There shouldn't be any rounding
 error in a simple sum.

 Thanks,

 Mark

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculation of standard error for a function

2012-03-02 Thread David Winsemius


On Mar 2, 2012, at 7:05 PM, Duncan Murdoch wrote:


On 12-03-02 4:47 PM, Jun Shen wrote:

Dear list,

If I know the standard error for k1 and k2, is there anything I can  
call in

R to calculate the standard error of k1/k2? Thanks.


No, because it depends on the joint distribution of k1 and k2.  Even  
if you knew they were independent, that would not be sufficient  
(though you could use the delta method to get an approximation in  
that case; look it up).


A nice article with useful information on three approaches to this  
problem appeared in BMC Medical Research Methodoogy:


Methods for confidence interval estimation of a ratio parameter with  
application to location quotients, by Beyene and Moineddin.

http://www.biomedcentral.com/1471-2288/5/32

Both were at Department of Public Health Science, University of  
Toronto, Toronto, Ontario, Canada, when this appeared in 2005. I  
thought they might have been neighbors of yours, Duncan, but I looked  
at a map and see that my understanding of Ontario geography is not  
particularly accurate.




Duncan Murdoch



Jun

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Rolf Turner

On 03/03/12 12:41, Greg Snow wrote:

SNIP

It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.

SNIP

Fortune!

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Count matches in pmatch

2012-03-02 Thread Rui Barradas
Hello,

Where is the reproducible example?



apricum wrote
 
 Hi,
 I need to find number of occurence of each word from one string in other
 string. So I need a function, which is similar to pmatch, but returns not
 references, but number of matches.  Is there any function like this? If
 no, that is the way to calculate what I need?
 

Anyway, this might help.


set.seed(1)
x - paste(sample(letters[1:4], 10, T), collapse =  )
x

x2 - unlist(strsplit(x,  ))
xmatch - pmatch(x2, c(auvw, bxyz), duplicates.ok = TRUE)

# Several ways to count matches
tapply(xmatch, xmatch, length)
aggregate(xmatch, list(xmatch), length)
by(xmatch, xmatch, length)


Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/Count-matches-in-pmatch-tp4439722p4440316.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] times series trellis plot

2012-03-02 Thread Gabor Grothendieck
On Fri, Mar 2, 2012 at 5:15 PM, sluedtke slued...@gfz-potsdam.de wrote:
 Dear List,

 I am struggling with the trellis graphic. A similar problem was mentioned
 here:

 http://r.789695.n4.nabble.com/R-How-can-you-get-N-replicates-of-a-multi-screen-multivariate-time-series-plot-td811850.html


 I do have 2 time series data sets. The 2 time series differ in some orders
 of magnitude. I managed to plot them into 1 graph, but, since the data is
 that different, on of the data set appears as a line only, well almost. So I
 would need to set a second y axis before, that is scaled to the second data
 set.

 I know how to do it with the usual plot.window routine or something similar,
 but not with the trellis graphic of the lattice package.

xyplot.zoo in the zoo package is a version of lattice's xyplot
specialized to work with time series.  See the help file and also the
examples in it.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Correlation of huge matrix saved as binary file

2012-03-02 Thread Bryo
Hi,

I have a 900,000,000*9,000 matrix where I need to calculate the correlation
between all entries along the smaller dimension, thus creating a 9k*9k
correlation matrix. This matrix is too big to be uploaded in R, and is saved
as a binary file. To access the data in the file I use mmap and some
api-functions (to get all values in one row, one column, or one particular
value). I'm looking for some advice in how to calculate the correlation
matrix. Right now my approach is to do something similar to this (toy code):

corr.matrix-matrix('numeric',ncol=9000,nrow=9000)

for (i in 1:9000) {
for (j in (i+1):9000) {
# i1=... getting the index of  item (i) in a second file
# i2=getting the index of item (j)
g1=api$getCol(i1)
g2=api$getCol(i2)
cor.matrix[i,j]=cor(g1,g2)
}}

This will work, but will take forever. Any advice for how this can be done
more efficiently? I'm running on a 2.6.18 linux system, with R version
R-2.11.1.

Thanks!


--
View this message in context: 
http://r.789695.n4.nabble.com/Correlation-of-huge-matrix-saved-as-binary-file-tp4440119p4440119.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculation of standard error for a function

2012-03-02 Thread Rolf Turner

On 03/03/12 13:35, David Winsemius wrote:


On Mar 2, 2012, at 7:05 PM, Duncan Murdoch wrote:


On 12-03-02 4:47 PM, Jun Shen wrote:

Dear list,

If I know the standard error for k1 and k2, is there anything I can 
call in

R to calculate the standard error of k1/k2? Thanks.


No, because it depends on the joint distribution of k1 and k2.  Even 
if you knew they were independent, that would not be sufficient 
(though you could use the delta method to get an approximation in 
that case; look it up).


A nice article with useful information on three approaches to this 
problem appeared in BMC Medical Research Methodoogy:


Methods for confidence interval estimation of a ratio parameter with 
application to location quotients, by Beyene and Moineddin.

http://www.biomedcentral.com/1471-2288/5/32


SNIP

Chapter 6 of Beyond Anova by Rupert G. Miller, Chapman  Hall, 1997, 
might also

be of interest.

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleaning up messy Excel data

2012-03-02 Thread jim holtman
Unfortunately they only know how to use Excel and Word.  They are not
folks who use a computer every day.  Many of them run factories or
warehouses and asking them to use something like Access would not
happen in my lifetime (I have retired twice already).

I don't have any problems with them messing up the data that I send
them; they are pretty good about making changes within the context of
the spreadsheet.  The other issue is that I working with people in
twenty different locations spread across the US, so I might be able to
one of them to use Access (there is one I know that uses it), but that
leaves 19 other people I would not be able to communicate with.

The other thing is, is that I use Excel myself to slice/dice data
since there are things that are easier in Excel than R (believe it or
not).  There are a number of tools I keep in my toolkit, and R is
probably the most important, but I have not thrown the rest of them
away since they still serve a purpose.

So if you can come up with a way to 20 diverse groups, who are not
computer literate, to change over in a couple of days from Excel to
Access let me know.  BTW, I tried to use Access once and gave it up
because it was not as intuitive as some other tools and did not give
me any more capability than the ones I was using.  So I know I would
have a problem in convincing other to make the change just so they
could communicate with me, while they still had to use Excel to most
of their other interfaces.

This is the real world where you have to learn how to adapt to your
environment and make the best of it.  So you just have to learn that
Excel can be your friend (or at least not your enemy) and can serve a
very useful purpose in getting your ideas across to other people.

On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow 538...@gmail.com wrote:
 Try sending your clients a data set (data frame, table, etc) as an MS
 Access data table instead.  They can still view the data as a table,
 but will have to go to much more effort to mess up the data, more
 likely they will do proper edits without messing anything up (mixing
 characters in with numbers, have more sexes than your biology teacher
 told you about, add extra lines at top or bottom that makes reading
 back into R more difficult, etc.)

 I have had a few clients that I talked into using MS Access from the
 start to enter their data, there was often a bit of resistance at
 first, but once they tried it and went through the process of
 designing the database up front they ended up thanking me and believed
 that the entire data entry process was easier and quicker than had the
 used excel as they originally planned.

 Access is still part of MS office, so they don't need to learn R or in
 any way break their chains from being prisoners of bill, but they will
 be more productive in more ways than just interfacing with you.

 Access (databases in general) force you to plan things out and do the
 correct thing from the start.  It is possible to do the right thing in
 Excel, but Excel does not encourage (let alone force) you to do the
 right thing, but makes it easy to do the wrong thing.

 On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote:
 But there are some important reasons to use Excel.  In my work there
 are a lot of people that I have to send the equivalent of a data.frame
 to who want to look at the data and possibly slice/dice the data
 differently and then send back to me updates.  These folks do not know
 how to use R, but do have Microsoft Office installed on their
 computers and know how to use the different products.

 I have been very successful in conveying what I am doing for them by
 communicating via Excel spreadsheets.  It is also an important medium
 in dealing with some international companies who provide data via
 Excel and expect responses back via Excel.

 When dealing with data in a tabular form, Excel does provide a way for
 a majority of the people I work with to understand the data.  Yes,
 there are problems with some of the ways that people use Excel, and
 yes I have had to invest time in scrubbing some of the data that I get
 from them, but if I did not, then I would probably not have a job
 working for them.  I use R exclusively for the analysis that I do, but
 find it convenient to use Excel to provide a communication mechanism
 to the majority of the non-R users that I have to deal with.  It is a
 convenient work-around because I would never get them to invest the
 time to learn R.

 So in the real world these is a need to Excel and we are not going to
 cause it to go away; we have to learn how to live with it, and from my
 standpoint, it has definitely benefited me in being able to
 communicate with my users and continuing to provide them with results
 that they are happy with.  They refer to letting me work my magic on
 the data; all they know is they see the result via Excel and in the
 background R is doing the heavy lifting that they do not have to know

Re: [R] Correlation of huge matrix saved as binary file

2012-03-02 Thread Peter Langfelder
I don't think you can speed it up by a whole lot... but you can try a
few things, especially if you don't have missing data in the matrix
(which you probably don't). The main question is what takes most of
the time- the api calls or the cor() call? If it's cor, here's what
you can try:

1. Pre-standardize the entire matrix input matrix, i.e. scale each
column to mean=0 and sum of squares=1. Save the standardized matrix
(or make sure it's available to api). Since your matrix only has 9000
columns, this should not take extremely long.

2. Instead of calculating correlations, calculate simply sum(g1*g2) -
if g1 and g2 are standardized as above, correlation equals sum(g1*g2).

3. Instead of calculating the correlations one-by-one, calculate them
in small blocks (if you have enough memory and you run a 64-bit R).
With 900M rows, you will only be able to put a 900Mx2 into an R
object, but if you have two such standardized matrices loaded in g1,
g2, you can get their (2x2) correlation matrix by t(g1) %*% g2. This
2x2 matrix you can use to fill the appropriate components of the
result matrix.

4. Use one of the multi-threading packages (multicore comes to mind
but there are others) to parallelize your code. If you have 8
available cores, you can expect a nearly 8x speedup.

All in all, this will probably still take forever, but should be one
or two orders of magnitude faster than your current code :)

HTH,

Peter

On Fri, Mar 2, 2012 at 2:50 PM, Bryo bryne...@gmail.com wrote:
 Hi,

 I have a 900,000,000*9,000 matrix where I need to calculate the correlation
 between all entries along the smaller dimension, thus creating a 9k*9k
 correlation matrix. This matrix is too big to be uploaded in R, and is saved
 as a binary file. To access the data in the file I use mmap and some
 api-functions (to get all values in one row, one column, or one particular
 value). I'm looking for some advice in how to calculate the correlation
 matrix. Right now my approach is to do something similar to this (toy code):

 corr.matrix-matrix('numeric',ncol=9000,nrow=9000)

 for (i in 1:9000) {
 for (j in (i+1):9000) {
 # i1=... getting the index of  item (i) in a second file
 # i2=getting the index of item (j)
 g1=api$getCol(i1)
 g2=api$getCol(i2)
 cor.matrix[i,j]=cor(g1,g2)
 }}

 This will work, but will take forever. Any advice for how this can be done
 more efficiently? I'm running on a 2.6.18 linux system, with R version
 R-2.11.1.

 Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pasting several things

2012-03-02 Thread G See
Try this:

x - structure(list(day = 19, C1 = structure(1L, .Label = c(, C1
), class = factor), C2 = structure(2L, .Label = c(, C2), class =
factor),
   C3 = structure(1L, .Label = c(, C3), class = factor),
   Q1 = structure(2L, .Label = c(, Q1), class = factor),
   Q2 = structure(2L, .Label = c(, Q2), class = factor),
   Q3 = structure(1L, .Label = c(, Q3), class = factor)), .Names =
c(day,
C1, C2, C3, Q1, Q2, Q3), row.names = 8, class = data.frame)

paste(x[1, 1], do.call(paste, c(x[1, x != ][, -1], list(sep=_))), sep= -)

# Output looks like this
 paste(x[1, 1], do.call(paste, c(x[1, x != ][, -1], list(sep=_))), sep= 
 -)
[1] 19 -C2_Q1_Q2
 x[1, 2] - C1
 paste(x[1, 1], do.call(paste, c(x[1, x != ][, -1], list(sep=_))), sep= 
 -)
[1] 19 -C1_C2_Q1_Q2

HTH,
Garrett

On Fri, Mar 2, 2012 at 2:39 PM, chuck.01 charliethebrow...@gmail.com wrote:
 I have this type of format:

 structure(list(day = 19, C1 = structure(1L, .Label = c(, C1
 ), class = factor), C2 = structure(2L, .Label = c(, C2), class =
 factor),
    C3 = structure(1L, .Label = c(, C3), class = factor),
    Q1 = structure(2L, .Label = c(, Q1), class = factor),
    Q2 = structure(2L, .Label = c(, Q2), class = factor),
    Q3 = structure(1L, .Label = c(, Q3), class = factor)), .Names =
 c(day,
 C1, C2, C3, Q1, Q2, Q3), row.names = 8, class = data.frame)


 and want something like this:

  19 -C2 _Q1_Q2

 any ideas?  obviously I could use paste() and get this.  Keep in mind I have
 many of these and the presence of C1, C2, ... etc will vary.


 Thanks.

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/pasting-several-things-tp4439770p4439770.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Strategies to deal with unbalanced classification data in randomForest

2012-03-02 Thread Sam Albers
Hello all,

I have become somewhat confused with options available for dealing
with a highly unbalanced data set (1 in one class, 50 in the
other). As a summary I am unsure:

a) if I am perform the two class weighting methods properly,
b) if the data are too unbalanced and that this type of analysis is
appropriate and
c) if there is any interaction between the weighting for class
imbalances and number of trees in a forest.

An example will illustrate this best. Say I have a data set like the following:

df - rbind(
data.frame(var1=runif(1, 10, 50),
   var2=runif(1, -3, 3),
   var3=runif(1, 0.1, 0.25),
   cls=factor(CLASS-1)
   ),
data.frame(var1=runif(50, 10, 50),
   var2=runif(50, 2, 7),
   var3=runif(50, 0.2, 0.35),
   cls=factor(CLASS-2)
   )
)

## Where the response vector is highly imbalanced like so:
summary(df$cls)

library(randomForest)
set.seed(17)

## Now the obviously an extreme case but I am wondering what the
options are to deal with something like this.
## The problem with this situation manifests itself when I try to
train a random forest
## without accounting for this imbalance

df.rf-randomForest(cls~var1+var2+var3, data=df,importance=TRUE)

## Now one option is to down sample the majority variable. However, I
can seem to find exactly
## how to do this. Does this seem correct?

df.rf.downsamp -randomForest(cls~var1+var2+var3,
data=df,sampsize=c(50,50), importance=TRUE)
## 50 being the number of observations in the minority variable

## The other option which there seems to be some confusion over is
establish some class weights
## to balance the error rate. This approach I've mostly drawn from here:
## http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm#balance
## This might not be appropriate, however, as of September it looks
like Breiman method wasn't used in R
df.rf.weights-randomForest(cls~var1+var2+var3, data=df,classwt=c(1,
600), importance=TRUE)

## Nevertheless, what I am concerned about is the effect of an
unbalanced data set has on my randomForest model
## For example:

par(mfrow=c(1,3))
plot(df.rf)
plot(df.rf.downsamp)
plot(df.rf.weights)

presents three very different scenarios and I having trouble resolving
the issues I mentioned above. I am extremely grateful for all the work
that has been done on randomForests in R up to this point. I was
hoping that someone, with more experience, might be able to advise
what the best strategy is to deal with this problem. Which of these
approaches are best and am I using them right?

Thanks so much in advance for any help.

Sam


 sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C
LC_TIME=en_CA.UTF-8
 [4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8
LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C LC_NAME=C
LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8
LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] ggplot2_0.8.9 plyr_1.7.1tools_2.14.2

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouped barchart confidence intervals in lattice

2012-03-02 Thread Nathan Lemoine
Hi everyone,

I'm having trouble adding error bars to a grouped barchart in lattice. I know 
that this topic has been addressed quite a bit, as I've been searching the 
internet for a while to try to troubleshoot the issue, but I've not been able 
to find any solution that I could get working on my data. I was wondering if 
someone could look at my code and tell me what I'm doing wrong. I was hoping 
somebody's found a way to do this (I'm sure they have) and can tell me how to 
fix my code.

# Input example data

growth - c(6.6,7.2,6.9,8.3,7.9,9.2,8.3,8.7,8.1,8.5,9.1,9.0)
diet - as.factor(rep(c(A,B,C),2,each=2))
coat - as.factor(rep(c(light,dark),each=6))

growth.means - aggregate(growth,list(coat,diet),mean)

library(plotrix)

growth.errs - aggregate(growth,list(coat,diet),std.error)

# Try using the superpose call with panel.groups results in an error

panel.ci - function(x, y, subscripts, groups...){
 panel.barchart(x, y, groups=groups, subscripts=subscripts, horiz=F,...)
 panel.segments(x[subscripts], y, x[subscripts], y+growth.errs$x, col = 
'black')
 }

barchart(growth~Group.1, groups=Group.2, data=growth.means,
 panel=panel.superpose,
 panel.groups=panel.ci
 )

# Try using the generic plot.barchart command gives three error bars, all are 
the appropriate sizes, but all are centered in each group and not on the 
grouped bars

barchart(x~Group.1, groups=Group.2, data=growth.means,
 panel=function(x,y,subscripts, groups){
   panel.barchart(x,y,horiz=F,groups=groups, subscripts=subscripts)
   
panel.segments(as.numeric(x)[subscripts],y,as.numeric(x)[subscripts],y+growth.errs$x)
 }
 )

What am I doing wrong?

Thanks,

Nathan Lemoine

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Computing line= for mtext

2012-03-02 Thread Greg Snow
I would use the regular text function instead of mtext (remembering to
set par(xpd=...)), then use the grconvertX and grconvertY functions to
find the location to plot at (possibly adding in the results from
strwidth or stheight).

On Thu, Mar 1, 2012 at 4:52 PM, Frank Harrell f.harr...@vanderbilt.edu wrote:
 Rich's pointers deals with lattice/grid graphics.  Does anyone have a
 solution for base graphics?
 Thanks
 Frank

 Richard M. Heiberger wrote

 Frank,

 This can be done directly with a variant of the panel.axis function.
 See function panel.axis.right in the HH package.  This was provided for me
 by David Winsemius in response to my query on this list in October 2011
 https://stat.ethz.ch/pipermail/r-help/2011-October/292806.html

 The email thread also includes comments by Deepayan Sarkar and Paul
 Murrell.

 Rich

 On Wed, Feb 29, 2012 at 8:48 AM, Frank Harrell lt;f.harrell@gt;wrote:

 I want to right-justify a vector of numbers in the right margin of a
 low-level plot.  For this I need to compute the line parameter to give to
 mtext.  Is this the correct scalable calculation?

 par(mar=c(4,3,1,5)); plot(1:20)
 s - 'abcde'; w=strwidth(s, units='inches')/par('cin')[1]
 mtext(s, side=4, las=1, at=5, adj=1, line=w-.5, cex=1)
 mtext(s, side=4, las=1, at=7, adj=1, line=2*(w-.5), cex=2)

 Thanks
 Frank

 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4431554.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmllt;http://www.r-project.org/posting-guide.htmlgt;
 and provide commented, minimal, self-contained, reproducible code.


       [[alternative HTML version deleted]]

 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4436923.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory issue. XXXX

2012-03-02 Thread Prof Brian Ripley

On 02/03/2012 23:36, steven mosher wrote:

1. How much RAM do you have (looks like 2GB ) . If you have more than 2GB
then you can allocate
 more memory with memory.size()


Actually, this looks like 32-bit Windows (unstated), so you cannot.  See 
the rw-FAQ for things your sysadmin can do even there.



2. If you have 2GB or less then you have a couple options

 a) make sure your session is clean of unnecessary objects.
 b) Dont read in all the data if you dont need to ( see colClasses  to
control this )
 c) use the bigmemory package or ff package
 d) buy more RAM


Most importantly, use a 64-bit OS to get a larger real address space. 
(bigmemory and ff are mainly palliative measures for those whose OS does 
not provide a good implementation of out-of-memory objects).




On Fri, Mar 2, 2012 at 6:57 AM, Dan Abnerdan.abne...@gmail.com  wrote:


Hi everyone,

Any ideas on troubleshooting this memory issue:


d1-read.csv(arrears.csv)

Error: cannot allocate vector of size 77.3 Mb
In addition: Warning messages:
1: In class(data)- data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In class(data)- data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In class(data)- data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In class(data)- data.frame :
  Reached total allocation of 1535Mb: see help(memory.size)


Thanks!

Dan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 回复: Bayesian Hidden Markov Models

2012-03-02 Thread monkeylan
Dear Oscar,
 
I have used the the following codes to perform a Bayesian HMM for the exchange 
rate data.
But, one intresting result is that the model fits a 6-state HMM with a common 
variance.
This is very hard to understand. Because, from the plot graph, we could see 
there are obviously differents with high and low volatility.
 
So, could you please help me to take a look at this? Attached is the exchange 
rate data.
I am really grateful for your help and time.
 
Best Regards,
 
James LAN
 
 
#input exchange rate data
exrt-read.table(file=exrt.txt,header=F)
plot(exrt$V2)
library(RJaCGH) 
y-exrt$V2
Pos- 1:length(y) 
Chrom - rep(1, length(y)) 
res-RJaCGH(y=y, Pos=Pos, Chrom=Chrom)
summary(res)
Q.NH(summary(res)[[1]]$beta, x=0)
Summary for ARRAY  array1:
Distribution of the number of hidden states:
1 2 3 4 5 6 
0 0 0 0 0 1 
Model with  6  states:
Distribution of the posterior means of hidden states:
            10%    25%    50%    75%    90%
Loss-1   -0.298 -0.284 -0.284 -0.279 -0.279
Loss-2   -0.144 -0.142 -0.142 -0.135 -0.135
Normal-1 -0.045 -0.043 -0.043 -0.040 -0.040
Normal-2 -0.004 -0.003 -0.003  0.000  0.000
Normal-3  0.047  0.056  0.056  0.059  0.059
Gain      0.177  0.197  0.197  0.198  0.198
Distribution of the posterior variances of hidden states:
           10%   25%   50%   75%   90%
Loss-1   0.001 0.001 0.001 0.001 0.001
Loss-2   0.001 0.001 0.001 0.001 0.001
Normal-1 0.001 0.001 0.001 0.001 0.001
Normal-2 0.001 0.001 0.001 0.001 0.001
Normal-3 0.001 0.001 0.001 0.001 0.001
Gain     0.001 0.001 0.001 0.001 0.001
Parameters of the transition functions:
         Loss-1 Loss-2 Normal-1 Normal-2 Normal-3  Gain
Loss-1    0.000  0.217    0.192    1.229    0.185 0.857
Loss-2    2.104  0.000    0.305    2.190    0.132 1.424
Normal-1  2.728  1.472    0.000    4.606    0.293 2.423
Normal-2  5.919  4.746    5.518    0.000    5.067 5.834
Normal-3  2.295  0.537    0.115    4.329    0.000 2.514
Gain      1.519  0.247    0.036    1.263    0.132 0.000

 Q.NH(summary(res)[[1]]$beta, x=0)
              Loss-1      Loss-2    Normal-1    
Normal-2    Normal-3
Loss-1   0.239381248 0.192598942 0.197535790 0.070058386 0.198853168
Loss-2   0.039503637 0.323847484 0.238632024 0.036241348 0.283843424
Normal-1 0.030559504 0.107234801 0.467453369 0.004669696 0.348627295
Normal-2 0.002624349 0.008474303 0.003915585 0.975979222 0.006151494
Normal-3 0.037727330 0.218834862 0.333794793 0.004936521 0.374412381
Gain     0.053064705 0.189481114 0.233947328 0.068592117 0.212423356
                Gain
Loss-1   0.101572465
Loss-2   0.077932083
Normal-1 0.041455335
Normal-2 0.002855048
Normal-3 0.030294113
Gain     0.242491380


发件人: Oscar Rueda [via R] ml-node+s789695n4431468...@n4.nabble.com
收件人: monkeylan lanjin...@yahoo.com.cn 
发送日期: 2012年2月29日, 星期三, 下午 9:21
主题: Re: Bayesian Hidden Markov Models


Dear James, 

The distances are normalized between zero and 1, so in your case all of them 
will be zero. You can check that with 

 res$Dist.for.model 

And do 

 Q.NH(summary(res)[[1]]$beta, x=0) 

To obtain the common transition matrix. 

Cheers, 
Oscar 


On 29/2/12 03:59, monkeylan [hidden email] wrote: 


 Dear Oscar, 
   
 I am extremely grateful to your help and detailed explanation of the use of 
 RJaCGH package. 
 But, when runing the sample codes you listed, another issue I am a little 
 confused is as following: 
 After runing summary(res), I have got the estimation of the random matrix 
 Beta: 
 
 Parameters of the transition functions: 
        Normal  Gain 
 Normal  0.000 4.258 
 Gain    2.001 0.000 
   
 But, the transition probabilty matrix Q based on the aboving Beta is more 
 concerned in my modeling. 
 Here, I am not sure how can I get the  matrix Q. I did try the Q.NH 
 functions.However, Shoud I set the distance parameter x be 1 or 0? I am not 
 sure. 
   
  If 1( according to my own understanding), the following result seems not 
 reseanable. 
   
 tran-matrix(c(0,2.001,4.528,0),2,2) 
 Q.NH(beta=tran, x=1) 
      [,1] [,2] 
 [1,]  0.5  0.5 
 [2,]  0.5  0.5 
   
 Many thanks for your further help and time. 
   
 James Allan 
 
 --- 12年2月28日,周二, Oscar Rueda [via R] 
 [hidden email] 写道: 
 
 
 发件人: Oscar Rueda [via R] [hidden email] 
 主题: Re: Bayesian Hidden Markov Models 
 收件人: monkeylan [hidden email] 
 日期: 2012年2月28日,周二,下午7:02 
 
 
 Dear James, 
 
 Basically you just need the values (y) and the positions (in your case it 
 would be the index of the times series). The chromosome argument does not 
 apply to your case so it can be a vector of ones. 
 If the positions are at the same distance between (equally spaced) then the 
 

[R] interpreting the output of a glm with an ordered categorical predictor.

2012-03-02 Thread kmuller

Greetings.

I'm a Master's student working on an analysis of herbivore damage on plants.
I have a tried running a glm with one categorical predictor (aphid
abundance) and a binomial response (presence/absence of herbivore damage).
My predictor has four categories: high, medium, low, and none. I used the
ordered function to sort my categories for a glm.

ah -
read.csv(http://depot.northwestern.edu/class/2012WI_PBC_435-0_AND_BIOL_SCI_313/muller/herbivoryEdit.csv;)
ah1- ah[ah$date==110810,] 
ah2-ah[ah$date==110904,]

aphidOrder - ordered(ah2$aphidLevelMax,levels=c(none, low, med,
high))

ordAph - glm(chewholebinom~aphidOrder,family=binomial,data=ah2)


When I ran the summary for the glm (output pasted below), I could not tell
which intercept referred to which factor level. My question is, what do .L,
.Q, and .C mean and how can I relate these factors to my original factors
(none, low, med, high)?

Thank you for your help,

Katherine

summary(ordAph)

Call:
glm(formula = chewholebinom ~ aphidOrder, family = binomial, 
data = ah2)

Deviance Residuals: 
Min   1Q   Median   3Q  Max  
-1.6512  -0.9817   0.7687   0.7687   1.5353  

Coefficients:
 Estimate Std. Error z value Pr(|z|)   
(Intercept)  -0.055670.25097  -0.222   0.8245   
aphidOrder.L -1.367550.49366  -2.770   0.0056 **
aphidOrder.Q  0.368240.50195   0.734   0.4632   
aphidOrder.C -0.098400.51011  -0.193   0.8470   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 137.99  on 99  degrees of freedom
Residual deviance: 124.05  on 96  degrees of freedom
AIC: 132.05

Number of Fisher Scoring iterations: 4





--
View this message in context: 
http://r.789695.n4.nabble.com/interpreting-the-output-of-a-glm-with-an-ordered-categorical-predictor-tp4440383p4440383.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Noob question - Identity argument within aggregate function?

2012-03-02 Thread knavero
z is a zoo object as a result from reading in the following series

z = suppressWarnings(zoo(1:8), c(1, 2, 2, 2, 3, 4, 5, 5))

This is what z is in the aggregate function. So then that brings us to
aggregate(z, identity, tail, 1). All I was trying to accomplish was trying
to reproduce an example shown on the zoo faq. I've read ?aggregate via
terminal and used /identity to search through the documentation for the
specific term identity. I'm just trying to understand what identity is
used for because I do not understand the error statement. 



--
View this message in context: 
http://r.789695.n4.nabble.com/Noob-question-Identity-argument-within-aggregate-function-tp4439806p4440413.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Noob question - Identity argument within aggregate function?

2012-03-02 Thread knavero
I've also searched ?identity in the R shell and it doesn't seem to be the
definition I'm looking for for this particular usage of identity as an
argument in the aggregate function. I simply would appreciate a conceptual
explanation of what it does here and how it relates to the error.

--
View this message in context: 
http://r.789695.n4.nabble.com/Noob-question-Identity-argument-within-aggregate-function-tp4439806p4440420.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Noob question - Identity argument within aggregate function?

2012-03-02 Thread knavero
Sorry, in regards to the previous post where I said aggregate(z, identity,
tail, 1), replace it with aggregate(z, identity, mean)

--
View this message in context: 
http://r.789695.n4.nabble.com/Noob-question-Identity-argument-within-aggregate-function-tp4439806p4440424.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.