Re: [R] Discrete Laplace distribution
Dear Nicolette,

You can always use the brute-force solution, which works for every discrete distribution with a finite number of states. Let p0, p1, ..., pK be the probabilities of 0, 1, ..., K (such that they sum to 1), and let

  P <- c(p0, p1, ..., pK)
  P1 <- c(cumsum(P), 1)

Now draw x <- runif(1) (uniform on (0,1)); then the index k <- which(P1 >= x)[1] (corresponding to the value k - 1) has the desired distribution.

Regards, Moshe.

--- On Fri, 12/3/10, Raquel Nicolette nicole...@ua.pt wrote:

From: Raquel Nicolette nicole...@ua.pt
Subject: [R] Discrete Laplace distribution
To: r-help@r-project.org
Received: Friday, 12 March, 2010, 2:47 AM

Hello,

http://tolstoy.newcastle.edu.au/R/help/04/07/0312.html#0313qlink1

Could anybody tell me how to generate a discrete Laplacian distribution? I need to sample a discretised Laplacian density like this:

  J(g - g') ~ exp(-lambda * |g' - g|),  g in {0, ..., gmax}

Thanks,
Nicolette

_______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
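Moshe's inverse-CDF recipe above can be wrapped up for the discretised Laplace itself. The sketch below uses illustrative values for lambda, the centre g0 and gmax; none of these come from the original post.

```r
# Inverse-CDF sampler for a discretised Laplace on {0, ..., gmax}:
# J(g) proportional to exp(-lambda * |g - g0|), normalised to sum to 1.
rdlaplace <- function(n, lambda, g0, gmax) {
  g  <- 0:gmax
  p  <- exp(-lambda * abs(g - g0))
  p  <- p / sum(p)               # normalise the weights
  P1 <- cumsum(p)                # cumulative probabilities; last entry is 1
  # for each uniform draw, take the first state whose cumulative prob >= x
  sapply(runif(n), function(x) g[which(P1 >= x)[1]])
}

set.seed(1)
draws <- rdlaplace(1000, lambda = 0.5, g0 = 5, gmax = 10)
```

The empirical frequencies of draws should peak at g0 and fall off geometrically on both sides.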
Re: [R] Optimise huge data.frame construction
Hi Daniele,

One possibility would be to make two runs. In the first run you do not build the matrix but just calculate (in a loop) the number of rows you need. Then you allocate a matrix of that size (only once) and fill it in the second run.

Regards, Moshe.

--- On Wed, 24/2/10, Daniele Amberti daniele.ambe...@ors.it wrote:

From: Daniele Amberti daniele.ambe...@ors.it
Subject: [R] Optimise huge data.frame construction
To: r-help@r-project.org
Received: Wednesday, 24 February, 2010, 8:34 PM

I have data for different items (ID) in a database. For each ID I have to get:
- the timestamp of the observation (timestamp);
- a numerical value (val) that will be my response variable in some kind of model;
- a variable number of variables from a known set (if the value for a specific variable is not present in the DB it is 0).

To get the above-mentioned values I have to cycle over IDs, make some calculations and store the results, constructing a huge data.frame for subsequent estimations. The number of rows for each ID is random (typically 14 to 200). My current approach is to construct a matrix like this:

  out <- c('A', 'B', 'C', 'D')
  out <- matrix(-1, 5000, 3 + length(out),
                dimnames = list(1:5000, c('ID', 'timestamp', 'val', out)))

I access the out matrix by numerical index to substitute values (out[1:n, 1] <- k). When the matrix is full I add 5000 rows and go on. Afterwards I clean out the rows with ID still set to -1 and then replace all remaining -1 values with 0.

For my application an ID typically has between 14 and 200 observations (mean around 50), but I have 15000 IDs ... After profiling I realized that accessing the out matrix this way is too slow. Do you have any idea how to speed up this kind of process? I think something could be done by creating a data.frame for each ID and binding them all at the end. Is that a good idea? How can I implement it? A list of data.frames? And then? Below is some code that can be useful if someone would like to experiment ...
  alist <- vector('list', 3)
  alist[[1]] <- data.frame(ID = 1, timestamp = 1:14, val = rnorm(14), A = 1, B = 2, C = 3)
  alist[[2]] <- data.frame(ID = 2, timestamp = 2:15, val = rnorm(14), B = 2, C = 3, D = 4)
  alist[[3]] <- data.frame(ID = 3, timestamp = 3:30, val = rnorm(28), C = 1, D = 2)

Thanks in advance for your valuable help.

Daniele

ORS Srl
Via Agostino Morando 1/3
12060 Roddi (Cn) - Italy
Tel. +39 0173 620211
Fax. +39 0173 620299 / +39 0173 433111
Web Site www.ors.it
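The list-of-data.frames idea Daniele asks about can be sketched as below. Since the column sets differ between IDs, each ID's missing columns are filled with 0 (the convention described in the post) before a single rbind at the end; this is an illustration, not a drop-in replacement for the real extraction loop.

```r
# Build one small data.frame per ID (as in the example above), then fill
# the columns an ID lacks with 0 and bind everything with a single rbind.
alist <- vector("list", 3)
alist[[1]] <- data.frame(ID = 1, timestamp = 1:14, val = rnorm(14), A = 1, B = 2, C = 3)
alist[[2]] <- data.frame(ID = 2, timestamp = 2:15, val = rnorm(14), B = 2, C = 3, D = 4)
alist[[3]] <- data.frame(ID = 3, timestamp = 3:30, val = rnorm(28), C = 1, D = 2)

all_cols <- Reduce(union, lapply(alist, names))
filled <- lapply(alist, function(d) {
  d[setdiff(all_cols, names(d))] <- 0   # absent variables are 0 by convention
  d[all_cols]                           # common column order for rbind
})
out <- do.call(rbind, filled)           # one rbind instead of one per iteration
```

A single do.call(rbind, ...) avoids the quadratic cost of growing the result one ID at a time.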
Re: [R] reading surfer files
Check read.table (?read.table).

--- On Wed, 24/2/10, RagingJim nowa0...@flinders.edu.au wrote:

From: RagingJim nowa0...@flinders.edu.au
Subject: [R] reading surfer files
To: r-help@r-project.org
Received: Wednesday, 24 February, 2010, 3:23 PM

To the R experts,

I am currently playing with a program which was designed so that its outputs are read in Surfer. I do not have that program, but the files can be put into Excel and graphed, and I figured I could do the same thing with R. If I open a file with Excel and split the text into columns separated by spaces, I can then graph it; once I have done all that in Excel and saved it, I can open it in R and do a 3D surface plot. But I would like to skip the Excel step. Each line looks like this:

  0.2100209E+03 0.2109997E+03 0.2111866E+03 0.2095219E+03 0.2079958E+03 0.2088303E+03 0.2095171E+03 0.2104975E+03
  0.2124125E+03 0.2132827E+03 0.2125371E+03 0.2119912E+03 0.2119215E+03 0.2127108E+03 0.2147589E+03 0.2167924E+03
  0.2157783E+03 0.2144353E+03 0.2127579E+03 0.2085133E+03 0.2065096E+03 0.2037935E+03 0.1973606E+03 0.1945031E+03
  0.2076126E+03 0.2131146E+03 0.2014675E+03 0.1999074E+03 0.1994730E+03 0.2003694E+03 0.2003093E+03 0.2017771E+03
  0.2070309E+03 0.1945123E+03 0.1892468E+03 0.1857383E+03 0.1817865E+03 0.1791424E+03 0.1785881E+03 0.1791883E+03
  0.1803020E+03 0.1797575E+03 0.1807816E+03 0.1835612E+03 0.1866963E+03 0.1882194E+03 0.1919594E+03 0.1940257E+03
  0.1977790E+03 0.1970629E+03 0.1983556E+03 0.1990551E+03 0.1999159E+03 0.2016027E+03 0.2029541E+03 0.2049124E+03
  0.2077511E+03 0.2081407E+03 0.2074211E+03 0.2082496E+03 0.2088355E+03 0.2089197E+03 0.2106171E+03 0.2125163E+03

The first thing I have to do is remove the first 4 lines (which are just file information), and then separate each value into its own column. How do I go about doing that?

Cheers lads.
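read.table's skip argument handles the four information lines directly, so the Excel step can be dropped entirely. The file name and header lines below are invented to make the example self-contained.

```r
# Simulate a small Surfer-style file: 4 header lines, then rows of
# whitespace-separated numbers, and read it back while skipping the header.
f <- tempfile(fileext = ".grd")
writeLines(c("DSAA", "3 2", "0 1", "0 1",
             "0.2100209E+03 0.2109997E+03 0.2111866E+03",
             "0.2095219E+03 0.2079958E+03 0.2088303E+03"), f)

m <- as.matrix(read.table(f, skip = 4))   # drop the 4 information lines
dim(m)                                    # 2 rows, 3 columns of numeric values
```

The resulting numeric matrix can be passed straight to persp() or similar for the 3D surface plot.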
Re: [R] Goodness of fit test for count data
You can compute the conditional probability that your variable equals k given that it is non-zero. For example, if X has a Poisson distribution with parameter lambda then

  P(X = k | X != 0) = P(X = k) / (1 - P(X = 0)) = (exp(-lambda) * lambda^k / k!) / (1 - exp(-lambda))

Now you can find the lambda for which the sum of squares of your errors is minimal, and then apply a Chi-squared test using these expected frequencies. Similarly for the negative binomial distribution.

--- On Tue, 23/2/10, pinusan anh...@msu.edu wrote:

From: pinusan anh...@msu.edu
Subject: [R] Goodness of fit test for count data
To: r-help@r-project.org
Received: Tuesday, 23 February, 2010, 6:11 AM

Dear all,

I am trying to test goodness of fit. I assume that the data follow a Poisson or negative binomial distribution. I can test the goodness of fit for non-truncated data; however, I could not find any good function or package for truncated data. For example, here is a frequency table of the number of emergency-room visits in the past year for one hundred and one observations:

  N  freq
  1  30
  2  35
  3  26
  4   8
  5   0
  6   2
  7   0

I expect the frequency table to follow a Poisson or negative binomial distribution. However, the distribution is different from the usual Poisson or negative binomial because one value, zero, is excluded; I expect it to be a zero-truncated distribution. In SAS, I used the NLMIXED procedure to calculate the expected probabilities for y = 1 ... y = n under the assumption that the data follow a Poisson or negative binomial distribution, and then ran a Chi-square test. If you need the SAS code, I will send it by e-mail. I want to run this test in R. Could you suggest how I can perform it?

Have a nice day.
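Moshe's recipe (a least-squares fit of lambda on the zero-truncated Poisson frequencies, then a chi-squared statistic against the fitted counts) can be sketched on the posted table. The degrees of freedom and the small expected cells would need care in a real analysis; this only shows the mechanics.

```r
# Fit a zero-truncated Poisson to the posted frequency table by least
# squares, then form the chi-squared statistic against the fitted counts.
k    <- 1:7
freq <- c(30, 35, 26, 8, 0, 2, 0)
n    <- sum(freq)                      # 101 observations

# P(X = k | X != 0) for Poisson(lambda)
ptrunc <- function(lambda) dpois(k, lambda) / (1 - dpois(0, lambda))

sse <- function(lambda) sum((freq - n * ptrunc(lambda))^2)
lambda_hat <- optimize(sse, c(0.01, 10))$minimum

expected <- n * ptrunc(lambda_hat)     # fitted frequencies
chisq    <- sum((freq - expected)^2 / expected)
```

The same skeleton works for the zero-truncated negative binomial by swapping dpois for dnbinom and optimizing over two parameters.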
Re: [R] Quadprog help
Hi Sergio,

Having a singular Dmat is certainly a problem. I can see two possibilities:
1) try to eliminate X1,...,X9, so that you are left with P1,...,P6 only;
2) if you cannot do this, add eps*X1^2 + ... + eps*X9^2 to your objective so that Dmat becomes positive definite (eps being a small positive number). You can then try to make eps smaller and smaller (but still not too small, in order not to run into numerical problems) and check whether your solution converges.

--- On Fri, 19/2/10, Araujo-Enciso, Sergio Rene sergio-rene.araujo-enc...@agr.uni-goettingen.de wrote:

From: Araujo-Enciso, Sergio Rene sergio-rene.araujo-enc...@agr.uni-goettingen.de
Subject: [R] Quadprog help
To: r-help@r-project.org
Received: Friday, 19 February, 2010, 7:40 PM

I am having some problems using quadprog in R. I want to minimize the objective function

  200*P1 - 1/2*10*P1^2 + 100*P2 - 1/2*5*P2^2 + 160*P3 - 1/2*8*P3^2 +
   50*P4 - 1/2*10*P4^2 +  50*P5 - 1/2*20*P5^2 +  50*P6 - 1/2*10*P6^2

subject to a set of constraints involving not only the variables P1, P2, P3, P4, P5, P6 but also the variables X1, X2, X3, X4, X5, X6, X7, X8, X9. As the X variables do not affect the objective function, I assume that I have to enter them as zeros in the vector dvec and the matrix Dmat.
My program is:

  Dmat <- matrix(0, 15, 15)
  diag(Dmat) <- c(10, 5, 8, 10, 20, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0)
  dvec <- c(-200, -100, -160, -50, -50, -50, 0, 0, 0, 0, 0, 0, 0, 0, 0)
  Amat <- matrix(c( -1,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0, -1,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0,  0, -1,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                    -1,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0, -1,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0,  0, -1,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                    -1,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0, -1,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0,  0, -1,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                   -10,  0,  0,  0,  0,  0, -1,  0,  0, -1,  0,  0, -1,  0,  0,
                     0, -5,  0,  0,  0,  0,  0, -1,  0,  0, -1,  0,  0, -1,  0,
                     0,  0, -8,  0,  0,  0,  0,  0, -1,  0,  0, -1,  0,  0, -1,
                     0,  0,  0,-10,  0,  0,  1,  1,  1,  0,  0,  0,  0,  0,  0,
                     0,  0,  0,  0,-20,  0,  0,  0,  0,  1,  1,  1,  0,  0,  0,
                     0,  0,  0,  0,  0,-10,  0,  0,  0,  0,  0,  0,  1,  1,  1),
                  15, 15)
  bvec <- c(0, -2, -2, -2, 0, -1, -2, -1, 0, -200, -100, -160, -50, -50, -50)
  solve.QP(Dmat, dvec, Amat, bvec = bvec)

Nonetheless I get the message:

  Error in solve.QP(Dmat, dvec, Amat, bvec = bvec) :
    matrix D in quadratic function is not positive definite!

I think it has to do with the fact that in the Dmat matrix I end up with several columns of zeros. Does anyone have an idea how to solve such a problem?

Bests, Sergio René
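Moshe's option (2), the small ridge on the zero diagonal entries, can be checked without quadprog itself: after adding eps, the diagonal Dmat from the post is positive definite. eps = 1e-6 is an arbitrary small value, to be shrunk while watching the solution for convergence as he describes.

```r
# Option (2): make the singular Dmat positive definite by adding a small
# ridge eps to the diagonal entries that are zero (the X1,...,X9 block).
eps  <- 1e-6
Dmat <- matrix(0, 15, 15)
diag(Dmat) <- c(10, 5, 8, 10, 20, 10, rep(0, 9))
diag(Dmat)[diag(Dmat) == 0] <- eps    # ridge only where the diagonal is 0

min(eigen(Dmat)$values) > 0           # TRUE: Dmat is now acceptable to solve.QP
```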
Re: [R] Integral of function of dnorm
Yes, this can be easily computed analytically (even though my result is a bit different).

--- On Fri, 12/2/10, dav...@rhotrading.com wrote:

From: dav...@rhotrading.com
Subject: Re: [R] Integral of function of dnorm
To: Greg Snow greg.s...@imail.org, Trafim Vanishek rdapam...@gmail.com, Peter Dalgaard p.dalga...@biostat.ku.dk
Cc: r-help@r-project.org
Received: Friday, 12 February, 2010, 7:35 AM

And it can be done analytically: the value is -(1 + log(2*pi)) / 2.

-- David L. Reiner

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Greg Snow
Sent: Thursday, February 11, 2010 11:58 AM
To: Trafim Vanishek; Peter Dalgaard
Cc: r-help@r-project.org
Subject: Re: [R] Integral of function of dnorm

Try:

  tmpfun <- function(x) dnorm(x, mean = 8, sd = 1) * log(dnorm(x, mean = 8, sd = 1))
  integrate(tmpfun, -Inf, Inf)

Also you may want to look at the log argument to dnorm rather than taking the log of the function.

Hope this helps,

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Trafim Vanishek
Sent: Thursday, February 11, 2010 9:49 AM
To: Peter Dalgaard
Cc: r-help@r-project.org
Subject: Re: [R] Integral of function of dnorm

This is exactly what I mean. I need to find

  integrate(dnorm(mean = 8, sd = 1) * log(dnorm(mean = 8, sd = 1)), -Inf, Inf)

which doesn't work like that, because it says:

  Error in dnorm(mean = 8, sd = 1) : element 1 is empty;
  the part of the args list of '.Internal' being evaluated was: (x, mean, sd, log)

So how can I define x? Thanks a lot.

Dear all,

How is it possible in R to calculate the following integral:

  Integral(-Inf, Inf)[log(dnorm(mean = 3, sd = 1))]

How can I define that the density dnorm is taken on (-Inf, Inf)? Thanks a lot!
Er, if you mean the integral with respect to the x argument in dnorm, then the answer is -Inf, because log(dnorm(x, ...)) goes quadratically to -Inf in both directions. If you meant otherwise, please tell us what you meant...
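The answers in this thread can be reconciled numerically: Greg's integrate call (using dnorm's log argument, as he suggests) reproduces David's analytic value -(1 + log(2*pi))/2 for sd = 1, whatever the mean; Peter's -Inf refers to the integral of log(dnorm) alone, without the density weight.

```r
# Numerical check: the integral of dnorm * log(dnorm) over the real line
# equals -(1 + log(2*pi))/2 (about -1.4189) for sd = 1, any mean.
tmpfun <- function(x) dnorm(x, mean = 8, sd = 1) * dnorm(x, mean = 8, sd = 1, log = TRUE)

num <- integrate(tmpfun, -Inf, Inf)$value
ana <- -(1 + log(2 * pi)) / 2
```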
Re: [R] difftimes; histogram; memory problems
Hi Jonathan,

If minDate = min(Condition1) - max(Condition2) and maxDate = max(Condition1) - min(Condition2), then all your differences lie between minDate and maxDate, and hopefully this is not a very big range (unless you are going many thousands of years into the past or the future). So, basically, for every number of days in this range you should count the number of times it appears. To speed up the calculations you can do this with just one loop (plus one vectorized operation); I could not manage it without a single loop if we want to limit the memory use. Let me know if you need the actual code.

Regards, Moshe.

--- On Tue, 16/2/10, Jonathan jonsle...@gmail.com wrote:

From: Jonathan jonsle...@gmail.com
Subject: Re: [R] difftimes; histogram; memory problems
To: r-help r-help@r-project.org
Received: Tuesday, 16 February, 2010, 1:17 PM

Let me fix a couple of typos in that email:

Hi All:

Let's say I have two data frames (Condition1 and Condition2), on the order of 12,000 and 16,000 rows respectively, each with 1 column. The entries contain dates. I'd like to calculate, for each possible pair of dates (that is, Condition1[1:12000] and Condition2[1:16000]), the number of days difference between the dates in the pair. The result should be a 12,000 by 16,000 matrix, which I'll call M. The purpose of building such a matrix M is to create a histogram of all the values contained within it.

Ex):

  Condition1 <- data.frame('dates' = rep(c('2001-02-10', '1998-03-14'), 6000))
  Condition2 <- data.frame('dates' = rep(c('2003-07-06', '2007-03-11'), 8000))

First, my instinct is to try and vectorize the operation. I tried this by expanding each vector into a matrix of repeated vectors (I'd then just subtract the two resultant matrices to get matrix M).
I got the following error:

  expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)),
                               byrow = TRUE, ncol = nrow(Condition1))
  Error: cannot allocate vector of size 732.4 Mb
  expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)),
                               byrow = FALSE, nrow = nrow(Condition2))
  Error: cannot allocate vector of size 732.4 Mb

Since it seems these matrices are too large, I'm wondering whether there's a better way to call a hist command without actually building the said matrix. I'd greatly appreciate any ideas!

Best, Jonathan

On Mon, Feb 15, 2010 at 8:19 PM, Jonathan jonsle...@gmail.com wrote:

Hi All:

Let's say I have two data frames (Condition1 and Condition2); each being on the order of 12,000 and 16,000 rows; 1 column. The entries contain dates. I'd like to calculate, for each possible pair of dates (that is: Condition1[1:10,000] and Condition2[1:10,000]), the number of days difference between the dates in the pair. The result should be a matrix 12,000 by 16,000. Really, what I need is a histogram of all the values in this matrix.

Ex):

  Condition1 <- data.frame('dates' = rep(c('2001-02-10', '1998-03-14'), 6000))
  Condition2 <- data.frame('dates' = rep(c('2003-07-06', '2007-03-11'), 8000))

First, my instinct is to try and vectorize the operation. I tried this by expanding each vector into a matrix of repeated vectors (I'd then just subtract the two). I got the following error:

  expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)),
                               byrow = TRUE, ncol = nrow(Condition1))
  Error: cannot allocate vector of size 732.4 Mb
  expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)),
                               byrow = FALSE, nrow = nrow(Condition2))
  Error: cannot allocate vector of size 732.4 Mb

Since it seems these matrices are too large, I'm wondering whether there's a better way to call a hist command without actually building the said matrix. I'd greatly appreciate any ideas!
Best, Jonathan
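Moshe's counting idea can be sketched directly: one loop over the first vector with a vectorized tabulate() inside, accumulating counts over the range of possible day differences instead of materialising the 12,000 x 16,000 matrix. The stand-in vectors below are much smaller than the real data.

```r
# Histogram of all pairwise day differences without the big matrix:
# one loop over d1, tabulate()-ing its differences against all of d2.
d1 <- as.Date(rep(c("2001-02-10", "1998-03-14"), 10))   # stand-in for Condition1
d2 <- as.Date(rep(c("2003-07-06", "2007-03-11"), 15))   # stand-in for Condition2

rng    <- range(as.integer(d2)) - rev(range(as.integer(d1)))  # min and max possible diff
offset <- 1 - rng[1]                                          # maps the minimum diff to bin 1
counts <- integer(rng[2] - rng[1] + 1)

for (i in seq_along(d1)) {
  diffs  <- as.integer(d2) - as.integer(d1[i])          # vectorized inner step
  counts <- counts + tabulate(diffs + offset, nbins = length(counts))
}
# plot(seq(rng[1], rng[2]), counts, type = "h") then draws the histogram
```

Memory use is one counts vector spanning the difference range, regardless of how long the two date vectors are.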
Re: [R] Question about rank() function
Hi,

I believe that the reason is that even though the first 4 elements of your fmodel look equal (when rounded to 4 decimal places) they are actually not. To check this, try

  fmodel[1:4] - fmodel[1]

--- On Thu, 11/2/10, Something Something mailinglist...@gmail.com wrote:

From: Something Something mailinglist...@gmail.com
Subject: [R] Question about rank() function
To: r-help@r-project.org
Received: Thursday, 11 February, 2010, 6:23 PM

Hello,

I am trying to get the 'rank' function to work for me, but am not sure what I am doing wrong. Please help. I ran the following commands:

  data = read.table("test1.csv", head = T, as.is = T, na.string = ".", row.nam = NULL)
  X1 = as.factor(data[[3]])
  X2 = as.factor(data[[4]])
  X3 = as.factor(data[[5]])
  Y = data[[2]]
  model = lm(Y ~ X1*X2*X3, na.action = na.exclude)
  fmodel = fitted(model)
  fmodel

(The first line of output is shown below.)

         1        2        3        4        5        6        7        8        9
  180.3763 180.3763 180.3763 180.3763 180.4546 180.3763 177.9245 177.9245 181.3859
        10       11       12       13       14       15       16       17
  180.3763       NA 180.4546 180.3763 180.4546 180.3763 180.3763 180.4546

Then I run:

  fmodel.rank = rank(fmodel)
  fmodel.rank

(The first line of output is shown below.)

      1     2     3     4     5     6     7     8     9    10    11    12    13
  375.0 222.0  68.5  68.5 402.5 222.0  33.5  33.5 465.5 222.0 500.0 402.5 222.0
     14    15    16    17    18    19    20    21    22    23    24    25    26
  402.5 222.0 222.0 378.5 222.0 222.0 222.0 222.0 222.0 402.5 222.0  33.5 222.0

As you can see, the first 4 values of 'fmodel' are 180.3763, so after running rank(fmodel) I expected the ranks of the first 4 to be the same, but they are not. What am I doing wrong? Please let me know. Thanks.
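The diagnosis can be reproduced without the original test1.csv: values that print identically at 4 decimal places need not be equal, and rounding before ranking restores the ties.

```r
# Values that display the same at 4 decimal places but differ slightly:
x <- c(180.3763, 180.3763 + 1e-10, 180.3763 + 2e-10)

round(x, 4)        # prints 180.3763 three times
rank(x)            # 1 2 3: rank() still sees the tiny differences
rank(round(x, 4))  # 2 2 2: rounding first gives the expected tied ranks
```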
Re: [R] Resampling a grid to coarsen its resolution
One possibility I can see is to replace the "-" values by NA and use mean with na.rm=TRUE.

--- On Wed, 10/2/10, Steve Murray smurray...@hotmail.com wrote:

From: Steve Murray smurray...@hotmail.com
Subject: [R] Resampling a grid to coarsen its resolution
To: r-help@r-project.org
Received: Wednesday, 10 February, 2010, 3:20 AM

Dear all,

I have a grid (data frame) dataset at 0.5 x 0.5 degree spatial resolution (720 columns x 360 rows; regular spacing) and wish to coarsen this to a resolution of 2.5 x 2.5 degrees. A simple calculation which takes the mean of a block of points to form the regridded values would do the trick. Values of "-" should be excluded from the calculation (unless all points within a block are "-", in which case "-" should be returned as the 'new' cell value).

How would I go about achieving this in R? Any help or guidelines would be very much appreciated.

Many thanks,

Steve
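A block-mean sketch of the regridding, assuming the missing-value code is -9999 (the post's actual marker should be substituted): a 720 x 360 grid at 0.5 degrees becomes 144 x 72 at 2.5 degrees by averaging 5 x 5 blocks, with the NA handling Moshe suggests.

```r
# Coarsen a 0.5-degree grid to 2.5 degrees by averaging 5 x 5 blocks,
# treating the missing-value code as NA (an all-missing block yields NaN).
set.seed(42)
grid <- matrix(rnorm(720 * 360), nrow = 360, ncol = 720)
grid[sample(length(grid), 1000)] <- -9999     # pretend some cells are missing

grid[grid == -9999] <- NA                     # missing code -> NA
block  <- 5
coarse <- matrix(NA_real_, nrow(grid) / block, ncol(grid) / block)
for (i in seq_len(nrow(coarse))) {
  for (j in seq_len(ncol(coarse))) {
    rows <- ((i - 1) * block + 1):(i * block)
    cols <- ((j - 1) * block + 1):(j * block)
    coarse[i, j] <- mean(grid[rows, cols], na.rm = TRUE)
  }
}
```

The NaN cells, if any, can be mapped back to the original missing code at the end.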
Re: [R] Polynomial equation
Hi Chris,

You can use lm with poly (see ?lm, ?poly). If x and y are your arrays of points and you wish to fit a polynomial of degree 4, say, enter

  model <- lm(y ~ poly(x, 4, raw = TRUE))

and then summary(model). The raw=TRUE causes poly to use 1, x, x^2, x^3, ... instead of orthogonal polynomials (which are better numerically but may not be what you need).

Regards, Moshe.

--- On Fri, 8/1/10, chrisli1223 chri...@austwaterenv.com.au wrote:

From: chrisli1223 chri...@austwaterenv.com.au
Subject: [R] Polynomial equation
To: r-help@r-project.org
Received: Friday, 8 January, 2010, 12:32 PM

Hi all,

I have got a dataset. In Excel, I can fit a polynomial trend line beautifully. However, the equation that Excel calculates appears to be incorrect, so I am thinking about using R. My questions are:
(1) How do I fit a polynomial trendline to a graph?
(2) How do I calculate and display the equation for that trendline?

Many thanks for your help. Greatly appreciated.

Chris
Re: [R] Polynomial equation
Hi Chris,

Curve fitting has nothing to do with statistics (even though it is used in statistics). To get an idea of it, try the following:

  x <- ((-100):100)/100
  cofs <- rnorm(4)                        # create coefficients
  y <- cofs[1] + cofs[2]*x + cofs[3]*x^2 + cofs[4]*x^3
  y1 <- y + rnorm(201, 0, 0.1)            # add noise
  mm <- lm(y1 ~ poly(x, 3, raw = TRUE))   # fit a polynomial of degree 3
  y2 <- predict(mm, as.data.frame(x))     # compute the polynomial for every point of x
  plot(x, y, type = "l"); lines(x, y1, col = "red"); lines(x, y2, col = "blue")
  cofs
  mm$coefficients

For the exponential fit there are two options. If you are trying to fit y = exp(a*x + b), one possibility is to fit log(y) = a*x + b by

  mm <- lm(log(y) ~ x)

and the other (more correct) one is to use any of the nonlinear least-squares packages. I believe you had better read a little bit about curve fitting before doing all this.

Regards, Moshe.

--- On Fri, 8/1/10, chrisli1223 chri...@austwaterenv.com.au wrote:

From: chrisli1223 chri...@austwaterenv.com.au
Subject: Re: [R] Polynomial equation
To: r-help@r-project.org
Received: Friday, 8 January, 2010, 2:14 PM

Thank you very much for your help. Greatly appreciated! However, due to my limited stats background, I am unable to work out the equation of the trendline from the summary table. Besides, how do I fit the trendline on the graph? I intend to put the first column of data on the x axis and the second column on the y axis. Are they the x and y in your example?

Many thanks,
Chris
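Putting the thread together for Chris's two questions (fit the trendline, draw it, read off the equation), on simulated stand-in data since the real dataset is not in the post:

```r
# Fit a degree-3 polynomial trendline, overlay it on the scatter plot,
# and recover the equation's coefficients from the fitted model.
set.seed(7)
x <- seq(-1, 1, length.out = 50)
y <- 2 - x + 3 * x^2 + rnorm(50, sd = 0.1)    # stand-in data

fit <- lm(y ~ poly(x, 3, raw = TRUE))
co  <- coef(fit)        # intercept, then the x, x^2, x^3 coefficients

plot(x, y)
lines(x, predict(fit), col = "blue")          # the fitted trendline
```

The coefficients in co are the equation of the trendline: y = co[1] + co[2]*x + co[3]*x^2 + co[4]*x^3.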
[R] Confidence intervals - a statistical question, nothing to do with R
Dear list,

I have r towns, T1,...,Tr, where town i has population Ni. From each town I randomly sampled Mi individuals and found that Ki of them have a certain property. So Pi = Ki/Mi is an unbiased estimate of the proportion of people in town i having that property, and the weighted average of the Pi is an unbiased estimate of the proportion of the entire population (all r towns) having this property. I can compute confidence intervals for the proportion of people having that property in each town (in my case Mi << Ni, and so the binomial distribution is a good approximation for Ki).

My question is: how can I compute a confidence interval for the proportion of people in the entire population (r towns) having that property? Either an analytical or a numerical (simulation?) method will be all right.

Thank you in advance,
Moshe.
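One numerical route, a parametric bootstrap for the population-weighted proportion, can be sketched as below; all the town sizes, sample sizes and counts are invented for illustration, since the post gives no data.

```r
# Parametric bootstrap CI for the weighted proportion sum(Ni*Pi)/sum(Ni).
set.seed(1)
N <- c(50000, 20000, 80000)      # town populations Ni (made up)
M <- c(500, 300, 600)            # sample sizes Mi    (made up)
K <- c(60, 45, 48)               # observed counts Ki (made up)

p_hat <- K / M
est   <- sum(N * p_hat) / sum(N)            # weighted point estimate

boot <- replicate(10000, {
  Kb <- rbinom(length(M), M, p_hat)         # redraw each town's count
  sum(N * (Kb / M)) / sum(N)
})
ci <- quantile(boot, c(0.025, 0.975))       # 95% percentile interval
```

A normal-approximation interval est +/- 1.96 * sqrt(sum((N/sum(N))^2 * p_hat * (1 - p_hat) / M)) is the analytical counterpart.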
Re: [R] Kolmogorov smirnov test
Hi Roslina,

I believe that you can ignore the warning. Alternatively, you may add a very small random noise to the tied values, i.e. something like

  idx <- which(duplicated(xobs))
  xobs[idx] <- xobs[idx] + 1.0e-6 * sd(xobs) * rnorm(length(idx))

Regards, Moshe.

--- On Tue, 13/10/09, Roslina Zakaria zrosl...@yahoo.com wrote:

From: Roslina Zakaria zrosl...@yahoo.com
Subject: [R] Kolmogorov smirnov test
To: r-help@r-project.org
Received: Tuesday, 13 October, 2009, 9:58 AM

Hi r-users,

I would like to use the Kolmogorov-Smirnov test, but in my observed data (xobs) there are ties and I get a warning message. My question is: can I do something about it?

  > ks.test(xobs, xsyn)

          Two-sample Kolmogorov-Smirnov test

  data:  xobs and xsyn
  D = 0.0502, p-value = 0.924
  alternative hypothesis: two-sided

  Warning message:
  In ks.test(xobs, xsyn) : cannot compute correct p-values with ties

Thank you for all your help.
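A self-contained version of the jitter idea, on simulated data (rounding creates the ties; xobs and xsyn here are stand-ins, not Roslina's data):

```r
# Break ties in xobs with noise that is tiny relative to its spread,
# then rerun ks.test without the ties warning.
set.seed(2)
xobs <- round(rgamma(100, shape = 2, rate = 0.5), 1)   # rounding creates ties
xsyn <- rgamma(100, shape = 2, rate = 0.5)

idx <- which(duplicated(xobs))
xobs[idx] <- xobs[idx] + 1e-6 * sd(xobs) * rnorm(length(idx))

res <- ks.test(xobs, xsyn)                             # ties are gone now
```

Because the noise is six orders of magnitude below the data's spread, the D statistic is essentially unchanged.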
Re: [R] keeping all rows with the same values, and not only unique ones
test[which(test[, "total"] %in% needed), ] --- On Fri, 25/9/09, Dimitri Liakhovitski ld7...@gmail.com wrote: From: Dimitri Liakhovitski ld7...@gmail.com Subject: [R] keeping all rows with the same values, and not only unique ones To: R-Help List r-h...@stat.math.ethz.ch Received: Friday, 25 September, 2009, 8:52 AM Dear R-ers, I have a data frame test: test <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(2,3,4,5,6,7,8,9), total=c(7,7,8,8,9,9,10,10)) test I have a vector needed: needed <- c(7,9) needed I need the result to look like this: 1 2 7 2 3 7 5 6 9 6 7 9 When I do the following: result <- test[test["total"] == needed, ] result I only get unique rows that have 7 or 9 in total: 1 2 7 6 7 9 How could I keep ALL rows that have 7 or 9 in total? Thanks a million! -- Dimitri Liakhovitski Ninah.com dimitri.liakhovit...@ninah.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
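The `%in%` answer above, run on the poster's own example data (`==` recycles `needed` against the column, which is why it drops rows; `%in%` tests membership instead):

```r
# Keep ALL rows whose total is in the set `needed`.
test <- data.frame(x = c(1,2,3,4,5,6,7,8),
                   y = c(2,3,4,5,6,7,8,9),
                   total = c(7,7,8,8,9,9,10,10))
needed <- c(7, 9)
result <- test[test$total %in% needed, ]
result   # rows with total 7 or 9: four rows, not two
```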
Re: [R] Basic population dynamics
Assuming that at the end all of them are dead, you can do the following: sum(deaths) - cumsum(deaths) Regards, Moshe. --- On Wed, 2/9/09, Frostygoat frostyg...@gmail.com wrote: From: Frostygoat frostyg...@gmail.com Subject: [R] Basic population dynamics To: r-help@r-project.org Received: Wednesday, 2 September, 2009, 4:48 AM Hello, For insect mortality data I'm trying to get an R script that will take the data from the raw form and convert it to Lx (% survival) for a number of treatments. The raw data has the number of days lived for each individual for the respective treatment. Thus, for example, when R selects the data for a single treatment I end up with the following vectors: day=seq(from=0,to=6) deaths=c(0,0,2,0,0,1,6) where deaths is the number of deaths on a given day. Now I need to create a new vector with the number alive for each day and this is where I'm stuck... I've tried various for and while loops but haven't had success. The vector should be: Alive=c(9,9,7,7,7,6,0) I realize it is a very basic problem that is easily accomplished in one's head or on a spreadsheet, but given the size of the data set I wish to have R do it for me. I would welcome any suggestions please. Best regards. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
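The one-liner above, checked against the poster's example (note it is a subtraction, not an assignment):

```r
# Number still alive at the end of each day: total deaths minus
# cumulative deaths so far.
deaths <- c(0, 0, 2, 0, 0, 1, 6)
alive <- sum(deaths) - cumsum(deaths)
alive  # 9 9 7 7 7 6 0 -- matches the desired Alive vector
```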
Re: [R] Help on efficiency/vectorization
You can do for (i in 1:ncol(x)) { names <- rownames(x)[which(x[,i]==1)]; eval(parse(text=paste("V", i, ".ind <- names", sep=""))) } --- On Thu, 27/8/09, Steven Kang stochastick...@gmail.com wrote: From: Steven Kang stochastick...@gmail.com Subject: [R] Help on efficiency/vectorization To: r-help@r-project.org Received: Thursday, 27 August, 2009, 4:13 PM Dear R users, I am trying to extract the rownames of a data set for which each column meets a certain criterion (condition: elements of the column equal to 1). I have the correct result; however, I am looking for a more efficient (vectorized) way to implement this, as it can get quite messy if there are hundreds of columns. An arbitrary data set and code are shown below for your reference: x <- as.data.frame(matrix(round(runif(50),0), nrow=5)) rownames(x) <- letters[1:dim(x)[1]] x V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 a 0 1 1 1 0 0 0 0 1 0 b 1 1 1 1 0 1 0 0 1 1 c 0 1 1 0 0 0 0 0 0 1 d 1 0 0 1 1 1 1 1 0 0 e 1 0 0 0 0 1 1 0 1 0 V1.ind <- rownames(x)[x[,"V1"]==1] V2.ind <- rownames(x)[x[,"V2"]==1] V3.ind <- rownames(x)[x[,"V3"]==1] V4.ind <- rownames(x)[x[,"V4"]==1] : : V10.ind <- rownames(x)[x[,"V10"]==1] V1.ind [1] b d e V2.ind [1] a b c V3.ind [1] a b c : : V10.ind [1] b c Your expertise in resolving this issue would be highly appreciated. Steve __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
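An alternative to the `eval(parse())` loop above, swapped in plainly: keep the results in a named list rather than creating `V1.ind`, `V2.ind`, ... in the global environment. This is the more idiomatic pattern when there are hundreds of columns.

```r
# One list entry per column, named V1..V10, each holding the rownames
# where that column equals 1.
set.seed(1)
x <- as.data.frame(matrix(round(runif(50), 0), nrow = 5))
rownames(x) <- letters[1:nrow(x)]
ind <- lapply(x, function(col) rownames(x)[col == 1])
ind$V1   # same content as V1.ind in the loop version
```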
Re: [R] Submit a R job to a server
Hi Deb, Based on your last note (and after briefly looking at Rserve) I believe that you should install R with all the packages you need on the server and then use it like you are using any workstation, i.e. log in to it and do whatever you need. Regards, Moshe. --- On Thu, 27/8/09, Debabrata Midya debabrata.mi...@commerce.nsw.gov.au wrote: From: Debabrata Midya debabrata.mi...@commerce.nsw.gov.au Subject: Re: [R] Submit a R job to a server To: Cedrick W. Johnson cedr...@cedrickjohnson.com, m_olshan...@yahoo.com Cc: r-help@r-project.org Received: Thursday, 27 August, 2009, 1:48 PM Cedrick / Moshe, Thank you very much for such a quick response. My objective is to do the faster calculations by submitting a R job from my desktop to this server. Oracle 8i Enterprise Edition is currently running on this server. My objective is not only limited to access various oracle tables from this server but also I like to utilise this server for faster calculations and I like to use R for this. So far, I did not install anything related to R into this server. I have R installed on my desktop (Windows XP). I use RODBC / ODBC products to access data from this server and I use R / S-PLUS (installed on my desktop) to do the analyses. Is there any way to submit R job from my desktop to this server? How can I use this server to do my job faster? Once again, thank you very much for the time you have given. I am looking forward for your reply. Regards, Deb Cedrick W. Johnson cedr...@cedrickjohnson.com 27/08/2009 12:41 pm Good Morning Deb- It's unclear (to me at least) what you are trying to do.. What is the server running? Is it running RServe for which you have a userid and pwd or is it just a plain server running some OS? 
*IF* this is the case (Rserve): on the windows machine you will need to: install.packages("Rserve") library(Rserve) ?Rserve --and optionally-- ?RSeval I *think* RSeval/Rserve *may* be what you're looking for, but I am going off just an assumption by what you meant regarding server. Rserve can be used as a client and server on both platforms (I personally have had more success running the server portion under linux, with linux and java clients in which the clients are a mixture of the package's java access *and* R client access to the Rserve instance). More info on Rserve at: http://www.rforge.net/Rserve HTH, cedrick Debabrata Midya wrote: Dear R users, Thanks in advance. I am Deb, Statistician at NSW Department of Commerce, Sydney. I am using R 2.9.1 on Windows XP. May I request you to provide me information on the following: 1. I have access to a server (I have userid and pwd) 2. What are the packages I need to submit a job from Windows XP to this server? Should I need to install any package on this server before submitting a job? Once again, thank you very much for the time you have given. I am looking forward to your reply. Regards, Deb ** This email message, including any attached files, is confidential and intended solely for the use of the individual or entity to whom it is addressed. The NSW Department of Commerce prohibits the right to publish, copy, distribute or disclose any information contained in this email, or its attachments, by any party other than the intended recipient. If you have received this email in error please notify the sender and delete it from your system. No employee or agent is authorised to conclude any binding agreement on behalf of the NSW Department of Commerce by email. The views or opinions presented in this email are solely those of the author and do not necessarily represent those of the Department, except where the sender expressly, and with authority, states them to be the views of NSW Department of Commerce.
The NSW Department of Commerce accepts no liability for any loss or damage arising from the use of this email and recommends that the recipient check this email and any attached files for the presence of viruses. ** __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] expanding 1:12 months to Jan:Dec
One possible (but not very elegant) solution is: aa <- paste(1:12, ":10:2009", sep="") dd <- as.Date(aa, format="%m:%d:%Y") mon <- format(dd, "%b") mon [1] Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec --- On Thu, 20/8/09, Liviu Andronic landronim...@gmail.com wrote: From: Liviu Andronic landronim...@gmail.com Subject: [R] expanding 1:12 months to Jan:Dec To: r-help@r-project.org Help r-help@r-project.org Received: Thursday, 20 August, 2009, 5:14 PM Dear R users I would like to do some spreadsheet-style expansion of dates. For example, I would need to obtain a vector of months. I approached it in an obviously wrong way: paste(01:12) [1] 1 2 3 4 5 6 7 8 9 10 11 12 as.Date(paste(01:12), "%m") [1] NA NA NA NA NA NA NA NA NA NA NA NA to subsequently format(.., "%b"). Other than writing the months manually, could anyone suggest an easier way to obtain such a list? Liviu -- Do you know how to read? http://www.alienetworks.com/srtest.cfm Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
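For completeness, base R also ships the month names as built-in constants, which avoids the date round-trip entirely:

```r
# Built-in month name constants in base R.
month.abb    # "Jan" "Feb" ... "Dec"
month.name   # "January" ... "December"
```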
Re: [R] Principle components analysis on a large dataset
Hi Misha, Since PCA is a linear procedure and you have only 6000 observations, you do not need 68000 variables. Using any 6000 of your variables so that the resulting 6000x6000 matrix is non-singular will do. You can choose these 6000 variables (columns) randomly, hoping that the resulting matrix is non-singular (and checking for this). Alternatively, you can try something like choosing one nice column, then choosing the second one which is the most orthogonal to the first one (kind of Gram-Schmidt), then choosing the third one which is the most orthogonal to the first two, etc. (I am not sure how much roundoff may be a problem - try doing this using higher precision if you can). Note that you do not need to load the entire 6000x68000 matrix into memory (you can load several thousand columns, process them and discard them). Anyway, you will end up with a 6000x6000 matrix, i.e. 36,000,000 entries, which can fit into memory, and you can perform the usual PCA on this matrix. Good luck! Moshe. P.S. I am curious to see what other people think. --- On Fri, 21/8/09, misha680 mk144...@bcm.edu wrote: From: misha680 mk144...@bcm.edu Subject: [R] Principle components analysis on a large dataset To: r-help@r-project.org Received: Friday, 21 August, 2009, 10:45 AM Dear Sirs: Please pardon me, I am very new to R. I have been using MATLAB. I was wondering if R would allow me to do principal components analysis on a very large dataset. Specifically, our dataset has 68800 variables and around 6000 observations. Matlab gives out of memory errors. I have also tried doing princomp in pieces, but this does not seem to quite work for our approach. Anything that might help is much appreciated. If anyone has had experience doing this in R, much appreciated. Thank you Misha -- View this message in context: http://www.nabble.com/Principle-components-analysis-on-a-large-dataset-tp25072510p25072510.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
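A related standard trick (not exactly the column-selection idea above, but in the same spirit of never holding the full 6000x68800 matrix in memory) is to accumulate the n x n cross-product X X' block by block; its eigenvectors then span the principal directions. This is a small runnable sketch with simulated blocks standing in for columns read from disk.

```r
# Hedged sketch: build G = X %*% t(X) (n x n) by summing tcrossprod()
# over column blocks, so the full n x p matrix is never in memory.
set.seed(1)
n <- 100                           # observations (6000 in the post)
block.sizes <- c(40, 40, 20)       # p = 100 columns total, read in chunks
G <- matrix(0, n, n)
for (b in block.sizes) {
  B <- matrix(rnorm(n * b), n, b)  # stand-in for reading a block from disk
  G <- G + tcrossprod(B)           # running sum of B %*% t(B)
}
e <- eigen(G, symmetric = TRUE)    # eigenvectors span the principal directions
```

(Center the columns first if you want classical PCA rather than an uncentered decomposition.)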
Re: [R] extra .
My guess is that 6. comes from 6.0 - something which comes from programming languages where 6 represents 6 as an integer while 6. (or 6.0) represents 6 as a floating point number. --- On Fri, 21/8/09, kfcnhl zhengchenj...@hotmail.com wrote: From: kfcnhl zhengchenj...@hotmail.com Subject: [R] extra . To: r-help@r-project.org Received: Friday, 21 August, 2009, 12:34 PM sigma0 <- sqrt((6. * var(maxima))/pi) What does the '.' do here? -- View this message in context: http://www.nabble.com/extra-.-tp25073255p25073255.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
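In R itself the trailing dot makes no difference: both literals are doubles (use the `L` suffix for an integer literal):

```r
# 6 and 6. are the same double in R; 6L is the integer literal.
identical(6., 6)   # TRUE
is.double(6)       # TRUE -- plain numeric literals are doubles
is.integer(6L)     # TRUE
```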
Re: [R] feature weighting in randomForest
Hi Tim, As far as I know you cannot weight predictors (and I believe that you really should not). You may weight classes (and, in a sense, cases), but this is an entirely different issue. --- On Wed, 5/8/09, Häring, Tim (LWF) tim.haer...@lwf.bayern.de wrote: From: Häring, Tim (LWF) tim.haer...@lwf.bayern.de Subject: [R] feature weighting in randomForest To: r-help@r-project.org Received: Wednesday, 5 August, 2009, 5:59 PM Hello! I'm using randomForest for classification problems. My dataset has 21,000 observations and 96 predictors. I know that some predictors of my dataset have more influence in classifying my data than others. Therefore I would like to know if there is a way to weight my predictors. I know that for constructing each tree in a forest the most influential predictor is used for partitioning the data. But maybe it would have an effect if I weighted my predictors. Thanks in advance TIM --- Tim Haering Bavarian State Institute of Forest Research Department of Forest Ecology Am Hochanger 11 D-85354 Freising http://www.lwf.bayern.de __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] System is computationally singular and scale of covariates
Hi, What do you mean by outer product? If you have two vectors, say x and y, of length n and you define the matrix A by A(i,j) = x(i)*y(j), then your matrix has rank one and it is VERY singular (in exact arithmetic). Is this what you mean by outer product? --- On Sun, 16/8/09, Stephan Lindner lindn...@umich.edu wrote: From: Stephan Lindner lindn...@umich.edu Subject: [R] System is computationally singular and scale of covariates To: r-h...@stat.math.ethz.ch Received: Sunday, 16 August, 2009, 9:15 AM Dear all, I'm running a self-written numerical optimization routine (hazard model) which includes computing the inverse of the outer product of the score. I have been getting the above error message (System is computationally singular), and after some tweaking, I realized that these variables have some high numbers and the problem could be circumvented by scaling them down (i.e. dividing them by 100 or taking logs). Since this is obviously not the best procedure, and since I have to estimate more complex models down the road, I would like to understand better the reason which causes this problem. It is not a multicollinearity issue (I get the error even when using one single variable), and I think my code is clean (better be paranoid though). My sense is that the outer product just becomes large, and these are hard to invert. Maybe there are restrictions concerning R in the size of the numbers? If that is the case, I think I would fare better scaling down the outer product rather than the variable itself, but I first wanted to ask the community to get an understanding of what could be the problem. Thanks a lot, Stephan Lindner -- --- Stephan Lindner University of Michigan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
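Moshe's rank-one point is easy to verify numerically: the outer product of two vectors is singular whenever it is 2 x 2 or larger.

```r
# A[i, j] = x[i] * y[j] has rank one, so det(A) is 0 for n >= 2.
x <- c(1, 2, 3)
y <- c(4, 5, 6)
A <- outer(x, y)
qr(A)$rank   # 1
det(A)       # 0 (up to rounding)
```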
Re: [R] problem about t test
You could do the following: y <- apply(dat, 1, function(a) t.test(a[1:10], a[11:30])$p.value) This will produce a vector of 2 p-values (one per row). --- On Fri, 14/8/09, Gina Liao yi...@hotmail.com wrote: From: Gina Liao yi...@hotmail.com Subject: [R] problem about t test To: r-h...@stat.math.ethz.ch Received: Friday, 14 August, 2009, 12:53 PM Hello, I have a data frame str(dat) 'data.frame': 2 obs. of 30 variables It contains two pieces of information - two types of cancers: stage A (A1 to A10) and stage B (B1 to B20) ## totally 30 patients - 2 sets of gene expression I'd like to find the lists for the top 20 differentially expressed genes using a t-test (by P-value). Here is my code; unfortunately it doesn't work... I need help, please. I just learned R two weeks ago, and hope you can give a hint! A <- dat[,1:10] B <- dat[,11:30] bb <- function(t.test) { P.value.to.return <- t.test(A,B)$p.value return(P.value.to.return) } aa <- t(apply(dat,1,bb)) Thanks!! Best Regards, vie __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
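The `apply()` answer above, run on simulated data (the matrix here is invented; the poster's real `dat` has gene-expression values):

```r
# One t-test per row: columns 1-10 are stage A, columns 11-30 stage B.
set.seed(1)
dat <- matrix(rnorm(2 * 30), nrow = 2,
              dimnames = list(c("gene1", "gene2"), NULL))
pvals <- apply(dat, 1, function(a) t.test(a[1:10], a[11:30])$p.value)
pvals  # named vector, one p-value per row
```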
Re: [R] Solutions of equation systems
Is your system of equations linear? --- On Fri, 14/8/09, Moreno Mancosu nom...@tiscali.it wrote: From: Moreno Mancosu nom...@tiscali.it Subject: [R] Solutions of equation systems To: r-help@r-project.org Received: Friday, 14 August, 2009, 2:29 AM Hello all! Maybe it's a newbie question (in fact I _am_ a newbie), but I hope you'll find the solution. I have an equation system which has k equations and n variables (k < n). I would like to obtain a description of the solutions (which can be the equation of lines, or a plane, or anything else) with the fewest degrees of freedom, obviously using R. In other words, I would like to obtain a result which is very similar to the results of Maxima or similar software. I tried to search on the internet and I found the systemfit package, but I think it is not related to my problem. Any idea? Thanks in advance, Moreno __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting the number of non-NA values per day
Try tempFun <- function(x) sum(!is.na(x)) nonZeros <- aggregate(pollution["pol"], format(pollution["date"], "%Y-%j"), FUN = tempFun) --- On Wed, 12/8/09, Tim Chatterton tim.chatter...@uwe.ac.uk wrote: From: Tim Chatterton tim.chatter...@uwe.ac.uk Subject: [R] Counting the number of non-NA values per day To: r-help@r-project.org Received: Wednesday, 12 August, 2009, 3:26 AM I have a long dataframe (pollution) that contains a column of hourly date information (date) and a column of pollution measurements (pol). I have been happily calculating daily means and daily maximums using the aggregate function: DMEANpollution = aggregate(pollution["pol"], format(pollution["date"], "%Y-%j"), mean, na.rm = TRUE) DMAXpollution = aggregate(pollution["pol"], format(pollution["date"], "%Y-%j"), max, na.rm = TRUE) However, I also need to count the number of valid measurements for each day to check that the mean and max are likely to be valid (for example I need at least 18 hourly measurements to calculate a valid daily mean). Try as I might I have not found a simple way of doing this. Can anybody help please? Many thanks, Tim. -- __ Dr Tim Chatterton Senior Research Fellow Air Quality Management Resource Centre Faculty of Environment and Technology University of the West of England Frenchay Campus Bristol BS16 1QY Tel: 0117 328 2929 Fax: 0117 328 3360 Email: tim.chatter...@uwe.ac.uk __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
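The counting idea above on a toy data frame (three days of hourly data with some NAs sprinkled in; the data are invented for illustration):

```r
# Count valid (non-NA) hourly measurements per day with aggregate().
set.seed(1)
pollution <- data.frame(
  date = seq(as.POSIXct("2009-01-01 00:00"), by = "hour", length.out = 72),
  pol  = ifelse(runif(72) < 0.2, NA, rnorm(72)))
nValid <- aggregate(pollution["pol"],
                    list(day = format(pollution$date, "%Y-%j")),
                    FUN = function(x) sum(!is.na(x)))
nValid  # one row per day; keep days where pol >= 18
```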
Re: [R] Re : How to Import Excel file into R 2.9.0 version
Alternatively, download the xlsReadWrite package from http://treetron.googlepages.com/ install it and proceed as in older versions of R. --- On Tue, 11/8/09, Inchallah Yarab inchallahya...@yahoo.fr wrote: From: Inchallah Yarab inchallahya...@yahoo.fr Subject: [R] Re : How to Import Excel file into R 2.9.0 version To: r-help@r-project.org Received: Tuesday, 11 August, 2009, 10:32 PM I advise you to save the file as csv and then use read.csv2("c:/file.csv", sep=",") hope that helps!!! inchallahyarab 2009/8/11 rajclinasia r...@clinasia.com Hi Everyone, I have a problem with reading an Excel file into R version 2.9.0. In older versions it is working with the xlsReadWrite package, but in the 2.9.0 version there is no such package, so help me out in this aspect. Thanks in Advance. -- View this message in context: http://www.nabble.com/How-to-Import-Excel-file-into-R-2.9.0-version-tp24914638p24914638.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Matrix Integral
Hi, Is your matrix K symmetric? If yes, there is an analytical solution. --- On Sat, 1/8/09, nhawrylyshyn nichlas.hawrylys...@gmail.com wrote: From: nhawrylyshyn nichlas.hawrylys...@gmail.com Subject: [R] Matrix Integral To: r-help@r-project.org Received: Saturday, 1 August, 2009, 12:15 AM Hi, Any help on this would be appreciated: I need to integrate, where K is a 4x4 matrix and SIGMA is a 4x4 matrix, from say a to b, i.e. 0 to 5: integral MatrixExp(-K * s) %*% SIGMA %*% t(SIGMA) %*% MatrixExp(t(-K) * s) ds (t is transpose, %*% is matrix multiplication, MatrixExp is the matrix exponential). I've used integrate before on univariate functions like f(x) = x^2, which is fine, but when doing this on a matrix I run into problems. All I intuitively need to do is do this element by element. Thanks, NH. -- View this message in context: http://www.nabble.com/Matrix-Integral-tp24757170p24757170.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
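The "element by element" route the poster mentions can be done with `stats::integrate()` applied to each entry of a matrix-valued function. This is a hedged sketch with a toy 2x2 integrand, not the poster's MatrixExp kernel; the same pattern applies once `f(s)` is defined with the real matrices.

```r
# Integrate a matrix-valued function element-by-element over [0, 5].
# f() is a toy integrand standing in for the poster's expression.
f <- function(s) matrix(c(exp(-s), 0, s, 1), 2, 2)
elt <- function(i, j)
  integrate(function(s) sapply(s, function(u) f(u)[i, j]), 0, 5)$value
M <- outer(1:2, 1:2, Vectorize(elt))
M  # e.g. M[2, 2] = 5, the integral of the constant 1 over [0, 5]
```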
Re: [R] Sampling of non-overlapping intervals of variable length
Another possibility, if the total length of your intervals is small in comparison to the big interval, is to choose the starting points of all your intervals randomly and to dismiss the entire set if some of the intervals overlap. Most probably you will not have too many such cases (assuming, as stated above, that the total length of all the intervals is a small proportion of the length of the big interval). --- On Mon, 20/7/09, Hadassa Brunschwig hadassa.brunsch...@mail.huji.ac.il wrote: From: Hadassa Brunschwig hadassa.brunsch...@mail.huji.ac.il Subject: Re: [R] Sampling of non-overlapping intervals of variable length To: Charles C. Berry cbe...@tajo.ucsd.edu Cc: r-help@r-project.org Received: Monday, 20 July, 2009, 3:08 PM Thanks Chuck. Oops, did not think of the problem in that way. That did exactly what I needed. I have another complication to this problem: I do not only have one vector of 1:1e^6 but several vectors of different lengths, say 5. Initially, my intervals are distributed over those 5 vectors and the ranges of those 5 vectors in a specific way (and you might have guessed by now that I would like to do something like a permutation test). Because I have this additional level, I guess I could do something like: 1) Sample the 5 vectors with probabilities proportional to the frequencies of the initial intervals on these vectors. 2) For each sampled vector: apply Chuck's solution. ? Thanks a lot. Hadassa On Sun, Jul 19, 2009 at 11:13 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote: On Sun, 19 Jul 2009, Hadassa Brunschwig wrote: Hi I am not sure what you mean by sampling an index of a group of intervals. I will try to give an example: Let's assume I have a vector 1:100. Let's say I have 10 intervals of different but known lengths, say, c(4,6,11,2,8,14,7,2,18,32). For simulation purposes I have to sample those 10 intervals 1000 times. The requirement is, however, that they should be of those lengths and should not be overlapping.
In short, I would like to obtain a 10x1000 matrix with sampled intervals. Something like this: lens <- c(4,6,11,2,8,14,7,2,18,32) perm.lens <- sample(lens) sort(sample(1e06-sum(lens)+length(lens),length(lens)))+cumsum(c(0,head(perm.lens,-1))) [1] 15424 261927 430276 445976 451069 546578 656123 890494 939714 969643 The vector above gives the starting points for the intervals whose lengths are perm.lens. I'd bet every introductory combinatorics book has a problem or example in which the expression for the number of ways in which K ordered objects can be assigned to I groups consisting of n_i adjacent objects each is constructed. The construction is along the lines of the calculation above. HTH, Chuck Thanks Hadassa On Sun, Jul 19, 2009 at 9:48 PM, David Winsemius dwinsem...@comcast.net wrote: On Jul 19, 2009, at 1:05 PM, Hadassa Brunschwig wrote: Hi, I hope I am not repeating a question which has been posed already. I am trying to do the following in the most efficient way: I would like to sample from a finite (large) set of integers n non-overlapping intervals, where each interval i has a different, set length L_i (which is the number of integers in the interval). I had the idea to sample recursively on a vector with the already chosen intervals discarded but that seems to be too complicated. It might be ridiculously easy if you sampled on an index of a group of intervals. Why not pose the question in the form of example data.frames or other classes of R objects? Specification of the desired output would be essential. I think further specification of the sampling strategy would also help because I am unable to understand what sort of probability model you are hoping to apply. Any suggestions on that? Thanks a lot.
Hadassa -- Hadassa Brunschwig PhD Student Department of Statistics David Winsemius, MD Heritage Laboratories West Hartford, CT -- Hadassa Brunschwig PhD Student Department of Statistics The Hebrew University of Jerusalem http://www.stat.huji.ac.il __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 -- Hadassa Brunschwig PhD Student Department of Statistics The Hebrew University of Jerusalem http://www.stat.huji.ac.il __
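Moshe's rejection idea from earlier in the thread can be sketched as follows: draw all starting points at once and redraw whenever any two intervals would overlap. This is a minimal illustration using the interval lengths from the thread, over 1:1e6.

```r
# Rejection sampling: redraw start points until no intervals overlap.
set.seed(1)
lens <- c(4, 6, 11, 2, 8, 14, 7, 2, 18, 32)
total <- 1e6
repeat {
  start <- sort(sample(total - max(lens), length(lens)))
  end <- start + lens - 1
  if (all(head(end, -1) < tail(start, -1))) break  # no overlaps
}
cbind(start, end)  # one non-overlapping placement of the 10 intervals
```
Because the intervals cover only 104 of 1e6 positions, rejections are rare and the loop almost always succeeds on the first draw.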
Re: [R] searching for elements
?outer --- On Thu, 16/7/09, Chyden Finance fina...@chyden.net wrote: From: Chyden Finance fina...@chyden.net Subject: [R] searching for elements To: r-help@r-project.org Received: Thursday, 16 July, 2009, 3:00 AM Hello! For the past three years, I have been using R extensively in my PhD program in Finance for statistical work. Normally, I can figure this kind of thing out on my own, but I am completely stumped as to why the following code does not work. I have two variables: sigs and p0_recent. dim(sigs) = 296 3 dim(p0_recent) = 504 7 I want to compare each element in the first column of sigs with the elements in the first column of p0_recent. In other words, does sigs[i,1] == p0_recent[j,1], where i = 1:dim(sigs)[1] and j = 1:dim(p0_recent)[1]. I've been trying: for (j in 1:dim(p0_recent)[1]) { + for (i in 1:dim(sigs)[1]) { + if (sigs[i,1] == p0_recent[j,1]) { + print(sigs[i,1]) + }}} But, I get: Error in Ops.factor(sigs[i, 1], p0_recent[j, 1]) : level sets of factors are different Is there a better way than for loops to compare each element in one column to each element in another column of different length? If not, can anyone suggest a solution? CWE __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
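The `?outer` pointer above, spelled out on toy data (the ticker strings are invented; convert factor columns with `as.character()` first to avoid the "level sets of factors are different" error):

```r
# Compare every element of a against every element of b in one call.
a <- c("AAPL", "MSFT", "IBM")
b <- c("IBM", "GOOG", "AAPL", "XOM")
hits <- outer(a, b, "==")      # 3 x 4 logical matrix
which(hits, arr.ind = TRUE)    # row = index into a, col = index into b
a[rowSums(hits) > 0]           # "AAPL" "IBM" -- elements of a found in b
```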
Re: [R] Nested for loops
Make it for (i in 1:9) This is not the general solution, but in your case when i=10 you do not want to do anything. --- On Tue, 14/7/09, Michael Knudsen micknud...@gmail.com wrote: From: Michael Knudsen micknud...@gmail.com Subject: [R] Nested for loops To: r-help@r-project.org Received: Tuesday, 14 July, 2009, 3:38 PM Hi, I have spent some time locating a quite subtle (at least in my opinion) bug in my code. I want two nested for loops traversing the above-diagonal part of a square matrix. In pseudo code it would be something like

for i = 1 to 10 {
  for j = i+1 to 10 {
    // do something
  }
}

However, trying to do the same in R, my first try was

for (i in 1:10) {
  for (j in (i+1):10) {
    # do something
  }
}

but there's a problem here. For i=10, the last for loop is over 11:10. Usually programming languages would regard what corresponds to 11:10 as empty, but A:B with A bigger than B is in R interpreted as the numbers from B to A in reverse order. Is there a clever way to make nested loops like the one above in R? -- Michael Knudsen micknud...@gmail.com http://lifeofknudsen.blogspot.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
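Since 1:9 is admittedly not the general solution: a sketch of a general fix is to build the inner indices with seq_len(), which returns an empty vector (rather than a reversed sequence) when its argument is 0:

```r
n <- 10
visited <- 0
for (i in seq_len(n)) {
  # seq_len(n - i) + i is i+1, ..., n, and is empty when i == n,
  # so the inner body is simply skipped instead of looping over n:(n+1)
  for (j in seq_len(n - i) + i) {
    visited <- visited + 1   # stand-in for "do something" with cell (i, j)
  }
}
visited  # one visit per above-diagonal cell: choose(n, 2) = 45
```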
Re: [R] ifultools on ppc debian
Hi Stephen, The error message clearly says what is wrong. Big Endian and Little Endian are two ways of storing data (most often double precision numbers) in memory. A double precision number occupies two blocks of 4 bytes each. On Big Endian machines (most machines which are not Intel) if the memory address of a double precision number is X then its sign, exponent and most significant part of the mantissa start at address X and the less significant part of the mantissa starts at X+4, while on Little Endian machines (x86 and alike) the less significant part of the mantissa starts at X and the sign, exponent and most significant part of the mantissa start at X+4. In order to get better performance some programs directly manipulate the bit pattern of the number, and to do this correctly they must know whether they are run on a Big Endian or Little Endian machine. So your program is apparently looking at /usr/include/bits/endian.h to check this. You should edit this file and comment out (or #undef) the Endian which is NOT what you got. Good luck! Moshe. --- On Wed, 15/7/09, stephen sefick ssef...@gmail.com wrote: From: stephen sefick ssef...@gmail.com Subject: [R] ifultools on ppc debian To: r-help@r-project.org Received: Wednesday, 15 July, 2009, 12:04 PM I have tried to compile this from source. I don't know what Endianness is, but it is probably not debian power pc. Am I out of luck with this package? Stephen Sefick * Installing *source* package ‘ifultools’ ... 
** libs gcc -std=gnu99 -I/usr/local/lib/R/include -I../inst/include/ -DMUTIL_STATIC -DDEF_TF -DINTERRUPT_ENABLE -DNDEBUG -DUSE_RINTERNALS -I/usr/local/include -fpic -g -O2 -c RS_fra_dim.c -o RS_fra_dim.o In file included from /usr/include/endian.h:37, from /usr/include/sys/types.h:217, from /usr/include/stdlib.h:320, from /usr/local/lib/R/include/R.h:28, from ../inst/include/ut_type.h:18, from ../inst/include/fra_dim.h:9, from RS_fra_dim.c:20: /usr/include/bits/endian.h:27:4: error: #error Both BIG_ENDIAN and LITTLE_ENDIAN defined! make: *** [RS_fra_dim.o] Error 1 ERROR: compilation failed for package ‘ifultools’ * Removing ‘/home/ssefick/ifultools.Rcheck/ifultools’ -- Stephen Sefick Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping data in dataframe
Try ?aggregate --- On Wed, 15/7/09, Timo Schneider timo.schnei...@s2004.tu-chemnitz.de wrote: From: Timo Schneider timo.schnei...@s2004.tu-chemnitz.de Subject: [R] Grouping data in dataframe To: r-help@r-project.org r-help@r-project.org Received: Wednesday, 15 July, 2009, 1:56 PM Hello, I have a dataframe (obtained from read.table()) which looks like

  ExpA ExpB ExpC Size
1   12   23   33    1
2   12   24   29    1
3   10   22   34    1
4   25   50   60    2
5   24   53   62    2
6   21   49   61    2

now I want to take all rows that have the same value in the Size column and apply a function to the columns of these rows (for example median()). The result should be a new dataframe with the medians of the groups, like this:

  ExpA ExpB ExpC Size
1   12   23   34    1
2   24   50   61    2

I tried to play with the functions by() and tapply() but I didn't get the results I wanted so far, so any help on this would be great! The reason why I am having this problem: (I explain this just to make sure I don't do something against the nature of R.) I am doing 3 similar experiments, A, B, C, and I change a parameter in the experiment (size). Every experiment is done multiple times and I need the median or average over all experiments that are the same. Should I preprocess my data files so that they are completely different? Or is it fine the way it is and I just overlooked the simple solution to the problem described above? Regards, Timo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
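For the example data in the question, aggregate does this in one call (a sketch; note the exact median of ExpC for Size 1 is 33, computed from 33, 29, 34, slightly different from the sketched output in the question):

```r
df <- data.frame(ExpA = c(12, 12, 10, 25, 24, 21),
                 ExpB = c(23, 24, 22, 50, 53, 49),
                 ExpC = c(33, 29, 34, 60, 62, 61),
                 Size = c(1, 1, 1, 2, 2, 2))

# group the three Exp columns by Size and take the per-group median
aggregate(df[c("ExpA", "ExpB", "ExpC")], by = list(Size = df$Size), FUN = median)
#   Size ExpA ExpB ExpC
# 1    1   12   23   33
# 2    2   24   50   61
```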
Re: [R] averaging two matrices whilst ignoring missing values
One (awkward) way to do this is:

x <- matrix(c(c(test), c(test2)), ncol = 2)
y <- rowMeans(x, na.rm = TRUE)
testave <- matrix(y, nrow = nrow(test))

--- On Tue, 14/7/09, Tish Robertson tishrobert...@hotmail.com wrote: From: Tish Robertson tishrobert...@hotmail.com Subject: [R] averaging two matrices whilst ignoring missing values To: r-help@r-project.org Received: Tuesday, 14 July, 2009, 11:46 AM Hi folks, I'm trying to do something that seems like it should be easy, but it apparently isn't. I have two large matrices, both containing a number of missing values at different cells. I would like to average these matrices, but the NAs are preventing me. I get a "non-numeric argument to binary operator" error. That's the first problem.

test <- read.csv("test.csv", header = FALSE)
test2 <- read.csv("test2.csv", header = FALSE)
test_ <- as.matrix(test, na.rm = T)
test2_ <- as.matrix(test2, na.rm = T)
testave <- (test_ + test2_)/2  # ??

So off the bat I'm doing something wrong. How would I replace the missing values in one matrix with the corresponding non-missing values in another? It's acceptable to me if I only have one value representing the average for a particular coordinate. Any help would be appreciated! [[alternative HTML version deleted]] -Inline Attachment Follows- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to keep row name if there is only one row selected from a data frame
Try A[1,,drop=FALSE] - see ?"[" --- On Mon, 13/7/09, Weiwei Shi helprh...@gmail.com wrote: From: Weiwei Shi helprh...@gmail.com Subject: [R] how to keep row name if there is only one row selected from a data frame To: r-h...@stat.math.ethz.ch r-h...@stat.math.ethz.ch Received: Monday, 13 July, 2009, 1:55 PM Hi, there: Assume I have a dataframe A with rownames a to e:

A
  [,1] [,2]
a    1    6
b    2    7
c    3    8
d    4    9
e    5   10

when I use A[1,], I lose the rowname, like below. How could I keep it? Is there an easy way, rather than having to fix it up myself after using A[1,]?

A[1,]
[1] 1 6

Thanks, W. -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting a column name in loop?
If df is your dataframe then names(df) contains the column names and so names(df)[i] is the name of i-th column. --- On Thu, 9/7/09, mister_bluesman mister_blues...@hotmail.com wrote: From: mister_bluesman mister_blues...@hotmail.com Subject: [R] Extracting a column name in loop? To: r-help@r-project.org Received: Thursday, 9 July, 2009, 1:41 AM Hi, I am writing a script that will address columns using syntax like: data_set[,1] to extract the data from the first column of my data set, for example. This code will be placed in a loop (where the column reference will be placed by a variable). What I also need to do is extract the column NAME for a given column being processed in the loop. The dataframe has been set so that R knows that the top line refers to column headers. Can anyone help me understand how to do this? Thanks. -- View this message in context: http://www.nabble.com/Extracting-a-column-name-in-loop--tp24393160p24393160.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
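A small sketch of that inside a loop (data_set here is a hypothetical stand-in for the poster's data frame):

```r
# hypothetical data set standing in for data_set
data_set <- data.frame(height = c(1.7, 1.8, 1.6), weight = c(65, 80, 55))

for (i in seq_along(data_set)) {
  col_name <- names(data_set)[i]   # the NAME of column i
  col_data <- data_set[, i]        # the data of column i
  cat(col_name, ": mean =", mean(col_data), "\n")
}
```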
Re: [R] print() to file?
One possibility is to use sink (see ?sink). --- On Thu, 9/7/09, Steve Jaffe sja...@riskspan.com wrote: From: Steve Jaffe sja...@riskspan.com Subject: [R] print() to file? To: r-help@r-project.org Received: Thursday, 9 July, 2009, 5:03 AM I'd like to write some objects (eg arrays) to a log file. cat() flattens them out. I'd like them formatted as in 'print' but print only writes to stdout. Is there a simple way to achieve this result? Thanks -- View this message in context: http://www.nabble.com/print%28%29-to-file--tp24397445p24397445.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
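A sketch of both options: sink() diverts console output to a file, while capture.output() writes the print-formatted text in a single call.

```r
m <- matrix(1:6, nrow = 2)
f <- tempfile(fileext = ".txt")

# divert everything that would go to the console into the file ...
sink(f)
print(m)
sink()   # ... and remember to restore the console afterwards

# alternative: capture.output() does the print-and-write in one call
capture.output(print(m), file = f, append = TRUE)

readLines(f)  # the matrix, print-formatted, twice
```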
Re: [R] Substituting numerical values using `apply'
Let M be your matrix. Do the following:

B <- t(matrix(colnames(M), nrow = ncol(M), ncol = nrow(M)))
B[!(M %in% c(1, 2))] <- NA

(This sets every entry that is not a 1 or a 2 to NA; in your example the 5s become NA as well.) --- On Thu, 9/7/09, Olivella olive...@wustl.edu wrote: From: Olivella olive...@wustl.edu Subject: [R] Substituting numerical values using `apply' To: r-help@r-project.org Received: Thursday, 9 July, 2009, 6:25 AM Hello, I wish to perform a substitution of certain numerical values in a data matrix with the corresponding column name. For instance, if I have a data matrix

V1 V2 V3
 2  0  1
 0  1  2
 1  5  0
 5  0  0

I want to substitute the `1' and the `2' for the corresponding column name, and make everything else `NA', like this

V1 NA V3
NA V2 V3
V1 NA NA
NA NA NA

I have done this using an explicit `for' loop, but it takes a really long time to finish. Is there any way I can do this using `apply' or some form of implicit looping? Thank you for your help, SO -- View this message in context: http://www.nabble.com/Substituting-numerical-values-using-%60apply%27-tp24398687p24398687.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
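A self-contained sketch on the matrix from the question (values other than 1 and 2, including the 5s, are set to NA, matching the desired output):

```r
M <- matrix(c(2, 0, 1,
              0, 1, 2,
              1, 5, 0,
              5, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("V1", "V2", "V3")))

# character matrix with the column name repeated down each column
B <- t(matrix(colnames(M), nrow = ncol(M), ncol = nrow(M)))

# blank out every cell of M that is not a 1 or a 2
B[!(M %in% c(1, 2))] <- NA
B
#      [,1] [,2] [,3]
# [1,] "V1" NA   "V3"
# [2,] NA   "V2" "V3"
# [3,] "V1" NA   NA
# [4,] NA   NA   NA
```

Both B and the logical mask are laid out column-major over the same 4x3 shape, so the vectorised indexing lines up cell by cell.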
Re: [R] naming of columns in R dataframe consisting of mixed data (alphanumeric and numeric)
Hi Mary, Your data.frame has just one column (not 2)! You can check this by dim(tresults2). What appears to you to be the first column (the names) are in fact rownames. If you really want to have two columns do something like

tresults2 <- cbind(colnames(tresults), data.frame(t(tresults), row.names = NULL))

Now tresults2 contains 2 columns and you can proceed with

names(tresults2) <- c("Statistic", "Value")

--- On Fri, 10/7/09, Mary A. Marion mms...@comcast.net wrote: From: Mary A. Marion mms...@comcast.net Subject: [R] naming of columns in R dataframe consisting of mixed data (alphanumeric and numeric) To: r-help@r-project.org Received: Friday, 10 July, 2009, 3:50 AM Hello, I have an r function that creates the following dataframe tresults2. Notice that column 1 does not have a column heading.

Tresults2:
                 [,1]
estparam         18.0
nullval          20.0
...
ciWidth       2.04622
HalfInterval  1.02311

pertinent code:

results <- cbind(estparam, nullval, t, pv_left, pv_right, pv_two_t, estse, df, cc, tbox, llim, ulim, ciWidth, HalfInterval)
tresults <- round(results, 5)
tresults2 <- data.frame(t(tresults))
names(tresults2) <- c("Statistic", "Value")
tresults2

=== After transposing, my dataframe tresults2 consists of 2 columns. Col1 = alphanumeric data (this really is a variable name) and col2 = numeric data (this is the value of the variable). How do I name columns when columns are different (alphanumeric and numeric) to avoid the following error: Error in names(tresults2) <- c("Statistic", "Value") : 'names' attribute [2] must be the same length as the vector [1]. Am I to use c( , ) or list( , ) with a dataframe? Thank you for your help. Sincerely, Mary A. Marion [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Counting the number of cycles in a temperature test
Hi Antje, Are your measurements taken every minute (i.e. 30 minutes correspond to 30 consecutive values)? How fast is your transition? If you had 30 minutes of upper temperature, then 1000 minutes of room temperature and then 30 minutes of lower temperature - would you count this as a cycle? Can a cycle be 30 minutes of lower temperature followed by 30 minutes of upper temperature? --- On Mon, 6/7/09, Steller, Antje (GQL-LM) antje.stel...@volkswagen.de wrote: From: Steller, Antje (GQL-LM) antje.stel...@volkswagen.de Subject: [R] Counting the number of cycles in a temperature test To: r-help@r-project.org Received: Monday, 6 July, 2009, 9:14 PM Hello dear R-users, today I have a question that I completely do not know how to solve (R-newbie!). In a temperature chamber I have measured temperature over time. The result is shown in the attached eps-file (if attachments are shown): There are two temperature levels, 150°C and -40°C. A complete cycle includes about 30 minutes upper temperature, quick change to the lower level, 30 minutes hold at lower level. Then the temperature rises again to the upper level and a new cycle begins. About 500 cycles have been completed. How can I count the number of cycles that have been completed? The data does not only include perfect temperature cycles. Once in a while the machine would stand still and for a day or so the temperature would remain at room temperature. So I cannot simply divide the measured time by the duration of a cycle... Thanks a lot for your support, if necessary I could also provide the dataset. Antje Temperaturzyklen.eps -Inline Attachment Follows- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Uncorrelated random vectors
As mentioned by somebody before, there is no problem for the normal case - use the mvrnorm function from the MASS package with any mu and make Sigma any diagonal matrix (with strictly positive diagonal). Note that even though all the correlations are 0, the SAMPLE correlations won't be 0. If you want to create a set of vectors whose SAMPLE correlations are 0 you will have to use a variant of Gram-Schmidt. I do not know whether a variant of mvrnorm exists for the logistic distribution (my guess is that it does not). --- On Tue, 7/7/09, Stein, Luba (AIM SE) luba.st...@allianz.com wrote: From: Stein, Luba (AIM SE) luba.st...@allianz.com Subject: [R] Uncorrelated random vectors To: r-help@r-project.org r-help@r-project.org Received: Tuesday, 7 July, 2009, 11:45 PM Hello, is it possible to create two uncorrelated random vectors for a given distribution. In fact, I would like to have something like the function rnorm or rlogis with the extra property that they are uncorrelated. Thanks for your help, Luba [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
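A sketch of the normal case with MASS (a recommended package shipped with R). With empirical = TRUE, mvrnorm even forces the SAMPLE covariance to equal Sigma exactly, which addresses the sample-correlation caveat without doing Gram-Schmidt by hand:

```r
library(MASS)

set.seed(1)
# two columns, variances 1 and 4, population (and here sample) correlation 0
Z <- mvrnorm(n = 100, mu = c(0, 0), Sigma = diag(c(1, 4)), empirical = TRUE)
cor(Z[, 1], Z[, 2])  # zero up to floating-point rounding
```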
Re: [R] a really simple question on polynomial multiplication
One way is to use convolution (?convolve): If A(x) = a_p*x^p + ... + a_1*x + a_0 and B(x) = b_q*x^q + ... + b_1*x + b_0 and if C(x) = A(x)*B(x) = c_(p+q)*x^(p+q) + ... + c_0 then c = convolve(a, rev(b), type = "open") where c is the vector (c_(p+q),...,c_0), a is (a_p,...,a_0) and b is (b_q,...,b_0). In your case:

a <- c(1, -3)
b <- c(1, 3)
c <- convolve(a, rev(b), type = "open")
c
[1] 1.0e+00 -6.49654e-16 -9.0e+00

Note that the coefficient of x is not exactly 0 due to the use of fft to compute convolutions. --- On Thu, 16/10/08, Erin Hodgess [EMAIL PROTECTED] wrote: From: Erin Hodgess [EMAIL PROTECTED] Subject: [R] a really simple question on polynomial multiplication To: R Help r-help@r-project.org Received: Thursday, 16 October, 2008, 8:44 AM Dear R people: Is there a way to perform simple polynomial multiplication; that is, something like (x - 3) * (x + 3) = x^2 - 9, please? I looked in poly and polyroot and expression. There used to be a package that had this, maybe? thanks, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] runs of heads when flipping a coin
First of all, we must define what is a run of length r: is it a tail, then EXACTLY r heads and a tail again or is it AT LEAST r heads. Let's assume that we are looking for a run of EXACTLY r heads (and we toss the coin n times). Let X[1],X[2],...,X[n-r+1] be random variables such that Xi = 1 if there is a run of r heads starting at place i and 0 otherwise. Then the number of runs of length r is just sum(X), so the expected number of runs of length r is sum(E(X)) = sum(P(X[i]=1)). Now, for i=2,3,...,n-r P(X[i] = 1) = (1-h)*h^r*(1-h) and also P(X[1] = 1) = h^r*(1-h) and P(X[n-r+1] = 1) = (1-h)*h^r, so that the expected number of runs of length r is (1-h)*h^r*(2 + (n-r-1)*(1-h)) As to the distribution of the number of such runs, this is a much more difficult question (as mentioned by some other people). --- On Fri, 10/10/08, Harvey [EMAIL PROTECTED] wrote: From: Harvey [EMAIL PROTECTED] Subject: [R] runs of heads when flipping a coin To: r-help@r-project.org Received: Friday, 10 October, 2008, 4:16 AM Can someone recommend a method to answer the following type of question: Suppose I have a coin with a probability hhh of coming up heads (and 1-hhh of coming up tails) I plan on flipping the coin nnn times (for example, nnn = 500) What is the expected probability or frequency of a run of rrr heads* during the nnn=500 coin flips? Moreover, I would probably (excuse the pun) want the answer for a range of rrr values, for example rrr = 0:50 Of course I am more interested in an analytical solution than a monte carlo simulation solution. Thanks in advance, Harvey * OR a run of rrr heads or more ... [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
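The closed form above is easy to check by simulation (a sketch; runs of EXACTLY r heads, with h and n as in the derivation):

```r
# expected number of runs of EXACTLY r heads in n tosses with P(head) = h,
# using the formula derived above: border positions need only one bounding
# tail, the n - r - 1 interior positions need two
expected_runs <- function(n, r, h) {
  (1 - h) * h^r * (2 + (n - r - 1) * (1 - h))
}

# count runs of exactly r consecutive 1s in a 0/1 vector of tosses
count_runs <- function(flips, r) {
  rl <- rle(flips)
  sum(rl$values == 1 & rl$lengths == r)
}

set.seed(1)
n <- 500; r <- 3; h <- 0.5
sims <- replicate(2000, count_runs(rbinom(n, 1, h), r))
c(theory = expected_runs(n, r, h), simulated = mean(sims))
# theory is 15.625; the simulated mean should land close to it
```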
Re: [R] ordering problem
Try

AA <- apply(A, 1, function(x) paste(x, collapse = ""))

and work with AA. --- On Tue, 30/9/08, Jose Luis Aznarte M. [EMAIL PROTECTED] wrote: From: Jose Luis Aznarte M. [EMAIL PROTECTED] Subject: [R] ordering problem To: [EMAIL PROTECTED] Received: Tuesday, 30 September, 2008, 8:43 PM Hi there! I need some assistance here with vector orderings. I have a set of q vectors of length p, grouped by rows in a matrix A, q·p, that I need to order lexicographically (http://en.wikipedia.org/wiki/Lexicographical_order). I also have another matrix B, p·r, and a vector c, that should be ordered according to the order of A. So far, I was doing

ordering <- apply(A, 2, order)[,1]
A <- A[ordering,]
B <- B[ordering,]
c <- c[ordering]

But now I realize that this way I'm ordering by considering only the first dimension of the vectors in A, i.e., not considering the case where there are ties amongst this first dimension. Does anyone have a clue about properly applying the lexicographical ordering? Thanks in advance! -- -- Jose Luis Aznarte M. http://decsai.ugr.es/~jlaznarte Department of Computer Science and Artificial Intelligence Universidad de Granada Tel. +34 - 958 - 24 04 67 GRANADA (Spain) Fax: +34 - 958 - 24 00 79 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
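One caveat with pasting numeric columns together without a separator: digits can merge across columns (e.g. the rows c(1, 23) and c(12, 3) both paste to "123"). A more robust sketch passes the columns of A to order directly, which breaks ties on the first column by the second, and so on:

```r
A <- matrix(c(2, 1, 1,
              1, 3, 2,
              1, 2, 9), nrow = 3, byrow = TRUE)

# lexicographic row order: ties in column 1 are resolved by column 2, etc.
ordering <- do.call(order, as.data.frame(A))
A[ordering, ]
#      [,1] [,2] [,3]
# [1,]    1    2    9
# [2,]    1    3    2
# [3,]    2    1    1
```

The same ordering vector then reorders B and c exactly as in the original code.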
Re: [R] design question on piping multiple data sets from 1 file into R
I think that you can use read.csv with nrows and skip arguments (see ?read.table). --- On Mon, 22/9/08, DS [EMAIL PROTECTED] wrote: From: DS [EMAIL PROTECTED] Subject: [R] design question on piping multiple data sets from 1 file into R To: r-help@r-project.org Received: Monday, 22 September, 2008, 8:06 AM Hi, I have some queries that I use to get time series information for 8 separate queries which deal with a different set of time series each. I take my queries, run them and save the output as a csv file, and then format the data into graphs in Excel. I wanted to know if there is an elegant and clean way to read in 1 csv file but to read the separate matrices on different rows into separate R data objects. If this is easy then I can read the 8 datasets in the csv file into 8 R objects and pipe them to time series objects for graphs. thanks Dhruv [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
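A sketch (assuming the starting row and length of each block in the file are known): skip jumps over the earlier block(s), and nrows stops before the next one.

```r
# hypothetical file: two 3-row blocks stacked in one csv, no headers
f <- tempfile(fileext = ".csv")
writeLines(c("1,2", "3,4", "5,6",
             "7,8", "9,10", "11,12"), f)

block1 <- read.csv(f, header = FALSE, nrows = 3)
block2 <- read.csv(f, header = FALSE, skip = 3, nrows = 3)
block2
#   V1 V2
# 1  7  8
# 2  9 10
# 3 11 12
```

With 8 blocks, a short loop over a table of (skip, nrows) pairs yields a list of 8 data frames.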
Re: [R] sort a data matrix by all the values and keep the names
One possibility is:

x <- data.frame(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2))
names <- t(matrix(rep(names(x), times = nrow(x)), nrow = ncol(x)))
m <- as.matrix(x)
ind <- order(m)
df <- data.frame(name = names[ind], value = m[ind])
df
  name value
1   x1     1
2   x3     2
3   x2     4
4   x2     6
5   x1     7
6   x3     8

--- On Tue, 23/9/08, zhihuali [EMAIL PROTECTED] wrote: From: zhihuali [EMAIL PROTECTED] Subject: [R] sort a data matrix by all the values and keep the names To: [EMAIL PROTECTED] Received: Tuesday, 23 September, 2008, 9:54 AM Dear all, If I have a data frame x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2)):

x1 x2 x3
 1  4  8
 7  6  2

I want to sort the whole data and get this:

x1 1
x3 2
x2 4
x2 6
x1 7
x3 8

If I do sort(x), R reports: Error in order(list(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2)), decreasing = FALSE) : unimplemented type 'list' in 'orderVector1'. The only way I can sort all the data is by converting it to a matrix: sort(as.matrix(x)) [1] 1 2 4 6 7 8 But now I lost all the names attributes. Is it possible to sort a data frame and keep all the names? Thanks! Zhihua Li [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] perl expression question
Hi Mark,

stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
gsub(".*/([^/]+)$", "\\1", stock)
[1] "BLC.NYSE"

--- On Tue, 23/9/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: From: [EMAIL PROTECTED] [EMAIL PROTECTED] Subject: [R] perl expression question To: r-help@r-project.org Received: Tuesday, 23 September, 2008, 10:29 AM If I have the string below, does someone know a regular expression to just get the BLC.NYSE? I bought the O'Reilly book and read it when I can, and I study the solutions on the list, but I'm still not self-sufficient with these things. Thanks.

stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
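Since the pattern just strips everything up to the last slash, base R's basename() gives the same result without any regular expression:

```r
stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
basename(stock)  # last path component
# [1] "BLC.NYSE"
```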
Re: [R] help on sampling from the truncated normal/gamma distribution on the far end (probability is very low)
Hi Sonia, If I did not make a mistake, the conditional distribution of X given that X > 0 is very close to an exponential distribution with parameter lambda = 40, so you can sample from this distribution. --- On Mon, 15/9/08, Daniel Davis [EMAIL PROTECTED] wrote: From: Daniel Davis [EMAIL PROTECTED] Subject: [R] help on sampling from the truncated normal/gamma distribution on the far end (probability is very low) To: r-help@r-project.org Received: Monday, 15 September, 2008, 2:28 PM Hi, guys, I am trying to sample from a truncated normal/gamma distribution. But only the far end of the distribution (where the probability is very low) is left. e.g. mu = -4; sigma = 0.1; The distribution is Normal(mu, sigma^2) truncated to [0, +Inf]. How can I get a sample? I tried to use the inverse CDF method, but got Inf as answers. Please help me out. Also, please help me with the similar situation for the gamma dist'n. Thanks, Sonia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help on sampling from the truncated normal/gamma distribution on the far end (probability is very low)
Well, I made a mistake - your lambda should be 400 and not 40!!! --- On Thu, 18/9/08, Moshe Olshansky [EMAIL PROTECTED] wrote: From: Moshe Olshansky [EMAIL PROTECTED] Subject: Re: [R] help on sampling from the truncated normal/gamma distribution on the far end (probability is very low) To: r-help@r-project.org, Daniel Davis [EMAIL PROTECTED] Received: Thursday, 18 September, 2008, 5:00 PM Hi Sonia, If I did not make a mistake, the conditional distribution of X given that X > 0 is very close to an exponential distribution with parameter lambda = 40, so you can sample from this distribution. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
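A sketch of the approximation with the corrected rate (assuming mu = -4, sigma = 0.1 as in the question): for small x > 0 the truncated density exp(-(x - mu)^2 / (2*sigma^2)) is proportional to exp(-x*(0 - mu)/sigma^2) up to a term of order x^2, i.e. approximately exponential with rate (0 - mu)/sigma^2 = 4/0.01 = 400.

```r
mu <- -4; sigma <- 0.1

# rate of the approximating exponential for the left-truncation point 0
rate <- (0 - mu) / sigma^2   # 400

set.seed(42)
x <- rexp(1e5, rate = rate)  # approximate draws from N(mu, sigma^2) | X > 0
mean(x)                      # should be close to 1/rate = 0.0025
```

This avoids the inverse-CDF route entirely, which fails here because pnorm(0, mu, sigma) underflows to 1 at 40 standard deviations.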
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
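The corrected rate can be turned into an exact sampler, not just an approximation: the exponential with rate lambda = -mu/sigma^2 = 400 is an envelope for the conditional density on [0, Inf), and the rejection probability simplifies algebraically to exp(-y^2/(2*sigma^2)). A sketch (the function name is mine, not from the thread):

```r
# Exact sampler for X ~ N(mu, sigma^2) conditioned on X > 0, using an
# exponential envelope with rate lambda = -mu/sigma^2 (400 here).
# Acceptance probability exp(-y^2/(2*sigma^2)) is essentially 1 for the
# tiny y values the proposal produces, so almost nothing is rejected.
rtrunc_norm_tail <- function(n, mu = -4, sigma = 0.1) {
  lambda <- -mu / sigma^2
  out <- numeric(0)
  while (length(out) < n) {
    y <- rexp(n, rate = lambda)                  # proposal on [0, Inf)
    keep <- runif(n) < exp(-y^2 / (2 * sigma^2)) # exact rejection step
    out <- c(out, y[keep])
  }
  out[1:n]
}
set.seed(1)
x <- rtrunc_norm_tail(10000)
```

Since lambda = 400, the draws concentrate near 1/400 = 0.0025, which is why the plain exponential approximation suggested in the reply works so well here.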
Re: [R] inserting values for null
Hi Ramya, Assuming that the problem is well defined (i.e. the values in column 1 of the data frames are unique and every value in D.F.sub.2[,1] appears also in D.F1[,1]) you can do the following: ind <- match(D.F.sub.2[,1], D.F1[,1]) D.F1[ind,] <- D.F.sub.2 --- On Thu, 18/9/08, Rajasekaramya [EMAIL PROTECTED] wrote: From: Rajasekaramya [EMAIL PROTECTED] Subject: [R] inserting values for null To: r-help@r-project.org Received: Thursday, 18 September, 2008, 2:13 AM I have a dataframe D.F1; dim(D.F1) is 14351 9. This dataframe has values, and for some 1000 rows it holds NULL values. I have found the missing values for about 500 and have those in another dataframe D.F.sub.2; dim(D.F.sub.2) is 500 9. As this dataframe is a subset of D.F1, column 1 in D.F.sub.2 is a subset of column 1 in D.F1. I have to insert the values into the other fields of D.F1 where column 1 in the main table and the sub table matches. In short, I have to insert a dataframe (D.F.sub.2) into the main D.F1 using column 1, which holds matching values. Kindly help me. Ramya -- View this message in context: http://www.nabble.com/inserting-values-for-null-tp19535742p19535742.html Sent from the R help mailing list archive at Nabble.com.
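A toy illustration of the match()-based row replacement suggested above (the data and column names are made up for the example):

```r
# Small stand-ins for D.F1 and D.F.sub.2; "id" plays the role of column 1.
D.F1 <- data.frame(id = c("a", "b", "c", "d"), val = c(1, NA, NA, 4))
D.F.sub.2 <- data.frame(id = c("b", "c"), val = c(20, 30))

ind <- match(D.F.sub.2[, 1], D.F1[, 1])  # rows of D.F1 to overwrite
D.F1[ind, ] <- D.F.sub.2                 # fill in the previously missing rows
```

After the assignment, rows "b" and "c" of D.F1 carry the values from D.F.sub.2 while the other rows are untouched.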
Re: [R] database table merging tips with R
One possibility is as follows: If r$userid is your array of (2000) ID's then s <- paste(r$userid, sep=",") s <- paste("select t.userid, x, y, z from largetable t where t.userid in (", s, ")", sep="") and finally d <- sqlQuery(connection, s) Regards, Moshe. --- On Fri, 12/9/08, Avram Aelony [EMAIL PROTECTED] wrote: From: Avram Aelony [EMAIL PROTECTED] Subject: [R] database table merging tips with R To: [EMAIL PROTECTED] Received: Friday, 12 September, 2008, 4:33 AM Dear R list, What is the best way to efficiently marry an R dataset with a very large (Oracle) database table? The goal is to only return Oracle table rows that match IDs present in the R dataset. I have an R data frame with 2000 user IDs analogous to: r = data.frame(userid=round(runif(2000)*10,0)) ...and I need to pull data from an Oracle table only for these 2000 IDs. The Oracle table is quite large. Additionally, the sql query may need to join to other tables to bring in ancillary fields. I currently connect to Oracle via odbc: library(RODBC) connection <- odbcConnect("...", uid="...", pwd="...") d = sqlQuery(connection, "select userid, x, y, z from largetable where timestamp > sysdate - 7") ...allowing me to pull data from the database table into the R object d and then use the R merge function. The problem however is that if d is too large it may fail due to memory limitations or be inefficient. I would like to push the merge portion to the database, and it would be very convenient if it were possible to request that the query look to the R object for the IDs to which it should restrict the output. Is there a way to do this? Something like the following fictional code: d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where r$userid = t.userid") Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, how? This would be convenient and help me avoid needing to create a temporary table to store the R data, join via sql, then return the data back to R. I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 .
Thanks for your comments, ideas, recommendations. -Avram
Re: [R] database table merging tips with R
Just a small correction: start with s <- paste(r$userid, collapse=",") and not s <- paste(r$userid, sep=",") (with sep= a single vector argument is returned unchanged; collapse= is what joins the elements into one comma-separated string). --- On Fri, 12/9/08, Moshe Olshansky [EMAIL PROTECTED] wrote: [earlier messages quoted in full - snipped]
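Putting the corrected pieces together, the query string can be built like this (the IDs here are made up, and the connection lines are left commented since they need a real DSN; note also that Oracle caps IN lists at 1000 literals, so very long ID vectors need to be chunked):

```r
# Build an IN-list query from an R vector of IDs (illustrative values).
ids <- c(101, 102, 103)                 # stands in for r$userid
s <- paste(ids, collapse = ",")         # "101,102,103"
sql <- paste("select t.userid, x, y, z from largetable t",
             "where t.userid in (", s, ")")

# In practice (requires RODBC and a configured DSN):
# library(RODBC)
# connection <- odbcConnect("mydsn", uid = "...", pwd = "...")
# d <- sqlQuery(connection, sql)
```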
Re: [R] densities with overlapping area of 0.35
Let X be normally distributed with mean 0 and let f be its density. The density of X+a is f shifted right by a. Since the density is symmetric around its mean, it follows that the area of overlap of the two densities is exactly P(X > a) + P(X < -a). So if X ~ N(0,1), we want P(X > a) + P(X < -a) = 2*P(X < -a) = 0.35, so P(X < -a) = 0.175, which yields -a = qnorm(0.175) = -0.9345893, so a = 0.9345893. --- On Tue, 9/9/08, Lavan [EMAIL PROTECTED] wrote: From: Lavan [EMAIL PROTECTED] Subject: [R] densities with overlapping area of 0.35 To: r-help@r-project.org Received: Tuesday, 9 September, 2008, 12:11 PM Hi, I would like to generate two normal densities such that the overlapping area between them is 0.35. Is there any code/package available in R to do that? Regards, Lavan -- View this message in context: http://www.nabble.com/densities-with-overlapping-area-of-0.35-tp19384741p19384741.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] densities with overlapping area of 0.35
Just a correction: if we take X+2a then everything is OK (the curves intersect at a), so a = 0.9345893 is correct, but one must take X ~ N(0,1) and Y ~ N(2*a,1). --- On Tue, 9/9/08, Moshe Olshansky [EMAIL PROTECTED] wrote: [earlier messages quoted in full - snipped]
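The corrected answer can be verified numerically: the overlap of two density curves is the integral of their pointwise minimum, which for N(0,1) and N(2a,1) should come out to 0.35.

```r
# Numerical check of the overlap-area derivation above (a sketch).
a <- qnorm(0.825)                    # 0.9345893...; so that P(X < -a) = 0.175
f1 <- function(x) dnorm(x, mean = 0)
f2 <- function(x) dnorm(x, mean = 2 * a)
# Overlap area = integral of min(f1, f2); the curves cross at x = a.
overlap <- integrate(function(x) pmin(f1(x), f2(x)), -Inf, Inf)$value
```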
Re: [R] intercept of 3D line? (Orthogonal regression)
I do not see why you cannot use regression even in this case. To make things simple, suppose that the exact model is: y = a + b*x, i.e. y1 = a + b*x1 ... yn = a + b*xn. But you cannot observe y and x. Instead you observe ui = xi + ei (i=1,...,n) and vi = yi + di (i=1,...,n). Now you have vi = yi + di = a + b*xi + di = a + b*(ui - ei) + di = a + b*ui + (di - b*ei), and under regular assumptions about the ei's and di's we get a standard regression problem (note that b is unknown to you but is constant). --- On Tue, 2/9/08, William Simpson [EMAIL PROTECTED] wrote: From: William Simpson [EMAIL PROTECTED] Subject: [R] intercept of 3D line? (Orthogonal regression) To: r-help@r-project.org Received: Tuesday, 2 September, 2008, 4:53 AM I posted recently about fitting 3D data x, y, z where all have error attached. I want to predict z from x and y; something like z = b0 + b1*x + b2*y. But multiple regression is not suitable because all of x, y, and z have errors. I have plotted a 3D scatterplot of some data using rgl. I see that the data form a cigar-shaped cloud. I think multiple regression is only suitable when the points fall on a plane (forgetting about the error in x and y). I now know the right way to find the best fitting plane to x,y,z data using princomp. But a new problem is how to get the best fitting *line*. I actually know how to do that too using princomp. But there is a mathematical problem: there is no way to specify a line in 3D space in the form z=f(x,y), in other words with an intercept and slopes. Instead, one way to deal with the problem is to use a parametric version of the line: you use an arbitrary starting point x0, y0, z0 and the direction vector of your line (I know how to get the direction vector). BUT how do I get the intercept??? At this point my lines just go through the origin. Do I just use $center from the princomp output, modified in some way? Thanks for any help!
Cheers Bill
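The substitution in the reply above can be tried in a small simulation (all numbers are made up). One caveat worth noting: when the measurement error in u is small relative to the spread of x, ordinary least squares recovers a and b well; large errors in u attenuate the estimated slope toward zero, which is exactly when errors-in-variables methods matter.

```r
# Minimal simulation of the observed-with-error setup described above.
set.seed(42)
n <- 200
a <- 1; b <- 2                      # true intercept and slope
x <- runif(n, 0, 10)
y <- a + b * x                      # exact linear relation
u <- x + rnorm(n, sd = 0.1)         # observed predictor, with error e
v <- y + rnorm(n, sd = 0.1)         # observed response, with error d
fit <- lm(v ~ u)                    # v = a + b*u + (d - b*e)
coef(fit)
```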
Re: [R] Integrate a 1-variable function with 1 parameter (Jose L. Romero)
This can be done analytically: after a change of variable (replacing 2*t by t) and some scaling we need to compute f(x) = the integral from 0 to 20 of t^x*exp(-t)dt/factorial(x). f(0) = the integral from 0 to 20 of exp(-t)dt = 1 - exp(-20), and integration by parts yields (for x=1,2,3,...) f(x) = -exp(-20)*20^x/factorial(x) + f(x-1), so that f(x) = 1 - exp(-20)*sum(20^k/factorial(k)), where the sum is over k=0,1,...,x. If I did not make a mistake, your original quantity should be f(x)/20. --- On Thu, 28/8/08, jose romero [EMAIL PROTECTED] wrote: From: jose romero [EMAIL PROTECTED] Subject: [R] Integrate a 1-variable function with 1 parameter (Jose L. Romero) To: r-help@r-project.org Received: Thursday, 28 August, 2008, 12:23 AM Hey fellas: I would like to integrate the following function: integrand <- function (x,t) { exp(-2*t)*(2*t)^x/(10*factorial(x)) } with respect to the t variable, from 0 to 10. The variable x here works as a parameter: I would like to integrate the said function for each value of x in 0,1,...,44. I have tried Vectorize to no avail. Thanks in advance, jose romero
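The closed form can be checked against numerical integration; note also that f(x) equals pgamma(20, x+1), since the integral of t^x*exp(-t)/x! from 0 to 20 is the CDF of a Gamma(x+1, 1) variable at 20 (equivalently, 1 - ppois(x, 20)).

```r
# Verify the analytic answer f(x)/20 against direct integration (sketch).
f_analytic <- function(x) 1 - exp(-20) * sum(20^(0:x) / factorial(0:x))

integrand <- function(t, x) exp(-2 * t) * (2 * t)^x / (10 * factorial(x))
f_numeric <- function(x) integrate(integrand, 0, 10, x = x)$value

xs <- 0:44
max_err <- max(abs(sapply(xs, f_analytic) / 20 - sapply(xs, f_numeric)))
# Cross-check against the Gamma-CDF identity:
gamma_err <- max(abs(sapply(xs, f_analytic) - pgamma(20, xs + 1)))
```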
Re: [R] Problem with Integrate for NEF-HS distribution
If you look at your sech(pi*x/2) you can write it as sech(pi*x/2) = 2*exp(pi*x/2)/(1 + exp(pi*x)). For x < -15, exp(pi*x) < 10^-20, so on this interval you can replace sech(pi*x/2) by 2*exp(pi*x/2); the integral from -Inf to -15 (or even -10, depending on your accuracy requirements) can then be computed analytically, and you are left with the integral from -15 (or -10) to your upper bound, which should be fine for numerical integration. --- On Wed, 27/8/08, xia wang [EMAIL PROTECTED] wrote: From: xia wang [EMAIL PROTECTED] Subject: RE: [R] Problem with Integrate for NEF-HS distribution To: [EMAIL PROTECTED], [EMAIL PROTECTED] Received: Wednesday, 27 August, 2008, 12:26 AM Thanks. I revised the code but it doesn't make a difference. The cause of the error is just as stated in ?integrate: If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong. I have searched the R archive. It may not be feasible to solve the problem by rectangle approximation as some postings suggested, because the integration is in fact within MCMC sampling, as follows. The lower bound is always -Inf. The upper bound and the value of theta change for each sampling step in the MCMC. I tried to multiply the integrand by a large number (like 10^16) and change rel.tol. It does help for some range of theta but not all. Xia like.fun <- function(beta, theta) { integrand <- function(X, theta) {0.5*cos(theta)*exp(X*theta)*sech(pi*X/2)} upper <- x %*% beta prob <- matrix(NA, nrow(covariate), 1) for (i in 1:nrow(covariate)) { prob[i] <- integrate(integrand, lower=-Inf, upper=upper[i], theta=theta)$value } likelihood <- sum(log(dbinom(y, n, prob))) return(likelihood) } Date: Tue, 26 Aug 2008 00:49:02 -0700 From: [EMAIL PROTECTED] Subject: Re: [R] Problem with Integrate for NEF-HS distribution To: [EMAIL PROTECTED] Shouldn't you define integrand <- function(X, theta) ... and not integrand <- function(X) ...
--- On Tue, 26/8/08, xia wang [EMAIL PROTECTED] wrote: From: xia wang [EMAIL PROTECTED] Subject: [R] Problem with Integrate for NEF-HS distribution To: r-help@r-project.org Received: Tuesday, 26 August, 2008, 3:00 PM I need to calculate the cumulative probability for the Natural Exponential Family - Hyperbolic Secant distribution with a parameter theta between -pi/2 and pi/2. The integral should be between 0 and 1 as it is a probability. The function integrate works fine when the absolute value of theta is not too large, that is, when the NEF-HS distribution is not too skewed. However, once theta gets large in absolute value, such as -1 as shown in the example below, integrate keeps giving me an error about a non-finite function value when I put the lower bound as -Inf. I suspect that it is caused by the very heavy tail of the distribution. Is there any way that I can get around this and let integrate work for the skewed distribution? Or is there any other function for integration in the R package? Thanks a lot for your advice in advance! theta <- -1 sech <- function(X) 2/(exp(-X)+exp(X)) integrand <- function(X) {0.5*cos(theta)*exp(X*theta)*sech(pi*X/2)} integrate(integrand, -3, 1) 0.8134389 with absolute error 7e-09 integrate(integrand, -10, 1) 0.9810894 with absolute error 5.9e-06 integrate(integrand, -15, 1) 0.9840505 with absolute error 7e-09 integrate(integrand, -50, 1) 0.9842315 with absolute error 4.4e-05 integrate(integrand, -100, 1) 0.9842315 with absolute error 3.2e-05 integrate(integrand, -Inf, 1) Error in integrate(integrand, -Inf, 1) : non-finite function value Xia
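The suggested split can be coded directly: below -15 the integrand is (to within 10^-20) cos(theta)*exp((theta + pi/2)*x), whose integral from -Inf to -15 is cos(theta)*exp(-15*(theta + pi/2))/(theta + pi/2), and the rest is left to integrate. Using theta = -1 as in the example:

```r
# Analytic tail below -15 plus numerical integration on [-15, 1] (a sketch).
theta <- -1
sech <- function(x) 2 / (exp(-x) + exp(x))
integrand <- function(x) 0.5 * cos(theta) * exp(x * theta) * sech(pi * x / 2)

# For x < -15, sech(pi*x/2) ~ 2*exp(pi*x/2), so the tail integral is
# cos(theta) * exp((theta + pi/2) * (-15)) / (theta + pi/2).
c0 <- theta + pi / 2                       # > 0 for theta > -pi/2, so it converges
tail_part <- cos(theta) * exp(c0 * (-15)) / c0
main_part <- integrate(integrand, -15, 1)$value
total <- tail_part + main_part             # matches integrate(integrand, -100, 1)
```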
Re: [R] Finding a probability
Your commands are correct, and the interpretation is that the probability that a normal random variable with mean 1454.190 and standard deviation 162.6301 takes a value of 417 or less is 8.99413e-11. --- On Wed, 27/8/08, rr400 [EMAIL PROTECTED] wrote: From: rr400 [EMAIL PROTECTED] Subject: [R] Finding a probability To: r-help@r-project.org Received: Wednesday, 27 August, 2008, 11:50 AM Hi, I am trying to determine a certain probability and I am unsure if the commands I have entered into R are correct. I am trying to determine how unusual it would be to find something with the value 417 in a set of numbers with mean 1454.190 and standard deviation 162.6301. The command I entered was: pnorm(417, 1454.190, 162.6301) which returned: [1] 8.99413e-11 I am unsure how to interpret this number as it is, and hence whether I have entered the appropriate commands for what I am trying to find out. If the commands are correct, how can I simplify the answer to understand the probability? Thanks in advance, R. -- View this message in context: http://www.nabble.com/Finding-a-probability-tp19173491p19173491.html Sent from the R help mailing list archive at Nabble.com.
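The same number can be read as a z-score, which often makes such tail probabilities easier to grasp: 417 sits about 6.4 standard deviations below the mean.

```r
# The tail probability via an explicit z-score (a quick sanity check).
z <- (417 - 1454.190) / 162.6301   # about -6.38 standard deviations
p <- pnorm(z)                      # same as pnorm(417, 1454.190, 162.6301)
```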
Re: [R] Igraph library: How to calculate APSP (shortest path matrix) matrix for a subset list of nodes.
I was too optimistic - the complexity is O(E*log(V)) where V is the number of nodes, but since log(25000) < 20 this is still reasonable. --- On Mon, 25/8/08, Moshe Olshansky [EMAIL PROTECTED] wrote: [earlier messages quoted in full - snipped]
Re: [R] paste: multiple collapse arguments?
One possibility is: y <- rep(" ", 6) y[c(2,4)] <- "\n" y[6] <- "" res <- paste(paste(x, y, sep=""), collapse="") --- On Tue, 26/8/08, remko duursma [EMAIL PROTECTED] wrote: From: remko duursma [EMAIL PROTECTED] Subject: [R] paste: multiple collapse arguments? To: r-help@r-project.org Received: Tuesday, 26 August, 2008, 9:36 AM Dear R-helpers, I have a numeric vector, like: x <- c(1,2,3,4,5,6) I make this into a string for output to a text file, separated by "\n": paste(x, collapse="\n") Is there a way to alternate the collapse argument? So between the first two elements of x, I want to separate by " ", then by "\n", and so forth. The result should then look like: "1 2\n3 4\n5 6" (This way I get 2 elements of x on each line using writeLines, instead of one or all). I could do this in some ugly loop, but surely there is a better way? thanks, Remko
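The trick in full, runnable form: build a vector of per-element trailing separators, paste it element-wise onto x, then collapse everything into one string.

```r
# Alternating separators: a space within pairs, a newline between pairs.
x <- c(1, 2, 3, 4, 5, 6)
y <- rep(" ", 6)       # separator placed after each element
y[c(2, 4)] <- "\n"     # newline after the 2nd and 4th elements
y[6] <- ""             # nothing after the last element
res <- paste(paste(x, y, sep = ""), collapse = "")
```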
Re: [R] deconvolution: Using the output and a IRF to get the input
Hi Wolf, Without noise you could use the FFT: the FFT of a convolution is the product of the individual FFTs, so dividing gives you the FFT of your input signal, and using the inverse FFT you get the signal itself. When there is noise you must experiment. You may want to filter the response before doing the FFT. What do you know about the noise? Regards, Moshe. --- On Mon, 25/8/08, wolf zinke [EMAIL PROTECTED] wrote: From: wolf zinke [EMAIL PROTECTED] Subject: [R] deconvolution: Using the output and a IRF to get the input To: r-help@r-project.org Received: Monday, 25 August, 2008, 8:22 AM Hi, Maybe someone could give me some pointers for my problem. So far I have not found a good solution; maybe it is just ill posed? I have a signal that is the result of an input signal convolved with a given impulse response function (IRF), plus noise. I want to use this signal and the IRF to determine the underlying input signal. In my naivety I thought this just might be a deconvolution problem. But here I found only routines that use the input signal and the output signal to get the IRF. Is it possible to derive the input signal when the output and the IRF are given? If so, how could I do this with R? Thanks a lot for any hints, wolf
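A sketch of the noise-free case with made-up signals (circular convolution on a common length n): divide in the frequency domain, then invert. With noise this division amplifies frequencies where the IRF's spectrum is small, which is why some filtering or regularization is usually needed in practice.

```r
# FFT-based deconvolution, noise-free case (illustrative signals).
n <- 64
input <- c(rep(0, 20), rep(1, 10), rep(0, 34))        # "unknown" input signal
irf <- dexp(0:63, rate = 0.5); irf <- irf / sum(irf)  # impulse response

# Forward model: circular convolution via the FFT.
output <- Re(fft(fft(input) * fft(irf), inverse = TRUE)) / n
# Deconvolution: divide spectra, then inverse-transform.
recovered <- Re(fft(fft(output) / fft(irf), inverse = TRUE)) / n
```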
Re: [R] Igraph library: How to calculate APSP (shortest path matrix) matrix for a subset list of nodes.
As far as I know/remember, if your graph is connected and contains E edges then you can find the shortest distance from any particular vertex to all other vertices in O(E) operations. You can repeat this procedure starting from every node (out of the 500). If you have 100,000 edges this will require about 100,000*500 = 50,000,000 operations (times a small factor), which is very reasonable. --- On Mon, 25/8/08, dinesh kumar [EMAIL PROTECTED] wrote: From: dinesh kumar [EMAIL PROTECTED] Subject: [R] Igraph library: How to calculate APSP (shortest path matrix) matrix for a subset list of nodes. To: r-help@r-project.org Received: Monday, 25 August, 2008, 8:00 AM Dear R Users, I have a network of 25000 total nodes and a list of 500 nodes which is a subset of all nodes. Now I want to calculate the APSP (all pair shortest path) matrix only for these 500 nodes. I would appreciate any help. Thanks in advance Dinesh -- Dinesh Kumar Barupal Research Associate Metabolomics Fiehn Lab UCD Genome Center 451 East Health Science Drive GBSF Building University of California DAVIS 95616 http://fiehnlab.ucdavis.edu/staff/kumar
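In the igraph package this subset-to-subset computation is available directly via the v and to arguments, so there is no need to compute the full 25000 x 25000 matrix. A scaled-down sketch (the graph and vertex counts here are stand-ins; requires the igraph package):

```r
# Shortest-path matrix restricted to a subset of vertices with igraph.
library(igraph)
set.seed(1)
g <- sample_gnm(1000, 5000)          # stand-in for the 25000-node network
v_sub <- sample(vcount(g), 50)       # stand-in for the 500 nodes of interest
D <- distances(g, v = v_sub, to = v_sub)  # 50 x 50 shortest-path matrix
```

Entries are Inf for vertex pairs in different components.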
Re: [R] Help Regarding 'integrate'
The phenomenon is most likely caused by numerical errors. I do not know how 'integrate' works, but numerical integration over a very long interval does not look like a good idea to me. I would do the following: f1 <- function(x){ return(dchisq(x,9,77)*((13.5/x)^5)*exp(-13.5/x)) } f2 <- function(y){ return(dchisq(1/y,9,77)*((13.5*y)^5)*exp(-13.5*y)/y^2) } # substitute y = 1/x I1 <- integrate(f1, 0, 1, abs.tol=1.0e-20, rel.tol=.Machine$double.eps^0.25) I2 <- integrate(f2, 0, 1, abs.tol=1.0e-20, rel.tol=.Machine$double.eps^0.25) --- On Thu, 21/8/08, A [EMAIL PROTECTED] wrote: From: A [EMAIL PROTECTED] Subject: [R] Help Regarding 'integrate' To: r-help@r-project.org Received: Thursday, 21 August, 2008, 4:44 PM I have an R function defined as: f <- function(x){ return(dchisq(x,9,77)*((13.5/x)^5)*exp(-13.5/x)) } Numerically integrating this function, I observed a couple of things: A) Integrating the function f over the entire positive real line gives an error: integrate(f, 0, Inf) Error in integrate(f, 0, Inf) : the integral is probably divergent B) Increasing the interval of integration actually decreases the value of the integral: integrate(f, 0, 10^5) 9.813968e-06 with absolute error 1.9e-05 integrate(f, 0, 10^6) 4.62233e-319 with absolute error 4.6e-319 Since the function f is uniformly positive, B) cannot occur. Also, the theory tells me that the infinite integral actually exists and is finite, so A) cannot occur. That means there are certain problems with the usage of function 'integrate' which I do not understand. The help document tells me that 'integrate' uses quadrature approximation to evaluate integrals numerically. Since I do not come from the numerical methods community, I would not know the pros and cons of various methods of quadrature approximation. One naive way that I thought of evaluating the above integral was by first locating the maximum of the function (maybe by using 'optimize' in R) and then applying Simpson's rule to an interval around the maximum. However, I am sure that the people behind the R project and other users have much better ideas, and I am sure the 'correct' method has already been implemented in R. Therefore, I would appreciate it if someone can help me find it. Thanks
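The split suggested in the reply can be run end to end: the substitution y = 1/x maps the integral over (1, Inf) onto (0, 1), so both pieces are evaluated on a short, well-behaved interval and then added.

```r
# Runnable version of the suggested split (a sketch).
f1 <- function(x) dchisq(x, 9, 77) * ((13.5 / x)^5) * exp(-13.5 / x)
# y = 1/x maps (1, Inf) to (0, 1); the /y^2 factor is the Jacobian.
f2 <- function(y) dchisq(1 / y, 9, 77) * ((13.5 * y)^5) * exp(-13.5 * y) / y^2

I1 <- integrate(f1, 0, 1, abs.tol = 1e-20)$value  # integral over (0, 1)
I2 <- integrate(f2, 0, 1, abs.tol = 1e-20)$value  # integral over (1, Inf)
total <- I1 + I2
```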
Re: [R] Null and Alternate hypothesis for Significance test
Hi Nitin, I believe that you can not have null hypothesis to be that A and B come from different distributions. Asymptotically (as both sample sizes go to infinity) KS test has power 1, i.e. it will reject H0:A=B for any case where A and B have different distributions. To work with a finite sample you must be more specific, i.e. your null hypothesis must be not that A and B just have different distributions but must be more specific, i.e that their means are different by at least something or that certain distance between their distributions is bigger than something, etc. and such hypotheses can be tested (and rejected). --- On Fri, 22/8/08, Nitin Agrawal [EMAIL PROTECTED] wrote: From: Nitin Agrawal [EMAIL PROTECTED] Subject: [R] Null and Alternate hypothesis for Significance test To: r-help@r-project.org Received: Friday, 22 August, 2008, 6:58 AM Hi, I had a question about specifying the Null hypothesis in a significance test. Advance apologies if this has already been asked previously or is a naive question. I have two samples A and B, and I want to test whether A and B come from the same distribution. The default Null hypothesis would be H0: A=B But since I am trying to prove that A and B indeed come from the same distribution, I think this is not the right choice for the null hypothesis (it should be one that is set up to be rejected) How do I specify a null hypothesis H0: A not equal to B for say a KS test. An example to do this in R would be greatly appreciated. On a related note: what is a good way to measure the difference between observed and expected PDFs? Is the D statistic of the KS test a good choice? Thanks! Nitin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
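Moshe's point about asymptotic power can be seen directly in R. A sketch (not from the original thread): with a small but real shift between the two distributions, ks.test() typically fails to reject at n = 100, yet rejects overwhelmingly at n = 100000.

```r
set.seed(1)
shift <- 0.05   # a tiny, but real, difference in means
p_small <- ks.test(rnorm(100,  0), rnorm(100,  shift))$p.value
p_large <- ks.test(rnorm(1e5, 0), rnorm(1e5, shift))$p.value
# p_small is typically large (the difference is invisible at n = 100),
# while p_large is essentially zero: with enough data the KS test
# rejects H0: A = B for *any* genuine difference
c(p_small, p_large)
```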
Re: [R] How I can read the binary file with different type?
Hi Miao, I can write a function which takes an integer and produces a float number whose binary representation equals to that of the integer, but this would be an awkward solution. So if nobody suggests anything better I will write such a function for you, but let's wait for a better solution. --- On Fri, 22/8/08, Bingxiang Miao [EMAIL PROTECTED] wrote: From: Bingxiang Miao [EMAIL PROTECTED] Subject: [R] How I can read the binary file with different type? To: r-help@r-project.org Received: Friday, 22 August, 2008, 11:51 AM Hi all, I have a binary file which have 8*100 bytes. The structure of the file is as follows: every eigth bytes represent 2 data:the first four bytes is the little-endian for integer, the next four bytes is the little-endian for floating32. The structure of the following bytes is as the same as the first eight bytes'. As the function readBin only read the binary file with one structure like this: readBin(my data file,what=integer,size=4,n=200) or readBin(my data file,what=numeric,size=4,n=200).But only one type of the data can be read properly. Can anyone suggest me how should I read this file? Thanks in advance. Miao [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
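A base-R alternative that avoids a bit-twiddling function: read the whole file as raw bytes, then hand the two interleaved 4-byte streams to readBin() separately (readBin accepts a raw vector as its first argument). A sketch; the function name and record count are mine:

```r
# Each 8-byte record = 4-byte little-endian integer + 4-byte little-endian float
read_interleaved <- function(path, n_records = 100) {
  raw_all <- readBin(path, what = "raw", n = 8 * n_records)
  first_half <- rep(rep(c(TRUE, FALSE), each = 4), n_records)  # bytes 1-4 of each record
  list(
    ints   = readBin(raw_all[first_half],  what = "integer",
                     size = 4, n = n_records, endian = "little"),
    floats = readBin(raw_all[!first_half], what = "numeric",
                     size = 4, n = n_records, endian = "little")
  )
}
```

No loops and no temporary conversions are needed; each stream is decoded in one vectorised call.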
Re: [R] Random sequence of days?
How about d[sample(length(d), 10)] --- On Wed, 20/8/08, Lauri Nikkinen [EMAIL PROTECTED] wrote: From: Lauri Nikkinen [EMAIL PROTECTED] Subject: [R] Random sequence of days? To: [EMAIL PROTECTED] Received: Wednesday, 20 August, 2008, 4:04 PM Dear list, I tried to find a solution for this problem from the archives but couldn't find any. I would like to sample a sequence of ten days from vector d d <- seq(as.Date("2007-02-12"), as.Date("2008-08-18"), by="days") so that the days follow each other (sample(d, 10) is not the appropriate solution). Any ideas? Best regards, Lauri __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
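Since the question asks for ten days that follow each other, one way is to sample only the starting index; a sketch:

```r
d <- seq(as.Date("2007-02-12"), as.Date("2008-08-18"), by = "days")
start <- sample(length(d) - 9, 1)  # random start, leaving room for ten days
run <- d[start + 0:9]              # ten consecutive days
run
```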
Re: [R] Conversion - lowercase to Uppercase letters
Use toupper or tolower (see ?toupper, ?tolower) --- On Wed, 20/8/08, suman Duvvuru [EMAIL PROTECTED] wrote: From: suman Duvvuru [EMAIL PROTECTED] Subject: [R] Conversion - lowercase to Uppercase letters To: r-help@r-project.org Received: Wednesday, 20 August, 2008, 2:19 PM I would like to know how to convert a string with characters to all uppercase or all lowercase? If anyone could let me know if there exists a function in R for the conversion, that will be very helpful. Regards, Suman [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
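For completeness, the two functions in action (plus casefold(), which wraps both):

```r
toupper("r-help")                # "R-HELP"
tolower("R-Help")                # "r-help"
casefold("mixed", upper = TRUE)  # "MIXED"
```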
Re: [R] matrix row product and cumulative product
Hi Jeff, If I understand correctly, the overhead of a loop is that at each iteration the command must be interpreted, and this time is independent of the number of rows N. So if N is small this overhead may be very significant but when N is large this should be very small compared to the time needed to multiply N pairs of numbers. --- On Mon, 18/8/08, Jeff Laake [EMAIL PROTECTED] wrote: From: Jeff Laake [EMAIL PROTECTED] Subject: [R] matrix row product and cumulative product To: r-help@r-project.org Received: Monday, 18 August, 2008, 12:49 PM I spent a lot of time searching and came up empty handed on the following query. Is there an equivalent to rowSums that does product or cumulative product and avoids use of apply or looping? I found a rowProd in a package but it was a convenience function for apply. As part of a likelihood calculation called from optim, I’m computing products and cumulative products of rows of matrices with far more rows than columns. I started with apply and after some thought realized that a loop of columns might be faster and it was substantially faster (see below). Because the likelihood function is called many times I’d like to speed it up even more if possible. Below is an example showing the cumprod.matrix and prod.matrix looping functions that I wrote and some timing comparisons to the use of apply for different column and row dimensions. At this point I’m better off with looping but I’d like to hear of any further suggestions. 
Thanks –jeff prod.matrix=function(x) + { + y=x[,1] + for(i in 2:dim(x)[2]) + y=y*x[,i] + return(y) + } cumprod.matrix=function(x) + { + y=matrix(1,nrow=dim(x)[1],ncol=dim(x)[2]) + y[,1]=x[,1] + for (i in 2:dim(x)[2]) + y[,i]=y[,i-1]*x[,i] + return(y) + } N=1000 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) user system elapsed 1.07 0.09 1.15 system.time(t(apply(xmat,1,cumprod))) user system elapsed 29.27 0.21 29.50 system.time(prod.matrix(xmat)) user system elapsed 0.29 0.00 0.30 system.time(apply(xmat,1,prod)) user system elapsed 30.69 0.00 30.72 xmat=matrix(runif(N),ncol=100) system.time(cumprod.matrix(xmat)) user system elapsed 1.05 0.13 1.18 system.time(t(apply(xmat,1,cumprod))) user system elapsed 3.55 0.14 3.70 system.time(prod.matrix(xmat)) user system elapsed 0.38 0.01 0.39 system.time(apply(xmat,1,prod)) user system elapsed 2.87 0.00 2.89 xmat=matrix(runif(N),ncol=1000) system.time(cumprod.matrix(xmat)) user system elapsed 1.30 0.18 1.46 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.77 0.27 2.05 system.time(prod.matrix(xmat)) user system elapsed 0.46 0.00 0.47 system.time(apply(xmat,1,prod)) user system elapsed 0.7 0.0 0.7 xmat=matrix(runif(N),ncol=1) system.time(cumprod.matrix(xmat)) user system elapsed 1.28 0.00 1.29 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.19 0.08 1.26 system.time(prod.matrix(xmat)) user system elapsed 0.40 0.00 0.41 system.time(apply(xmat,1,prod)) user system elapsed 0.57 0.00 0.56 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) user system elapsed 3.18 0.00 3.19 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.42 0.21 1.64 system.time(prod.matrix(xmat)) user system elapsed 1.05 0.00 1.05 system.time(apply(xmat,1,prod)) user system elapsed 0.82 0.00 0.81 R.Version() $platform [1] i386-pc-mingw32 . . . 
$version.string [1] R version 2.7.1 (2008-06-23) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
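For strictly positive matrices there is also a fully vectorised route through logarithms, which avoids both apply() and the column loop; a sketch, not from the thread. (If I recall correctly, the matrixStats package also offers compiled rowProds()/rowCumprods(), which may be worth checking.)

```r
set.seed(1)
xmat <- matrix(runif(5000), ncol = 10)  # strictly positive entries
# row products as exp of row sums of logs -- one vectorised pass
rp <- exp(rowSums(log(xmat)))
stopifnot(isTRUE(all.equal(rp, apply(xmat, 1, prod))))
```

The log trick fails for zero or negative entries, so Jeff's column loop remains the safer general-purpose choice.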
Re: [R] A doubt about lm and the meaning of its summary
Hi Alberto, Please disregard my previous note - I probably had a black-out!!! --- On Tue, 19/8/08, Alberto Monteiro [EMAIL PROTECTED] wrote: From: Alberto Monteiro [EMAIL PROTECTED] Subject: [R] A doubt about lm and the meaning of its summary To: r-help@r-project.org Received: Tuesday, 19 August, 2008, 4:31 AM I have a conceptual problem (sort of). Maybe there's a simple solution, maybe not. First, let me explain the test case that goes ok. Let x be a (fixed) n-dimensional vector. I simulate a lot of linear models, y = m x + c + error, then I do a lot of regressions. As expected, the estimated values of m (let's call them m.bar) are distributed according to a Student's t distribution. More precisely (test case, it takes a few minutes to run): # # start with fixed numbers # m - sample(c(-1, -0.1, 0.1, 1), 1) c - sample(c(-1, -0.1, 0, 0.1, 1), 1) sigma - sample(c(0.1, 0.2, 0.5, 1), 1) n - sample(c(4,5,10,1000), 1) x - 1:n # it's possible to use other x NN - sample(c(3000, 4000, 5000), 1) m.bar - m.std.error - 0 # these vectors will hold the simulation output # # Now let's repeat NN times: # simulate y # regress y ~ x # store m.bar and its error # for (i in 1:NN) { y - m * x + c + sigma * rnorm(n) reg - lm(y ~ x) m.bar[i] - summary(reg)$coefficients['x', 'Estimate'] m.std.error[i] - summary(reg)$coefficients['x', 'Std. Error'] } # # Finally, let's analyse it # The distribution of (m.bar - m) / m.std.error is # a Student's t with n - 2 degrees of freedom. # Is it? Let's test! # print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2))) Now, this goes ok, with ks.test returning a number uniformly distributed in the 0-1 interval. Next, I did a slightly different model. This model is a reversion model, where the simulated random variable lp goes along the equation: lp[t + 1] - (1 + m) * lp[t] + c + error I tried to use the same method as above, defining x = lp[1:n], y = lp[2:(n+1}] - lp[1:n], with equation y - m * x + c + error And now it breaks. 
Test case (it takes some minutes to run): # # start with fixed numbers # m - runif(1, -1, 0) # m must be something in the (-1,0) interval c - sample(c(-1, -0.1, 0, 0.1, 1), 1) sigma - sample(c(0.1, 0.2, 0.5, 1), 1) n - sample(c(4,5,10,1000), 1) NN - sample(c(3000, 4000, 5000), 1) m.bar - m.std.error - 0 # these vectors will hold the simulation output # # Now let's repeat NN times: # simulate y # regress y ~ x # store m.bar and its error # for (i in 1:NN) { lp - 0 lp[1] - 0 for (j in 1:n) { lp[j+1] = (1 + m) * lp[j] + c + sigma * rnorm(1) } x - lp[1:n] y - lp[-1] - x reg - lm(y ~ x) m.bar[i] - summary(reg)$coefficients['x', 'Estimate'] m.std.error[i] - summary(reg)$coefficients['x', 'Std. Error'] } # # Finally, let's analyse it # The distribution of (m.bar - m) / m.std.error should be # a Student's t with n - 2 degrees of freedom. # Is it? Let's test! # print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2))) ... and now it's not. What's wrong with this? I suspect that the model y ~ x does only give meaningful values when x is deterministic; in this case x is stochastic. Is there any correct way to estimate this model giving meaningful outputs? Alberto Monteiro __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Vectorization of duration of the game in the gambler ruin's problem
Hi Jose, If you are only interested in the expected duration, the problem can be solved analytically - no simulation is needed. Let P be the probability to reach total.capital (and then 1-P is the probability to lose all the money) when starting with initial.capital. This probability P is well known (I do not remember it now but I can derive the formula if you need - let me know). Let X(i) be the gain at game i and let D be the duration. Let S(n) = X(1)+...+X(n). Since EX(i) = p - (1-p) = 2p-1, S(n) - n*(2p-1) is a martingale, and since D is a stopping time we get that E(S(D) - (2p-1)*D) = 0, so that (2p-1)*E(D) = E(S(D)) = P*(total.capital-initial.capital) + (1-P)*(-initial.capital), and so E(D) can be computed provided that p != 1/2. If p = 1/2 then S(n) is a martingale and then by Wald's Lemma, E(S(D)^2) = E(D)*E(X^2) = E(D). Since E(S(D)^2) = P*(total.capital-initial.capital)^2 + (1-P)*(-initial.capital)^2, we can compute E(D). Regards, Moshe. --- On Fri, 15/8/08, jose romero [EMAIL PROTECTED] wrote: From: jose romero [EMAIL PROTECTED] Subject: [R] Vectorization of duration of the game in the gambler ruin's problem To: r-help@r-project.org Received: Friday, 15 August, 2008, 2:26 PM Hey fellas: In the context of the gambler's ruin problem, the following R code obtains the mean duration of the game, in turns: # total.capital is a constant, an arbitrary positive integer # initial.capital is a constant, an arbitrary positive integer between, and not including # 0 and total.capital # p is the probability of winning 1$ on each turn # 1-p is the probability of losing 1$ # N is a large integer representing the number of times to simulate # dur is a vector containing the simulated game durations T <- total.capital dur <- NULL for (n in 1:N) { x <- initial.capital d <- 0 while ((x != 0) && (x != T)) { x <- x + sample(c(-1,1), 1, replace=TRUE, c(1-p,p)) d <- d + 1 } dur <- c(dur, d) } mean(dur) # returns the mean duration of the game The problem with this code is that, using the traditional
control structures (while, for, etc.) it is rather slow. Does anyone know of a way i could vectorize the while and the for to produce a faster code? And while I'm at it, does anyone know of a discrete-event simulation package in R such as the SimPy for Python? Thanks in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
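Moshe's martingale argument can be packaged as a closed-form function, replacing the simulation entirely. A sketch (the ruin probability P below is the standard formula he alludes to; the function name is mine):

```r
expected_duration <- function(initial.capital, total.capital, p) {
  if (abs(p - 0.5) < 1e-12)  # symmetric case, via Wald's lemma
    return(initial.capital * (total.capital - initial.capital))
  q <- 1 - p
  # P = probability of reaching total.capital before ruin
  P <- (1 - (q / p)^initial.capital) / (1 - (q / p)^total.capital)
  (P * total.capital - initial.capital) / (2 * p - 1)
}
expected_duration(5, 10, 0.5)  # 25 turns on average
```

A quick check against the first-step recurrence E(i) = 1 + p*E(i+1) + (1-p)*E(i-1) with E(0) = E(N) = 0 confirms the formula.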
Re: [R] ignoring zeros or converting to NA
Since 0 can be represented exactly as a floating point number, there is no problem with something like x[x==0]. What you can not rely on is something like 0.1+0.2 == 0.3 to be TRUE. --- On Thu, 14/8/08, Roland Rau [EMAIL PROTECTED] wrote: From: Roland Rau [EMAIL PROTECTED] Subject: Re: [R] ignoring zeros or converting to NA To: rcoder [EMAIL PROTECTED] Cc: r-help@r-project.org Received: Thursday, 14 August, 2008, 1:23 AM Hi, since many suggestions are following the form of x[x==0] (or similar) I would like to ask if this is really recommended? What I have learned (the hard way) is that one should not test for equality of floating point numbers (which is the default for R's numeric values, right?) since the binary representation of these (decimal) floating point numbers is not necessarily exact (with the classic example of decimal 0.1). Is it okay in this case for the value zero where all binary elements are zero? Or does R somehow recognize that it is an integer? Just some questions out of curiosity. Thank you, Roland rcoder wrote: Hi everyone, I have a matrix that has a combination of zeros and NAs. When I perform certain calculations on the matrix, the zeros generate Inf values. Is there a way to either convert the zeros in the matrix to NAs, or only perform the calculations if not zero (i.e. like using something similar to an !all(is.na() construct)? Thanks, rcoder __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
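Both points in a few lines of R:

```r
0.1 + 0.2 == 0.3                   # FALSE: neither side is stored exactly
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: the tolerant comparison
x <- c(0, 1.5, 0, 2.5)
x[x == 0]                          # 0 0 -- testing against an exact zero is safe
```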
Re: [R] missing TRUE/FALSE error in conditional construct
The problem is that if x is either NA or NaN then x != 0 is NA (and not FALSE or TRUE) and the function is.nan tests for a NaN but not for NA, i.e. is.nan(NA) returns FALSE. You can do something like: mat_zeroless[!is.na(mat) & mat != 0] <- mat[!is.na(mat) & mat != 0] --- On Thu, 14/8/08, rcoder [EMAIL PROTECTED] wrote: From: rcoder [EMAIL PROTECTED] Subject: [R] missing TRUE/FALSE error in conditional construct To: r-help@r-project.org Received: Thursday, 14 August, 2008, 8:16 AM Hi everyone, I posted something similar to this in reply to another post, but there seems to be a problem getting it onto the board, so I'm starting a new post. I am trying to use conditional formatting to select non-zero and non-NaN values from a matrix and pass them into another matrix. The problem is that I keep encountering an error message indicating the :missing value where TRUE/FALSE needed My code is as follows: ##Code Start mat_zeroless <- matrix(NA,5000,2000) #generating holding matrix ##Assume source matrix containing combination of values, NaNs and zeros## for (j in 1:5000) { for (k in 1:2000) { if(mat[j,k]!=0 && !is.NaN(mat[j,k])) {mat_zeroless[j,k] <- mat[j,k]} } } ##Code End Error in if (mat[j,k] !=0 && !is.NaN(mat[j,k])) { :missing value where TRUE/FALSE needed I'm not sure how to resolve this. rcoder -- View this message in context: http://www.nabble.com/missing-TRUE-FALSE-error-in-conditional-construct-tp18972244p18972244.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
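The behaviour Moshe describes, in miniature:

```r
v <- c(NA, NaN, 0, 1)
v != 0     #    NA   NA FALSE  TRUE -- comparisons involving NA/NaN give NA
is.nan(v)  # FALSE TRUE FALSE FALSE -- is.nan() misses NA
is.na(v)   #  TRUE TRUE FALSE FALSE -- is.na() catches both
```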
Re: [R] Covariance matrix
Just interchange rows 2 and 3 and then columns 2 and 3 of the original covariance matrix. --- On Fri, 8/8/08, Zhang Yanwei - Princeton-MRAm [EMAIL PROTECTED] wrote: From: Zhang Yanwei - Princeton-MRAm [EMAIL PROTECTED] Subject: [R] Covariance matrix To: r-help@r-project.org r-help@r-project.org Received: Friday, 8 August, 2008, 12:18 AM Hi all, Assume I have a random vector with four variables, i.e. A=(a,b,c,d). I am able to get the covariance matrix of vector A, but how can I get the covariance matrix of vector B=(a,c,b,d) by manipulating the corresponding covariance matrix of A? Thanks. Sincerely, Yanwei Zhang Department of Actuarial Research and Modeling Munich Re America Tel: 609-275-2176 Email: [EMAIL PROTECTED]mailto:[EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
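In code, one index vector permutes both rows and columns at once; a sketch:

```r
set.seed(1)
X <- matrix(rnorm(400), ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
covA <- cov(X)            # covariance of (a, b, c, d)
perm <- c(1, 3, 2, 4)     # reorder to (a, c, b, d)
covB <- covA[perm, perm]  # rows AND columns permuted together
stopifnot(isTRUE(all.equal(covB, cov(X[, perm]))))
```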
Re: [R] simulate data based on partial correlation matrix
Hi Benjamin, Creating 0 correlations is easier and always possible, but creating arbitrary correlations can be done as well (when possible - see below). Suppose that x1,x2,x3,x4 have mean 0 and suppose that the desired correlations are r = (r1,r2,r3,r4). Let A be an orthogonal 4x4 matrix such that (y1,y2,y3,y4) = (x1,x2,x3,x4)*A are orthonormal (i.e. the norm of y1,y2,y3,y4 is 1 and they are orthogonal to each other - this can be done if x1,x2,x3,x4 are independent). Then the correlations between z and y1,y2,y3,y4 should be q = (q1,q2,q3,q4) = (r1,r2,r3,r4)*A. Let now v be any vector which has norm 1, mean 0 and is orthogonal to y1,y2,y3,y4 (it exists and can be easily found - see below). Now let z = a1*y1 + a2*y2 + a3*y3 + a4*y4 + a*v where a1,a2,a3,a4,a are scalars and assume that the norm of z is 1. In order for the correlations between z and y1,y2,y3,y4 to be q1,q2,q3,q4 we must have a1=q1,a2=q2,a3=q3,a4=q4. Moreover, the square of the norm of z is q1^2 + q2^2 + q3^2 + q4^2 + a^2 = 1, which means that q1^2 + q2^2 + q3^2 + q4^2 <= 1. This condition is necessary and sufficient for a solution to exist!!! So now we need two things: (a) to find the matrix A and (b) to find the vector v. (a) Let M = (x1,x2,x3,x4) be an nx4 matrix. Then t(M)*M is a 4x4 matrix which is positive definite if x1,x2,x3,x4 are independent (let's assume this). Then there exists a unitary matrix U such that t(M)*M = t(U)*D*U where D is a diagonal matrix (U and D can be found using the function eigen). In such a case A = t(U)*(1/sqrt(D))*U. (b) To find v, test Vi = (-1,0,..,0,1,0,0,...,0) where the 1 is at place i. We need to find Vi which is independent of y1,y2,y3,y4. Most probably any of V2,V3,... is independent of y1,...,y4, but in any case at least one of V2,V3,V4,V5,V6 is independent of them. To check which one compute (y1%*%V)^2 + ... + (y4%*%V)^2 (where %*% is dot product). If this sum is noticeably less than 2 (= V%*%V) then this is what we need (let's say it is less than 1.99).
Now take v = V - (V%*%y1)*y1 - (V%*%y2)*y2- (V%*%y3)*y3 - (V%*%y4)*y4 and normalize v so that it's norm is 1. Regards, Moshe. --- On Tue, 5/8/08, Benjamin Michael Kelcey [EMAIL PROTECTED] wrote: From: Benjamin Michael Kelcey [EMAIL PROTECTED] Subject: [R] simulate data based on partial correlation matrix To: r-help@r-project.org Received: Tuesday, 5 August, 2008, 6:24 AM Given four known and fixed vectors, x1,x2,x3,x4, I am trying to generate a fifth vector,z, with specified known and fixed partial correlations. How can I do this? In the past I have used the following (thanks to Greg Snow) to generate a fifth vector based on zero order correlations---however I'd like to modify it so that it can generate a fifth vector with specific partial correlations rather than zero order correlations: # create x1-x4 x1 - rnorm(100, 50, 3) x2 - rnorm(100) + x1/5 x3 - rnorm(100) + x2/5 x4 - rnorm(100) + x3/5 # find current correlations cor1 - cor( cbind(x1,x2,x3,x4) ) cor1 # create 1st version of z z - rnorm(100) # combine in a matrix m1 - cbind( x1, x2, x3, x4, z ) # center and scale m2 - scale(m1) # find cholesky decomp c1 - chol(var(m2)) # force to be independent m3 - m2 %*% solve(c1) # create new correlation matrix: cor2 - cbind( rbind( cor1, z=c(.5,.3,.1,.05) ), z=c(.5,.3,.1,.05,1) ) # create new matrix m4 - m3 %*% chol(cor2) # uncenter and unscale m5 - sweep( m4, 2, attr(m2, 'scaled:scale'), '*') m5 - sweep( m5, 2, attr(m2, 'scaled:center'), '+') ##Check they are equal zapsmall(cor(m5))==zapsmall(cor2) Thanks, ben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
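Step (a), the matrix A that orthonormalises (x1,...,x4), can be sketched with eigen(); this assumes the columns are linearly independent, and the function name is mine:

```r
whiten <- function(M) {
  e <- eigen(crossprod(M))  # t(M) %*% M = U D t(U), symmetric positive definite
  A <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  M %*% A                   # columns are now orthonormal
}
set.seed(1)
M <- cbind(rnorm(200), rnorm(200), rnorm(200), rnorm(200))
Y <- whiten(M)
max(abs(crossprod(Y) - diag(4)))  # essentially 0
```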
Re: [R] stats question
Hello Jason, You are not specific enough. What do you mean by significant difference? Let's assume that indeed the incidence in A is 6% and in B is 10% and we are looking for Na and Nb such that with probability of at least 80% the mean of Nb sample from B will be at least, say, 0.03 (=3%) above the mean of Na sample from A. The solution is not unique. If Mb is the mean of the sample from B and Ma is the one from A, using Normal approximation we get the Mb is approximately normal with mean 0.10 and variance 0.1*0.9/Nb and Ma is approximately normal with mean 0.06 and variance 0.06*0.94/Na, so Mb - Ma is approximately normal with mean 0.04 and variance 0.09/Nb + 0.0564/Na. So let V be the maximal variance for which the probability that a normal rv with mean 0.04 and variance V is above 0.03 equals 0.80 (finding such V is straightforward). Then you must choose Na and Nb which satisfy 0.09/Nb + 0.0564/Na = V. One such choice is Nb = 2*0.09/V, Na = 2*0.0564/V. As I said, this solution is only approximate and probably not optimal, so see what other people say. Regards, Moshe. --- On Thu, 31/7/08, Iasonas Lamprianou [EMAIL PROTECTED] wrote: From: Iasonas Lamprianou [EMAIL PROTECTED] Subject: [R] stats question To: r-help@r-project.org Received: Thursday, 31 July, 2008, 2:46 PM Dear friends, I am not sure that this is the right place to ask, but please feel free to suggest an alternative discussion group. My question is that I want to do a comparative study in order to compare the rate of incidence in two populations. I know that a pilot study was conducted a few weeks ago and found 8/140 (around 6%) incidence in population A. Population B was not sampled. Assuming this is (about) the right proportion in the Population A what is the sample size I need for population A and B in the main study, in order to have power of 80% to idenitfy significant differences? I would expect the incidence in population B to be around 10% compared to the 6% of the Population A. 
Any suggestions? How can I do this in R? Jason __ Not happy with your email address?. Get the one you really want - millions of new email addresses available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
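For reference, base R packages this calculation (normal approximation, two-sample test of proportions, default 5% significance level) as power.prop.test(); with these inputs it asks for roughly 700 subjects per group:

```r
res <- power.prop.test(p1 = 0.06, p2 = 0.10, power = 0.80)
ceiling(res$n)  # required sample size per group
```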
Re: [R] Code to calculate internal rate of return
You can use uniroot (see ?uniroot). As an example, suppose you have a $100 bond which pays 3% every half year (6% coupon) and lasts for 4 years. Suppose that it now sells for $95. In such a case your time intervals are 0,0.5,1,...,4 and the payoffs are: -95,3,3,...,3,103. To find the internal rate of return you can do the following: tim <- (0:8)/2 tim [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 pay <- 3 + 0*tim pay[1] <- -95 pay[9] <- 103 pay [1] -95 3 3 3 3 3 3 3 103 f <- function(r) sum(pay*exp(-r*tim)) z <- uniroot(f,c(0,1)) z $root [1] 0.07329926 $f.root [1] 0.01035543 $iter [1] 5 $estim.prec [1] 6.103516e-05 So the internal rate of return is 0.07329926 (z$root) = 7.33% (continuously compounded). --- On Fri, 1/8/08, Thomas E [EMAIL PROTECTED] wrote: From: Thomas E [EMAIL PROTECTED] Subject: [R] Code to calculate internal rate of return To: r-help@r-project.org Received: Friday, 1 August, 2008, 2:05 AM Hi all. I am an R newbie and trying to grasp how the simple optimization routines in R work. Specifically, I would like some guidance on how to set up a code to calculate the internal rate of return on an investment project (http://en.wikipedia.org/wiki/Internal_rate_of_return). My main problem (I think) is that I want a generic code where N (number of periods) can be easily changed and set within the code. Hope this makes sense - any help appreciated !!! Thomas -- View this message in context: http://www.nabble.com/Code-to-calculate-internal-rate-of-return-tp18757967p18757967.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
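Thomas asked for a version where the number of periods is easy to change; wrapping the uniroot() approach in a function does that (the function and argument names are mine, and uniroot() needs the NPV to change sign over the search interval):

```r
irr <- function(pay, tim, interval = c(-0.9, 1)) {
  npv <- function(r) sum(pay * exp(-r * tim))  # continuously compounded NPV
  uniroot(npv, interval)$root
}
# the bond from the example: price 95, 3 every half-year, 103 at 4 years
irr(c(-95, rep(3, 7), 103), (0:8) / 2)  # ~ 0.0733
```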
Re: [R] cutting out numbers from vectors
This is something that is easier done in C than in R (to the best of my very limited knowledge). To do this in R you could do something like: x <- "082-232-232-1" y <- unlist(strsplit(x, "")) i <- which(y != "0")[1] - 1 paste(y[-(1:i)], collapse="") [1] "82-232-232-1" --- On Fri, 1/8/08, calundergrad [EMAIL PROTECTED] wrote: From: calundergrad [EMAIL PROTECTED] Subject: [R] cutting out numbers from vectors To: r-help@r-project.org Received: Friday, 1 August, 2008, 6:40 AM i have a vector with values similar to the below text [1] 001-010-001-0 I want to get rid of all leading zeroes. for example i want to change the values of the vector so that [1] 001-010-001-0 becomes [1] 1-010-001-0. Another example [1] 082-232-232-1 becomes [1] 82-232-232-1 how do i do this? -- View this message in context: http://www.nabble.com/cutting-out-numbers-from-vectors-tp18763058p18763058.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping Index of Matrix Based on Certain Condition
y <- 2 - (x[,1] > x[,2]) you can also do cbind(x,y) if you wish. --- On Fri, 1/8/08, Gundala Viswanath [EMAIL PROTECTED] wrote: From: Gundala Viswanath [EMAIL PROTECTED] Subject: [R] Grouping Index of Matrix Based on Certain Condition To: [EMAIL PROTECTED] Received: Friday, 1 August, 2008, 1:00 PM Hi, I have the following (M x N) matrix, where M = 10 and N = 2 What I intend to do is to group index of (M) based on this condition of x_mn, namely: For each M, If x_m1 > x_m2, assign index of M to Group 1, otherwise assign index of M into Group 2 x [,1] [,2] [1,] 4.482909e-01 0.55170907 [2,] 9.479594e-01 0.05204063 [3,] 8.923553e-01 0.10764474 [4,] 9.295003e-01 0.07049966 [5,] 8.880434e-01 0.11195664 [6,] 9.197367e-01 0.08026327 [7,] 9.431232e-01 0.05687676 [8,] 9.460356e-01 0.05396442 [9,] 6.053829e-01 0.39461708 [10,] 9.515173e-01 0.04848268 Is there a simple way to do it in R? - Gundala Viswanath Jakarta - Indonesia __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] cutting out numbers from vectors
Yes, this is how it should be done! --- On Fri, 1/8/08, Christos Hatzis [EMAIL PROTECTED] wrote: From: Christos Hatzis [EMAIL PROTECTED] Subject: Re: [R] cutting out numbers from vectors To: 'calundergrad' [EMAIL PROTECTED], r-help@r-project.org Received: Friday, 1 August, 2008, 2:11 PM Have a look at gsub: x <- "001-010-001-0" gsub("^0+", "", x) [1] "1-010-001-0" -Christos -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of calundergrad Sent: Thursday, July 31, 2008 4:40 PM To: r-help@r-project.org Subject: [R] cutting out numbers from vectors i have a vector with values similar to the below text [1] 001-010-001-0 I want to get rid of all leading zeroes. for example i want to change the values of the vector so that [1] 001-010-001-0 becomes [1] 1-010-001-0. Another example [1] 082-232-232-1 becomes [1] 82-232-232-1 how do i do this? -- View this message in context: http://www.nabble.com/cutting-out-numbers-from-vectors-tp18763058p18763058.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Urgent
Hi Yunlei, Is your problem constrained or not? If it is unconstrained and your matrix is not positive definite, the minimum is unbounded (unless you are extremely lucky and the matrix is positive semi-definite and the vector which multiplies the unknowns is exactly perpendicular to all the eigenvectors with 0 eigenvalues). If the problem is constrained you may use regular optimization functions (like optim). Regards, Moshe. --- On Tue, 22/7/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: From: [EMAIL PROTECTED] [EMAIL PROTECTED] Subject: [R] Urgent To: r-help@r-project.org Received: Tuesday, 22 July, 2008, 6:12 PM Hi everyone, I am trying to use solve.QP to solve a quadratic programming problem, but the Dmat in the function is non-positive definite. Can anyone tell me which R function can deal with quadratic programming with a non-positive definite Dmat matrix (as in solve.QP)? I'd really appreciate it. Yunlei Hu
Re: [R] Sampling two exponentials
I am not sure that this is well defined. For a multivariate normal distribution (which is well defined), the covariance matrix (and the mean vector) fully determine the distribution. In the exponential case, what is a multivariate (bivariate) exponential distribution? I believe that knowing that the two marginal distributions are exponential and knowing the covariance matrix does not uniquely determine the joint distribution. On the other hand, if you want just any such pair, this is not difficult (at least if we want the correlation to be non-negative). Let X and Z be independent exponential variables with lambda = 1 and let W be independent of X and Z with P(W=1) = p = 1 - P(W=0). Now let Y = X if W = 1 and Y = Z if W = 0. It is easy to see that Y is also exponential with lambda = 1 and the correlation coefficient between X and Y is p. Now multiply X and Y by appropriate constants to get the desired covariance matrix (the correlation coefficient won't be affected). --- On Thu, 31/7/08, Ben Bolker [EMAIL PROTECTED] wrote: From: Ben Bolker [EMAIL PROTECTED] Subject: Re: [R] Sampling two exponentials To: [EMAIL PROTECTED] Received: Thursday, 31 July, 2008, 10:40 AM Zhang Yanwei - Princeton-MRAm YZhang at munichreamerica.com writes: Hi all, I am going to sample two variables from two exponential distributions, but I want to specify a covariance structure between these two variables. Is there any way to do it in R? Or is there a Multivariate Exponential thing corresponding to the multivariate normal? Thanks in advance. It's not at all easy, but I'd suggest that you do some reading about copulas (and take a look at the copula package). Ben Bolker
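The mixture construction described above is easy to verify by simulation; a minimal sketch (the sample size, target correlation p, and seed are arbitrary choices):

```r
set.seed(42)
n <- 10000
p <- 0.6                       # desired correlation coefficient

x <- rexp(n)                   # X ~ Exp(1)
z <- rexp(n)                   # Z ~ Exp(1), independent of X
w <- rbinom(n, 1, p)           # W = 1 with probability p

y <- ifelse(w == 1, x, z)      # Y = X when W = 1, else Z; still Exp(1)
cor(x, y)                      # close to p for large n
```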
Re: [R] product of successive rows
Assuming that the number of rows is even and that your matrix is A, the element-wise product of successive pairs of rows can be calculated as A[seq(1,nrow(A),by=2),] * A[seq(2,nrow(A),by=2),] --- On Mon, 28/7/08, rcoder [EMAIL PROTECTED] wrote: From: rcoder [EMAIL PROTECTED] Subject: [R] product of successive rows To: r-help@r-project.org Received: Monday, 28 July, 2008, 8:20 AM Hi everyone, I want to perform an operation on a matrix that outputs the product of successive pairs of rows. For example: calculating the product between rows 1 & 2; 3 & 4; 5 & 6...etc. Does anyone know of any readily available functions that can do this? Thanks, rcoder -- View this message in context: http://www.nabble.com/product-of-successive-rows-tp18681259p18681259.html Sent from the R help mailing list archive at Nabble.com.
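A small sketch of the indexing trick on a made-up 6 x 3 matrix, so rows pair up as 1 & 2, 3 & 4, 5 & 6:

```r
A <- matrix(1:18, nrow = 6, byrow = TRUE)

odd  <- seq(1, nrow(A), by = 2)   # rows 1, 3, 5
even <- seq(2, nrow(A), by = 2)   # rows 2, 4, 6
P <- A[odd, ] * A[even, ]         # one row of element-wise products per pair
P
```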
Re: [R] finding a faster way to do an iterative computation
Try abs(outer(xk, x, "-")) (see ?outer). --- On Wed, 30/7/08, dxc13 [EMAIL PROTECTED] wrote: From: dxc13 [EMAIL PROTECTED] Subject: [R] finding a faster way to do an iterative computation To: r-help@r-project.org Received: Wednesday, 30 July, 2008, 4:12 AM useR's, I am trying to find out if there is a faster way to do a certain computation. I have successfully used FOR loops and the apply function to do this, but it can take some time to fully compute, and I was wondering if anyone may know of a different function or way to do this: x [1] 1 2 3 4 5 xk [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 I want to do: abs(x-xk[i]) where i = 1 to length(xk) = 13. It should result in a 13 by 5 matrix or data frame. Does anyone have a quicker solution than FOR loops or apply()? Much appreciation and thanks, dxc13 -- View this message in context: http://www.nabble.com/finding-a-faster-way-to-do-an-iterative-computation-tp18718233p18718233.html Sent from the R help mailing list archive at Nabble.com.
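Using the vectors from the question, outer() produces the whole matrix in one call; a quick sketch (the "-" argument makes outer subtract instead of multiply):

```r
x  <- 1:5
xk <- seq(0, 6, by = 0.5)

D <- abs(outer(xk, x, "-"))   # |xk[i] - x[j]| for every pair
dim(D)                        # 13 rows, 5 columns, as requested
```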
Re: [R] Chi-square parameter estimation
If v is your vector of sample variances (and assuming that their distribution is chi-square) you can define f <- function(df) sum(dchisq(v, df, log=TRUE)) and now you need to maximize f, which can be done using any optimization function (like optim). --- On Sat, 26/7/08, Julio Rojas [EMAIL PROTECTED] wrote: From: Julio Rojas [EMAIL PROTECTED] Subject: [R] Chi-square parameter estimation To: r-help@r-project.org Received: Saturday, 26 July, 2008, 12:03 AM Hi. I have made 100 experiments of an M/M/1 queue, and for each one I have calculated both mean and variance of the queue size. Now, a professor has told me that the variance is usually chi-squared distributed. Is there a way in R that I can find the parameter that best fits a chi-square to the variance data? I know there's fitdistr(), but this function doesn't handle chi-square. I believe the mean estimator for the chi-square is df (degrees of freedom). The theoretical variance for an M/M/1 queue with lambda=30/33 is ~108. So, should I expect the chi-square with parameter 108 to be the one that best fits the data? Thanks a lot for your help.
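Since df is a single parameter, optimize() is enough; a sketch with simulated data standing in for the poster's sample variances (the true df of 9 and the search interval are arbitrary choices):

```r
set.seed(7)
v <- rchisq(500, df = 9)      # hypothetical sample of variances

# log-likelihood of the chi-square df given the sample
f <- function(df) sum(dchisq(v, df, log = TRUE))

fit <- optimize(f, interval = c(0.01, 100), maximum = TRUE)
fit$maximum                   # ML estimate of df, close to 9 here
```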
Re: [R] Constrained coefficients in lm (correction)
This problem can be easily solved analytically: we want to minimize sum((res(i) - a*st(i) - b*mod(i))^2) subject to a+b = 1 and a,b >= 0, so we want to minimize f(a) = sum(((res(i) - mod(i)) - a*(st(i) - mod(i)))^2) for 0 <= a <= 1. Define Xi = res(i) - mod(i) and Yi = st(i) - mod(i); then f(a) = sum((Xi - a*Yi)^2), f(0) = sum(Xi^2), f(1) = sum((Xi - Yi)^2), and f'(a) = -2*sum(Yi*(Xi - a*Yi)), so f'(a) = 0 at a* = sum(Xi*Yi)/sum(Yi^2). If 0 <= a* <= 1 then a* is the solution; otherwise take a = 0 or a = 1 depending on whether f(0) < f(1) or not. --- On Thu, 24/7/08, Jorge Ivan Velez [EMAIL PROTECTED] wrote: From: Jorge Ivan Velez [EMAIL PROTECTED] Subject: [R] Constrained coefficients in lm (correction) To: R mailing list r-help@r-project.org Received: Thursday, 24 July, 2008, 5:39 AM Dear list, In my previous email, the model I'd like to estimate is rea=a*st+b*mod+error, where a+b=1 and a,b>0. My apologies for the misunderstanding. Thanks for all your help, Jorge On Wed, Jul 23, 2008 at 3:35 PM, Jorge Ivan Velez [EMAIL PROTECTED] wrote: Dear list, I have a data set which looks like myDF (see below) and I'd like to estimate the linear model rea=a*rea+b*mod+error, where a+b=1 and a,b>0. I tried mymodel=lm(rea~ st + mod-1, data=myDF) summary(mymodel) but I couldn't get what I'm looking for. Also, I tried RSiteSearch("constrained coefficients in lm") which produces 20 hits, but they're not what I need. Any help would be appreciated.
Thanks in advance, Jorge # Data set to estimate the model myDF=read.table(textConnection("rea st mod 14482734 14305907 14761000 14427969 14502228 14848024 14698723 14408264 14259492 15741064 14797138 14565303 15914249 16162714 14348592 16475594 15977623 15592229 17124613 16688456 14988151 17575281 17383822 15647240 17721548 17757635 15624907 18178273 17779858 15598802 18005810 18364233 16363321 18032618 17938963 16536844 17737470 18043063 17096089 17652014 17624039 17056355"), header=TRUE, sep="") closeAllConnections()
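The closed-form solution above translates directly into a small function (variable names follow the derivation; the clipping step uses the f(0) vs f(1) comparison from the reply):

```r
fit_constrained <- function(res, st, mod) {
  X <- res - mod                       # Xi = res(i) - mod(i)
  Y <- st - mod                        # Yi = st(i) - mod(i)
  a <- sum(X * Y) / sum(Y^2)           # unconstrained minimizer a*
  if (a < 0 || a > 1) {                # outside [0,1]: pick the better endpoint
    f <- function(a) sum((X - a * Y)^2)
    a <- if (f(0) < f(1)) 0 else 1
  }
  c(a = a, b = 1 - a)                  # b is determined by a + b = 1
}
```

With the myDF data above this would be called as fit_constrained(myDF$rea, myDF$st, myDF$mod).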
Re: [R] spectral decomposition for near-singular pd matrices
How large is your matrix? Are the very small eigenvalues well separated? If your matrix is not very small and the lower eigenvalues are clustered, this may be a really hard problem! You may need a special purpose algorithm and/or higher precision arithmetic. If your matrix is A and there exists a sequence of matrices A1,A2,...,An = A such that A1 is good, A2 is a bit worse (and is not very different from A1), etc., you may be able to compute the eigensystem for A1 and then track it down to A2, A3,...,An. I worked on such problems some 15 years ago but I believe that by now there should be packages doing this (I do not know whether they exist in R). --- On Thu, 17/7/08, Prasenjit Kapat [EMAIL PROTECTED] wrote: From: Prasenjit Kapat [EMAIL PROTECTED] Subject: [R] spectral decomposition for near-singular pd matrices To: [EMAIL PROTECTED] Received: Thursday, 17 July, 2008, 7:27 AM Hi, I have a matrix M; quite a few (> 1/4th) of its eigenvalues are of O(10^-16). Analytically I know that M is positive definite, but numerically of course it is not. Some of the small (O(10^-16)) eigenvalues (as obtained from eigen()) are negative. It is the near-singularity that is causing the numerical errors. I could use svd(), but then the left ($u) and right ($v) vectors are not identical, again numerical errors. Is there any function that imposes the pd property while calculating the eigen decomposition? Thanks, PK
Re: [R] spectral decomposition for near-singular pd matrices
Your problem seems to be pretty hard (for larger dimensions)! When speaking about the A1,A2,...,An sequence I did not mean representing A as a product A1*A2*...*An. What I mean is that A1,...,An is a sequence of matrices which starts at a good matrix, i.e. its eigenvalues are well separated; then A2 is not too different from A1, and having already found the eigensystem (or a part of it) for A1 you can use it to find that system for A2 (without doing it from scratch); A3 is not too different from A2, so having the eigensystem for A2 you get one for A3, etc., until you get to the last matrix, which is A. We had a matrix arising from elasticity. When the elasticity coefficient mu was small the matrix was good, but when mu approached 0.5, half of the eigenvalues approached 0 (for mu = 0.5 half of the eigenvalues are really 0). Starting from something like mu = 0.4 we were able to track several selected (lowest) eigenvalues up to something like mu = 0.4999, while no other program (at that time) could find the eigensystem of such a matrix even for much less problematic values of mu. --- On Thu, 17/7/08, Prasenjit Kapat [EMAIL PROTECTED] wrote: From: Prasenjit Kapat [EMAIL PROTECTED] Subject: Re: [R] spectral decomposition for near-singular pd matrices To: [EMAIL PROTECTED] Received: Thursday, 17 July, 2008, 10:56 AM Moshe Olshansky m_olshansky at yahoo.com writes: How large is your matrix? Right now I am looking at sizes between 30x30 and 150x150, though it will increase in future. Are the very small eigenvalues well separated? If your matrix is not very small and the lower eigenvalues are clustered, this may be a really hard problem! Here are the deciles of the eigenvalues (using eigen()) from a typical random generation (30x30): Min 10th 20th 30th 40th -1.132398e-16 -6.132983e-17 3.262002e-18 1.972702e-17 8.429709e-17 50th 60th 70th 80th 90th Max 2.065645e-13 1.624553e-09 4.730935e-06 0.00443353 0.9171549 16.5156 I am guessing they are *not* well-separated.
You may need a special purpose algorithm and/or higher precision arithmetic. If your matrix is A and there exists a sequence of matrices A1,A2,...An = A such that A1 is good, A2 is a bit worse (and is not very different from A1), etc., you may be able to compute the eigensystem for A1 and then track it down to A2, A3,...,An. I have worked on such problems some 15 years ago but I believe that by now there should be packages doing this (I do not know whether they exist in R). I will have to think on the possibility of such a product-decomposition. But even if it were possible, it would be an infinite product (just thinking out loud). Thanks again, PK PS: Kindly CC me when replying.
Re: [R] rounding
The problem is that neither 0.55 nor 2.55 are exact machine numbers (the computer uses binary representation), so it may happen that the machine representation of 0.55 is slightly less than 0.55 while the machine representation of 2.55 is slightly above 2.55. --- On Fri, 11/7/08, Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] wrote: From: Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] Subject: [R] rounding To: [EMAIL PROTECTED] Received: Friday, 11 July, 2008, 6:07 AM Hi, Round(0.55,1)=0.5 Round(2.55,1)=2.6 Can this be right? Thanks, Ed
Re: [R] rounding
Actually, your result is strange, since if x = 0.55 then the hexadecimal representation of x*100 is 404B800000000001, which is above 55, meaning that x is above 0.55 and should have been rounded to 0.6, while if x = 2.55 then the hexadecimal representation of x*100 is 406FDFFFFFFFFFFF, which is below 255, so that x is less than 2.55 and should have been rounded to 2.5. --- On Fri, 11/7/08, Moshe Olshansky [EMAIL PROTECTED] wrote: From: Moshe Olshansky [EMAIL PROTECTED] Subject: Re: [R] rounding To: [EMAIL PROTECTED], Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] Received: Friday, 11 July, 2008, 10:39 AM The problem is that neither 0.55 nor 2.55 are exact machine numbers (the computer uses binary representation), so it may happen that the machine representation of 0.55 is slightly less than 0.55 while the machine representation of 2.55 is slightly above 2.55. --- On Fri, 11/7/08, Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] wrote: From: Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] Subject: [R] rounding To: [EMAIL PROTECTED] Received: Friday, 11 July, 2008, 6:07 AM Hi, Round(0.55,1)=0.5 Round(2.55,1)=2.6 Can this be right? Thanks, Ed
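Printing a few extra digits makes the representation error visible without reading hex (IEEE-754 double arithmetic, consistent with the analysis above):

```r
# Neither 0.55 nor 2.55 is an exact binary double; extra digits show
# which side of the true decimal value the stored number falls on
print(0.55 * 100, digits = 17)   # slightly above 55: 0.55 is stored high
print(2.55 * 100, digits = 17)   # slightly below 255: 2.55 is stored low
```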
Re: [R] number of effective tests
It looks like SR, SU and ST are strongly correlated with each other, as are DR, DU and DT. You can try to do PCA on your 6 variables, pick the first 2 principal components as your new variables and use them for regression. --- On Fri, 11/7/08, Georg Ehret [EMAIL PROTECTED] wrote: From: Georg Ehret [EMAIL PROTECTED] Subject: [R] number of effective tests To: r-help [EMAIL PROTECTED] Received: Friday, 11 July, 2008, 11:46 AM Dear R community, I am using 6 variables to test for an effect (by linear regression). These 6 variables are strongly correlated among each other and I would like to find out the number of independent tests that I perform in this calculation. For this I calculated a matrix of correlation coefficients between the variables (see below). But finding the rank of the table in R is not the right approach... What else could I do to find the effective number of independent tests? Any suggestion would be very welcome! Thanking you and with my best regards, Georg. for (a in 1:6){ + for (b in 1:6){ + r[a,b] <- summary(lm(unlist(d[a])~unlist(d[b])),na.action=na.exclude)$adj.r.squared + } + } r SR SU ST DR DU DT SR 1.000 0.9636642 0.9554952 0.2975892 0.3211303 0.3314694 SU 0.9636642 1.000 0.9101678 0.3324979 0.3331389 0.3323826 ST 0.9554952 0.9101678 1.000 0.2756876 0.3031676 0.3501157 DR 0.2975892 0.3324979 0.2756876 1.000 0.9981733 0.9674843 DU 0.3211303 0.3331389 0.3031676 0.9981733 1.000 0.9977780 DT 0.3314694 0.3323826 0.3501157 0.9674843 0.9977780 1.000 * Georg Ehret Johns Hopkins University Baltimore, US
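A sketch of the PCA route with simulated stand-in data (prcomp with scale.=TRUE works on the correlation scale, which suits strongly correlated predictors; the data and response here are made up):

```r
set.seed(1)
d <- as.data.frame(matrix(rnorm(600), ncol = 6))  # stand-in for SR..DT
names(d) <- c("SR", "SU", "ST", "DR", "DU", "DT")

pc <- prcomp(d, scale. = TRUE)
summary(pc)                    # proportion of variance per component

scores <- pc$x[, 1:2]          # first two PCs as the new predictors
y <- rnorm(nrow(d))            # hypothetical response
fit <- lm(y ~ scores)
```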
Re: [R] Sum(Random Numbers)=100
For arbitrary lambda_i it can take years until the sum of 50 such random variables is 100! But if one makes lambda_i = 2, then the probability that the sum of 50 of them equals 100 is about 1/sqrt(2*pi*100), so on average that sequence of 50 numbers must be generated about sqrt(2*pi*100) ~ 25 times, which is very reasonable. --- On Tue, 8/7/08, Peng Jiang [EMAIL PROTECTED] wrote: From: Peng Jiang [EMAIL PROTECTED] Subject: Re: [R] Sum(Random Numbers)=100 To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED], Shubha Vishwanath Karanth [EMAIL PROTECTED] Received: Tuesday, 8 July, 2008, 4:56 PM Hi, I am afraid there is no other way except using brute force, that is, loop until their sum reaches your expectation. It is easy to figure out this probability by letting their sum be a new random variable Z = X_1 + ... + X_n where X_i ~ Poisson(lambda_i). By calculating their moment generating functions we can find the pmf of Z, which is a new Poisson random variable with parameter sum_i lambda_i. And Moshe Olshansky's method is also correct; it is based on conditioning. On 2008-7-8, at 1:58 PM, Shubha Vishwanath Karanth wrote: On 2008-7-8, at 2:39 PM, Moshe Olshansky wrote: If they are really random you cannot expect their sum to be 100. However, it is not difficult to see that, given that the sum of n independent Poisson random variables equals N, any individual one has the conditional binomial distribution with size = N and p = 1/n, i.e. P(Xi=k | Sn=N) = (N over k)*(1/n)^k*((n-1)/n)^(N-k). So you can generate X1 binomial with size = 100 and p = 1/50; if X1 = k1 then the sum of the remaining 49 must equal 100 - k1, so now you generate X2 binomial with size = 100-k1 and p = 1/49; if X2 = k2 then generate X3 binomial with size = 100-(k1+k2) and p = 1/48, etc. Why do you need this?
--- On Tue, 8/7/08, Shubha Vishwanath Karanth [EMAIL PROTECTED] wrote: From: Shubha Vishwanath Karanth [EMAIL PROTECTED] Subject: [R] Sum(Random Numbers)=100 To: [EMAIL PROTECTED] Received: Tuesday, 8 July, 2008, 3:58 PM Hi R, I need to generate 50 random numbers (preferably Poisson), such that their sum is equal to 100. How do I do this? Thank you, Shubha This e-mail may contain confidential and/or privileged i...{{dropped:13}} --- Peng Jiang 江鹏, Ph.D. Candidate Antai College of Economics & Management 安泰经济管理学院 Department of Mathematics 数学系 Shanghai Jiaotong University (Minhang Campus) 800 Dongchuan Road 200240 Shanghai P. R. China
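The sequential conditional-binomial scheme described above can be sketched as follows (50 counts that always sum to exactly 100):

```r
set.seed(1)
n <- 50
N <- 100
x <- numeric(n)
left <- N                                  # how much of the total remains
for (i in 1:(n - 1)) {
  # p = 1/50 for the first draw, 1/49 for the second, and so on
  x[i] <- rbinom(1, size = left, prob = 1 / (n - i + 1))
  left <- left - x[i]
}
x[n] <- left                               # last count takes the remainder
sum(x)                                     # exactly 100 by construction
```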
Re: [R] multiplication question
The answer to your first question is sum(x)*sum(y) - sum(x*y), and for the second one it is x %*% R %*% y - sum(x*y*diag(R)). --- On Thu, 3/7/08, Murali Menon [EMAIL PROTECTED] wrote: From: Murali Menon [EMAIL PROTECTED] Subject: [R] multiplication question To: [EMAIL PROTECTED] Received: Thursday, 3 July, 2008, 2:30 AM folks, is there a clever way to compute the sum of the product of two vectors such that the common indices are not multiplied together? i.e. if i have vectors X, Y, how can i compute Sum (X[i] * Y[j]), i != j? also, what if i wanted Sum (X[i] * Y[j] * R[i, j]), i != j, where R is a matrix? thanks, murali
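Both identities are easy to verify against brute-force loops on small random vectors:

```r
set.seed(3)
n <- 5
x <- runif(n); y <- runif(n)
R <- matrix(runif(n * n), n, n)

s1 <- 0; s2 <- 0                       # brute-force sums over i != j
for (i in 1:n) for (j in 1:n) if (i != j) {
  s1 <- s1 + x[i] * y[j]
  s2 <- s2 + x[i] * y[j] * R[i, j]
}

s1b <- sum(x) * sum(y) - sum(x * y)                 # first identity
s2b <- drop(x %*% R %*% y) - sum(x * y * diag(R))   # second identity

all.equal(s1, s1b)   # TRUE
all.equal(s2, s2b)   # TRUE
```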
Re: [R] odd dnorm behaviour (?)
dnorm() computes the density, so it may be greater than 1; pnorm() computes the distribution function. --- On Tue, 8/7/08, Mike Lawrence [EMAIL PROTECTED] wrote: From: Mike Lawrence [EMAIL PROTECTED] Subject: Re: [R] odd dnorm behaviour (?) To: Rhelp [EMAIL PROTECTED] Received: Tuesday, 8 July, 2008, 2:51 PM #Let's try this again! This time the code is more sensible (curve range, same sd value). #Quick one hopefully. Shouldn't dnorm be returning the pdf? Last time I checked, #a probability shouldn't be greater than 1 as produced by: curve(dnorm(x,0,.1),from=-.5,to=.5) #Shouldn't I be getting an axis more like that produced by: f=function(x,m,s){ y=rep(NA,length(x)) for(i in 1:length(x)){ y[i]=integrate( dnorm , upper=x[i]+sqrt(.Machine$double.eps) , lower=x[i]-sqrt(.Machine$double.eps) , mean=m , sd=s )$value } return(y) } curve(f(x,0,.1),from=-.5,to=.5) #If the latter code betrays a misunderstanding of what a pdf is, be gentle! #Mike -- Mike Lawrence Graduate Student, Department of Psychology, Dalhousie University www.memetic.ca The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less. - Piet Hein
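Two quick calls make the distinction concrete: with sd = 0.1 the density peak is 1/(0.1*sqrt(2*pi)), well above 1, while the CDF never leaves [0, 1]:

```r
dnorm(0, mean = 0, sd = 0.1)   # ~3.9894, a density height, not a probability
pnorm(0, mean = 0, sd = 0.1)   # 0.5, an actual probability
```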
Re: [R] Lots of huge matrices, for-loops, speed
Another possibility is to use explicit formulae, i.e. if you are doing linear regression like y = a*x + b then the explicit formulae are: a = (meanXY - meanX*meanY)/(meanX2 - meanX^2) b = (meanY*meanX2 - meanX*meanXY)/(meanX2 - meanX^2) where meanX is mean(x), meanXY is mean(x*y), meanX2 is mean(x^2), etc. So if x is your time vector you could do something like: meanY <- matrix(0,nrow=4000,ncol=3500) meanXY <- matrix(0,nrow=4000,ncol=3500) meanY2 <- matrix(0,nrow=4000,ncol=3500) for (i in 1:80) { A <- read.table(file[i]) meanY <- meanY + A meanXY <- meanXY + x[i]*A meanY2 <- meanY2 + A*A } now: meanY <- meanY/80 meanXY <- meanXY/80 meanY2 <- meanY2/80 meanX <- mean(x) meanX2 <- mean(x*x) and now use the above formulae to compute the regression coefficients. You will need space for 4 4000x3500 matrices and this should not be a problem. Once a and b are found you can use meanX, meanX2, meanXY, meanY, meanY2 to compute R2 as well. --- On Mon, 7/7/08, Zarza [EMAIL PROTECTED] wrote: From: Zarza [EMAIL PROTECTED] Subject: [R] Optimise huge data.frame construction To: r-help@r-project.org Received: Monday, 7 July, 2008, 1:39 AM Hello, we have 80 text files with matrices. Each matrix represents a map (rows for latitude and columns for longitude); the 80 maps represent steps in time. In addition, we have a vector x of length 80. We would like to compute a regression between matrices (response through time) and x and create maps representing coefficients, r2 etc. Problem: the 80 matrices are of size 4000 x 3500 and we were running out of memory. We computed line by line and the results for each line were appended to output grids. This works. But - for each line, 80 text files must be scanned and output must be written. And there are several for-loops involved. This takes a lot of time (about a week). I read the contributions related to speeding up code and maybe vectorizing parts of the procedure could help a bit.
However, I am a neophyte (as you may see from the code below) and did not find a way by now. I would appreciate very much any suggestions for speeding up the procedure. Thanks, Zarza The code (running but slow):

regrid <- function (infolder, x, outfolder) {
  # List of input files
  setwd(infolder)
  filelist <- dir(pattern = "\\.asc$", full.names = F)
  # Dimensions (making use of the header information coming with
  # the .asc input files, ESRI format)
  hd <- read.table(filelist[1], nrows = 6)
  cols <- hd[1,2]
  rows <- hd[2,2]
  times <- length(filelist)
  items <- 4 + ncol(x)
  # Prepare output
  out1 <- matrix(numeric(times * cols), ncol = cols)
  out2 <- matrix(numeric(items * cols), ncol = items)
  out3 <- as.numeric(items)
  # Prepare .asc files
  filenames <- c("R2", "adj.R2", "p", "b0", colnames(x))
  for (i in 1:items) {
    write.table(hd, file = paste(outfolder, filenames[i], ".asc", sep = ""), quote = F, row.names = F, col.names = F)
  }
  rm(hd)
  # Prepare regression
  xnam <- paste("x[,", 1:(ncol(x)), "]", sep = "")
  form <- paste("y ~ ", paste(xnam, collapse = "+"))
  rm(xnam)
  # Loop through rows
  for (j in 1:rows) {
    getgrid <- function (j) {
      print(paste("Row", j, "/", rows), quote = F)
      # Read out multi-temporal response values for one grid-row of cells
      for (k in 1:times) {
        getslice <- function (k) {
          values <- scan(filelist[k], what = 0, na.strings = "-", skip = (5 + j), nlines = 1, nmax = cols, quiet = T)
          values
        }
        out1[k,] <- getslice(k)
      }
      # Regression
      for (l in 1:cols) {
        y <- as.vector(out1[,l])
        if (length(y) > length(na.omit(y))) {
          setNA <- function (l) {
            NAs <- rep(NA, length(out3))
            NAs
          }
          out2[l,] <- setNA(l)
        } else {
          regression <- function (l) {
            model <- lm(as.formula(form))
            out3[1] <- summary(model)$r.squared
            out3[2] <- summary(model)$adj.r.squared
            f <- summary(model)$fstatistic
            out3[3] <- 1 - pf(f[1], f[2], f[3])
            out3[4:items] <- coef(model)[1:(1 + ncol(x))]
            out3
          }
          out2[l,] <- regression(l)
        }
      }
      out2
    }
    fillrow <- getgrid(j)
    # Append results to output files
    for (m in 1:items) {
      write.table(t(fillrow[,m]), file = paste(outfolder, filenames[m], ".asc", sep = ""), append = T, quote = F, na = "-", row.names = F, col.names = F, dec = ".")
    }
  }
}

-- View this message in context: http://www.nabble.com/Lots-of-huge-matrices%2C-for-loops%2C-speed-tp18303230p18303230.html Sent from the R help mailing list archive at Nabble.com.
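The running-sums idea from the reply can be sketched compactly (file names and the reader call are placeholders for the poster's 80 ESRI .asc grids; only the slope/intercept accumulators are shown):

```r
nr <- 4000; nc <- 3500
x <- 1:80                                # time steps

sumY  <- matrix(0, nr, nc)
sumXY <- matrix(0, nr, nc)

for (i in seq_along(x)) {
  # placeholder reader: one 4000 x 3500 grid per time step
  A <- as.matrix(read.table(sprintf("map_%02d.asc", i)))
  sumY  <- sumY  + A
  sumXY <- sumXY + x[i] * A
}

meanY  <- sumY / length(x)
meanXY <- sumXY / length(x)
meanX  <- mean(x)
meanX2 <- mean(x^2)

a <- (meanXY - meanX * meanY) / (meanX2 - meanX^2)  # per-cell slope
b <- meanY - a * meanX                              # per-cell intercept
```

Only a few full-size accumulator matrices are held at once, so memory stays modest even with 80 grids.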
Re: [R] Lots of huge matrices, for-loops, speed
This is correct for larger (more columns) matrices - computing the t(A)*A matrix and inverting it may cause numerical problems, but this should not be the case with two columns (one of which is all 1's). In any case, the matrix depends on the vector x only, and since it is small (80 entries), the resulting matrix can be decomposed/processed in any desirable way independently of the huge Y matrices (but once again, this is not needed for the particular case). --- On Mon, 7/7/08, Rolf Turner [EMAIL PROTECTED] wrote: From: Rolf Turner [EMAIL PROTECTED] Subject: Re: [R] Lots of huge matrices, for-loops, speed To: [EMAIL PROTECTED] Cc: r-help@r-project.org, Zarza [EMAIL PROTECTED] Received: Monday, 7 July, 2008, 9:40 AM On 7/07/2008, at 11:05 AM, Moshe Olshansky wrote: Another possibility is to use the explicit formulae, i.e. if you are doing linear regression like y = a*x + b then the explicit formulae are: a = (meanXY - meanX*meanY)/(meanX2 - meanX^2) b = (meanY*meanX2 - meanX*meanXY)/(meanX2 - meanX^2) where meanX is mean(x), meanXY is mean(x*y), meanX2 is mean(x^2), etc. My understanding is that such formulae, while ``algebraically correct'', are highly unstable numerically and hence should be avoided. Others who are wiser and more knowledgeable than I may wish to comment or correct me. cheers, Rolf Turner
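To make the trade-off concrete, here is a minimal sketch (simulated data; all names are illustrative) comparing the explicit formulae with lm(). On small, well-scaled data like this the two agree to numerical precision, which is why the closed form is tempting despite the stability caveat above:

```r
set.seed(1)
x <- rnorm(80)                        # 80 entries, as in the thread
y <- 1 + 2 * x + rnorm(80, sd = 0.1)  # true intercept 1, slope 2

# Explicit formulae for simple linear regression y = a*x + b
a <- (mean(x * y) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
b <- (mean(y) * mean(x^2) - mean(x) * mean(x * y)) / (mean(x^2) - mean(x)^2)

# lm() returns coefficients in the order (intercept, slope) = (b, a)
fit <- unname(coef(lm(y ~ x)))
all.equal(c(b, a), fit)
```

On badly scaled data (e.g. x with a huge mean), mean(x^2) - mean(x)^2 cancels catastrophically, which is exactly Rolf's point; lm() uses a QR decomposition and avoids that.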
Re: [R] Plot Mixtures of Synthetically Generated Gamma Distributions
I know very little about graphics, so my primitive and brute-force solution would be:

plot(density(x[1:30]), col = "blue")
lines(density(x[31:60]), col = "red")
lines(density(x[61:90]), col = "green")

--- On Mon, 7/7/08, Gundala Viswanath [EMAIL PROTECTED] wrote: From: Gundala Viswanath [EMAIL PROTECTED] Subject: [R] Plot Mixtures of Synthetically Generated Gamma Distributions To: [EMAIL PROTECTED] Received: Monday, 7 July, 2008, 1:24 PM Hi, I have the following vector which is created from 3 distinct gamma distributions (three components):

x <- c(rgamma(30, shape = .2, scale = 14),
       rgamma(30, shape = 12, scale = 10),
       rgamma(30, shape = 5, scale = 6))

I want to plot the density curve of x in a way that shows the 3 distinct curves, one for each component. How can I do that? I tried this, but it doesn't work: lines(density(x)) Please advise. - Gundala Viswanath Jakarta - Indonesia
Re: [R] A regression problem using dummy variables
Do you have a reason to treat all 3 levels together rather than fit a separate regression for each level? --- On Tue, 1/7/08, rlearner309 [EMAIL PROTECTED] wrote: From: rlearner309 [EMAIL PROTECTED] Subject: [R] A regression problem using dummy variables To: r-help@r-project.org Received: Tuesday, 1 July, 2008, 11:38 PM This is actually more of a statistics problem: I have a dataset with two dummy variables controlling three levels. The problem is that one level does not have many observations compared with the other two (a couple of data points versus 1000+ points on the other levels). When I run the regression, the result is bad: I have unbalanced SE and VIF. Does this kind of problem also belong to the near-singularity problem? Does it make any difference if I code the level that lacks data as (0,0) instead of (0,1)? Thanks a lot! -- View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18214377.html Sent from the R help mailing list archive at Nabble.com.
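As a side note, R builds the dummy coding itself when the grouping variable is a factor, and the symptom described (a huge standard error on the sparse level) is easy to reproduce. A minimal sketch with simulated data (all names and group sizes are illustrative):

```r
set.seed(1)
# Three levels; level C has only a handful of observations
g <- factor(rep(c("A", "B", "C"), times = c(1000, 1000, 5)))
y <- c(rnorm(1000, mean = 0), rnorm(1000, mean = 1), rnorm(5, mean = 2))

fit <- lm(y ~ g)            # R creates the two dummies gB, gC automatically
coef(fit)                   # (Intercept) is the mean of the baseline level A
summary(fit)$coefficients[, "Std. Error"]  # SE for gC is much larger: few points
```

Recoding which level is the (0,0) baseline (relevel()) changes the interpretation of the coefficients, not the fit or the underlying precision problem.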
Re: [R] help_transformation
Let F be the distribution function of Y, PSI the standard normal distribution function and IPSI its inverse. Let f(x) = IPSI(F(x)). It is not difficult to see that f(Y) has a standard normal distribution. So you can replace F with the empirical distribution function, and IPSI is R's qnorm function. --- On Wed, 25/6/08, Indermaur Lukas [EMAIL PROTECTED] wrote: From: Indermaur Lukas [EMAIL PROTECTED] Subject: [R] help_transformation To: [EMAIL PROTECTED] Received: Wednesday, 25 June, 2008, 9:56 PM Heya, I am fitting a linear mixed-effects model to a response Y. Y shows an S-shaped distribution in QQ-plots (some zero values and some very high values). Hence, which transformation should I apply so that Y follows a normal distribution? Is any R function/package available to do this? Thanks for any hint. Regards, Lukas °°° Lukas Indermaur, PhD student eawag / Swiss Federal Institute of Aquatic Science and Technology ECO - Department of Aquatic Ecology Überlandstrasse 133 CH-8600 Dübendorf Switzerland Phone: +41 (0) 71 220 38 25 Fax: +41 (0) 44 823 53 15 Email: [EMAIL PROTECTED] www.lukasindermaur.ch
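A minimal sketch of this normal-scores transform on simulated data (the n/(n+1) rescaling is my addition, not part of the advice above; it keeps qnorm away from Inf at the sample maximum, where the empirical F equals 1):

```r
set.seed(1)
y <- rgamma(500, shape = 0.3)   # strongly skewed stand-in for the response
n <- length(y)

Fhat <- ecdf(y)                 # empirical distribution function F
z <- qnorm(Fhat(y) * n / (n + 1))  # f(y) = IPSI(F(y)), rescaled away from 1

c(mean(z), sd(z))               # should be close to 0 and 1
```

Note that for a continuous F the result is exactly standard normal; with point masses (such as many exact zeros in Y), ties in the empirical F mean the transformed values can only approximate normality.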
Re: [R] [SPAM] - constructing arbitrary (positive definite) covariance matrix - Found word(s) list error in the Text body
If the main diagonal elements of matrix A are 1 and the off-diagonal elements are a, then for any vector x we get t(x)*A*x = (1-a)*sum(x^2) + a*(sum(x))^2. If we want A to be positive (semi)definite we need this expression to be positive (non-negative) for any x != 0. Since sum(x)^2/sum(x^2) <= n, where n is the dimension of the matrix, and equality is attainable, we get that A is positive (semi)definite if and only if -1/(n-1) <= a <= 1 (sharp inequalities for positive definiteness). Since any symmetric positive (semi)definite matrix can be a covariance matrix, this describes all the matrices which satisfy the requirement. --- On Fri, 27/6/08, Patrick Burns [EMAIL PROTECTED] wrote: From: Patrick Burns [EMAIL PROTECTED] Subject: Re: [R] [SPAM] - constructing arbitrary (positive definite) covariance matrix - Found word(s) list error in the Text body To: [EMAIL PROTECTED] Cc: Mizanur Khondoker [EMAIL PROTECTED], r-help@r-project.org Received: Friday, 27 June, 2008, 3:15 AM To make David's approach a little more concrete: You can always have correlations all equal to 1 -- the variables are all the same, except for the names you've given them. You can have two variables with correlation -1, but you can't get a third variable that has -1 correlation to both of the first two. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) [EMAIL PROTECTED] wrote: Well, if you think about the geometry, all correlations equal usually won't work. Think of the SDs as the sides of a simplex and the correlations as the cosines of the angles between the sides (pick one variable as the 'origin'.) Only certain values will give a valid covariance or correlation matrix. HTH, David L. Reiner, PhD, Head Quant, Rho Trading Securities, LLC -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mizanur Khondoker Sent: Thursday, June 26, 2008 11:11 AM To: r-help@r-project.org Subject: [SPAM] - [R] constructing arbitrary (positive definite) covariance matrix - Found word(s) list error in the Text body Dear list, I am trying to use the 'mvrnorm' function from the MASS package for simulating multivariate Gaussian data with a given covariance matrix. The diagonal elements of my covariance matrix should be the same, i.e., all variables have the same marginal variance. Also all correlations between all pairs of variables should be identical, but could be any value in [-1,1]. The problem I am having is that the matrix I create is not always positive definite (and hence mvrnorm fails). Is there any simple way of constructing a covariance matrix of the above structure (equal variance, same pairwise correlation from [-1,1]) that will always be positive definite? I have noticed that covariance matrices created using the following COV function are positive definite for -0.5 < r < 1. However, for r < -0.5, the matrix is not positive definite. Does anyone have any idea why this is the case? For my simulation, I need to generate multivariate data for the whole range of r, [-1,1], for a given value of sd. Any help/suggestion would be greatly appreciated.
Examples:

COV <- function (p = 3, sd = 1, r = 0.5) {
    cov <- diag(sd^2, ncol = p, nrow = p)
    for (i in 1:p) {
        for (j in 1:p) {
            if (i != j) {
                cov[i, j] <- r * sd * sd
            }
        }
    }
    cov
}

library(MASS)

### Simulate multivariate Gaussian data (works OK)
Sigma <- COV(p = 3, sd = 2, r = 0.5)
mu <- 1:3
mvrnorm(5, mu = mu, Sigma = Sigma)
          [,1]     [,2]     [,3]
[1,] 1.2979984 1.843248 4.460891
[2,] 2.1061054 1.457201 3.774833
[3,] 2.1578538 2.761939 4.589977
[4,] 0.8775056 4.240710 2.203712
[5,] 0.2698180 2.075759 2.869573

### Simulate multivariate Gaussian data (gives error)
Sigma <- COV(p = 3, sd = 2, r = -0.6)
mu <- 1:3
mvrnorm(5, mu = mu, Sigma = Sigma)
Error in mvrnorm(5, mu = mu, Sigma = Sigma) : 'Sigma' is not positive definite
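Tying the two replies together: the eigenvalues of an equicorrelation matrix with correlation r are 1 + (p-1)*r (once) and 1 - r (p-1 times), so it is positive definite exactly when -1/(p-1) < r < 1, which is why p = 3 fails for r < -0.5. A minimal sketch verifying this numerically (equicor is a hypothetical helper name, and it also builds the matrix without loops):

```r
# Equicorrelation covariance matrix: sd^2 on the diagonal, r*sd^2 off it
equicor <- function(p, sd = 1, r = 0.5) {
  sd^2 * ((1 - r) * diag(p) + r * matrix(1, p, p))
}

# Smallest eigenvalue crosses zero at r = -1/(p-1); here p = 3, boundary -0.5
min(eigen(equicor(3, sd = 2, r = -0.5))$values)   # ~0: boundary case
min(eigen(equicor(3, sd = 2, r = -0.6))$values)   # negative: mvrnorm will fail
min(eigen(equicor(3, sd = 2, r = 0.5))$values)    # positive: valid
```

So for a fixed p there is no way to cover the whole range r in [-1, 1] with a valid covariance matrix; the feasible range shrinks as p grows.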
Re: [R] Measuring Goodness of a Matrix
What do you mean by A being similar to X? Do you mean the norm of the difference, similar eigenvalues/vectors, or anything else? --- On Wed, 25/6/08, Gundala Viswanath [EMAIL PROTECTED] wrote: From: Gundala Viswanath [EMAIL PROTECTED] Subject: [R] Measuring Goodness of a Matrix To: [EMAIL PROTECTED] Received: Wednesday, 25 June, 2008, 12:41 AM Hi all, Suppose I have 2 matrices A and B, and I want to measure how good each of these matrices is. So I intend to compare A and B with another gold-standard matrix X, meaning the more similar a matrix is to X, the better it is. What is the common way in R to measure matrix similarity (i.e. A vs X, and B vs X)? - Gundala Viswanath Jakarta - Indonesia
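If "similar" means small elementwise difference, the Frobenius norm of the difference is the usual choice; base R's norm() computes it directly. A minimal sketch with made-up matrices:

```r
A <- matrix(c(1, 0,   0,   1), 2, 2)   # candidate matrix
X <- matrix(c(1, 0.1, 0.1, 1), 2, 2)   # gold-standard matrix

# Frobenius norm: sqrt of the sum of squared entrywise differences
dist_frob <- norm(A - X, type = "F")
dist_frob                               # sqrt(0.02), about 0.141
```

The matrix with the smaller distance to X is then the "better" one under this criterion; spectral comparisons (eigen()) answer a different notion of similarity.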
Re: [R] Pairwise Partitioning of a Vector
One possibility is:

x <- c(30.9, 60.1, 70.0, 73.0, 75.0, 83.9, 93.1, 97.6, 98.8, 113.9)
for (i in 1:9) assign(paste("PAIR", i, sep = ""),
                      list(part1 = x[1:i], part2 = x[-(1:i)]))

--- On Mon, 23/6/08, Gundala Viswanath [EMAIL PROTECTED] wrote: From: Gundala Viswanath [EMAIL PROTECTED] Subject: [R] Pairwise Partitioning of a Vector To: [EMAIL PROTECTED] Received: Monday, 23 June, 2008, 4:13 PM Hi, How can I partition an example vector like this print(myvector) [1] 30.9 60.1 70.0 73.0 75.0 83.9 93.1 97.6 98.8 113.9 into the following pairwise partitions: PAIR1 part1 = 30.9 part2 = 60.1 70.0 73.0 75.0 83.9 93.1 97.6 98.8 113.9 PAIR2 part1 = 30.9 60.1 part2 = 70.0 73.0 75.0 83.9 93.1 97.6 98.8 113.9 ... PAIR9 part1 = 30.9 60.1 70.0 73.0 75.0 83.9 93.1 97.6 98.8 part2 = 113.9 I'm stuck with this kind of loop:

__BEGIN__
# gexp is a vector
process_two_partition <- function (gexp) {
    sort.gexp <- sort(as.matrix(gexp))
    print(sort.gexp)
    for (posb in 1:ncol(gexp)) {
        for (pose in 1:ncol(gexp)) {
            sp_b <- pose + 1
            sp_e <- ncol(gexp)
            # These two don't do what I want
            part1 <- sort.gexp[posb:pose]
            part2 <- sort.gexp[sp_b:sp_e]
            # we later want to process part1 and part2 separately
        }
    }
}
__END__

- Gundala Viswanath Jakarta - Indonesia
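A loop-free alternative that collects all nine splits into one list, rather than assigning nine separate PAIR objects into the workspace (the name pairs is my choice):

```r
x <- c(30.9, 60.1, 70.0, 73.0, 75.0, 83.9, 93.1, 97.6, 98.8, 113.9)

# One list element per split point: first i values vs the rest
pairs <- lapply(seq_len(length(x) - 1),
                function(i) list(part1 = x[1:i], part2 = x[-(1:i)]))

pairs[[2]]$part1   # 30.9 60.1
length(pairs)      # 9
```

Keeping the splits in a single list makes the later per-partition processing a simple lapply over pairs instead of get()-ing nine generated variable names.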