Re: [R] Principle Component Analysis in R

2005-04-04 Thread Jari Oksanen
On Tue, 2005-04-05 at 16:59 +1200, Brett Stansfield wrote:
> Dear R 
> Should I be concerned if the loadings to a Principle Component Analysis are
> as follows:
> 
> Loadings:
>       Comp.1 Comp.2 Comp.3 Comp.4
> X100m -0.500  0.558         0.661
> X200m -0.508  0.379  0.362 -0.683
> X400m -0.505 -0.274 -0.794 -0.197
> X800m -0.486 -0.686  0.486  0.239
> 
>                Comp.1 Comp.2 Comp.3 Comp.4
> SS loadings      1.00   1.00   1.00   1.00
> Proportion Var   0.25   0.25   0.25   0.25
> Cumulative Var   0.25   0.50   0.75   1.00
> 
> I just got concerned that no loading value was given for  X100m, component
> 3. I have looked at the data using list() and it all seems OK

You don't have to worry about one empty cell in loadings: the print
function (called behind the curtain to show the results to you) is so
clever that it doesn't show you small numbers, although they are there.
I guess this happens because people with Factor Analysis background
expect this. However, I would be worried if I got results like this, and
would not use Princip*al* Components at all, since none of the
components seems to be any more principal than others. Wouldn't original
data do?

cheers, jari oksanen
-- 
Jari Oksanen <[EMAIL PROTECTED]>
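
The blank cell comes from the cutoff argument of the print method for
loadings (default 0.1). The following is a minimal sketch on simulated
data; the data frame `times` and its columns are made up for illustration
and are not the poster's data. It shows how to display every loading, and
that the "SS loadings" row is 1 for every component simply because
princomp loadings are unit-length vectors. That row and the "Proportion
Var" row below it are computed from the loadings, so they do not measure
the variance explained; pc$sdev^2 (or summary(pc)) does.

```r
# Illustrative data only: four correlated columns with hypothetical names.
set.seed(1)
base  <- rnorm(50)
times <- data.frame(X100m = base + rnorm(50, sd = 0.3),
                    X200m = base + rnorm(50, sd = 0.3),
                    X400m = base + rnorm(50, sd = 0.5),
                    X800m = base + rnorm(50, sd = 0.7))

pc <- princomp(times, cor = TRUE)

print(loadings(pc), cutoff = 0)   # same print method, no cells blanked
L <- unclass(loadings(pc))        # plain matrix holding every loading

ss_loadings <- colSums(L^2)                 # 1 for each component
prop_var    <- pc$sdev^2 / sum(pc$sdev^2)   # variance actually explained
print(ss_loadings)
print(prop_var)
```

If the first entry of prop_var is close to 1, the first component carries
most of the variance even though the "Proportion Var" row of the loadings
printout shows 0.25 for each of four components.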

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Principle Component Analysis in R

2005-04-04 Thread Brett Stansfield
Dear R 
Should I be concerned if the loadings to a Principle Component Analysis are
as follows:

Loadings:
      Comp.1 Comp.2 Comp.3 Comp.4
X100m -0.500  0.558         0.661
X200m -0.508  0.379  0.362 -0.683
X400m -0.505 -0.274 -0.794 -0.197
X800m -0.486 -0.686  0.486  0.239

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00

I just got concerned that no loading value was given for  X100m, component
3. I have looked at the data using list() and it all seems OK

brett stansfield



[R] exclusion rules for propensity score matchng (pattern rec)

2005-04-04 Thread adiamond
Dear R-list,

i have 6 different sets of samples.  Each sample has about 5000 observations,
with each observation comprised of 150 baseline covariates (X), 125 of which
are dichotomous. Roughly 20% of the observations in each sample are "treatment"
and the rest are "control" units.

i am doing propensity score matching, i have already estimated propensity
scores(predicted probabilities) using logistic regression, and in each sample i
am going to have to exclude approximately 100 treated observations for which I
cannot find matching control observations (because the scores for these treated
units are outside the support of the scores for control units).

in each sample, i must identify an exclusion rule that is interpretable on the
scale of the X's that excludes these unmatchable treated observations and
excludes as FEW of the remaining treated observations as possible.
(the reason is that i want to be able to explain, in terms of the Xs, who the
individuals are that I am making causal inferences about.)

i've tried some simple stuff over the past few days and nothing's worked.
is there an R-package or algorithm, or even estimation strategy that anyone
could recommend?
(i am really hoping so!)

thank you,

alexis diamond
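
The common-support step described in the message (treated units whose
scores fall outside the range of the control scores) can be sketched as
below. This is only an illustration under stated assumptions: the data are
simulated, the covariates x1 and x2 and the object names are hypothetical,
and it flags the unmatchable units without solving the harder problem of
finding an interpretable exclusion rule on the X scale.

```r
# Simulated stand-in data: two covariates instead of 150.
set.seed(42)
n     <- 500
x1    <- rnorm(n)
x2    <- rbinom(n, 1, 0.5)
treat <- rbinom(n, 1, plogis(-1.5 + 1.2 * x1 + 0.5 * x2))

# Propensity scores from a logistic regression, as in the message.
fit <- glm(treat ~ x1 + x2, family = binomial)
ps  <- fitted(fit)

# Treated units whose score lies outside the support of the control scores.
ctrl_range  <- range(ps[treat == 0])
off_support <- treat == 1 & (ps < ctrl_range[1] | ps > ctrl_range[2])

sum(off_support)                          # number of unmatchable treated units
summary(data.frame(x1, x2)[off_support, ])  # what they look like on the X scale
```

The off_support indicator could then serve as the response in whatever
rule-finding method is tried on the covariates.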



Re: [R] hexbin and grid - input data values as coordinates

2005-04-04 Thread Paul Murrell
Hi
Thanks Martin and Adai.
This gives me a good starting point.
Paul

Adaikalavan Ramasamy wrote:
Thank you to Paul Murrell and Martin Maechler for their help.
pushHexport() and the rest of the codes have done the trick.
I spent the afternoon trying to code up something that might be used as
grid.abline() and grid.grid() before I read Martin's suggestion. Sigh. 

But here it is anyway in case you can salvage something out of my
inelegant solution.
mygrid.abline <- function(a=NULL, b=NULL, h=NULL, v=NULL,
  vps, gp=gpar(col=1, lwd=1) ) {
  # a, b, h and v are as documented in help(abline)
  # vps and gp are the viewport and its graph parameters
  if(!is.null(h)){ a <- h; b <- 0 }
  if(!is.null(v)){ a <- v; b <- Inf }  
   
  pushHexport(vps$plot.vp)
  
  xmin <- [EMAIL PROTECTED];  xmax <- [EMAIL PROTECTED]
  ymin <- [EMAIL PROTECTED];  ymax <- [EMAIL PROTECTED]
  
  x0 <- max( (ymin - a)/b, xmin )
  x0 <- min( x0, xmax )
  y0 <- a + b*x0
  
  x1 <- min( (ymax - a)/b, xmax )
  x1 <- max( x1, xmin )
  y1 <- a + b*x1

  if( !is.finite(b) ){   # fudge for vertical lines
    x0 <- a;    x1 <- a
    y0 <- ymin; y1 <- ymax
  } 
  
  grid.move.to( x0, y0, default.units="native" )
  grid.line.to( x1, y1, default.units="native", gp=gp )

  popViewport()
  invisible( c( x0=x0, y0=y0, x1=x1, y1=y1 ) )
}
mygrid.grid <- function(vps, nx=NULL, ny=nx, 
gp=gpar(col=8, lwd=1, lty=2)){

  xmin <- [EMAIL PROTECTED];  xmax <- [EMAIL PROTECTED]
  ymin <- [EMAIL PROTECTED];  ymax <- [EMAIL PROTECTED]
  if( is.null(nx) ){ # the default
    vv <- seq(as.integer(xmin), as.integer(xmax), by=1)
    hh <- seq(as.integer(ymin), as.integer(ymax), by=1)
  } else {
    vv <- seq( xmin, xmax, length.out = nx + 1 )
    hh <- seq( ymin, ymax, length.out = ny + 1 )
  }
  sapply( vv, function(v) mygrid.abline( v=v, vps=vps, gp=gp ) )
  sapply( hh, function(h) mygrid.abline( h=h, vps=vps, gp=gp ) )
  invisible()
}
# USAGE EXAMPLE
x <- rnorm(1000)
y <- 10 + 1.6*x + rnorm(1000) 

vps <- plot( hexbin( x, y ), style = "nested.lattice")
mygrid.abline( a=10, b=2, vps=vps,  gp=gpar(col=2, lwd=3) )
mygrid.abline( a=10, b=-2, vps=vps, gp=gpar(col=1, lty=2) )
mygrid.grid( vps )
Regards, Adai
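
The endpoint arithmetic in mygrid.abline() can be isolated from the
drawing code and checked on its own. The helper below is a hypothetical
sketch, not part of hexbin or grid: clip_abline() and its arguments are
names invented here. It returns the two endpoints of y = a + b*x clipped
to a rectangle given by xlim and ylim, or NULL when the line misses the
rectangle (a case the function above does not test for).

```r
# Hypothetical helper: clip the line y = a + b*x to the box xlim x ylim.
clip_abline <- function(a, b, xlim, ylim) {
  if (b == 0) {
    # horizontal line: visible only if its height is inside ylim
    if (a < ylim[1] || a > ylim[2]) return(NULL)
    xr <- xlim
  } else {
    # x values at which the line crosses the bottom and top of the box
    xs <- sort((ylim - a) / b)
    xr <- c(max(xs[1], xlim[1]), min(xs[2], xlim[2]))
    if (xr[1] > xr[2]) return(NULL)   # line does not pass through the box
  }
  c(x0 = xr[1], y0 = a + b * xr[1], x1 = xr[2], y1 = a + b * xr[2])
}

up   <- clip_abline(a = 10, b =  2, xlim = c(-3, 3), ylim = c(5, 15))
down <- clip_abline(a = 10, b = -2, xlim = c(-3, 3), ylim = c(5, 15))
miss <- clip_abline(a = 100, b = 0, xlim = c(-3, 3), ylim = c(5, 15))
print(up)
print(down)
```

The returned coordinates are in data ("native") units, so they could be
passed to grid.move.to() and grid.line.to() with default.units="native"
after the plot viewport has been pushed.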
On Fri, 2005-04-01 at 14:40 +0200, Martin Maechler wrote:
"Paul" == Paul Murrell <[EMAIL PROTECTED]>
   on Fri, 01 Apr 2005 10:45:16 +1200 writes:

   Paul> Hi Adaikalavan Ramasamy wrote:
   >> Dear all,
   >> 
   >> I am trying to use hexbin and read the very interesting
   >> article on grid (
   >> http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Murrell.pdf
   >> ) and am hoping for some advice from more experienced
   >> users of hexbin.
   >> 
   >> I am trying to visualise some data and fit a straight line
   >> through it. For example, here is how I would do it in the
   >> usual way
   >> 
   >> # simulate data
   >> x <- rnorm(1000)
   >> y <- 5*x + rnorm(1000, sd=0.5)
   >> 
   >> plot( x, y, pch="*" )
   >> abline(0, 1, col=2)
   >> 
   >> 
   >> And here is my failed attempt at fitting the "abline" on
   >> hexbin
   >> 
   >> library(hexbin); library(grid)
   >> plot( hexbin( x, y ), style = "nested.lattice")
   >> grid.move.to(0.2,0.2)
   >> grid.line.to(0.8,0.8)
   >> 
   >> I realise that grid.* is taking plotting coordinates on
   >> the graph but how do I tell it to use the coordinates
   >> based on the data values ? For my real data, I would like
   >> lines with different slopes and intercepts.

   Paul> gplot.hexbin() returns the viewports it used to
   Paul> produce the plot and the legend.  Here's an example of
   Paul> annotating the plot ...
   Paul>   # capture the viewports returned
   Paul>   vps <- plot( hexbin( x, y ), style = "nested.lattice")
   Paul>   # push the viewport corresponding to the plot
   Paul>   # this is actually a hexViewport rather than a plain grid viewport
   Paul>   # so you use pushHexport rather than grid's pushViewport
   Paul>   pushHexport(vps$plot.vp)
   Paul>   # use "native" coordinates to draw relative to the axis scales
   Paul>   grid.move.to(-2, -10, default.units="native")
   Paul>   grid.line.to(2, 10, default.units="native", gp=gpar(col="yellow", lwd=3))
   Paul>   # tidy up
   Paul>   popViewport()
   Paul> There's another annotation example at the bottom of
   Paul> the help page for gplot.hexbin
   Paul> A grid.abline() function would obviously be a useful
   Paul> addition.  Must find where I put my todo list ...
well, it seems to me that if you start with panel.abline() from
lattice, you're almost finished right from start.
But then, sometimes the distance between "almost" and
"completely" can become quite large...
Further, from the looks of it, if you finish it, panel.abline()
could become a simple wrapper around grid.abline().
Martin
   Paul> Paul
   >> I am using the hexbin version 1.2-0 ( which is the devel
   >> version ), R-2.0.1 and Fedora Core 3.
   >> 
   >> Many thanks in advance.
   >> 
   >> Regards, Adai




--
Dr Paul Murrell
Department of Statistics
The University of Au

Re: [R] Change density and angle in barplot

2005-04-04 Thread Paul Murrell
Hi
Jan Sabee wrote:
Dear R user,
I want to change each density and angle with symbol "+","-","o","#"
and "*". How can I do that?
library(gplots)
barplot2(VADeaths, 
 density=c(5,7,11,15,17), 
 angle=c(65,-45,45,-45,90),
 col = "black",
 legend = rownames(VADeaths))
title(main = list("Death Rates in Virginia", font = 4))

There is no general support for pattern fills in R graphics.
If you are really desperate to do this and are prepared to write some 
code yourself, please contact me off-list and I can suggest some 
starting points.

Paul
--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
[EMAIL PROTECTED]
http://www.stat.auckland.ac.nz/~paul/
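
One possible do-it-yourself starting point, offered only as a sketch and
not as the approach Paul had in mind: draw empty bars with base barplot()
and overlay a small grid of plotting symbols inside each bar. The symbol
spacing and offsets below are arbitrary choices, and the example uses one
column of VADeaths rather than the stacked barplot2() call in the question.

```r
# Illustrative workaround only; this is not a pattern-fill feature of R.
heights <- VADeaths[, "Rural Male"]     # one column of the built-in data
syms    <- c("+", "-", "o", "#", "*")   # one symbol per bar

# barplot() returns the x positions of the bar midpoints
mids <- barplot(heights, col = NA, ylim = c(0, max(heights) * 1.05),
                main = "Death Rates in Virginia (Rural Male)")

for (i in seq_along(heights)) {
  # small grid of positions inside bar i (default bar width is 1)
  g <- expand.grid(x = mids[i] + c(-0.3, 0, 0.3),
                   y = seq(2, heights[i] - 1, by = 4))
  points(g$x, g$y, pch = syms[i])
}
```

A legend can be built from the same symbols with legend(..., pch = syms).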


Re: [R] R package that has (much) the same capabilities as SAS v9 PROC GENMOD

2005-04-04 Thread Simon Blomberg
The questioner clearly wants generalized linear mixed models. lmer in 
package lme4 may be more appropriate (Prof. Bates is a co-author). 
glmmPQL should do the same job, though with less accuracy.

Simon.
check glm()
On Apr 4, 2005 6:46 PM, William M. Grove <[EMAIL PROTECTED]> wrote:
 I need capabilities, for my data analysis, like the Pinheiro & Bates
 S-Plus/R package nlme() but with binomial family and logit link.
 I need multiple crossed, possibly interacting fixed effects (age cohort of
 twin when entered study, sex of twin, sampling method used to acquire twin
 pair, and twin zygosity), a couple of random effects other than the cluster
 variable, and the ability to have a variable of the sort that P&B call
 "outer" to the clustering variable---zygosity.  Dependent variables are all
 parental (mom, dad separately of course) psychiatric diagnoses.
 In my data, twin pair ID is the clustering variable; correlations are
 expected to be exchangeable but substantially different between members of
 monozygotic twin pairs and members of dizygotic twin pairs.  Hence, in my
 analyses, the variable that's "outer" to twin pair is monozygotic vs.
 dizygotic which of course applies to the whole pair.
 nlme() does all that but requires quasi-continuous responses, according to
 the preface/intro of P&B's mixed models book and what I infer from online
 help (i.e., no family= or link= argument).
 The repeated() library by Lindsey seems to handle just one nested random
 effect, or so I believe I read while scanning backlogs of the R-Help list.
 glmmPQL() is in the ballpark of what I need, but once again seems to lack
 the "outer" variable specification that nlme() has, and which PROC GENMOD
 also has---and which I need.
 I read someplace of yags() that apparently uses GEE to estimate parameters
 of nonlinear models including GLIMs/mixed models, just the way PROC GENMOD
 (and many another program) does.  But on trying to install it (either
 v4.0-1.zip or v4.0-2.tar.gz from Carey's site, or Ripley's Windows port)
 from a local, downloaded zip file (or tar.gz file converted to zip file), I
 always get an error saying:
  > Error in file(file, "r") : unable to open connection
  > In addition: Warning message:
  > cannot open file `YAGS/DESCRIPTION'
 with no obvious solution.
 So I can't really try it out to see if it does what I want.
 You may ask:  Why not just use GENMOD and skip the R hassles?  Because I
 want to embed the GLIM/mixed model analysis in a stratified resampling
 bootstrapping loop.  Very easy to implement in R, moderately painful to do
 in SAS.
 Can anybody give me a lead, or some guidance, about getting this job done
 in R?  Thanks in advance for your help.
 Regards,
 Will Grove  | Iohannes Paulus PP. II, xxx
 Psychology Dept. |
 U. of Minnesota  |
 -+
 X-headers have PGP key info.; Call me at 612.625.1599 to verify 
key fingerprint
 before accepting signed mail as authentic!

 
 


--
WenSui Liu, MS MA
Senior Decision Support Analyst
Division of Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center

--
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Visiting Fellow
School of Botany & Zoology
The Australian National University
Canberra ACT 0200
Australia
T: +61 2 6125 8057  email: [EMAIL PROTECTED]
F: +61 2 6125 5573
CRICOS Provider # 00120C


RE: [R] locfit and memory allocation

2005-04-04 Thread Liaw, Andy
The code snippet you provided is nowhere near correct or sufficient for
anyone to help.  Please (re-)read the posting guide and try again.

Andy

> From: Mike Hickerson
> 
> Hello
> 
> I am getting memory allocation errors when running a function 
> that uses  
> locfit within a for loop.  After 25 or so loops, it gives this error.
> 
> "Error: cannot allocate vector of size 281250 Kb"
> 
> Running on linux cluster with a Gb of RAM.  Problem never 
> happens on my  
> OS X (less memory).  The total data is 130 cols by 5000 rows
> The first 129 cols are response variables, the 130th is the parameter
> The function fits a local regression between the 129 
> variables in the  
> ith row of m[ ] to the 129 variables in 5000 rows after m was 
> fed into  
> 130 different vectors called Var1, .Var129, and PARAMETER.
> 
> array <- scan(("DataFile"),nlines=5000)
>   m<-matrix(array,ncol=130,byrow=T)
> 
> for (i in 1:200)
> {
> result<-  
> function(m[i,c(1,,129)],PARAMETER,cbind(Var1,...,Var129)seq(1,len=5000),F)
> }
> 
> Any ideas on how to avoid this memory allocation problem would be  
> greatly appreciated.  Garbage collection? (or is that too slow?)
> 
> Many Thanks in Advance!
> 
> Mike
> 
> 
> 
> 
> Mike Hickerson
> University of California
> Museum of Vertebrate Zoology
> 3101 Valley Life Sciences Building
> Berkeley, California  94720-3160  USA
> voice 510-642-8911
> cell: 510-701-0861
> fax 510-643-8238
> [EMAIL PROTECTED]
> 



Re: [R] R package that has (much) the same capabilities as SAS v9 PROC GENMOD

2005-04-04 Thread Wensui Liu
check glm()

On Apr 4, 2005 6:46 PM, William M. Grove <[EMAIL PROTECTED]> wrote:
> I need capabilities, for my data analysis, like the Pinheiro & Bates
> S-Plus/R package nlme() but with binomial family and logit link.
> 
> I need multiple crossed, possibly interacting fixed effects (age cohort of
> twin when entered study, sex of twin, sampling method used to acquire twin
> pair, and twin zygosity), a couple of random effects other than the cluster
> variable, and the ability to have a variable of the sort that P&B call
> "outer" to the clustering variable---zygosity.  Dependent variables are all
> parental (mom, dad separately of course) psychiatric diagnoses.
> 
> In my data, twin pair ID is the clustering variable; correlations are
> expected to be exchangeable but substantially different between members of
> monozygotic twin pairs and members of dizygotic twin pairs.  Hence, in my
> analyses, the variable that's "outer" to twin pair is monozygotic vs.
> dizygotic which of course applies to the whole pair.
> 
> nlme() does all that but requires quasi-continuous responses, according to
> the preface/intro of P&B's mixed models book and what I infer from online
> help (i.e., no family= or link= argument).
> 
> The repeated() library by Lindsey seems to handle just one nested random
> effect, or so I believe I read while scanning backlogs of the R-Help list.
> 
> glmmPQL() is in the ballpark of what I need, but once again seems to lack
> the "outer" variable specification that nlme() has, and which PROC GENMOD
> also has---and which I need.
> 
> I read someplace of yags() that apparently uses GEE to estimate parameters
> of nonlinear models including GLIMs/mixed models, just the way PROC GENMOD
> (and many another program) does.  But on trying to install it (either
> v4.0-1.zip or v4.0-2.tar.gz from Carey's site, or Ripley's Windows port)
> from a local, downloaded zip file (or tar.gz file converted to zip file), I
> always get an error saying:
>  > Error in file(file, "r") : unable to open connection
>  > In addition: Warning message:
>  > cannot open file `YAGS/DESCRIPTION'
> with no obvious solution.
> 
> So I can't really try it out to see if it does what I want.
> 
> You may ask:  Why not just use GENMOD and skip the R hassles?  Because I
> want to embed the GLIM/mixed model analysis in a stratified resampling
> bootstrapping loop.  Very easy to implement in R, moderately painful to do
> in SAS.
> 
> Can anybody give me a lead, or some guidance, about getting this job done
> in R?  Thanks in advance for your help.
> 
> Regards,
> 
> Will Grove  | Iohannes Paulus PP. II, xxx
> Psychology Dept. |
> U. of Minnesota  |
> -+
> 
> X-headers have PGP key info.; Call me at 612.625.1599 to verify key 
> fingerprint
> before accepting signed mail as authentic!
> 
> 
> 
> 


-- 
WenSui Liu, MS MA
Senior Decision Support Analyst
Division of Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center



[R] R package that has (much) the same capabilities as SAS v9 PROC GENMOD

2005-04-04 Thread William M. Grove
I need capabilities, for my data analysis, like the Pinheiro & Bates 
S-Plus/R package nlme() but with binomial family and logit link.

I need multiple crossed, possibly interacting fixed effects (age cohort of 
twin when entered study, sex of twin, sampling method used to acquire twin 
pair, and twin zygosity), a couple of random effects other than the cluster 
variable, and the ability to have a variable of the sort that P&B call 
"outer" to the clustering variable---zygosity.  Dependent variables are all 
parental (mom, dad separately of course) psychiatric diagnoses.

In my data, twin pair ID is the clustering variable; correlations are 
expected to be exchangeable but substantially different between members of 
monozygotic twin pairs and members of dizygotic twin pairs.  Hence, in my 
analyses, the variable that's "outer" to twin pair is monozygotic vs. 
dizygotic which of course applies to the whole pair.

nlme() does all that but requires quasi-continuous responses, according to 
the preface/intro of P&B's mixed models book and what I infer from online 
help (i.e., no family= or link= argument).

The repeated() library by Lindsey seems to handle just one nested random 
effect, or so I believe I read while scanning backlogs of the R-Help list.

glmmPQL() is in the ballpark of what I need, but once again seems to lack 
the "outer" variable specification that nlme() has, and which PROC GENMOD 
also has---and which I need.

I read someplace of yags() that apparently uses GEE to estimate parameters 
of nonlinear models including GLIMs/mixed models, just the way PROC GENMOD 
(and many another program) does.  But on trying to install it (either 
v4.0-1.zip or v4.0-2.tar.gz from Carey's site, or Ripley's Windows port) 
from a local, downloaded zip file (or tar.gz file converted to zip file), I 
always get an error saying:
> Error in file(file, "r") : unable to open connection
> In addition: Warning message:
> cannot open file `YAGS/DESCRIPTION'
with no obvious solution.

So I can't really try it out to see if it does what I want.
You may ask:  Why not just use GENMOD and skip the R hassles?  Because I 
want to embed the GLIM/mixed model analysis in a stratified resampling 
bootstrapping loop.  Very easy to implement in R, moderately painful to do 
in SAS.

Can anybody give me a lead, or some guidance, about getting this job done 
in R?  Thanks in advance for your help.

Regards,
Will Grove  | Iohannes Paulus PP. II, xxx
Psychology Dept. |
U. of Minnesota  |
-+
X-headers have PGP key info.; Call me at 612.625.1599 to verify key fingerprint
before accepting signed mail as authentic!
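
The stratified resampling loop mentioned in the message can be sketched
as follows. Everything here is an assumption made for illustration: the
data are simulated, the column names (pair, zyg, sex, dx) are invented,
and a plain glm() stands in for whichever mixed or GEE model is finally
chosen. The point is only the resampling scheme: whole twin pairs are
drawn with replacement, separately within each zygosity stratum.

```r
# Simulated twin data; all names are hypothetical.
set.seed(7)
n_pairs <- 120
pairs <- data.frame(pair = seq_len(n_pairs),
                    zyg  = rep(c("MZ", "DZ"), each = n_pairs / 2))
twins <- pairs[rep(seq_len(n_pairs), each = 2), ]   # two rows per pair
twins$sex <- rbinom(nrow(twins), 1, 0.5)
twins$dx  <- rbinom(nrow(twins), 1,
                    plogis(-0.5 + 0.6 * (twins$zyg == "MZ")))

boot_once <- function() {
  # resample pair IDs with replacement within each zygosity stratum
  ids <- unlist(lapply(split(pairs$pair, pairs$zyg),
                       function(p) sample(p, replace = TRUE)))
  # rebuild the data set, keeping both members of every sampled pair
  rows <- unlist(lapply(ids, function(i) which(twins$pair == i)))
  d <- twins[rows, ]
  # stand-in model; replace with the real GLMM/GEE fit
  coef(glm(dx ~ zyg + sex, family = binomial, data = d))
}

boot <- t(replicate(50, boot_once()))   # one row per bootstrap replicate
apply(boot, 2, sd)                      # bootstrap standard errors
```

Fifty replicates keep the example quick; a real analysis would use many
more.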





RE: [R] a question about box counting

2005-04-04 Thread Ben Fairbank
Perhaps the following, substituting your vectors of x and y for
runif(1)

> x<-trunc(100*runif(1))
> y<-trunc(100*runif(1))/100
> length(unique(x+y))
[1] 6390

Ben Fairbank

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Rajarshi Guha
Sent: Monday, April 04, 2005 1:23 PM
To: R
Subject: [R] a question about box counting

Hi,
  I have a set of x,y data points and each data point lies between (0,0)
and (1,1). Of this set I have selected all those that lie in the lower
triangle (of the plot of these points).

What I would like to do is to divide the region (0,0) to (1,1) into
cells of say, side = 0.01 and then count the number of cells that
contain a point.

My first approach is to generate the coordinates of these cells and then
loop over the point list to see whether a point lies in a cell or not.

However this seems to be very inefficient especially since I will have
1000's of points.

Has anybody dealt with this type of problem and are there routines to
handle it?


---
Rajarshi Guha <[EMAIL PROTECTED]> 
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
Alone, adj.: In bad company.
-- Ambrose Bierce, "The Devil's Dictionary"



[R] locfit and memory allocation

2005-04-04 Thread Mike Hickerson
Hello

I am getting memory allocation errors when running a function that uses  
locfit within a for loop.  After 25 or so loops, it gives this error.

"Error: cannot allocate vector of size 281250 Kb"

Running on linux cluster with a Gb of RAM.  Problem never happens on my  
OS X (less memory).  The total data is 130 cols by 5000 rows
The first 129 cols are response variables, the 130th is the parameter
The function fits a local regression between the 129 variables in the  
ith row of m[ ] to the 129 variables in 5000 rows after m was fed into  
130 different vectors called Var1, .Var129, and PARAMETER.

array <- scan(("DataFile"),nlines=5000)
  m<-matrix(array,ncol=130,byrow=T)

for (i in 1:200)
{
result<-  
function(m[i,c(1,,129)],PARAMETER,cbind(Var1,...,Var129)seq(1,len=5000),F)
}

Any ideas on how to avoid this memory allocation problem would be  
greatly appreciated.  Garbage collection? (or is that too slow?)

Many Thanks in Advance!

Mike
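
On the garbage-collection question: R collects garbage automatically, so
an explicit gc() rarely cures an allocation failure by itself. What often
helps is keeping only the numbers needed from each iteration instead of
whole fit objects. The loop below is a generic sketch of that pattern, not
a reconstruction of the poster's function: the data are simulated, the
names are hypothetical, and loess() from base R stands in for the locfit
call.

```r
# Simulated stand-in data; 'd' and its columns are hypothetical.
set.seed(3)
d <- data.frame(x = rnorm(200), y = rnorm(200))

n_iter <- 20
est <- numeric(n_iter)           # preallocate one number per iteration

for (i in seq_len(n_iter)) {
  fit <- loess(y ~ x, data = d)  # stand-in for the local regression fit
  # keep only the fitted value at the i-th observation
  est[i] <- predict(fit, newdata = data.frame(x = d$x[i]))
  rm(fit)                        # drop the large object before the next pass
}
invisible(gc())                  # optional; R would do this when needed anyway

head(est)
```

If memory still grows across iterations with this pattern, the growth is
more likely inside the fitting function than in the loop.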




Mike Hickerson
University of California
Museum of Vertebrate Zoology
3101 Valley Life Sciences Building
Berkeley, California  94720-3160  USA
voice 510-642-8911
cell: 510-701-0861
fax 510-643-8238
[EMAIL PROTECTED]



Re: [R] a question about box counting

2005-04-04 Thread Ray Brownrigg
I said:
> myfun <- function(x, y, ints) {
>   fx <- x %/% (1/ints)
>   fy <- y %/% (1/ints)
>   txy <- hist(fx + ints*fy+ 1, breaks=0:(ints*ints), plot=FALSE)$counts
>   dim(fxy) <- c(ints, ints)
>       ^^^
>   return(txy)
> }
Of course it should be:
  dim(txy) <- c(ints, ints)
      ^^^

Sorry about that,
Ray



Re: [R] a question about box counting

2005-04-04 Thread Rajarshi Guha
On Mon, 2005-04-04 at 14:22 -0400, Rajarshi Guha wrote:
> Hi,
>   I have a set of x,y data points and each data point lies between (0,0)
> and (1,1). Of this set I have selected all those that lie in the lower
> triangle (of the plot of these points).
> 
> What I would like to do is to divide the region (0,0) to (1,1) into
> cells of say, side = 0.01 and then count the number of cells that
> contain a point.

Thanks very much to Deepayan Sarkar, James Holtman and Ray Brownrigg for
very efficient (and elegant) solutions. I've summarized them below:

Deepayan Sarkar

A combination of cut and table/xtabs should do it, e.g.:


x <- runif(3000)
y <- runif(3000)

fx <- cut(x, breaks = seq(0, 1, length = 101))
fy <- cut(y, breaks = seq(0, 1, length = 101))

txy <- xtabs(~ fx + fy)
image(txy > 0)
sum(txy > 0)

-
James Holtman

Here is a start.  This creates a dataframe and then divides the data up
into 10 segments (you wanted 100, so extend it) and then counts the number
in each cell.


> df <- data.frame(x=runif(100), y=runif(100))  # create data
> breaks <- seq(0,1,.1)  # define breaks; you would use 0.01
> table(cut(df$x, breaks=breaks, labels=F), cut(df$y, breaks=breaks, labels=F))
# use 'cut' to partition and then 'table' to count

     1 2 3 4 5 6 7 8 9 10
  1  0 2 0 1 0 3 0 1 0 0
  2  0 1 0 0 0 2 1 2 0 0
  3  0 1 0 0 3 0 2 2 1 2
  4  0 0 1 2 3 3 1 2 2 0
  5  3 1 2 2 1 2 1 1 1 0
  6  2 0 2 0 0 0 0 1 0 0
  7  0 1 1 1 2 1 1 1 2 1
  8  0 3 2 1 1 2 2 2 1 1
  9  0 0 2 2 0 1 2 0 2 2
  10 0 2 1 0 0 0 0 0 0 3

-
Ray Brownrigg

Another significantly faster way (but not generating row/column names)
is:
x <- runif(3000)
y <- runif(3000)
ints <- 100
myfun <- function(x, y, ints) {
  fx <- x %/% (1/ints)
  fy <- y %/% (1/ints)
  txy <- hist(fx + ints*fy+ 1, breaks=0:(ints*ints), plot=FALSE)$counts
  dim(txy) <- c(ints, ints)
  return(txy)
}
myfun(x, y, ints)
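
If only the number of occupied cells is needed, and not the full table of
counts, the three approaches above reduce to counting distinct cell
indices. A minimal sketch follows; the function name count_occupied and
its side argument are invented here, and points are assumed to lie in the
unit square.

```r
# Hypothetical helper: number of side x side cells containing >= 1 point.
count_occupied <- function(x, y, side = 0.01) {
  n  <- round(1 / side)                 # cells per axis
  ix <- pmin(floor(x / side), n - 1)    # column index 0..n-1 (x = 1 -> last)
  iy <- pmin(floor(y / side), n - 1)    # row index 0..n-1
  length(unique(ix + n * iy))           # one integer id per cell
}

x <- runif(3000)
y <- runif(3000)
count_occupied(x, y)

# two points in the same cell plus one elsewhere occupy two cells
small <- count_occupied(c(0.001, 0.002, 0.5), c(0.001, 0.003, 0.5))
print(small)
```

This avoids building the 100 x 100 table, at the cost of not knowing how
many points fall in each cell.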


---
Rajarshi Guha <[EMAIL PROTECTED]> 
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
Q: Why did the mathematician name his dog "Cauchy"?
A: Because he left a residue at every pole.



Re: [R] custom loss function + nonlinear models

2005-04-04 Thread Kjetil Brinchmann Halvorsen
Christian Mora wrote:
Hi all;
I'm trying to fit a reparameterization of the
asymptotic regression model as that shown in
Ratkowsky (1990) page 96. 

Y~y1+((y2-y1)*(1-((y2-y3)/(y3-y1))^(2*(X-x1)/(x2-x1))))/(1-((y2-y3)/(y3-y1))^2)
where y1,y2,y3 are expected-values for X=x1, X=x2, and
X=average(x1,x2), respectively.
I tried first with Statistica v7 by LS and
Gauss-Newton algorithm without success (no
convergence: predictors are redundant). Then I
tried with the option CUSTOM LOSS FUNCTION and several
algorithms like Quasi-Newton, Simplex, Hooke-Jeeves,
among others. In all these cases the model converged
to some values for the parameters in it.
My question is (after searching the help pages) : Is
there such a thing implemented in R or can it be
easily implemented? In other words, is it possible to
define which loss function to use and the algorithm to
find the parameters estimates? 

Thanks
Christian

 

try directly with optim()
Kjetil
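
A minimal sketch of the optim() route, under stated assumptions: the data
are simulated, the function names asym_ev and loss are invented here, the
model is written in the expected-value form (it returns y1 at X=x1, y2 at
X=x2 and y3 at the midpoint), and the sum of absolute residuals is just
one example of a user-chosen loss. Nelder-Mead is optim()'s simplex
method; method="BFGS" would be the quasi-Newton counterpart.

```r
# Expected-value parameterisation: par = c(y1, y2, y3).
asym_ev <- function(par, X, x1, x2) {
  y1 <- par[1]; y2 <- par[2]; y3 <- par[3]
  m  <- (y2 - y3) / (y3 - y1)
  y1 + (y2 - y1) * (1 - m^(2 * (X - x1) / (x2 - x1))) / (1 - m^2)
}

# Simulated data with known parameters (illustration only).
set.seed(11)
x1 <- 0; x2 <- 10
X  <- seq(x1, x2, length.out = 40)
truth <- c(2, 9, 7)
Y  <- asym_ev(truth, X, x1, x2) + rnorm(length(X), sd = 0.2)

# Custom loss: sum of absolute residuals; a large value where the
# model cannot be evaluated (e.g. negative base, fractional power).
loss <- function(par) {
  val <- sum(abs(Y - asym_ev(par, X, x1, x2)))
  if (is.finite(val)) val else 1e10
}

start <- c(1.5, 8.5, 6.5)
fit <- optim(start, loss, method = "Nelder-Mead")
fit$par     # estimates of y1, y2, y3
fit$value   # loss at the optimum
```

Any other loss can be swapped in by changing the body of loss(); the
choice of algorithm is the method argument.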
--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
  --  Mahdi Elmandjra



Re: [R] a question about box counting

2005-04-04 Thread Ray Brownrigg
> From: Deepayan Sarkar <[EMAIL PROTECTED]> Mon, 4 Apr 2005 13:52:48 -0500
> 
> On Monday 04 April 2005 13:22, Rajarshi Guha wrote:
> > Hi,
> >   I have a set of x,y data points and each data point lies between
> > (0,0) and (1,1). Of this set I have selected all those that lie in
> > the lower triangle (of the plot of these points).
> >
> > What I would like to do is to divide the region (0,0) to (1,1) into
> > cells of say, side = 0.01 and then count the number of cells that
> > contain a point.
> >
> > My first approach is to generate the coordinates of these cells and
> > then loop over the point list to see whether a point lies in a cell
> > or not.
> >
> > However this seems to be very inefficient especially since I will
> > have 1000's of points.
> >
> > Has anybody dealt with this type of problem and are there routines to
> > handle it?
> 
> A combination of cut and table/xtabs should do it, e.g.:
> 
> 
> x <- runif(3000)
> y <- runif(3000)
> 
> fx <- cut(x, breaks = seq(0, 1, length = 101))
> fy <- cut(y, breaks = seq(0, 1, length = 101))
> 
> txy <- xtabs(~ fx + fy)
> :

Another significantly faster way (but not generating row/column names)
is:
x <- runif(3000)
y <- runif(3000)
ints <- 100
myfun <- function(x, y, ints) {
  fx <- x %/% (1/ints)
  fy <- y %/% (1/ints)
  txy <- hist(fx + ints*fy+ 1, breaks=0:(ints*ints), plot=FALSE)$counts
  dim(fxy) <- c(ints, ints)
  return(txy)
}
myfun(x, y, ints)

Hope this helps,
Ray Brownrigg



[R] custom loss function + nonlinear models

2005-04-04 Thread Christian Mora
Hi all;

I'm trying to fit a reparameterization of the
assymptotic regression model as that shown in
Ratkowsky (1990) page 96. 

Y ~ y1 + (y2-y1)*(1-((y2-y3)/(y3-y1))^(2*(X-x1)/(x2-x1)))/(1-((y2-y3)/(y3-y1))^2)

where y1,y2,y3 are expected-values for X=x1, X=x2, and
X=average(x1,x2), respectively.

I tried first with Statistica v7 by LS and
Gauss-Newton algorithm without success (no
convergence: predictors are redundant). Then I
tried with the option CUSTOM LOSS FUNCTION and several
algorithms like Quasi-Newton, Simplex, Hooke-Jeeves,
among others. In all these cases the model converged
to some values for the parameters in it.

My question is (after searching the help pages) : Is
there such a thing implemented in R or can it be
easily implemented? In other words, is it possible to
define which loss function to use and the algorithm to
find the parameter estimates?

Thanks
Christian
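
One way this could look in R is to write the loss as an ordinary function and hand it to optim(), which offers Nelder-Mead (simplex), BFGS (quasi-Newton) and other methods; nls() covers the plain least-squares case. The following is only a sketch under stated assumptions: the data are simulated, x1 and x2 are fixed at 0 and 10, the absolute-error loss is an arbitrary choice, and the names asym, loss and start are illustrative.

```r
set.seed(123)
x1 <- 0; x2 <- 10                      # assumed reference X values

# Reparameterized asymptotic regression: y1, y2, y3 are the expected
# responses at X = x1, X = x2 and X = (x1 + x2) / 2
asym <- function(X, y1, y2, y3) {
  r <- (y2 - y3) / (y3 - y1)
  y1 + (y2 - y1) * (1 - r^(2 * (X - x1) / (x2 - x1))) / (1 - r^2)
}

# Simulated data, purely for illustration
X <- seq(x1, x2, length.out = 25)
Y <- asym(X, y1 = 2, y2 = 10, y3 = 8) + rnorm(length(X), sd = 0.2)

# Custom loss: sum of absolute errors, with a large penalty where the
# model is undefined (e.g. a negative base raised to a fractional power)
loss <- function(par) {
  pred <- asym(X, par[1], par[2], par[3])
  if (any(!is.finite(pred))) return(1e10)
  sum(abs(Y - pred))
}

start <- c(y1 = 1, y2 = 9, y3 = 7)
fit <- optim(start, loss, method = "Nelder-Mead")
fit$par     # parameter estimates
fit$value   # loss at the estimates
```

Changing the method argument switches the algorithm; changing the body of loss() switches the criterion.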



Re: [R] Package 'outliers' (Dixon, Grubbs)

2005-04-04 Thread Lukasz Komsta
On 2005-04-04 20:59, Ben Fairbank wrote:
Forbidden
You don't have permission to access /outliers/ on this server.

Bad Options in httpd.conf, just corrected. Thanks.


Re: [R] a question about box counting

2005-04-04 Thread Deepayan Sarkar
On Monday 04 April 2005 13:22, Rajarshi Guha wrote:
> Hi,
>   I have a set of x,y data points and each data point lies between
> (0,0) and (1,1). Of this set I have selected all those that lie in
> the lower triangle (of the plot of these points).
>
> What I would like to do is to divide the region (0,0) to (1,1) into
> cells of say, side = 0.01 and then count the number of cells that
> contain a point.
>
> My first approach is to generate the coordinates of these cells and
> then loop over the point list to see whether a point lies in a cell
> or not.
>
> However this seems to be very inefficient especially since I will
> have 1000's of points.
>
> Has anybody dealt with this type of problem and are there routines to
> handle it?

A combination of cut and table/xtabs should do it, e.g.:


x <- runif(3000)
y <- runif(3000)

fx <- cut(x, breaks = seq(0, 1, length = 101))
fy <- cut(y, breaks = seq(0, 1, length = 101))

txy <- xtabs(~ fx + fy)
image(txy > 0)
sum(txy > 0)


Deepayan



Re: [R] Data set for loglinear analysis

2005-04-04 Thread Kjetil Brinchmann Halvorsen
Warfield Jr., Joseph D. wrote:
> Dear users
> I need to perform a loglinear analysis of a real data set for a course
> project.  I need a real data set with a contingency table of at least 3
> dimensions, each with more than 2 levels.
>
> Thanks
> Joe Warfield


Do
data(package="datasets")
and look.
maybe data(UCBAdmissions)
Kjetil
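
A short sketch of how the suggested table could be inspected and fitted. It uses loglin() from the stats package; the choice of model (all two-way interactions, no three-way term) is only an illustration. Note that UCBAdmissions has two levels for both Admit and Gender, so it may not fully satisfy the "more than 2 levels" requirement.

```r
# To list everything shipped with the 'datasets' package, run:
# data(package = "datasets")

# UCBAdmissions is a 3-way contingency table: Admit x Gender x Dept
tab_dims <- dim(UCBAdmissions)
tab_dims

# Log-linear model with all two-way interactions and no three-way term,
# fitted by iterative proportional fitting
fit <- loglin(UCBAdmissions,
              margin = list(c(1, 2), c(1, 3), c(2, 3)),
              fit = TRUE, print = FALSE)

fit$lrt   # likelihood-ratio statistic
fit$df    # residual degrees of freedom
```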
--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
  --  Mahdi Elmandjra



[R] Package 'outliers' (Dixon, Grubbs)

2005-04-04 Thread Lukasz Komsta
Dear useRs,
I have just uploaded to CRAN the first version of my package "outliers" for
testing data for outlying observations. It contains all types of the Dixon
and Grubbs tests and the Cochran test for outlying variance.

Until it is placed in the package collection, the files are also available at my
homepage, http://www.komsta.net/outliers/. I will remove them after
the package is added to CRAN.

Documentation in pdf and dvi is also supplied.
I will greatly appreciate any comments and bug reports.
Greetings,
Lukasz


Re: [R] coalesce in its

2005-04-04 Thread Achim Zeileis
On Mon, 4 Apr 2005 13:46:46 -0400 Omar Lakkis wrote:

> I have two data sets that I converted to its objects to get:
> 
> > ts1 
> date  settle
> 1 2000-09-29 107.830
> 2 2000-10-02 108.210
> 3 2000-10-03 108.800
> 4 2000-10-04 108.800
> 5 2000-10-05 109.155
> > ts2
> date  settle
> 1 2000-09-25 107.610
> 2 2000-09-26 107.585
> 3 2000-09-27 107.385
> 4 2000-09-28 107.595
> 5 2000-09-29 107.875
> 6 2000-10-02 108.805
> 7 2000-10-03 108.665
> 8 2000-10-04 109.280
> 9 2000-10-05 109.290
> 
> I want to get a list of the values of ts1 with the missing dates
> substituted from ts2. When I do union(ts1,ts2) I get
> 
> > u
>  1   1
> 2000-09-24  NA 107.610
> 2000-09-25  NA 107.585
> 2000-09-26  NA 107.385
> 2000-09-27  NA 107.595
> 2000-09-28 107.830 107.875
> 2000-10-01 108.210 108.805
> 2000-10-02 108.800 108.665
> 2000-10-03 108.800 109.280
> 2000-10-04 109.155 109.290
> 
> Other than looping, is there a way to get the first column with NA
> values substituted from the second column?

If I understand you correctly, you want:

R> ts3 <- union(ts1, ts2)
R> repIndex <- is.na(ts3[,1])
R> ts3[repIndex, 1] <- ts3[repIndex, 2]
R> ts3[,1]
settle
2000-09-25 107.610
2000-09-26 107.585
2000-09-27 107.385
2000-09-28 107.595
2000-09-29 107.830
2000-10-02 108.210
2000-10-03 108.800
2000-10-04 108.800
2000-10-05 109.155

hth,
Z
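
For readers without the its package, the same coalescing step can be sketched in base R on plain data frames. This is an illustration only, not the its API: the data frames are rebuilt by hand from the values posted above, and merge() plus ifelse() stand in for union() and the indexed assignment.

```r
ts1 <- data.frame(
  date   = as.Date(c("2000-09-29", "2000-10-02", "2000-10-03",
                     "2000-10-04", "2000-10-05")),
  settle = c(107.830, 108.210, 108.800, 108.800, 109.155)
)
ts2 <- data.frame(
  date   = as.Date(c("2000-09-25", "2000-09-26", "2000-09-27",
                     "2000-09-28", "2000-09-29", "2000-10-02",
                     "2000-10-03", "2000-10-04", "2000-10-05")),
  settle = c(107.610, 107.585, 107.385, 107.595, 107.875,
             108.805, 108.665, 109.280, 109.290)
)

# Outer join on date: settle.1 comes from ts1, settle.2 from ts2
m <- merge(ts1, ts2, by = "date", all = TRUE, suffixes = c(".1", ".2"))

# Take ts1 where available, otherwise fall back to ts2
m$settle <- ifelse(is.na(m$settle.1), m$settle.2, m$settle.1)
coalesced <- m[, c("date", "settle")]
coalesced
```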



[R] a question about box counting

2005-04-04 Thread Rajarshi Guha
Hi,
  I have a set of x,y data points and each data point lies between (0,0)
and (1,1). Of this set I have selected all those that lie in the lower
triangle (of the plot of these points).

What I would like to do is to divide the region (0,0) to (1,1) into
cells of say, side = 0.01 and then count the number of cells that
contain a point.

My first approach is to generate the coordinates of these cells and then
loop over the point list to see whether a point lies in a cell or not.

However this seems to be very inefficient especially since I will have
1000's of points.

Has anybody dealt with this type of problem and are there routines to
handle it?


---
Rajarshi Guha <[EMAIL PROTECTED]> 
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
Alone, adj.: In bad company.
-- Ambrose Bierce, "The Devil's Dictionary"



[R] coalesce in its

2005-04-04 Thread Omar Lakkis
I have two data sets that I converted to its objects to get:

> ts1 
date  settle
1 2000-09-29 107.830
2 2000-10-02 108.210
3 2000-10-03 108.800
4 2000-10-04 108.800
5 2000-10-05 109.155
> ts2
date  settle
1 2000-09-25 107.610
2 2000-09-26 107.585
3 2000-09-27 107.385
4 2000-09-28 107.595
5 2000-09-29 107.875
6 2000-10-02 108.805
7 2000-10-03 108.665
8 2000-10-04 109.280
9 2000-10-05 109.290

I want to get a list of the values of ts1 with the missing dates
substituted from ts2. When I do union(ts1,ts2) I get

> u
 1   1
2000-09-24  NA 107.610
2000-09-25  NA 107.585
2000-09-26  NA 107.385
2000-09-27  NA 107.595
2000-09-28 107.830 107.875
2000-10-01 108.210 108.805
2000-10-02 108.800 108.665
2000-10-03 108.800 109.280
2000-10-04 109.155 109.290

Other than looping, is there a way to get the first column with NA
values substituted from the second column?



[R] Data set for loglinear analysis

2005-04-04 Thread Warfield Jr., Joseph D.
Dear users

I need to perform a loglinear analysis of a real data set for a course
project.  I need a real data set with a contingency table of at least 3
dimensions, each with more than 2 levels.

Thanks
Joe Warfield  



RE: [R] Amount of memory under different OS

2005-04-04 Thread bogdan romocea
You need another OS. Standard/32-bit Windows (XP, 2000 etc) can't use
more than 4 GB of RAM. Anyway, if you try to buy a box with 16 GB of
RAM, the seller will probably warn you about Windows and recommend a
suitable OS.


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Saturday, April 02, 2005 12:48 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Amount of memory under different OS


Hi,
I have a problem: I need to perform a very tough analysis, so I would like
to buy a new computer with about 16 GB of RAM. Is it possible to use all
this memory under Windows, or do I have to install another OS?
Thanks,


  Marco



Re: [R] help with kolmogorov smirnov test

2005-04-04 Thread James Reilly
Agnes Gault wrote:
> Hello!
> I am an 'R beginner'. I am trying to check if my data follow a negative
> binomial function.
> the command i've typed in is:
>
>  >  nbdo=rnegbin(58,mu=27.82759,theta=0.7349851)
>  > ks.test(do$DO,nbdo)
> Each time i do that, p given is different

The p-values are different each time because you are using a two-sample 
test, where one of the samples is randomly generated (and thus will be 
different each time). ks.test offers a one-sample test against a 
specified distribution, but this will still have problems with the ties.
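
The difference between the two forms can be sketched as follows. Everything here is an assumption for illustration: obs is simulated data standing in for do$DO, and rnbinom()/pnbinom() from the stats package are used in place of MASS::rnegbin(), with size playing the role of theta. Warnings about ties are suppressed only to keep the output short; the problem with ties remains.

```r
set.seed(1)
mu    <- 27.82759
theta <- 0.7349851

# Hypothetical stand-in for the observed counts do$DO
obs <- rnbinom(58, size = theta, mu = mu)

# Two-sample test against a fresh random sample each time:
# the p-value changes from call to call
p_two <- replicate(20, suppressWarnings(
  ks.test(obs, rnbinom(58, size = theta, mu = mu))$p.value))

# One-sample test against the specified distribution:
# the p-value is the same on every call
p_one <- replicate(20, suppressWarnings(
  ks.test(obs, "pnbinom", size = theta, mu = mu)$p.value))

range(p_two)
unique(p_one)
```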

---
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand


Re: [R] mysql retrive question

2005-04-04 Thread David James
simone gabbriellini wrote:
> hello R-Users,
> I have this simple but not for me question:
> 
> I do:
> 
>  > res<-dbSendQuery(con, "SELECT * FROM tabellaProva")
>  > myDataFrame<-fetch(res)
>  > myDataMatrix<-as.matrix(myDataFrame[,-1])
>  > rownames(myDataMatrix)<-as.character(myDataFrame[,1])
> 
> and I have:
> 
>io  tu
> io  "0" "1"
> tu  "1" "0"
> 
> my problem is that the content of the matrix is interpreted by R as 
> strings, not as numbers.
> Is there a way to convert those characters to numbers like
> 
>  io  tu
> io  0 1
> tu  1 0
> 
> thanx in advance,
> simone
> 

Hi Simone,

If you use dbReadTable, as I mentioned in my previous email, you should 
be able to coerce myDataFrame to a numeric matrix.

A couple of extra observations: 
(1) If you really want to use fetch() to extract all the rows resulting
from a SELECT statement in a single fetch, you may need to specify 
n=-1, e.g.,

   > fetch(res, n = -1)

otherwise you may only get the first 500 rows. (See ?fetch, ?MySQL, 
and ?dbHasCompleted.) The reason there is this default is to prevent 
crashing R with a very large and unexpected amount of data. By specifying 
n=-1 you're effectively asserting that the output of SELECT can be 
properly handled by R.

(2) Tables in a relational database are only superficially similar to
data.frames (the SQL term "relation" for tables conveys semantics
that do not exist in R), thus fetch() and dbReadTable() do not
coerce their columns to factors.  Clearly, there is a need to allow
users to specify their own converters, as other interfaces (e.g.,
RSPython, RSPerl), and functions (e.g., read.table) actually provide.

Hope this helps,

--
David



Re: [R] scan html: sep = ""

2005-04-04 Thread Uwe Ligges
Christoph Lehmann wrote:
entry from html:
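  <tr bgcolor=#9090f0><td align=right><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.00</td></tr>
  <tr bgcolor=#9090f0><td align=right><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.00</td></tr>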


 using
left.data<- scan(paste(path, left.file, sep = ""), what = 'character',
   sep=c("", ""))
yields
 > left.data
 [1] "  "  "tr bgcolor=#9090f0>" "td align=right>"
 [4] "b>BM""/b>" "/td>"
 [7] "td> 0.952"   "/td>""td> 0.136"
[10] "/td>""td> 6.984"   "/td>"
[13] "td>0.00" "/td>""/tr>"
[16] "  "  "tr bgcolor=#9090f0>" "td align=right>"
[19] "b>BH""/b>" "/td>"
[22] "td> 1.338"   "/td>""td> 0.136"
[25] "/td>""td> 9.821"   "/td>"
[28] "td>0.00" "/td>""/tr>"
why doesn't it detect the whole ' as sep?
Uwe Ligges wrote:
Christoph Lehmann wrote:
Hi
I try to import html text and I need to split the fields at each  
or  entry

How can I succeed? sep = '' doens't yield the right result

If it fits pairwise together, use
  sep=c("", "")
Apologies, one should not send untested code.
"sep" must be a single character rather than a string containing more than one
character.

So you may want to try out my second suggestion.
Uwe Ligges


if not, you can read the whole lot with readLines and strsplit for 
both pattern after that, for example.

Uwe Ligges

thanks for hints


RE: [R] scan html: sep = ""

2005-04-04 Thread Eric Lecoutre
You can import the whole thing and use on it "strsplit"

?strsplit

Eric
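
A minimal sketch of the strsplit() route. The two input strings are an assumption: they are rebuilt from the scan() output shown elsewhere in this thread, and in practice they would come from readLines() on the real file. The regular expression simply treats every tag as a separator, which may be too crude for more complicated HTML.

```r
# Assumed input lines; normally: html <- readLines(paste(path, left.file, sep = ""))
html <- c(
  "  <tr bgcolor=#9090f0><td align=right><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.00</td></tr>",
  "  <tr bgcolor=#9090f0><td align=right><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.00</td></tr>"
)

# Split each line at every tag, i.e. anything of the form <...>
pieces <- strsplit(html, "<[^>]*>")

# Trim surrounding whitespace and drop the empty strings left between tags
fields <- lapply(pieces, function(p) {
  p <- gsub("^[[:space:]]+|[[:space:]]+$", "", p)
  p[nzchar(p)]
})

# One row per table row: label first, then the numeric columns
tab <- do.call(rbind, fields)
values <- apply(tab[, -1, drop = FALSE], 2, as.numeric)
rownames(values) <- tab[, 1]
values
```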

Eric Lecoutre
UCL /  Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
[EMAIL PROTECTED]
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward
Tufte   


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Christoph Lehmann
> Sent: lundi 4 avril 2005 16:51
> To: r-help@stat.math.ethz.ch
> Subject: [R] scan html: sep = ""
> 
> 
> Hi
> I try to import html text and I need to split the fields at 
> each  or 
>  entry
> 
> How can I succeed? sep = '' doens't yield the right result
> 
> thanks for hints
> 


Re: [R] RMySQL question

2005-04-04 Thread David James
simone gabbriellini wrote:
> Dear List,
> I have this little problem:
> 
> I work with adjacency matrices like:
> 
> data  me you
> me0   1
> you   1   0
> 
> I store those matrix in a mysql database
> 
> actually I use RMySQL with:
> 
> res<-dbSendQuery(connection, "SELECT * FROM table")
> myDataFrame<-fetch(res)
> 
> to retrieve the table, and I have
> 
> data me  you
> 1  io 0 1
> 2  tu 1 0
> 
> I would like the first column to be seen not as data, but as label, 
> like:
> 
> data me  you
> io  0 1
> tu  1 0
> 
> should I change something in the table structure in mysql, or should I 
> tell R something particular like "the first column is not data"? If so, 
> how?
> 
> hope I have expressed well my intent
> thanx in advance
> 
> simone
> 

Hi,

In this case, since you are extracting all the rows you could simply try

  > dbReadTable(connection, "table", row.names = 1)

See help(dbReadTable) for more details.

Hope this helps,

--
David



Re: [R] help with kolmogorov smirnov test

2005-04-04 Thread Uwe Ligges
Agnes Gault wrote:
Hello!
I am an 'R beginner'. I am trying to check if my data follow a negative 
binomial function.
the command i've typed in is:

 >  nbdo=rnegbin(58,mu=27.82759,theta=0.7349851)
 > ks.test(do$DO,nbdo)
Each time i do that, p given is different and i get this warning message:
'Warning message:
cannot compute correct p-values with ties in: ks.test(do$DO, nbdo) '
Could someone tell me what's wrong? What does 'with ties in' mean?
If some values are duplicated, you have so called ties. Please read some 
textbook on this problem. This is not an R issue.

Uwe Ligges

Thank you!
--- 

Agnès GAULT
graduate student
UMR 5173 MNHN-CNRS
case postale 50
Species Conservation, Restoration and Population Survey (CERSP)
61 rue Buffon, 1er étage
75005 PARIS
FRANCE
Tel: 33 (0)1 40 79 57 64
Fax: 33 (0)1 40 79 38 35
Email: [EMAIL PROTECTED]


[R] help with kolmogorov smirnov test

2005-04-04 Thread Ken Knoblauch

What does 'with ties in' mean?

with some identical elements (e.g., at least one tied pair)

HTH


Ken Knoblauch
Inserm U371, Cerveau et Vision
Department of Cognitive Neurosciences
18 avenue du Doyen Lepine
69675 Bron cedex
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: 06 84 10 64 10
http://www.lyon.inserm.fr/371/



Re: [R] scan html: sep = ""

2005-04-04 Thread Uwe Ligges
Christoph Lehmann wrote:
Hi
I try to import html text and I need to split the fields at each  or 
 entry

How can I succeed? sep = '' doens't yield the right result
If it fits pairwise together, use
  sep=c("", "")
if not, you can read the whole lot with readLines and strsplit for both 
pattern after that, for example.

Uwe Ligges

thanks for hints


Re: [R] plotting mathematical notation and values substitution

2005-04-04 Thread Sundar Dorai-Raj

Luca Scrucca wrote on 4/4/2005 1:50 PM:
Dear R-users,
I'm trying to add a title on a plot with both mathematical notation and
values substitution. I read the documentation and search the mailing list
but I was not able to solve my problem. Actually, there is a message by
Uwe Ligges on June 2003 which addresses a question very close to mine, but
the code provided doesn't work. The code is the following:
# I add this to let you run the example by copy and paste
t1 <- 0.5; len <- 1
# then
plot(1:10,
   main = substitute("Monotonic Multigamma run (" * n == len * ", " *
theta == t1 * ").", list(len = len, t1 = t1)))
but I got the following:
Error: syntax error
I also tried with just one value substitution:

plot(1:10,
   main = substitute("Monotonic Multigamma run (" theta == t1 * ").",
list(len = len, t1 = t1)))
which works fine. How can I have more than one value substitution,
together with mathematical notation and text?
Thanks in advance for any reply.
Luca Scrucca

Luca,
I believe you need paste in this instance:
t1 <- 0.5; len <- 1
plot(1:10, main = substitute(paste("Monotonic Multigamma run (", n == 
len, ", ", theta == t1, ").", sep = ""), list(len = len, t1 = t1)))

--sundar


Re: [R] double backslashes in usage section of .Rd files

2005-04-04 Thread Uwe Ligges
joerg van den hoff wrote:
I have written a package, where a function definition includes a regexp 
pattern including double backslashes, such as

myfunction <- function (pattern = ".*\\.txt$")
when I R CMD CHECK the corresponding .Rd file, I get warnings 
(code/documentation mismatch), if I enforce two backslashes in the 
documentation print out by

\usage { myfunction (pattern = ".*.txt$") }
have I to live with this or is there a way to avoid the warnings (apart 
from being satisfied with a wrong manpage ...)?
Can you check with R-devel please? I think this has been fixed.
Uwe Ligges

regards
joerg


Re: [R] mysql retrive question

2005-04-04 Thread Peter Dalgaard
simone gabbriellini <[EMAIL PROTECTED]> writes:

> hello R-Users,
> I have this simple but not for me question:
> 
> I do:
> 
>  > res<-dbSendQuery(con, "SELECT * FROM tabellaProva")
>  > myDataFrame<-fetch(res)
>  > myDataMatrix<-as.matrix(myDataFrame[,-1])
>  > rownames(myDataMatrix)<-as.character(myDataFrame[,1])
> 
> and I have:
> 
>io  tu
> io  "0" "1"
> tu  "1" "0"
> 
> my problem is that the content of the matrix is interpreted by R as
> strings, not as numbers.
> Is there a way to convert those characters to numbers like
> 
>  io  tu
> io  0 1
> tu  1 0

 mode(m)<-"numeric" should do the trick
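
A small sketch of that conversion on a character matrix shaped like the one in the question. The matrix is built by hand here, standing in for the one derived from the database:

```r
# Character matrix with row and column labels, as in the question
m <- matrix(c("0", "1", "1", "0"), nrow = 2,
            dimnames = list(c("io", "tu"), c("io", "tu")))

# Change the storage mode in place; dimensions and dimnames are kept
mode(m) <- "numeric"
m
```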

It looks a bit odd that you seem get numeric data from mySql as mode
"character", but I don't know enough about the interface to say why. 


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



[R] help with kolmogorov smirnov test

2005-04-04 Thread Agnes Gault
Hello!
I am an 'R beginner'. I am trying to check if my data follow a negative 
binomial function.
the command i've typed in is:

>  nbdo=rnegbin(58,mu=27.82759,theta=0.7349851)
> ks.test(do$DO,nbdo)
Each time i do that, p given is different and i get this warning message:
'Warning message:
cannot compute correct p-values with ties in: ks.test(do$DO, nbdo) '
Could someone tell me what's wrong? What does 'with ties in' mean?
Thank you!
---
Agnès GAULT
graduate student
UMR 5173 MNHN-CNRS
case postale 50
Species Conservation, Restoration and Population Survey (CERSP)
61 rue Buffon, 1er étage
75005 PARIS
FRANCE
Tel: 33 (0)1 40 79 57 64
Fax: 33 (0)1 40 79 38 35
Email: [EMAIL PROTECTED]


[R] scan html: sep = ""

2005-04-04 Thread Christoph Lehmann
Hi
I try to import html text and I need to split the fields at each  or 
 entry

How can I succeed? sep = '' doens't yield the right result
thanks for hints


[R] double backslashes in usage section of .Rd files

2005-04-04 Thread joerg van den hoff
I have written a package, where a function definition includes a regexp 
pattern including double backslashes, such as

myfunction <- function (pattern = ".*\\.txt$")
when I R CMD CHECK the corresponding .Rd file, I get warnings 
(code/documentation mismatch), if I enforce two backslashes in the 
documentation print out by

\usage { myfunction (pattern = ".*.txt$") }
have I to live with this or is there a way to avoid the warnings (apart 
from being satisfied with a wrong manpage ...)?

regards
joerg


[R] mysql retrive question

2005-04-04 Thread simone gabbriellini
hello R-Users,
I have this simple but not for me question:
I do:
> res<-dbSendQuery(con, "SELECT * FROM tabellaProva")
> myDataFrame<-fetch(res)
> myDataMatrix<-as.matrix(myDataFrame[,-1])
> rownames(myDataMatrix)<-as.character(myDataFrame[,1])
and I have:
  io  tu
io  "0" "1"
tu  "1" "0"
my problem is that the content of the matrix is interpreted by R as 
strings, not as numbers.
Is there a way to convert those characters to numbers like

io  tu
io  0 1
tu  1 0
thanx in advance,
simone


Re: [R] RMySQL question

2005-04-04 Thread Don MacQueen
Use the rownames() function to set the rownames equal to your first 
column, and then drop the first column.

I don't know if there is a way to do it during retrieval from MySQL.
-Don
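
The rownames() approach can be sketched like this. The data frame is typed in by hand as a stand-in for what fetch(res) returns, so the column name data and the values are assumptions taken from the question:

```r
# Hypothetical result of fetch(res): labels in the first column
myDataFrame <- data.frame(data = c("io", "tu"),
                          me   = c(0, 1),
                          you  = c(1, 0),
                          stringsAsFactors = FALSE)

# Use the first column as row names, then drop it
rownames(myDataFrame) <- myDataFrame$data
myDataFrame <- myDataFrame[, -1]

# Optional: convert to a numeric adjacency matrix
adj <- as.matrix(myDataFrame)
adj
```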
At 11:10 PM +0200 4/2/05, simone gabbriellini wrote:
Dear List,
I have this little problem:
I work with adjacency matrices like:
datame you
me  0   1
you 1   0
I store those matrix in a mysql database
actually I use RMySQL with:
res<-dbSendQuery(connection, "SELECT * FROM table")
myDataFrame<-fetch(res)
to retrieve the table, and I have
   data me  you
1  io 0 1
2  tu 1 0
I would like the first column to be seen not as data, but as label, like:
data me  you
io  0 1
tu  1 0
should I change something in the table structure in mysql, or should 
I tell R something particular like "the first column is not data"? 
If so, how?

hope I have expressed well my intent
thanx in advance
simone

--
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA


Re: [R] Error in save.image(): image could not be renamed

2005-04-04 Thread Peter Dalgaard
[EMAIL PROTECTED] writes:

> Hello,
> 
> I am doing intensive tests on SVM parameter selection. Once in a while I get the
> error:
>  Error in save.image(): image could not be renamed and is left in .RDataTmp1


> I cannot use the information saved in .RDataTmp1.

Why? Anything wrong with load(".RDataTmp1") ?? Or renaming it manually
to .RData ?
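
Both options can be tried safely in a scratch directory. The sketch below is an illustration under assumptions: the temporary image is created with save() in tempdir() and the object name results is made up; with a real leftover file only the load() or file.rename() step would be needed.

```r
# Create a stand-in for the leftover temporary image
tmp <- file.path(tempdir(), ".RDataTmp1")
results <- list(cost = 10, gamma = 0.1)   # hypothetical test results
save(results, file = tmp)
rm(results)

# Option 1: load the temporary image directly (here into its own environment)
env <- new.env()
loaded <- load(tmp, envir = env)
loaded          # names of the restored objects
env$results

# Option 2: rename it by hand to the usual workspace file name
final <- file.path(tempdir(), ".RData")
renamed <- file.rename(tmp, final)
```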

> When that happens I lose
> several hours of tests. It usually happens when the computer is locked, i.e.,
> there are no other relevant processes running. I can run tests and get the
> problem, then repeat exactly the same tests and everything runs o.k.


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



[R] Error in save.image(): image could not be renamed

2005-04-04 Thread jmoreira
Hello,

I am doing intensive tests on SVM parameter selection. Once in a while I get the
error:
 Error in save.image(): image could not be renamed and is left in .RDataTmp1
I cannot use the information saved in .RDataTmp1. When that happens I lose
several hours of tests. It usually happens when the computer is locked, i.e.,
there are no other relevant processes running. I can run tests and get the
problem, then repeat exactly the same tests and everything runs o.k.

Thanks for any help

Joao Moreira



Categorizing functions (was: [R] is there a function like %in% for characters?)

2005-04-04 Thread Duncan Murdoch
On Sat, 2 Apr 2005 23:00:43 -0500, Terry Mu <[EMAIL PROTECTED]> wrote :

>thx, that's perfect. I thought of grep(), it also can do this.
>
>I wonder if there is a document or book that explains things
>categorically so it's easy to look up a function.

The HTML help does this:  try help.start(), and look at "search engine
and keywords".  

help.search() also has a keyword argument, but you need to know the
keywords to know what to look for.  ?help.search shows you how to find
them.
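
As a small illustration of the keyword route, the sketch below searches the base package for help pages tagged with the keyword "character". Restricting the search to base with the package argument is just a choice to keep it quick.

```r
# Help pages in 'base' tagged with the keyword "character"
res <- help.search(keyword = "character", package = "base")

# The matches are held in a data frame inside the result
n_matches <- nrow(res$matches)
n_matches

# The list of standard keywords ships with R:
# file.show(file.path(R.home("doc"), "KEYWORDS"))
```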

Duncan Murdoch



RE: [R] How to extrct F value

2005-04-04 Thread John Fox
Dear Xin Meng,

This output presumably was produced by summarizing an object produced by aov(). 
The trick to figuring out what you want to do is to examine the structure of 
the summary object (say, sumry), via str(sumry). In this case sumry[["Error: 
Within"]][[1]]$"F value"[1] should do what you want.
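
A self-contained sketch of that extraction. The data are simulated (4 subjects measured at 3 time points, chosen to give the same degrees of freedom as in the question) and the names d, fit and f_val are illustrative:

```r
set.seed(42)
d <- data.frame(
  subject = factor(rep(1:4, each = 3)),
  t       = factor(rep(1:3, times = 4))
)
d$y <- rnorm(12) + rep(c(0, 5, 10), times = 4)

# Repeated measures ANOVA with subject as the error stratum
fit   <- aov(y ~ t + Error(subject), data = d)
sumry <- summary(fit)

# str(sumry) shows the nesting: a list of strata, each holding one table
within_tab <- sumry[["Error: Within"]][[1]]
f_val <- within_tab$"F value"[1]
f_val
```

For many genes the same lines could be wrapped in a function and applied with sapply(), collecting one F value per gene.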

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of ??
> Sent: Sunday, April 03, 2005 10:36 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] How to extrct F value
> 
> Hello sir:
> Here's the result of repeated measures ANOVA.
> 
> 
> $"Error: Within"
>   Df Sum Sq Mean Sq F valuePr(>F)
> t  2 524177  262089  258.24 1.514e-06 ***
> Residuals  6   60891015  
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> 
> 
> My question is: How to extract the F value only?
> If the result is a dataframe,it'll be much better for 
> extracting F value.But it isn't.
> I'll perform the ANOVA for many genes and rank the F value.So 
> the only useful item is F value. But how to extract the F value?
> 
> Thanks from the bottom of my heart!
> 
> 
> 
> 
> --
> ***
> Xin Meng
> Capitalbio Corporation
> National Engineering Research Center
> for Beijing Biochip Technology
> Microarray and Bioinformatics Dept. 
> Research Engineer
> Tel: +86-10-80715888/80726868-6364/6333
> Fax: +86-10-80726790
> [EMAIL PROTECTED]
> Address:18 Life Science Parkway,
> Changping District, Beijing 102206, China 
> Website:http://www.capitalbio.com
> 


Re: [R] plotting mathematical notation and values substitution

2005-04-04 Thread Peter Dalgaard
Luca Scrucca <[EMAIL PROTECTED]> writes:

> Dear R-users,
> 
> I'm trying to add a title to a plot that combines mathematical notation
> with value substitution. I read the documentation and searched the mailing
> list, but I was not able to solve my problem. There is a message by
> Uwe Ligges from June 2003 which addresses a question very close to mine, but
> the code provided doesn't work. The code is the following:
> 
> # I add this to let you run the example by copy and paste
> > t1 <- 0.5; len <- 1
> # then
> > plot(1:10,
>main = substitute("Monotonic Multigamma run (" * n == len * ", " *
> theta == t1 * ").", list(len = len, t1 = t1)))
> 
> but I got the following:
> 
> Error: syntax error

There's a ")" too many, but more importantly, you have the structure 

A*B==C*D*E==F*G

Since * has higher precedence than ==, this involves associative use
of relational operators (as in 3 < 2 < 1), which is syntactically
forbidden. So you need braces as in

A*{B==C}*D*{E==F}*G

or, maybe easier to read, use:

paste(A, B==C, D, E==F, G)
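
Applied to the title in question, the paste() form might look like the
following sketch. The variable name ttl is introduced here only for
illustration; everything else comes from the original example.

```r
t1 <- 0.5; len <- 1

## substitute() replaces len and t1 by their values inside the
## unevaluated expression; paste() juxtaposes the pieces in plotmath.
ttl <- substitute(paste("Monotonic Multigamma run (", n == len, ", ",
                        theta == t1, ")."),
                  list(len = len, t1 = t1))

plot(1:10, main = ttl)
```

Because each relational expression is now a separate argument to
paste(), no two == operators end up chained in the same expression.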


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] plotting mathematical notation and values substitution

2005-04-04 Thread Rich FitzJohn
Gidday,

See ?plotmath and demo(plotmath) for lots of information on plotting
with mathematical symbols.

This produces what you seem to be after (paste() being the missing
ingredient):
plot(1:10, main=substitute(paste("Monotonic Multigamma run ( *",
 list(n==len, theta==t1), " * )"),
 list(len=len, t1=t1)))

This does seem to be a lot of work just to get a theta symbol, and
this seems just as informative:
plot(1:10, main=paste("Monotonic Multigamma run ( * n =", len,
 "theta =", t1, " * )"))

Your second example gives a syntax error for me.
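
A possible alternative to substitute() with an explicit list is
bquote(), which fills in values marked with .(). This is only a sketch;
the name ttl is made up for the example.

```r
t1 <- 0.5; len <- 1

## .(name) is replaced by the current value of name; everything else
## is kept unevaluated for plotmath.
ttl <- bquote(paste("Monotonic Multigamma run (", n == .(len), ", ",
                    theta == .(t1), ")."))

plot(1:10, main = ttl)
```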

Cheers!
Rich

On Apr 5, 2005 6:50 AM, Luca Scrucca <[EMAIL PROTECTED]> wrote:
> # I add this to let you run the example by copy and paste
> > t1 <- 0.5; len <- 1
> # then
> > plot(1:10,
>main = substitute("Monotonic Multigamma run (" * n == len * ", " *
> theta == t1 * ").", list(len = len, t1 = t1)))
> 
> but I got the following:
> 
> Error: syntax error
> 
> I also tried with just one value substitution:
> 
> > plot(1:10,
>main = substitute("Monotonic Multigamma run (" theta == t1 * ").",
> list(len = len, t1 = t1)))
> 
> which works fine. How can I have more than one value substitution,
> together with mathematical notation and text?

--
Rich FitzJohn
rich.fitzjohn  gmail.com   |http://homepages.paradise.net.nz/richa183
  You are in a maze of twisty little functions, all alike



[R] Change density and angle in barplot

2005-04-04 Thread Jan Sabee
Dear R user,

I would like to fill each bar with a symbol ("+", "-", "o", "#" and "*")
instead of the shading lines set by density and angle. How can I do that?

library(gplots)
barplot2(VADeaths, 
 density=c(5,7,11,15,17), 
 angle=c(65,-45,45,-45,90),
 col = "black",
 legend = rownames(VADeaths))
title(main = list("Death Rates in Virginia", font = 4))

Thanks for your help.
Jan Sabee



Re: Re[2]: [R] need any advises for code optimization.

2005-04-04 Thread Rich FitzJohn
Hi again,

The arguments to %in% are in the wrong order in your version.  Because
of that, the statement
  row.names(to.drop) %in% row.names(whole)
has one element per row of to.drop, all of them TRUE, so which() returns
1:nrow(to.drop) and the first nrow(to.drop) rows of whole are dropped.

To fix it, just switch the order around, or use the simpler version:
  whole[!row.names(whole) %in% row.names(to.drop),]

The fact that your row names are different to the row indices in whole
will be what is causing the error when trying my variant.
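
The difference between the two argument orders can be seen on a toy
data frame. The data below are invented for illustration; only the two
subsetting expressions come from the thread.

```r
## Toy data: row names deliberately differ from row positions
whole <- data.frame(x = 1:6,
                    row.names = c("10", "20", "30", "40", "50", "60"))
to.drop <- whole[c(2, 5), , drop = FALSE]   # rows named "20" and "50"

## Wrong order: one TRUE per row of to.drop, so which() gives 1:2
## and the first two rows of whole are removed instead.
wrong <- whole[-which(row.names(to.drop) %in% row.names(whole)), ,
               drop = FALSE]

## Right order: one logical per row of whole
right <- whole[!row.names(whole) %in% row.names(to.drop), ,
               drop = FALSE]

row.names(wrong)
row.names(right)
```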

Cheers,
Rich

On Apr 4, 2005 10:21 PM, Wladimir Eremeev <[EMAIL PROTECTED]> wrote:
> Dear Rich,
> 
> Thank you for your reply. I think the optimization you offered will
> satisfy my needs.
> I don't completely understand the following.
> 
> RF> ## And wrap the original in a function for comparison:
> RF>   ## This does not subset the way you want:
> RF>   ##  whole[-which(row.names(to.drop) %in% row.names(whole)),]
> RF>   whole[-as.integer(row.names(to.drop)),]
> 
> Why doesn't my subset work properly?
> 
> My data frame 'whole' was created from three other data frames by rbind,
> if that matters...
> 
> Moreover, your variant gives the error:
> 
> > as.integer(row.names(to.drop)[120:220])
>   [1]   2761   3616   3629   5808   7204   7627   8192  10851  20275 273611   4492 256691   8797
>  [14]  11756  46673 246981 250401 335591    773    774    786    993    995   1454   2715   6990
>  [27]   7951   7962   8185   8662   9406 442100 478100 528100 208710 211710 215910  19846  28660
>  [40]  28661  28691  28806  28878 450611 497411  81672  91572 119232 166191 166281 203981 204201
>  [53] 255171 255212 255301 300651 331212 371761 397651 405241 415331   8779 195510 197910 203210
>  [66] 205410 205510 211810 220610  19615  27165  28581  28640  28641  28642  28662  28714  48692
>  [79] 449611 449911 497211  81702 195451 202491 202551 253931 255071 259102 266971 303341 331831
>  [92] 353912 371931 374612 394461 397641 412671   9227 464100   1558   2161
> > whole[-as.integer(row.names(to.drop)[120:220]),]
> Error in "[.data.frame"(whole, -as.integer(row.names(to.drop)[120:220]),  :
> subscript out of bounds
> 
> Row names don't coincide with row order numbers in my case.
> 
> --
> Best regards
> Wladimir Eremeev mailto:[EMAIL PROTECTED]
> 
> 


-- 
--
Rich FitzJohn
rich.fitzjohn  gmail.com   |http://homepages.paradise.net.nz/richa183
  You are in a maze of twisty little functions, all alike



Re[2]: [R] need any advises for code optimization.

2005-04-04 Thread Wladimir Eremeev
Dear Rich,

Thank you for your reply. I think the optimization you offered will
satisfy my needs.
I don't completely understand the following.

RF> ## And wrap the original in a function for comparison:
RF>   ## This does not subset the way you want:
RF>   ##  whole[-which(row.names(to.drop) %in% row.names(whole)),]
RF>   whole[-as.integer(row.names(to.drop)),]

Why doesn't my subset work properly?

My data frame 'whole' was created from three other data frames by rbind,
if that matters...

Moreover, your variant gives the error:

> as.integer(row.names(to.drop)[120:220])
  [1]   2761   3616   3629   5808   7204   7627   8192  10851  20275 273611   4492 256691   8797
 [14]  11756  46673 246981 250401 335591    773    774    786    993    995   1454   2715   6990
 [27]   7951   7962   8185   8662   9406 442100 478100 528100 208710 211710 215910  19846  28660
 [40]  28661  28691  28806  28878 450611 497411  81672  91572 119232 166191 166281 203981 204201
 [53] 255171 255212 255301 300651 331212 371761 397651 405241 415331   8779 195510 197910 203210
 [66] 205410 205510 211810 220610  19615  27165  28581  28640  28641  28642  28662  28714  48692
 [79] 449611 449911 497211  81702 195451 202491 202551 253931 255071 259102 266971 303341 331831
 [92] 353912 371931 374612 394461 397641 412671   9227 464100   1558   2161
> whole[-as.integer(row.names(to.drop)[120:220]),]
Error in "[.data.frame"(whole, -as.integer(row.names(to.drop)[120:220]),  : 
subscript out of bounds

Row names don't coincide with row order numbers in my case.
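
When row names are labels rather than positions, one way to turn them
into positions is match(). The following is a minimal sketch on made-up
data (the values and the names pos and kept are assumptions for the
example), not the code used in the thread.

```r
## Made-up data whose row names are labels, not positions
whole <- data.frame(c = c(5, 7, 9, 11),
                    row.names = c("2761", "3616", "4492", "8797"))
to.drop <- whole[c(1, 3), , drop = FALSE]

## match() gives the position in whole of each row name of to.drop
pos <- match(row.names(to.drop), row.names(whole))

## Negative positions are safe here because pos is non-empty;
## with an empty to.drop, whole[-pos, ] would select no rows at all.
kept <- whole[-pos, , drop = FALSE]
row.names(kept)
```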

--
Best regards
Wladimir Eremeev mailto:[EMAIL PROTECTED]



RE: [R] emacs + R?

2005-04-04 Thread Rau, Roland
Hi, 

> 
> However, as I try to start R within emacs as recommended:
> C-u M-x R
> emacs answers [no match]
> 
> the same if I provide the whole path to  the executable:
> C-u M-x /usr/bin/R[no match]
> 

given you have installed ESS (Emacs Speaks Statistics), you can start an
R session within Emacs easily like this:
M-x R

It just worked for me using:
XEmacs 21.4.13 on Windows XP.


Best,
Roland




[R] plotting mathematical notation and values substitution

2005-04-04 Thread Luca Scrucca
Dear R-users,

I'm trying to add a title to a plot that combines mathematical notation
with value substitution. I read the documentation and searched the mailing
list, but I was not able to solve my problem. There is a message by
Uwe Ligges from June 2003 which addresses a question very close to mine, but
the code provided doesn't work. The code is the following:

# I add this to let you run the example by copy and paste
> t1 <- 0.5; len <- 1
# then
> plot(1:10,
   main = substitute("Monotonic Multigamma run (" * n == len * ", " *
theta == t1 * ").", list(len = len, t1 = t1)))

but I got the following:

Error: syntax error

I also tried with just one value substitution:

> plot(1:10,
   main = substitute("Monotonic Multigamma run (" theta == t1 * ").",
list(len = len, t1 = t1)))

which works fine. How can I have more than one value substitution,
together with mathematical notation and text?

Thanks in advance for any reply.

Luca Scrucca


+---+
| Dr. Luca Scrucca  |
| Dipartimento di Economia, Finanza e Statistica|
| Sezione di Statistica  tel. +39-075-5855226   |
| Università degli Studi di Perugia  fax. +39-075-5855950   |
| Via Pascoli - C.P. 1315 Succ. 1   |
| 06100 PERUGIA  (ITALY)|
|  (o_   (o_   (o_  |
| E-mail:   [EMAIL PROTECTED]//\   //\   //\   |
| Web page: http://www.stat.unipg.it/luca V_/_  V_/_  V_/_  |
+---+



Re: [R] emacs + R?

2005-04-04 Thread Federico Calboli
On Mon, 2005-04-04 at 10:11 +0200, Giorgio Corani wrote:
> Dear All,
> 
> 
> As far as I have understood from reading both your past postings and the
> documentation, in order to have the command-line completion facility, I
> have to run R within emacs.
> 
> However, as I try to start R within emacs as recommended:
> C-u M-x R
> emacs answers [no match]
> 
> the same if I provide the whole path to  the executable:
> C-u M-x /usr/bin/R[no match]
> 

You need to install the package (library?) ESS for R to work with emacs.
It's really easy if you are using Debian, just apt-get install ess.


F.


-- 
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602 Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com



Re: [R] need any advises for code optimization.

2005-04-04 Thread Rich FitzJohn
Hi,

One fruitful course for optimisation is to vectorise wherever
possible, and avoid for-loops.

Something like the code below might be a good place to start.

=

## Generate a thousand rows of data
cube.half.size <- 2
mult.sigma <- 2
n <- 1000
whole <- data.frame(a=runif(n), b=runif(n), c=rnorm(n), d=runif(n))*10

cube.look <- function() {
  f <- function(x) {
    with(whole, {
      ## Points inside the cube centred on point x
      i <- (abs(a - a[x]) < cube.half.size &
            abs(b - b[x]) < cube.half.size &
            abs(d - d[x]) < cube.half.size)
      ## Need at least two points for sd(); this mirrors the
      ## length(pv$c)>1 test in the original code
      if ( sum(i) > 1 ) {
        subdat <- c[i]
        which(i)[abs(subdat - mean(subdat)) > sd(subdat)*mult.sigma]
      } else NULL
    })
  }

  td <- lapply(seq(length=n), f)
  whole[-unique(unlist(td)),]
}

## And wrap the original in a function for comparison:
cube.look.orig <- function() {
  to.drop<-data.frame()

  for(i in 1:length(whole$c)){
    pv<-subset(whole,abs(a-whole$a[i])<cube.half.size &
                     abs(b-whole$b[i])<cube.half.size &
                     abs(d-whole$d[i])<cube.half.size)
    if(length(pv$c)>1){
      mean.c<-mean(pv$c)
      sd.c<-sd(pv$c)
      td<-subset(pv,abs(c-mean.c)>sd.c*mult.sigma)
      if(length(td$c)>0){
        td.index<-which(row.names(td) %in% row.names(to.drop))
        to.drop<-rbind(to.drop,if(length(td.index)>0) td[-td.index,]else td)
        if(length(td.index)!=length(td$c))
          print(c("i=",i,"Points to drop: ",length(to.drop$c)))
      }
    }
  }
  td.orig <<- to.drop
  ## This does not subset the way you want:
  ##  whole[-which(row.names(to.drop) %in% row.names(whole)),]
  whole[-as.integer(row.names(to.drop)),]
}

## Time how long it takes to run over the test data.frame (10 runs):
t.new <- t(replicate(10, system.time(cube.look())))
t.orig <- t(replicate(10, system.time(cube.look.orig())))

## On my alpha, the version using lapply() takes 4.9 seconds of CPU
## time (avg 10 runs), while the original version takes 23.3 seconds -
## so we're 4.8 times faster.
apply(t.orig, 2, mean)/apply(t.new, 2, mean)

## And the results are the same.
r.new <- cube.look()
r.orig <- cube.look.orig()
identical(r.new, r.orig)

==

The above code could almost certainly be tweaked (replacing which()
with a stripped down version would probably save some time, since the
profile indicates we spend about 10% of our time there).  Using with()
saved another 10% or so, compared with indexing a..d (e.g. whole$a)
every iteration.  However, trying a completely different approach
would be more likely to yield better time savings.  mapply() might be
one to try, though I have a feeling this is just a wrapper around
lapply().  I believe there is a section in the "Writing R Extensions"
manual that deals with profiling, and may touch on optimisation.
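
The profile mentioned above can be collected with Rprof() and
summaryRprof(). The sketch below profiles a deliberately slow toy
function (slow.fill is a made-up name, not part of the code in this
thread), only to show the mechanics.

```r
## Deliberately slow toy function: growing a vector inside a loop
slow.fill <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, sqrt(i))
  out
}

prof.file <- tempfile()
Rprof(prof.file, interval = 0.01)   # start sampling the call stack
res <- slow.fill(30000)
Rprof(NULL)                         # stop profiling

## summaryRprof() stops with an error if no samples were collected
## (for instance on a very fast machine), hence the tryCatch().
prof <- tryCatch(summaryRprof(prof.file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

The by.self table lists the functions in which most time was spent,
which is where a replacement for a call such as which() would show up.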

Cheers,
Rich

On Apr 4, 2005 6:50 PM, Wladimir Eremeev <[EMAIL PROTECTED]> wrote:
> Dear colleagues,
> 
>   I have the following code. This code is to 'filter' the data set.
> 
>   It works on the data frame 'whole' with four numeric columns: a,b,d, and c.
>   Every row in the data frame is considered as a point in 3-D space.
>   Variables a,b, and d are the point's coordinates, and c is its value.
>   This code looks at every point, builds a cube 'centered' at this
>   point, selects the set of points inside this cube,
>   calculates mean and SD of their values,
>   and drops points whose values differ from the mean more than 2 SD.
> 
>   Here is the code.
> 
> ===
> # initialization
> cube.half.size<-2# half size of a cube to be built around every point
> 
> mult.sigma<-2# we will drop every point with value differing
>  # from mean more than mult.sigma * SD
> 
> to.drop<-data.frame() # the list of points to drop.
> 
> for(i in 1:length(whole$c)){   #look at every point...
>   pv<-subset(whole,abs(a-whole$a[i])<cube.half.size &
>                    abs(b-whole$b[i])<cube.half.size &
>                    abs(d-whole$d[i])<cube.half.size)
>   if(length(pv$c)>1){  # if subset includes not only considered point, then
> mean.c<-mean(pv$c)   #  calculate mean and SD
> sd.c<-sd(pv$c)
> 
> #make a list of points to drop from current subset
> td<-subset(pv,abs(c-mean.c)>sd.c*mult.sigma)
> if(length(td$c)>0){
> 
>#check which of these points are already in the list to drop
>   td.index<-which(row.names(td) %in% row.names(to.drop))
> 
>#and replenish the list of points to drop
>   to.drop<-rbind(to.drop,if(length(td.index)>0) td[-td.index,] else td)
> 
>#print out the message showing,  we're alive (these messages will
>#not appear regularly, that's OK)
>   if(length(td.index)!=length(td$c))
> print(c("i=",i,"Points to drop: ",length(to.drop$c)))
> }
>   }
> }
> 
> # make a new data set without dropped points.
> whole.flt.3<-whole[-which(row.names(to.drop) %in% row.names(whole)),]
> ===
> 
>   The problem is: the 'whole' data set is large, more than 10
>   rows, and the script runs several hours.
>   The running time becomes greater, if I build a sphere instead of a
>   cube.
> 
>   I would like to optimize it in order to make it run faster.
>   Is it possible?
>   Wi

Re: [R] Handling very large integers with factorial and combinat (nCm)

2005-04-04 Thread Prof Brian Ripley
On Mon, 4 Apr 2005, Marco Chiarandini wrote:
> Dear list,
>
> perhaps this question is more suitable for R-dev but since I am not
> really a developer I post it here first.
>
> Apparently the following lines do not create any problem in R:
>
> library(combinat)
> r <- 20; b <- 2;
> sum( sapply(0:r,function(x) nCm(r,x)^(2*b)) ) > 2^64
>
> while in C I obtain an overflow of data even using unsigned long long
> and with long double I incur precision problems.
>
> Where can I find information about how R (or the combinat package)
> handles very large integer numbers?

In this case, as doubles.  R numeric variables are doubles, and 'r' and 
'b' are numeric, not integer.  However,

r <- as.integer(20); b <- as.integer(2)
sum( sapply(0:r,function(x) nCm(r,x)^(2*b)) )
gives the same result (and internally nCm computes in doubles: Note that 
factorials are computed via lgamma).
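
The point about doubles can be checked directly. In this sketch the
base function choose() stands in for combinat::nCm() so that no extra
package is needed; that substitution is an assumption made for the
example.

```r
## choose() from base R stands in for combinat::nCm() so that the
## example needs no extra package (an assumption of this sketch).
r <- 20; b <- 2
is.double(r)                      # plain numeric literals are doubles
.Machine$integer.max              # largest value an R integer can hold

total <- sum(choose(r, 0:r)^(2 * b))
is.double(total)
total > 2^64                      # no overflow, the sum is a double

## The price is precision: above 2^53 not every integer is representable
2^53 + 1 == 2^53
```

So the comparison with 2^64 works because the magnitude fits easily in
a double, while the exact integer value of the sum is not guaranteed.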

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


Re: [R] Generating a binomial random variable correlated with a normalrandom variable

2005-04-04 Thread Dimitris Rizopoulos
One idea is to assume that the latent variables underlying the Bernoulli 
trials (normally distributed, for convenience) are correlated with your 
original normal random variable.
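
A minimal sketch of this latent-variable idea for a single Bernoulli
trial per observation follows. The parameter values and variable names
are assumptions chosen for illustration. Note that rho is the
correlation between x and the latent z; the correlation between x and
the binary y is smaller, and for this construction it equals
rho * dnorm(qnorm(1 - p)) / sqrt(p * (1 - p)).

```r
set.seed(1)
n   <- 1e5    # number of observations (illustrative)
rho <- 0.6    # correlation between x and the latent variable z
p   <- 0.3    # success probability of the Bernoulli variable

x <- rnorm(n)
## Latent standard normal variable with correlation rho to x
z <- rho * x + sqrt(1 - rho^2) * rnorm(n)
## Threshold the latent variable so that P(y = 1) = p
y <- as.integer(z > qnorm(1 - p))

mean(y)       # close to p
cor(x, y)     # close to the value computed below
expected <- rho * dnorm(qnorm(1 - p)) / sqrt(p * (1 - p))
expected
```

To hit a target correlation between x and y, the formula can be solved
for rho, provided the result does not exceed 1 in absolute value.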

I hope it helps.
Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
- Original Message - 
From: "Ashraf Chaudhary" <[EMAIL PROTECTED]>
To: 
Sent: Monday, April 04, 2005 12:41 AM
Subject: [R] Generating a binomial random variable correlated with a 
normalrandom variable


Hi All:
I would like to generate a binomial random variable that correlates with a
normal random variable with a specified correlation. Of course, the
correlation coefficient would not be the same at each run because of
randomness.
I greatly appreciate your input.
Ashraf



RE: [R] Object item extraction

2005-04-04 Thread Eric Lecoutre


Any R output is an object that you can manipulate using the basic
extraction operators $, [ and [[.
To look at the content of the object, try:

> str(model)

And (in this case)

> str(summary(model))

Then you can extract what you need, such as:

> summary(model)$adj.r.squared
[1] 0.02158191        (from another model...)
> summary(model)$fstatistic
   value numdf dendf 
1.419101  1.00 18.00 
> summary(model)$fstatistic[["value"]]
[1] 1.419101
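
For the standard errors and t-values asked about, the coefficient table
of the summary object is the usual place to look. The sketch below uses
the built-in cars data as a stand-in for the poster's model (an
assumption, since DATA and var.sel.gkm are not shown) and writes to a
temporary file rather than the path in the question.

```r
## Stand-in model on a built-in data set
model <- lm(dist ~ speed, data = cars)
s <- summary(model)

## Matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef.table <- coef(s)
std.err <- coef.table[, "Std. Error"]
t.val   <- coef.table[, "t value"]

adj.r2 <- s$adj.r.squared
f.val  <- s$fstatistic[["value"]]

## Write the whole table out, as in the original write.table() call
out.file <- tempfile(fileext = ".txt")
write.table(coef.table, file = out.file,
            row.names = TRUE, col.names = TRUE)
```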

HTH,

Eric



Eric Lecoutre
UCL /  Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
[EMAIL PROTECTED]
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward
Tufte   


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Smit, 
> R. (Robin) (IenT)
> Sent: lundi 4 avril 2005 10:36
> To: r-help@stat.math.ethz.ch
> Subject: [R] Object item extraction
> 
> 
> Hello 
>  
> I am able to extract partial regression coefficients from a 
> fitted model object "model", i.e.
>  
> model <- lm(var.sel.gkm, weights = count.gkm, data = DATA)
> 
> summary(model)
> 
> write.table(model$coef, file = "C:/coef_CO_gkm.txt", 
> row.names = TRUE, col.names = TRUE)
> 
> I was wondering if anyone could advise me how to extract 
> other object items such as std. error, t-values and adjusted 
> R2 in the same way.
>  
> Many thanks.
> Robin Smit
> 
>  
> 


[R] Handling very large integers with factorial and combinat (nCm)

2005-04-04 Thread Marco Chiarandini
Dear list,

perhaps this question is more suitable for R-dev but since I am not
really a developer I post it here first.

Apparently the following lines do not create any problem in R:

library(combinat)
r <- 20; b <- 2;
sum( sapply(0:r,function(x) nCm(r,x)^(2*b)) ) > 2^64

while in C I obtain an overflow of data even using unsigned long long
and with long double I incur precision problems.

Where can I find information about how R (or the combinat package)
handles very large integer numbers?


Thank you for consideration,


Marco



[R] Object item extraction

2005-04-04 Thread Smit, R. \(Robin\) \(IenT\)
Hello 
 
I am able to extract partial regression coefficients from a fitted model
object "model", i.e.
 
model <- lm(var.sel.gkm, weights = count.gkm, data = DATA)

summary(model)

write.table(model$coef, file = "C:/coef_CO_gkm.txt", row.names = TRUE,
col.names = TRUE)

I was wondering if anyone could advise me how to extract other object
items such as std. error, t-values and adjusted R2 in the same way.
 
Many thanks.
Robin Smit

 



RE: [R] Generating a binomial random variable correlated with a

2005-04-04 Thread Ted Harding
On 03-Apr-05 Ashraf Chaudhary wrote:
> Hi All:
> I would like to generate a binomial random variable that
> correlates with a normal random variable with a specified
> correlation. Of course, the correlation coefficient would
> not be the same at each run because of randomness.
> I greatly appreciate your input.
> Ashraf

It's not at all clear what you mean by this. For example,
are you seeking:

A) X (continuous) and R (discrete, distributed on (0,n))
   are such that the marginal distribution of X is normal,
   the marginal distribution of R is binomial, and the
   correlation coefficient between X and R is specified?

B) Given X, R on (0,n) has a binomial distribution with
   probability parameter p which depends on X?

C) For each of n values of X, R is binary (0,1) with
   P[R=1] depending on X, such that sum(R from 1 to n)
   has a binomial distribution, and the correlation
   between X and R is specified?

And so on.
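
As an illustration of interpretation (B), the sketch below draws R from
a binomial distribution whose probability depends on X through a
logistic link. The link, the slope 0.8 and the sizes are assumptions
made for the example, and the resulting correlation is whatever the
model implies rather than a value specified in advance.

```r
set.seed(42)
m    <- 5000   # number of (X, R) pairs (illustrative)
size <- 10     # the binomial "n" parameter

x <- rnorm(m)
p <- plogis(0.8 * x)                  # P[success] depends on X
r <- rbinom(m, size = size, prob = p)

range(r)       # values lie between 0 and size
cor(x, r)      # positive, but not set in advance
```

Here R is binomial given X, while its marginal distribution is a
mixture of binomials, which is one reason the question needs to be
stated more precisely.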

Also, it is not obvious what interpretation to put on
the correlation coefficient between a discrete variable
and a continuous variable.

How large is the "n" parameter in the binomial distribution
intended to be?

It would help if you described what you are really looking
for in much more explicit detail!

Best wishes,
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Apr-05   Time: 09:03:29
-- XFMail --



[R] emacs + R?

2005-04-04 Thread Giorgio Corani
Dear All,
As far as I have understood from reading both your past postings and the
documentation, in order to have the command-line completion facility, I
have to run R within emacs.
However, when I try to start R within emacs as recommended:
C-u M-x R
emacs answers [no match]
The same happens if I provide the whole path to the executable:
C-u M-x /usr/bin/R    [no match]
Sorry for such a beginner question.
regards
Giorgio Corani