Re: [R] Very slow optim(): solved

2011-07-14 Thread Mike Marchywka





> Date: Thu, 14 Jul 2011 12:44:18 -0800
> From: toshihide.hamaz...@alaska.gov
> To: r-h...@stat.math.ethz.ch
> Subject: Re: [R] Very slow optim(): solved
>
> After Googling and trial and error, I found the major cause of the slow 
> optimization was not the functions but the data structure.
> Originally, I was using a data.frame for the likelihood calculation. Then I 
> changed the data.frame to a vector and matrix for the same likelihood 
> calculation. Now convergence takes ~14 sec instead of 25 min. I didn't know 
> this simple change would make such a huge computational difference.

Thanks, can you pass along any additional details, like the Google links you 
found, or comment on the resulting limitation (were you CPU limited converting 
data formats, or did this cause memory problems leading to VM thrashing)? I've 
often had C++ code that turned out to be IO limited when I expected I was doing 
really complicated computations; it never hurts to go beyond the usual 
suspects LOL. 
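As a sketch of the kind of difference the original poster describes (entirely
made-up data and likelihood, untested against his actual problem), the same
objective can be timed with data.frame indexing versus matrix indexing:

n  <- 1e5
df <- data.frame(x = rnorm(n), y = rnorm(n))
m  <- as.matrix(df)
negll_df <- function(p)   # repeated data.frame column access inside optim
  -sum(dnorm(df$y, mean = p[1] + p[2]*df$x, sd = exp(p[3]), log = TRUE))
negll_m  <- function(p)   # identical computation on a plain matrix
  -sum(dnorm(m[,"y"], mean = p[1] + p[2]*m[,"x"], sd = exp(p[3]), log = TRUE))
system.time(optim(c(0, 0, 0), negll_df))
system.time(optim(c(0, 0, 0), negll_m))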




>
>
>
> Toshihide "Hamachan" Hamazaki, 濱崎俊秀PhD
> Alaska Department of Fish and Game: アラスカ州漁業野生動物課
> Division of Commercial Fisheries: 商業漁業部
> 333 Raspberry Rd. Anchorage, AK 99518
> Phone: (907)267-2158
> Cell: (907)440-9934
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Ben Bolker
> Sent: Wednesday, July 13, 2011 12:21 PM
> To: r-h...@stat.math.ethz.ch
> Subject: Re: [R] Very slow optim()
>
> Hamazaki, Hamachan (DFG  alaska.gov> writes:
>
> >
> > Dear list,
> >
> > I am using the optim() function to fit ~55 parameters by MLE, but it is very
> slow to converge (~ 25 min), whereas I can do
> > the same in ~1 sec. using ADMB, and ~10 sec. using MS EXCEL Solver.
> >
> > Are there any tricks to speed up?
> >
> > Are there better optimization functions?
> >
>
> There's absolutely no way to tell without knowing more about your code. You
> might try method="CG":
>
> Method ‘"CG"’ is a conjugate gradients method based on that by
> Fletcher and Reeves (1964) (but with the option of Polak-Ribiere
> or Beale-Sorenson updates). Conjugate gradient methods will
> generally be more fragile than the BFGS method, but as they do not
> store a matrix they may be successful in much larger optimization
> problems.
>
> If ADMB works better, why not use it? You can use the R2admb
> package (on R forge) to wrap your ADMB calls in R code, if you
> prefer that workflow.
>
> Ben
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Very slow optim()

2011-07-13 Thread Mike Marchywka




>
> Hamazaki, Hamachan (DFG  alaska.gov> writes:
>
> >
> > Dear list,
> >
> > I am using the optim() function to fit ~55 parameters by MLE, but it is very
> slow to converge (~ 25 min), whereas I can do
> > the same in ~1 sec. using ADMB, and ~10 sec. using MS EXCEL Solver.
> >
> > Are there any tricks to speed up?
> >
> > Are there better optimization functions?
> >
>
> There's absolutely no way to tell without knowing more about your code. You
> might try method="CG":

I guess the first thing to do is look at the task manager and see if it is 
memory or CPU limited. In the absence of gross algorithmic differences, the 
"other stuff"
that you may be doing could be a big deal. I saw similar performance issues
in my own C++ code, where the fast result was obtained just by sorting the input
data to stop VM thrashing. A sort on large data is normally considered slow
and wasteful if the later algorithm doesn't require it, but memory coherence 
can be a big deal.
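From inside R, a rough way to see where the time goes (rather than the OS task
manager) is the built-in profiler; the toy objective below just stands in for
a real likelihood:

negll <- function(p) sum((p - 1)^2)      # stand-in objective
Rprof("optim.out", memory.profiling = TRUE)
fit <- optim(rep(0, 10), negll)
Rprof(NULL)
summaryRprof("optim.out")$by.self        # which calls dominate the run time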





>
> Method ‘"CG"’ is a conjugate gradients method based on that by
> Fletcher and Reeves (1964) (but with the option of Polak-Ribiere
> or Beale-Sorenson updates). Conjugate gradient methods will
> generally be more fragile than the BFGS method, but as they do not
> store a matrix they may be successful in much larger optimization
> problems.
>
> If ADMB works better, why not use it? You can use the R2admb
> package (on R forge) to wrap your ADMB calls in R code, if you
> prefer that workflow.
>
> Ben
>
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using t tests

2011-07-10 Thread Mike Marchywka


( after getting confirmation of lack of posts, trying again, LOL )


> From: marchy...@hotmail.com
> To: r-help@r-project.org
> Subject: RE: [R] Using t tests
> Date: Sun, 10 Jul 2011 10:13:51 -0400
>
>
> ( sorry if this is a repost, but I meant to post to the list and never 
> received any indication it was sent to the list; thanks for asking for 
> comments about my approach to data analysis )
>
>
> > From: marchy...@hotmail.com
> > To: j...@bitwrit.com.au; gwanme...@aol.com
> > CC: r-help@r-project.org
> > Subject: RE: [R] Using t tests
> > Date: Sun, 10 Jul 2011 07:35:32 -0400
> >
> >
> >
> >
> >
> > 
> > > Date: Sat, 9 Jul 2011 18:40:43 +1000
> > > From: j...@bitwrit.com.au
> > > To: gwanme...@aol.com
> > > CC: r-help@r-project.org
> > > Subject: Re: [R] Using t tests
> > >
> > > On 07/08/2011 07:22 PM, gwanme...@aol.com wrote:
> > > > Dear Sir,
> > > >
> > > > I am doing some work on a population of patients. About half of them are
> > > > admitted into hospital with albumin levels less than 33. The other half 
> > > > have
> > > > albumin levels greater than 33, so I stratify them into 2 groups, x and 
> > > > y
> > > > respectively.
> > > >
> > > > I suspect that the average length of stay in hospital for the group of
> > > > patients (x) with albumin levels less than 33 is greater than those with
> > > > albumin levels greater than 33 (y).
> > > >
> > > > What command function do I use (assuming that I will be using the chi
> > > > square test) to show that the length of stay in hospital of those in 
> > > > group x is
> > > > statistically significantly different from those in group y?
> > > >
> > > Hi Ivo,
> > > Just to make things even more complicated for you, Mark's suggestion
> > > that the length_of_stay measure is unlikely to be normally distributed
> > > might lead you to look into a non-parametric test like the Wilcoxon (aka
> >
> > ( please correct any of the following which is wrong, but note that
> > the discussion is more interesting and useful with details of your goals )
> > I'm curious why people still jump to setting arbitrary cutoff points,
> > in this case based on what you happen to have sampled, rather than
> > trying to find a functional relationship between the two parametric
> > variables. Generally the thing that separates likely cause from
> > noise is smoothness, or something you can at least rationalize
> > in terms of physical mechanisms. If your question relates
> > to the reproducibility of a given result ( "well, this experiment showed
> > high and low were significantly different on hospital stays; maybe the
> > next experiment will show the same" ), you'd probably like to consider
> > the data in relation to possible causes. I'm not sure your disease process
> > would know about your median test results when patients walk in. BTW,
> > what is terminating the hospital stay: cure, death, or insurance exhaustion?
> > This sounds like you are just trying to reproduce something that is already
> > in the literature: the cutoff is on the low side of normal, and
> > hypoproteinemia is often suspected of being bad, so the higher group would
> > usually be expected to do better, no? Although
> > I suppose this could have something to do with dehydration etc., the point
> > of course is that data interpretation is difficult to do in a vacuum.
> >
> >
> >
> >
> >
> >
> >
> >
> > > Mann-Whitney in your case) test. You will have to split your
> > > length_of_stay measure into two like this (assume your data frame is
> > > named "losdf"):
> > >
> > > albumin_hilo <- albumin > 33
> > > wilcox.test(losdf$length_of_stay[albumin_hilo],
> > > losdf$length_of_stay[!albumin_hilo])
> > >
> > > or if you use wilcox_test in the "coin" package:
> > >
> > > albumin_hilo <- albumin > 33
> > > wilcox_test(length_of_stay~albumin_hilo,losdf)
> > >
> > > Do remember that the chi-square test is used for categorical variables,
> > > for instance if you dichotomized your length_of_stay into less than 10
> > > days or 10 days and over.
> > >
> > > Jim
> > >
> >
>
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.

2011-07-08 Thread Mike Marchywka

Well, I'm not sure it is really an rgdal "problem" as much as 
having a stale .so, and I didn't want to go wreck the thing to capture
the exact error message; the point was that the apache build seems to have
come with an incompatible expat .so. In the apache/lib dir, I apparently had a
much different library, now renamed xxx, than the one in /lib64. If I dump
the symbols with objdump, the one rgdal complained about, IIRC XML_StopParser,
does not appear in either output, but they do grep differently,
as shown below:

-rw-r--r-- 1 root root 605314 May  1 07:09 libexpat.a
-rwxr-xr-x 1 root root    805 May  1 07:09 libexpat.la
lrwxrwxrwx 1 root root 17 May  1 07:09 libexpat.so -> libexpat.so.0.5.0
lrwxrwxrwx 1 root root 17 May  1 07:09 libexpat.so.0 -> libexpat.so.0.5.0
-rwxr-xr-x 1 root root 143144 Jul  6 16:29 libexpat.so.0.5.0
-rwxr-xr-x 1 root root 398422 Jul  6 16:27 libexpat.soxxx.0.5.0

[marchywka@351915-www1 lib]$ grep XML_StopParser libexpat*
Binary file libexpat.so matches
Binary file libexpat.so.0 matches
Binary file libexpat.so.0.5.0 matches
[marchywka@351915-www1 lib]$ 



libexpat.so.0.5.0:
    linux-vdso.so.1 =>  (0x2b4a1fdd2000)
    libc.so.6 => /lib64/libc.so.6 (0x0035aca0)
    /lib64/ld-linux-x86-64.so.2 (0x0035ac60)
libexpat.soxxx.0.5.0:
    linux-vdso.so.1 =>  (0x7fff8bdfc000)
    libc.so.6 => /lib64/libc.so.6 (0x2b143b19e000)
    /lib64/ld-linux-x86-64.so.2 (0x0035ac60)
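For checking this kind of thing from inside R itself (or an Rapache handler),
something like the following sketch may help; it assumes rgdal is installed
and just reports which copy of the shared object the process actually picked
up:

library(rgdal)                 # the load that was failing under Rapache
getLoadedDLLs()[["rgdal"]]     # full path of the .so that actually loaded
Sys.getenv("LD_LIBRARY_PATH")  # the search path the loader consulted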







> To: r-help@r-project.org
> Subject: Re: [R] problem loading rgdal with Rapache, problem solved due to 
> libexpat with apache.
>
> You should provide at least the output of sessionInfo() and the messages
> issued by rgdal on load. rgdal has recently been updated to try to address
> issues of interference between applications, and it would help very much to
> know your intentions, platform, and the versions of the software you are
> using.
>
> Roger
>
>
> Mike Marchywka wrote:
> >
> > Has anyone had problems with Rapache that don't show up on command-line
> > execution of R? I just ran into this loading rgdal in an Rapache page and
> > having a problem with loading the shared object. The final complaint was
> > that XML_StopParser
> > was undefined - this was surprising since grep -l showed it in the expat
> > lib and it worked
> > fine from the command line. Finally, I noted the search path had apache/lib
> > ahead
> > of the other places where expat exists. The libexpat there, although
> > apparently having the
> > same version, so.0.5.0, did not grep for this symbol. Anyway, copying one
> > into
> > the other fixed the problem and the page works fine, but I'm curious if
> > anyone has thoughts on what could have caused this. Sorry I don't have
> > specific output, but I thought you may remember if you ran into
> > it, and it is not worth trying to replicate. Thanks.
> >
> >
> >
> >
> >
>
>
> -
> Roger Bivand
> Department of Economics
> NHH Norwegian School of Economics
> Helleveien 30
> N-5045 Bergen, Norway
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/problem-loading-rgdal-with-Rapache-problem-solved-due-to-libexpat-with-apache-tp3650119p3652639.html
> Sent from the R help mailing list archive at Nabble.com.
>
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.

2011-07-06 Thread Mike Marchywka


Has anyone had problems with Rapache that don't show up on command-line
execution of R? I just ran into this loading rgdal in an Rapache page and having
a problem with loading the shared object. The final complaint was that 
XML_StopParser
was undefined - this was surprising since grep -l showed it in the expat lib 
and it worked
fine from the command line. Finally, I noted the search path had apache/lib 
ahead
of the other places where expat exists. The libexpat there, although apparently 
having the
same version, so.0.5.0, did not grep for this symbol. Anyway, copying one into
the other fixed the problem and the page works fine, but I'm curious if anyone
has thoughts on what could have caused this. Sorry I don't have
specific output, but I thought you may remember if you ran into
it, and it is not worth trying to replicate. Thanks.



  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] superimposing network graphs

2011-07-06 Thread Mike Marchywka




> Date: Wed, 6 Jul 2011 13:10:14 +0200
> To: r-help@r-project.org
> CC: but...@uci.edu
> Subject: [R] superimposing network graphs
>
> Dear all,
>
> I have a undirected network (g), representing all the sexual relationships 
> that ever existed in a model community.
> I also have a directed edgelist (e) which is a subset of the edgelist of g. e 
> represents the transmission pathway of HIV.
> Now I would like to superimpose the picture of the sexual relationships with 
> arrows in a different colour, to indicate where in the network HIV was 
> transmitted.


If you can't find an R answer, I've found that "dot" from graphviz is really 
helpful. It should not be too hard to generate
the source code from your raw data file or R but others may have a more R 
answer. Package sna may be helpful for
an R-only approach. 
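For an R-only sketch, here is one way it might look with the igraph package
(my choice, not something the poster mentioned); the random graph and the five
"transmission" edges below are stand-ins for the real g and e:

library(igraph)
set.seed(1)
g <- erdos.renyi.game(20, 0.15)           # stand-in for the sexual network g
e <- get.edgelist(g)[1:5, ]               # pretend these edges carried HIV
E(g)$color <- "grey"                      # base colour for all relationships
E(g, P = as.vector(t(e)))$color <- "red"  # highlight the transmission edges
plot(g)

Getting true arrowheads on just the transmission edges is fiddlier, since g
itself is undirected; one option is plotting a directed copy of the subset on
top with add = TRUE.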




>
> Any ideas on how to do this?
> Many thanks,
>
> Wim
>
>
>
> Wim Delva MD, PhD
> International Centre for Reproductive Health
> Ghent University, Belgium
> www.icrh.org
>
> South African Centre for Epidemiological Modelling and Analysis
> Stellenbosch University, South Africa
> www.sacema.com
> epi update: www. sacemaquarterly.com
>
> Tel: +27 21 808 27 79 (work)
> Cell: +27 72 842 82 33
>
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Protecting R code

2011-07-04 Thread Mike Marchywka
Put it on Rapache or another server, but this seems like a waste depending on 
what you are doing. 

Server side is the only good way, but compiling to C++ may be an interesting 
test.
Sent from my Verizon Wireless BlackBerry

-Original Message-
From: Vaishali Sadaphal 
Date: Mon, 4 Jul 2011 16:48:13 
To: 
Cc: ; 
Subject: Re: [R] Protecting R code

Hey All,

Thank you so much for quick replies.
Looks like translation to C/C++ is the only robust option. Do you think 
there exists any ready-made R to C translator?

Thanks
--
Vaishali

Vaishali Paithankar Sadaphal
Tata Consultancy Services
Mailto: vaishali.sadap...@tcs.com
Website: http://www.tcs.com

Experience certainty.   IT Services
    Business Solutions
    Outsourcing




From:
Spencer Graves 
To:
Barry Rowlingson 
Cc:
Vaishali Sadaphal , r-help@r-project.org
Date:
07/04/2011 08:42 PM
Subject:
Re: [R] Protecting R code



Hello:


On 7/4/2011 7:41 AM, Barry Rowlingson wrote:
> On Mon, Jul 4, 2011 at 8:47 AM, Vaishali Sadaphal
>   wrote:
>> Hi All,
>>
>> I need to give my R code to my client to use. I would like to protect 
the
>> logic/algorithms that have been coded in R. This means that I would not
>> like anyone to be able to read the code.
>    At some point the R code has to be run. Which means it has to be
> read by an interpreter that can handle R code. Which means, unless you
> rewrite the interpreter, the R code must exist as such.
>
>   Even if you could compile R into C code into machine code and
> distribute a .exe file, its still possible in theory to
> reverse-engineer it and get something like the original back - the
> original logic if not the original names of the variables and
> functions.
>
>   You could rewrite the interpreter to only run encrypted, signed code
> that requires a decryption key, but you still have to give the user
> the decryption key at some point in order to get the plaintext code.
> Again, its an obfuscation problem of hiding the key somewhere, and
> hence is going to fail.
>
>   It all depends on how much expense you want to go to in order to make
> the expense of circumventing your solution more than its worth. Tell
> me how much that is, and I will tell you the solution.
>
>   For total security[1], you need to run the code on servers YOU
> control, and only give access via a network API. You can do this with
> RServe or any of the HTTP-based systems like Rapache.

   An organization I know that encrypted R code started with making 
it available only on their servers.  This was maybe four years ago.  I'm 
not sure what they do now, but I think they have since lost their major 
proponents of R internally and have probably translated all the code 
they wanted to sell into a compiled language in a way that didn't 
require R at all.


   Spencer

>
> Barry
>
> [1] Except of course servers can be hacked or socially-engineered
> into. For total security, disconnect your machine from the network and
> from any power supply.
>
>







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] wavelets

2011-07-04 Thread Mike Marchywka

> From: jdnew...@dcn.davis.ca.us
> Date: Mon, 4 Jul 2011 00:45:41 -0700
> To: tyagi...@gmail.com; r-help@r-project.org
> Subject: Re: [R] wavelets
>
> Study the topic more carefully, I suppose. My understanding is that wavelets 
> do not in themselves compress anything, but because they sort out the 
> interesting data from the uninteresting data, it can be easy to toss the 
> uninteresting data (lossy data compression). Perhaps you should understand 
> better what your Matlab library is doing.
> ---
> Jeff Newmiller, DCN
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> user123  wrote:
>
> I'm new to the topic of wavelets. When I tried to use the mra function in the
> wavelets package, the data is not getting compressed; e.g. if the original
> data has 500 values, the output data also has 500.
> However, in MATLAB, depending on the level of decomposition, the data gets
> compressed.
> How do I implement this in R?



Can you post some code? You can always compress into one value, of course, by 
turning
the bytes into a single char string; what you want is entropy. I posted some
example code before, and I remember it took effort to *not* get the subsampling.
mra is probably multi-resolution analysis, and I'd suppose you want all the 
samples.
You probably need paper and pencil at this point, however. 
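A minimal sketch of the distinction (assuming the "wavelets" package is the one
in use): mra() returns detail series that are deliberately the same length as
the input, while dwt() returns the decimated coefficients, whose counts halve
at each level - which is the "compression" MATLAB shows:

library(wavelets)
x <- as.numeric(arima.sim(list(ar = 0.9), n = 512))
w <- dwt(x, filter = "la8", n.levels = 4)
sapply(w@W, length)    # decimated details: 256 128 64 32
m <- mra(x, filter = "la8", n.levels = 4)
sapply(m@D, length)    # MRA details: every level is full length (512)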
 




>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/wavelets-tp3642973p3642973.html
> Sent from the R help mailing list archive at Nabble.com.
>
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to fit ARMA model

2011-07-01 Thread Mike Marchywka




> From: bbol...@gmail.com
> Date: Fri, 1 Jul 2011 13:16:45 +
> Subject: Re: [R] How to fit ARMA model
> 
> UnitRoot  gmail.com> writes:
> 
> > 
> > Hello,
> > I am having some problems with fitting an ARMA model to my time series data
> > (randomly generated numbers). The thing is I have tried many packages
> > [tseries, fseries, FitARMA etc.] and all of them giving very different
> > results. I would appreciate if someone could post here what the best package
> > is for my purpose. 
> > Also, after having done the fitting, I would like to check for the model's
> > adequacy. How can I do this?
> > Thanks.
> 
>   It's hard to say without more detail -- we don't know what your
> purpose is (beyond the general one of fitting an ARMA model to data).
> I would say that when in doubt, if there is functionality in 'core' R --
> the base and recommended packages -- that it is likely to be the most
> stable and well tested.  (Not always true -- there are some very good
> contributed packages, and sometimes the functions in R are missing
> some advanced features -- but a good rule of thumb.)  So try ?arima.
> 
>  Another rule of thumb is that Venables and Ripley 2002 is a good
> starting point (although again not necessarily as extensive as specialized
> topics) for "how do I do xxx in R"?  See Chapter 14.
> 

I had to check my memory, but IIRC wiki or another non-authoritative source
suggested that this is not much different from trying to design or analyze
an IIR (infinite impulse response) filter that is presumed excited with white 
noise. In such a case, the
power spectrum could be interpreted in terms of the zeros and poles (zeros
of the denominator) of the transfer function, giving you some graphical info
to help develop intuition. 


It is not hard to make your own tests and sanity checks if you are just getting
started. 
( hand typed, as my 'dohs machine is busy running malware scans and this one R 
install
has a few issues ) 
x<-runif(8192)
x1<-c(x[-1],x[1])
x2<-c(x1[-1],x1[1])

y=x+x1+x2
plot(log(abs(fft(y))))

This makes a Bode plot ( if I have the name right ) with zeros at the roots. 
Extension to
the IIR case should be obvious. 
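For the fitting itself, a quick base-R sanity check along the lines Ben
suggested (simulate a known ARMA process, then see whether arima() recovers
the coefficients; everything here is in the stats package):

set.seed(42)
y <- arima.sim(model = list(ar = c(0.5, -0.3), ma = 0.4), n = 2000)
fit <- arima(y, order = c(2, 0, 1))   # the ?arima route in core R
fit          # estimates should land near 0.5, -0.3, 0.4
tsdiag(fit)  # residual diagnostics, per ?tsdiag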







>   As suggested in the help for ?arima, ?tsdiag is "a generic function
> to plot time series diagnostics"
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Running R from windows command prompt

2011-06-28 Thread Mike Marchywka




> To: lig...@statistik.tu-dortmund.de
> CC: r-help@r-project.org
> Subject: Re: [R] Running R from windows command prompt
> 
> Thanks for your help.
> 
> I tried the way you mentioned for my first question. But I am not getting
> any results.
> Can you please explain in detail the process through which I can run a R
> code from windows command prompt.

While your problem likely has nothing to do with windoh's, I would suggest
you go get Cygwin ( see Google ) and use that. I, and probably others,
have lots of scripts that work there and on Linux. Bash scripts are
a better way to proceed if you want to do more than a test case, and
they integrate with lots of other existing things. 
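To make Uwe's suggestion below concrete, a minimal sketch (the file name and
f1()/f2() mirror the poster's own setup; the library is a placeholder): put
everything, including the library() calls, in one script and run it
non-interactively.

## foo.R -- everything the run needs, in one file
library(MASS)              # whatever libraries f1(), f2() need
source("functions.R")      # defines f1(), f2()
f1(); f2()

## from the Windows command prompt (with R's bin directory on the PATH):
##   R CMD BATCH foo.R foo.Rout
## or, to see output directly on the console:
##   Rscript foo.R

On the second question, scan() does read strings:
scan("names.txt", what = character())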


> 
> 2011/6/28 Uwe Ligges 
> 
> >
> >
> > On 28.06.2011 11:54, siddharth arun wrote:
> >
> >> 1. I have a R program in a file say "functions.R".
> >> I load the "functions.R" file the R using source("function.R") and then
> >> call
> >> functionsf1(), f2() etc. which are declared and defined within
> >> "function.R"
> >>  file.
> >> I also need to load a couple of R libraries using library() before I can
> >> use f1(), f2() etc.
> >>
> >> My question is can I acheive all this (i.e. calling function f1() and
> >> f2())
> >> from the windows prompt without opening the R environment ? If yes then
> >> how?
> >>
> >
> >
> > Put all the code into a file, e.g. "foo.R", and run
> >
> > R CMD BATCH foo.R from the windows command shell.
> >
> >
> >
> >  2. Also, Is there any way to scan strings directly. Like scan() function
> >> only scans numerical values. Is there any way to scan strings?
> >>
> >
> > Yes, it is called scan() which is for arbitrary data, including
> > character(). See ?scan. Otherwise, you may want to look into ?readLines
> >
> > Uwe Ligges
> >
> >
> 
> 
> -- 
> Siddharth Arun,
> 4th Year Undergraduate student
> Industrial Engineering and Management,
> IIT Kharagpur
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help on a R script part

2011-06-25 Thread Mike Miller
I'm curious about what would cause this (see below), if it isn't a joke. 
Is it possible that it didn't look ridiculous in the deleted HTML but the 
text looked bad?  It's almost unreadable.  I guess the HTML gets deleted 
because it is a waste of space, but I received a 14 MB message from this 
list the other day.


Mike


On Fri, 24 Jun 2011, Lusk Aris wrote:


Hi all,
I need all your help on this. I have the next part of code:

e1=x1-mean(x1)
e2=x2-mean(x2)
n1=length(x1)
n2=length(x2)
N=(n1 + n2)
nu2=sum( c( ( x1-mean(x1) )^2 , ( x2-mean(x2) )^2 ) )/N
ss=c(e1,e2)
b3= N*sum(ss^4)/ (sum( ss^2)^2)

What do lines 6-8 do (in mathematical notation)? Also, what does this part
mean?

for(j in 1:B){
  ss11=sample(x1, n1, replace=TRUE)
  ss12=sample(x2, n2, replace=TRUE)
}

I would appreciate it if you could help with my inquiry, and I am awaiting
your answer. Thx in advance, Lusk


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Access R functions from web

2011-06-25 Thread Mike Marchywka




> From: orvaq...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Access R functions from web
> 
> I need a way to send R objects and call R functions from web. Is there any
> project close or similar to that?
> 
> I want to be able to send an HTTP rquest from an existing application with
> some data. And obtain a plot from R.

This should be a FAQ, but it can take a while to find; see Rserve and Rapache.
I have been using Rapache now on Red Hat and Debian, and it works nicely.
Also, the Google Visualization API works in some limited testing, but apparently
a lot of that requires Flash ( which I either did not have or had turned off 
LOL ).
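As a minimal Rserve sketch (the RSclient package and default port are my
assumptions, not something the poster specified):

## server side: start Rserve so other processes can connect over TCP
library(Rserve)
Rserve(args = "--no-save")

## client side, from another R process, using the RSclient package:
library(RSclient)
con <- RS.connect("localhost")   # Rserve's default port is 6311
RS.eval(con, sum(1:10))          # evaluate remotely, get the value back
RS.close(con)

For the plot-over-HTTP case, Rapache is the more direct fit: an R handler
writes a PNG to the response.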




> 
> Thanks in advance
> Caveman
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time-series analysis with treatment effects - statistical approach

2011-06-25 Thread Mike Marchywka




> Date: Thu, 23 Jun 2011 15:41:25 -0700
> From: jmo...@student.canterbury.ac.nz
> To: r-help@r-project.org
> Subject: Re: [R] Time-series analysis with treatment effects - statistical 
> approach
> 
> 
> Mike Marchywka wrote:
> > 
> >> I discovered a way to do repetitive tasks that can be concisely specified
> >> using
> >> something called a computer.  
> 
> Now that's funny :)

Well, there is a point to that, and it is that with cheap computation you can
do different analyses than you did in the past. 


> 
> There were no controlled tests. It was a field experiment testing the
> effects that various pavement designs have on underlying soil moisture. Two
> designs incorporated a porous pavement surface course, while two others were
> based on standard impervious concrete pavement...the control was just bare,
> exposed soil. 
> 
> As you can see from the graph, the control responds quickly to rainfall
> events, but dries out quickly as well due to evaporation. The porous
> pavement allows for quick infiltration of precipitation, while the
> impervious pavement eventually allows infiltration of rainfall, but it's
> delayed. 
> 
> My objective is to be able to differentiate between the pavement treatments,
> such that I can state with statistical confidence that porous pavements
> affects underlying soil moisture differently than impervious pavements. 
> 
> I think this is obvious just looking at it, but I wanted to be able to back
> it up with stats. What I'd done previously is to average by week. But as  I
> mentioned, I thought that an anova table with 104 rows relating to each week
> was a poor way of analyzing the data. But that being said, it effectively
> allows me to check for treatment-related differences. 

I don't think we've mentioned R in the past few posts, but I guess pointing
people to useful things that R can do is not too big a problem, and
if you have ever dealt with "analysis for the sake of rationalization"
you can appreciate that it is a huge problem :)  Generally you'd
like to have reproducible results, and if you don't have IID data ( stationary
parameters of the population you wish to characterize ) you
are not even asking a good question about the system.  It may
be helpful as a quick check of something but otherwise difficult to
interpret - do your results mean these things are "different" in the
desert? You appear to have data points from a bunch of different situations.
After-the-fact selection is often helpful, but generally stats people frown
on that as "backing up" anything ( unless it supports the sponsor's opinion 
LOL ). 

http://www.itl.nist.gov/div898/handbook/prc/section4/prc432.htm


I guess I'd either go with a dynamic model or convert into dollars and then
see if you have clinically and statistically significant differences in things 
of relevance.



> 
> Thanks for the suggestions to date. Maybe the more I explain what I'm trying
> to achieve, the more focussed the suggestions will be. The vaguer the
> question, the broader the response, right?
> 
> Thanks again,
> Justin
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3621179.html
> Sent from the R help mailing list archive at Nabble.com.
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] gaoshan

2011-06-24 Thread Mike Miller

On Fri, 24 Jun 2011, wang peter wrote:


aa file is:
x  1 NA  2
y  1 NA  3

and r program is

aa<-read.table("aa",row.names=1)
bb<-cor(t(aa),method = "pearson",use="pairwise.complete.obs")
bb

 x y
x 1 1
y 1 1

I am confused why the Pearson correlation coefficient between x and y is 1.



You have two paired observations.  Plot the points and draw the regression 
line.  Every point is on the line (there are only two points).  The 
correlation has to be 1.
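A two-line demonstration (any two complete pairs give a perfect correlation,
with the sign set by the pairing):

cor(c(1, 2), c(1, 3))   #  1: two points always lie exactly on their own line
cor(c(1, 2), c(3, 1))   # -1: same thing with the slope reversed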


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help

2011-06-24 Thread Mike Marchywka


I'm surprised the PDF made it through the mailing list, but many will not open
these, and I have just cleaned malware off my 'dohs system. Can you explain
what you want to do in simple text?



Subject: [R] R-help

Hi
 
Please assist me to code the attached pdf in R.
 
Your help will be greatly appreciated.
 
Edward
Actuarial Science Student

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R a solution to my problem?

2011-06-24 Thread Mike Marchywka




> From: jdnew...@dcn.davis.ca.us
> Date: Fri, 24 Jun 2011 02:13:29 -0700
> To: ikoro...@gmail.com; R-help@r-project.org
> Subject: Re: [R] Is R a solution to my problem?
> 
> Almost any tool can be bent to perform almost any task, but that combination 
> sounds like a particularly difficult way to get started. There are a variety 
> of purpose-built software development environments for serving 
> dynamically-constructed web pages, including C++ libraries.


I guess Rapache and Rserve would be worth looking at if you think it is a match 
with R.



> 
> However, discussions of such topics do not belong on this mailing list. Give 
> Google a chance.
> ---
> Jeff Newmiller, DCN
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> --- 
> Sent from my phone. Please excuse my brevity.
> 
> Igor Korot  wrote:
> 
> Hi, ALL,
> My name is Igor and I am a Software Developer.
> Recently I wrote a program for a company using C++.
> 
> The problem I am facing right now is that this program (demo) has to
> run using a web interface. I.e., the potential user goes to the website,
> clicks on the link, and the program executes somewhere on the server.
> 
> Would that be possible with the R extension / add-on?
> 
> Thank you.
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time-series analysis with treatment effects - statistical approach

2011-06-23 Thread Mike Marchywka









> From: rvarad...@jhmi.edu
> To: marchy...@hotmail.com; jmo...@student.canterbury.ac.nz; 
> r-help@r-project.org
> Subject: RE: [R] Time-series analysis with treatment effects - statistical 
> approach
> Date: Thu, 23 Jun 2011 02:59:19 +
> 
> If you have any specific features of the time series of soil moisture, you 
> could either model that or directly estimate it and test for differences in 
> the 4 treatments.  If you do not have any such specific considerations, you 
> might want to consider some nonparametric approaches such as functional data 
> analysis, in particular  functional principal components analysis (fPCA) 
> might be relevant.  You could also consider semiparametric methods. For 
> example, take a look at the "SemiPar" package.  
> 
> Ravi.

I guess, just playing with it while waiting for other code to finish, I'd be 
curious if you had
any controlled tests, such as impulse response - what did the treatment do 
when you held
it at constant temperature, humidity, and illumination in still air after a 
single burst of rain? 
If you were pursuing the model approach, a quick look suggests qualitative 
rather than
just quantitative effects - in one case it looks like linear or biphasic 
dry-out dynamics; others
seem to just fall off a cliff. 

Objective of course matters too; if you are trying to sell this to farmers, 
maybe a plot of
moisture for each treatment against the control would help. I just did that 
after averaging over sensors,
and it may be a reasonable analysis for cost effectiveness if you can translate 
moisture into
dollars. Now you would still need to put error bars on comparisons and use 
words carefully etc.,
but that approach may be more important than getting at the dynamics. I dunno.
Consider that in fact maybe all you care about is the peaks: if being too dry 
for one day
kills the crop, then that is what you want to focus the analysis on, etc.






> 
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf 
> of Mike Marchywka [marchy...@hotmail.com]
> Sent: Wednesday, June 22, 2011 9:31 PM
> To: jmo...@student.canterbury.ac.nz; r-help@r-project.org
> Subject: Re: [R] Time-series analysis with treatment effects - statistical 
> approach
> 
> > Date: Wed, 22 Jun 2011 17:21:52 -0700
> > From: jmo...@student.canterbury.ac.nz
> > To: r-help@r-project.org
> > Subject: Re: [R] Time-series analysis with treatment effects - statistical 
> > approach
> >
> > Hi Mike, here's a sample of my data so that you get an idea what I'm working
> > with.
> 
> Thanks, data helps make statements easier to test :)  I'm quite
> busy at the moment, but I will try to look during dead time.
> 
> >
> > http://r.789695.n4.nabble.com/file/n3618615/SampleDataSet.txt
> > SampleDataSet.txt
> >
> > Also, I've uploaded an image showing a sample graph of daily soil moisture
> > by treatment. The legend shows IP, IP+, PP, PP+ which are the 4 treatments.
> > Also, I've included precipitation to show the soil moisture response to
> > precip.
> 
> Personally I'd try to write a simple physical model or two and see which 
> one(s)
> fit best. It shouldn't be too hard to find sources and sinks of water and 
> write
> a differential equation with a few parameters.  There are probably online
> lecture notes that cover this or related examples. You probably suspect a
> mode of action for the treatments; see if that is consistent with the 
> observed dynamics.
> You may need to go get temperature and cloud data but it may or may not
> be worth it.
> 
> >
> > http://r.789695.n4.nabble.com/file/n3618615/MeanWaterPrecipColour2ndSeasonOnly.jpeg
> >
> > I have used ANOVA previously, but I don't like it for 2 reasons. The first
> > is that I have to average away all of the interesting variation. But mainly,
> 
> There are  a number of assumptions that go into that to make it useful. If
> you are just drawing samples from populations of identical independent things
> great but here I would look at things related to non-stationary statistics of
> time series.
> 
> > it becomes quite cumbersome to do a separate ANOVA for each day (700+ days)
> > or even each week (104 weeks).
> 
> I discovered a way to do repetitive tasks that can be concisely specified 
> using
> something called a computer.  Writing loops is pretty easy, don't give up
> due to cumbersomeness. Also, you could try a few simple things like plotting
> difference charts ( plot treatment minus control for example).
> 
> If you approach this purely empirically, there are time series packages
> and maybe the econ/quant financial analysts would have some thoughts
> that wouldn't be well known in your field.

Re: [R] Hardy Weinberg

2011-06-22 Thread Mike Miller

On Wed, 22 Jun 2011, Jim Silverton wrote:

I am generating 1,000 replicates of 10,000 of these 2 x 3 tables, but R 
cannot seem to save it. It's over 1 Gig. Any ideas on how I can store 
this large amount of data? Should I use a list or a matrix?


Is English your first language?  If so, you can probably understand our 
questions and answer them.


You might consider storing a 2x3 table as one row of length 6 in a matrix 
with 10,000 rows.  Store that as one file -- it's only 60,000 elements, 
all of them integers.  Store 1,000 files.
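A back-of-the-envelope sketch of that layout (numbers illustrative only):

tabs <- matrix(0L, nrow = 10000, ncol = 6)  # one 2x3 table per row, as integers
object.size(tabs)                           # ~240 KB per replicate, not GBs
## one file per replicate, e.g. for replicate j:
## write.table(tabs, sprintf("rep%04d.txt", j),
##             row.names = FALSE, col.names = FALSE)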


You aren't being clear about what you are doing.

Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hardy Weinberg

2011-06-22 Thread Mike Miller

On Thu, 23 Jun 2011, David Duffy wrote:

I am interested in simulating 10,000 2 x 3 tables for SNPs data with 
the Hardy Weinberg formulation. Is there a quick way to do this? I am 
assuming that the minor allele frequency is uniform in (0.05, 0.25).


rmultinom() with HWE expectations



I'm also unsure of what the 2x3 table is (what are the rows?), and David 
gave you a good answer for how to make tabular data, but I just thought 
I'd add some info for those who may be interested...


If you want to make genotype data in the form of minor allele counts under 
HW, you can use rbinom() with size=2 and prob=MAF.


Suppose variable "n" is the number of subjects for whom data is to be 
generated, "maf" is a vector of minor allele frequencies, and "m" is the 
length of maf (i.e., the number of markers to be generated).  Then this 
will produce a data matrix (subjects by markers):



maf <- c( .1, .2, .5 )
n <- 7
m <- length( maf )
t( matrix( rbinom( n*m, 2, maf ), m, n ) )

     [,1] [,2] [,3]
[1,]    0    0    2
[2,]    0    1    1
[3,]    1    0    2
[4,]    0    2    1
[5,]    1    1    1
[6,]    0    1    2
[7,]    0    0    1

Of course, when data are produced that way, the markers are in linkage 
equilibrium (an unfortunate term meaning that the genotypes for pairs of 
markers are not associated).
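And for completeness, a one-marker sketch of David's rmultinom() suggestion
(genotype counts, rather than per-subject genotypes, under HWE; the sample
size of 100 is arbitrary):

p <- runif(1, 0.05, 0.25)                 # minor allele frequency
rmultinom(1, size = 100, prob = c(p^2, 2*p*(1-p), (1-p)^2))
## rows are the counts of the aa, Aa, and AA genotypes among 100 subjects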


Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time-series analysis with treatment effects - statistical approach

2011-06-22 Thread Mike Marchywka




> Date: Wed, 22 Jun 2011 17:21:52 -0700
> From: jmo...@student.canterbury.ac.nz
> To: r-help@r-project.org
> Subject: Re: [R] Time-series analysis with treatment effects - statistical 
> approach
> 
> Hi Mike, here's a sample of my data so that you get an idea what I'm working
> with. 

Thanks, data helps make statements easier to test :)  I'm quite
busy at the moment, but I will try to look during dead time.

> 
> http://r.789695.n4.nabble.com/file/n3618615/SampleDataSet.txt
> SampleDataSet.txt 
> 
> Also, I've uploaded an image showing a sample graph of daily soil moisture
> by treatment. The legend shows IP, IP+, PP, PP+ which are the 4 treatments.
> Also, I've included precipitation to show the soil moisture response to
> precip.

Personally, I'd try to write a simple physical model or two and see which one(s)
fit best. It shouldn't be too hard to find sources and sinks of water and write
a differential equation with a few parameters.  There are probably online
lecture notes that cover this or related examples. You probably suspect a
mode of action for the treatments; see if that is consistent with the observed 
dynamics.
You may need to go get temperature and cloud data, but it may or may not
be worth it. 

> 
> http://r.789695.n4.nabble.com/file/n3618615/MeanWaterPrecipColour2ndSeasonOnly.jpeg
>  
> 
> I have used ANOVA previously, but I don't like it for 2 reasons. The first
> is that I have to average away all of the interesting variation. But mainly,

There are a number of assumptions that go into that to make it useful. If
you are just drawing samples from populations of identical, independent things,
great; but here I would look at things related to the non-stationary statistics
of time series. 

> it becomes quite cumbersome to do a separate ANOVA for each day (700+ days)
> or even each week (104 weeks). 

I discovered a way to do repetitive tasks that can be concisely specified using
something called a computer.  Writing loops is pretty easy; don't give up
due to cumbersomeness. Also, you could try a few simple things like plotting
difference charts ( plot treatment minus control, for example ). 

If you approach this purely empirically, there are time series packages 
and maybe the econ/quant financial analysts would have some thoughts
that wouldn't be well known in your field. 
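A sketch of the per-week loop (the column names week / treatment / moisture
are made up; your own data frame would replace the simulated d):

d <- data.frame(week      = rep(1:104, each = 20),
                treatment = rep(c("ctrl", "IP", "IP+", "PP", "PP+"), 104 * 4),
                moisture  = runif(104 * 20))
pvals <- sapply(1:104, function(w) {
  dw <- subset(d, week == w)
  anova(aov(moisture ~ treatment, data = dw))[["Pr(>F)"]][1]
})
plot(pvals, type = "h")   # one ANOVA p-value per week, at a glance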


> 
> Thanks for your help,
> -Justin
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3618615.html
> Sent from the R help mailing list archive at Nabble.com.
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re; Getting SNPS from PLINK to R

2011-06-22 Thread Mike Miller

Resending to correct bad subject line...


On Mon, 20 Jun 2011, Jim Silverton wrote:

I am using plink on a large SNP dataset with a .map and .ped file. I want 
to get some sort of file, say a list of all the SNPs that plink is saying 
that I have. Any ideas on how to do this?



All the SNPs you have are listed in the .map file.  An easy way to put the 
data into R, if there isn't too much, is to do this:


plink --file whatever --out whatever --recodeA

That will make a file called whatever.raw, single space delimited, 
consisting of minor allele counts (0, 1, 2, NA) that you can bring into R 
like this:


data <- read.table("whatever.raw", sep=" ", header=T)

If you have tons of data, you'll want to work with the compact binary 
format (four genotypes per byte):


plink --file whatever --out whatever --make-bed

Then see David Duffy's reply.  However, I'm not sure if R can work with 
the compact format in memory.  It might expand those genotypes (minor 
allele counts) from two-bit integers to double-precision floats.  What 
does read.plink() create in memory?


There is another package I've been meaning to look at that is supposed to 
help with the memory management problem for large genotype files:


http://cran.r-project.org/web/packages/ff/

I haven't used it yet, but I am hopeful.  Maybe David Duffy or someone 
else here will know more about it.


If you have a lot of data, also consider chopping the data into pieces 
before loading it into R.  That's what we do.  With a 100 core system, I 
break the data into 100 files (I use the GNU/Linux "split" command and a 
few other tricks) and have all 100 cores run at once to analyze the data.


When I work with genotype data as allele counts using Octave, I store the 
data, both in files and in memory, as unsigned 8-bit integers, using 3 as 
the missing value.  That's still inefficient compared to the PLINK system, 
but it is way better than using doubles.


Best,
Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] several messages

2011-06-22 Thread Mike Miller

On Mon, 20 Jun 2011, Jim Silverton wrote:

I am using plink on a large SNP dataset with a .map and .ped file. I want 
to get some sort of file, say a list of all the SNPs that plink is saying 
that I have. Any ideas on how to do this?



All the SNPs you have are listed in the .map file.  An easy way to put the 
data into R, if there isn't too much, is to do this:


plink --file whatever --out whatever --recodeA

That will make a file called whatever.raw, single space delimited, 
consisting of minor allele counts (0, 1, 2, NA) that you can bring into R 
like this:


data <- read.table("whatever.raw", sep=" ", header=T)

If you have tons of data, you'll want to work with the compact binary 
format (four genotypes per byte):


plink --file whatever --out whatever --make-bed

Then see David Duffy's reply.  However, I'm not sure if R can work with 
the compact format in memory.  It might expand those genotypes (minor 
allele counts) from two-bit integers to double-precision floats.  What 
does read.plink() create in memory?


There is another package I've been meaning to look at that is supposed to 
help with the memory management problem for large genotype files:


http://cran.r-project.org/web/packages/ff/

I haven't used it yet, but I am hopeful.  Maybe David Duffy or someone 
else here will know more about it.


If you have a lot of data, also consider chopping the data into pieces 
before loading it into R.  That's what we do.  With a 100 core system, I 
break the data into 100 files (I use the GNU/Linux "split" command and a 
few other tricks) and have all 100 cores run at once to analyze the data.


When I work with genotype data as allele counts using Octave, I store the 
data, both in files and in memory, as unsigned 8-bit integers, using 3 as 
the missing value.  That's still inefficient compared to the PLINK system, 
but it is way better than using doubles.


Best,
Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time-series analysis with treatment effects - statistical approach

2011-06-22 Thread Mike Marchywka


> Subject: [R] Time-series analysis with treatment effects - statistical
> approach
> 
> Hello all R listers,
> I'm struggling to select an appropriate statistical method for my data set.
> I have collected soil moisture measurements every hour for 2 years. There
> are 75 sensors taking these automated measurements, spread evenly across 4
> treatments and a control. I'm not interested in being able to predict soil
> future soil moisture trends, but rather in knowing whether the treatment
> affected soil moisture response overall.  In particular, it would be
> interesting to inspect treatment related response within defined periods.
> For example, a visual inspection of my data suggests that soil moisture is
> equivalent across treatments during wet winter months, but during the dry
> summer months, a treatment effect appears. 
> 
> Any help on this topic would be very appreciated. I've looked far and wide
> through academic literature for similar experimental designs, but have not
> had any success as yet.


Can you post your data? This makes it more interesting for potential responders
and makes it more likely you get a relevant answer or observation. 

I guess the stats people would just suggest something like anova and
you could just compare moisture means among the treatments and controls
but presumably things are not stationary and it may be easy to get confusing
results. 

http://en.wikipedia.org/wiki/Analysis_of_variance
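A hedged sketch of that comparison (the treatment labels and sample sizes are
invented; real data would replace the simulated moisture column):

d <- data.frame(treatment = gl(5, 30, labels = c("ctrl","T1","T2","T3","T4")),
                moisture  = rnorm(150, mean = 0.25, sd = 0.05))
summary(aov(moisture ~ treatment, data = d))   # one-way comparison of means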



Usually there is some analysis plan made up a priori, I would imagine, or at 
least
there is prior literature in your field. I guess citeseer or google 
scholar/books may
be useful if you have some key words or author names. The data you have
sounds to an engineer just like any other discrete-time signal, and maybe the 
digital
signal processing literature would be of interest. Presumably you have some 
models to test, and more details depend on the specifics of your competing
hypotheses. Kalman filtering maybe? 



> 
> Cheers,
> Justin
> 
> Dr. Justin Morgenroth
> New Zealand School of Forestry
> Christchurch, New Zealand
> 
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3615856.html
> Sent from the R help mailing list archive at Nabble.com.
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] different results from nls in 2.10.1 and 2.11.1

2011-06-20 Thread Mike Marchywka

> Date: Mon, 20 Jun 2011 11:25:54 +0200
> From: lig...@statistik.tu-dortmund.de
> To: p.e.b...@dunelm.org.uk
> CC: r-help@r-project.org; pb...@astro.uni-bonn.de
> Subject: Re: [R] different results from nls in 2.10.1 and 2.11.1
> 
> Since one is a 32-bit and the other one a 64-bit, and therefore the 
> compiler is also different, you can get different numerical results 
> easily, even with identical versions of R (and most of us do not have 
> outdated R installations around).
> 
> I just tried your example on 3 different systems with R-2.13.0 and all 
> told me "singular convergence"...
> 
> Uwe Ligges
> 
> 
> 
> 
> 
> On 18.06.2011 15:44, Philip Bett wrote:
> > Hi,
> > I've noticed I get different results fitting a function to some data on
> > my laptop to when I do it on my computer at work.

I guess the best all-around approach is to dump the results from your
FisherAvgdPdf function and get some idea what trajectory it takes
in the different cases. This is presumably not just an issue with R
and could have something to tell you about your data vs the model.
Even if it converged without incident, you'd probably want to
look at more details. You apparently are trying to fit a histogram to
a pdf, and it is not too hard to just plot the fit over the histogram and
spot-check parameter space.
 
Consider for example, 
 
plot(h$mids,h$density,type="b")
points(h$mids,FisherAvgdPdf(acos(h$mids),3.527e-8,3.198),pch=20)
points(h$mids,FisherAvgdPdf(acos(h$mids),3.527e-8,3.298),pch=13)
points(h$mids,FisherAvgdPdf(acos(h$mids),4.527e-8,3.198),pch=14)

and then find various measures of how good or bad these fits are, etc.:
 
e<-h$density-FisherAvgdPdf(acos(h$mids),3.527e-8,3.198)
sum(e*e)
e<-h$density-FisherAvgdPdf(acos(h$mids),3.527e-8,3.298)
sum(e*e)

The error surface as a function of the parameters seems smooth, etc.:
 
foo <- function (x, y) { e <- h$density - FisherAvgdPdf(acos(h$mids), x, y); sum(e*e) }

x <- (2 + (1:100) * .02) / 1e8
y <- 3 + ((1:100) / 100)

library(scatterplot3d)   # needed for the 3-D plot below
df <- data.frame(x = 0, y = 0, z = 0)
for (i in x) for (j in y) df <- rbind(df, data.frame(x = i, y = j, z = foo(i, j)))
sel <- which(df$x != 0)  # drop the dummy first row
scatterplot3d(df$x[sel], df$y[sel], df$z[sel])

 
 
 
> >
> > Here's a code snippet of what I do:
> >
> > ##--
> > require(circular) ## for Bessel function I.0
> >
> > ## Data:
> > dd <- c(0.9975948929787, 0.9093316197395, 0.7838819026947,
> > 0.9096108675003, 0.8901804089546, 0.2995955049992, 0.9461286067963,
> > 0.8248071670532, 0.2442084848881, 0.2836948633194, 0.7353935241699,
> > 0.5812761187553, 0.8705610632896, 0.8744471669197, 0.7490273118019,
> > 0.9947383403778, 0.9154829382896, 0.8659985661507, 0.6448246836662,
> > 0.8588128685951, 0.7347437739372, -0.1645197421312, 0.970999121666,
> > 0.8038327097893, 0.9558997154236, 0.6846113204956, 0.6286814808846,
> > 0.9201356172562, 0.9422197341919, 0.3470877110958, 0.4154576957226,
> > 0.0721184238791, 0.14151956141, -0.6142936348915, -0.4688512086868,
> > 0.6805665493011, 0.3594025671482, 0.8991097211838, 0.7656877636909,
> > 0.9282909035683, 0.9454715847969, 0.9766132831573, 0.4316343963146,
> > 0.62679708004, 0.2093886137009, 0.3937581181526, 0.4254160523415,
> > 0.8684504628181, 0.3844584524632, 0.9578431844711, 0.956972181797,
> > 0.4456568360329, 0.9793710708618, 0.5825698971748, 0.929228246212,
> > 0.9211971759796, 0.9407976865768, 0.821156680584, 0.2048042863607,
> > 0.6473184227943, 0.9456319212914, 0.7021154165268, 0.9761978387833,
> > 0.1485801786184, 0.2195029109716, 0.5378785729408, 0.8304615020752,
> > 0.8596342802048, 0.950027525425, 0.9102076888084, 0.5108731985092,
> > 0.7200184464455, 0.3571084141731, 0.9765330553055, -0.143017962575,
> > 0.8576183915138, 0.1283493340015, -0.3226098418236, 0.7031792402267,
> > 0.8708582520485, 0.56754809618, 0.060470353812, 0.8015220761299,
> > 0.7363410592079, 0.671902179718, 0.8082517385483, 0.9468197822571,
> > 0.9729647636414, 0.7919752597809, 0.9539568424225, 0.4840737581253,
> > 0.850653231144, 0.5909016132355, 0.8414449691772, 0.9699150323868)
> >
> > xlims <- c(-1,1)
> > bw <- 0.05
> > b <- seq(xlims[1],xlims[2],by=bw) ; nb <- length(b)
> > h <- hist( dd, breaks=b, plot=FALSE)
> >
> > FisherAvgdPdf <- function(theta,theta0,kappa){
> > A <- kappa/(2*sinh(kappa))
> > A * I.0( kappa*sin(theta)*sin(theta0) ) * exp(
> > kappa*cos(theta)*cos(theta0) )
> > }
> >
> >
> > nls(dens ~ FisherAvgdPdf(theta,theta0,kappa),
> > data = data.frame( theta=acos(h$mids), dens=h$density ),
> > start=c( theta0=0.5, kappa=4.0 ),
> > algorithm="port", lower=c(0,0), upper=c(acos(xlims[1]),500),
> > control=list(warnOnly=TRUE) )
> >
> > ##--
> >
> > On one machine, nls converges, and on the other it doesn't. Any ideas
> > why, and which is "right"? I can't see anything in R News that could be
> > relevant.
> >
> >
> > The different R versions (and computers) are:
> > > R.version
> > _
> > platform i686-pc-linux-gnu
> > arch 

Re: [R] Multivariate HPD credible volume -- is it computable in R?

2011-06-19 Thread Mike Marchywka






> Date: Sun, 19 Jun 2011 19:06:20 +1200
> From: agw1...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Multivariate HPD credible volume -- is it computable in R?
> 
> Hi all,
> 
> I'm new to the list and am hoping to get some advice. I have a set of
> multivariate data and would like to find the densest part of the data cloud
> containing 95% of the data, like a 95% HPD credible volume. Is there any R
> code available to compute that?

It looks like the LaplacesDemon pkg was just updated FWIW,

http://www.google.com/search?hl=en&q=hpd+credible+volume+cran

If you just want to find the density of your data in some n-dimensional
space, that sounds like multidimensional binning or a histogram would
work (you can check Google, but IIRC there is no general n-dimensional
binning package. I also mentioned an approach based on a density field
associated with each point, which can be summed over all points to get
the density at an arbitrary point, but you need to implement it in some
intelligent way for it to be fast). I guess you could bin using
aggregate(); see if this example works:

df <- data.frame(x = runif(100) * 10, y = runif(100) * 5, c = 1)
d  <- aggregate(c ~ floor(x) + floor(y), df, FUN = "sum")
d
which(d$c == max(d$c))

Now, at this point, you can imagine that you may be able to find a surface
that encloses a certain fraction of your data.
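Continuing the sketch above, one crude way to get at a 95% highest-density
region from the binned counts is to keep the densest cells until they hold
95% of the points:

d2   <- d[order(-d$c), ]                        # bins, densest first
keep <- d2[cumsum(d2$c) <= 0.95 * sum(d2$c), ]  # cells covering ~95% of data
keep

This inherits all the caveats of the binning (bin width, dimensionality),
so treat it as a starting point, not a proper HPD computation.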



> 
> Thank you very much! Your help and patience are much appreciated.
> 
> G.S.
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Server question

2011-06-18 Thread Mike Marchywka






 

> From: oliver.jo...@digred.com
> To: r-help@r-project.org
> Date: Fri, 17 Jun 2011 15:35:38 +0100
> CC: rebeccahard...@deltaeconomics.com
> Subject: [R] Server question
> 
> Hi
> 
> 
> 
> A client of mine has asked me to investigate the installation of R-software.
> 
> 
> 
> Could anyone tell me whether the software works only on a client machine or
> whether it sits on a server with clients attaching to it?

 
Did anyone answer this yet? See Rserve and Rapache on google. Both could
be useful to you or get you started.
 
 
 
> 
> 
> 
> Not immediately clear from the docs.
> 
> 
> 
> Best
> 
> 
> 
> Oliver
> 
> --
> 
> Oliver Jones
> 
> T: 01845 595911
> M: 07977 122089
> 
> DIG*RED Web Production |  www.digred.com
> 2 Vyner's Yard
> Rainton
> North Yorkshire
> 
> YO7 3PH
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 'hazard ratio', urgent~~~

2011-06-14 Thread Mike Marchywka











> Date: Tue, 14 Jun 2011 09:11:25 -0700
> From: dr.jz...@gmail.com
> To: r-help@r-project.org
> Subject: Re: [R] About 'hazard ratio', urgent~~~
>
> Thanks a lot for the great help~


well, if you are referring to my post, as I indicated I made a lot of blunders 
to get that out the
door and posted the first thing that made sense. Note also that by
making the one hazard rate "1" it wasn't clear that the column
I cited was divided by "1" etc. Caveat emptor. I usually
just answer questions like this when I "meant to look at that
anyway" and I have a chance of putting forth an answer which
I and others can verify. In any case, if you are using
some new code it always helps to run known test cases through
it, even if it is just new to you LOL.





>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/About-hazard-ratio-urgent-tp3595527p3597025.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need script to create new waypoint

2011-06-14 Thread Mike Marchywka

 
 
> Now I need to change this data set into one with waypoints at regular 
> intervals: for example 2 
 
I guess your question is about algorithm ideas for making up data to meet
an unspecified perceived need. Anything you can specify completely should
be easy to code in R, a bash script, or C++ or Java; I'm not sure Excel
was intended for arbitrary code. Generally, the choice of how you
manufacture data is dictated by the perceived need, so if you can
elaborate on the "need," maybe someone can help you pick an approach to
making things up.
 
There is a sig list for geo problems that may know your requirement better,
 
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
 
and of course if your need is to replicate prior work in the field, you
can contact authors directly for questions about what they did.
 
If your question is about how R can handle time series with pre-packaged
tools, there is quite a bit, notably "zoo" and "ts", which should come up
with "?zoo" and "?ts":

http://www.google.com/search?hl=en&safe=off&q=R+ts+regular+time+intervals

Also try an "interpolation" search.
 
> library(zoo)
> ?zoo
> ?ts
>
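If linear interpolation at regular times is all that is needed (option 1
in the post below), approx() already does it; here is a sketch with
made-up numbers standing in for the cumulative-seconds and coordinate
columns:

secs <- c(0, 23, 73, 153, 156, 168, 513)     # summed time intervals
x    <- c(10, 12, 15, 20, 21, 23, 25)        # hypothetical coordinates
y    <- c( 5,  6,  8, 11, 11, 12, 14)
tt   <- seq(0, max(secs), by = 120)          # every 2 minutes
data.frame(t = tt,
           x = approx(secs, x, xout = tt)$y, # constant-speed assumption
           y = approx(secs, y, xout = tt)$y)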
 
 


> From: anti...@hotmail.com
> To: r-help@r-project.org
> Date: Tue, 14 Jun 2011 11:55:57 +
> Subject: [R] Need script to create new waypoint
>
>
> Dear help-list members,
>
> I am a student at Durham University (UK) conducting a PhD on spatial 
> representation in baboons. Currently, I'm analysing the effect of sampling 
> interval on home range calculations.
>
> I have followed the baboons for 234 days in the field, each day is 
> represented by about 1000 waypoints (x,y coordinates) recorded at irregular 
> time intervals. Consecutive waypoints in this raw data set have an average 
> distance interval of 5 meters an average time interval of 23 seconds (but 
> when baboons were stationary the time interval could be much larger - e.g. 
> waypoint 7 below).This raw data set needs to become a data set with waypoints 
> at regular intervals and thus, 'new' waypoints have to be 'created'. 
> Eventually, I want to use seven different time intervals: 2, 5, 10, 15, 30, 
> 45 and 60 minute intervals. I have tried in Excel, but I am not managing it. 
> I have some experience with R, and although I can 'read' quite complicated 
> scripts, I am unable to write them, so I would very much appreciate any help 
> anybody would be willing to offer me.
>
>
>
> My current data set has 9 columns (in csv / excel file):
> x coordinate, y coordinate, year, month, day, record, time interval (duration 
> between this waypoint and the previous) (hh:dd:ss), summed time intervals, 
> distance interval (m)
>
>
> EXAMPLE (24th of april 2007)
>
> (wp1) x1, y1, 2007, 7, 24, 1, 00:00:00, 00:00:00, 0
> (wp2) x2, y2, 2007, 7, 24, 2, 00:00:23, 00:00:23, 2
> (wp3) x3, y3, 2007, 7, 24, 3, 00:00:50, 00:00:73, 3
> (wp4) x4, y4, 2007, 7, 24, 4, 00:01:20, 00:02:33, 5
> (wp5) x5, y5, 2007, 7, 24, 5, 00:00:03, 00:02:36, 1
> (wp6) x6, y6, 2007, 7, 24, 6, 00:00:12, 00:02:48, 2
> (wp7) x7, y7, 2007, 7, 24, 7, 00:05:45, 00:08:33, 2
>
> Now I need to change this data set into one with waypoints at regular 
> intervals: for example 2 minutes = 120 seconds >> 2 minutes after the first 
> waypoint (x1,y1) the baboons would be somewhere between WP3 and WP4 (at WP3 
> sum duration is 73 seconds and after WP4 sum duration is 153 seconds), and so 
> this is where I would like a new waypoint created. Note that there are time 
> intervals which will be so large that multiple 'new' waypoints have to be 
> made / copied (e.g. WP 7 for a 2 minute interval).
>
> Three ways of calculating the new coordinates for this new waypoint from very 
> precise to not so precise (in order of preference) are:
>
>
>
> 1) Basing the new waypoint coordinates on the relative 'time distance' to 
> each waypoint of the two surrounding waypoints (therewith assuming a constant 
> movement during the time interval). The 'time distance' bewteen WP3 and WPnew 
> is (120-73) 47 seconds and the 'time distance' between WPnew and WP4 is 
> (153-120) 33 seconds (whereas total time interval between WP3 and WP4 is 80 
> seconds). WPnew (with coordinates Xnew,Ynew) should then be located at 
> 80/33*100=41.25% from WP3: Xnew = X3 + (X4-X3)*41.25% and Ynew= Y3 + 
> (Y4-Y3)*41.25%
>
>
> 2) Calculate the average location (average of x3,y3 and x4,y4), at which to 
> create a new waypoint at 2 minutes.
>
>
> 3) A simpler alternative is that the location of the 'closer waypoint in 
> time', in this example WP 4, (WP4: 153-120=33 versus WP3: 120-73=47) could be 
> copied as being the new location.
>
>
>
> I hope I explained my query so that it makes sense to you. I realize I am 
> asking a 'big' question and apologize for this - the software programs I have 
> thorough knowledge of (ArcGIS, excel, Mapsource), are all unable to solve 
> this problem. I would be very grateful for any advice or suggestions.
>
> Best wishes,
> Louise
>
> 

Re: [R] About 'hazard ratio', urgent~~~

2011-06-14 Thread Mike Marchywka













> Date: Mon, 13 Jun 2011 19:44:15 -0700
> From: dr.jz...@gmail.com
> To: r-help@r-project.org
> Subject: [R] About 'hazard ratio', urgent~~~
>
> Hi,
>
> I am new to R.
>
> My question is: how to get the 'hazard ratio' using the 'coxph' function in
> 'survival' package?

You can probably search the docs for hazard terms, for example, 

http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf


and try running known test data through to verify. For example, it does
seem that the "exp(coef)" column contains a decent estimate of the hazard
ratio in simple cases (not quite right to 3 significant figures, but I
haven't given this a lot of thought and it is still early here; hoping
for input from someone who can explain it better):

> ?rexp

> ns=10
> df<-data.frame(gr<-c(rep(0,ns),rep(1,ns)),t=c(rexp(ns,1),rexp(ns,3)))
> coxph(Surv(t)~gr,df)
Call:
coxph(formula = Surv(t) ~ gr, data = df)

       coef exp(coef) se(coef)   z p
gr     1.09      2.98  0.00503 217 0
Likelihood ratio test=47382  on 1 df, p=0  n= 20, number of events= 2e+05


> df<-data.frame(gr<-c(rep(0,100),rep(1,100)),t=c(rexp(100,1),rexp(100,2)))
> coxph(Surv(t)~gr,df)
Call:
coxph(formula = Surv(t) ~ gr, data = df)

       coef exp(coef) se(coef)    z       p
gr    0.658      1.93    0.148 4.44 8.8e-06
Likelihood ratio test=19.6  on 1 df, p=9.5e-06  n= 200, number of events= 200
> df<-data.frame(gr<-c(rep(0,100),rep(1,100)),t=c(rexp(100,1),rexp(100,1)))
> coxph(Surv(t)~gr,df)
Call:
coxph(formula = Surv(t) ~ gr, data = df)

       coef exp(coef) se(coef)      z    p
gr  -0.0266     0.974    0.142 -0.187 0.85
Likelihood ratio test=0.03  on 1 df, p=0.852  n= 200, number of events= 200
>

>
> thanks,
>
> karena
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/About-hazard-ratio-urgent-tp3595527p3595527.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] smoothScatter function (color density question) and adding a legend

2011-06-13 Thread Mike Marchywka










> Date: Sat, 11 Jun 2011 21:52:21 -0400
> From: ccoll...@purdue.edu
> To: r-help@r-project.org
> Subject: [R] smoothScatter function (color density question) and adding a 
> legend
>
> Dear R experts,
>
> I am resending my questions below one more time
> just in case someone out there could help
> but missed my email.

Thanks, I was curious about this, so I installed the geneplot package and
that went fine, but I had another issue and need to take care of that.
First, see if "?legend" helps at all (no idea; I have not used this stuff
much). Also, source code should be available for the package you care
about.

On a quick read, it sounds like this uses a binning system for colors. If
you want something slightly different and much slower, I have some
examples (IIRC) that calculate densities in R using something similar to
the electric field calculation around point charges (again, I'm in a
hurry and pulling this from an archive, so caveat emptor: this plot may
not correspond exactly to the example script below, etc.),


http://98.129.232.232/coloumb1.pdf


library(scatterplot3d)   # needed for the 3-D view below

mydensity <- function(x1, x2)
{
  len <- length(x1)
  z <- 1:len
  for (y in 1:len)
  {
    # indices of all the other points
    nx <- if (y == 1) 2:len
          else if (y == len) 1:(len - 1)
          else c(1:(y - 1), (y + 1):len)
    # coulomb is a bit much with overlapping data points, so limit it a bit
    z[y] <- sum(1.0 / (.001 + (x1[nx] - x1[y])^2 + (x2[nx] - x2[y])^2))
  } # for
  print(max(z))
  print(min(z))
  z <- z - min(z)
  z <- 100 * z / max(z)
  hist(z)
  color <- rainbow(100)
  # color <- heat.colors(10)
  tmap <- color[pmin(floor(z) + 1, 100)]   # pmin keeps the max point in range
  scatterplot3d(x1, x2, z, color = tmap)
  plot(x2, x1, col = tmap, cex = .5)
  # library("VecStatGraphs2D")
  # DrawDensityMap(x1, x2, PaintPoint = TRUE)
  # color <- terrain.colors(26)
  # color <- heat.colors(26)
}
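A quick way to try it on synthetic data (any two equal-length numeric
vectors will do):

mydensity(rnorm(500), rnorm(500))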





>
> I don't think my questions are too hard.
>
> I am most concerned about the transformation function.
> See below.
>
> Thanks,
> Clayton
>
> Hello,
>
> I have a few questions, regarding the smoothScatter function.
>
>
> I have a scatter plot with more than 500,000 data points for
> two samples. So, I am wanting to display the density in colors
> to convince people that my good correlation coefficient is not
> due to an "influential point effect" and plus, I also want to make
> my scatter plot look pretty. Anyway ...
>
> I have been able to make the plot, but I have a couple of questions about it.
> I also want to make a legend for it, which I don't know how to do.
> I have only been playing around with R for a few weeks now off and on.
> So, I am definitely a beginner.
>
>
> 1.
> I have 10 colors for my plot: white representing zero density and dark red 
> representing the maximum density (I presume).
> According to the R documentation, the transformation argument represents the 
> "function mapping the density scale to the color scale."
> Note that in my R code below, transformation = function(x) x^0.14.
> I was wondering how exactly or mathematically that this function relates the 
> density to color.
> I believe the colorRampPalette ramps colors between 0 and 1. I am not sure if 
> x represents the color or the
> density. Since I have 10 colors, I suppose the colorRampPalette would assign 
> values of
> 0, 0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, and 1 for white to dark 
> red. I am not sure though.
> Does anyone know how this works? I am sure its not too too complicated.
>
>
> 2.
> In a related issue, I also would like to make a legend for this plot. Then, I 
> would be able to see the transformation
> function's effects on the color-density relationship. Could someone help me 
> in making a legend for my smoothScatter plot? I would like to place it 
> immediately to the right of my plot as a vertical bar, matching the vertical 
> length of the plot as is often convention.
>
> I really like the smoothScatter function. It is easy to use, and I believe 
> it's relatively new.
>
> Thanks in advance.
>
> -Clayton
>
> > clayton <- c("white", "lightblue", "blue", "darkgreen", "green", "yellow", 
> > "orange", "darkorange", "red", "darkred")
> > x <- read.table("c:/users/ccolling/log_u1m1.txt", sep ="\t")
> > smoothScatter(x$V8~x$V7,
> nbin=1000,
> colramp = colorRampPalette(c(clayton)),
> nrpoints=Inf,
> pch="",
> cex=.7,
> transformation = function(x) x^.14,
> col="black",
> main="M1 vs. M3 (r = 0.92)",
> xlab="Normalized M1",
> ylab="Normalized M2",
> cex.axis = 1.75,
> cex.lab = 1.25,
> cex.main = 2)
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Amazon AWS, RGenoud, Parallel Computing

2011-06-11 Thread Mike Marchywka












> Date: Sat, 11 Jun 2011 19:57:47 +0200
> Subject: Re: [R] Amazon AWS, RGenoud, Parallel Computing
> From: lui.r.proj...@googlemail.com
> To: marchy...@hotmail.com
> CC: r-help@r-project.org
>
> Hello Mike,
>
[[elided Hotmail spam]]
> Best to my knowledge the sort algorithm implemented in R is already
> "backed by C++" code and not natively written in R. Writing the code
> in C++ is not really an option either (i think rGenoud is also written
> in C++). I am not sure whether there really is a "bottleneck" with
> respect to the computer - I/O is pretty low, plenty of RAM left etc.
> It really seems to me as if parallelizing is not easily possible or
> only at high costs so that the benefits diminish through all the
> coordination and handling needed...
> Did anybody use rGenoud in "cluster mode" an experience sth similar?
> Are quicksort packages available using multiple processors efficiently
> (I didnt find any... :-( ).

I'm no expert, but these don't seem to be terribly subtle problems
in most cases. Sure, if the task is not suited to parallelism and
you force it to be parallel and it spends all its time syncing up,
that can be a problem. Just making more tasks to fight over
the bottleneck (memory, CPU, locks) can easily make things worse.
I think I posted a link earlier to an IEEE blurb showing
how easy it is for many cores to make things worse on non-contrived
benchmarks.




>
> I am by no means an expert on parallel processing, but is it possible,
> that benefits from parallelizing a process greatly diminish if a large
> set of variables/functions need to be made available and the actual
> function (in this case sorting a few hundred entries) is quite short
> whereas the number of times the function is called is very high!? It
> was quite striking that the "first run" usually took several hours
> (instead of half an hour) and the subsequent runs were much much
> faster..
>
> There is so much happening "behind the scenes" that it is a little
> hard for me to tell what might help - and what will not...
>
> Help appreciated :-)
> Thank you
>
> Lui
>
> On Sat, Jun 11, 2011 at 4:42 PM, Mike Marchywka  wrote:
> >
> >
> >
> >
> > 
> >> Date: Sat, 11 Jun 2011 13:03:10 +0200
> >> From: lui.r.proj...@googlemail.com
> >> To: r-help@r-project.org
> >> Subject: [R] Amazon AWS, RGenoud, Parallel Computing
> >>
> >> Dear R group,
> >>
> >>
> > [...]
> >
> >> I am a little bit puzzled now about what I could do... It seems like
> >> there are only very limited options for me to increase the
> >> performance. Does anybody have experience with parallel computations
> >> with rGenoud or parallelized sorting algorithms? I think one major
> >> problem is that the sorting happens rather quick (only a few hundred
> >> entries to sort), but needs to be done very frequently (population
> >> size >2000, iterations >500), so I guess the problem with the
> >> "housekeeping" of the parallel computation deminishes all benefits.
> >>
> > Is your sort part of the algorithm, or do you have to sort results after
> > getting them back out of order from async processes? One of
> > my favorite anecdotes is how I used a bash sort on a huge data
> > file to make a program run faster (from impractical zero percent CPU
> > to very fast with full CPU usage, and you complain about exactly
> > a lack of CPU saturation). I guess a couple of comments. First,
> > if you have specialized apps you need optimized, you may want
> > to write dedicated C++ code. However, this won't help if
> > you don't find the bottleneck. Lack of CPU saturation could
> > easily be due to "waiting for stuff" like disk IO or VM
> > swap. You really ought to find the bottleneck first; it
> > could be anything (except the CPU, maybe, LOL). The sort
> > that I used prevented VM thrashing with no change to the app
> > code: the app got sorted data and so VM paging became infrequent.
> > If you can specify the problem precisely you may be able to find
> > a simple solution.
> >
> >
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Amazon AWS, RGenoud, Parallel Computing

2011-06-11 Thread Mike Marchywka





> Date: Sat, 11 Jun 2011 13:03:10 +0200
> From: lui.r.proj...@googlemail.com
> To: r-help@r-project.org
> Subject: [R] Amazon AWS, RGenoud, Parallel Computing
>
> Dear R group,
>
>
[...]

> I am a little bit puzzled now about what I could do... It seems like
> there are only very limited options for me to increase the
> performance. Does anybody have experience with parallel computations
> with rGenoud or parallelized sorting algorithms? I think one major
> problem is that the sorting happens rather quick (only a few hundred
> entries to sort), but needs to be done very frequently (population
> size >2000, iterations >500), so I guess the problem with the
> "housekeeping" of the parallel computation deminishes all benefits.
>
Is your sort part of the algorithm, or do you have to sort results after
getting them back out of order from async processes? One of
my favorite anecdotes is how I used a bash sort on a huge data
file to make a program run faster (from impractical zero percent CPU
to very fast with full CPU usage, and you complain about exactly
a lack of CPU saturation). I guess a couple of comments. First,
if you have specialized apps you need optimized, you may want
to write dedicated C++ code. However, this won't help if
you don't find the bottleneck. Lack of CPU saturation could
easily be due to "waiting for stuff" like disk IO or VM
swap. You really ought to find the bottleneck first; it
could be anything (except the CPU, maybe, LOL). The sort
that I used prevented VM thrashing with no change to the app
code: the app got sorted data and so VM paging became infrequent.
If you can specify the problem precisely you may be able to find
a simple solution.

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RES: Linear multivariate regression with Robust error

2011-06-11 Thread Mike Marchywka















> Date: Fri, 10 Jun 2011 16:50:24 -0300
> From: filipe.bote...@vpar.com.br
> To: frien...@yorku.ca; bkkoc...@gmail.com
> CC: r-help@r-project.org
> Subject: [R] RES: Linear multivariate regression with Robust error
>
>
>
> --Forwarded Message Attachment--
> Subject: RES: [R] Linear multivariate regression with Robust error
> Date: Fri, 10 Jun 2011 16:50:24 -0300
> From: filipe.bote...@vpar.com.br
> To: frien...@yorku.ca; bkkoc...@gmail.com
> CC: r-help@r-project.org
>
>
> Hi Barjesh,
>
> I am not sure which data you analyze, but once I had a similar situation
> and it was a multicolinearity issue. I realized this after finding a
> huge correlation between the independent variables, then I dropped one
> of them and signs of slopes made sense.
>
> Beforehand, have a glance at your correlation matrix of independent
> variables to see the relationship between them.

I guess "look at the data" ( LATFD ) seems to be the recurring solution.
Certainly with polynomial regression, there's a tendency to think that
the fit parameter is a pure property of the system being examined. With
something like a Taylor series, and you have a slope at a specific point,
maybe you could think about it that way but here the coefficient
is just "whatever optimizes your (arbitrary ) error function for the data
you have." A linear approaximation to a non-line could be made at
one point and maybe that property should remain constant but 
you have people claiming " past authors samples some curve near x=a
and got a different slope than my work largely smapling f(x) around
x=b what is wrong with my result?" It may be interesting to try to
write a taylor series around some point and see how those coefficients
vary with data sets for example( you still need arbitrary way to
estimate slopes and simply differencing two points may be a bit noisy LOL 
but you could play with some wavelet families maybe ). 

If you try to describe a fruit as a linear combination of vegetables,
you may get confusing but possibly useful results even if they don't correspond 
to
properties of the fruits so much as a specific need you have. For example, if 
you
are compressing images of fruits and your decoder already has a dictionary
of vegetables, it may make sense to do this.  This is not much different from 
trying to compress non-vocal music with an ACELP codec that attempts to fit the 
sounds
to models of human vocal tract. Sometimes this may be even be informative
about how a given sound was produced even if it sounds silly. 
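A two-line illustration of the region-dependence point (f is an arbitrary
made-up curve; a and b are two hypothetical sampling regions):

f <- function(x) x - x^2
a <- seq(0, 1, length.out = 50)
b <- seq(2, 3, length.out = 50)
coef(lm(f(a) ~ a))[2]   # "slope" of the curve sampled near x = 0.5
coef(lm(f(b) ~ b))[2]   # very different "slope" sampled near x = 2.5

Both fits are "right" for their own data; neither is a property of f alone.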












>
> Not sure how helpful my advice will be, but it did the trick for me back
> then ;)
>
> Cheers,
> Filipe Botelho
>
> -Mensagem original-
> De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> Em nome de Michael Friendly
> Enviada em: sexta-feira, 10 de junho de 2011 12:51
> Para: Barjesh Kochar
> Cc: r-help@r-project.org
> Assunto: Re: [R] Linear multivariate regression with Robust error
>
> On 6/10/2011 12:23 AM, Barjesh Kochar wrote:
> > Dear all,
> >
> > i am doing linear regression with robust error to know the effect of
> > a (x) variable on (y)other if i execute the command i found positive
> > trend.
> > But if i check the effect of number of (x.x1,x2,x3)variables
> > on same (y)variable then the positive effect shwon by x variable turns
> > to negative. so plz help me in this situation.
> >
> > Barjesh Kochar
> > Research scholar
> >
> You don't give any data or provide any code (as the posting guide
> requests) , so I have to guess that you
> have just rediscovered Simpson's paradox -- that the coefficient of a
> variable in a marginal regression can have an opposite sign to that in
> a joint model with other predictors. I have no idea what you mean
> by 'robust error'.
>
> One remedy is an added-variable plot which will show you the partial
> contributions of each predictor in the joint model, as well as whether
> there are any influential observations that are driving the estimated
> coefficients.
>
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca
> Professor, Psychology Dept.
> York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street Web: http://www.datavis.ca
> Toronto, ONT M3J 1P3 CANADA
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> "This message and its attachments may contain confidential and/or
> privileged information. If you are not the addressee, please, advise
> the sender immediately by replying to the e-mail and delete this
> message." "Este mensaje y sus anexos pueden contener información
> confidencial o privilegiada. Si ha recibido este e-mail por error por
> favor bórrelo y envíe un mensaje al remitente." "Esta mensagem e seus
> anexos podem conter inf

Re: [R] Linear multivariate regression with Robust error

2011-06-10 Thread Mike Marchywka













> Date: Fri, 10 Jun 2011 09:53:20 +0530
> From: bkkoc...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Linear multivariate regression with Robust error
>
> Dear all,
>
> i am doing linear regression with robust error to know the effect of
> a (x) variable on (y)other if i execute the command i found positive
> trend.
> But if i check the effect of number of (x.x1,x2,x3)variables
> on same (y)variable then the positive effect shwon by x variable turns
> to negative. so plz help me in this situation.

Take y as "goodness," and let x and x1 have something to do with a
product. The first analysis is from company A, the second is from company
B, and the underlying relationship is given with some noise, LOL (I'm
still on my first cup of coffee; this was the first example to come to
mind, as these questions keep coming up here every day):

> x=1:100
> x1=x*x
> y<-x-x1+runif(100)
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
       1718         -100

> lm(y~x+x1)

Call:
lm(formula = y ~ x + x1)

Coefficients:
(Intercept)            x           x1
     0.5253       1.0024      -1.0000

>











>
> Barjesh Kochar
> Research scholar
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error with geomap in googleVis

2011-06-10 Thread Mike Marchywka




> Subject: RE: [R] error with geomap in googleVis
> Date: Fri, 10 Jun 2011 11:06:47 +0100
> From: markus.gesm...@lloyds.com
> To: marchy...@hotmail.com; r-h...@stat.math.ethz.ch
>
> Hi Mike,
>
> I believe you are trying to put two charts on the same page.
> The demo(googleVis) provides you with an example for this.
> So applying the same approach to your data sets would require:


Thanks, I thought however I tried to different examples the first
being my own which failed and  then the "Fruits" example as a second
test. I was lazy and used install package from probably wrong mirror
as in the past on 'dohs I had problems with CMD INSTALL. I reloaded
as per your suggestion and tried first example. The output did
change but it is still lacking a chart. Do I need API key or something?
Ive never used google visualization before and was just reacting to OP
as I had wanted to try it. Process is below, final result may
be here, http://98.129.232.232/xxx.html , but I will be varying as
I get time. Thanks again.

--2011-06-10 05:17:47--  http://cran.r-project.org/src/contrib/googleVis_0.2.5.tar.gz
Resolving cran.r-project.org... 137.208.57.37
Connecting to cran.r-project.org|137.208.57.37|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 10 Jun 2011 10:17:47 GMT
  Server: Apache/2.2.9 (Debian)
  Last-Modified: Tue, 07 Jun 2011 06:35:56 GMT
  ETag: "a83451-b36e9-4a5196ea64b00"
  Accept-Ranges: bytes
  Content-Length: 734953
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Content-Type: application/x-gzip
Length: 734953 (718K) [application/x-gzip]
Saving to: `googleVis_0.2.5.tar.gz'

100%[==>] 734,953  527K/s   in 1.4s

2011-06-10 05:17:49 (527 KB/s) - `googleVis_0.2.5.tar.gz' saved [734953/734953]

[marchywka@351915-www1 downloads]$ R CMD INSTALL googleVis_0.2.5.tar.gz
* installing to library `/usr/local/lib64/R/library'
* installing *source* package `googleVis' ...
** R
** data
**  moving datasets to lazyload DB
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded

* DONE (googleVis)
[marchywka@351915-www1 downloads]$ R

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(googleVis)
Loading required package: RJSONIO
To suppress the following message use the statement:
suppressPackageStartupMessages(library(googleVis))

Welcome to googleVis version 0.2.5

Type ?googleVis to access the overall documentation and
vignette('googleVis') for the package vignette.
You can execute the demo of the package via: demo(googleVis)

More information is available on the googleVis project web-site:
http://code.google.com/p/google-motion-charts-with-r/

Please read also the Google Visualisation API Terms of Use:
http://code.google.com/apis/visualization/terms.html

Feel free to send us an email
if you would like to be keept informed of new versions,
or if you have any feedback, ideas, suggestions or would
like to collaborate.

> df<-data.frame(foo=c("Brazil","Canada"), bar=c(123,456))
>
> map1<-gvisGeoMap(df,locationvar='foo',numvar='bar')
> cat(map1$html)
Error in cat(list(...), file, sep, fill, labels, append) :
  argument 1 (type 'list') cannot be handled by 'cat'
> cat(map1$html$header)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>GeoMapID23b504e5</title>
  <meta http-equiv="content-type" content="text/html;charset=utf-8" />
  <style type="text/css">
    body {
      color: #44;
      font-family: Arial,Helvetica,sans-serif;
      font-size: 75%;
    }
    a {
      color: #4D87C7;
      text-decoration: none;
    }
  </style>
</head>
<body>
> cat(map1$html$header,file="xxx.html",appeand=F)
> cat(map1$html$chart,file="xxx.html",appeand=t)
Error in cat(list(...), file, sep, fill, labels, append) :
  argument 2 (type 'closure') cannot be handled by 'cat'
> str(map1$html)
List of 4
 $ header : chr "http://www.w3.org/TR/xhtml1/DT

Re: [R] error with geomap in googleVis

2011-06-09 Thread Mike Marchywka


I still got blanks with Firefox with the two examples below, I put html
up here if you want to look at it,

http://98.129.232.232/xxx.html

I just downloaded googlevis from mirror 68 and it claimed it
was 0.2.5 ( I thought, but maybe I should check again). 


install.packages("googleVis", dep = TRUE)
library(googleVis)
df <- data.frame(foo = c("Brazil", "Canada"), bar = c(123, 456))
map1 <- gvisGeoMap(df, locationvar = 'foo', numvar = 'bar')

cat(map1$html$header,  file = "xxx.html", append = FALSE)
cat(map1$html$chart,   file = "xxx.html", append = TRUE)
cat(map1$html$caption, file = "xxx.html", append = TRUE)
cat(map1$html$footer,  file = "xxx.html", append = TRUE)

m <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")
str(m)
cat(m$html$header,  file = "xxx.html", append = FALSE)
cat(m$html$chart,   file = "xxx.html", append = TRUE)
cat(m$html$caption, file = "xxx.html", append = TRUE)
cat(m$html$footer,  file = "xxx.html", append = TRUE)




> Subject: RE: [R] error with geomap in googleVis
> Date: Thu, 9 Jun 2011 14:06:22 +0100
> From: markus.gesm...@lloyds.com
> To: marchy...@hotmail.com; mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch
>
> Hi all,
>
> This issue occurs with googleVis 0.2.4 and RJSONIO > 0.7.1.
> Version 0.2.5 of the googleVis package has been uploaded to CRAN two
> days ago and should have fixed this issue.
> Can you please try to update to that version, e.g. from
> http://cran.r-project.org/web/packages/googleVis/
>
> Further version 0.2.5 provides new interfaces to more interactive Google
> charts:
> - gvisLineChart
> - gvisBarChart
> - gvisColumnChart
> - gvisAreaChart
> - gvisScatterChart
> - gvisPieChart
> - gvisGauge
> - gvisOrgChart
> - gvisIntensityMap
>
> Additionally a new demo 'AnimatedGeoMap' has been added which shows how
> a Geo Map can be animated with additional JavaScript. Thanks to Manoj
> Ananthapadmanabhan and Anand Ramalingam, who provided the idea and
> initial code.
>
> For more information and examples see:
> http://code.google.com/p/google-motion-charts-with-r/
>
> I hope this helps
>
> Markus
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Mike Marchywka
> Sent: 09 June 2011 11:19
> To: mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch
> Subject: Re: [R] error with geomap in googleVis
>
>
>
> 
> > To: r-h...@stat.math.ethz.ch
> > From: mjphi...@tpg.com.au
> > Date: Wed, 8 Jun 2011 10:14:01 +
> > Subject: Re: [R] error with geomap in googleVis
> >
> > SNV Krishna primps.com.sg> writes:
> >
> > >
> > > Hi All,
> > >
> > > I am unable to get the plot geomap in googleVis package. data is as
> > > follows
> > >
> > > > head(index.ret)
> > > country ytd
> > > 1 Argentina -10.18
> > > 2 Australia -3.42
> > > 3 Austria -2.70
> > > 4 Belgium 1.94
> > > 5 Brazil -7.16
> > > 6 Canada 0.56
> > >
> > > > map1 = gvisGeoMap(index.ret,locationvar = 'country', numvar =
> > > > 'ytd')
> > > > plot(map1)
> > >
> > > But it just displays a blank page, showing an error symbol at the
> > > right bottom corner. I tried demo(googleVis), it also had a similar
> > > problem. The demo showed all other plots/maps except for those
> > > geomaps. Could any one please hint me what/where could be the
> > > problem? Many thanks for the idea and support.
>
>
> I had never used this until yesterday but it seems to generate html.
> I didn't manage to get a chart to display but if you are familiar with
> this package and html perhaps you could look at map1$html and see if
> anything is obvious. One great thing about html/js is that it is human
> readable and you can integrate it well with other page material without
> much in the way of special tools.
>
>
>
>
>
>
>
>
> > >
> > > Regards,
> > >
> > > SNV Krishna
> > >
> > > [[alternative HTML version deleted]]
> > >
> > >
> >
> > Hi All,
> >
> > I have also encountered this problem. I have tested the problem in

>
> 3.0. I
> > have latest java and flash and I have tried both Firefox and IE (both
> > latest
>
> k just
> > fine.
> >
> > I too would like to know how to solve this problem.
> >
> > Kind regards,
> >
> > Michael Phipps
&

Re: [R] any documents

2011-06-09 Thread Mike Marchywka













> Date: Thu, 9 Jun 2011 02:21:21 -0700
> From: wa7@gmail.com
> To: r-help@r-project.org
> Subject: [R] any documents
>
> Hi,
>
> I'm doing a textual analysis of several articles discussing the evolution of
> prices in order to give a forecast. if someone can give me a clear approach
> to this knowing that I work on the package tm.

LOL, are you talking about the computer-generated "analysis" such as the
thin text platitudes around bandwagon stats, such as "is trading xx above
the 30-day moving average," etc.? This sounds funny but is actually an
interesting test case, as the hidden structured nature of the documents
should be easier to analyse than, say, poetry. The field is very much
research-grade AFAIK, and you will need to define an algorithm or do a
literature search to get much in the way of a helpful response beyond
"?tm". Absent that, you are almost asking for someone to invent an
algorithm. I've referred many posters to terms like "computational
linguistics," but people who have used these kinds of things don't seem
to post here much. If you can give us more details or a source article,
maybe someone can point you in a useful direction.
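For what it's worth, the mechanical starting point with tm is just
building a document-term matrix; everything beyond that (the forecasting
part) is the research question. A bare-bones sketch on made-up documents:

library(tm)
docs <- c("prices rose on strong demand", "prices fell as supply grew")
corp <- Corpus(VectorSource(docs))
dtm  <- DocumentTermMatrix(corp)
inspect(dtm)   # term counts per document; the input to whatever comes next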





>
> Thank you very much
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/any-documents-tp3584961p3584961.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error with geomap in googleVis

2011-06-09 Thread Mike Marchywka



> To: r-h...@stat.math.ethz.ch
> From: mjphi...@tpg.com.au
> Date: Wed, 8 Jun 2011 10:14:01 +
> Subject: Re: [R] error with geomap in googleVis
>
> SNV Krishna  primps.com.sg> writes:
>
> >
> > Hi All,
> >
> > I am unable to get the plot geomap in googleVis package. data is as follows
> >
> > > head(index.ret)
> > country ytd
> > 1 Argentina -10.18
> > 2 Australia -3.42
> > 3 Austria -2.70
> > 4 Belgium 1.94
> > 5 Brazil -7.16
> > 6 Canada 0.56
> >
> > > map1 = gvisGeoMap(index.ret,locationvar = 'country', numvar = 'ytd')
> > > plot(map1)
> >
> > But it just displays a blank page, showing an error symbol at the right
> > bottom corner. I tried demo(googleVis), it also had a similar problem. The
> > demo showed all other plots/maps except for those geomaps. Could any one
> > please hint me what/where could be the problem? Many thanks for the idea and
> > support.


I had never used this until yesterday but it seems to generate html.
I didn't manage to get a chart to display but if you are familiar
with this package and html perhaps you could look at map1$html
and see if anything is obvious. One great thing about html/js
is that it is human readable and you can integrate it
well with other page material without much in the way of special
tools. 








> >
> > Regards,
> >
> > SNV Krishna
> >
> > [[alternative HTML version deleted]]
> >
> >
>
> Hi All,
>
> I have also encountered this problem. I have tested the problem in Windows XP

3.0. I
> have latest java and flash and I have tried both Firefox and IE (both latest

k just
> fine.
>
> I too would like to know how to solve this problem.
>
> Kind regards,
>
> Michael Phipps
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Resources for utilizing multiple processors

2011-06-09 Thread Mike Marchywka










> From: rjeffr...@ucla.edu
> Date: Wed, 8 Jun 2011 20:54:45 -0700
> To: r-help@r-project.org
> Subject: [R] Resources for utilizing multiple processors
>
> Hello,
>
> I know of some various methods out there to utilize multiple processors but
> am not sure what the best solution would be. First some things to note:
> I'm running dependent simulations, so direct parallel coding is out
> (multicore, doSnow, etc).

the
> *nix languages.

Well, for the situation below you seem to want a function
server. You could consider Rapache and just write this like a big
web application. A web server, like a DB, is not the first thing
you think of for high-performance computing, but if your computationally
intensive tasks are in native code, this could be a reasonable
overhead that requires little learning.

If you literally mean cores instead of machines, keep in mind
that cores can end up fighting over resources, like memory
(this link cites an IEEE article with cores making things worse
in a non-contrived case):

http://lists.boost.org/boost-users/2008/11/42263.php


I think people have mentioned some packages like bigmemory (I forget
the names exactly) that let you handle larger objects. Launching a bunch
of threads and letting the VM thrash can easily make things slower.

I guess a better approach would be to get an implementation that is
block-oriented; you can do the memory/file staging in R until there is a
data frame that uses disk transparently, with hints on
expected access patterns (prefetch, etc.).
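The file-backed idea, as I understand the bigmemory docs (a sketch only;
the file names here are placeholders, so check the docs before relying on
it):

library(bigmemory)
bm <- filebacked.big.matrix(1e6, 10, type = "double",
                            backingfile    = "sim.bin",
                            descriptorfile = "sim.desc")
bm[1, ] <- rnorm(10)   # rows live on disk, not in R's heap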



>
> My main concern deals with Multiple analyses on large data sets. By large I
> mean that when I'm done running 2 simulations R is using ~3G of RAM, the
> remaining ~3G is chewed up when I try to create the Gelman-Rubin statistic
> to compare the two resulting samples, grinding the process to a halt. I'd
> like to have separate cores simultaneously run each analysis. That will save
> on time and I'll have to ponder the BGR calculation problem another way. Can
> R temporarily use HD space to write calculations to instead of RAM?
>
> The second concern boils down to whether or not there is a way to split up
> dependent simulations. For example at iteration (t) I feed a(t-2) into FUN1
> to generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate
> b(t) and c(t). I'd love to have one core run FUN1 and another run FUN2, and
[[elided Hotmail spam]]
>
>
> So if anyone has any suggestions as to a direction I can look into, it would
> be appreciated.
>
>
> Robin Jeffries
> MS, DrPH Candidate
> Department of Biostatistics
> UCLA
> 530-633-STAT(7828)
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RgoogleMaps Axes

2011-06-08 Thread Mike Marchywka














> Date: Tue, 7 Jun 2011 09:50:10 -0700
> From: egregory2...@yahoo.com
> To: r-help@r-project.org
> Subject: [R] RgoogleMaps Axes
>
> R Help,
>
>
> I posted a question on StackOverflow yesterday regarding an issue I've been 
> having with the RgoogleMaps packages' displaying of axes. Here is the text of 
> that submission:
> http://stackoverflow.com/questions/6258408/rgooglemaps-axes
>
> "I can't find any documentation of the following problem I'm having with the 
> axis labels in RGoogleMaps:
> library(RgoogleMaps)
> datas <-structure(list(LAT =c(37.875,37.925,37.775,37.875,37.875),
> LON =c(-122.225,-122.225,-122.075,-122.075,-122.025)),
> .Names=c("LAT","LON"),class="data.frame",
> row.names =c(1418L,1419L,1536L,1538L,1578L))
> # Get bounding box.
> boxt <-qbbox(lat =datas$LAT,lon =datas$LON)
> MyMap<-GetMap.bbox(boxt$lonR,boxt$latR,destfile ="Arvin12Map.png",
> maptype ="mobile")
> PlotOnStaticMap(MyMap,lat =datas$LAT,lon =datas$LON,
> axes =TRUE,mar =rep(4,4))
>

I haven't gotten too far, as I had to download and install PROJ.4 and
rgdal, but I did get a transform-related warning. Did you get warnings?


> MyMap<-GetMap.bbox(boxt$lonR,boxt$latR,destfile ="Arvin12Map.png",maptype ="mobile")
[1] "http://maps.google.com/maps/api/staticmap?center=37.85,-122.125&zoom=12&size=640x640&maptype=mobile&format=png32&sensor=true"
Loading required package: rgdal
Loading required package: sp
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.8.0, released 2011/01/12
Path to GDAL shared files: /usr/local/share/gdal
Loaded PROJ.4 runtime: Rel. 4.7.1, 23 September 2009
Path to PROJ.4 shared files: (autodetected)
Warning message:
In readGDAL(destfile, silent = TRUE) : GeoTransform values not available
>


> PlotOnStaticMap(MyMap,lat =datas$LAT,lon =datas$LON,
+  axes =TRUE,mar =rep(4,4))
List of 6
 $ lat.center: num 37.8
 $ lon.center: num -122
 $ zoom  : num 12
 $ myTile: int [1:640, 1:640] 968 853 855 969 1033 888 855 884 888 995 ...
  ..- attr(*, "COL")= chr [1:1132] "#00" "#020201" "#020202" "#030302" ...
  ..- attr(*, "type")= chr "rgb"
 $ BBOX  :List of 2
  ..$ ll: num [1, 1:2] 37.8 -122.2
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "Y"
  .. .. ..$ : chr [1:2] "lat" "lon"
  ..$ ur: num [1, 1:2] 37.9 -122
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "Y"
  .. .. ..$ : chr [1:2] "lat" "lon"
 $ url   : chr "google"
NULL
[1] -291.2711  291.2711
[1] -276.5158  276.7972



> When I run this on my computer the horizontal axis ranges from 300W to 60E, 
> but the ticks in between aren't linearly spaced (300W, 200W, 100W, 0, 100E, 
> 160W, 60W). Also, the vertical axis moves linearly from 300S to 300N. It 
> seems that no matter what data I supply for datas, the axes are always 
> labeled this way.
> My question is:
> 1. Does this problem occur on other machines using this code?
> 2. Does anyone have an explanation for it?
> and
> 3. Can anybody suggest a way to get the correct axes labels (assuming these 
> are "incorrect", but maybe i'm somehow misinterpreting the plot!)?
> Thank you for your time."
>
> There has been no answer other than that I ought to contact the package 
> maintainer. Since it would be nice to have a publicly displayed solution, I 
> opted to post here first before doing so. Does anyone have any insight to 
> share?
> Thank you very much for your time,
>
> -Erik Gregory
> CSU Sacramento, Mathematics
> Student Assistant, California Environmental Protection Agency
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can we prepare a questionaire in R

2011-06-08 Thread Mike Marchywka







> Date: Wed, 8 Jun 2011 12:37:33 +0530
> From: ammasamri...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Can we prepare a questionaire in R
>
> Is there a way to prepare a questionnaire in R like html forms whose data
> can be directly populated into R?
>

I've started to use Rapache, although Rserve would also be an option.
When it is installed on the server, you can point your html form to an
rhtml page and get the form variables in R just as with other web
languages. For writing html output, I have been using R2HTML (see the
code excerpt below from an rhtml page). I also found that Cairo works ok
if you don't need X11 for anything.

For your specific situation, however, it may be easier to just use
whatever you already have and use R only for the data analysis: when a
request for a results report is made, send that to Rapache, for example.
I would mention that I have gone to running two versions of Apache, one
with R and one with PHP, to allow for security and easier development (I
can write the PHP to fail nicely if the R Apache server is not up, and no
new security issues are exposed).


library("Cairo")
library("R2HTML")
library("RColorBrewer")


You can of course also generate html from normal R commands, or for that
matter bash scripts etc.
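
As a minimal sketch of the handler side under Rapache (from memory, so
treat the details as assumptions: rapache exposes the form variables as
GET/POST lists and provides setContentType(); the field name "answer1"
is invented for illustration):

# e.g. served via  SetHandler r-handler  in the Apache config
setContentType("text/html")            # response type sent back to the browser
ans <- POST$answer1                    # form variable posted by the html form
if (is.null(ans)) ans <- "(no answer)"
cat("<html><body><p>You answered: ", ans, "</p></body></html>")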

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Populating values from html

2011-06-07 Thread Mike Marchywka









> Date: Tue, 7 Jun 2011 03:35:46 -0700
> From: ammasamri...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Populating values from html
>
> can we populate values into an excel sheet from html forms that has to be
> used in R for data analysis
>
> Can we directly retireve the values from html forms into R fro analysis


Judging from the way many websites offer data, you'd think that jpg is
the best means for getting it LOL. html, pdf, and image formats
are designed for human consumption of limited aspects of a data set.
Normally you would prefer something closer to raw data, like csv.
After having visited yet another public website that offers
data in pdf (YAPWTODIP) I would suggest you first contact the
site and ask them to present the data in a form which allows it to
be easily examined in an open source environment (note this criterion does not
make Excel a preferred choice either). If you have to scrape web pages,
there are some R facilities, but depending on the page you
may need to do a lot of work, and it will not survive if the page
is redone for artistic reasons etc. See any of these results for example,

http://www.google.com/search?hl=en&q=cran+page+scraping
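
For instance, a minimal scraping sketch using the XML package (the URL is
a placeholder; readHTMLTable only helps when the page actually contains
html tables):

library(XML)
# pull every <table> on the page into a list of data.frames
tbls <- readHTMLTable("http://example.com/some-data-page.html",
                      stringsAsFactors = FALSE)
str(tbls)         # inspect what came back and pick the table you need
df <- tbls[[1]]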








>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Populating-values-from-html-tp3579215p3579215.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2011-06-07 Thread Mike Marchywka





> Date: Tue, 7 Jun 2011 01:38:32 -0700
> From: farah.farid@student.aku.edu
> To: r-help@r-project.org
> Subject: [R] Logistic Regression
>
> I am working on my thesis in which i have couple of independent variables
> that are categorical in nature and the depndent variable is dichotomus.
> Initially I run univariate analysis and added the variables with significant
> p-values (p<0.25) in my full model.
> I have three confusions. Firstly, I am looking for confounding variables by

I'm not sure what your thesis is about, some system that you are studying
via statistics or maybe the thesis is about statistics, but
according to this disputed wikipedia entry,

http://en.wikipedia.org/wiki/Confounding

whether a variable is confounding or extraneous is determined by the reality
of your system. It may help to consider factors related to that and use the
statistics to avoid fooling yourself. Look at the pictures (a non-pompous way
of saying look at graphs and scatter plots for some ideas to test) and then
test various ideas. You see bad cause/effect inferences all the time in many
fields, from econ to biotech (although anecdotes suggest these mistakes
usually favour the sponsors LOL). Consider some mundane "known" examples of
what your data would look like and see if that relates to what you have.
If you were naively measuring car velocity at a single point in front of a
traffic light, along with the color of the light, what might you observe?
(Much like with an earlier example on iron in patients, there are a number
of more precisely defined measurements you could take on a given "thing.")

If your concern is that "I ran test A and it said B, but test C said D, and
D seems inconsistent with B", it generally helps to look at the assumptions
and detailed equations for each model and explore what those mean with your
data. With continuous variables anyway, non-monotonic relationships can
easily destroy a correlation even with strong causality. 
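
A quick illustration of that last point (contrived numbers):

x <- seq(-3, 3, by = 0.1)
y <- x^2                 # perfectly deterministic, but non-monotonic
cor(x, y)                # essentially 0: correlation misses the relationship
plot(x, y)               # while a scatter plot shows it immediately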


> using formula "(crude beta-cofficient - adjusted beta-cofficient)/ crude
> beta-cofficient x 100" as per rule if the percentage of any variable is >10%
> than I have considered that as confounder. I wanted to know that from
> initial model i have deducted one variable with insignificant p-value to
> form adjusted model. Now how will i know if the variable that i deducted
> from initial model was confounder or not?
> Secondly, I wanted to know if the percentage comes in negative like
> (-17.84%) than will it be considered as confounder or not? I also wanted to
> know that confounders should be removed from model? or should be kept in
> model?
> Lastly, I wanted to know that I am running likelihood ratio test to identify
> if the value is falling in critical region or not. So if the value doesnot
> fall in critical region than what does it show? what should I do in this
> case? In my final reduced model all p-values are significant but still the
> value identified via likelihood ratio test is not falling in critical
> region. So what does that show?
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Logistic-Regression-tp3578962p3578962.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] predict with eha package

2011-06-02 Thread Mike Harwood
Hello list, and thank you in advance.

I'm unable to generate predicted values when specifying newdata with
phreg and aftreg models in the eha package, but I do not have the same
problem with a proportional hazards model from coxph.  Without the newdata
argument the predicted values are returned for all three, but with
"newdata=" coxph is fine while both aftreg and phreg models return an
"Error in predict.coxph(f.ph.eha, newdata = mort, type = "lp") :  Data is
not the same size as it was in the original fit" message.  Since I
ultimately want a parametric model and the "real" data is left-truncated
and right-censored, I think the aftreg function in the eha package is what
I must use.  Following is my sample code, without the output.

#~ Setup ---------------------------------------------------
library(eha)      # provides phreg, aftreg, and the mort dataset
library(survival) # provides coxph and Surv
data(mort)

#~ All models generated successfully -----------------------
f.ph <- coxph(Surv(enter, exit, event) ~ ses, data = mort)
f.ph.eha <- phreg(Surv(enter, exit, event) ~ ses, data = mort)
f.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort)

#~ All fits generated successfully -------------------------
f.ph.fit <- predict(f.ph, type='lp')
f.ph.eha.fit <- predict(f.ph.eha, type='lp')
f.aft.fit <- predict(f.aft, type='lp')

#~ First fit generated successfully, others output error ---
f.ph.fit <- predict(f.ph, newdata=mort, type='lp')
f.ph.eha.fit <- predict(f.ph.eha, newdata=mort, type='lp')
f.aft.fit <- predict(f.aft, newdata=mort, type='lp')


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in model specification for cfa with lavaan-package

2011-06-01 Thread Mike Cheung
Dear Alain,

You may speed up the analysis by using the sample covariance matrix based on
a listwise deletion:
cov.cfa <- cov(your.raw.data, use="complete.obs")

Since you have 36671 cases, the results should be similar to those based on
the raw data unless you have lots of missing data and/or the data are not
missing completely at random (listwise deletion is only safe under MCAR).
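
e.g., a sketch of fitting from those summary statistics (sample.cov and
sample.nobs are lavaan's arguments for this; your.raw.data and cfa.model
stand in for your own objects):

library(lavaan)
cov.cfa <- cov(your.raw.data, use = "complete.obs")  # listwise-deletion covariances
n.complete <- sum(complete.cases(your.raw.data))     # matching sample size
fit <- cfa(cfa.model, sample.cov = cov.cfa, sample.nobs = n.complete)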

By the way, if your questions are directly related to SEM, you may get more
responses from SEMNET (http://alabamamaps.ua.edu/archives/semnet.html).

Hope it helps.

Regards,
Mike
-- 
-----
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-----

On Thu, Jun 2, 2011 at 1:21 AM, yrosseel  wrote:

> On 06/01/2011 05:39 PM, D. Alain wrote:
>
>> Thank you Yves and Mike,
>>
>> your comments make sense, however they have not resolved my problem: the p
>> < N is the result of my poor attempt to give a reproducible example. My
>> "real" dataframe has a dim of 36671 cases and 41 variables.
>> Following your advice, Yves, I passed my model without lavaanification
>> (just "cfa(cfa.model, data=df.cfa,missing="ml")"), but now R is working
>> for hours without printing any results...
>>
>
> To monitor progress, you can use the verbose=TRUE argument. If you are not
> fitting your final model, you may want to use se="none" (no standard errors)
> and test="none" (no test statistic) to speed things up.
>
> I've run analyses that took multiple days, using both lavaan and commercial
> software. It is annoying (the wait), but not unusual. The time to fit is a
> function of the number of missing patterns you have in your data, and the
> number of variables.
>
> Yves.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Identifying sequences

2011-06-01 Thread Mike Marchywka







> Date: Wed, 1 Jun 2011 17:12:29 +0200
> From: cjp...@gmail.com
> To: r-help@r-project.org
> Subject: Re: [R] Identifying sequences
>
> Thanks to David, Thierry and Jonathan for your help.
> I have been able to put this function together
>
> a=1:10
> b=20:30
> c=40:50
> x=c(a,b,c)
>
> seq.matrix <- function(x){
> lower<- x[which(diff(x) != 1)]
> upper <- x[which(diff(x) != 1)+1]
> extremities <- c(1,lower, upper,x[length(x)])
> m <- 
> data.frame(matrix(extremities[order(extremities)],ncol=2,byrow=TRUE,dimnames=list(rows=paste("group",1:(length(lower)+1),sep=""),cols=c("lower","upper"))))
> m$length=m$upper-m$lower+1
> m
> }
>
> s.m=seq.matrix(x)
> s.m
>        lower upper length
> group1     1    10     10
> group2    20    30     11
> group3    40    50     11
>
> One can then make a test to see if a certain value (say 9) falls
> within one of the groups and use that to find the group name or lower
> or upper border

As I understand it, you are looking for large derivatives
or approximate discontinuities against a smooth signal.
This seems like a natural application for wavelets;
try the Haar wavelet via the wavelets package,

> library(wavelets)
> f=wt.filter(c(-1,1),modwt=T)
> z<-modwt(X=as.numeric(x),filter=f,n.levels=1)
> z@W
$W1
  [,1]
 [1,]   49
 [2,]   -1
 [3,]   -1
 [4,]   -1
 [5,]   -1
 [6,]   -1
 [7,]   -1
 [8,]   -1
 [9,]   -1
[10,]   -1
[11,]  -10
[12,]   -1
[13,]   -1
[14,]   -1
[15,]   -1
[16,]   -1
[17,]   -1
[18,]   -1
[19,]   -1
[20,]   -1
[21,]   -1
[22,]  -10
[23,]   -1
[24,]   -1
[25,]   -1
[26,]   -1
[27,]   -1
[28,]   -1
[29,]   -1
[30,]   -1
[31,]   -1
[32,]   -1







>
> s.m.test=function(s.m,i){which(s.m[,1] <= i & s.m[,2] >= i)}
> s.m.test(s.m,i=9)
> [1] 1
> e.g.
> row.names(s.m)[s.m.test(s.m,i=9)]
> [1] "group1"
>
> Cheers
> Christiaan
>
> On 1 June 2011 14:31, Jonathan Daily wrote:
> >
> > I am assuming in this case that you are looking for continuity along
> > integers, so if you expect noninteger values this will not work.
> >
> > You can get the index of where breaks can be found in your example using
> >
> > which(diff(x) > 1)
> >
> > On Wed, Jun 1, 2011 at 6:27 AM, christiaan pauw wrote:
> > > Hallo Everybody
> > >
> > > Consider the following vector
> > >
> > > a=1:10
> > > b=20:30
> > > c=40:50
> > > x=c(a,b,c)
> > >
> > > I need a function that can tell me that there are three set of continuos
> > > sequences and that the first is from 1:10, the second from 20:30 and the
> > > third from 40:50. In other words: a,b, and c.
> > >
> > > regards
> > > Christiaan
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > __
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> >
> > --
> > ===
> > Jon Daily
> > Technician
> > ===
> > #!/usr/bin/env outside
> > # It's great, trust me.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in model specification for cfa with lavaan-package

2011-06-01 Thread Mike Cheung
Dear Alain,

There are 16 variables but only 10 cases, some with missing values. The
sample covariance matrix is not positive definite. It has nothing to do with
lavaan. You need more cases before you can fit a CFA with 16 variables.

Regards,
Mike

-- 
-----
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-----

On Wed, Jun 1, 2011 at 6:31 PM, D. Alain  wrote:

> Dear R-List,
>
> (I am not sure whether this list is the right place for my question...)
>
> I have a dataframe df.cfa
>
>
> df.cfa<-data.frame(x1=c(5,4,1,5,5,NA,4,NA,NA,5),x2=c(2,3,3,3,NA,1,2,1,2,1),x3=c(5,3,4,1,5,5,5,5,5,5),x4=c(5,3,4,5,5,5,5,5,5,5),x5=c(5,4,3,3,4,4,4,5,NA,5),x6=c(3,5,2,1,4,NA,NA,5,3,4),x7=c(5,4,3,4,4,3,4,NA,NA,4),x8=c(5,5,3,4,5,4,5,4,5,5),x9=c(5,1,1,1,NA,3,3,2,2,3),x10=c(5,5,2,3,4,3,1,3,2,5),x11=c(5,4,4,5,5,4,5,4,5,5),x12=c(5,4,5,5,5,4,5,5,5,5),x13=c(5,4,4,5,5,5,5,5,5,5),x14=c(5,3,3,NA,5,4,5,3,NA,5),x15=c(4,4,4,3,4,5,2,4,5,5),x16=c(5,4,3,3,4,4,3,3,4,5))
>
>
> and I want to carry out a confirmatory factor analysis using the "cfa"
> function (lavaan).
>
> library(lavaan)
>
> then I specify my model
>
> cfa.model<-'  f1=~x1+x2+x3+x4+x5
> f2=~x6+x7+x8+x9+x10+x11
> f3=~x12+x13+x14+x15+x16 '
>
> and run the cfa routine
>
> fit<-cfa(cfa.model,data=df.cfa,missing="ml")
>
> The output is an error message (here partly in german)
>
> "Error in chol.default(S) : der führende Minor der Ordnung 4 ist nicht
> positiv definit
> Fehler in Sample(data = data, group = group, sample.cov = sample.cov,
> sample.mean = sample.mean,  : sample covariance can not be inverted"
>
> Then I tried to "lavaanify" my model specification first
>
> cfa.model<- lavaanify(cfa.model)
>
>    id lhs op rhs user group free ustart fixed.x   label eq.id free.uncon
> 1   1  f1 =~  x1    1     1    1     NA       0  f1=~x1     0          1
> 2   2  f1 =~  x2    1     1    2     NA       0  f1=~x2     0          2
> 3   3  f1 =~  x3    1     1    3     NA       0  f1=~x3     0          3
> 4   4  f1 =~  x4    1     1    4     NA       0  f1=~x4     0          4
> 5   5  f1 =~  x5    1     1    5     NA       0  f1=~x5     0          5
> 6   6  f2 =~  x6    1     1    6     NA       0  f2=~x6     0          6
> 7   7  f2 =~  x7    1     1    7     NA       0  f2=~x7     0          7
> 8   8  f2 =~  x8    1     1    8     NA       0  f2=~x8     0          8
> 9   9  f2 =~  x9    1     1    9     NA       0  f2=~x9     0          9
> 10 10  f2 =~ x10    1     1   10     NA       0 f2=~x10     0         10
> 11 11  f2 =~ x11    1     1   11     NA       0 f2=~x11     0         11
> 12 12  f3 =~ x12    1     1   12     NA       0 f3=~x12     0         12
> 13 13  f3 =~ x13    1     1   13     NA       0 f3=~x13     0         13
> 14 14  f3 =~ x14    1     1   14     NA       0 f3=~x14     0         14
> 15 15  f3 =~ x15    1     1   15     NA       0 f3=~x15     0         15
> 16 16  f3 =~ x16    1     1   16     NA       0 f3=~x16     0         16
>
> I run cfa again
>
> fit<-cfa(cfa.model,data=df.cfa,missing="ml")
>
> And this time I get another error information
>
> "Error in start.idx[i]:end.idx[i] : NA/NaN Argument"
>
> I must admit that I am stuck, can anyone help?
>
> Thanks, Alain
>[[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] newbie: fourier series for time series data

2011-05-31 Thread Mike Marchywka

( hotmail won't mark quoted text so I'm top posting... )
 
Can you post the data? Personally I'd just plot abs(fft(x))
and see what you see, as well as looking at Im(fft(x))/Re(fft(x)),
the phase spectrum. Now, presumably you are nominally looking
for something with a period of 1 year; that part could
be expressed as a harmonic spectrum as suggested below, but you'd
also be looking for trends and noises of various types: additive
gaussian, amplitude modulation, maybe even frequency modulation, etc.
I guess you could remove a few power terms (average, linear, etc.)
just to simplify (often you get a spectrum where a huge uninformative
DC, or zero frequency, component just gets in the way). You can probably
find books online dealing with signal processing and RF (this is where
I'm used to seeing these things). It would of course be helpful to
then examine known simple cases and see if you can tell them apart.
Create fft( sin(t)*(1+a*sin(epsilon*t)) + b*t ) for example, as in the
sketch below.

I guess if you want to look at writing a model, you could look at the
phase portrait (plot derivative versus value) to again get some idea what
you may have that makes sense as a model to fit.
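
A minimal sketch of that exploration on a synthetic monthly series (all
the numbers here are invented for illustration):

t <- 1:(12*29)                    # 29 years of monthly samples
x <- sin(2*pi*t/12)*(1 + 0.2*sin(2*pi*t/120)) + 0.001*t + rnorm(length(t), sd=0.3)
x <- x - mean(x)                  # drop the uninformative DC term
spec <- abs(fft(x))               # amplitude spectrum
plot(spec[1:(length(x)/2)], type="h",
     xlab="frequency index", ylab="|fft|")   # annual peak near index length(x)/12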




Date: Tue, 31 May 2011 10:35:16 -0700
From: spencer.gra...@structuremonitoring.com
To: eddie...@gmail.com
CC: r-help@r-project.org
Subject: Re: [R] newbie: fourier series for time series data


On 5/31/2011 5:12 AM, eddie smith wrote:
> Hi Guys,
>
> I had a monthly time series's data of land temperature from 1980 to 2008.
> After plotting a scatter diagram, it seems that annually, there is a semi
> sinusoidal cycle. How do I run Fourier's series to the data so that I can
> fit model on it?

There are several methods.


1. The simplest would be to select the number of terms you
want, put the data into a data.frame, and use lm(y ~ sin(t/period) +
cos(t/period) + sin(2*t/period) + cos(2*t/period) + ..., data),
including as many terms as you want in the series. This is not
recommended, because it ignores the time series effects and does not
apply a smoothness penalty to the Fourier approximation.


2. A second is to use the 'fda' package. Examples are
provided (even indexed) in Ramsay, Hooker and Graves (2009) Functional
Data Analysis with R and Matlab (Springer). This is probably what
Ramsay and Hooker would do, but I wouldn't, because it doesn't treat the
time series as a time series. It also requires more work on your part.


3. A third general class of approaches uses Kalman
filtering, also called dynamic linear models or state space models.
This would allow you to estimate a differential equation model, whose
solution could be a damped sinusoid. It would also allow you to
estimate regression coefficients of a finite Fourier series but without
the smoothness penalty you would get with 'fda'. For this, I recommend
the 'dlm' package with its vignette and companion book, Petris, Petrone
and Campagnoli (2009) Dynamic Linear Models with R (Springer).


If you want something quick and dirty, you might want option 1.
For that, I might use option 2, because I know and understand it
moderately well (being third author on the book). However, if you
really want to understand time series, I recommend option 3. That has
the additional advantage that I think it would have the greatest chances
of acceptance in a refereed academic journal of the three approaches.


> I am really sorry for my question sound stupid, but I just don't know where
> to start.

There also are specialized email lists that you might consider
for a future post. Go to www.r-project.org 
-> "Mailing Lists". In particular, you might be most interested in
R-sig-ecology.


Hope this helps.
Spencer Graves

> I am desperately looking for help from you guys.
>
> Thanks in advance.
>
> Eddie
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567


[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.  
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Compiling C-code in Windows

2011-05-31 Thread Mike Marchywka





> Date: Tue, 31 May 2011 11:37:56 -0400
> From: murdoch.dun...@gmail.com
> To: tom.osb...@iinet.net.au
> CC: r-h...@stat.math.ethz.ch; pmi...@ff.uns.ac.rs

>
> On 31/05/2011 7:50 AM, Tom Osborn wrote:
> > You could use cygwin's cc/gcc, or the Watcom opensource compiler.
>
> Neither of those is supported. Use what the R Admin manual suggests, or
> you're on your own.

I have had fairly good luck with cygwin or mingw, but you may need
a few conditionals etc. Not sure what R suggests, but cygwin should
work.
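
For what it's worth, a minimal sketch of the R side once the shared library
builds (the file name trial and the C function double_me are made up; it
assumes trial.c defines  void double_me(int *x) { *x = 2 * *x; }  and was
compiled with  R CMD SHLIB trial.c ):

dll <- paste("trial", .Platform$dynlib.ext, sep = "")  # trial.dll on Windows, trial.so on Linux
dyn.load(dll)                                 # load the compiled routine
res <- .C("double_me", x = as.integer(21))    # call it through the .C interface
res$x                                         # 42
dyn.unload(dll)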


>
> Duncan Murdoch
>
> > [Watcom used to be a commercial compiler which ceased and has
> > become an open project].
> >
> > But listen to people who've experienced the options. [Not I].
> >
> > Cheers, Tom.
> > - Original Message -
> > From: Petar Milin
> > To: R-HELP
> > Sent: Tuesday, May 31, 2011 9:43 PM

> >
> >
> > Hello ALL!
> > I am an Linux user (Debian testing i386), with very dusty
> > Win-experience. Nevertheless, my colleagues and I are making some
> > package in R, and I built C-routines to speed up things.
> > I followed instruction how to compile C for R (very useful link:
> > http://www.stat.lsa.umich.edu/~yizwang/software/maxLinear/noteonR.html,
> > and links listed there). Everything works like a charm in Linux. I have
> > *.so and wrapper function from R is doing a right call.
> > However, I wanted to make *.dll library for Win-users. Now, I used my
> > colleague's computer with Win XP on it, and with the latest R. In an MS-DOS
> > console, I positioned the prompt in 'C:\Program Files\R\R-2.13.0\bin\i386\',
> > and then I run: 'R CMD SHLIB C:\trial.c'. However, nothing happened, no
> > trial.dll, nothing. Then, I tried with: 'R CMD SHLIB --output=trial.dll
> > C:\trial.c', but no luck, again.
> > Please, can anyone help me with this? Can I use: 'R CMD SHLIB
> > --output=trial.dll C:\trial.c' under Linux, and expecting working DLL?
> >
> > Best,
> > PM
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Value of 'pi'

2011-05-30 Thread Mike Marchywka












> Date: Sun, 29 May 2011 23:09:47 -0700
> From: jwiley.ps...@gmail.com
> To: vincy_p...@yahoo.ca
> CC: r-help@r-project.org
> Subject: Re: [R] Value of 'pi'
>
> Dear Vincy,
>
> I hope that in school you also learned that 22/7 is an approximation.
> Please consult your local mathematician for a proof that pi != 22/7.
> A quick search will provide you with volumes of information on what pi
> is, how it may be calculated, and calculations out to thousands of
> digits.
>
> Cheers,
>
> Josh
>
> On Sun, May 29, 2011 at 11:01 PM, Vincy Pyne wrote:
> > Dear R helpers,
> >
> > I have one basic doubt about the value of pi. In school, we have learned 
> > that
> >
> > pi = 22/7 (which is = 3.142857). However, if I type pi in R, I get pi = 
> > 3.141593. So which value of pi should be considered?

You could do this if you trust your trig functions, since
that is presumably what you want a value of pi for:

> atan(1)
[1] 0.7853982
> atan(1)*4
[1] 3.141593

> atan(1)*4*7-22
[1] -0.008851425

> atan(1)*4-pi
[1] 0





> >
> > Regards
> >
> > Vincy
> >
> >
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested design

2011-05-28 Thread Mike Marchywka












> Date: Sat, 28 May 2011 09:33:03 -0700
> From: jwiley.ps...@gmail.com
> To: bjorn.robr...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Nested design
>
> Hi,
>
> If you are not asking for stats help, then do you understand the model
> and are just confused by how R labels it? We can help match R's
> labels to the ones you are used to, if you tell us what you are used
> to.


I would not suggest as a rule using a tool to validate itself, but
you can use R to make sure your interpretation of other R output is right by
giving contrived datasets to the analysis package and seeing what you get
back. Comparison can be against textbook examples or your own paper and
pencil analysis. This is also a good way to learn things, from basic terms to
things like sign or unit conventions in different fields etc. 

You can generate samples from a normal distribution with known structure and
feed them to the package in question to see what comes back, as in the
sketch below. 
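
A minimal sketch of that idea for a nested design (all effect sizes and
names are contrived):

set.seed(1)
site <- factor(rep(1:4, each = 15))          # 4 sites
plot_in_site <- factor(rep(1:12, each = 5))  # 3 plots nested in each site
y <- rnorm(60, mean = as.numeric(site))      # known site effect, no plot effect
fit <- aov(y ~ site + Error(plot_in_site))   # nested ANOVA with plot as stratum
summary(fit)   # match the labels/strata against the effects you built in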


>
> Cheers,
>
> Josh
>
> On Sat, May 28, 2011 at 6:54 AM, unpeatable  wrote:
> > Dear Dennis,
> > In my opinion I am not at all asking for any stats help, just a question how
> > to read this output.
> > Thanks, Bjorn
> >
> > -
> > Dr. Bjorn JM Robroek
> > Ecology and Biodiversity Group
> > Institute of Environmental Biology, Utrecht University
> > Padualaan 8, 3584 CH Utrecht, The Netherlands
> > Email address: b.j.m.robr...@uu.nl
> > http://www.researcherid.com/rid/C-4379-2008
> > Tel: +31-(0)30-253 6091
> > --
> > View this message in context: 
> > http://r.789695.n4.nabble.com/Nested-design-tp3557404p3557472.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predictive accuracy

2011-05-26 Thread Mike Marchywka







> Date: Thu, 26 May 2011 13:50:15 -0700
> From: gunter.ber...@gene.com
> To: ahmed.el-taht...@pfizer.com
> CC: r-help@r-project.org
> Subject: Re: [R] predictive accuracy
>
> 1. This is not about R, and should be taken off list.

Well, depending on what the mods think, a little bit of
generic "how do I REALLY use this tool" discussion may be of
benefit for all here; a mailing list for a certain brand of hammer
may discuss various uses and types of nails etc.

Personally I have an interest in this; if the OP will
post the data it may be possible to explore some analysis
options. 


>
> 2. You are wading in an alligator infested swamp. Get help from
> (other) statisticians at Pfizer (there are many good ones there).

I thought that is what statisticians do? LOL. 
We don't know the situation: an intern, someone looking for outside ideas
after exhausting internal ones, specific issues with internal peers, a
summer student not wishing to bother everyone there for details, etc. 

>
> Best,
> Bert
>
> P.S. The answer to all your questions is "no" (imho).



>
>
>
> On Thu, May 26, 2011 at 1:35 PM, El-Tahtawy, Ahmed
>  wrote:
> > The strong predictor is the country/region where the study was
> > conducted. So it is not important/useful for a clinician to use it (as
> > long he/she is in USA or Europe).
> > Excluding that predictor will make another 2 insignificant predictors to
> > become significant!!  Can the new model have a reliable predictive
> > accuracy? I thought of excluding all patients from other countries and
> > develop the model accordingly- is the exclusion of a lot of patients and
> > compromise of the power is more acceptable??

LOL, quite the contrary, post hoc selection increases the power to find
whatever you or the sponsor desire... 

Presuming your general interest is in finding out the attributes of a given
drug under various conditions, you would probably want to combine
the observations with tentative thoughts on causality and see
what makes the best story. 

Statistical significance in isolation is a function of the data and the
analysis method; it doesn't really have anything specific to do with the
underlying system.

In this case, if you have other continuous prognostic factors (say
age, LDH, hemoglobin come to mind), you may be able to find that you
have non-monotonic relations between prognostic factor and outcome.
Further, say you have enough patients that you could in fact map
dose response curves. It may turn out that this curve is itself non-monotonic,
with parameters non-monotonic in a prognostic factor. Consider 

avg_survival = a + b*d - c*d^2

where d is the dose. For small d the drug seems to help, but for larger doses
it makes things worse. Now suppose that "c" is a complicated function
of hematocrit; it may not be hard to imagine that anemic and siderositic (is
that a word LOL?) patients have some underlying problems dealing with your
drug. These may be distributed geographically etc etc etc.

This is all stuff you can simulate in R or even on paper, as sketched below. 
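
A minimal sketch of that simulation (every coefficient here is invented):

set.seed(2)
d <- runif(200, 0, 10)                        # dose
hct <- runif(200, 30, 50)                     # hematocrit, a contrived covariate
c_coef <- 0.02 + 0.002*(hct - 40)^2           # "c" depends non-trivially on hct
surv <- 5 + 0.8*d - c_coef*d^2 + rnorm(200)   # non-monotonic dose response
cor(d, surv)                                  # a single correlation understates this
plot(d, surv)                                 # the curvature shows up in the picture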


It sounds like you are already trying to write a label, which may
be a bit premature ( although I defer to the guy from DNA for that LOL).
" indicated for use in patients in Western Hemisphere with  " 

You may have decent luck looking at FDA panel discussion transcripts, search for
related general stats terms confined to "site:fda.gov"


> > Thanks for your help...
> > Al
> >
> > -Original Message-
> > From: Marc Schwartz [mailto:marc_schwa...@me.com]
> > Sent: Thursday, May 26, 2011 10:54 AM
> > To: El-Tahtawy, Ahmed
> > Cc: r-help@r-project.org
> > Subject: Re: [R] predictive accuracy
> >
> >
> > On May 26, 2011, at 7:42 AM, El-Tahtawy, Ahmed wrote:
> >
> >> I am trying to develop a prognostic model using logistic regression.
> > I
> >> built a full , approximate models with the use of penalization -
> > design
> >> package. Also, I tried Chi-square criteria, step-down techniques. Used
> >> BS for model validation.
> >>
> >> > The main purpose is to develop a predictive model for future patient
> >> population.   One of the strong predictor pertains to the study design
> >> and would not mean much for a clinician/investigator in real clinical
> >> situation and have been asked to remove it.
> >> > Can I propose a model and nomogram without that strong -irrelevant
> >> predictor?? If yes, do I need to redo model calibration,
> > discrimination,
> >> validation, etc...?? or just have 5 predictors instead of 6 in the
> >> prognostic model??
> >>
> >>
> >>
> >> Thanks for your help
> >>
> >> Al
> >
> >
> > Is it that the study design characteristic would not make sense to a
> > clinician but is relevant to future samples, or that the study design
> > characteristic is unique to the sample upon which the model was
> > developed and is not relevant to future samples because they will not be
> > in the same or a similar study?
> >
> > Is the study design characteristic a surrogate for other factors that
> > would be relevant t

Re: [R] Processing large datasets

2011-05-25 Thread Mike Marchywka









> Date: Wed, 25 May 2011 12:32:37 -0400
> Subject: Re: [R] Processing large datasets
> From: mailinglist.honey...@gmail.com
> To: marchy...@hotmail.com
> CC: ro...@bestroman.com; r-help@r-project.org
>
> Hi,
>
> On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka  wrote:
> [snip]
> >> > If your datasets are *really* huge, check out some packages listed
> >> > under the "Large memory and out-of-memory data" section of the
> >> > "HighPerformanceComputing" task view at CRAN:
> >>
> >> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
> >
> > Does this have any specific limitations ? It sounds offhand like it
> > does paging and all the needed buffering for arbitrary size
> > data. Does it work with everything?
>
> I'm not sure what limitations ... I know the bigmemory (and ff)
> packages try hard to make using out-of-memory datasets as
> "transparent" as possible.
>
> That having been said, I guess you will have to port "more advanced"
> methods to use such packages, hence the existence of the biglm,
> biganalytics, bigtabulate packages do.
>
> > I seem to recall bigmemory came up
> > before in this context and there was some problem.
>
> Well -- I don't often see emails on this list complaining about their
> functionality. That doesn't mean they're flawless (I also don't
> scrutinize the list traffic too closely). It could be that not too
> many people use them, or that people give up before they come knocking
> when there is a problem.
>
> Has something specifically failed for you in the past, or?

No, I haven't tried. I may have it confused with something else.
But this question does come up a bit, usually as
"I tried to read a huge file into a data frame and wanted to pass
it to something with predictable memory access patterns and it
ran out of memory. What can I do?" I guess I also stopped reading
anything after "use a DB", as this is generally not a replacement
for a data structure. I'll take a look when I have a big dataset that
I can't condense easily. 






>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Processing large datasets

2011-05-25 Thread Mike Marchywka




> Date: Wed, 25 May 2011 10:18:48 -0400
> From: ro...@bestroman.com
> To: mailinglist.honey...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Processing large datasets
>
> > Hi,
> > If your datasets are *really* huge, check out some packages listed
> > under the "Large memory and out-of-memory data" section of the
> > "HighPerformanceComputing" task view at CRAN:
>
> > http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? It sounds offhand like it
does paging and all the needed buffering for arbitrarily sized
data. Does it work with everything? I seem to recall bigmemory came up
before in this context and there was some problem.
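
For concreteness, a minimal sketch of the file-backed style these packages
offer (assuming the bigmemory package; the file names are illustrative):

library(bigmemory)
# a matrix backed by a file on disk rather than held in RAM
x <- big.matrix(nrow = 1e6, ncol = 3, type = "double",
                backingfile = "quotes.bin", descriptorfile = "quotes.desc")
x[1:5, 1] <- rnorm(5)   # indexed access pages data in as needed
mean(x[, 1])            # small slices come back as ordinary R vectors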

Thanks.



>
> > Also, if you find yourself needing to do lots of
> > "grouping/summarizing" type of calculations over large data
> > frame-like objects, you might want to check out the data.table package:
>
> > http://cran.r-project.org/web/packages/data.table/index.html
>
> > --
> > Steve Lianoglou
> > Graduate Student: Computational Systems Biology
> > | Memorial Sloan-Kettering Cancer Center
> > | Weill Medical College of Cornell University
> > Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> I don't think data.table is fundamentally different from data.frame type, but 
> thanks for the suggestion.
>
> http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf
> "Just like data.frames, data.tables must fit inside RAM"
>
> The ff package by Adler, listed in "Large memory and out-of-memory data" is 
> probably most interesting.
>
> --Roman Naumenko
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Processing large datasets/ non answer but Q on writing data frame derivative.

2011-05-25 Thread Mike Marchywka






> Date: Wed, 25 May 2011 09:49:00 -0400
> From: ro...@bestroman.com
> To: biomathjda...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Processing large datasets
>
> Thanks Jonathan.
>
> I'm already using RMySQL to load data for couple of days.
> I wanted to know what are the relevant R capabilities if I want to process 
> much bigger tables.
>
> R always reads the whole set into memory and this might be a limitation in 
> case of big tables, correct?

Ok, now I ask: perhaps for my first R effort I will try to find the source
code for data.frame and make a paging or streaming derivative. That is, at
least for fixed size things, it could supply things like the total number of
rows but have facilities for paging in and out of memory. Presumably all
users of data.frame have to work through a limited interface, which I guess
could be expanded with various hints like "prefetch this", for example. I
haven't looked at this idea in a while but the issue keeps coming up; the
dev list, maybe?

Anyway, for your immediate issue of computing a few statistics, you could
probably write a simple C++ program that ultimately becomes part of an R
package, or stream the file through R in chunks as sketched below. It is a
good idea to see what is available, but these questions come up here a lot
and the normal suggestion is "use a DB", which is exactly the opposite of
what you want if you have predictable access patterns (although even here
prefetch could probably be implemented).
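
A minimal sketch of chunked processing in plain R (the file name and field
layout are made up to resemble the quote records quoted below):

con <- file("quotes.txt", open = "r")
n <- 0; s <- 0
repeat {
  lines <- readLines(con, n = 10000)   # next chunk only, not the whole file
  if (length(lines) == 0) break
  px <- as.numeric(sapply(strsplit(lines, "[ \t]+"), `[`, 7))  # 7th field = price (assumed)
  s <- s + sum(px, na.rm = TRUE)
  n <- n + sum(!is.na(px))
}
close(con)
s / n   # running mean price without ever holding the file in memory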






> Doesn't it use temporary files or something similar to deal such amount of 
> data?
>
> As an example I know that SAS handles sas7bdat files up to 1TB on a box with 
> 76GB memory, without noticeable issues.
>
> --Roman
>
> - Original Message -
>
> > In cases where I have to parse through large datasets that will not
> > fit into R's memory, I will grab relevant data using SQL and then
> > analyze said data using R. There are several packages designed to do
> > this, like [1] and [2] below, that allow you to query a database
> > using
> > SQL and end up with that data in an R data.frame.
>
> > [1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
> > [2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html
>
> > On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko
> >  wrote:
> > > Hi R list,
> > >
> > > I'm new to R software, so I'd like to ask about it is capabilities.
> > > What I'm looking to do is to run some statistical tests on quite
> > > big
> > > tables which are aggregated quotes from a market feed.
> > >
> > > This is a typical set of data.
> > > Each day contains millions of records (up to 10 non filtered).
> > >
> > > 2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0
> > > 2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0
> > > 2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0
> > >
> > > I'll need to filter it out first based on some criteria.
> > > Since I keep it mysql database, it can be done through by query.
> > > Not
> > > super efficient, checked it already.
> > >
> > > Then I need to aggregate dataset into different time frames (time
> > > is
> > > represented in ms from midnight, like 35482391).
> > > Again, can be done through a databases query, not sure what gonna
> > > be faster.
> > > Aggregated tables going to be much smaller, like thousands rows per
> > > observation day.
> > >
> > > Then calculate basic statistic: mean, standard deviation, sums etc.
> > > After stats are calculated, I need to perform some statistical
> > > hypothesis tests.
> > >
> > > So, my question is: what tool faster for data aggregation and
> > > filtration
> > > on big datasets: mysql or R?
> > >
> > > Thanks,
> > > --Roman N.
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > __
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
>
> > --
> > ===
> > Jon Daily
> > Technician
> > ===
> > #!/usr/bin/env outside
> > # It's great, trust me.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RGL package installation problem on Centos

2011-05-23 Thread Mike Marchywka












Date: Mon, 23 May 2011 14:43:59 +0100
From: arraystrugg...@gmail.com
To: r-help@r-project.org
Subject: [R] RGL package installation problem on Centos


Dear R users,
I have installed the latest version of R from source on Centos (using
configure and make install).
This seemed to work fine, with no Errors reported and R at the command line
starts R.

However, if I try and installed the package rgl using;
install.packages("rgl")
I get the following error;

installing to /usr/local/lib64/R/library/rgl/libs
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
 *** caught segfault ***
address (nil), cause 'memory not mapped'

I just did an install of R from source, built with various
options to support Rapache, and tried to load rgl.
First it complained about no display, so I went back
to bash, did "export DISPLAY=:0", and it seemed
to load ok. Do you have X running and a display set?
Not sure what happens if you have R without X11 support,
for example. 

I probably installed with dep=TRUE, and only on cygwin do I recall
some issues with missing dependencies. Try setting dependencies to
true and see if that helps.
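
i.e., something along these lines (a sketch; the display value is whatever
your X server actually uses):

Sys.setenv(DISPLAY = ":0")                     # or export DISPLAY=:0 in the shell
install.packages("rgl", dependencies = TRUE)   # pull in any missing dependencies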











aborting ...
sh: line 1: 23732 Segmentation fault  '/usr/local/lib64/R/bin/R'
--no-save --slave < /tmp/RtmpkvIjOb/file6d97876
ERROR: loading failed
* removing '/usr/local/lib64/R/library/rgl'
The downloaded packages are in
'/tmp/Rtmp5OaGuQ/downloaded_packages'
Updating HTML index of packages in '.Library'
Making packages.html  ... done
Warning message:
In install.packages("rgl") :
  installation of package 'rgl' had non-zero exit status
I read that Open GL header files have to be present and are in
/usr/include/GL.
I also read about different graphics cards causing problems but I don't know
how to find this info out.

Any help appreciated and full error message included below.

Thanks,

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

##### full error #####
> install.packages("rgl")
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://cran.ma.imperial.ac.uk/src/contrib/rgl_0.92.798.tar.gz'
Content type 'application/x-gzip' length 162 bytes (1.6 Mb)
opened URL
==
downloaded 1.6 Mb
* installing *source* package 'rgl' ...
checking for gcc... gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for gcc... (cached) gcc -std=gnu99
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc -std=gnu99 accepts -g... (cached) yes
checking for gcc -std=gnu99 option to accept ISO C89... (cached) none needed
checking for libpng-config... yes
configure: using libpng-config
configure: using libpng dynamic linkage
checking for X... libraries , headers
checking GL/gl.h usability... yes
checking GL/gl.h presence... yes
checking for GL/gl.h... yes
checking GL/glu.h usability... yes
checking GL/glu.h presence... yes
checking for GL/glu.h... yes
checking for glEnd in -lGL... yes
checking for gluProject in -lGLU... yes
checking for freetype-config... yes
configure: using Freetype and FTGL
configure: creating ./config.status
config.status: creating src/Makevars
** libs
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12
-DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext
-I/usr/local/include   -g -O2 -fpic  -g -O2 -c BBoxDeco.cpp -o BBoxDeco.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12
-DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext
-I/usr/local/include   -g -O2 -fpic  -g -O2 -c Background.cpp -o
Background.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12
-DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext
-I/usr/local/include   -g -O2 -fpic  -g -O2 -c Color.cpp -o Color.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12
-DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext
-I/usr/local/include   -g -O2 -fpic 

[R] so current status of interactive PDF creation is? trying to explore econometric convergence thread

2011-05-23 Thread Mike Marchywka


I think I just saw a thread go by on " how do I make interactive PDF from
rgl output" but I ignored it at the time and can't seem to find consensus result
on google. Essentially I wanted to create pdf or other output to 
share a 3D plot and let viewers interact with it but it still wasn't
clear on easiest way to do this. 
Is this 2009 page the best reference?

http://cran.r-project.org/web/views/Graphics.html

Thanks.
Specific case of interest is below, 

I found rgl to be quite useful in regards to this,

http://r.789695.n4.nabble.com/maximum-likelihood-convergence-reproducing-Anderson-Blundell-1982-Econometrica-R-vs-Stata-td3502516.html#a3512807


I generated some data points using this script
(which itself uses this data file http://98.129.232.234/temp/share/em.txt ),

http://98.129.232.234/temp/share/em.R.txt

After some editing with sed, the output of the above script
made this data file showing an optimization's
"trajectory" according to my variables of interest,

http://98.129.232.234/temp/share/emx.dat.txt

df<-read.table("emx.dat",header=F,sep=" ")




I subsequently found a few ways to plot the data,
notably,

png("em.png")

plot(log(df$V1),df$V2)

dev.off()



http://98.129.232.234/temp/share/em.png



png("em2.png")

plot(log(df$V1),df$V2,cex=1+.2*(log(df$V3)-min(log(df$V3))))

dev.off()


http://98.129.232.234/temp/share/em2.png


But what I'd like to do is publish an interactive 3D
plot of this data similar to the rgl output
of this,

df<-read.table("emx.dat",header=F,sep=" ")
library(rgl)
rgl.points(log(df$V1),df$V2,log(df$V3))

A quick google search and ?rgl didn't seem
to provide an immediate answer.

Is there a way to publish interactive plots?
Thanks.








  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-22 Thread Mike Marchywka
> > csd <- function(fn, x, ...) {
> > h <- .Machine$double.eps
> > n <- length(x)
> > h0 <- g <- rep(0, n)
> > for (i in 1:n) {
> > h0[i] <- h * 1i
> > g[i] <- Im(fn(x+h0, ...))/h
> > h0[i] <- 0
> > }
> > g
> > }
> >
> > gr.csd <- function(theta,y1, y2, x1, x2, x3) {
> > csd(lnl, theta, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
> > }
> >
> > # exact solution as the starting value
> > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> > p1 <- optimx(start, lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # numerical gradient
> > p2 <- optimx(rep(0,8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # exact gradient
> > p2g <- optimx(rep(0,8), lnl, gr.csd, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # comparing p2 and p2g, we see the dramatic improvement in BFGS when exact 
> > gradient is used, we also see a major difference for L-BFGS-B
> > # Exact gradient did not affect the gradient methods, CG and spg, much. 
> > However, convergence of Rcgmin improved when exact gradient was used
> >
> > x1s <- scale(x1)
> > x2s <- scale(x2)
> > x3s <- scale(x3)
> >
> > p3 <- optimx(rep(0,8),lnl, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # both scaling and exact gradient
> > p3g <- optimx(rep(0,8),lnl, gr.csd, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # Comparing p3 and p3g, use of exact gradient improved spg 
> > dramatically.
> >
> > # Of course, derivative-free methods newuoa, and Nelder-Mead are improved 
> > by scaling, but not by the availability of exact gradients. I don't know 
> > what is wrong with bobyqa in this example.
> >
> > In short, even with scaling and exact gradients, this optimization problem 
> > is recalcitrant.
> >
> > Best,
> > Ravi.
> >
> > 
> > From: Mike Marchywka [marchy...@hotmail.com]
> > Sent: Thursday, May 12, 2011 8:30 AM
> > To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com
> > Cc: r-help@r-project.org
> > Subject: RE: [R] maximum likelihood convergence reproducing Anderson 
> > Blundell 1982 Econometrica R vs Stata
> >
> > So what was the final verdict on this discussion? I kind of
> > lost track if anyone has a minute to summarize and critique my summary 
> > below.
> >
> >
> > Apparently there were two issues, the comparison between R and Stata
> > was one issue and the "optimum" solution another. As I understand it,
> > there was some question about R numerical gradient calculation. This would
> > suggest some features of the function may be of interest to consider.
> >
> > The function to be optimized appears to be, as OP stated,
> > some function of residuals of two ( unrelated ) fits.
> > The residual vectors e1 and e2 are dotted in various combinations
> > creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
> > is the result to be minimized by choice of theta. Theta it seems is
> > an 8 component vector, 4 components determine e1 and the other 4 e2.
> > Presumably a unique solution would require that e1 and e2, both n-component 
> > vectors,
> > point in different directions, or else both could become arbitrarily large
> > while keeping the error signal at zero. For fixed magnitudes, collinearity
> > would reduce the "Error." The intent would appear to be to
> > keep the residuals distributed similarly in the two ( unrelated) fits.
> > I guess my question is,
> > " did anyone determine that there is a unique solution?" or
> > am I totally wrong here ( I haven't used these myself to any
> > extent and just try to run some simple teaching examples, asking
> > for my own clarification as much as anything).
> >
> > Thanks.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 
> > > From: rvarad...@jhmi.edu
> > > To: pda...@gmail.com; alex.ols...@gmail.com
> > > Date: Sat, 7 May 2011 11:51:56 -0400
> > > CC: r-help@r-project.org
> > > Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
> > > Blundell 1982 Econometrica R vs Stata
> > >
> > > There is something strange in this

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-21 Thread Mike Marchywka
> p3 <- optimx(rep(0,8),lnl, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
> control=list(all.methods=TRUE, maxit=1500))
>
> # both scaling and exact gradient
> p3g <- optimx(rep(0,8),lnl, gr.csd, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
> control=list(all.methods=TRUE, maxit=1500))
>
> # Comparing p3 and p3g, use of exact gradient improved spg 
> dramatically.
>
> # Of course, derivative-free methods newuoa, and Nelder-Mead are improved by 
> scaling, but not by the availability of exact gradients. I don't know what is 
> wrong with bobyqa in this example.
>
> In short, even with scaling and exact gradients, this optimization problem is 
> recalcitrant.
>
> Best,
> Ravi.
>
> 
> From: Mike Marchywka [marchy...@hotmail.com]
> Sent: Thursday, May 12, 2011 8:30 AM
> To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com
> Cc: r-help@r-project.org
> Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> So what was the final verdict on this discussion? I kind of
> lost track if anyone has a minute to summarize and critique my summary below.
>
>
> Apparently there were two issues, the comparison between R and Stata
> was one issue and the "optimum" solution another. As I understand it,
> there was some question about R numerical gradient calculation. This would
> suggest some features of the function may be of interest to consider.
>
> The function to be optimized appears to be, as OP stated,
> some function of residuals of two ( unrelated ) fits.
> The residual vectors e1 and e2 are dotted in various combinations
> creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
> is the result to be minimized by choice of theta. Theta it seems is
> an 8 component vector, 4 components determine e1 and the other 4 e2.
> Presumably a unique solution would require that e1 and e2, both n-component 
> vectors,
> point in different directions, or else both could become arbitrarily large
> while keeping the error signal at zero. For fixed magnitudes, collinearity
> would reduce the "Error." The intent would appear to be to
> keep the residuals distributed similarly in the two ( unrelated) fits.
> I guess my question is,
> " did anyone determine that there is a unique solution?" or
> am I totally wrong here ( I haven't used these myself to any
> extent and just try to run some simple teaching examples, asking
> for my own clarification as much as anything).
>
> Thanks.
>
> 
> > From: rvarad...@jhmi.edu
> > To: pda...@gmail.com; alex.ols...@gmail.com
> > Date: Sat, 7 May 2011 11:51:56 -0400
> > CC: r-help@r-project.org
> > Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
> > Blundell 1982 Econometrica R vs Stata
> >
> > There is something strange in this problem. I think the log-likelihood is 
> > incorrect. See the results below from "optimx". You can get much larger 
> > log-likelihood values than for the exact solution that Peter provided.
> >
> > ## model 18
> > lnl <- function(theta,y1, y2, x1, x2, x3) {
> > n <- length(y1)
> > beta <- theta[1:8]
> > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> > e <- cbind(e1, e2)
> > sigma <- t(e)%*%e
> > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is 
> > something wrong here
> > return(-logl)
> > }
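
(A minimal sketch for reference: if the flagged line is indeed the problem,
one common reading is that the 1/n is missing from the covariance estimate,
i.e. sigma-hat = E'E/n rather than E'E. Names follow the quoted code; this is
a guess at the fix, not the paper's definition.)

lnl2 <- function(theta, y1, y2, x1, x2, x3) {
  n  <- length(y1)
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e  <- cbind(e1, e2)
  sigma.hat <- crossprod(e) / n   # concentrated MLE of Sigma divides by n
  logl <- -n/2 * (2*(1 + log(2*pi)) + log(det(sigma.hat)))
  return(-logl)                   # negated because optim/optimx minimize
}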
> >
> > data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
> >
> > attach(data)
> >
> > require(optimx)
> >
> > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> >
> > # the warnings can be safely ignored in the "optimx" calls
> > p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
> > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> >
> > p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> >
> > p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> >
> > Ravi.
> > 
> > From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf 
> > Of peter dalgaard [pda...@gmail.com]
> > Sent: Saturday, May 07, 2011 4:46 AM
> > To: Alex Olssen
> > Cc: r-help@r-project.org
> >

[R] predict 'expected' with eha package

2011-05-21 Thread Mike Harwood
I am unsure what is being returned, and what is supposed to be
returned, when using 'predict' with "type='expected'" for an aftreg
survival model.  The code below first generates a Weibull model, then
uses predict to create a vector of the linear predictors, then
attempts to create the 'expected' vector, which is empty.  The final
two steps in the code generate a lognormal model with the same data,
and the same empty 'expected' vector.

My expectation had been that 'expected' would generate the same
transformed dependent variable output as predict with a survreg model
using type='response'.  Since my 'real' data is left-truncated and
right-censored I cannot use survreg, and I wanted to investigate the
output from eha.

Thanks in advance!

Mike

> data(mort)
> aftreg(Surv(enter, exit, event) ~ ses, data = mort)
Call:
aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
ses
   lower            0.416     0         1           (reference)
   upper            0.584    -0.348     0.706     0.089     0.000

log(scale)                    3.603    36.698     0.065     0.000
log(shape)                    0.331     1.392     0.058     0.000

Events                    276
Total time at risk      17038
Max. log. likelihood  -1391.3
LR test statistic        16.1
Degrees of freedom          1
Overall p-value   5.91578e-05
> m1 <- aftreg(Surv(enter, exit, event) ~ ses, data = mort)
> head(predict(m1, type='lp')) ## produces output
1 2 3 4 5 6
-0.347853  0.00 -0.347853  0.00  0.00  0.00
> head(predict(m1, type='expected')) ## is this correct?
numeric(0)
> m2 <- aftreg(Surv(enter, exit, event) ~ ses, dist='lognormal', data = mort)
> head(predict(m2, type='expected')) ## is this correct?
numeric(0)


from eha (the survival and rms packages are not an option for my
'real' question, since I have left-truncated, right-censored data).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help, please

2011-05-19 Thread Mike Marchywka

> From: dwinsem...@comcast.net
> To: julio.flo...@spss.com.mx
> Date: Thu, 19 May 2011 10:40:08 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] Help, please
>
>
> On May 18, 2011, at 6:29 PM, Julio César Flores Castro wrote:
>
> > Hi,
> >
> > I am using R 2.10.1 and I have a doubt. Do you know how many cases
> > can R
> > handle?
>
> I was able to handle (meaning do Cox proportional hazards work with
> the 'rms' package which adds extra memory overhead with a datadist
> object) a 5.5 million rows by 100 columns dataframe without
> difficulty using 24 GB on a Mac (BSD UNIX kernel). I was running into
> performance slow downs related to paging out to virtual memory at 150
> columns, but after expanding to 32 GB can now handle 5.5 MM records
> with 200 columns without paging.
>
> >
> > I want to use the library npmc but if I have more than 4,500 cases I
> > get an
> > error message. If I use less than 4500 cases I don´t have problems
> > with this
> > library.
> >
> > Is there any way to increase the number of cases in order to use this
> > library.
>
> 64 bit OS, 64 bit R, and more memory.
>
The longer-term solution is an implementation and algorithm designed to
increase the coherence of memory accesses (Firefox is doing this to me now,
dropping every few characters and getting many behind as it thrashes with a
memory leak, LOL).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] text mining analysis and word visualization of pdfs

2011-05-19 Thread Mike Marchywka

Date: Wed, 18 May 2011 15:24:49 +0530
From: ashimkap...@gmail.com
To: k...@huftis.org
CC: r-h...@stat.math.ethz.ch
Subject: Re: [R] text mining analysis and word visualization of pdfs


On Wed, May 18, 2011 at 1:44 PM, Karl Ove Hufthammer wrote:

> Ajay Ohri wrote:
>
> > What is the appropriate software package for dumping say 20 PDFS in a
> > folder, then creating data visualization with frequency counts of
> > certain words as well as measure correlation within each file for
> > certain key relationships or key words.
>
> pdftotext + Unix™ for Poets + R (ggplot2)
>
> What about the tm package ? I am a beginner and I don't know much about
this but I recall that it does have the ability to handle PDF's. A few words
from the experts would be nice.

I don't know if I'm an expert, I can't even get a browser that echoes
keystrokes in a reasonable time with a 4-core CPU on 'dohs, but PDF
could mean just about anything in terms of how text is represented. Whatever
R packages do, they will not be able to read the mind of the author.
Even with pdftotext, there are many options, and even simple things like
US IRS instruction forms can be almost impossible to extract in a coherent
manner. Many authors couldn't care less about the information as long as the
thing looks like paper copy. If you are stuck with PDF, I'd be looking
for more tools first, as you will probably want to know how they are
constructed.

I would just reiterate that the best approach for many data analysts would
be to contact the data source explaining problems with improperly authored PDF
or other specialized file formats that are only supported by limited
proprietary tools or that obfuscate information of interest.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Powerful PC to run R

2011-05-13 Thread Mike Marchywka


> Date: Fri, 13 May 2011 12:38:51 +0200
> From: haenl...@escpeurope.eu
> To: r-help@r-project.org
> Subject: [R] Powerful PC to run R
>
> Dear all,
>
> I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core
> i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my
> calculations run for several days sometimes even weeks (mainly simulations
> over a large parameter space). Depending on the external conditions, my
> laptop sometimes shuts down due to overheating.
>
> I'm now thinking about buying a more powerful desktop PC or laptop. Can
> anybody advise me on the best configuration to run R as fast as possible? I
> will use this PC exclusively for R so any other factors are of limited
> importance.

(I think my laptop is overheating with firefox trying to execute whatever
stupid code hotmail is using; ssh to a remote server echoes keys faster LOL)
The point of the above is that it really depends what you
are doing. Heat can come from the disk drive as well as the silicon. Generally
you'd want to consider algorithm and implementation and get profiling info
before just buying a bigger hammer. If you are thrashing VM, sorting
some data may help, for example. If the task is suited to parallelism, you could
even try to distribute it over several cheaper computers; hard to know.
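
A minimal profiling sketch in base R (the workload here is a made-up
stand-in; substitute your own simulation):

Rprof("prof.out")                              # start the sampling profiler
res <- replicate(20, sum(sort(runif(1e5))))    # hypothetical workload
Rprof(NULL)                                    # stop profiling
head(summaryRprof("prof.out")$by.self)         # see where the time goes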


>
> Thanks,

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power Spectrum from STFT (e1071)?

2011-05-13 Thread Mike Marchywka


> Date: Tue, 10 May 2011 11:55:34 -0700
> From: ajf...@psu.edu
> To: r-help@r-project.org
> Subject: [R] Power Spectrum from STFT (e1071)?
>
> Hello.
>
> Does anyone know how to generate a power spectrum from the STFT function in
> package e1071? The output for this function supplies the Fourier
> coefficients, but I do not know how to relate these back to the power
> spectrum.
>
> Alternatively, if anyone is aware of other packages/functions that perform
> short time Fourier transforms, that would also be helpful.

What exactly are you trying to do? From what I can recall (quick look on
Wikipedia, and I've done this in the past for audio envelope/carrier separation),
the short-term modifier just means you need to window the input signal.
I did just do something like this in R, to deconvolve histograms with
an impulse response, and it is not hard to create your own window function
and multiply it with the signal. The power spectrum is just the squared magnitude,
but you may also want to invert and see that the output is real (Re >> Im) to
verify nothing went wrong in processing. Probably this returns complex numbers;
I think you can use abs() or Re() and Im() on them, and also str(), but post some
code and try "?fft" for example.
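
A minimal windowed power-spectrum sketch in base R (toy signal; this is
essentially one frame of an STFT, so it should be comparable to one column
of the e1071 output):

N <- 256
x <- sin(2*pi*0.05*(1:N)) + rnorm(N, sd = 0.1)   # toy signal
w <- 0.5 - 0.5*cos(2*pi*(0:(N-1))/(N-1))         # Hanning window
X <- fft(x * w)                                  # complex Fourier coefficients
pow <- Mod(X)^2                                  # power spectrum = |X|^2
plot(pow[1:(N/2)], type = "l")                   # frequencies up to Nyquist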



>
> Thanks.
>

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-12 Thread Mike Marchywka




So what was the final verdict on this discussion? I kind of 
lost track if anyone has a minute to summarize and critique my summary below.


Apparently there were two issues, the comparison between R and Stata
was one issue and the "optimum" solution another. As I understand it,
there was some question about R numerical gradient calculation. This would
suggest some features of the function may be of interest to consider. 

The function to be optimized appears to be, as OP stated, 
some function of residuals of two ( unrelated ) fits. 
The residual vectors e1 and e2 are dotted in various combinations
creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
is the result to be minimized by choice of theta. Theta it seems is
an 8 component vector, 4 components determine e1 and the other 4 e2.
Presumably a unique solution would require that e1 and e2, both n-component 
vectors,
 point in different directions or else both could become arbitrarily large
while keeping the error signal at zero. For fixed magnitudes, collinearity
would reduce the "Error."  The intent would appear to be to 
keep the residuals distributed similarly in the two ( unrelated) fits. 
 I guess my question is,
" did anyone determine that there is a unique solution?" or
am I totally wrong here ( I haven't used these myself to any
extent and just try to run some simple teaching examples, asking
for my own clarification as much as anything).

Thanks.

> From: rvarad...@jhmi.edu
> To: pda...@gmail.com; alex.ols...@gmail.com
> Date: Sat, 7 May 2011 11:51:56 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> There is something strange in this problem. I think the log-likelihood is 
> incorrect. See the results below from "optimx". You can get much larger 
> log-likelihood values than for the exact solution that Peter provided.
>
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
> n <- length(y1)
> beta <- theta[1:8]
> e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> e <- cbind(e1, e2)
> sigma <- t(e)%*%e
> logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is 
> something wrong here
> return(-logl)
> }
>
> data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
>
> attach(data)
>
> require(optimx)
>
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>
> # the warnings can be safely ignored in the "optimx" calls
> p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> Ravi.
> 
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf 
> Of peter dalgaard [pda...@gmail.com]
> Sent: Saturday, May 07, 2011 4:46 AM
> To: Alex Olssen
> Cc: r-help@r-project.org
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> On May 6, 2011, at 14:29 , Alex Olssen wrote:
>
> > Dear R-help,
> >
> > I am trying to reproduce some results presented in a paper by Anderson
> > and Blundell in 1982 in Econometrica using R.
> > The estimation I want to reproduce concerns maximum likelihood
> > estimation of a singular equation system.
> > I can estimate the static model successfully in Stata but for the
> > dynamic models I have difficulty getting convergence.
> > My R program which uses the same likelihood function as in Stata has
> > convergence properties even for the static case.
> >
> > I have copied my R program and the data below. I realise the code
> > could be made more elegant - but it is short enough.
> >
> > Any ideas would be highly appreciated.
>
> Better starting values would help. In this case, almost too good values are 
> available:
>
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>
> which appears to be the _exact_ solution.
>
> Apart from that, it seems that the conjugate gradient methods have 
> difficulties with this likelihood, for some less than obvious reason. 
> Increasing the maxit gets you closer but still not satisfactory.
>
> I would suggest trying out the experimental optimx package. Apparently, some 
> of the algorithms in there are much better at handling this likelihood, 
> notably "nlm" and "nlminb".
>
>
>
>
> >
> > ## model 18
> > lnl <- function(theta,y1, y2, x1, x2, x3) {
> > n <- length(y1)
> > beta <- theta[1:8]
> > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> > e <- cbind(e1, e2)
> > sigma <- t(e)

Re: [R] How to extract information from the following dataset?

2011-05-12 Thread Mike Marchywka

> Date: Thu, 12 May 2011 10:43:59 +0200
> From: jose-marcio.mart...@mines-paristech.fr
> To: xzhan...@ucr.edu
> CC: r-help@r-project.org
> Subject: Re: [R] How to extract information from the following dataset?
>
> Xin Zhang wrote:
> > Hi all,
> >
> > I have never worked with this kind of data before, so Please help me out
> > with it.
> > I have the following data set, in a csv file, looks like the following:
> >
> > Jan 27, 2010 16:01:24,000 125 - - -
> > Jan 27, 2010 16:06:24,000 125 - - -
> > Jan 27, 2010 16:11:24,000 176 - - -
> > Jan 27, 2010 16:16:25,000 159 - - -
> > Jan 27, 2010 16:21:25,000 142 - - -
> > Jan 27, 2010 16:26:24,000 142 - - -
> > Jan 27, 2010 16:31:24,000 125 - - -
> > Jan 27, 2010 16:36:24,000 125 - - -
> > Jan 27, 2010 16:41:24,000 125 - - -
> > Jan 27, 2010 16:46:24,000 125 - - -
> > Jan 27, 2010 16:51:24,000 125 - - -
> > Jan 27, 2010 16:56:24,000 125 - - -
> > Jan 27, 2010 17:01:24,000 157 - - -
> > Jan 27, 2010 17:06:24,000 172 - - -
> > Jan 27, 2010 17:11:25,000 142 - - -
> > Jan 27, 2010 17:16:24,000 125 - - -
> > Jan 27, 2010 17:21:24,000 125 - - -
> > Jan 27, 2010 17:26:24,000 125 - - -
> > Jan 27, 2010 17:31:24,000 125 - - -
> > Jan 27, 2010 17:36:24,000 125 - - -
> > Jan 27, 2010 17:41:24,000 125 - - -
> > Jan 27, 2010 17:46:24,000 125 - - -
> > Jan 27, 2010 17:51:24,000 125 - - -
> > ..
> >
> > The first few columns are month, day, year, time with OS3 accuracy. And the
> > last number is the measurement I need to extract.
> > I wonder if there is a easy way to just take out the measurements only from
> > a specific day and hour, i.e. if I want measurements from Jan 27 2010
> > 16:--:--
> > then I get 125,125,176,159,142,142,125,125,125,125,125,125.
> > Many thanks!!
>
> The easiest is in the shell, if you're using some flavour of unix :
>
> grep "Jan 27, 2010 16" filein.txt | awk '{print $5}' > fileout.txt
>
> and use fileout which will contain only the column of data you want.
>
Normally that is what I do, but the R POSIXct features work pretty easily.
I guess I'd use bash text-processing commands to put the data into a
form you like, perhaps "y-mo-day time", and then read it in as a data frame.
Usually I convert everything to "time since epoch began" because I like integers,
but there are some facilities here like "round" that work well with date-times.

> dx<-as.POSIXct("2011-04-03 13:14:15")
> dx
[1] "2011-04-03 13:14:15 CDT"
> round(dx,"hour")
[1] "2011-04-03 13:00:00 CDT"
> as.integer(dx)
[1] 1301854455
>
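
So for the original question, a minimal sketch (column names are made up),
assuming the timestamps were parsed into a POSIXct column and reusing dx from
above:

df <- data.frame(when = dx + (0:11)*300, val = rep(c(125, 142, 159), 4))
df$val[format(df$when, "%Y-%m-%d %H") == "2011-04-03 13"]  # readings in that hour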

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Mike Marchywka

> Date: Mon, 9 May 2011 22:06:38 +1200
> From: alex.ols...@gmail.com
> To: pda...@gmail.com
> CC: r-help@r-project.org; da...@otter-rsch.com
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> Peter said
>
> "Ahem! You might get us interested in your problem, but not to the
> level that we are going to install Stata and Tsp and actually dig out
> and study the scientific paper you are talking about. Please cite the
> results and explain the differences."
>
> Apologies Peter, will do,
>
> The results which I can emulate in Stata but not (yet) in R are reported 
> below.

did you actually cut/paste code anywhere and is your first coefficient -.19 or 
-.019?
Presumably typos would be one possible problem. 

> They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569


is this it, page 1559?

http://www.jstor.org/pss/1913396

generally it helps if we could at least see the equations to check your
code against typos (note the page number?) in lnl; that may fix part of the
mystery.  Is the full text available
on the author's site? It doesn't come up on CiteSeer AFAICT:


http://citeseerx.ist.psu.edu/search?q=blundell+1982&sort=ascdate

I guess one question would be "what is beta in lnl supposed to be?" -
it isn't used anywhere. But I will also mention I'm not that familiar
with the R code (I'm trying to work through this to learn R and the
optimizers).

maybe some words would help: is sigma supposed to be 2x2 or 8x8, and what are
e1 and e2 supposed to be?




>
> TABLE II - model 18s
>
> coef std err
> p10 -0.19 0.078
> p11 0.220 0.019
> p12 -0.148 0.021
> p13 -0.072
> p20 0.893 0.072
> p21 -0.148
> p22 0.050 0.035
> p23 0.098
>
> The results which I produced in Stata are reported below.
> I spent the last hour rewriting the code to reproduce this - since I
> am now at home and not at work :(
> My results are "identical" to those published. The estimates are for
> a 3 equation symmetrical singular system.
> I have not bothered to report symmetrical results and have backed out
> an extra estimate using adding up constraints.
> I have also backed out all standard errors using the delta method.
>
> . ereturn display
> ------------------------------------------------------------------------------
>      |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -----+------------------------------------------------------------------------
> a    |
>   a1 |  -.0188115   .0767759    -0.25   0.806    -.1692895    .1316664
>   a2 |   .8926598   .0704068    12.68   0.000     .7546651    1.030655
>   a3 |   .1261517   .0590193     2.14   0.033      .010476    .2418275
> -----+------------------------------------------------------------------------
> g    |
>  g11 |   .2199442   .0184075    11.95   0.000      .183866    .2560223
>  g12 |  -.1476856   .0211982    -6.97   0.000    -.1892334   -.1061378
>  g13 |  -.0722586   .0145154    -4.98   0.000    -.1007082   -.0438089
>  g22 |   .0496865   .0348052     1.43   0.153    -.0185305    .1179034
>  g23 |   .0979991   .0174397     5.62   0.000     .0638179    .1321803
>  g33 |  -.0257405   .0113869    -2.26   0.024    -.0480584   -.0034226
> ------------------------------------------------------------------------------
>
> In R I cannot get results like this - I think it is probably to do
> with my inability at using the optimisers well.
> Any pointers would be appreciated.
>
> Peter said "Are we maximizing over the same parameter space? You say
> that the estimates from the paper gives a log-likelihood of 54.04, but
> the exact solution clocked in at 76.74, which in my book is rather
> larger."
>
> I meant +54.04 > -76.74. It is quite common to get positive
> log-likelihoods in these system estimations.
>
> Kind regards,
>
> Alex
>
> On 9 May 2011 19:04, peter dalgaard  wrote:
> >
> > On May 9, 2011, at 06:07 , Alex Olssen wrote:
> >
> >> Thank you all for your input.
> >>
> >> Unfortunately my problem is not yet resolved.  Before I respond to
> >> individual comments I make a clarification:
> >>
> >> In Stata, using the same likelihood function as above, I can reproduce
> >> EXACTLY (to 3 decimal places or more, which is exactly considering I
> >> am using different software) the results from model 8 of the paper.
> >>
> >> I take this as an indication that I am using the same likelihood
> >> function as the authors, and that it does indeed work.
> >> The reason I am trying to estimate the model in R is because while
> >> Stata reproduces model 8 perfectly it has convergence
> >> difficulties for some of the other models.
> >>
> >> Peter Dalgaard,
> >>
> >> "Better starting values would help. In this case, almost too good
> >> values are available:
> >>
> >> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> >>
> >> which appears to be the _exact_ solution."
> >>
> >> Thanks for the suggestion.  Using these starting values produces the
> >> exact estimate that Dave Fournier emailed me.
> >> If these are the exact solution then why did the author publish
> >> different answers which are completely reproducible in
> >> Stata and 

Re: [R] Confidence intervals and polynomial fits

2011-05-08 Thread Mike Marchywka




> From: pda...@gmail.com
> Date: Sun, 8 May 2011 09:33:23 +0200
> To: rh...@sticksoftware.com
> CC: r-help@r-project.org
> Subject: Re: [R] Confidence intervals and polynomial fits
>
>
> On May 7, 2011, at 16:15 , Ben Haller wrote:
>
> > On May 6, 2011, at 4:27 PM, David Winsemius wrote:
> >
> >> On May 6, 2011, at 4:16 PM, Ben Haller wrote:
> >>>
> >>
> >>> As for correlated coefficients: x, x^2, x^3 etc. would obviously be 
> >>> highly correlated, for values close to zero.
> >>
> >> Not just for x close to zero:
> >>
> >>> cor( (10:20)^2, (10:20)^3 )
> >> [1] 0.9961938
> >>> cor( (100:200)^2, (100:200)^3 )
> >> [1] 0.9966219
> >
> > Wow, that's very interesting. Quite unexpected, for me. Food for thought. 
> > Thanks!
> >
>
> Notice that because of the high correlations between the x^k, their parameter 
> estimates will be correlated too. In practice, this means that the c.i. for 
> the quartic term contains values for which you can compensate with the other 
> coefficients and still have an acceptable fit to data. (Nothing strange about 
> that; already in simple linear regression, you allow the intercept to change 
> while varying the slope.)

I was trying to compose a longer message, but at least for even/odd it isn't
hard to find a set of values for which cor is zero, or to find a set of points
that make sines of different frequencies have non-zero correlation - this
highlights the fact that the computer isn't magic and it needs data to make
basis functions different from each other.
For background, you probably want to look up "Taylor Series" and "Orthogonal
Basis."
I would also suggest using R to add noise to your input and see what that does
to your predictions, or for that matter take simple known data and add noise,
although I think in principle you can do this analytically. You can always
project a signal onto some subspace and get an estimate of how good your
estimate is, but that is different from asking how well you can reconstruct
your signal from a bunch of projections.
If you want to know "what can I infer about the slope of my thing at x=a,"
that is a specific question about one coefficient, but at this point
statisticians can elaborate on various issues with the other things you
ignore. Also, I think you said something about correlation at x=0, but you
can change your origin, (x-a)^n, and expand this in a finite series in x^m
to see what happens here.
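
A quick illustration of both points, using the orthogonal polynomial basis
that R builds with poly():

x <- 10:20
cor(x^2, x^3)                           # raw powers: correlation near 1
round(cor(poly(x, 3)), 10)              # orthogonal basis: identity matrix
cor((x - mean(x))^2, (x - mean(x))^3)   # centering gives 0 for even vs odd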

Also, if you are using hotmail, don't think that a dot product is not html, LOL,
since hotmail knows you must mean html when you use a less-than even in a text
email...


>
>
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk Priv: pda...@gmail.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Syntax for iter.max in rms

2011-05-08 Thread Mike Harwood
Hello,

I would like to increase the number of iterations for running a
Buckley-James regression model in the rms package, but something is
apparently syntactically wrong.  The following code initially is
exactly as it appears in the help page (which runs successfully), then
a "failure to converge" message (resulting from specifying an
'identity' link argument, the error message by adding both the
iter.max control and 'identity' link arguments, and finally, the error
message when testing just an iter.max argument.

This was run in R 2.13.0 with rms version 3.3-0 on Ubuntu Linux 11.04.

Thank you in advance, and all insights and criticisms are appreciated.

Mike

library(rms)
> f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital,x=TRUE, y=TRUE)
> f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',x=TRUE, 
> y=TRUE)

No convergence in 50 steps
Failure in bj.fit
> f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity', 
> iter.max=200, x=TRUE, y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, link =
"identity",  :
  unused argument(s) (iter.max = 200)
> f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, iter.max=200, x=TRUE, 
> y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, iter.max =
200,  :
  unused argument(s) (iter.max = 200)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls problem with R

2011-05-05 Thread Mike Marchywka

> Date: Thu, 5 May 2011 01:20:33 -0700
> From: sterles...@hotmail.com
> To: r-help@r-project.org
> Subject: Re: [R] nls problem with R
>
> ID1 ID2 t V(t)
> 1 1 0 6.053078443
> 2 1 0.3403 5.56937391
> 3 1 0.4181 5.45484486
> 4 1 0.4986 5.193124598
> 5 1 0.7451 4.31386722
> 6 1 1.0069 3.645422269
> 7 1 1.5535 3.587710965
> 8 1 1.8049 3.740362689
> 9 1 2.4979 3.699837726
> 10 1 6.4903 2.908485019
> 11 1 13.5049 1.888179494
> 12 1 27.5049 1.176091259
> 13 1 41.5049 1.176091259
>
> The model
> (1) V(t)=V0[1-epi+ epi*exp(-c(t-t0))]

A = V0, B = V0*epi, C = exp(c*t0)
V(t) = A - B + B*C*exp(-c*t)

or further, with D = A - B and F = B*C,

V(t) = D + F*exp(-c*t)

this model only really has 3 attributes: initial value, final value,
and decay constant, yet you ask for 4 parameters. There is no
way to get a unique answer. For some reason this same form comes up
a lot here; I think this is about the third time I've seen this in the last
few weeks.

I guess when fishing or shopping for forms to fit, it is tempting to
throw a bunch of parameters into your model, but this can create intractable
ambiguities.
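
A minimal sketch of fitting the reduced, identifiable 3-parameter form to
the first 8 points above (starting values are rough guesses; Fc plays the
role of F, which is best avoided as a name in R since it aliases FALSE):

tdat <- c(0, 0.3403, 0.4181, 0.4986, 0.7451, 1.0069, 1.5535, 1.8049)
Vdat <- c(6.0531, 5.5694, 5.4548, 5.1931, 4.3139, 3.6454, 3.5877, 3.7404)
fit  <- nls(Vdat ~ D + Fc*exp(-cl*tdat), start = list(D = 3, Fc = 3, cl = 1))
summary(fit)  # D = final value, D + Fc = initial value, cl = decay constant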

Indeed, if I just remove t0 and use your first 8 points I get this
(random starting values, but it converged easily; you still need to plot etc.)

[1] "1   v= 8.77181162126362  epi= 0.672516376478598  cl= 1.90973175223917 t0= 0.643481321167201"
> summary(nls2)

Formula: V2 ~ v0 * (1 - epi + epi * exp(-cl * (T2)))

Parameters:
    Estimate Std. Error t value Pr(>|t|)
v0    6.2901 0.3384  18.585  8.3e-06 ***
epi   0.5430 0.1373   3.955   0.0108 *
cl    0.9684 0.5491   1.763   0.1381
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3579 on 5 degrees of freedom

Number of iterations to convergence: 11
Achieved convergence tolerance: 4.057e-06



> (2) V(t)=V0{A*exp[-lambda1(t-t0)]+(1-A)*exp[-lambda2(t-t0)]}
>
> in formula (2) lambda1=0.5*{(c+delta)+[(c-delta)^2+4*(1-epi)*c*delta]^0.5}
>
> lambda2=0.5*{(c+delta)-[(c-delta)^2+4*(1-epi)*c*delta]^0.5}
> A=(epi*c-lambda2)/(lambda1-lambda2)
>
> The regression rule :
> for formula (1):(t<=2,that is) first 8 rows are used for non-linear
> regression
> epi,c,t0,V0 parameters are obtained
> for formula (2):all 13 rows of results are used for non-linear regression
> lambda1,lambda2,A (with these parameters, delta can be calculated from them)
>
> Thanks for help
> Ster Lesser
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3497825.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls problem with R

2011-05-04 Thread Mike Marchywka



> Date: Wed, 4 May 2011 07:07:44 -0700
> From: sterles...@hotmail.com
> To: r-help@r-project.org
> Subject: Re: [R] nls problem with R
>
> Thanks Andrew.
> I am sorry for some typos - I omitted some numbers of T2.
> Based on your suggestion,I think the problem is in the initial values.
> And I will read more theory about the non-linear regression.

there is unlikely to be any magic involved, unlike getting hotmail to work.
As a tool for understanding your data, you should have some idea
of the qualitative properties of the model and data and the error
function you use to reconcile the two.

If you can post your full data set I may post an R example of some things
to try. I was looking for an excuse to play with nls, I'm not expert here,
and curious to see what I can do with your example for critique by others.
If you want to fully automate this for N continuous parameters, you
can take a shotgun approach, but I'm not sure it helps other than to
find gross problems in the model or data.
I actually wrote a loop to keep picking random parameter values
and calculate an SSE between predicted and real data. What you soon
find is that this is like trying to decode a good crypto algorithm
by guessing - you can do the math to see the problem LOL.
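
For concreteness, a self-contained sketch of such a shotgun loop (toy data
and bounds; a real problem would use your data in place of these):

set.seed(1)
tdat <- seq(0, 3, by = 0.25)                                # toy time points
Vdat <- 3.5 + 2.5*exp(-2*tdat) + rnorm(length(tdat), sd = 0.05)
sse  <- function(p) sum((Vdat - (p[1] + p[2]*exp(-p[3]*tdat)))^2)
best <- list(val = Inf, par = NULL)
for (i in 1:10000) {
  p <- runif(3, c(0, 0, 0), c(10, 10, 5))                   # random candidate
  v <- sse(p)
  if (v < best$val) best <- list(val = v, par = p)          # keep best so far
}
best   # compare with the true (3.5, 2.5, 2) to see how slowly this closes in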

>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3495672.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ID parameter in model

2011-05-04 Thread Mike Harwood
Thank you, Goran.  Please see the package details below:

> packageDescription('eha')
Encoding: UTF-8
Package: eha
Version: 1.3-2
Date: 2011-03-01
Title: Event History Analysis
Description: A package for survival and event history analysis
License: GPL (>= 3)
Author: Göran Broström
Depends: R (>= 2.2.0), survival, graphics
Maintainer: Göran Broström 
Packaged: 2011-03-01 14:56:12 UTC; gb
Repository: CRAN
Date/Publication: 2011-03-01 15:50:52
Built: R 2.13.0; i386-pc-mingw32; 2011-04-15 08:22:36 UTC; windows

Mike


On Mon, May 2, 2011 at 10:38 AM, Mike Harwood  wrote:

> Hello,
>
> I am apparently confused about the use of an id parameter for an event
> history/survival model, and why the EHA documentation for aftreg does
> not specify one.  All assistance and insights are appreciated.
>
> Attempting to specify an id variable with the documentation example
> generates an "overlapping intervals" error, so I sorted the original
> mort dataframe and set each subsequent entry time for an id to the previous
> exit time + 0.0001.  This allowed me to see the effect of the id
> parameter on the coefficients and significance tests, and prompted my
> question.  The code I used is shown below, with the results at the
> bottom.  Thanks in advance!
>
> Mike
>
> head(mort) ## data clearly contains multiple entries for some of the
> dataframe ids
>
> no.id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort)  ##
> Initial model
> id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id)
> ## overlapping intervals error
>
> mort.sort <- ## ensure records ordered
>mort[
>order(mort$id, mort$enter),]
>
> ## remove overlap
> for (i in 2:nrow(mort.sort)){
> if (mort.sort[i,'id'] == mort.sort[i-1,'id'])
> mort.sort[i,'enter'] <- mort.sort[i-1, 'exit'] + 0.0001
>}
>
> no.id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, data =
> mort.sort) ## initial model on modified df
> id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, id=id, data =
> mort.sort) ## with id parameter
>
>
> #=== output ===#
> > no.id.aft.sort
> Call:
> aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort)
>
> Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
> ses
>    lower            0.416     0         1           (reference)
>    upper            0.584    -0.347     0.707     0.089     0.000
>
> log(scale)                    3.603    36.704     0.065     0.000
> log(shape)                    0.331     1.393     0.058     0.000
>
> Events                    276
> Total time at risk      17045
> Max. log. likelihood  -1391.4
> LR test statistic        16.1
> Degrees of freedom          1
> Overall p-value   6.04394e-05
> > id.aft.sort
> Call:
> aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort,
>id = id)
>
> Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
> ses
>    lower            0.416     0         1           (reference)
>    upper            0.584    -0.364     0.695     0.090     0.000
>
> log(scale)                    3.588    36.171     0.065     0.000
> log(shape)                    0.338     1.402     0.058     0.000
>
> Events                    276
> Total time at risk      17045
> Max. log. likelihood  -1390.8
> LR test statistic        17.2
> Degrees of freedom          1
> Overall p-value   3.3091e-05
> >
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller

On Tue, 3 May 2011, Christian Schulz wrote:


On Mon, 2 May 2011, P Ehlers wrote:


Use str_sub() in the stringr package:

require(stringr)  # install first if necessary
s <- "abcdefghijklmnopqrstuvwxyz"

str_sub(s, c(1,12,17), c(3,15,-1))
#[1] "abc"        "lmno"       "qrstuvwxyz"



Thanks.  That's very close to what I'm looking for, but it seems to 
correspond to "cut -c", not to "cut -f".  Can it work with delimiters or 
only with character counts?


Mike



x <- "this is a string"
unlist(strsplit(x," "))[c(1,4)]



Thanks.  I did figure that one out a couple of messages back, but to get
it to behave like "cut -d' ' -f1,4", I had to add a paste command to
reassemble the parts:


paste(unlist(strsplit(x," "))[c(1,4)], collapse=" ")

Then I wasn't sure if I could do this to every element of a vector of
strings without looping -- I have to think not.
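
Actually, sapply() over strsplit()'s list does cover the vector case; a
minimal sketch:

xs <- c("this is a string", "one two three four five")
sapply(strsplit(xs, " "), function(p) paste(p[c(1, 4)], collapse = " "))
## [1] "this string" "one four"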


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller

On Mon, 2 May 2011, P Ehlers wrote:


Use str_sub() in the stringr package:

require(stringr)  # install first if necessary
s <- "abcdefghijklmnopqrstuvwxyz"

str_sub(s, c(1,12,17), c(3,15,-1))
#[1] "abc"        "lmno"       "qrstuvwxyz"



Thanks.  That's very close to what I'm looking for, but it seems to 
correspond to "cut -c", not to "cut -f".  Can it work with delimiters or 
only with character counts?


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller

On Mon, 2 May 2011, Gabor Grothendieck wrote:


On Mon, May 2, 2011 at 10:32 PM, Mike Miller  wrote:

On Tue, 3 May 2011, Andrew Robinson wrote:


try substr()


OK.  Apparently, it allows things like this...


substr("abcdef",2,4)


[1] "bcd"

...which is like this:

echo "abcdef" | cut -c2-4

But that doesn't use a delimiter, it only does character-based cutting, and
it is very limited.  With "cut -c" I can do stuff like this:

echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-

abclmnoqrstuvwxyz

It extracts characters 1 to 3, 12 to 15 and 17 to the end.
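
For reference, base R can mimic the multi-range form with the vectorized
substring() (the open-ended "17-" has to be spelled out, e.g. with nchar()):

s <- "abcdefghijklmnopqrstuvwxyz"
paste(substring(s, c(1, 12, 17), c(3, 15, nchar(s))), collapse = "")
## [1] "abclmnoqrstuvwxyz"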

That was a great tip, though, because it led me to strsplit, which can do
what I want, however somewhat awkwardly:


y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)


[1] "a b c l m n o q r s t u v w x y z"

That gives me what I want, but it is still a little awkward.  I guess I
don't quite get what I'm doing with lists.  I'm not clear on how this would
work with a vector of strings.



Try this:


read.fwf(textConnection("abcdefghijklmnopqrstuvwxyz"), widths = c(3, 8, 4, 1, 10), 
colClasses = c(NA, "NULL"))

  V1   V3 V5
1 abc lmno qrstuvwxyz



That gives me a few more functions to study.  Of course the new code 
(using read.fwf() and textConnection()) is not doing what was requested 
and it requires some work to compute the widths from the given numbers 
(c(1:3, 12:15, 17:26) has to be converted to c(3, 8, 4, 1, 10)).


Mike__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller

On Tue, 3 May 2011, Andrew Robinson wrote:


try substr()


OK.  Apparently, it allows things like this...


substr("abcdef",2,4)

[1] "bcd"

...which is like this:

echo "abcdef" | cut -c2-4

But that doesn't use a delimiter, it only does character-based cutting, 
and it is very limited.  With "cut -c" I can do stuff like this:


echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-

abclmnoqrstuvwxyz

It extracts characters 1 to 3, 12 to 15 and 17 to the end.

That was a great tip, though, because it led me to strsplit, which can do 
what I want, however somewhat awkwardly:



y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)

[1] "a b c l m n o q r s t u v w x y z"

That gives me what I want, but it is still a little awkward.  I guess I 
don't quite get what I'm doing with lists.  I'm not clear on how this 
would work with a vector of strings.


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] List of Data Frames

2011-05-02 Thread Mike Smith
I'm trying to create a list of Data Frames.  I have 17 data frames that I need 
to move through in a loop, but if I simply make a list of them, then they do 
not stay data frames, and I can't sort through them.  I tried to create an 
array, but the data frames can have anywhere from 14-16 rows, and I couldn't 
find a way to make a variable size array.  If you have any ideas, I would 
greatly appreciate any help, as I'm trying to learn R, and decided to apply it 
to a project that I have been working on.  My goal is splitting a sports season 
into games per week, and then do statistics on each week, but have an average 
running up to that point in the season.  Thus the list would be indexed by 
weeks, and then there's a data frame of the game and all relevant statistics. 
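
A minimal sketch of the list-of-data-frames approach (toy columns, made-up
names), showing that list elements stay data frames and support per-week
statistics plus a running average:

wk1 <- data.frame(team = c("A", "B"), score = c(3, 1))
wk2 <- data.frame(team = c("A", "C"), score = c(2, 2))
season <- list(wk1, wk2)             # index by week: season[[1]], season[[2]]
class(season[[1]])                   # still "data.frame"
wk.mean <- sapply(season, function(d) mean(d$score))  # one statistic per week
cumsum(wk.mean) / seq_along(wk.mean)                  # running season average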

Thank You,
Mike Smith


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller
The R "cut" command is entirely different from the UNIX "cut" command. 
The latter retains selected fields in a line of text.  I can do that kind 
of manipulation using sub() or gsub(), but it is tedious.  I assume there 
is an R function that will do this, but I don't know its name.  Can you 
tell me?


I'm also guessing that there is a web page somewhere that will tell me how 
to do a lot of common GNU/UNIX/Linux "text util" commmand-line kinds of 
things in R.  By that I mean by using R functions, not by making system 
calls.  Does anyone know of such a web page?


Thanks in advance.

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ID parameter in model

2011-05-02 Thread Mike Harwood
Hello,

I am apparently confused about the use of an id parameter for an event
history/survival model, and why the EHA documentation for aftreg does
not specify one.  All assistance and insights are appreciated.

Attempting to specify an id variable with the documentation example
generates an "overlapping intervals" error, so I sorted the original
mort dataframe and set each subsequent entry time for an id to the previous
exit time + 0.0001.  This allowed me to see the effect of the id
parameter on the coefficients and significance tests, and prompted my
question.  The code I used is shown below, with the results at the
bottom.  Thanks in advance!

Mike

head(mort) ## data clearly contains multiple entries for some of the
dataframe ids

no.id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort)  ##
Initial model
id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id)
## overlapping intervals error

mort.sort <- ## ensure records ordered
mort[
order(mort$id, mort$enter),]

## remove overlap
for (i in 2:nrow(mort.sort)){
    if (mort.sort[i,'id'] == mort.sort[i-1,'id'])
        mort.sort[i,'enter'] <- mort.sort[i-1, 'exit'] + 0.0001
}

no.id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, data =
mort.sort) ## initial model on modified df
id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, id=id, data =
mort.sort) ## with id parameter


#=== output ===#
> no.id.aft.sort
Call:
aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
ses
   lower            0.416     0         1           (reference)
   upper            0.584    -0.347     0.707     0.089     0.000

log(scale)                    3.603    36.704     0.065     0.000
log(shape)                    0.331     1.393     0.058     0.000

Events                    276
Total time at risk      17045
Max. log. likelihood  -1391.4
LR test statistic        16.1
Degrees of freedom          1
Overall p-value   6.04394e-05
> id.aft.sort
Call:
aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort,
id = id)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
ses
   lower            0.416     0         1           (reference)
   upper            0.584    -0.364     0.695     0.090     0.000

log(scale)                    3.588    36.171     0.065     0.000
log(shape)                    0.338     1.402     0.058     0.000

Events                    276
Total time at risk      17045
Max. log. likelihood  -1390.8
LR test statistic        17.2
Degrees of freedom          1
Overall p-value   3.3091e-05
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regular expression in gsub() for strings with leading backslash

2011-04-29 Thread Mike Miller

On Fri, 29 Apr 2011, Duncan Murdoch wrote:


On 29/04/2011 7:41 PM, Miao wrote:


Can anyone help on gsub() in R?  I have a string like something below, and
wanted to delete all the strings with leading backslash, including "\xa0On",
"\023", "\xab", and many others.   How should I write a regular expression
pattern in gsub()?  I don't care how many characters follow the backslash.



If those are R strings, none of them contain a backslash.  In R, a backslash 
would always be printed as \\.


\x is the introduction to a hexadecimal encoding for a character; the next 
two characters show the hex digits.  So your first string contains a single 
character \xa0, the third one contains \xab, and so on.


The \023 is an octal encoding for a single character.



If we were dealing with a leading backslash, I guess this would do it:

gsub("^\\\\.*", "", txt)

R would display a double backslash, but I believe that represents a single 
backslash.  So if the string were saved using write.table, say, only a 
single backslash would be stored.



a <- "\\This is a string."
a

[1] "\\This is a string."

gsub("^\\\\", "", a)

[1] "This is a string."

a

[1] "\\This is a string."

gsub("^\\\\.*", "", a)

[1] ""

gsub("^\\\\.*", "", c(a,"Another string","\\more"))

[1] ""   "Another string" ""

write.table(a, file="a.txt", quote=F, row.names=F, col.names=F)


$ cat a.txt
\This is a string.

Apparently this is not what the OP really wanted.  The OP probably wanted 
to remove characters that were not from the regular ASCII set.



Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fisher exact for > 2x2 table

2011-04-29 Thread Mike Miller

Rob--

Your biostatistician has not disagreed with the rest of us about anything 
except for his preferred name for the test.  He wants to call it the 
Freeman-Halton test, some people call it the Fisher-Freeman-Halton test, 
but most people call it the Fisher Exact test -- all are the same test. 
When he was "adamant you could not do > 2x2", what he was being adamant 
about was the name you should use when referring to the test for tables 
larger than 2x2.  Why he was doing that, I don't know, but I think it is 
silly -- he confused you and the rest of us.


He goes on to tell you that to get the Freeman-Halton test in SAS, you use 
"tables a * b / fisher".  In other words, SAS calls the test "Fisher" 
instead of calling it Freeman-Halton.  R also calls it "Fisher" and not 
Freeman-Halton.  I'm like R and SAS and unlike your biostatistician, but 
to each his own.


You say that he is "exceptionally clear on this point," which may be true, 
but what is the point?  The point is that he prefers a different *name* 
for the test than the rest of us.  Everyone agrees on the math/stat.
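
In R, that test is reached through fisher.test(), which accepts r x c tables
directly; a minimal sketch with a made-up 4x2 table:

m <- matrix(c(8, 2, 4, 6, 3, 7, 5, 5), nrow = 4, byrow = TRUE)
fisher.test(m)   # Fisher-Freeman-Halton p-value for the 4x2 table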


Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota


On Fri, 29 Apr 2011, viostorm wrote:



After I shared comments form the forum yesterday with the biostatistician he
indicated this:

"Fisher's exact test is the non-parametric analog for the Chi-square
test for 2x2 comparisons. A version (or extension) of the Fisher's Exact
test, known as the Freeman-Halton test applies to comparisons for tables
greater than 2x2. SAS can calculate both statistics using the following
instructions.

 proc freq; tables a * b / fisher;"

Do people here still stand by the position that the Fisher exact test can be
used for RxC contingency tables?  Sorry to bother you all so much; it is just
important for a paper I am writing and planning to submit soon. (I have a 4x2
table but it does not meet the expected frequencies requirements for
chi-squared.)

I guess people here have suggested R implements the following, which
unfortunately are unavailable, at least easily, at my library, but at least
the titles indicate they extend it to RxC:

Mehta CR, Patel NR. A network algorithm for performing Fisher's exact test
in r x c contingency tables. Journal of the American Statistical Association
1983;78:427-34.

Mehta CR, Patel NR. Algorithm 643: FEXACT: A FORTRAN subroutine for Fisher's
exact test on unordered r x c contingency tables. ACM Transactions on
Mathematical Software 1986;12:154-61.

The only reason I ask again is he is exceptionally clear on this point.

Thanks again,

-Rob


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

2011-04-29 Thread Mike Miller

On Fri, 29 Apr 2011, David Winsemius wrote:


On Apr 29, 2011, at 1:29 PM, Mike Miller wrote:


On Fri, 29 Apr 2011, Giovanni Petris wrote:

Well, but the original poster also refers to 0.2 and 0.8 as "expected min 
and max", in which case we are back to a joke...


Well, he is a lot better with English than I am with Mandarin.  He seemed 
to like the truncated normal answers, so we'll let those be his answers.


It is possible to choose parameters for a normal distribution with 500 
observations such that the expected value of the maximum is .8 and the 
expected value of the minimum is .2.  Obviously, the mean would be .5, 
not 1, but what would the variance then have to be to provide the 
correct expected max and min?  That's another legitimate question.


You would need to specify an N since the expected first and last order 
statistic would decrease/increase with increasing N.


Right -- I chose N=500, as did the OP.  I think the order statistics for 
the normal are pretty complex, but it wouldn't be hard to use the density 
for order statistics for the uniform to compute the appropriate values for 
a standard normal, then rescale.


http://en.wikipedia.org/wiki/Order_statistic#The_order_statistics_of_the_uniform_distribution

You'd have to multiply the beta density times the inverse normal cdf and 
get the weighted average for a set of points.  It doesn't sound terribly 
difficult but I don't want to do it!  ;-)
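
For the record, a minimal sketch of that computation: since X_(k) is
distributed as qnorm(U_(k)) with U_(k) ~ Beta(k, n-k+1), the expected
maximum is a one-dimensional integral (integrate() copes with the mild
endpoint singularity here), and the rescaling follows from it:

n <- 500
Emax <- integrate(function(u) qnorm(u) * dbeta(u, n, 1), 0, 1)$value
s <- (0.8 - 0.5) / Emax   # sd putting the expected max at 0.8 around mean 0.5
c(mean = 0.5, sd = s)     # by symmetry the expected min is then 0.2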


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

2011-04-29 Thread Mike Miller

On Fri, 29 Apr 2011, Giovanni Petris wrote:

Well, but the original poster also refers to 0.2 and 0.8 as "expected 
min and max", in which case we are back to a joke...


Well, he is a lot better with English than I am with Mandarin.  He seemed 
to like the truncated normal answers, so we'll let those be his answers.


It is possible to choose parameters for a normal distribution with 500 
observations such that the expected value of the maximum is .8 and the 
expected value of the minimum is .2.  Obviously, the mean would be .5, not 
1, but what would the variance then have to be to provide the correct 
expected max and min?  That's another legitimate question.


Mike



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Mao Jianfeng
Sent: Thursday, April 28, 2011 12:02 PM
To: r-help@r-project.org
Subject: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

Dear all,

This is a simple probability problem. I want to know, How to generate 
a normal distribution with mean=1, min=0.2 and max=0.8?


I know how the generate a normal distribution of mean = 1 and sd = 1 
and with 500 data point.


rnorm(n=500, m=1, sd=1)

But, I am confused about how to generate a normal distribution with
expected min and max. I look forward to your directions.


Thanks in advance.

Best,
Jian-Feng,


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

2011-04-28 Thread Mike Miller
I just realized that I had misread what was wanted -- the code I wrote was 
for mean=0, sd=1, not for mean=1.  So for mean=m, and sd=s, lower limit L 
and upper limit U, this approach will work:


n <- 1000

m <- 1
s <- 1

L <- .2
U <- .8

p_L <- pnorm(L, mean=m, sd=s)
p_U <- pnorm(U, mean=m, sd=s)


x <- qnorm(runif(n, p_L, p_U), mean=m, sd=s)

Or it could be written on one line:

x <- qnorm(runif(n, pnorm(L, mean=m, sd=s), pnorm(U, mean=m, sd=s)), mean=m, 
sd=s)

Mike


On Thu, 28 Apr 2011, Mike Miller wrote:

Good point.  It would be absurdly inefficient if the upper and lower 
limits on the interval of interest were, say, 0.2 and 0.201 instead of 
0.2 and 0.8. Here's what I think is probably the best general approach:


Compute the CDF for the upper and lower limits of the interval and 
generate uniform random numbers within that range of CDF values, then 
compute the inverse CDF of those values.  To be explicit:


n <- 1000
L <- .2
U <- .8

p_L <- pnorm(L)
p_U <- pnorm(U)

x <- qnorm(runif(n, p_L, p_U))

It is very fast and it always produces exactly the desired number ("n") 
of random normal values.


Mike

--
Michael B. Miller, Ph.D.
Bioinformatics Specialist
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota



On Thu, 28 Apr 2011, Carl Witthoft wrote:

That method (creating lots of samples and throwing most of them away) is 
usually frowned upon :-).


Try this:  (I haven't, so it may well have syntax errors)

% n28<- dnorm(seq(.2,.8,by=.001),mean=1,sd=1)

% x <- sample(seq(.2,.8,by=.001), size=500,replace=TRUE, prob=n28)

And I guess in retrospect this  will get really ugly if you want, say, a 
sampling grid resolution of 1e-6 or so.


Anyone know what's applicable from the "sampling" package?


Carl

From: David Winsemius 
Date: Thu, 28 Apr 2011 13:06:21 -0400

On Apr 28, 2011, at 12:09 PM, Ravi Varadhan wrote:


Surely you must be joking, Mr. Jianfeng.



Perhaps not joking and perhaps not with correct statistical specification.

A truncated Normal could be simulated with:

set.seed(567)
x <- rnorm(n=5000, m=1, sd=1)        # need many draws; most fall outside [0.2, 0.8]
xtrunc <- x[x>=0.2 & x <=0.8]
require(logspline)
plot(logspline(xtrunc, lbound=0.2, ubound=0.8, nknots=7))

--
David.





-Original Message-
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Mao Jianfeng

Dear all,

This is a simple probability problem. I want to know: how can I generate a
normal distribution with mean=1, min=0.2 and max=0.8?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

2011-04-28 Thread Mike Miller
Good point.  It would be absurdly inefficient if the upper and lower 
limits on the interval of interest were, say, 0.2 and 0.201 instead of 0.2 
and 0.8.  Here's what I think is probably the best general approach:


Compute the CDF for the upper and lower limits of the interval and 
generate uniform random numbers within that range of CDF values, then 
compute the inverse CDF of those values.  To be explicit:


n <- 1000
L <- .2
U <- .8

p_L <- pnorm(L)
p_U <- pnorm(U)

x <- qnorm(runif(n, p_L, p_U))

It is very fast and it always produces exactly the desired number ("n") of 
random normal values.


Mike

--
Michael B. Miller, Ph.D.
Bioinformatics Specialist
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota



On Thu, 28 Apr 2011, Carl Witthoft wrote:

That method (creating lots of samples and throwing most of them away) is 
usually frowned upon :-).


Try this:  (I haven't, so it may well have syntax errors)

n28 <- dnorm(seq(.2, .8, by=.001), mean=1, sd=1)

x <- sample(seq(.2, .8, by=.001), size=500, replace=TRUE, prob=n28)

And I guess in retrospect this  will get really ugly if you want, say, a 
sampling grid resolution of 1e-6 or so.


Anyone know what's applicable from the "sampling" package?


Carl

From: David Winsemius 
Date: Thu, 28 Apr 2011 13:06:21 -0400

On Apr 28, 2011, at 12:09 PM, Ravi Varadhan wrote:


Surely you must be joking, Mr. Jianfeng.



Perhaps not joking and perhaps not with correct statistical specification.

A truncated Normal could be simulated with:

set.seed(567)
x <- rnorm(n=5000, m=1, sd=1)        # need many draws; most fall outside [0.2, 0.8]
xtrunc <- x[x>=0.2 & x <=0.8]
require(logspline)
plot(logspline(xtrunc, lbound=0.2, ubound=0.8, nknots=7))

--
David.





-Original Message-
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Mao Jianfeng

Dear all,

This is a simple probability problem. I want to know: how can I generate a
normal distribution with mean=1, min=0.2 and max=0.8?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fisher exact for > 2x2 table

2011-04-28 Thread Mike Miller

On Thu, 28 Apr 2011, viostorm wrote:


I have read the help page, or at least ?fisher.exact

I looked a bit on the Internet; I guess it is applicable to > 2x2. I had 
spoken to a biostatistician here, who is quite excellent and was adamant 
that I could not do > 2x2.


I found this:

http://mathworld.wolfram.com/FishersExactTest.html



That page shows that the Fisher Exact test can be implemented on tables 
with any numbers of rows and columns (so long as there are at least two 
rows and two columns).  Your biostatistician just didn't happen to know 
about this, but s/he shouldn't have been adamant when s/he was wrong. 
Show your biostatistician the MathWorld page.
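
For reference, a minimal sketch (made-up counts, not from the thread): 
fisher.test() in R accepts an r x c table directly.

tab <- matrix(c(8, 2,
                5, 5,
                3, 7,
                1, 9), nrow = 4, byrow = TRUE)
fisher.test(tab)   # exact test on a 4x2 table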


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fisher exact for > 2x2 table

2011-04-28 Thread Mike Miller

On Fri, 29 Apr 2011, Thomas Lumley wrote:


On Fri, Apr 29, 2011 at 8:01 AM, Mike Miller  wrote:


On Thu, 28 Apr 2011, viostorm wrote:


I'm using fisher.exact on a 4x2 table and it seems to work.

Does anyone know exactly what is going on?  I thought fisher.exact is 
only for 2x2 tables.



You were wrong.  I'm sure there's nothing wrong with the program.  You 
will find that with bigger tables and larger sample sizes the 
computational cost becomes quite enormous.


In fact, with large tables, roundoff error becomes significant before 
computational cost becomes prohibitive.



To avoid both of these problems one might use Monte Carlo resampling under 
the null, maybe 10,000 times or more.  I think independence_test() in the 
coin package will do this:


http://cran.r-project.org/web/packages/coin/

To estimate very small p-values properly, one must resample many more 
times.
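
As a footnote (mine, not from the thread), base R's fisher.test() exposes 
the same idea for r x c tables through its simulate.p.value argument:

tab <- matrix(c(8, 2, 5, 5, 3, 7, 1, 9), nrow = 4, byrow = TRUE)  # made-up counts
fisher.test(tab, simulate.p.value = TRUE, B = 10000)  # 10,000 Monte Carlo resamples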


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fisher exact for > 2x2 table

2011-04-28 Thread Mike Miller

On Thu, 28 Apr 2011, viostorm wrote:


I'm using fisher.exact on a 4x2 table and it seems to work.

Does anyone know exactly what is going on?  I thought fisher.exact is 
only for 2x2 tables.



You were wrong.  I'm sure there's nothing wrong with the program.  You 
will find that with bigger tables and larger sample sizes the 
computational cost becomes quite enormous.


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speed up plotting to MSWindows graphics window

2011-04-27 Thread Mike Marchywka

> Date: Wed, 27 Apr 2011 14:40:23 +0200
> From: jonat...@k-m-p.nl
> To: r-help@r-project.org
> Subject: Re: [R] Speed up plotting to MSWindows graphics window
>
>
> On 27/04/2011 13:18, Mike Marchywka wrote:
> >
> >> > Date: Wed, 27 Apr 2011 11:16:26 +0200
> >> > From:jonat...@k-m-p.nl
> >> > To:r-help@r-project.org
> >> > Subject: [R] Speed up plotting to MSWindows graphics window
> >> >
> >> > Hello,
> >> >
> >> > I am working on a project analysing the performance of motor-vehicles
> >> > through messages logged over a CAN bus.
> >> >

> >> >
> >> > I am currently plotting the data in R, overlaying 5 or more plots of
> >> > data, logged at 1kHz, (using plot.ts() and par(new = TRUE)).
> >> > The aim is to be able to pan, zoom in and out and get values from the
> >> > plotted graph using a custom Qt interface that is used as a front end to
> >> > R.exe (all this works).
> >> > The plot is drawn by R directly to the windows graphic device.
> >> >
> >> > The data is imported from a .csv file (typically around 100MB) to a 
> >> > matrix.
> >> > (timestamp, message ID, byte0, byte1, ..., byte7)
> >> > I then separate this matrix into several by message ID (dimensions are
> >> > in the order of 8cols, 10^6 rows)
> >> >
> >> > The panning is done by redrawing the plots, shifted by a small amount.
> >> > So as to view a window of data from a second to a minute long that can
> >> > travel the length of the logged data.
> >> >
> >> > My problem is that, the redrawing of the plots whilst panning is too
> >> > slow when dealing with this much data.
> >> > i.e.: I can see the last graphs being drawn to the screen in the
> >> > half-second following the view change.
> >> > I need a fluid change from one view to the next.
> >> >
> >> > My question is this:
> >> > Are there ways to speed up the plotting on the MSWindows display?
> >> > By reducing plotted point densities to *sensible* values?
> > Well, hard to know but it would help to know where all the time is going.
> > Usually people start complaining when VM thrashing is common but if you are
> > CPU limited you could try restricting the range of data you want to plot
> > rather than relying on the plot to just clip the largely irrelevant points
> > when you are zoomed in. It should not be too expensive to find the
> > limits either incrementally or with binary search on ordered time series.
> > Presumably subsetting is fast using foo[a:b,]
> >
> > One thing you may want to try for change of scale is wavelet or
> > multi-resolution analysis. You can make a tree ( increasing memory usage
> > but even VM here may not be a big penalty if coherence is high ) and
> > display the resolution appropriate for the current scale.
> >
> >
> I forgot to add, for plotting I use a command similar to:
>
> plot.ts(timestampVector, dataVector, xlim=c(a,b))
>
> a and b are timestamps from timestampVector
>
> Is the xlim parameter sufficient for limiting the scope of the plots?
> Or should I subset the timeseries each time I do a plot?

well, maybe time series knows the data to be ordered, I never use
that, but in general it has to go check each point and clip the out
of range ones. It could, I suppose, binary search for start/end points,
but I don't know. Based on what you said, it sounds like it does not.
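
Untested sketch of that subsetting, reusing the poster's timestampVector/
dataVector and window endpoints a and b (timestamps assumed sorted):

i   <- findInterval(c(a, b), timestampVector)  # binary search on ordered times
idx <- max(i[1], 1):i[2]                       # rows inside the visible window
plot(timestampVector[idx], dataVector[idx], type = "l")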


>
>
> The multi-resolution analysis looks interesting.
> I shall spend some time finding out how to use the wavelets package.
>
> Cheers!
>
> >
> >> > Using something other than plot.ts() - is the lattice package faster?
> >> > I don't need publication quality plots, they can be rougher...
> >> >
> >> > I have tried:
> >> > -Using matrices instead of dataframes - (works for calculations but not
> >> > enough for plots)
> >> > -increasing the max usable memory (max-mem-size) - (no change)
> >> > -increasing the size of the pointer protection stack (max-ppsize) - (no
> >> > change)
> >> > -deleting the unnecessary leftover matrices - (no change)
> >> > -I can't use lines() instead of plot() because of the very different
> >> > scales (rpm-1, flags -1to3)
> >> >
> >> > I am going to do some resampling of the logged data to reduce the vector
> >>

Re: [R] Speed up plotting to MSWindows graphics window

2011-04-27 Thread Mike Marchywka



> Date: Wed, 27 Apr 2011 11:16:26 +0200
> From: jonat...@k-m-p.nl
> To: r-help@r-project.org
> Subject: [R] Speed up plotting to MSWindows graphics window
>
> Hello,
>
> I am working on a project analysing the performance of motor-vehicles
> through messages logged over a CAN bus.
>
> I am using R 2.12 on Windows XP and 7
>
> I am currently plotting the data in R, overlaying 5 or more plots of
> data, logged at 1kHz, (using plot.ts() and par(new = TRUE)).
> The aim is to be able to pan, zoom in and out and get values from the
> plotted graph using a custom Qt interface that is used as a front end to
> R.exe (all this works).
> The plot is drawn by R directly to the windows graphic device.
>
> The data is imported from a .csv file (typically around 100MB) to a matrix.
> (timestamp, message ID, byte0, byte1, ..., byte7)
> I then separate this matrix into several by message ID (dimensions are
> in the order of 8cols, 10^6 rows)
>
> The panning is done by redrawing the plots, shifted by a small amount.
> So as to view a window of data from a second to a minute long that can
> travel the length of the logged data.
>
> My problem is that, the redrawing of the plots whilst panning is too
> slow when dealing with this much data.
> i.e.: I can see the last graphs being drawn to the screen in the
> half-second following the view change.
> I need a fluid change from one view to the next.
>
> My question is this:
> Are there ways to speed up the plotting on the MSWindows display?
> By reducing plotted point densities to *sensible* values?

Well, hard to know but it would help to know where all the time is going.
Usually people start complaining when VM thrashing is common but if you are
CPU limited you could try restricting the range of data you want to plot
rather than relying on the plot to just clip the largely irrelevant points
when you are zoomed in. It should not be too expensive to find the
limits either incrementally or with binary search on ordered time series. 
Presumably subsetting is fast using  foo[a:b,] 

One thing you may want to try for change of scale is wavelet  or
multi-resolution analysis. You can make a tree ( increasing memory usage
but even VM here may not be a big penalty if coherence is high ) and
display the resolution appropriate for the current scale. 
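
A crude sketch of that idea (mine, assuming the poster's 1 kHz
timestampVector/dataVector and window endpoints a and b in seconds): thin
the series to roughly the screen's resolution before plotting.

k   <- max(1, floor((b - a) * 1000 / 2000))  # aim for ~2000 plotted points
idx <- seq(1, length(dataVector), by = k)    # coarser copy for wide windows
plot(timestampVector[idx], dataVector[idx], type = "l", xlim = c(a, b))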




> Using something other than plot.ts() - is the lattice package faster?
> I don't need publication quality plots, they can be rougher...
>
> I have tried:
> -Using matrices instead of dataframes - (works for calculations but not
> enough for plots)
> -increasing the max usable memory (max-mem-size) - (no change)
> -increasing the size of the pointer protection stack (max-ppsize) - (no
> change)
> -deleting the unnecessary leftover matrices - (no change)
> -I can't use lines() instead of plot() because of the very different
> scales (rpm-1, flags -1to3)
>
> I am going to do some resampling of the logged data to reduce the vector
> sizes.
> (removal of *less* important data and use of window.ts())
>
> But I am currently running out of ideas...
> So if somebody could point out something, I would be grateful.
>
> Thanks,
>
> Jonathan Gabris
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] blank space escape sequence in R?

2011-04-25 Thread Mike Miller

On Mon, 25 Apr 2011, Mark Heckmann wrote:

I use a function that inserts line breaks ("\n" as escape sequence) 
according to some criterion when there are blanks in the string. e.g. 
"some text \nand some more text".


What I want now is another form of a blank, so my function will not 
insert a "\n" at that point. e.g. "some text\spaceand some more text"


Here "\space" stands for some escape sequence for a blank, which is what 
I am looking for. So what I need is something that will appear as a 
blank when printed but not in the string itself.



Is it possible to use \x20 or some similar way to invoke the hexadecimal 
ASCII form of a blank?  That works in Perl, as does \040 for the octal form.
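
A quick check in R (mine) says yes on both counts:

identical("\x20", " ")            # TRUE: hexadecimal escape for a space
identical("\040", " ")            # TRUE: the octal form works in R as well
cat("some text\x20and some more text\n")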


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using Java methods in R

2011-04-23 Thread Mike Marchywka

> From: marchy...@hotmail.com
> To: hill0...@umn.edu; r-help@r-project.org
> Date: Sat, 23 Apr 2011 15:12:30 -0400
> Subject: Re: [R] Using Java methods in R
>
> > Date: Sat, 23 Apr 2011 05:32:59 -0700
> > From: hill0...@umn.edu
> > To: r-help@r-project.org
> > Subject: Re: [R] Using Java methods in R
> >
> > No answer to my post,
> > so let's try a simpler question. Am I doing this correctly?
> > I have the RGui with R Console on the screen.
> > On rhe top pullDowns, Packages > Install Packages > USA(IA)> rJava
> > > library(rJava)
> > > .jinit()
> > > qsLin <- .jnew("C:/ad/j/CalqsLin")
> > Error in .jnew("C:/ad/j/CalqsLin") :
> > java.lang.NoClassDefFoundError: C:/ad/j/CalqsLin
>
I haven't used rJava yet (I think I installed it on linux
for testing but no real usage), but this message
appears to come from the JVM, probably because you
specified an absolute path. You can try this from the
command line:
>
>
>
>
> $ ls ../geo/dayte*
> ../geo/dayte.class
>
> mmarchywka@phlulap01 /cygdrive/c/d/phluant/duphus
> $ java ../geo/dayte
> Exception in thread "main" java.lang.NoClassDefFoundError: ///geo/dayte
> Caused by: java.lang.ClassNotFoundException: ...geo.dayte
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> Could not find the main class: ../geo/dayte.  Program will exit.
>
>
you want to search for the classpath and set that up, then
reference the class name, not the path ( the class loader will follow
the path looking for it ).
Not sure how you do that in rJava under 'dohs but
if you search for that term it should be apparent.
And of course 'dohs and pdf aren't always friendly
for automated work, so get something like cygwin and
reduce pdf's to text etc.
>
>

It appears that by default the current directory is on the
classpath. I must have installed this on 'dohs before, and
if I copy the class file into the R working directory it
can find it:




> library("rJava")
> .jinit()
> .jnew("foo")
Error in .jnew("foo") : java.lang.NoClassDefFoundError: foo
> .jnew("dayte")
[1] "Java-Object{dayte@1de3f2d}"
>

that's unlikely to fix all your problems; you want to set the classpath,
but if you know the term and can load cygwin you should be able
to find the way to set that up.
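
In rJava terms that amounts to something like this (a sketch only; the
directory is the poster's, and I haven't tried it under 'dohs):

library(rJava)
.jinit()
.jaddClassPath("C:/ad/j")      # put the directory on the classpath
qsLin <- .jnew("CalqsLin")     # then name the class, not the path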

>
>
>
>
> >
> > So I got this error which means I don't understand very much.
> > I go to C:/ad/j and get
> > C:\ad\j>dir CalqsLin.class
> > Volume in drive C has no label.
> > Volume Serial Number is 9A35-67A2
> > Directory of C:\ad\j
> > 04/23/2011 07:11 AM 14,651 CalqsLin.class
> > 1 File(s) 14,651 bytes
> > 0 Dir(s) 104,257,716,224 bytes free
> >
> > Just to show my intentions,
> > I had next wanted to call this java function method:
> > linTimOfCalqsStgIsLev("20110405235959",-4) using:
> > > dblTim <-
> > > .jcall(qsLin,"D","linTimOfCalqsStgIsLev","20110405235959","-4")
> > but that will probably also be wrong?
> > Obviously I don't understand.
> >
> >
> > --
> > View this message in context: 
> > http://r.789695.n4.nabble.com/Using-Java-methods-in-R-tp3469299p3469848.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using Java methods in R

2011-04-23 Thread Mike Marchywka

> Date: Sat, 23 Apr 2011 05:32:59 -0700
> From: hill0...@umn.edu
> To: r-help@r-project.org
> Subject: Re: [R] Using Java methods in R
>
> No answer to my post,
> so let's try a simpler question. Am I doing this correctly?
> I have the RGui with R Console on the screen.
> On rhe top pullDowns, Packages > Install Packages > USA(IA)> rJava
> > library(rJava)
> > .jinit()
> > qsLin <- .jnew("C:/ad/j/CalqsLin")
> Error in .jnew("C:/ad/j/CalqsLin") :
> java.lang.NoClassDefFoundError: C:/ad/j/CalqsLin

I haven't used rJava yet (I think I installed it on linux
for testing but no real usage), but this message
appears to come from the JVM, probably because you
specified an absolute path. You can try this from the
command line:




$ ls ../geo/dayte*
../geo/dayte.class

mmarchywka@phlulap01 /cygdrive/c/d/phluant/duphus
$ java ../geo/dayte
Exception in thread "main" java.lang.NoClassDefFoundError: ///geo/dayte
Caused by: java.lang.ClassNotFoundException: ...geo.dayte
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
Could not find the main class: ../geo/dayte.  Program will exit.


you want to search for the classpath and set that up, then
reference the class name, not the path ( the class loader will follow
the path looking for it ).
Not sure how you do that in rJava under 'dohs but
if you search for that term it should be apparent.
And of course 'dohs and pdf aren't always friendly
for automated work, so get something like cygwin and
reduce pdf's to text etc.






>
> So I got this error which means I don't understand very much.
> I go to C:/ad/j and get
> C:\ad\j>dir CalqsLin.class
> Volume in drive C has no label.
> Volume Serial Number is 9A35-67A2
> Directory of C:\ad\j
> 04/23/2011 07:11 AM 14,651 CalqsLin.class
> 1 File(s) 14,651 bytes
> 0 Dir(s) 104,257,716,224 bytes free
>
> Just to show my intentions,
> I had next wanted to call this java function method:
> linTimOfCalqsStgIsLev("20110405235959",-4) using:
> > dblTim <-
> > .jcall(qsLin,"D","linTimOfCalqsStgIsLev","20110405235959","-4")
> but that will probably also be wrong?
> Obviously I don't understand.
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Using-Java-methods-in-R-tp3469299p3469848.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Survival analysis: same subject with multiple treatments and experience multiple events

2011-04-23 Thread Mike Marchywka

> Date: Fri, 22 Apr 2011 10:00:31 -0400
> From: littleduc...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Survival analysis: same subject with multiple treatments and 
> experience multiple events
>
> Hi there,
>
> I need some help to figure out what is the proper model in survival analysis
> for my data.
> Subjects were randomized to 3 treatments in trial 1, some of them experience
> the event during the trial;
> After period of time those subjects were randomized to 3 treatments again in
> trial 2, but different from what they got in 1st trial, some of them
> experience the event during the 2nd trial (I think the carryover effect can
> be ignored since the time between two trials is long enough.)
>
> What I am interested is whether the survival functions differ among
> treatments. How should I deal with the correlation between the observation
> since the same subject was treated with two different drugs in two trials?
> Should I add "TRIAL" , "whether the event happened before", or "number of
> times the event happened before" as covariate(s)?
>
> Any input will be appreciated. Thank you.
> Qian
>

No one else replied so I would just suggest a web search using the term 
"crossover design"

http://www.google.com/#q=cran+crossover+design+survival

and refer you to any FDA panel discussions regarding any drugs that have been 
debated with
similar trial designs as part of the debate.

http://www.google.com/#sclient=psy&hl=en&site=&source=hp&q=site:fda.gov+briefing+crossover

The point of the above is to get some idea of what can happen, as no battle plan 
survives first contact with data. Usually the objective in these designs is to 
infer something about causality in some system, and you just use the statistics 
to avoid fooling yourself. Personally it seems to me that understanding of 
disease and host dynamics is improving to the point where you can do more with 
the "carryover effect" that you mention, as well as parametric putative 
prognostic factors, but you can also see opinions vary.
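
If it helps, one common formulation for the within-subject correlation is a 
Cox model with a robust cluster term; a sketch only, where 'dat' (columns 
time, event, treatment, trial, id, one row per subject per trial) is 
hypothetical:

library(survival)
fit <- coxph(Surv(time, event) ~ treatment + trial + cluster(id),
             data = dat)   # robust variance accounts for repeated subjects
summary(fit)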



  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to answer the question about transitive correlation?

2011-04-22 Thread Mike Marchywka

> Date: Fri, 22 Apr 2011 11:42:59 +0800
> From: mailzhu...@gmail.com
> To: r-help@r-project.org
> Subject: [R] How to answer the question about transitive correlation?
>
> Hi, everyone. I know it may be a basic statistical question. But I can't
> find a good answer.
>
> I have a question raised by one of the reviewers.
> Factor A expression was strongly correlated with B expression (chi-square)
> in this series. Prior reports by the same authors showed that B expression
> strongly correlated with survival (Log-rank). Please provide an explanation
> why then were the results not transitive.


The only explanation that would have any value would require you post the
data and let everyone look at it and any R output would be a benefit too. 
So you compared A and B with one test, 
B vs C with another, cut off the result at some arbitrary criterion, and
then wonder why some unspecified test and criterion applied to A vs C
doesn't vote with the majority and give the same answer? Changing acceptance levels
or applying a "correction" after careful shopping can always "fix" that
logical inconsistency LOL. It's not hard to plot 3 normal curves on the same graph
and see one example of how their overlaps can relate.
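
The three-curve picture takes one line per curve (arbitrary means, for
illustration only):

x <- seq(-4, 7, length.out = 400)
plot(x, dnorm(x, 0, 1), type = "l", ylab = "density")
lines(x, dnorm(x, 1.5, 1), col = "red")   # A-B overlap is large
lines(x, dnorm(x, 4, 1), col = "blue")    # B-C overlaps less; A-C barely at all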

I'm not sure what to think but perhaps you could look at stuff like this,

http://en.wikipedia.org/wiki/Path_analysis_%28statistics%29



  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survexp with weights

2011-04-21 Thread Mike Harwood
The system details follow below.  Also, I have attempted specifying
the variables in a second "ratetable" statement, but the same "missing
object" error occurs in creating a survexp object/list.

> R.version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          12.2
year           2011
month          02
day            25
svn rev        54585
language       R
version.string R version 2.12.2 (2011-02-25)


package ‘survival’ version 2.36-5




On Apr 20, 11:03 am, Mike Harwood  wrote:
> Hello,
>
> I probably have a syntax error in trying to generate an expected
> survival curve from a weighted cox model, but I can't see it.  I used
> the help sample code to generate a weighted model, with the addition
> of a "weights=albumin" argument (I only chose albumin because it had
> no missing values, not because of any real relevance).  Below are my
> code with the resulting error messages.  Thanks in advance!
>
> > pfit <- coxph(Surv(time,status>0) ~ trt + log(bili) + log(protime) + age +
>
> +     + platelet,  data=pbc
> +     )
>
> > pfit
>
> Call:
> coxph(formula = Surv(time, status > 0) ~ trt + log(bili) +
> log(protime) +
>     age + +platelet, data = pbc)
>
>                   coef exp(coef) se(coef)        z      p
> trt          -0.000624     0.999  0.17304 -0.00360 1.
> log(bili)     0.985497     2.679  0.08949 11.01262 0.
> log(protime)  2.794001    16.346  0.95289  2.93215 0.0034
> age           0.020590     1.021  0.00805  2.55666 0.0110
> platelet     -0.001699     0.998  0.00085 -2.00130 0.0450
>
> Likelihood ratio test=164  on 5 df, p=0  n= 308, number of events= 143
>    (110 observations deleted due to missingness)
>
> > plot(survfit(Surv(time, status>0) ~ trt, data=pbc))
> > lines(survexp( ~ trt, ratetable=pfit, data=pbc), col='purple')
>
> > pfit.wtd <- coxph(Surv(time,status>0) ~ trt + log(bili) + log(protime) + 
> > age +
>
> +     + platelet,  weights=albumin, data=pbc
> +     )
>
> > pfit.wtd
>
> Call:
> coxph(formula = Surv(time, status > 0) ~ trt + log(bili) +
> log(protime) +
>     age + +platelet, data = pbc, weights = albumin)
>
>                  coef exp(coef) se(coef)      z       p
> trt          -0.01354     0.987 0.094204 -0.144 8.9e-01
> log(bili)     0.99282     2.699 0.048690 20.391 0.0e+00
> log(protime)  2.54136    12.697 0.525797  4.833 1.3e-06
> age           0.01913     1.019 0.004398  4.350 1.4e-05
> platelet     -0.00165     0.998 0.000462 -3.578 3.5e-04
>
> Likelihood ratio test=535  on 5 df, p=0  n= 308, number of events= 143
>    (110 observations deleted due to missingness)
> > plot(survfit(Surv(time, status>0) ~ trt, data=pbc))
> > lines(survexp( ~ trt, ratetable=pfit.wtd, data=pbc), col='purple')
>
> Error in eval(expr, envir, enclos) : object 'albumin' not found
> > lines(survexp( ~ trt, ratetable=pfit.wtd, weights=albumin, data=pbc), col='purple')
>
> Error in eval(expr, envir, enclos) : object 'albumin' not found
> In addition: Warning message:
> In survexp(~trt, ratetable = pfit.wtd, weights = albumin, data =
> pbc) :
>   Weights ignored
> > lines(survexp( ~ trt, ratetable=pfit.wtd, weights=pbc$albumin, data=pbc), col='purple')
>
> Error in eval(expr, envir, enclos) : object 'albumin' not found
> In addition: Warning message:
> In survexp(~trt, ratetable = pfit.wtd, weights = pbc$albumin, data =
> pbc) :
>   Weights ignored
>
>
>
> __
> r-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survexp with weights

2011-04-20 Thread Mike Harwood
Hello,

I probably have a syntax error in trying to generate an expected
survival curve from a weighted cox model, but I can't see it.  I used
the help sample code to generate a weighted model, with the addition
of a "weights=albumin" argument (I only chose albumin because it had
no missing values, not because of any real relevance).  Below are my
code with the resulting error messages.  Thanks in advance!

> pfit <- coxph(Surv(time,status>0) ~ trt + log(bili) + log(protime) + age +
+ + platelet,  data=pbc
+ )
>
> pfit
Call:
coxph(formula = Surv(time, status > 0) ~ trt + log(bili) +
log(protime) +
age + +platelet, data = pbc)


                  coef exp(coef) se(coef)        z      p
trt          -0.000624     0.999  0.17304 -0.00360 1.
log(bili)     0.985497     2.679  0.08949 11.01262 0.
log(protime)  2.794001    16.346  0.95289  2.93215 0.0034
age           0.020590     1.021  0.00805  2.55666 0.0110
platelet     -0.001699     0.998  0.00085 -2.00130 0.0450

Likelihood ratio test=164  on 5 df, p=0  n= 308, number of events= 143
   (110 observations deleted due to missingness)
>
> plot(survfit(Surv(time, status>0) ~ trt, data=pbc))
> lines(survexp( ~ trt, ratetable=pfit, data=pbc), col='purple')
>
> pfit.wtd <- coxph(Surv(time,status>0) ~ trt + log(bili) + log(protime) + age +
+ + platelet,  weights=albumin, data=pbc
+ )
>
> pfit.wtd
Call:
coxph(formula = Surv(time, status > 0) ~ trt + log(bili) +
log(protime) +
age + +platelet, data = pbc, weights = albumin)


                 coef exp(coef) se(coef)      z       p
trt          -0.01354     0.987 0.094204 -0.144 8.9e-01
log(bili)     0.99282     2.699 0.048690 20.391 0.0e+00
log(protime)  2.54136    12.697 0.525797  4.833 1.3e-06
age           0.01913     1.019 0.004398  4.350 1.4e-05
platelet     -0.00165     0.998 0.000462 -3.578 3.5e-04

Likelihood ratio test=535  on 5 df, p=0  n= 308, number of events= 143
   (110 observations deleted due to missingness)
> plot(survfit(Surv(time, status>0) ~ trt, data=pbc))
> lines(survexp( ~ trt, ratetable=pfit.wtd, data=pbc), col='purple')
Error in eval(expr, envir, enclos) : object 'albumin' not found
> lines(survexp( ~ trt, ratetable=pfit.wtd, weights=albumin, data=pbc), 
> col='purple')
Error in eval(expr, envir, enclos) : object 'albumin' not found
In addition: Warning message:
In survexp(~trt, ratetable = pfit.wtd, weights = albumin, data =
pbc) :
  Weights ignored
> lines(survexp( ~ trt, ratetable=pfit.wtd, weights=pbc$albumin, data=pbc), 
> col='purple')
Error in eval(expr, envir, enclos) : object 'albumin' not found
In addition: Warning message:
In survexp(~trt, ratetable = pfit.wtd, weights = pbc$albumin, data =
pbc) :
  Weights ignored
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extrapolating data points for individuals who lack them

2011-04-20 Thread Mike Marchywka





> From: de...@exeter.ac.uk
> To: r-help@r-project.org
> Date: Wed, 20 Apr 2011 12:41:29 +0100
> Subject: [R] Extrapolating data points for individuals who lack them
>
> Hi,
>
> We have an experiment where individuals responses were measured over 5 days. 
> Some responses were not obtained because we only allowed individuals to 
> respond within a limited time-frame. These individuals are given the maximum 
> response time as they did not respond, yet we feel they may have done if 
> given time (and by looking at the rest of their responses over time, the 
> non-response days stand out).
>
> We therefore want to extrapolate data points for individuals, on days when 
> they didn't respond, using a regression of days when they did.
>
> Does anyone know how we could do this quickly and easily in R?

You are probably talking about right censoring. See things like this, 
( you may have good luck just with "R" rather than "CRAN" ) 

http://www.google.com/#sclient=psy&hl=en&source=hp&q=CRAN+informative+%22right+censoring%22


If you post data maybe someone can try a few things. It isn't hard to take data
subsets, fit models, and replace data with model predictions but easier and more
interesting to illustrate with your data.


Personally I would avoid making up data, and of course extrapolation tends
to be the most error-prone way of doing that.
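
For what it's worth, the censored formulation is only a few lines (a sketch
with made-up numbers; obs=0 marks a non-response censored at the cutoff):

library(survival)
d <- data.frame(resp = c(2.1, 3.5, 5.0, 1.8, 5.0),   # response times, 5.0 = limit
                obs  = c(1,   1,   0,   1,   0))     # 0 = did not respond in time
fit <- survfit(Surv(resp, obs) ~ 1, data = d)
summary(fit)   # non-responders enter as censored rather than as made-up data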





>
> Thanks very much
> Dave

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regression and lmer

2011-04-18 Thread Mike Marchywka

Date: Mon, 18 Apr 2011 03:27:40 -0700
From: lampria...@yahoo.com
To: r-help@r-project.org
Subject: Re: [R] regression and lmer


Dear all,

I hope this is the right place to ask this question.
 
( hotmail not marking your text, sorry, I can't find the option to change that )
opinions vary but mods seem to let this by, and personally it
seems appropriate to discuss general questions about what R 
can do. 

( end my text ) 

I am reviewing a research where the analyst(s) are using a linear
regression model. The dependent variable (DV) is a continuous measure.
The independent variables (IVs) are a mixture of linear and categorical
variables.

( my text ) 

No one wants to do homework or do your job for free but open
free peer review should not be a problem philosophically. 

( / my text, afraid to use less-than for inciting hotmail ) 


The
author investigates whether performance (DV - continuous linear) is a
function of age (continuous IV1 - measured in years), previous
performance (continuous IV2), country (categorical IV3 - six countries), the 
percentage of PhD graduates in each country (continuous IV4 -
country level data - apparently only six different percentages since we
have only six countries) and population of country (continuous IV5 - country 
level data - again only six numbers here, one for each country population).

My own opinion is that the lm function cannot be used with country
level data as IVs (for example IV4 and IV5 cannot be entered into the
model because they are country level data). If IV4 and IV5 are included
in the model, it is possible that the model will not be able to be
defined because we only have six countries and it is very likely that
the levels of counties (IV3) may be confounding with IV4 and IV5. This
also calls for multicollinearity issues, right? I would like to suggest
to the analyst to use lmer using the IV3 as a random variable and  IV4
and IV5 as IV at the second level of the two-level model.


The questions are: (a) Is it true that IV4 and IV5 cannot be entered in a
one-level regression if we also have IV3?, (b) can I use an lm function to 
check for multicollinearity
between IV3, IV4 and IV5?  and (c) If we use a two-level regression
model, does lmer cope well with only six coutnries as a random effect?

( my txt)

So you have presumably a large number of test subjects per country and a small 
number
( n~6 ) of countries. You could ask a number of questions such as, " do the 
mean performances
change from country to country by more than that expected given the observed 
distributions of
performances within country?" You could also ask a question like, " if I try to 
describe performance
as a function of country attriubte what fitting parameters minimize an error 
between fit and observation?"
Apparently the author tried to write an expression like 

average_performance = a[country_index] + m1*some_attribute_of_country + m2*some_other_attribute_of_country + b

and then expected the fitting algorithm to pick a[i], b, m1, and m2 in such a 
way as to minimize the resulting error. The reported fits hopefully minimize 
the error function, but then you need to examine the second derivative in 
various directions, so you have to ask how the error varies as you change 
a[i], b, m1, and m2. ( Ignore b right now and assume it is included in a[i]. ) 
I guess if you can find a direction where the error can not change due to these 
constraints then it would seem to be impossible for the fit to come up with 
unique values. If you change each a[i] and m1 by some amounts, for example, 
can you pick those amounts to not change anything? 
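
A made-up demonstration of that aliasing (mine, not the author's data):

set.seed(1)
d <- data.frame(country = gl(6, 50))                   # six countries, 50 each
d$iv4 <- c(2.1, 3.0, 1.5, 4.2, 2.8, 3.7)[d$country]    # one IV4 value per country
d$y <- rnorm(300)
coef(lm(y ~ country + iv4, data = d))                  # iv4 comes back NA: aliased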



( /my text ) 



Thank you for your help

Jason
[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rsquared for anova

2011-04-17 Thread Mike Marchywka



( did this msg make it through the lists as rich text? hotmail
didn't seem to think it was plain text?)

Anyway, having come in in the middle of this it isn't clear
if your issues are with R or stats or both. Usually the hard
core stats people punt the stats questions to other places but
both can be addressed somewhat.
In any case, exploratory work is a good way to learn both, and I 
always like looking at new data. If you have one or 
a few dependent variables and many independent variables, 
it would probably help if you could visualize a 
surface with the response as a function of the input 
variables; then, maybe with the input of prior information or 
anecdotes, you have some idea what tests or 
analyses would make sense. 

just some thoughts "for illustration only"

df<-read.table("results_processedCP.txt",header=T)


first it helps to make sure everything went ok and do quick
checks, for example, 

str(df)
unique(df$nh1)
unique(df$nh2)
unique(df$nh3)
unique(df$randsize)
unique(df$aweights)


now personally lots of binary variables confuse me, and
I can munge them all together since I expect I can
later identify issues in the following plots. So, with
this data you can create a composite variable like this
( now, I have not checked any of this for accuracy;
typos and other problems may render the results useless ):

x=df$nh1+2*df$nh2+4*df$nh3+2*df$randsize+32*df$aweights
df2<-cbind(df,x)
str(df2)

not sure if "time" was an input or output but you could
see if there is any obvious trend or periodicity of
time with your new made up variable,

plot(df2$time,df2$x)

Apparently x is a num rather than int, it can be changed for illustration
but probably of no consequence,

xi=as.integer(x)
str(xi)

and then you can add color based on this variable, 

min(xi)
pal=rainbow(56)   # avoid the name 'c'; it masks base::c
cx=pal[xi+1]
str(cx)

and make color coded scatter plots. Now, if you 
got lucky and guessed right you may see some patterns
that you want to test, 

plot(df2$tos,df2$tws,col=cx)

in this case, I get a cool red-yellow-green line along the bottom ( very
compelling linear fit question ) and scattered magenta ( pink-red? LOL ) and blue points
everywhere, with a cluster near the origin and nothing in the top right quadrant. 
Also note a few blue lines above the red-green-yellow line, but much shorter.

And in fact, presumably you already knew this as it looks like it was designed
in, if you just plot the red and green points the fit looks perfect for linear,


good=which(df2$x<20)
plot(df2$tos[good],df2$tws[good],col=cx[good])

now if you look at the results of a fit of the "good" points vs all points,
it isn't clear that anything like this would emerge from just
looking at summaries of a linear fit:


td=df2$tos[good]
ti=df2$tws[good]
lm(td~ti)
lm(df2$tos~ df2$tws)
summary(lm(td~ti))
summary(lm(df2$tos~ df2$tws))





Now of course "tests" need to be considered ahead of time or else
it is easy to go shopping for the answer you want. Anything post hoc
needs to be very complete and you should at least try to rationalize
test results you don't happen to like ( assuming you are trying to understand
the system from which the data was measured rather than justify some
particular outcome). 




Date: Sun, 17 Apr 2011 11:34:14 +0200
From: dorien.herrem...@ua.ac.be
To: dieter.me...@menne-biomed.de
CC: r-help@r-project.org
Subject: Re: [R] Rsquared for anova

Thanks for your remarks. I've been reading about R for the last two days,
but I don't really get when I should use lm or aov.
 
I have attached the dataset, feel free to take a look at it.
 
So far, running it with all the combinations did not take too long and
there seem to be some effects between the parameters. However, 2x2
combinations might suffice.
 
Thanks for any help, or a pointer to some good documentation,
 
Dorien
 
 
On 16 April 2011 10:13, Dieter Menne  wrote:
 
>
> dorien wrote:
> >
> >> fit <- lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,
> > data=expdata))
> > Error: unexpected ',' in "fit <-
> > lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,"
> >
> >
>
> Peter's point is the important one: too many interactions, and even with +
> instead of * you might be running into problems.
>
> But anyway: if you don't let us access
>
>
> /home/dorien/UA/meta-music/optimuse/optimuse1-build-desktop/results/results_processedCP
>
> you cannot expect a better answer which will depend on the structure of the
> data set.
>
> Dieter
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Rsquared-for-anova-tp3452399p3453719.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
 
 
 
-- 
Dorien Herremans
 
*Department of Environment, Technology and Technology Mana
