date:20110506

Taking the final value for each 30-minute interval seems like it would get what 
I want. The problem is that sometimes this value would be 15 minutes before the 
end of the 30-minute interval. What would I use to pick up this value?

-Original Message-
From: ehl...@ucalgary.ca [mailto:ehl...@ucalgary.ca] 
Sent: Thursday, May 05, 2011 03:58 PM
To: Thompson, Adele - adele_thomp...@cargill.com
Cc: r-help@r-project.org
Subject: Re: [R] Averaging uneven measurements by time with uneven numbers of 
measurements

On 2011-05-05 14:20, Schatzi wrote:
 I do not want smoothing as the data should have jumps (it is weight left in
 feeding bunker). I was thinking of maybe using a histogram-like function and
 then averaging that. Not sure if this is possible.

(It would be useful to include your original request - not everyone
uses Nabble.)

Actually, averaging *is* smoothing, but I suppose your intent is, for
some reason, not to smooth across 30-minute boundaries.

Perhaps you could use findInterval() to identify which measurements to
average.

Peter Ehlers

 -
 In theory, practice and theory are the same. In practice, they are not - 
 Albert Einstein
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Averaging-uneven-measurements-by-time-with-uneven-numbers-of-measurements-tp3499337p3499386.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] RV: R question



On May 6, 2011, at 4:03 AM, Philipp Pagel wrote:


which is the maximum large of digits that R has?, because SQL work
with 50 digits I think.


I am wondering if that is binary or decimal.


and I need a software that work  with a lot
of digits.


The .Machine() command will provide some insight into these matters.


On my device (and I suspect on all versions of R) .Machine is a built- 
in list and there is no .Machine() function.


.Machine returns a list that includes
$integer.max
[1] 2147483647

And you can look at the FAQ:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Which tells you that R can handle 53 digits _binary_

In agreement with  these components of .Machine:
$double.base
[1] 2

$double.digits
[1] 53

... and the FAQ has some explicit warnings about trusting more than 16  
digits decimal.


And look at the package, 'gmp'.




cu
Philipp


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Uniroot - error

2011-05-06 Thread Duncan Murdoch

You should ask your instructor or teaching assistant for help.  R-help 
is not for doing homework.


Duncan Murdoch

On 06/05/2011 9:00 AM, CarJabo wrote:

Hi,

I have tried to use uniroot to solve a value (value a in my function) that
gives f=0, and I repeat this process for 1 times(stimulations). However
error occures from the 4625th stimulation - Error in uniroot(f, c(0, 2),
maxiter = 1000, tol = 0.001) :
   f() values at end points not of opposite sign

I have also tried interval of (lower=min(U), upper=max(U)) and it won't work
as well.

Can anyone help me as I have struggled for few days already and I have to
finish it soon. Thanks.


numsim=1

set.seed(12345)

P = c()
for (m in 1:numsim) {

Y = rnorm(140,0.0125,(0.005^(1/2)))
U = exp(X1)
.
.(sorry i have to skip the code in between otherwise
..my assignment will get penalty for plagarism according to
those screening sotware)

S = sum(.)

f = function(a){sum(F*(answer^(910:1))) - S}

answer = uniroot(f, c(0,2), maxiter=1000,tol=0.001)$root
P[m] = answer^26 - 1
}

all the vectors are correct; it works without stimulation; it also works for
loop(1:4624) but after 4625 there's error.

--
View this message in context: 
http://r.789695.n4.nabble.com/Uniroot-error-tp3502628p3502628.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] For loop and sqldf

2011-05-06 Thread Gabor Grothendieck

On Fri, Apr 29, 2011 at 4:27 PM, mathijsdevaan mathijsdev...@gmail.com wrote:
 Hi list,

 Can anyone tell my why the following does not work? Thanks a lot! Your help
 is very much appreciated.

 DF = data.frame(read.table(textConnection(    B  C  D  E  F  G
 8025  1995  0  4  1  2
 8025  1997  1  1  3  4
 8026  1995  0  7  0  0
 8026  1996  1  2  3  0
 8026  1997  1  2  3  1
 8026  1998  6  0  0  4
 8026  1999  3  7  0  3
 8027  1997  1  2  3  9
 8027  1998  1  2  3  1
 8027  1999  6  0  0  2
 8028  1999  3  7  0  0
 8029  1995  0  2  3  3
 8029  1998  1  2  3  2
 8029  1999  6  0  0  1),head=TRUE,stringsAsFactors=FALSE))
 list-sort(unique(DF$C))
 for (t in 1:length(list))
        {
        year = as.character(list[t])
        data[year]-sqldf('select * from DF where C = [year]')
        }

 I am trying to split up the data.frame into 5 new ones, one for every year.



This has already been answered but just thought I would point out that
the perhaps subtle point is that sqldf automatically loads data frames
that it finds in your sql statement into the data base but it does not
do anything with non-data frame variables.

Thus DF is a data frame in your workspace is loaded into the database
but year is not.

Also at least in sqlite you can't put a constant in square brackets.

To construct the desired sql string you can use paste, sprintf or
gsubfn's perl-like $ string interpolation which is invoked by
prefacing sqldf with fn$ and prefacing the variable to interpolate
with a $.   gsubfn is automatically loaded by sqldf.   See
http://gsubfn.googlecode.com for more on fn.

library(sqldf)

# test data
DF - data.frame(a = 1:10, C = rep(1970:1971, each = 5))
year - 1970

sqldf(paste(select * from DF where C =, year))

sqldf(sprintf(select * from DF where C=%s, year))

fn$sqldf(select * from DF where C = $year)

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] for loop



On May 6, 2011, at 4:24 AM, Petr Savicky wrote:


On Fri, May 06, 2011 at 02:28:57PM +1000, andre bedon wrote:


Hi,
I'm hoping someone can offer some advice:I have a matrix x of  
dimensions 160 by 1. I need to create a matrix y, where the  
first 7 elements are equal to x[1]^1/7, then the next 6 equal to  
x[2]^1/6, next seven x[3]^1/7 and so on all the way to the  
1040th element. I have implemented this with a for loop an hour  
ago and it is still loading, can anyone offer any suggestions as to  
how I can create this matrix without using loops? I would really  
appreciate any suggestions.


Hi.

Since indexing x[1], x[2], ... is used and also the description
of y corresponds more to a vector, let me first suggest a solution
for vectors.

 x - rep(42, times=4) # any vector of even length
 x - x/c(7, 6)
 rep(x, times=rep(c(7, 6), length=length(x)))
 [1] 6 6 6 6 6 6 6 7 7 7 7 7 7 6 6 6 6 6 6 6 7 7 7 7 7 7

The input vector may be obtained using c() from a matrix. The
output vector may be reformatted using matrix(). However, for
a matrix solution, a more precise description of the question
is needed.


I certainly agree there, but you have pointed the way to a possible  
solution if the OP wants a solution that works along column-wise  
application of the 7 alternate with 6 rule applied to a smaller  
version that might be called minimal:


mtx - matrix(1:160, 16, 10)
mtx^(1/c(7,7,7,7,7,7,7,6,6,6,6,6,6) )  # argument recycling results in  
alternating lengths of 7 and 6


(takes less than 5 seconds, and you should expect a warning message  
since 160*1 is not evenly divisible by 13.)


I am not smart enough to know whether mtx2[160,1] should be  
160^(1/6) or (.)^(1/7) but this suggests it is (1/6):

 all.equal( mtx2[160,1] , 160^(1/7) )
[1] Mean relative difference: 0.2883231
 all.equal( mtx2[160,1] , 160^(1/6) )
[1] TRUE

If he wanted it to be row-wise then he could transpose, operate, and  
back-transpose. This, of course, assumes that the OP did not really  
mean x[1]^1/7 but rather meant x[1]^(1/7), since I know of no computer  
language where exponentiation is lower in precedence than division.




Hope this helps.

Petr Savicky.

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] RV: R question

2011-05-06 Thread Philipp Pagel

On Fri, May 06, 2011 at 09:17:11AM -0400, David Winsemius wrote:
 
 On May 6, 2011, at 4:03 AM, Philipp Pagel wrote:
 The .Machine() command will provide some insight into these matters.
 
 On my device (and I suspect on all versions of R) .Machine is a
 built-in list and there is no .Machine() function.

Oops - my fault. You are right, of course.

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Uniroot - error

2011-05-06 Thread CarJabo

sorry I am not asking someone to do my homework, as I have finished all the
procedure. I am just wondering why this technical error occurs, so I can fix
it myself.
By the way i don't have any instructor or teaching assistant for help, so
any suggestion for the error will be appreciated.
Thanks very much.

--
View this message in context: 
http://r.789695.n4.nabble.com/Uniroot-error-tp3502628p3502773.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] two-way group mean prediction in survreg with three factors

2011-05-06 Thread Pang Du

Thank you very much for the reply.  I tend to agree with your first
suggestion.  And that's exactly what I did.

In other functions, an easier way to marginalize such a variable C (not
necessarily a factor) is to use the option 
include=c(A,B,A:B)
This essentially sets C at a value such that the contribution from C is 0.
Unfortunately, this option is not available in survreg for some reason.

Pang


-Original Message-
From: Andrew Robinson [mailto:a.robin...@ms.unimelb.edu.au] 
Sent: Thursday, May 05, 2011 7:55 PM
To: Pang Du
Cc: r-help@r-project.org
Subject: Re: [R] two-way group mean prediction in survreg with three factors

Even then, I think that there's a problem.  If C is in the model, then
the response varies by C.  The simplest way is to pick a value for C,
and then evaluate the group mean estimates of A and B (and C).

Something in my brain keeps asking whether another way to marginalize
C for the purposes of predicting A and B is just to remove it from the
model, or alternatively to make it a random effect.  Neither idea
seems rock solid at this point.  

Cheers

Andrew

On Thu, May 05, 2011 at 09:37:15AM -0400, Pang Du wrote:
 Oops, I hope not too.  Don't know why I had the brackets around B+C.  My
 model is actually A*B+C.  And I'm not sure how to obtain the two-way
 prediction of AB with C marginalized.  Thanks.
 
 Pang
 
 -Original Message-
 From: Andrew Robinson [mailto:a.robin...@ms.unimelb.edu.au] 
 Sent: Wednesday, May 04, 2011 10:13 PM
 To: Pang Du
 Cc: r-help@r-project.org
 Subject: Re: [R] two-way group mean prediction in survreg with three
factors
 
 I hope not!
 
 Facetiousness aside, the model that you have fit contains C, and,
 indeed, an interaction between A and C.  So, the effect of A upon the
 response variable depends on the level of C.  The summary you want
 must marginalize C somehow, probably by a weighted or unweighted
 average across its levels.  What does that summary really mean?  Can
 you meaningfully average across the levels of a predictor that is
 included in the model as a main and an interaction term?
 
 Best wishes
 
 Andrew
 
 On Wed, May 04, 2011 at 12:24:50PM -0400, Pang Du wrote:
  I'm fitting a regression model for censored data with three categorical
  predictors, say A, B, C.  My final model based on the survreg function
is 
  
  Surv(..) ~ A*(B+C).
  
  I know the three-way group mean estimates can be computed using the
 predict
  function. But is there any way to obtain two-way group mean estimates,
say
  estimated group mean for (A1, B1)-group?  The sample group means don't
  incorporate censoring and thus may not be appropriate here.
  
   
  
  Pang Du
  
  Virginia Tech
  
  
  [[alternative HTML version deleted]]
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 -- 
 Andrew Robinson  
 Program Manager, ACERA 
 Department of Mathematics and StatisticsTel: +61-3-8344-6410
 University of Melbourne, VIC 3010 Australia   (prefer email)
 http://www.ms.unimelb.edu.au/~andrewpr  Fax: +61-3-8344-4599
 http://www.acera.unimelb.edu.au/
 
 Forest Analytics with R (Springer, 2011) 
 http://www.ms.unimelb.edu.au/FAwR/
 Introduction to Scientific Programming and Simulation using R (CRC, 2009):

 http://www.ms.unimelb.edu.au/spuRs/

-- 
Andrew Robinson  
Program Manager, ACERA 
Department of Mathematics and StatisticsTel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia   (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr  Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011) 
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009): 
http://www.ms.unimelb.edu.au/spuRs/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] pcse package error message concerning nrows of using data

2011-05-06 Thread Johann Maier

Dear R Community,

I am currently facing this seemingly obscure Problem with Panel 
Corrected Standard Errors (PCSE) following Beck  Katz (1995). As the 
authors suggest, I regressed a linear model (tmodel) with lm() with 
option na.action=na.exclude (I have also tried other options here). My 
dataframe is organized in pooled times series fashion, but with various 
missing values along the spacial and temporal dimension.

 library(pcse)

 tmodel-lm(var1 ~ var2 + var3 + var4 + as.factor(year) , 
data=dataframe.2, na.action=na.exclude)

 pcse.tmodel-pcse(tmodel, groupN = country, groupT = year, 
pairwise=TRUE)

 Error in pcse(tmodel, groupN = country, groupT = year, pairwise = 
TRUE) :
 Length of groupN and groupT must equal nrows of using data.

I have checked the length of groupN, groupT and the residuals of tmodel, 
only to find out they are equally long (N=1667.) To my knowledge, the 
pairwise=TRUE argument should have accounted for the Problem of missing 
values, i.e. an unbalanced panel. But pairwise=FALSE produces the same 
error message.
Would repeating the regression only with cases without missing values 
solve the problem?

I hope I have given sufficient information for you to be able to 
identify the problem.

Thanks in advance,

Johann Maier

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Long Term Prediction Time Series

2011-05-06 Thread matteo.redae...@libero.it

Hello

I'm interested in Long Term Prediction over time series: regarding it, I and 
other guys have developed STRATEGICO, a free and opensource tool at http://code.
google.com/p/strategico/

Please have a look at it, test it online with your own time series and give us 
any feedbacks and suggestions (new models?) ..

Do you have any documentation or  R libraries/tools to suggest?

Regards
Matteo
http://www.redaelli.org/matteo/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Uniroot - error

On Fri, May 6, 2011 at 6:33 AM, CarJabo carly_...@hellokitty.com wrote:
 sorry I am not asking someone to do my homework, as I have finished all the
 procedure. I am just wondering why this technical error occurs, so I can fix
 it myself.

My guess would be it has something to do with the random data
generated at the 4625th simulation, but you have not posted a
reproducible example (as requested in the posting guide) so it is not
really possible to say.

 By the way i don't have any instructor or teaching assistant for help, so
 any suggestion for the error will be appreciated.

If whatever institution you are taking this at simply gives
assignments, grades them and penalizes for plagiarism without having
any instructors or teachers, I recommend moving to an institution
where classes are taught by someone who can answer questions etc.  As
Dr. Murdoc (and the posting guide) said and say, R-help is not for
homework.

 Thanks very much.

Good luck,

Josh


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Uniroot-error-tp3502628p3502773.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] for loop

2011-05-06 Thread Petr Savicky

On Fri, May 06, 2011 at 02:28:57PM +1000, andre bedon wrote:
 
 Hi,
 I'm hoping someone can offer some advice:I have a matrix x of dimensions 
 160 by 1. I need to create a matrix y, where the first 7 elements are 
 equal to x[1]^1/7, then the next 6 equal to x[2]^1/6, next seven x[3]^1/7 and 
 so on all the way to the 1040th element. I have implemented this with a 
 for loop an hour ago and it is still loading, can anyone offer any 
 suggestions as to how I can create this matrix without using loops? I would 
 really appreciate any suggestions.

Hi.

Thanks to a remark by David, i now see that x[1]^1/7 is meant
as x[1]^(1/7). The following is a solution modified accordingly

  x - matrix(100, nrow=2, ncol=2)
  x - x^c(1/7, 1/6)
  rep(x, times=rep(c(7, 6), length=length(x)))

   [1] 1.930698 1.930698 1.930698 1.930698 1.930698 1.930698 1.930698 2.154435
   [9] 2.154435 2.154435 2.154435 2.154435 2.154435 1.930698 1.930698 1.930698
  [17] 1.930698 1.930698 1.930698 1.930698 2.154435 2.154435 2.154435 2.154435
  [25] 2.154435 2.154435

The output is a vector. You require that the output has 6.5 times
more elements than the input, since 1040/160/1 = 6.5. This
corresponds to the understanding that odd elements should repeat 7
times and even elements 6 times. However, it is not clear, what
the dimension of the output matrix should be.

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R CMD check warning

2011-05-06 Thread Uwe Ligges




On 05.05.2011 21:20, Ray Brownrigg wrote:

On 6/05/2011 6:06 a.m., swaraj basu wrote:

Dear All,
I am trying to build a package for a set of functions. I am
able to build the package and its working fine. When I check it with
R CMD check

I get a following warning : no visible global function
definition for ‘biocLite’

I have used biocLite to load a user defined library from
within a function if that library is not pre-installed

if(is.element(annotpkg, installed.packages()[,1]) == FALSE){
source(http://www.bioconductor.org/biocLite.R;)
biocLite(annotpkg)
library(annotpkg,character.only=TRUE)
}

Should I ignore this error or there is a workaround for the
warning. My package is working fine though still I guess the warning
has to
have significance. Please help in clarifying this
warning.


Your source() call generates the function biocLite() in such a way that
R CMD check cannot know, so a workaround is to precede the source()
statement with something like:
biocLite - function(x) 0

[I don't know if there is a 'best' way to do this.]

In general Warnings are just that - they point you to a possible fault,
but there is not enough information for the code to make a final
determination.



Or just set ask people (or try yourself) to use install.packages() with 
the corresponding BioC repository selected. I still do not get the point 
why install.packages() and friends are not sufficient.


Best,
Uwe



HTH
Ray Brownrigg

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Confidence intervals and polynomial fits

  Hi all!  I'm getting a model fit from glm() (a binary logistic regression 
fit, but I don't think that's important) for a formula that contains powers of 
the explanatory variable up to fourth.  So the fit looks something like this 
(typing into mail; the actual fit code is complicated because it involves 
step-down and so forth):

x_sq - x * x
x_cb - x * x * x
x_qt - x * x * x * x
model - glm(outcome ~ x + x_sq + x_cb + x_qt, binomial)

  This works great, and I get a nice fit.  My question regards the confidence 
intervals on that fit.  I am calling:

cis - confint.default(model, level=0.95)

to get the confidence intervals for each coefficient.  This confint.default() 
call gives me a table like this (where dispersal is the real name of what I'm 
calling x above):

 2.5 %  97.5 %
(Intercept)  8.7214297   8.9842533
dispersal  -37.5621412 -35.6629981
dispersal_sq66.8077669  74.4490123
dispersal_cb   -74.5963861 -62.8347057
dispersal_qt22.6590159  28.6506186

  A side note: calling confint() instead of confint.default() is much, much 
slower -- my model has many other terms I'm not discussing here, and is fit to 
300,000 data points -- and a check indicates that the intervals returned for my 
model by confint() are not virtually identical, so this is not the source of my 
problem:

 2.5 %  97.5 %
(Intercept)  8.7216693   8.9844958
dispersal  -37.5643922 -35.6652276
dispersal_sq66.8121519  74.4534753
dispersal_cb   -74.5995820 -62.8377766
dispersal_qt22.6592724  28.6509494

  These tables show the problem: the 95% confidence limits for the quartic term 
are every bit as wide as the limits for the other terms.  Since the quartic 
term coefficient gets multiplied by the fourth power of x, this means that the 
width of the confidence band starts out nice and narrow (when x is close to 
zero, where the width of the confidence band is pretty much just determined by 
the confidence limits on the intercept) but balloons out to be tremendously 
wide for larger values of x.  The width of it looks quite unreasonable to me, 
given that my fit is constrained by 300,000 data points.

  Intuitively, I would have expected the confidence limits for the quartic term 
to be much narrower than for, say, the linear term, because the effect of 
variation in the quartic term is so much larger.  But the way confidence 
intervals are calculated in R, at least, does not seem to follow my instincts 
here.  In other words, predicted values using the 2.5% value of the linear 
coefficient and the 97.5% value of the linear coefficient are really not very 
different, but predicted values using the 2.5% value of the quadratic 
coefficient and the 97.5% value of the quadratic coefficient are enormously, 
wildly divergent -- far beyond the range of variability in the original data, 
in fact.  I feel quite confident, in a seat-of-the-pants sense, that the actual 
95% limits of that quartic coefficient are much, much narrower than what R is 
giving me.

  Looking at it another way: if I just exclude the quartic term from the glm() 
fit, I get a dramatically narrower confidence band, even though the fit to the 
data is not nearly as good (I'm keeping the fourth power for good reasons).  
This again seems to me to indicate that the confidence band with the quartic 
term included is artificially, and incorrectly, wide.  If a fit with reasonably 
narrow confidence limits can be produced for the model with terms only up to 
cubic (or, taking this reductio even further, for a quadratic or a linear 
model, also), why does adding the quadratic term, which improves the quality of 
the fit to the data, cause the confidence limits to nevertheless balloon 
outwards?  That seems fundamentally illogical to me, but maybe my instincts are 
bad.

  So what's going on here?  Is this just a limitation of the way confint() 
computes confidence intervals?  Is there a better function to use, that is 
compatible with binomial glm() fits?  Or am I misinterpreting the meaning of 
the confidence limits in some way?  Or what?

  Thanks for any advice.  I've had trouble finding examples of polynomial fits 
like this; people sometimes fit the square of the explanatory variable, of 
course, but going up to fourth powers seems to be quite uncommon, so I've had 
trouble finding out how others have dealt with these sorts of issues that crop 
up only with higher powers.

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confidence intervals and polynomial fits

Hi Ben,

From what you have written, I am not exactly sure what your
seat-of-the-pant sense is coming from.  My pantseat typically does not
tell me much; however, quartic trends tend to less stable than linear,
so I am not terribly surprised.

As two side notes:

x_qt - x^4 # shorter code-wise
and you can certain just enter a quartic term if the data is just
quartic and you are not really itnerested in the lower trends.

Cheers,

Josh

On Fri, May 6, 2011 at 8:35 AM, Ben Haller rh...@sticksoftware.com wrote:
  Hi all!  I'm getting a model fit from glm() (a binary logistic regression 
 fit, but I don't think that's important) for a formula that contains powers 
 of the explanatory variable up to fourth.  So the fit looks something like 
 this (typing into mail; the actual fit code is complicated because it 
 involves step-down and so forth):

 x_sq - x * x
 x_cb - x * x * x
 x_qt - x * x * x * x
 model - glm(outcome ~ x + x_sq + x_cb + x_qt, binomial)

  This works great, and I get a nice fit.  My question regards the confidence 
 intervals on that fit.  I am calling:

 cis - confint.default(model, level=0.95)

 to get the confidence intervals for each coefficient.  This confint.default() 
 call gives me a table like this (where dispersal is the real name of what 
 I'm calling x above):

                             2.5 %      97.5 %
 (Intercept)              8.7214297   8.9842533
 dispersal              -37.5621412 -35.6629981
 dispersal_sq            66.8077669  74.4490123
 dispersal_cb           -74.5963861 -62.8347057
 dispersal_qt            22.6590159  28.6506186

  A side note: calling confint() instead of confint.default() is much, much 
 slower -- my model has many other terms I'm not discussing here, and is fit 
 to 300,000 data points -- and a check indicates that the intervals returned 
 for my model by confint() are not virtually identical, so this is not the 
 source of my problem:

                             2.5 %      97.5 %
 (Intercept)              8.7216693   8.9844958
 dispersal              -37.5643922 -35.6652276
 dispersal_sq            66.8121519  74.4534753
 dispersal_cb           -74.5995820 -62.8377766
 dispersal_qt            22.6592724  28.6509494

  These tables show the problem: the 95% confidence limits for the quartic 
 term are every bit as wide as the limits for the other terms.  Since the 
 quartic term coefficient gets multiplied by the fourth power of x, this means 
 that the width of the confidence band starts out nice and narrow (when x is 
 close to zero, where the width of the confidence band is pretty much just 
 determined by the confidence limits on the intercept) but balloons out to be 
 tremendously wide for larger values of x.  The width of it looks quite 
 unreasonable to me, given that my fit is constrained by 300,000 data points.

  Intuitively, I would have expected the confidence limits for the quartic 
 term to be much narrower than for, say, the linear term, because the effect 
 of variation in the quartic term is so much larger.  But the way confidence 
 intervals are calculated in R, at least, does not seem to follow my instincts 
 here.  In other words, predicted values using the 2.5% value of the linear 
 coefficient and the 97.5% value of the linear coefficient are really not very 
 different, but predicted values using the 2.5% value of the quadratic 
 coefficient and the 97.5% value of the quadratic coefficient are enormously, 
 wildly divergent -- far beyond the range of variability in the original data, 
 in fact.  I feel quite confident, in a seat-of-the-pants sense, that the 
 actual 95% limits of that quartic coefficient are much, much narrower than 
 what R is giving me.

  Looking at it another way: if I just exclude the quartic term from the glm() 
 fit, I get a dramatically narrower confidence band, even though the fit to 
 the data is not nearly as good (I'm keeping the fourth power for good 
 reasons).  This again seems to me to indicate that the confidence band with 
 the quartic term included is artificially, and incorrectly, wide.  If a fit 
 with reasonably narrow confidence limits can be produced for the model with 
 terms only up to cubic (or, taking this reductio even further, for a 
 quadratic or a linear model, also), why does adding the quadratic term, which 
 improves the quality of the fit to the data, cause the confidence limits to 
 nevertheless balloon outwards?  That seems fundamentally illogical to me, but 
 maybe my instincts are bad.

  So what's going on here?  Is this just a limitation of the way confint() 
 computes confidence intervals?  Is there a better function to use, that is 
 compatible with binomial glm() fits?  Or am I misinterpreting the meaning of 
 the confidence limits in some way?  Or what?

  Thanks for any advice.  I've had trouble finding examples of polynomial fits 
 like this; people sometimes fit the square of the explanatory variable, of 
 course, but going up to fourth powers seems to be

Re: [R] reading a column as a character vector

The strsplit function is probably the closest R function to perls split 
function.  For more detailed control the gsubfn package can be useful.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Gamliel Beyderman
Sent: Thursday, May 05, 2011 3:35 PM
To: r-help@r-project.org
Subject: [R] reading a column as a character vector

Hi!

I have 2 columns (even though the data looks like there's more columns than
just two) of data in the following format:

0,58905313R0EOL 229742002R0EOL 58905312R0EOL
1,58905317R0DBL 58905303R0DBL 58905313R0IL 58905313R0VH
58905313R0EOL 223354003R0IL 223354003R0VH 58905308R0DBL
58905308R0VM 58905301R0DBL 229742002R0IL 229742002R0VH
229742002R0EOL

I can change the format of the input (remove quotes, add spaces, only put
quotes around the entire list of codes...)

The first column is numeric, the second column is a character vector of
event codes. Ultimately, I want to to transform the second column into a
factor where each event code (such as 58905313R0EOL or 216918000R0DBL) is a
separate level.

while the following statement works:

reduce2-read.table(reduce2.csv, sep=,,
colClasses=c(integer,factor))

it does not know to break the event vectors into separate levels, the factor
it creates is wrong.

Perhaps there's something in R similar to split in Perl...

Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] split character vector by multiple keywords simultaneously

Will all the keywords always be present in the same order?  Or are you looking 
for the keywords, but some may be absent or in different orders?

Look into the gsubfn package for some tools that could help.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of sunny
Sent: Wednesday, May 04, 2011 5:09 PM
To: r-help@r-project.org
Subject: [R] split character vector by multiple keywords simultaneously

Hi. I have a character vector that looks like this:

 temp - c(Company name: The first company General Manager: John Doe I
 Managers: John Doe II, John Doe III,Company name: The second company
 General Manager: Jane Doe I,Company name: The third company Managers:
 Jane Doe II, Jane Doe III)
 temp
[1] Company name: The first company General Manager: John Doe I Managers:
John Doe II, John Doe III
[2] Company name: The second company General Manager: Jane Doe I  

[3] Company name: The third company Managers: Jane Doe II, Jane Doe III 

I know all the keywords, i.e. Company name:, General Manager:,
Managers: etc. I'm looking for a way to split this character vector into
multiple character vectors, with one column for each keyword and the
corresponding values for each, i.e.

Company name  General Manager  Managers
1  The first companyJohn Doe IJohn Doe II, John
Doe III
2 The second companyJane Doe I  
3  The third company  Jane Doe II,
Jane Doe III

I have tried a lot to find something suitable but haven't so far. Any help
will be greatly appreciated. I am running R-2.12.1 on x86_64 linux.

Thanks.

--
View this message in context: 
http://r.789695.n4.nabble.com/split-character-vector-by-multiple-keywords-simultaneously-tp3497033p3497033.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confidence intervals and polynomial fits



On May 6, 2011, at 11:35 AM, Ben Haller wrote:

 Hi all!  I'm getting a model fit from glm() (a binary logistic  
regression fit, but I don't think that's important) for a formula  
that contains powers of the explanatory variable up to fourth.  So  
the fit looks something like this (typing into mail; the actual fit  
code is complicated because it involves step-down and so forth):


x_sq - x * x
x_cb - x * x * x
x_qt - x * x * x * x
model - glm(outcome ~ x + x_sq + x_cb + x_qt, binomial)


It might have been easier with:

 model - glm(outcome ~ poly(x, 4) , binomial)

Since the authors of confint might have been expecting a poly()  
formulation the results might be more reliable. I'm just using lm()  
but I think the conclusions are more general:


 set.seed(123)
 x -seq(0, 4, by=0.1)
 y -seq(0, 4, by=0.1)^2 +rnorm(41)
 x2 - x^2
 x3 - x^3
 summary(lm(y~x+x2+x3 ) )

Call:
lm(formula = y ~ x + x2 + x3)

Residuals:
Min  1Q  Median  3Q Max
-1.8528 -0.6146 -0.1164  0.6211  1.8923

Coefficients:
Estimate Std. Error t value Pr(|t|)
(Intercept)  0.457370.52499   0.871   0.3893
x   -0.759891.15080  -0.660   0.5131
x2   1.309870.67330   1.945   0.0594 .
x3  -0.035590.11058  -0.322   0.7494
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9184 on 37 degrees of freedom
Multiple R-squared: 0.9691, Adjusted R-squared: 0.9666
F-statistic: 386.8 on 3 and 37 DF,  p-value:  2.2e-16

I wouldn't have been able to understand why a highly significant  
relationship overall could not be ascribed to any of the terms. See  
what happens with poly(x,3)


 summary( lm(y~poly(x,3) ) )

Call:
lm(formula = y ~ poly(x, 3))

Residuals:
Min  1Q  Median  3Q Max
-1.8528 -0.6146 -0.1164  0.6211  1.8923

Coefficients:
Estimate Std. Error t value Pr(|t|)
(Intercept)   5.4271 0.1434  37.839   2e-16 ***
poly(x, 3)1  30.0235 0.9184  32.692   2e-16 ***
poly(x, 3)2   8.7823 0.9184   9.563 1.53e-11 ***
poly(x, 3)3  -0.2956 0.9184  -0.3220.749
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9184 on 37 degrees of freedom
Multiple R-squared: 0.9691, Adjusted R-squared: 0.9666
F-statistic: 386.8 on 3 and 37 DF,  p-value:  2.2e-16

 confint( lm(y~poly(x,3) ) )
2.5 %97.5 %
(Intercept)  5.136526  5.717749
poly(x, 3)1 28.162706 31.884351
poly(x, 3)2  6.921505 10.643149
poly(x, 3)3 -2.156431  1.565213
 confint(lm(y~x+x2+x3 ) )
  2.5 %97.5 %
(Intercept) -0.60636547 1.5211072
x   -3.09164639 1.5718576
x2  -0.05437802 2.6741120
x3  -0.25964705 0.1884609

Neither version exactly captures the simulation input, which would  
have shown only an effect of the quadratic, but I didn't do any  
centering. At least the poly version shows the lower order terms up to  
quadratic as significant, whereas it's difficult to extract much  
meaningful out of the separately calculated term version.


--

David.



 This works great, and I get a nice fit.  My question regards the  
confidence intervals on that fit.  I am calling:


cis - confint.default(model, level=0.95)

to get the confidence intervals for each coefficient.  This  
confint.default() call gives me a table like this (where dispersal  
is the real name of what I'm calling x above):


2.5 %  97.5 %
(Intercept)  8.7214297   8.9842533
dispersal  -37.5621412 -35.6629981
dispersal_sq66.8077669  74.4490123
dispersal_cb   -74.5963861 -62.8347057
dispersal_qt22.6590159  28.6506186

 A side note: calling confint() instead of confint.default() is  
much, much slower -- my model has many other terms I'm not  
discussing here, and is fit to 300,000 data points -- and a check  
indicates that the intervals returned for my model by confint() are  
not virtually identical, so this is not the source of my problem:


2.5 %  97.5 %
(Intercept)  8.7216693   8.9844958
dispersal  -37.5643922 -35.6652276
dispersal_sq66.8121519  74.4534753
dispersal_cb   -74.5995820 -62.8377766
dispersal_qt22.6592724  28.6509494

 These tables show the problem: the 95% confidence limits for the  
quartic term are every bit as wide as the limits for the other  
terms.  Since the quartic term coefficient gets multiplied by the  
fourth power of x, this means that the width of the confidence band  
starts out nice and narrow (when x is close to zero, where the width  
of the confidence band is pretty much just determined by the  
confidence limits on the intercept) but balloons out to be  
tremendously wide for larger values of x.  The width of it looks  
quite unreasonable to me, given that my fit is constrained by  
300,000 data points.


 Intuitively, I would have expected the confidence limits for

[R] How to alter circle size

2011-05-06 Thread Dat Mai

Hello all,

I'm trying  to create a heatmap using 2 matrices I have: z and v. Both
matrices represent different correlations for the same independent
variables. The problem I have is that I wish to have the values from matrix
z to be represented by color intensity while having the values from matrix v
to be represented by circle size. I currently have the following in front of
me and an unsure of what to add or change in order to achieve that goal.

panel.corrgram.2 =
function(x, y, z, subscripts, at = pretty(z), scale = 0.8, ...)
{
  require(grid, quietly = TRUE)
  x - as.numeric(x)[subscripts]
  y - as.numeric(y)[subscripts]
  z - as.numeric(z)[subscripts]
  zcol - level.colors(z, at = at, ...)
  for (i in seq(along = z))
  {
lims - range(0, z[i])
tval - 2 * base::pi * seq(from = lims[1], to = lims[2], by = 0.01)
grid.polygon(x = x[i] + .5 * scale * c(0, sin(tval)),
  y = y[i] + .5 * scale * c(0, cos(tval)),
  default.units = native,
  gp = gpar(fill = zcol[i]))
grid.circle(x = x[i], y = y[i], r = .5 * scale,
  default.units = native)
  }
}

k=levelplot(signif(z,3), scales = list(x = list(rot = 45)),
col.regions=col.sch, panel = panel.corrgram.2,
label = num.circ, xlab=, ylab=, main=paste(output,receptor response))
print(k)

-- 
Best,
Dat Mai
PhD Rotation Student
Albert Einstein College of Medicine

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confidence intervals and polynomial fits

 From what you have written, I am not exactly sure what your
 seat-of-the-pant sense is coming from.  My pantseat typically does not
 tell me much; however, quartic trends tend to less stable than linear,
 so I am not terribly surprised.

  My pantseat is not normally very informative either, but if you saw the width 
of the confidence limits I'm getting for the quartic coefficient, I think your 
pantseat would agree with mine.  :-  The confidence band is staggeringly wide, 
many times the variation in the data itself; and with 300,000 data points to 
fit to, that just should not be the case.  With that many data points, it seems 
quite obvious that one can be 95% confident that the true data lies within a 
band somewhere reasonably close to the band that the sample data actually fall 
into, not a band many times wider.

 As two side notes:
 
 x_qt - x^4 # shorter code-wise
 and you can certain just enter a quartic term if the data is just
 quartic and you are not really itnerested in the lower trends.

  Yep, for sure.

  Thanks!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Creating binary variable depending on strings of two dataframes

2011-05-06 Thread Pete Pete


Gabor Grothendieck wrote:
 
 On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete lt;noxyp...@gmail.comgt;
 wrote:

 Hi,
 consider the following two dataframes:
 x1=c(232,3454,3455,342,13)
 x2=c(1,1,1,0,0)
 data1=data.frame(x1,x2)

 y1=c(232,232,3454,3454,3455,342,13,13,13,13)
 y2=c(E1,F3,F5,E1,E2,H4,F8,G3,E1,H2)
 data2=data.frame(y1,y2)

 I need a new column in dataframe data1 (x3), which is either 0 or 1
 depending if the value E1 in y2 of data2 is true while x1=y1. The
 result
 of data1 should look like this:
   x1     x2 x3
 1 232   1   1
 2 3454 1   1
 3 3455 1   0
 4 342   0   0
 5 13     0   1

 I think a SQL command could help me but I am too inexperienced with it to
 get there.

 
 Try this:
 
 library(sqldf)
 sqldf(select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join data2 d2
 on (x1 = y1) group by x1, x2 order by d1.rowid)
 x1 x2 x3
 1  232  1  1
 2 3454  1  1
 3 3455  1  0
 4  342  0  0
 5   13  0  1
 
 
 -- 
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


That works pretty cool but I need to automate this a bit more. Consider the
following example:

list1=c(A01,B04,A64,G84,F19)

x1=c(232,3454,3455,342,13)
x2=c(1,1,1,0,0)
data1=data.frame(x1,x2)

y1=c(232,232,3454,3454,3455,342,13,13,13,13)
y2=c(E13,B04,F19,A64,E22,H44,F68,G84,F19,A01)
data2=data.frame(y1,y2)

I want now to creat a loop, which creates for every value in list1 a new
binary variable in data1. Result should look like:
x1  x2  A01 B04 A64 G84 F19
232 1   0   1   0   0   0
34541   0   0   1   0   1
34551   0   0   0   0   0
342 0   0   0   0   0   0
13  0   1   0   0   1   1

Thanks!


--
View this message in context: 
http://r.789695.n4.nabble.com/Creating-binary-variable-depending-on-strings-of-two-dataframes-tp3076724p3503334.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] understanding error messages in archmCopulaFit in fCopulae

2011-05-06 Thread Lee, Eric

Hello,

I'm running version R x64 v2.12.2 on a 64bit windows 7 PC.  I have two data 
vectors, x and y, and try to run archmCopulaFit.  Most of the copulas produce 
errors.  Can you tell me what the errors mean and if possible, how I can set 
archmCopulaFit options to make them run?  I see in the documentation that there 
are arguments passed to the optimization function in use,  'nlminb', however, 
it's not clear to me how to set them.

Thanks.

Copulas 2, 4,7,8,11,12,14,15,18,19,20,21,22 have the following error:

fit1 = archmCopulaFit(x,y,type=2)
Error in if (alpha  range[1] | alpha  range[2]) { :
  missing value where TRUE/FALSE needed

Copulas 3 has the following error:

fit1 = archmCopulaFit(x,y,type=3)
Error in if (alpha  range[1] | alpha  range[2]) { :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In log(darchmCopula(u = U, v = V, alpha = x, type = type)) :
  NaNs produced
2: In log(darchmCopula(u = U, v = V, alpha = x, type = type)) :
  NaNs produced

Copula 10:

fit1 = archmCopulaFit(x,y,type=10)
Error in if (alpha  range[1] | alpha  range[2]) { :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In log(darchmCopula(u = U, v = V, alpha = x, type = type)) : NaNs produced

Copulas 1,5,9,13,16,17 produce estimates.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Cumsum in Lattice Panel Function

2011-05-06 Thread Elliot Joel Bernstein

I'm trying to create an xyplot with a groups argument where the y-variable
is the cumsum of the values stored in the input data frame. I almost have
it, but I can't get it to automatically adjust the y-axis scale. How do I
get the y-axis to automatically scale as it would have if the cumsum values
had been stored in the data frame?

Here is the code I have so far:

require(lattice)



dates - seq(as.Date(2011-01-01), as.Date(2011-04-30), days)
g - 1:3


dat - data.frame(date = rep(dates, length(g)),
  group = rep(g, each = length(dates)),
  value = rnorm(length(dates)*length(g)) + 0.05)


xyplot(value ~ date, data = dat, group = group, type = 'l', grid = TRUE,
   panel = panel.superpose,
   panel.groups = function(x, y, ...) { panel.xyplot(x, cumsum(y), ...)
})


I want the result to look the same as if I had done

dat$cumvalue - with(dat, unsplit(lapply(split(value, group), cumsum),
group))
xyplot(cumvalue ~ date, data = dat, group = group, type = 'l', grid = TRUE)

Thanks for your help.

- Elliot


Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
134 Mount Auburn Street | Cambridge, MA | 02138
Phone: (617) 503-4619 | Email: elliot.bernst...@fdopartners.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Using $ accessor in GAM formula

2011-05-06 Thread Gene Leynes

Hmmm

After reading that email four times, I think I see what you mean.

Checking for variables within particular scopes is probably one of the most
challenging things in R, and I would guess in other languages too.  In R
it's compounded by situations when you're writing a function to accept
variables as either a stand alone global variable, or as an element of a
data.frame or list.

I think this is a new problem, and I'll switch to the lengthier
data.frame[,'x'] syntax in gam for now.

By the way, about the $ accessors.  I can see why some people don't like
them, but they are a part of the language.  And, I think they're a good
part.  They make the code much more readable, and I have yet to make a
mistake using them (which makes me think that they can be used
responsibly).  Making code harder to read is its own source of error, and is
not something that I take lightly!

Thanks for the replies.  And, thank you Rolf for the detailed analysis.  Do
you think that your or I should submit a bug report to the package
maintainer?  I'm not sure how that works.  Very few of my challenges turn
out to be actual bugs, but I think this one is.

On Fri, May 6, 2011 at 4:53 AM, Berwin A Turlach
berwin.turl...@gmail.comwrote:

 G'day Rolf,

 On Fri, 06 May 2011 09:58:50 +1200
 Rolf Turner rolf.tur...@xtra.co.nz wrote:

  but it's strange that the dodgey code throws an error with gam(dat1$y
  ~ s(dat1$x))  but not with gam(dat2$cf ~ s(dat2$s))

  Something a bit subtle is going on; it would be nice to be able to
  understand it.

 Well,

 R traceback()
 3: eval(expr, envir, enclos)
 2: eval(inp, data, parent.frame())
 1: gam(dat$y ~ s(dat$x))

 So the lines leading up to the problem seem to be the following from
 the gam() function:

vars - all.vars(gp$fake.formula[-2])
inp - parse(text = paste(list(, paste(vars, collapse = ,),
)))
if (!is.list(data)  !is.data.frame(data))
data - as.data.frame(data)



 Setting

 R options(error=recover)

 running the code until the error occurs, and then examining the frame
 number for the gam() call shows that inp is
 expression(list( dat1,x )) in your first example and
 expression(list( dat2,s )) in your second example.  In both
 examples, data is list() (not unsurprisingly).  When,

dl - eval(inp, data, parent.frame())

 is executed, it tries to eval inp, in both cases dat1 and dat2
 are found, obviously, in the parent frame.  In your first example x is
 (typically) not found and an error is thrown, in your second example an
 object with name s is found in package:mgcv and the call to eval
 succeeds.  dl becomes a list with two components, the first being,
 respectively, dat1 or dat2, and the second the body of the function
 s.  (To verify that, you should probably issue the command
 debug(gam) and step through those first few lines of the function
 until you reach the above command.)

 The corollary is that you can use the name of any object that R will
 find in the parent frame, if it is another data set, then that data
 set will become the second component of inp.  E.g.:

 R dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
 R gam(dat$cf ~ s(dat$min))

 Family: gaussian
 Link function: identity

 Formula:
 dat$cf ~ s(dat$min)

 Estimated degrees of freedom:
 3.8925  total = 4.892488

 GCV score: 0.002704789

 Or

 R dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
 R gam(dat$cf ~ s(dat$BOD))

 Family: gaussian
 Link function: identity

 Formula:
 dat$cf ~ s(dat$BOD)

 Estimated degrees of freedom:
 3.9393  total = 4.939297

 GCV score: 0.002666985

  Just out of pure academic interest. :-)

 Hope your academic curiosity is now satisfied. :)

 HTH.

 Cheers,

Berwin

 == Full address 
 A/Prof Berwin A Turlach   Tel.: +61 (8) 6488 3338 (secr)
 School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
 The University of Western Australia   FAX : +61 (8) 6488 1028
 35 Stirling Highway
 Crawley WA 6009e-mail: berwin.turl...@gmail.com
 Australiahttp://www.maths.uwa.edu.au/~berwin


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confidence intervals and polynomial fits

On May 6, 2011, at 12:31 PM, David Winsemius wrote:

 On May 6, 2011, at 11:35 AM, Ben Haller wrote:
 
 Hi all!  I'm getting a model fit from glm() (a binary logistic regression 
 fit, but I don't think that's important) for a formula that contains powers 
 of the explanatory variable up to fourth.  So the fit looks something like 
 this (typing into mail; the actual fit code is complicated because it 
 involves step-down and so forth):
 
 x_sq - x * x
 x_cb - x * x * x
 x_qt - x * x * x * x
 model - glm(outcome ~ x + x_sq + x_cb + x_qt, binomial)
 
 It might have been easier with:
 
 model - glm(outcome ~ poly(x, 4) , binomial)

  Interesting.  The actual model I'm fitting has lots more terms, and needs to 
be able to be stepped down; sometimes some of the terms of the polynomials will 
get dropped while others get retained, for example.  But more to the point, 
poly() seems to be doing something tricky involving orthogonal polynomials 
that I don't understand.  I don't think I want whatever that is; I want my 
original variable x, plus its powers.  For example, if I do:

x  runif(10)
poly(x, 3)

the columns I get are not x, x^2, x^3; they are something else.  So the poly() 
fit is not equivalent to my fit.

 Since the authors of confint might have been expecting a poly() formulation 
 the results might be more reliable. I'm just using lm() but I think the 
 conclusions are more general:
 
 ...
 
 Coefficients:
Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.457370.52499   0.871   0.3893
 x   -0.759891.15080  -0.660   0.5131
 x2   1.309870.67330   1.945   0.0594 .
 x3  -0.035590.11058  -0.322   0.7494
 
 ...
 
 Coefficients:
Estimate Std. Error t value Pr(|t|)
 (Intercept)   5.4271 0.1434  37.839   2e-16 ***
 poly(x, 3)1  30.0235 0.9184  32.692   2e-16 ***
 poly(x, 3)2   8.7823 0.9184   9.563 1.53e-11 ***
 poly(x, 3)3  -0.2956 0.9184  -0.3220.749

  Here, in your illustration, is underscored what I mean.  Whatever these 
orthogonal polynomial terms are that you're using, they are clearly not the 
original x, x^2 and x^3, and they're giving you a different fit than those do.  
I probably ought to learn about this technique, since it looks interesting; but 
for my purposes I need the fit to actually be in terms of x, since x is my 
explanatory variable.  And the fit I'm getting is highly significant (all terms 
 2e-16), so the lack of fit problem you're illustrating does not seem to apply 
to my case.

  Or maybe I'm totally misunderstanding your point...?  :-

  Thanks!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Using $ accessor in GAM formula

2011-05-06 Thread Gavin Simpson

On Fri, 2011-05-06 at 11:20 -0500, Gene Leynes wrote:
 Hmmm
 
 After reading that email four times, I think I see what you mean.
 
 Checking for variables within particular scopes is probably one of the most
 challenging things in R, and I would guess in other languages too.  In R
 it's compounded by situations when you're writing a function to accept
 variables as either a stand alone global variable, or as an element of a
 data.frame or list.

Dear Gene,

 I think this is a new problem, and I'll switch to the lengthier
 data.frame[,'x'] syntax in gam for now.

No, No, No, No, No!

If there is one thing you **should** take from this thread is that there
is no need to perform subsetting like that in a model formula.

Why would you want (or prefer) to do:

gam(dat$y ~ s(dat$x))

or

gam(dat[, y] ~ s(dat[, x]))

when

gam(y ~ s(x), dat)

will suffice?

 By the way, about the $ accessors.  I can see why some people don't like
 them, but they are a part of the language.  And, I think they're a good
 part.  They make the code much more readable, and I have yet to make a
 mistake using them (which makes me think that they can be used
 responsibly).  Making code harder to read is its own source of error, and is
 not something that I take lightly!

If you use R's formula notation properly, you'll get cleaner code than
either of your suggestions.

 Thanks for the replies.  And, thank you Rolf for the detailed analysis.  Do
 you think that your or I should submit a bug report to the package
 maintainer?  I'm not sure how that works.  Very few of my challenges turn
 out to be actual bugs, but I think this one is.

I would consider this a bug - but in the sense that Simon didn't foresee
the strange formulas that users of his software might concoct.

By the way, I think you perhaps meant Berwin (re the detailed analysis)?

HTH

G

 On Fri, May 6, 2011 at 4:53 AM, Berwin A Turlach
 berwin.turl...@gmail.comwrote:
 
  G'day Rolf,
 
  On Fri, 06 May 2011 09:58:50 +1200
  Rolf Turner rolf.tur...@xtra.co.nz wrote:
 
   but it's strange that the dodgey code throws an error with gam(dat1$y
   ~ s(dat1$x))  but not with gam(dat2$cf ~ s(dat2$s))
 
   Something a bit subtle is going on; it would be nice to be able to
   understand it.
 
  Well,
 
  R traceback()
  3: eval(expr, envir, enclos)
  2: eval(inp, data, parent.frame())
  1: gam(dat$y ~ s(dat$x))
 
  So the lines leading up to the problem seem to be the following from
  the gam() function:
 
 vars - all.vars(gp$fake.formula[-2])
 inp - parse(text = paste(list(, paste(vars, collapse = ,),
 )))
 if (!is.list(data)  !is.data.frame(data))
 data - as.data.frame(data)
 
 
 
  Setting
 
  R options(error=recover)
 
  running the code until the error occurs, and then examining the frame
  number for the gam() call shows that inp is
  expression(list( dat1,x )) in your first example and
  expression(list( dat2,s )) in your second example.  In both
  examples, data is list() (not unsurprisingly).  When,
 
 dl - eval(inp, data, parent.frame())
 
  is executed, it tries to eval inp, in both cases dat1 and dat2
  are found, obviously, in the parent frame.  In your first example x is
  (typically) not found and an error is thrown, in your second example an
  object with name s is found in package:mgcv and the call to eval
  succeeds.  dl becomes a list with two components, the first being,
  respectively, dat1 or dat2, and the second the body of the function
  s.  (To verify that, you should probably issue the command
  debug(gam) and step through those first few lines of the function
  until you reach the above command.)
 
  The corollary is that you can use the name of any object that R will
  find in the parent frame, if it is another data set, then that data
  set will become the second component of inp.  E.g.:
 
  R dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
  R gam(dat$cf ~ s(dat$min))
 
  Family: gaussian
  Link function: identity
 
  Formula:
  dat$cf ~ s(dat$min)
 
  Estimated degrees of freedom:
  3.8925  total = 4.892488
 
  GCV score: 0.002704789
 
  Or
 
  R dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
  R gam(dat$cf ~ s(dat$BOD))
 
  Family: gaussian
  Link function: identity
 
  Formula:
  dat$cf ~ s(dat$BOD)
 
  Estimated degrees of freedom:
  3.9393  total = 4.939297
 
  GCV score: 0.002666985
 
   Just out of pure academic interest. :-)
 
  Hope your academic curiosity is now satisfied. :)
 
  HTH.
 
  Cheers,
 
 Berwin
 
  == Full address 
  A/Prof Berwin A Turlach   Tel.: +61 (8) 6488 3338 (secr)
  School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
  The University of Western Australia   FAX : +61 (8) 6488 1028
  35 Stirling Highway
  Crawley WA 6009e-mail: berwin.turl...@gmail.com
  Australiahttp://www.maths.uwa.edu.au/~berwin

Re: [R] Averaging uneven measurements by time with uneven numbers of measurements

Here is an example of what I would like to do:
meas = measurements 
times = time of measurement
measf = measurements in final, reduced matrix
timesf = time of measurement in final matrix

meas-runif(30)
times-sort(runif(30))
inputmat-cbind(times,meas)
names(inputmat)-c(timef,measf)

I would then like to break the times up into 0.2 increments so the final matrix 
would look like this:
timef measf
0.2  mean of meas where (time =0) and (time0.2)
0.4
.
.
.
1.0

Instead of measf being the mean, it could be the last measurement taken.


-Original Message-
From: ehl...@ucalgary.ca [mailto:ehl...@ucalgary.ca] 
Sent: Thursday, May 05, 2011 03:58 PM
To: Thompson, Adele - adele_thomp...@cargill.com
Cc: r-help@r-project.org
Subject: Re: [R] Averaging uneven measurements by time with uneven numbers of 
measurements

On 2011-05-05 14:20, Schatzi wrote:
 I do not want smoothing as the data should have jumps (it is weight left in
 feeding bunker). I was thinking of maybe using a histogram-like function and
 then averaging that. Not sure if this is possible.

(It would be useful to include your original request - not everyone
uses Nabble.)

Actually, averaging *is* smoothing, but I suppose your intent is, for
some reason, not to smooth across 30-minute boundaries.

Perhaps you could use findInterval() to identify which measurements to
average.

Peter Ehlers


 -
 In theory, practice and theory are the same. In practice, they are not - 
 Albert Einstein
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Averaging-uneven-measurements-by-time-with-uneven-numbers-of-measurements-tp3499337p3499386.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confidence intervals and polynomial fits

2011-05-06 Thread Bert Gunter

FWIW:

Fitting higher order polynomials (say  2) is almost always a bad idea.

See e.g.  the Hastie, Tibshirani, et. al book on Statistical
Learning for a detailed explanation why. The Wikipedia entry on
smoothing splines also contains a brief explanation, I believe.

Your ~0 P values for the coefficients also suggest problems/confusion
(!) -- perhaps you need to consider something along the lines of
functional data analysis  for your analysis.

Having no knowledge of your issues, these remarks are entirely
speculative and may well be wrong. So feel free to dismiss.
Nevertheless, you may find it useful to consult your local
statistician for help.

Cheers,
Bert

P.S. The raw = TRUE option of poly will fit raw, not orthogonal,
polynomials. The fitted projections will be the same up to numerical
error whichever basis is chosen, of course.

On Fri, May 6, 2011 at 10:08 AM, Ben Haller rh...@sticksoftware.com wrote:
 On May 6, 2011, at 12:31 PM, David Winsemius wrote:

 On May 6, 2011, at 11:35 AM, Ben Haller wrote:

 Hi all!  I'm getting a model fit from glm() (a binary logistic regression 
 fit, but I don't think that's important) for a formula that contains powers 
 of the explanatory variable up to fourth.  So the fit looks something like 
 this (typing into mail; the actual fit code is complicated because it 
 involves step-down and so forth):

 x_sq - x * x
 x_cb - x * x * x
 x_qt - x * x * x * x
 model - glm(outcome ~ x + x_sq + x_cb + x_qt, binomial)

 It might have been easier with:

 model - glm(outcome ~ poly(x, 4) , binomial)

  Interesting.  The actual model I'm fitting has lots more terms, and needs to 
 be able to be stepped down; sometimes some of the terms of the polynomials 
 will get dropped while others get retained, for example.  But more to the 
 point, poly() seems to be doing something tricky involving orthogonal 
 polynomials that I don't understand.  I don't think I want whatever that is; 
 I want my original variable x, plus its powers.  For example, if I do:

 x  runif(10)
 poly(x, 3)

 the columns I get are not x, x^2, x^3; they are something else.  So the 
 poly() fit is not equivalent to my fit.

 Since the authors of confint might have been expecting a poly() formulation 
 the results might be more reliable. I'm just using lm() but I think the 
 conclusions are more general:

 ...

 Coefficients:
            Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.45737    0.52499   0.871   0.3893
 x           -0.75989    1.15080  -0.660   0.5131
 x2           1.30987    0.67330   1.945   0.0594 .
 x3          -0.03559    0.11058  -0.322   0.7494

 ...

 Coefficients:
            Estimate Std. Error t value Pr(|t|)
 (Intercept)   5.4271     0.1434  37.839   2e-16 ***
 poly(x, 3)1  30.0235     0.9184  32.692   2e-16 ***
 poly(x, 3)2   8.7823     0.9184   9.563 1.53e-11 ***
 poly(x, 3)3  -0.2956     0.9184  -0.322    0.749

  Here, in your illustration, is underscored what I mean.  Whatever these 
 orthogonal polynomial terms are that you're using, they are clearly not the 
 original x, x^2 and x^3, and they're giving you a different fit than those 
 do.  I probably ought to learn about this technique, since it looks 
 interesting; but for my purposes I need the fit to actually be in terms of x, 
 since x is my explanatory variable.  And the fit I'm getting is highly 
 significant (all terms  2e-16), so the lack of fit problem you're 
 illustrating does not seem to apply to my case.

  Or maybe I'm totally misunderstanding your point...?  :-

  Thanks!

 Ben Haller
 McGill University

 http://biology.mcgill.ca/grad/ben/

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Panels order in lattice graphs

2011-05-06 Thread Cristina Silva

Thanks to all that reply to my post. The best solution that answers 
entirely to my question and can be used as a general function and not 
case by case is the one sent by the package author.


Many thanks to everybody. It was helpful.

Cristina

On 05/05/2011 10:44, Deepayan Sarkar wrote:

On Wed, May 4, 2011 at 9:20 PM, Cristina Silvacsi...@ipimar.pt  wrote:

Hi all,

In lattice graphs, panels are drawn from left to right and bottom to top.
The flag as.table=TRUE changes to left to right and top to bottom. Is
there any way to change to first top to bottom and then left to right?
didn´t find anything neither in Help pages nor Lattice book.

See ?packet.panel.default. For example,


p- xyplot(mpg ~ disp | factor(carb), mtcars, as.table = TRUE)

print(p, packet.panel = packet.panel.default)

my.packet.panel-
 function(layout, condlevels, page, row, column, ...)
{
 tlayout- layout[c(2, 1, 3)] # switch row and column
 print(packet.panel.default(tlayout, condlevels, page = page,
row = column, column = row, ...))
}

print(p, packet.panel = my.packet.panel)


-Deepayan




--
--
Cristina Silva
INRB/L-IPIMAR
Unidade de Recursos Marinhos e Sustentabilidade
Av. de Brasília, 1449-006 Lisboa
Portugal
Tel.: 351 21 3027096
Fax: 351 21 3015948
csi...@ipimar.pt

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Creating binary variable depending on strings of two dataframes



On May 6, 2011, at 11:35 AM, Pete Pete wrote:



Gabor Grothendieck wrote:


On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete lt;noxyp...@gmail.comgt;
wrote:


Hi,
consider the following two dataframes:
x1=c(232,3454,3455,342,13)
x2=c(1,1,1,0,0)
data1=data.frame(x1,x2)

y1=c(232,232,3454,3454,3455,342,13,13,13,13)
y2=c(E1,F3,F5,E1,E2,H4,F8,G3,E1,H2)
data2=data.frame(y1,y2)

I need a new column in dataframe data1 (x3), which is either 0 or 1
depending if the value E1 in y2 of data2 is true while x1=y1. The
result
of data1 should look like this:
  x1 x2 x3
1 232   1   1
2 3454 1   1
3 3455 1   0
4 342   0   0
5 13 0   1

I think a SQL command could help me but I am too inexperienced  
with it to

get there.



Try this:


library(sqldf)
sqldf(select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join  
data2 d2

on (x1 = y1) group by x1, x2 order by d1.rowid)

   x1 x2 x3
1  232  1  1
2 3454  1  1
3 3455  1  0
4  342  0  0
5   13  0  1



snipped Gabor's sig


That works pretty cool but I need to automate this a bit more.  
Consider the

following example:

list1=c(A01,B04,A64,G84,F19)

x1=c(232,3454,3455,342,13)
x2=c(1,1,1,0,0)
data1=data.frame(x1,x2)

y1=c(232,232,3454,3454,3455,342,13,13,13,13)
y2=c(E13,B04,F19,A64,E22,H44,F68,G84,F19,A01)
data2=data.frame(y1,y2)

I want now to creat a loop, which creates for every value in list1 a  
new

binary variable in data1. Result should look like:
x1  x2  A01 B04 A64 G84 F19
232 1   0   1   0   0   0
34541   0   0   1   0   1
34551   0   0   0   0   0
342 0   0   0   0   0   0
13  0   1   0   0   1   1


Loops!?! We don't nee no steenking loops!

 xtb -  with(data2, table(y1,y2))
 cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )
   x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
232   232  1   0   0   1   1   0   0   0   0   0
3454 3454  1   0   1   0   0   0   1   0   0   0
3455 3455  1   0   0   0   0   1   0   0   0   0
342   342  0   0   0   0   0   0   0   0   0   1
13 13  0   1   0   0   0   0   1   1   1   0

I am guessing that you were to ... er, busy? ... to complete the table?

--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confidence intervals and polynomial fits

2011-05-06 Thread Prof Brian Ripley

On Fri, 6 May 2011, Bert Gunter wrote:

FWIW:

Fitting higher order polynomials (say 2) is almost always a bad idea.

See e.g. the Hastie, Tibshirani, et. al book on Statistical
Learning for a detailed explanation why. The Wikipedia entry on
smoothing splines also contains a brief explanation, I believe.

Your ~0 P values for the coefficients also suggest problems/confusion
(!) -- perhaps you need to consider something along the lines of
functional data analysis for your analysis.

Having no knowledge of your issues, these remarks are entirely
speculative and may well be wrong. So feel free to dismiss.
Nevertheless, you may find it useful to consult your local
statistician for help.

That is the main piece of advice I would have given. But if you must
DIY, consider the merits of orthogonal polynomials. Computing
individual confidence intervals for highly correlated coefficients is
very dubious practice. Without the example the posting guide asked
for, we can only guess if that is what is happening.

Cheers,
Bert

P.S. The raw = TRUE option of poly will fit raw, not orthogonal,
polynomials. The fitted projections will be the same up to numerical
error whichever basis is chosen, of course.

On Fri, May 6, 2011 at 10:08 AM, Ben Haller rh...@sticksoftware.com wrote:

On May 6, 2011, at 12:31 PM, David Winsemius wrote:

On May 6, 2011, at 11:35 AM, Ben Haller wrote:

Hi all! I'm getting a model fit from glm() (a binary logistic regression fit,
but I don't think that's important) for a formula that contains powers of the
explanatory variable up to fourth. So the fit looks something like this
(typing into mail; the actual fit code is complicated because it involves
step-down and so forth):

x_sq - x * x
x_cb - x * x * x
x_qt - x * x * x * x
model - glm(outcome ~ x + x_sq + x_cb + x_qt, binomial)

It might have been easier with:

model - glm(outcome ~ poly(x, 4) , binomial)

Interesting. The actual model I'm fitting has lots more terms, and needs to be able to
be stepped down; sometimes some of the terms of the polynomials will get dropped while
others get retained, for example. But more to the point, poly() seems to be doing
something tricky involving orthogonal polynomials that I don't understand. I
don't think I want whatever that is; I want my original variable x, plus its powers. For
example, if I do:

x runif(10)
poly(x, 3)

the columns I get are not x, x^2, x^3; they are something else. So the poly()
fit is not equivalent to my fit.

Since the authors of confint might have been expecting a poly() formulation the
results might be more reliable. I'm just using lm() but I think the conclusions
are more general:

...

Coefficients:
Estimate Std. Error t value Pr(|t|)
(Intercept) 0.45737 0.52499 0.871 0.3893
x -0.75989 1.15080 -0.660 0.5131
x2 1.30987 0.67330 1.945 0.0594 .
x3 -0.03559 0.11058 -0.322 0.7494

...

Coefficients:
Estimate Std. Error t value Pr(|t|)
(Intercept) 5.4271 0.1434 37.839 2e-16 ***
poly(x, 3)1 30.0235 0.9184 32.692 2e-16 ***
poly(x, 3)2 8.7823 0.9184 9.563 1.53e-11 ***
poly(x, 3)3 -0.2956 0.9184 -0.322 0.749

Here, in your illustration, is underscored what I mean. Whatever these
orthogonal polynomial terms are that you're using, they are clearly not the
original x, x^2 and x^3, and they're giving you a different fit than those do. I
probably ought to learn about this technique, since it looks interesting; but for
my purposes I need the fit to actually be in terms of x, since x is my explanatory
variable. And the fit I'm getting is highly significant (all terms 2e-16),
so the lack of fit problem you're illustrating does not seem to apply to my case.

Or maybe I'm totally misunderstanding your point...? :-

Thanks!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/

[R] histograms and for loops

2011-05-06 Thread wwreith

The following code works mostly. It runs fine but... 

1. Is there a way to increment the xlab for each graph? I would like to have
Graph 1, Graph 2, etc. Right now it just gives me Graph i over and over
again. 

2. Is there a way to get the x-axis and y-axis to be bold or at least a
darker color? Right now it is light gray.

3. Is there a way to modify this code to automatically save them? I assume
the answer is do it manually. This is not the most important.

for(i in 1:12){
hist(zNort1[,i], freq=FALSE, xlab=Graph i, col=blue, main=Standardized
Residuals Histogram, ylim=c(0,1), xlim=c(-3.0,3.0))
zNortmin-min(zNort1[,1])
zNortmax-max(zNort1[,1])
zNortmean-mean(zNort1[,1])
zNortsd-sd(zNort1[,1])
X1-seq(-3.0, 3.0, by=.01)
lines(X1, dnorm(X1, zNortmean, zNortsd) , col=black)
}


--
View this message in context: 
http://r.789695.n4.nabble.com/histograms-and-for-loops-tp3503648p3503648.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] read a netcdf file _Fill_value=-32768

2011-05-06 Thread francoise orain


Hello
I am a new user of R .
and I ve problem with R and netcdf .
I succed installation , I could use all examples .
But when I take my netcf it is different .

I want to do statistic on this kind of file .

1)
first calculate mean .
my data is like that
through ncdump -h test.nc

netcdf test {
dimensions:
   lat = 301 ;
   lon = 401 ;
   time = UNLIMITED ; // (80 currently)
variables:
   float lat(lat) ;
   lat:long_name = latitude ;
   lat:standard_name = latitude ;
   lat:units = degrees_north ;
   lat:valid_range = -60., 60. ;
   float lon(lon) ;
   lon:long_name = longitude ;
   lon:standard_name = longitude ;
   lon:units = degrees_east ;
   lon:valid_range = -100., 45. ;
   double sst(time, lat, lon) ;
   sst:long_name = sea surface temperature ;
   sst:standard_name = sea_surface_temperature ;
   sst:units = K ;
   sst:_FillValue = -32768. ;
   double time(time) ;
   time:long_name = time ;
   time:standard_name = time ;
   time:units = seconds since 1981-01-01 00:00:00 ;
   time:ancillary_variables = sst_data_processed_flag ;

I succeed to read it with R, there is only one variable sst   80 days 
and lat = 301 lon = 401 (this file is already a concatenation with 
nco fonctions .


Here is what I do :

nc-open.ncdf(test.nc)
summary (nc)
data- get.var.ncdf(nc)
print (data)

answer extract :

[331,] -32768.00 -32768.00 -32768.00 -32768.00 -32768.00 -32768.00 -32768.00
[332,] -32768.00 -32768.00 -32768.00 -32768.00 -32768.00 -32768.00 -32768.00
 [,295][,296][,297][,298][,299][,300][,301]
 [1,]  0.10  0.10  0.12  0.15  0.15  0.14  0.15
 [2,]  0.12  0.12  0.13  0.15  0.15  0.14  0.15
 [3,]  0.13  0.13  0.13  0.16  0.14  0.14  0.14
 [4,]  0.15  0.13  0.11  0.16  0.15  0.15  0.15
 [5,]  0.17  0.16  0.15  0.15  0.16  0.17  0.15

#_fill_value is at  -32768.00

[ reached getOption(max.print) -- omitted 69 row(s) and 79 matrix 
slice(s) ] OK too long

 mean(data, na.rm=TRUE)
[1] -20020.35
 summary(data)
Min.   1st Qu.Median  Mean   3rd Qu.  Max.
-32770.00 -32770.00 -32770.00 -20020.00  0.42  6.59
 q()


It is clear that the missing value_FillValue = -32768.  here , is used 
to calculate the mean , the Median etc .

But I don't  succed to avoid NA value in the sum ???


I've seen set.missval.ncdf but I don't succeed anymore to make R 
understand that -32768  is a  missing_value.


What  are my mistakes , has someone an example or a solution ?

2)
Then after I want to calculate correlation between this one and another 
one , so my time dimension is 80  (80 days) but my time variable is 
always the same (because of concatenation with nco ) how can I  
calculate cor(data1,data2)
and be sure that the correlation is not calculate between day1 and day50 
for instance .


Thanks

Françoise Orain

.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] histograms and for loops

2011-05-06 Thread vioravis

This should work!!

for(i in 1:12){
xLabel - paste(Graph,i) 
plotTitle - paste(Graph,i,.jpg) 
jpeg(plotTitle)
print(hist(zNort1[,i], freq=FALSE, xlab=xLabel, col=blue,
main=Standardized Residuals Histogram, ylim=c(0,1), xlim=c(-3.0,3.0)),axes
= FALSE)
axis(1, col = blue,col.axis = blue)
axis(2, col= red,col.axis = red)
zNortmin-min(zNort1[,1]) 
zNortmax-max(zNort1[,1]) 
zNortmean-mean(zNort1[,1]) 
zNortsd-sd(zNort1[,1]) 
X1-seq(-3.0, 3.0, by=.01) 
lines(X1, dnorm(X1, zNortmean, zNortsd) , col=black) 
dev.off()
} 
 

--
View this message in context: 
http://r.789695.n4.nabble.com/histograms-and-for-loops-tp3503648p3503758.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] histograms and for loops

1. ?paste ?sprintf
2. ?par (look at col.axis) ?axis
3. ?pdf ?png ?dev.copy

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of wwreith
 Sent: Friday, May 06, 2011 11:11 AM
 To: r-help@r-project.org
 Subject: [R] histograms and for loops
 
 The following code works mostly. It runs fine but...
 
 1. Is there a way to increment the xlab for each graph? I would like to
 have
 Graph 1, Graph 2, etc. Right now it just gives me Graph i over and over
 again.
 
 2. Is there a way to get the x-axis and y-axis to be bold or at least a
 darker color? Right now it is light gray.
 
 3. Is there a way to modify this code to automatically save them? I
 assume
 the answer is do it manually. This is not the most important.
 
 for(i in 1:12){
 hist(zNort1[,i], freq=FALSE, xlab=Graph i, col=blue,
 main=Standardized
 Residuals Histogram, ylim=c(0,1), xlim=c(-3.0,3.0))
 zNortmin-min(zNort1[,1])
 zNortmax-max(zNort1[,1])
 zNortmean-mean(zNort1[,1])
 zNortsd-sd(zNort1[,1])
 X1-seq(-3.0, 3.0, by=.01)
 lines(X1, dnorm(X1, zNortmean, zNortsd) , col=black)
 }
 
 
 --
 View this message in context: http://r.789695.n4.nabble.com/histograms-
 and-for-loops-tp3503648p3503648.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] histograms and for loops



On May 6, 2011, at 1:11 PM, wwreith wrote:


The following code works mostly. It runs fine but...

1. Is there a way to increment the xlab for each graph? I would like  
to have
Graph 1, Graph 2, etc. Right now it just gives me Graph i over and  
over

again.



Use the power of bquote. See modified code below.

2. Is there a way to get the x-axis and y-axis to be bold or at  
least a

darker color? Right now it is light gray.


What do you mean by the axis. The bounding box?, the ticks?, the  
labels?


3. Is there a way to modify this code to automatically save them? I  
assume

the answer is do it manually. This is not the most important.


?savePlot




for(i in 1:12){
hist(zNort1[,i], freq=FALSE, xlab=bquote(Graph *.(i) ) ,
  col=blue, main=Standardized
  Residuals Histogram, ylim=c(0,1), xlim=c(-3.0,3.0))
zNortmin-min(zNort1[,1])
zNortmax-max(zNort1[,1])
zNortmean-mean(zNort1[,1])
zNortsd-sd(zNort1[,1])
X1-seq(-3.0, 3.0, by=.01)
lines(X1, dnorm(X1, zNortmean, zNortsd) , col=black)
}


--



David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Conditional plot length in R

2011-05-06 Thread Pavan G

Hello All,
Let's say I have data spanning all quadrants of x-y plane. If I plot data
with a certain x and y range using xlim and ylim or by using plot.formula as
described in this link:
http://www.mathkb.com/Uwe/Forum.aspx/statistics/5684/plotting-in-R

*DF - data.frame(x = rnorm(1000), y = rnorm(1000))

*
* str(DF)*
*'data.frame':   1000 obs. of  2 variables:
$ x: num  -0.0265  0.1554 -0.1050 -0.9697 -0.3430 ...
$ y: num   1.386 -1.356 -1.170  0.426  0.204 ...

Now, let's plot the data meeting the criteria you indicated above:

 plot(y ~ x, data = DF, subset = (x  0)  (y  0))*

How then can I get the length of that data? If have 1000 data points and 200
lie in x,y0, how do I find that the length is 200?
Thanks!
Why-so-serious

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Draw a nomogram after glm

2011-05-06 Thread Frank Harrell

Don't attach the Design package.  Use only rms.  Please provide the output of
lrm (print the f object).  With such a strong model make sure you do not
have a circularity somewhere.  With nomogram you can specify ranges for the
predictors; default is 10th smallest to 10th largest.

rms will not make customized nomograms such as the one you included.

Frank


Hi all R users, 
Thanks Frank for your advices
In fact I posted all my script. In the R Help, the script for nomogram is
long and I took only the part what I think interesting in my case. 
I use informations from( datadist {Design} and rms {rms}) in the help of R
software to do my code.  
I see that I progress with my nomogram. Because with the following code
where I put  my real variables names: 

library(Design)
library(rms)
d - datadist(Fire)
options(datadist='d')
f-lrm(Ignition~FMC+Charge,data=Fire)
summary(f,FMC=c(0,506.40),Charge=c(45,646)) # min and max of FMC: 0 ,506.40
and the min and max of Charge: 45, 646 
plot(Predict(f,FMC=0:506.40,Charge=646))
plot(nomogram(f, interact=list(Charge= c(.2,.7 # sorry, not understand
vector c(.2,.7) from R help  

As result, I have the figure 1 then figure 2 but there is a problem. Because
the 3rd line of Figure 2  Charge must to go 0 until 650. 
Also the linear predictor must to go 0 until 1. 
http://r.789695.n4.nabble.com/file/n3503828/nomo.jpeg 
http://r.789695.n4.nabble.com/file/n3503828/nomogram.jpeg 
After, is it possible to draw my nomogram like the 3rd graph that I found in
Internet, it is easier to understand.
http://r.789695.n4.nabble.com/file/n3503828/nogramme.bmp 
Nb: I  Apologize for my bad english
Thanks for your help
Komine 
PhD student 
Dakar _Sénégal
West Africa  

Komine wrote:
 
 Hi all R users 
 I did a logistic regression with my binary variable Y (0/1) and 2
 explanatory variables.
 Now I try to draw my nomogram with predictive value. I visited the help of
 R but I have problem to understand well the example. When I use glm
 fonction, I have a problem, thus I use lrm. My code is: 
 modele-lrm(Y~L+P,data=donnee)
 fun- function(x) plogis(x-modele$coef[1]+modele$coef[2])
 f - Newlabels(modele,c(L=poids,P=taille))  
 nomogram(f, fun=list('Prob Y=1'=plogis), 
  fun.at=c(seq(0,1,by=.1),.95,.99),
  lmgp=.1, cex.axis=.6)
  fun.at=c(.01,.05,seq(.1,.9,by=.1),.95,.99),
  lmgp=.2, cex.axis=.6)
 options(Fire=NULL)
 Result is bad and I have this following error message:
 Erreur dans value.chk(at, i, NA, -nint, Limval, type.range = full) : 
   variable L does not have limits defined by datadist
 
 Could you help me on the code to draw nomogram. 
 Nb: my English is low, I apologize.
 Thank for your help
 Komine
 

Komine wrote:
 
 Hi all R users 
 I did a logistic regression with my binary variable Y (0/1) and 2
 explanatory variables.
 Now I try to draw my nomogram with predictive value. I visited the help of
 R but I have problem to understand well the example. When I use glm
 fonction, I have a problem, thus I use lrm. My code is: 
 modele-lrm(Y~L+P,data=donnee)
 fun- function(x) plogis(x-modele$coef[1]+modele$coef[2])
 f - Newlabels(modele,c(L=poids,P=taille))  
 nomogram(f, fun=list('Prob Y=1'=plogis), 
  fun.at=c(seq(0,1,by=.1),.95,.99),
  lmgp=.1, cex.axis=.6)
  fun.at=c(.01,.05,seq(.1,.9,by=.1),.95,.99),
  lmgp=.2, cex.axis=.6)
 options(Fire=NULL)
 Result is bad and I have this following error message:
 Erreur dans value.chk(at, i, NA, -nint, Limval, type.range = full) : 
   variable L does not have limits defined by datadist
 
 Could you help me on the code to draw nomogram. 
 Nb: my English is low, I apologize.
 Thank for your help
 Komine
 

Komine wrote:
 
 Hi all R users 
 I did a logistic regression with my binary variable Y (0/1) and 2
 explanatory variables.
 Now I try to draw my nomogram with predictive value. I visited the help of
 R but I have problem to understand well the example. When I use glm
 fonction, I have a problem, thus I use lrm. My code is: 
 modele-lrm(Y~L+P,data=donnee)
 fun- function(x) plogis(x-modele$coef[1]+modele$coef[2])
 f - Newlabels(modele,c(L=poids,P=taille))  
 nomogram(f, fun=list('Prob Y=1'=plogis), 
  fun.at=c(seq(0,1,by=.1),.95,.99),
  lmgp=.1, cex.axis=.6)
  fun.at=c(.01,.05,seq(.1,.9,by=.1),.95,.99),
  lmgp=.2, cex.axis=.6)
 options(Fire=NULL)
 Result is bad and I have this following error message:
 Erreur dans value.chk(at, i, NA, -nint, Limval, type.range = full) : 
   variable L does not have limits defined by datadist
 
 Could you help me on the code to draw nomogram. 
 Nb: my English is low, I apologize.
 Thank for your help
 Komine
 

Komine wrote:
 
 Hi all R users 
 I did a logistic regression with my binary variable Y (0/1) and 2
 explanatory variables.
 Now I try to draw my nomogram with predictive value. I visited the help of
 R but I have problem to understand well the example. When I use glm
 fonction, I have a problem, thus I use lrm. My code is:

Re: [R] Conditional plot length in R



On May 6, 2011, at 2:28 PM, Pavan G wrote:


Hello All,
Let's say I have data spanning all quadrants of x-y plane. If I plot  
data
with a certain x and y range using xlim and ylim or by using  
plot.formula as

described in this link:
http://www.mathkb.com/Uwe/Forum.aspx/statistics/5684/plotting-in-R

*DF - data.frame(x = rnorm(1000), y = rnorm(1000))

*
* str(DF)*
*'data.frame':   1000 obs. of  2 variables:
$ x: num  -0.0265  0.1554 -0.1050 -0.9697 -0.3430 ...
$ y: num   1.386 -1.356 -1.170  0.426  0.204 ...

Now, let's plot the data meeting the criteria you indicated above:

plot(y ~ x, data = DF, subset = (x  0)  (y  0))*

How then can I get the length of that data? If have 1000 data points  
and 200

lie in x,y0, how do I find that the length is 200?


with(DF, sum((x  0)  (y  0), na.rm=TRUE) )  # since it is 1 for  
TRUE and 0 for false after coercion.


--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] editor: not possible to change the variable name

2011-05-06 Thread tornanddesperate

Hi again everybody

I have I new problem concerning the editor of R. It is possible to  add a
new variable column, but they all have the name var1.I read somewhere that
it should be possible to change the variable name by clicking on it, but
that doesn't work. Is that a bug or how is it possible to change the
variable header?

Many thanks

Matthias 

--
View this message in context: 
http://r.789695.n4.nabble.com/editor-not-possible-to-change-the-variable-name-tp3503864p3503864.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Draw a nomogram after glm

2011-05-06 Thread Komine

Hi all R users, 
Thanks Frank for your advices
In fact I posted all my script. In the R Help, the script for nomogram is
long and I took only the part what I think interesting in my case. 
I use informations from( datadist {Design} and rms {rms}) in the help of R
software to do my code.  
I see that I progress with my nomogram. Because with the following code
where I put  my real variables names: 

library(Design)
library(rms)
d - datadist(Fire)
options(datadist='d')
f-lrm(Ignition~FMC+Charge,data=Fire)
summary(f,FMC=c(0,506.40),Charge=c(45,646)) # min and max of FMC: 0 ,506.40
and the min and max of Charge: 45, 646 
plot(Predict(f,FMC=0:506.40,Charge=646))
plot(nomogram(f, interact=list(Charge= c(.2,.7 # sorry, not understand
vector c(.2,.7) from R help  

As result, I have the figure 1 then figure 2 but there is a problem. Because
the 3rd line of Figure 2  Charge must to go 0 until 650. 
Also the linear predictor must to go 0 until 1. 
http://r.789695.n4.nabble.com/file/n3503828/nomo.jpeg 
http://r.789695.n4.nabble.com/file/n3503828/nomogram.jpeg 
After, is it possible to draw my nomogram like the 3rd graph that I found in
Internet, it is easier to understand.
http://r.789695.n4.nabble.com/file/n3503828/nogramme.bmp 
Nb: I  Apologize for my bad english
Thanks for your help
Komine 
PhD student 
Dakar _Sénégal
West Africa  

Komine wrote:
 
 Hi all R users 
 I did a logistic regression with my binary variable Y (0/1) and 2
 explanatory variables.
 Now I try to draw my nomogram with predictive value. I visited the help of
 R but I have problem to understand well the example. When I use glm
 fonction, I have a problem, thus I use lrm. My code is: 
 modele-lrm(Y~L+P,data=donnee)
 fun- function(x) plogis(x-modele$coef[1]+modele$coef[2])
 f - Newlabels(modele,c(L=poids,P=taille))  
 nomogram(f, fun=list('Prob Y=1'=plogis), 
  fun.at=c(seq(0,1,by=.1),.95,.99),
  lmgp=.1, cex.axis=.6)
  fun.at=c(.01,.05,seq(.1,.9,by=.1),.95,.99),
  lmgp=.2, cex.axis=.6)
 options(Fire=NULL)
 Result is bad and I have this following error message:
 Erreur dans value.chk(at, i, NA, -nint, Limval, type.range = full) : 
   variable L does not have limits defined by datadist
 
 Could you help me on the code to draw nomogram. 
 Nb: my English is low, I apologize.
 Thank for your help
 Komine
 

Komine wrote:
 
 Hi all R users 
 I did a logistic regression with my binary variable Y (0/1) and 2
 explanatory variables.
 Now I try to draw my nomogram with predictive value. I visited the help of
 R but I have problem to understand well the example. When I use glm
 fonction, I have a problem, thus I use lrm. My code is: 
 modele-lrm(Y~L+P,data=donnee)
 fun- function(x) plogis(x-modele$coef[1]+modele$coef[2])
 f - Newlabels(modele,c(L=poids,P=taille))  
 nomogram(f, fun=list('Prob Y=1'=plogis), 
  fun.at=c(seq(0,1,by=.1),.95,.99),
  lmgp=.1, cex.axis=.6)
  fun.at=c(.01,.05,seq(.1,.9,by=.1),.95,.99),
  lmgp=.2, cex.axis=.6)
 options(Fire=NULL)
 Result is bad and I have this following error message:
 Erreur dans value.chk(at, i, NA, -nint, Limval, type.range = full) : 
   variable L does not have limits defined by datadist
 
 Could you help me on the code to draw nomogram. 
 Nb: my English is low, I apologize.
 Thank for your help
 Komine
 

Komine wrote:
 
 Hi all R users 
 I did a logistic regression with my binary variable Y (0/1) and 2
 explanatory variables.
 Now I try to draw my nomogram with predictive value. I visited the help of
 R but I have problem to understand well the example. When I use glm
 fonction, I have a problem, thus I use lrm. My code is: 
 modele-lrm(Y~L+P,data=donnee)
 fun- function(x) plogis(x-modele$coef[1]+modele$coef[2])
 f - Newlabels(modele,c(L=poids,P=taille))  
 nomogram(f, fun=list('Prob Y=1'=plogis), 
  fun.at=c(seq(0,1,by=.1),.95,.99),
  lmgp=.1, cex.axis=.6)
  fun.at=c(.01,.05,seq(.1,.9,by=.1),.95,.99),
  lmgp=.2, cex.axis=.6)
 options(Fire=NULL)
 Result is bad and I have this following error message:
 Erreur dans value.chk(at, i, NA, -nint, Limval, type.range = full) : 
   variable L does not have limits defined by datadist
 
 Could you help me on the code to draw nomogram. 
 Nb: my English is low, I apologize.
 Thank for your help
 Komine
 

Hi all R users 
I did a logistic regression with my binary variable Y (0/1) and 2
explanatory variables.
Now I try to draw my nomogram with predictive value. I visited the help of R
but I have problem to understand well the example. When I use glm fonction,
I have a problem, thus I use lrm. My code is: 
modele-lrm(Y~L+P,data=donnee)
fun- function(x) plogis(x-modele$coef[1]+modele$coef[2])
f - Newlabels(modele,c(L=poids,P=taille))  
nomogram(f, fun=list('Prob Y=1'=plogis), 
 fun.at=c(seq(0,1,by=.1),.95,.99),
 lmgp=.1, cex.axis=.6)
 fun.at=c(.01,.05,seq(.1,.9,by=.1),.95,.99),
 lmgp=.2, cex.axis=.6)
options(Fire=NULL)
Result is bad and I have this following error message:

[R] create arrays

2011-05-06 Thread Schatzi

In Matlab, an array can be created from 1 - 30 using the command similar to R
which is 1:30. Then, to make the array step by 0.1 the command is 1:0.1:30
which is 1, 1.1, 1.2,...,29.9,30. How can I do this in R?

-
In theory, practice and theory are the same. In practice, they are not - Albert 
Einstein
--
View this message in context: 
http://r.789695.n4.nabble.com/create-arrays-tp3503988p3503988.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] create arrays

2011-05-06 Thread Schatzi

I can get around it by doing something like:
as.matrix(rep(1,291))*row(as.matrix(rep(1,291)))/10+.9

I was just hoping for a simple command.


Schatzi wrote:
 
 In Matlab, an array can be created from 1 - 30 using the command similar
 to R which is 1:30. Then, to make the array step by 0.1 the command is
 1:0.1:30 which is 1, 1.1, 1.2,...,29.9,30. How can I do this in R?
 


-
In theory, practice and theory are the same. In practice, they are not - Albert 
Einstein
--
View this message in context: 
http://r.789695.n4.nabble.com/create-arrays-tp3503988p3503998.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] create arrays

2011-05-06 Thread David Wolfskill

On Fri, May 06, 2011 at 12:11:30PM -0700, Schatzi wrote:
 In Matlab, an array can be created from 1 - 30 using the command similar to R
 which is 1:30. Then, to make the array step by 0.1 the command is 1:0.1:30
 which is 1, 1.1, 1.2,...,29.9,30. How can I do this in R?
 ...

This may well be a hack, but

 10:300/10

seemed to do it for me.

Peace,
david
-- 
David H. Wolfskill  r...@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.


pgpGpFjB08hOL.pgp
Description: PGP signature
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] create arrays

?seq

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Schatzi
 Sent: Friday, May 06, 2011 1:12 PM
 To: r-help@r-project.org
 Subject: [R] create arrays
 
 In Matlab, an array can be created from 1 - 30 using the command
 similar to R
 which is 1:30. Then, to make the array step by 0.1 the command is
 1:0.1:30
 which is 1, 1.1, 1.2,...,29.9,30. How can I do this in R?
 
 -
 In theory, practice and theory are the same. In practice, they are not
 - Albert Einstein
 --
 View this message in context: http://r.789695.n4.nabble.com/create-
 arrays-tp3503988p3503988.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] create arrays

On Fri, May 6, 2011 at 12:11 PM, Schatzi adele_thomp...@cargill.com wrote:
 In Matlab, an array can be created from 1 - 30 using the command similar to R
 which is 1:30. Then, to make the array step by 0.1 the command is 1:0.1:30
 which is 1, 1.1, 1.2,...,29.9,30. How can I do this in R?

Hmm, in this case, I would do it slightly differently:

seq(from = 1, to = 30, by = .1)

Cheers,

Josh


 -
 In theory, practice and theory are the same. In practice, they are not - 
 Albert Einstein
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/create-arrays-tp3503988p3503988.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] create arrays

Beautiful.

-Original Message-
From: greg.s...@imail.org [mailto:greg.s...@imail.org] 
Sent: Friday, May 06, 2011 02:17 PM
To: Thompson, Adele - adele_thomp...@cargill.com; r-help@r-project.org
Subject: RE: [R] create arrays

?seq

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Schatzi
 Sent: Friday, May 06, 2011 1:12 PM
 To: r-help@r-project.org
 Subject: [R] create arrays

 In Matlab, an array can be created from 1 - 30 using the command
 similar to R
 which is 1:30. Then, to make the array step by 0.1 the command is
 1:0.1:30
 which is 1, 1.1, 1.2,...,29.9,30. How can I do this in R?

 -
 In theory, practice and theory are the same. In practice, they are not
 - Albert Einstein
 --
 View this message in context: http://r.789695.n4.nabble.com/create-
 arrays-tp3503988p3503988.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] editor: not possible to change the variable name

Hi Matthias,

What do you mean by that doesn't work? What platform are you using? Using:

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)

fix(mydataframe)

brings up the data editor, then if I click a variable name, and change
it and shut down the data editor, it works just fine for me. You
could also use the menu to bring up the editor. More information
about the steps you are trying, the version of R, and exactly what is
going wrong will help us help you.

Best regards,

Josh

On Fri, May 6, 2011 at 11:32 AM, tornanddesperate matthiasn...@gmx.ch wrote:
Hi again everybody

I have I new problem concerning the editor of R. It is possible to add a
new variable column, but they all have the name var1.I read somewhere that
it should be possible to change the variable name by clicking on it, but
that doesn't work. Is that a bug or how is it possible to change the
variable header?

Many thanks

Matthias

--
View this message in context:
http://r.789695.n4.nabble.com/editor-not-possible-to-change-the-variable-name-tp3503864p3503864.html
Sent from the R help mailing list archive at Nabble.com.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

[R] Extracting months and years from Dates while keeping order

2011-05-06 Thread Dimitri Liakhovitski

Hello!

I'd like to take Dates and extract from them months and years - but so
that it sorts correctly. For example:

x1-seq(as.Date(2009-01-01), length = 14, by = month)
(x1)
order(x1)  # produces correct order based on full dates

# Of course, I could do format - but this way I am losing the Date
quality of the data:

x2-format(x1,%Y.%m)
(x2)
order(x2)

Not sure it's possible at all: But is there some other way to extract
just months and years from Dates (while ingoring days) - but so that
it's still somehow a Date object?

Thank you very much!

-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Extracting months and years from Dates while keeping order

2011-05-06 Thread Gabor Grothendieck

On Fri, May 6, 2011 at 4:07 PM, Dimitri Liakhovitski
dimitri.liakhovit...@gmail.com wrote:
 Hello!

 I'd like to take Dates and extract from them months and years - but so
 that it sorts correctly. For example:

 x1-seq(as.Date(2009-01-01), length = 14, by = month)
 (x1)
 order(x1)  # produces correct order based on full dates

 # Of course, I could do format - but this way I am losing the Date
 quality of the data:

 x2-format(x1,%Y.%m)
 (x2)
 order(x2)

 Not sure it's possible at all: But is there some other way to extract
 just months and years from Dates (while ingoring days) - but so that
 it's still somehow a Date object?


Try using the yearmon class in zoo.

 library(zoo)
 y - as.yearmon(x1)
 y
 [1] Jan 2009 Feb 2009 Mar 2009 Apr 2009 May 2009 Jun 2009
 [7] Jul 2009 Aug 2009 Sep 2009 Oct 2009 Nov 2009 Dec 2009
[13] Jan 2010 Feb 2010
 order(y)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] replace NA

2011-05-06 Thread azam jaafari

Hello all
 
I have a geology map that has three level, bellow
 
-geology
lithology    landscape   landform
 
 
landform level is used as covariate (with codes=1,2,3,4,5) for training of 
neural network, but this level has missing data as NA. 
I want to replace the missing data of landform level with 0 (zero). Finally, 
landform will have codes  0,1,2,3,4,5.
 
please help me
 
Thanks alot.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fw: grid

2011-05-06 Thread azam jaafari


 
Dear All 

I trained a neural network for 200 data and I did prediction for a grid file 
(e.g. 100 points) such as below: 

snn-predict(nn, newdata=data.frame(wetness=wetnessgrid$band1, 
ndvi=ndvigrid$band1)) 

the pixels of snn is same with wetnessgrid or ndvigrid  

 I want to convert this file (snn) to a map that I can open it in GIS 
software.   

Thanks alot 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confidence intervals and polynomial fits

On May 6, 2011, at 1:58 PM, Prof Brian Ripley wrote:

 On Fri, 6 May 2011, Bert Gunter wrote:
 
 FWIW:
 
 Fitting higher order polynomials (say  2) is almost always a bad idea.
 
 See e.g.  the Hastie, Tibshirani, et. al book on Statistical
 Learning for a detailed explanation why. The Wikipedia entry on
 smoothing splines also contains a brief explanation, I believe.
 
 Your ~0 P values for the coefficients also suggest problems/confusion
 (!) -- perhaps you need to consider something along the lines of
 functional data analysis  for your analysis.
 
 Having no knowledge of your issues, these remarks are entirely
 speculative and may well be wrong. So feel free to dismiss.
 Nevertheless, you may find it useful to consult your local
 statistician for help.
 
 That is the main piece of advice I would have given.  But if you must DIY, 
 consider the merits of orthogonal polynomials.  Computing individual 
 confidence intervals for highly correlated coefficients is very dubious 
 practice.  Without the example the posting guide asked for, we can only guess 
 if that is what is happening.

  Thanks to both of you.  Yes, I am admittedly out of my depth; the 
statistician I would normally ask is on sabbatical, and I'm a bit at sea.  Of 
course McGill has a whole department of mathematics and statistics; I guess I 
ought to try to make a friend over there (I'm in the biology department).  
Anyhow, I've just downloaded the Hastie et al. book and will try to figure out 
whether my use of higher order polynomials is incorrect in my situation.  
Eliminating those would certainly solve my problem with the confidence 
intervals.  :-

  I was figuring that the ~0 P-values for coefficients was just the result of 
my having 300,000 data points; I figured the regression procedure was thus able 
to pin down very accurate estimates of them.  I'll look into functional data 
analysis as you recommend, though; I'm entirely unfamiliar with it.

  As for correlated coefficients: x, x^2, x^3 etc. would obviously be highly 
correlated, for values close to zero.  Is this what you mean, as a potential 
source of problems?  Or if you mean that the various other terms in my model 
might be correlated with x, that is not the case; each independent variable is 
completely uncorrelated with the others (this data comes from simulations, so 
the independent variables for each data point were in fact chosen by random 
drawing).

  It didn't seem easy to post an example, since my dataset is so large, but if 
either of you would be willing to look at this further, I could upload my 
dataset to a web server somewhere and post a link to it.  In any case, thanks 
very much for your help; I'll look into the things you mentioned.

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] replace NA

Hi,

If your geology map is a special kind of object, this may not work,
but if you are just dealing with a data frame or matrix type object
named, geology with columns, something like this ought to do the
trick:

geology[is.na(geology[, landform]), landform] - 0

?is.na returns a logical vector of TRUE/FALSE whether a value is
missing which is used to index the data and then 0 is assigned to all
those positions.

HTH,

Josh

On Fri, May 6, 2011 at 1:12 PM, azam jaafari azamjaaf...@yahoo.com wrote:
 Hello all

 I have a geology map that has three level, bellow

 -geology
 lithology    landscape   landform


 landform level is used as covariate (with codes=1,2,3,4,5) for training of 
 neural network, but this level has missing data as NA.
 I want to replace the missing data of landform level with 0 (zero). Finally, 
 landform will have codes  0,1,2,3,4,5.

 please help me

 Thanks alot.
        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Averaging uneven measurements by time with uneven numbers of measurements

I figured out a poor way to do what I want.
meas-runif(30)
times-sort(runif(30))

timesdec-seq(0,1,0.2)
ltim-length(timesdec)
storing-rep(0,ltim)

for (i in 1:ltim) {
if (i=1) {rowstart=1} else {rowstart-findInterval(timesdec[i-1],times)+1}
rowfinal-findInterval(timesdec[i],times)
storing[i]-mean(meas[rowstart:rowfinal])
} #end i-for loop


-Original Message-
From: ehl...@ucalgary.ca [mailto:ehl...@ucalgary.ca] 
Sent: Thursday, May 05, 2011 03:58 PM
To: Thompson, Adele - adele_thomp...@cargill.com
Cc: r-help@r-project.org
Subject: Re: [R] Averaging uneven measurements by time with uneven numbers of 
measurements

On 2011-05-05 14:20, Schatzi wrote:
 I do not want smoothing as the data should have jumps (it is weight left in
 feeding bunker). I was thinking of maybe using a histogram-like function and
 then averaging that. Not sure if this is possible.

(It would be useful to include your original request - not everyone
uses Nabble.)

Actually, averaging *is* smoothing, but I suppose your intent is, for
some reason, not to smooth across 30-minute boundaries.

Perhaps you could use findInterval() to identify which measurements to
average.

Peter Ehlers


 -
 In theory, practice and theory are the same. In practice, they are not - 
 Albert Einstein
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Averaging-uneven-measurements-by-time-with-uneven-numbers-of-measurements-tp3499337p3499386.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Confidence intervals and polynomial fits



On May 6, 2011, at 4:16 PM, Ben Haller wrote:



 As for correlated coefficients: x, x^2, x^3 etc. would obviously be  
highly correlated, for values close to zero.


Not just for x close to zero:

 cor( (10:20)^2, (10:20)^3 )
[1] 0.9961938
 cor( (100:200)^2, (100:200)^3 )
[1] 0.9966219


Is this what you mean, as a potential source of problems?  Or if you  
mean that the various other terms in my model might be correlated  
with x, that is not the case; each independent variable is  
completely uncorrelated with the others (this data comes from  
simulations, so the independent variables for each data point were  
in fact chosen by random drawing).


 It didn't seem easy to post an example, since my dataset is so  
large, but if either of you would be willing to look at this  
further, I could upload my dataset to a web server somewhere and  
post a link to it.  In any case, thanks very much for your help;  
I'll look into the things you mentioned.


Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] understanding error messages in archmCopulaFit in fCopulae



On May 6, 2011, at 12:14 PM, Lee, Eric wrote:


Hello,

I'm running version R x64 v2.12.2 on a 64bit windows 7 PC.  I have  
two data vectors, x and y, and try to run archmCopulaFit.  Most of  
the copulas produce errors.  Can you tell me what the errors mean  
and if possible, how I can set archmCopulaFit options to make them  
run?  I see in the documentation that there are arguments passed to  
the optimization function in use,  'nlminb', however, it's not  
clear to me how to set them.


Thanks.

Copulas 2, 4,7,8,11,12,14,15,18,19,20,21,22 have the following error:

fit1 = archmCopulaFit(x,y,type=2)
Error in if (alpha  range[1] | alpha  range[2]) { :
 missing value where TRUE/FALSE needed


Seems like a fairly explanatory error message. The function is testing  
the assumptions going into it, possibly against default values,  using  
some sort of range calculation, and is finding that your data doesn't  
meet those requirements.




Copulas 3 has the following error:

fit1 = archmCopulaFit(x,y,type=3)
Error in if (alpha  range[1] | alpha  range[2]) { :
 missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In log(darchmCopula(u = U, v = V, alpha = x, type = type)) :
 NaNs produced
2: In log(darchmCopula(u = U, v = V, alpha = x, type = type)) :
 NaNs produced


That message arises when you ask for the log of a number less than to  
equal to 0.




Copula 10:

fit1 = archmCopulaFit(x,y,type=10)
Error in if (alpha  range[1] | alpha  range[2]) { :
 missing value where TRUE/FALSE needed
In addition: Warning message:
In log(darchmCopula(u = U, v = V, alpha = x, type = type)) : NaNs  
produced




See above.


Copulas 1,5,9,13,16,17 produce estimates.


Those errors appear to be arising because of specific aspects of your  
data ...  which seems to be missing in action.


--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Extracting months and years from Dates while keeping order

2011-05-06 Thread Dimitri Liakhovitski

Perfect - that's it, Gabor, thanks a lot!
Dimitri

On Fri, May 6, 2011 at 4:11 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Fri, May 6, 2011 at 4:07 PM, Dimitri Liakhovitski
 dimitri.liakhovit...@gmail.com wrote:
 Hello!

 I'd like to take Dates and extract from them months and years - but so
 that it sorts correctly. For example:

 x1-seq(as.Date(2009-01-01), length = 14, by = month)
 (x1)
 order(x1)  # produces correct order based on full dates

 # Of course, I could do format - but this way I am losing the Date
 quality of the data:

 x2-format(x1,%Y.%m)
 (x2)
 order(x2)

 Not sure it's possible at all: But is there some other way to extract
 just months and years from Dates (while ingoring days) - but so that
 it's still somehow a Date object?


 Try using the yearmon class in zoo.

 library(zoo)
 y - as.yearmon(x1)
 y
  [1] Jan 2009 Feb 2009 Mar 2009 Apr 2009 May 2009 Jun 2009
  [7] Jul 2009 Aug 2009 Sep 2009 Oct 2009 Nov 2009 Dec 2009
 [13] Jan 2010 Feb 2010
 order(y)
  [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14


 --
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com




-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] editor: not possible to change the variable name

Hi Matthias,

If you know the column number you want to change, it is pretty straightforward.

## use the builtin mtcars dataset as an example
## and store it in variable, 'x'
x - mtcars
## change the second column name to cylinder
colnames(x)[2] - cylinder

## compare the column names of 'x' with the unchanged 'mtcars'
colnames(mtcars)
colnames(x)

You might also consider R-sig mac if it is at all related to your OS.
I have never really used macs so I am unfamiliar with Rs default
interface on them, though I would assume it is similar.

Good luck,

Josh

On Fri, May 6, 2011 at 3:05 PM, Matthias Neff matthiasn...@gmx.ch wrote:
Hello Joshua,

Thank you for your prompt reply

I am using:
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

The problem is that the part where a window should pop up just doesn't
happen: I can click on the variable name as often as I want, nothing
happens.

Maybe this is a problem specific to the mac version of R.
Do you know if there is an alternative way to change the variable name?

Have a good day,

Matthias

On 06.05.2011, at 21:25, Joshua Wiley wrote:

Hi Matthias,

What do you mean by that doesn't work? What platform are you using?
Using:

sessionInfo()

R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)

fix(mydataframe)

Best regards,

Josh

On Fri, May 6, 2011 at 11:32 AM, tornanddesperate matthiasn...@gmx.ch
wrote:

Hi again everybody

I have I new problem concerning the editor of R. It is possible to add a
new variable column, but they all have the name var1.I read somewhere
that
it should be possible to change the variable name by clicking on it, but
that doesn't work. Is that a bug or how is it possible to change the
variable header?

Many thanks

Matthias

--
View this message in context:
http://r.789695.n4.nabble.com/editor-not-possible-to-change-the-variable-name-tp3503864p3503864.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Re: [R] create arrays

2011-05-06 Thread Daniel Nordlund

Some good suggestions, just (as always) be aware of floating-point imprecision. 
See FAQ 7.31

 s - seq(1,30,0.1)
 s[8]
[1] 1.7
 s[8] == 1.7
[1] FALSE

Just trying to forestall future questions :-)

Dan

Daniel Nordlund
Bothell, WA USA

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of adele_thomp...@cargill.com
 Sent: Friday, May 06, 2011 12:18 PM
 To: greg.s...@imail.org; r-help@r-project.org
 Subject: Re: [R] create arrays
 
 Beautiful.
 
 -Original Message-
 From: greg.s...@imail.org [mailto:greg.s...@imail.org]
 Sent: Friday, May 06, 2011 02:17 PM
 To: Thompson, Adele - adele_thomp...@cargill.com; r-help@r-project.org
 Subject: RE: [R] create arrays
 
 ?seq
 
 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Schatzi
  Sent: Friday, May 06, 2011 1:12 PM
  To: r-help@r-project.org
  Subject: [R] create arrays
 
  In Matlab, an array can be created from 1 - 30 using the command
  similar to R
  which is 1:30. Then, to make the array step by 0.1 the command is
  1:0.1:30
  which is 1, 1.1, 1.2,...,29.9,30. How can I do this in R?
 
  -
  In theory, practice and theory are the same. In practice, they are not
  - Albert Einstein
  --
  View this message in context: http://r.789695.n4.nabble.com/create-
  arrays-tp3503988p3503988.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-
  guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Generalized Hyperbolic distribution

2011-05-06 Thread claire

How to use the package generalized hyperbolic distribution in order to
estimate the four parameters in the NIG-distribution? I have a data material
with stock returns that I want to fit the parameters to.

--
View this message in context: 
http://r.789695.n4.nabble.com/Generalized-Hyperbolic-distribution-tp3504369p3504369.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Weighted box or violin plots in Lattice?

2011-05-06 Thread Raphael Mazor


Is it possible to create weighted boxplots or violin plots in lattice?

It seems that you can specify weights for panel.histogram() and 
panel.densityplot(), but not for panel.bwplot or panel.violin().


Please let me know if I've missed something in the package documentation.

Thanks!

--
Raphael D. Mazor
Freshwater Biologist
Southern California Coastal Water Research Project
3535 Harbor Blv Suite 110
Costa Mesa, CA 92626

Tel. (714) 755-3235
Fax. (714) 755-3299

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Replacing missing values with values before it

2011-05-06 Thread Nick Manginelli

I'm using the survey api. I am taking 1000 samples of size of 100 and replacing 
20 of those values with missing values. Im trying to use sequential hot deck 
imputation, and thus I am trying to figure out how to replace missing values 
with the value before it. Other things I have to keep in mind is if there are 
two missing values side by side, how do I replace both those values with the 
value before. Also if the first of the sample of 100 is a missing value  I will 
replace that with the mean of the population. Im pretty sure I have to write a 
loop, but if anyone can help me figuring how to write this I would appreciate 
it greatly. Thank you

Nick Manginelli
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Custom Zero equivalent in R

2011-05-06 Thread Megh Dal

Hi all, I am to find some way on how I can tell R to use this small number 
10^-20 as zero by default. This means if any number is below this then that 
should be treated as negative, or if I divide something by any number less than 
that (in absolute term) then, Inf will be displayed etc.

I have gone through the help page of options() function, however could not find 
anything on how to handle that issue. Can somebody help me on this regards?

Thanks
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Sweave: no eps fig

2011-05-06 Thread mk90-40

Hi everyone,

I'm using R, Latex and Sweave for some years now, but today it confuses me alot:

Running Sweave produces only figures in .pdf format, no .eps figures.

The header looks like this:
echo=TRUE, fig=TRUE, label=Fig1=

There was no error message.

Does anybody have an idea?
Any changes in the Sweave-package?

Or a missing driver or something like that?
I recently switched to another OS (Fedora RedHat) - might that be the reason?

Grateful for any suggestions,
Lena

--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Bigining with a Program of SVR

2011-05-06 Thread ypriverol

Thanks Max. I'm using now the library caret with my data. But the models
showed a correlation under 0.7. Maybe the problem is with the variables that
I'm using to generate the model. For that reason I'm asking for some
packages that allow me to reduce the number of feature and to remove the
worst features. I read recently an article taht combine Genetic algorithm
with support vector regression to do that.

Best Regards 
Yasset  

--
View this message in context: 
http://r.789695.n4.nabble.com/Bigining-with-a-Program-of-SVR-tp3484476p3503918.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] convert count data to binary data

2011-05-06 Thread Christopher G Oakley

Is there a way to generate a new dataframe that produces x lines based on the 
contents of a column?

for example: I would like to generate a new dataframe with 70 lines of data[1, 
1:3], 67 lines of data[2, 1:3], 75lines of data[3,1:3] and so on up to numrow = 
sum(count).

 data

pop fam yesorno count
1 126 170
1 127 167
1 128 175
1 126 020
1 127 023
1 128 015


Thanks,

Chris


Department of Biological Science
Florida State University
319 Stadium Drive
Tallahassee, FL 32306-4295

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Draw a nomogram after glm

2011-05-06 Thread Komine

Hi Frank, 
For to answer your request: 

 print(f)

Logistic Regression Model

lrm(formula = Ignition ~ FMC + Charge, data = Fire)

  Model Likelihood DiscriminationRank Discrim.
 Ratio TestIndexes  Indexes   

Obs   231LR chi2 231.58R2   0.852C   0.976
 0 96d.f. 2g8.972Dxy 0.953
 1135Pr( chi2) 0.0001gr7878.577gamma   0.953
max |deriv| 1e-06  gp   0.466tau-a   0.465
   Brier0.054 


  CoefS.E.   Wald Z Pr(|Z|)
Intercept  9.6937 1.5863  6.11  0.0001 
FMC   -0.0828 0.0138 -6.02  0.0001 
Charge-0.0047 0.0021 -2.28  0.0223  

I continue to try my nomogram. 

Thanks again
Komine

--
View this message in context: 
http://r.789695.n4.nabble.com/Draw-a-nomogram-after-glm-tp3498144p3504208.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] (no subject)

2011-05-06 Thread Nick Manginelli

I'm using the survey api. I am taking 1000 samples of size of 100 and 
replacing 20 of those values with missing values. Im trying to use 
sequential hot deck imputation, and thus I am trying to figure out how 
to replace missing values with the value before it. Other things I have 
to keep in mind is if there are two missing values side by side, how do I
 replace both those values with the value before. Also if the first of 
the sample of 100 is a missing value  I will replace that with the mean 
of the population. Im pretty sure I have to write a loop, but if anyone 
can help me figuring how to write this I would appreciate it greatly. 
Thank you


Nick Manginelli
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] coxph and survfit issue - strata

2011-05-06 Thread Eva Bouguen

Dear users,

In a study with recurrent events: 
My objective is to get estimates of survival (obtained through a Cox model) by 
rank of recurrence and by treatment group.
With the following code (corresponding to a model with a global effect of the 
treatment=rx), I get no error and manage to obtain what I want : 
data-(bladder)
model1-coxph(Surv(stop,event)~rx+strata(enum)+cluster(id),data=bladder)
data1-data.frame(rx=1)
survfit(model1,newdata=data1)

But with the model with strata by treatment interaction (corresponding to a 
model with an effect of the treatment by rank), I get the following error:
model2-coxph(Surv(stop,event)~rx*strata(enum)+cluster(id),data=bladder)

data1-data.frame(rx=1)

survfit(model2,newdata=data1)
Erreur dans strata(enum) : objet enum non trouvé =error in strata(enum) : 
object enum not found

Would you have any idea to help me?

Thanks in advance,

Eva




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] My First Attempt at Screen Scraping with R

2011-05-06 Thread Abraham Mathew

Hello Folks,

I'm working on trying to scrape my first web site and ran into a issue
because I'm really
don't know anything about regular expressions in R.

library(XML)
library(RCurl)

site - http://thisorthat.com/leader/month;
site.doc - htmlParse(site, ?, xmlValue)


At the ?, I realize that I need to insert a regex command which will
decipher the contents of the web page...right?

First, I'm not sure if the contents of the site would be considered a table
and I'm also not sure how to disregard pictures
when scraping the site.



 sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i686-pc-linux-gnu (32-bit)


Please Help!
Abraham

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Sweave: no eps fig

2011-05-06 Thread Marc Schwartz

On May 6, 2011, at 6:00 PM, mk90...@gmx.de wrote:

 Hi everyone,
 
 I'm using R, Latex and Sweave for some years now, but today it confuses me 
 alot:
 
 Running Sweave produces only figures in .pdf format, no .eps figures.
 
 The header looks like this:
 echo=TRUE, fig=TRUE, label=Fig1=
 
 There was no error message.
 
 Does anybody have an idea?
 Any changes in the Sweave-package?
 
 Or a missing driver or something like that?
 I recently switched to another OS (Fedora RedHat) - might that be the reason?
 
 Grateful for any suggestions,
 Lena


Starting with R 2.13.0, the default in Sweave is to produce only PDF and not 
EPS.

You really only need EPS plots if you are doing postscript magic (eg. PSTricks, 
which I use) and need to use latex rather than pdflatex.

From news():

o   The default for Sweave() is to produce only PDF figures (rather
than both EPS and PDF).

Always a good idea to read news() if running a new version...  :-)

See ?RweaveLatex

You can either set:

  echo=TRUE, fig=TRUE, label=Fig1, eps=true=

or globally:

  \SweaveOpts{eps=true}

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] convert count data to binary data

2011-05-06 Thread Marc Schwartz

On May 6, 2011, at 3:15 PM, Christopher G Oakley wrote:

 Is there a way to generate a new dataframe that produces x lines based on the 
 contents of a column?
 
 for example: I would like to generate a new dataframe with 70 lines of 
 data[1, 1:3], 67 lines of data[2, 1:3], 75lines of data[3,1:3] and so on up 
 to numrow = sum(count).
 
 data
 
 pop fam yesorno count
 1 126 170
 1 127 167
 1 128 175
 1 126 020
 1 127 023
 1 128 015
 
 
 Thanks,
 
 Chris


# Better not to use 'data' as the name of an R object to avoid 
# confusion with certain functions where 'data' is the name of 
# an argument, such as regression models. R is smart enough 
# to generally know the difference, but it can make reading code
# less confusing

 DF
  pop fam yesorno count
1   1 126   170
2   1 127   167
3   1 128   175
4   1 126   020
5   1 127   023
6   1 128   015


Use rep() to generate a vector of repeated indices (?rep):

 rep(1:nrow(DF), DF$count)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [34] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [67] 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[100] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[133] 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[166] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[199] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[232] 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6
[265] 6 6 6 6 6 6


 table(rep(1:nrow(DF), DF$count))

 1  2  3  4  5  6 
70 67 75 20 23 15 


Now use that vector as input:

DF.New - DF[rep(1:nrow(DF), DF$count), 1:3]


 str(DF.New)
'data.frame':   270 obs. of  3 variables:
 $ pop: int  1 1 1 1 1 1 1 1 1 1 ...
 $ fam: int  126 126 126 126 126 126 126 126 126 126 ...
 $ yesorno: int  1 1 1 1 1 1 1 1 1 1 ...


 with(DF.New, table(fam, yesorno))
 yesorno
fam0  1
  126 20 70
  127 23 67
  128 15 75


If you might need something more generalized to handle generating 'raw' data of 
various types from a contingency table, search the list archives for the 
function expand.dft, which I posted a few years ago and I think found its way 
into a couple of CRAN packages.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Generalized Hyperbolic distribution



On May 6, 2011, at 5:17 PM, claire wrote:


How to use the package generalized hyperbolic distribution in order to
estimate the four parameters in the NIG-distribution? I have a data  
material

with stock returns that I want to fit the parameters to.


On StackOverfolw you have already been told:
The ghyp package has functions fit.NIGuv (for univariate data) and a  
fit.NIGmv (for multivariate) data, and it's all very clearly described  
in the doc for the package. Did you look at it or try it out?

by ... – Prasad Chalasani

Why are you now cross-posting this question here?

--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] coxph and survfit issue - strata