Re: [R] compare strings

2007-12-01 Thread Peter Dalgaard
Bernd Jagla wrote:
> It helps writing down these questions; you then get much closer to an
> answer...
>
> summary(as.integer(t3[,2]) == as.integer(t3[,4]) & as.integer(t3[,3]) ==
> as.integer(t3[,5]))
>
> will compare two pairs of columns and give a count of false and true
> rows...
>
>   
If they really are factors with different level sets, I think you might 
prefer as.character() there.

 > x <- factor(c("a","b"))
 > y <- factor(c("b","c"))
 > x==y
Error in Ops.factor(x, y) : level sets of factors are different
 > as.integer(x)==as.integer(y)
[1] TRUE TRUE
 > as.character(x)==as.character(y)
[1] FALSE FALSE

Also, extending the above slightly:
 > d <- as.character(x)==as.character(y)
 > table(d)
d
FALSE  TRUE
    2     1
 > which(!d)
[1] 1 2



> -B
>
> |-Original Message-
> |From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> |Behalf Of Bernd Jagla
> |Sent: Friday, November 30, 2007 10:25 PM
> |To: r-help@r-project.org
> |Subject: [R] compare strings
> |
> |Sorry for the question, but I really cannot find the right search terms to
> |find an answer..
> |
> |
> |
> |I have a data frame with strings in some of the columns.
> |
> |I want to know all the rows where the strings in both columns are equal.
> |
> |
> |
> |How do I do this?
> |
> |
> |
> |Thanks,
> |
> |
> |
> |Bernd
> |


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])        FAX: (+45) 35327907



Re: [R] compare strings

2007-12-01 Thread Peter Dalgaard
Peter Dalgaard wrote:
> Also, extending the above slightly:
>   
Oops. Two lines fell out. I meant:

 > y <- factor(c("b","c","b"))
 > x <- factor(c("a","b","b"))
 > d <- as.character(x)==as.character(y)
 > table(d)
d
FALSE  TRUE
    2     1
 > which(!d)
[1] 1 2


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])        FAX: (+45) 35327907



[R] modeling time series with ARIMA

2007-12-01 Thread eugen pircalabelu
Good afternoon!

I'm trying to model a time series on the following data, which represent
the monthly consumption of juices:

>x<-scan()
1: 2859  3613  3930  5193  4523  3226  4280  3436  3235  3379  3517  6022
13:  4465  4604  5441  6575  6092  6607  6390  6150  6488  5912  6228 10196
25:  7612  7270  8617  9535  8449  8520  9148  8077  7824  7991  7660 12130
37:  9135  9512  9631 12642 11369 12140 13953 12421 11081
46: 
Read 45 items

> arima(x,order=c(2,1,2), seasonal=list(order=c(0,1,0), period=12))->l
> acf(l$resid)
> sd(l$resid)
> Box.test(l$resid)

Now, my problem:
1. All the analyses that I have seen regarding ARIMA modeling had the
residual ACF within the confidence band, while my residual ACF at the first
lag is very close to one (and falls outside the band), even though the
Box.test cannot reject its null hypothesis of independence (no
autocorrelation) for my residuals.
I imagine that I am doing something wrong with my model. Is the ACF at lag
1 a sign that my residuals are not white noise, or what is wrong here?

2. What would be the impact of an inappropriate model on the confidence
interval for a future prediction? (Disregarding the fact that an
inappropriate model would give a bad forecast of the future value, could it
also enlarge the interval?)
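
(Concretely, the interval meant in question 2 is the one built from the
prediction standard errors; a minimal sketch for the model fitted above:)

> p <- predict(l, n.ahead = 12)   # forecasts for the next 12 months
> p$pred - 1.96 * p$se            # approximate lower 95% bound
> p$pred + 1.96 * p$se            # approximate upper 95% bound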

3. As a rule of thumb, do you choose your model by selecting the lowest
AIC, or by the lowest standard deviation of the residuals?

Thank you and have  a great day!



   



[R] R function for percentrank

2007-12-01 Thread tom soyer
Hi,

Does anyone know if R has a built-in function that is equivalent to Excel's
percentrank, i.e., one that returns the rank of a value in a data set as a
percentage of the data set?

Thanks,

-- 
Tom



[R] Rolling Correlations

2007-12-01 Thread Shubha Vishwanath Karanth
Hi R,

I want to do some rolling correlations. First I searched for
?rollingCorrelation and tried the example in it, but I was not
successful. What could be the problem? Here is the code I tried:

> library(zoo)

> library(PerformanceAnalytics)

> rollingCorrelation([EMAIL PROTECTED],1],[EMAIL PROTECTED],n=12)

Error in inherits(object, "zoo") : object "manager.ts" not found

BR, Shubha




Re: [R] Correlation among random effects in lme

2007-12-01 Thread Pengyuan Liu
Dear R-helper:
  I measured 40 different mouse strains. In 10 of them, I measured 2 males
and 4 females per strain; in another 10, 4 males and 2 females; and in the
remaining 20, 3 males and 3 females. In total I have 240 observations: 40
strains with 6 observations each.
  The model for the data can be written as pheno = mu + sex + strain + e,
where mu (the mean) and sex (two levels) are fixed effects, and strain (40
levels) and e are random effects:
  result1 <- lme(pheno~sex, random=list(strain=~1), data=dat)   (model_1)
  This works, and I obtained variances for both strain and residual.
However, these 40 strains are highly correlated, and I don't know how to
incorporate the dependence among the strains into the above model. I
already obtained a correlation matrix for the 40 strains. The R code below
does not work:
  result2 <- lme(pheno~sex, random=list(strain=~1),
                 correlation=corSymm(value=c(0.8,0.7,...,0.4), form=~Strain),
                 data=dat)   (model_2)

  I would greatly appreciate any suggestions.


Pengyuan Liu
Dept of Surgery
Washington Univ in St Louis
   


[R] Help with tables

2007-12-01 Thread Alejandro Rodríguez
Hello, I'm new to using R and to developing tables.

I have a problem developing a table. In a questionnaire, I asked this
question: "Please tell me the first three symptoms caused by the
respiratory tract infection you've caught this year". People then answer
with three symptoms; the first mention (top of mind) is saved in a variable
called "P2_A", the second mention in a variable called "P2_B", and the
third mention in "P2_C". Each answer is coded with numbers like this:

1 = Flu
2 = Cough
3 = Asthma


13 = Fever

I've already done a k-cluster analysis and segmented my database. My first
task is to develop tables of frequencies crossing Cluster vs. "P2_A", in
order to know the top-of-mind mentions and their frequencies by cluster,
and then the same for the second and third mentions. I've used this
instruction, which worked well:

> table(infec1,aglomera)
      aglomera
infec1   1   2   3   4
    1  117  88  76  83
    2   10  10   9   7
    3   15  11  14  14
    4    2   0   1   1
    5    2   3   1   0
    6    1   0   1   0
    8    3   3   0   1
    9    3   1   1   0
    11   0   0   1   1

Where "infec1" is a factor of "P2_A" and "aglomera" is a factor of the
variable "Cluster" I made.  It worked well when I study them
separately...however I would like to know the TOTAL mentions of each
sympthom by cluster.  I've done this exercise in SPSS using the "Multiple
Response" instruction first grouping my three variables (i.e. "P2_A", "P2_B"
and "P2_C") into a variable called "sick" and cross tabulating it vs. QCL_1
(my cluster variable) and it gave me the table I need in this way (showed at
the bottom of this mail):

How can I make a table like this in R? I've tried combining my variables
into a matrix and using xtabs, ftable, table, structable and many
combinations of them, but I haven't succeeded with any of them.

Please help me with this issue; I don't want to keep using SPSS any more.

Thanks in advance.

P.S. The result from SPSS is shown below.



* * *  C R O S S T A B U L A T I O N  * * *

   $SICK (group)  mr sick
by QCL_1  Cluster Number of Case

                                       QCL_1
$SICK                              1     2     3     4   Row Total     %
 1  Gripe, influenza, ca         130    97    83    89         399  83.1
 2  Tos de cualquier tip          53    55    42    46         196  40.8
 3  Dolor irritación              33    36    36    39         144  30.0
 4  Bronquitis                     5     1     2     3          11   2.3
 5  Sinusitis                      5     4     1     0          10   2.1
 6  Rinitis                        1     1     1     3           6   1.3
 8  Amigdalitis                    8     6     4     4          22   4.6
 9  Faringitis                     6     4     1     2          13   2.7
10  Laringitis                     1     2     2     3           8   1.7
11  Neumonia                       1     1     1     1           4   0.8
13  Asma                           0     0     1     0           1   0.2
    Column Total                 153   116   104   107         480
    (percent)                   31.9  24.2  21.7  22.3       100.0

Percents and totals based on respondents

480 valid cases;  0 missing cases
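
(Concretely, the total-mentions computation described above amounts to the
following sketch, assuming the data frame is called dat and has the columns
named earlier; the name dat and the stacking approach are illustrative:)

mention <- factor(c(dat$P2_A, dat$P2_B, dat$P2_C))  # stack the three mentions
cluster <- rep(dat$Cluster, times = 3)              # repeat cluster per mention
table(mention, cluster)                             # total mentions by cluster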



Act. Calef Alejandro Rodríguez Cuevas
Analista de mercado

Laboratorios Farmasa S.A. de C.V.
Schwabe Mexico, S.A. de C.V.

Bufalo Nr. 27
Col. del Valle 03100
Mexico, D.F.
Mexico

Tel. 52 00 26 

[R] How to cbind DF:s with differing number of rows?

2007-12-01 Thread Lauri Nikkinen
#Hi R-users,
#Suppose that I have a data.frame like this:

y1 <- rnorm(10) + 6.8
y2 <- rnorm(10) + (1:10*1.7 + 1)
y3 <- rnorm(10) + (1:10*6.7 + 3.7)
y <- c(y1,y2,y3)
x <- rep(1:3,10)
f <- gl(2,15, labels=paste("lev", 1:2, sep=""))
g <- seq(as.Date("2000/1/1"), by="day", length=30)
DF <- data.frame(x=x,y=y, f=f, g=g)
DF$g[DF$x == 1] <- NA
DF$x[3:6] <- NA
DF$wdays <- weekdays(DF$g)

DF

#For EDA purposes, I would like to calculate frequencies for each variable
g <- lapply(DF, function(x) as.data.frame(table(x)))

#After this, I would like to cbind these data.frames (in g) into a
#single data.frame (to export to MS Excel).
#do.call(cbind, g) does not seem to work because of the differing
#number of rows in each data.frame.
#The resulting data.frame should look like this (only two variables
#printed here):

Rowid;x;Freq.x;y;Freq.y; # etc...
1;1;9;1.69151845313816;1;
2;2;9;5.03748767699799;1;
3;3;8;5.37387749444247;1;
4;Empty;Empty;6.83926626214299;1;
5;Empty;Empty;6.97484558968873;1;
6;Empty;Empty;7.11023821708323;1;
7;Empty;Empty;7.1348316549091;1;
8;Empty;Empty;7.16727166992407;1;
9;Empty;Empty;7.35983428577469;1;
10;Empty;Empty;7.7596470136235;1;
11;Empty;Empty;7.86369414967578;1;
12;Empty;Empty;7.97164674771006;1;
13;Empty;Empty;8.0787295301318;1;
14;Empty;Empty;8.14161030348166;1;
15;Empty;Empty;8.20134832959661;1;
16;Empty;Empty;10.1469115339016;1
17;Empty;Empty;12.7442067301746;1
18;Empty;Empty;14.0865167751202;1
19;Empty;Empty;15.8280312307450;1
20;Empty;Empty;16.0484499360756;1
21;Empty;Empty;17.079522214;1
22;Empty;Empty;18.1254057823357;1
23;Empty;Empty;22.7169729331525;1
24;Empty;Empty;30.7237748005358;1
25;Empty;Empty;37.2141271786934;1
26;Empty;Empty;44.4954633229803;1
27;Empty;Empty;50.2302409305761;1
28;Empty;Empty;57.8913405112114;1
29;Empty;Empty;64.849897477945;1
30;Empty;Empty;71.4205263353053;1


#Anyone have an idea how to do this?

#Thanks,
#Lauri



Re: [R] Rating R Helpers

2007-12-01 Thread Gabor Grothendieck
On Dec 1, 2007 2:21 AM,  <[EMAIL PROTECTED]> wrote:
> To me a much more urgent initiative is some kind of user online review
> system for packages, even something as simple as that used by Amazon.com
> has for customer review of books.
>
> I think the need for this is rather urgent, in fact.  Most packages are
> very good, but I regret to say some are pretty inefficient and others
> downright dangerous.  You don't want to discourage people from
> submitting their work to CRAN, but at the same time you do want some
> mechanism that allows users to relate their experience with it, good or
> bad.

You can get a very rough idea of this automatically by tabulating how often
each package is depended upon by other packages.

library(pkgDepTools) # from BioC
AP <- available.packages()
dep <- AP[, "Depends"]
deps <- unlist(sapply(dep, pkgDepTools:::cleanPkgField))
sort(table(deps))

Of course, some packages are more naturally end-user oriented and so would
never make such a list, and others may be good but simply unknown, so they
have never been used. Some packages might get on the list only because an
author has two packages and one uses the other, so an enhancement could be
to eliminate dependencies that share a common author.



Re: [R] How to cbind DF:s with differing number of rows?

2007-12-01 Thread jim holtman
This should do it for you by padding out the rows so they are the same length:

> # use your 'g' and pad out the rows so they are the same length
> str(g)
List of 5
 $ x:'data.frame':  3 obs. of  2 variables:
  ..$ x   : Factor w/ 3 levels "1","2","3": 1 2 3
  ..$ Freq: int [1:3] 9 9 8
 $ y:'data.frame':  30 obs. of  2 variables:
  ..$ x   : Factor w/ 30 levels "4.21178116845085",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ Freq: int [1:30] 1 1 1 1 1 1 1 1 1 1 ...
 $ f:'data.frame':  2 obs. of  2 variables:
  ..$ x   : Factor w/ 2 levels "lev1","lev2": 1 2
  ..$ Freq: int [1:2] 15 15
 $ g:'data.frame':  20 obs. of  2 variables:
  ..$ x   : Factor w/ 20 levels "2000-01-02","2000-01-03",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ Freq: int [1:20] 1 1 1 1 1 1 1 1 1 1 ...
 $ wdays:'data.frame':  7 obs. of  2 variables:
  ..$ x   : Factor w/ 7 levels "Friday","Monday",..: 1 2 3 4 5 6 7
  ..$ Freq: int [1:7] 2 3 3 4 3 2 3
> # determine max nrows
> max.rows <- max(sapply(g, nrow))
> g.new <- lapply(g, function(.x){
+     if (nrow(.x) < max.rows) .x <- rbind(.x, matrix(NA, ncol = 2,
+         nrow = max.rows - nrow(.x),
+         dimnames = list(NULL, c('x', 'Freq'))))
+     .x
+ })
> do.call('cbind', g.new)
   x.x x.Freq              y.x y.Freq  f.x f.Freq        g.x g.Freq   wdays.x wdays.Freq
1    1      9 4.21178116845085      1 lev1     15 2000-01-02      1    Friday          2
2    2      9 4.78984323641143      1 lev2     15 2000-01-03      1    Monday          3
3    3      8  5.4787594194582      1   NA     NA 2000-01-05      1  Saturday          3
4   NA     NA  5.5853001128225      1   NA     NA 2000-01-06      1    Sunday          4
5   NA     NA 5.96437138758995      1   NA     NA 2000-01-08      1  Thursday          3
6   NA     NA 5.97953161588198      1   NA     NA 2000-01-09      1   Tuesday          2
7   NA     NA 6.17354618925767      1   NA     NA 2000-01-11      1 Wednesday          3
8   NA     NA 6.49461161284364      1   NA     NA 2000-01-12      1        NA         NA
9   NA     NA 6.98364332422208      1   NA     NA 2000-01-14      1        NA         NA
10  NA     NA 7.12950777181536      1   NA     NA 2000-01-15      1        NA         NA
11  NA     NA 7.28742905242849      1   NA     NA 2000-01-17      1        NA         NA
12  NA     NA  7.3757813516535      1   NA     NA 2000-01-18      1        NA         NA
13  NA     NA 7.53832470512922      1   NA     NA 2000-01-20      1        NA         NA
14  NA     NA 8.39528080213779      1   NA     NA 2000-01-21      1        NA         NA
15  NA     NA 10.6249309181431      1   NA     NA 2000-01-23      1        NA         NA
16  NA     NA 11.1550663909848      1   NA     NA 2000-01-24      1        NA         NA
17  NA     NA 11.3189773716082      1   NA     NA 2000-01-26      1        NA         NA
18  NA     NA 12.8838097369011      1   NA     NA 2000-01-27      1        NA         NA
19  NA     NA 15.5438362106853      1   NA     NA 2000-01-29      1        NA         NA
20  NA     NA 17.1212211950981      1   NA     NA 2000-01-30      1        NA         NA
21  NA     NA 17.8821363007311      1   NA     NA         NA     NA        NA         NA
22  NA     NA 18.5939013212175      1   NA     NA         NA     NA        NA         NA



Re: [R] lmer and method call

2007-12-01 Thread Douglas Bates
On Nov 29, 2007 8:09 PM, M-J Milloy <[EMAIL PROTECTED]> wrote:
>
> Hello all,
>
> I'm attempting to fit a generalized linear mixed-effects model using lmer
> (R v 2.6.0, lmer 0.99875-9, Mac OS X 10.4.10) using the call:
>
> vidusLMER1 <- lmer(jail ~ visit + gender + house + cokefreq + cracfreq +
> herofreq + borcur + comc + (1 | code), data = vidusGD, family = binomial,
> correlation = corCompSymm(form = 1 | ID), method = "ML")
>
> Although the model fits, the summary indicates the model is a "Generalized
> linear mixed model fit using Laplace". I've tried any number of
> permutations; is only Laplace supported in lmer, despite the text of the
> help file?

The help file does say that for a generalized linear mixed model
(GLMM), which is what family = binomial implies, the estimation
criterion is always "ML" (maximum likelihood) as opposed to "REML"
(restricted, or residual, maximum likelihood).  So stating method =
"ML" is redundant.

For a GLMM, however, the log-likelihood cannot be evaluated directly and
must be approximated.  Here the help file is misleading because it implies
that there are three possible approximations, "PQL" (penalized
quasi-likelihood), "Laplace" and "AGQ" (adaptive Gaussian quadrature).  AGQ
has not yet been implemented, so the only effective choices are PQL and
Laplace.  The default is PQL, to refine the starting estimates, followed by
optimization of the Laplace approximation.  In some cases it is an
advantage to suppress the PQL iterations, which can be done with one of the
settings for the control argument.
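
Concretely, a call along the lines of the one in the question would then be
(a sketch; the usePQL control setting is spelled out in a follow-up message
below):

vidusLMER1 <- lmer(jail ~ visit + gender + house + cokefreq + cracfreq +
                       herofreq + borcur + comc + (1 | code),
                   data = vidusGD, family = binomial,
                   method = "Laplace", control = list(usePQL = FALSE))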



Re: [R] lmer and method call

2007-12-01 Thread Douglas Bates

I forgot to mention that the correlation argument has no effect in
this call.  That argument is for the lme function in the nlme package.
 In lmer it is ignored.



Re: [R] How to cbind DF:s with differing number of rows?

2007-12-01 Thread Lauri Nikkinen
Thanks Jim! That helped me a lot!

Cheers,
Lauri


[R] Spellchecking Sweave documents

2007-12-01 Thread Michael Hoffman
I have been using Aspell on a Linux system, but it doesn't
understand the noweb chunks, which I'd rather it not spellcheck. I
can run it on the generated .tex files, but then changes I make
during the spellcheck will not be propagated back to the original
source. Any suggestions on how to spellcheck Sweave documents?

I see from a search that some people seem to be trying Flyspell on 
Emacs. I'd rather have a solution that runs outside of Emacs, but if 
anyone is using Flyspell successfully, I'd love to know of their 
experiences.



Re: [R] lmer and method call

2007-12-01 Thread Dieter Menne
Douglas Bates <[EMAIL PROTECTED]> writes:

(lmer)

> The default is PQL, to refine the
> starting estimates, followed by optimization of the Laplace
> approximation.  In some cases it is an advantage to suppress the PQL
> iterations which can be done with one of the settings for the control
> argument.

I had found out the hard way that it is often better to let PQL play the
game rather loosely. Yet I never dared to tell anyone, for fear the
approximation could end up in the wrong slot.

Any rules (besides trying variants) for when I can trust such a result?

Dieter



[R] Problem with data editor when R built with --enable-mbcs

2007-12-01 Thread gginiu
Hi,

I encountered this problem when I was looking around in R, by default
R is built with --enable-mcfs and when I tried to edit any data I was
getting:

dataentry(datalist, modes) : invalid device
unable to create fontset -*-fixed-medium-r-normal--13-*-*-*-*-*-*-*

anyway I looked around code, noticed that it comes from fragment when
SUPPORT_MCFS is defined, and decided to try my luck without it,
rebuild with:

--disable-mfcs

and it worked like a charm, just doesn't have UTF-8 support now of
course what would be nice... am I missing something? I googled and
searched and found one similar message, but then it was stated it
works with current versions... my fonts in xorg.conf points to right
directories in my system and all other apps works. I run current
version of Arch linux

I would appreciate any help or hints how to solve that - whatever this
is bug or misconfiguration :)
thanks in advance,
Andrzej Giniewicz.



[R] Sweave: Variables in code chunk headers

2007-12-01 Thread Michael Hoffman
I would like to be able to do something like this:

   <<fig=TRUE, width=mywidth>>=
   ...
   @

with mywidth set in a previous code chunk. Is there a way to do this in 
Sweave?

(Sorry for two questions in a row, I have been saving these up.)
-- 
Michael



Re: [R] finding roots (Max Like Est)

2007-12-01 Thread Berend Hasselman

Why ask exactly the same question? You were told what to do: write your
function in terms of vector operations, and avoid loops if you can.

Sample input for R follows
-

# par is the thing that has to be found
# x are the observations

f  <- function(par,x) sum(2*(x-par)/(1+(x-par)^2))

# trial stuff

x  <- 1:10

# use uniroot to find a value for par such that f(par,x) == 0
# interval for par is obvious (lower=1 and upper=10)

paropt  <- uniroot(f,c(1,10),tol=1e-8,x)

paropt


It works.

Berend Hasselman

On 30 Nov 2007, at 17:59, stathelp wrote:

>
> I have this maximum liklihood estimate problem
>
> i need to find the roots of the following:
>
> sum(i=1..n) 2*(x[i] - parameter) / (1 + (x[i] - parameter)^2) = 0
>
> given to me is the x vector which has length 100
>
> how would I find the roots using R?
>
> I have 2 thoughts. One is using a grid search, e.g. brute force: just
> choosing a whole bunch of different values for my parameter, such as
> parameter=seq(0,100,.1), and this is what I have so far:
>
>
>   john=rep(0,length(x))
>   for(i in 1:length(x)) {
>   john[i]=((x[i]-0)/(1+(x[i]-0)^2))
> }
> sum(john)
>
> then
>
>   john=rep(0,length(x))
>   for(i in 1:length(x)) {
>   john[i]=((x[i]-.1)/(1+(x[i]-.1)^2))
> }
> sum(john)
>
> then
>
>   john=rep(0,length(x))
>   for(i in 1:length(x)) {
>   john[i]=((x[i]-.2)/(1+(x[i]-.2)^2))
> }
> sum(john)
>
> something like this ...
>
> theta=seq(0,100,.1)
>   john=rep(0,length(x))
>   for(i in 1:length(x)) {
>   john[i]=((x[i]-theta[j])/(1+(x[i]-theta[j])^2))
> }
> sum(john)
>
> but something is wrong with my code because it's not working. Anyone
> have any ideas? (I am very new to R and statistical software in general)
>
> The 2nd thought was to use the Newton-Raphson method, but I don't even
> know where to start with that.
>
> Any thoughts help.
>
> Thanks
>



Re: [R] Rating R Helpers

2007-12-01 Thread Mark Kimpel
I'll throw one more idea into the mix. I agree with Bill that a rating
system for respondents is probably not that practical and not of the
highest importance. It also seems like a recipe for creating inter-personal
problems that the list doesn't need.

I do like Bill's idea of a review system for packages, which could be
incorporated into my idea that follows...

What I would find useful would be some sort of tagging system for messages.
I can't count the times I've remembered seeing a message that addresses a
question I have down the road but, when Googled, I can't find it. It would
be so nice, for example, to reliably be able to find all messages related to
a certain package or package function posted within the last X days. This
could be implemented as simply as asking posters to provide keywords at the
end of a message, but it would be great if they could somehow be pulled out
of a message and stored in a DB. For instance keywords could be surrounded
by a sequence of special characters, which a parser could then extract and
store in a DB along with the message.
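
For instance (a sketch of that parsing idea, using "@@" as a made-up
delimiter convention):

msg <- "Solved, thanks. Keywords: @@zoo@@ @@rollapply@@"
m <- gregexpr("@@[^@]+@@", msg)[[1]]                # match positions
keywords <- substring(msg, m + 2, m + attr(m, "match.length") - 3)
keywords                                            # "zoo" "rollapply"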

Of course, this would be work to set up, but how many of our "experts" who
so kindly give of their time get exasperated when similar questions keep
popping up on the list? Also, if we had a web-accessible DB, the responses,
not the responders, could be rated as to how well a reply takes care of an
issue. Thus, over time, a sort of auto-wiki could be born. I can think of
more uses for this as well. For example, a developer could quickly check to
see what usability problems or suggestions have cropped up for an
individual package.

Mark

On Dec 1, 2007 2:21 AM, <[EMAIL PROTECTED]> wrote:

> This seems a little impractical to me.  People respond so much at random
> and most only tackle questions with which they feel comfortable.  As
> it's not a competition in any sense, it's going to be hard to rank
> people in any effective way.  But suppose you succeed in doing so, then
> what?
>
> To me a much more urgent initiative is some kind of user online review
> system for packages, even something as simple as that used by Amazon.com
> has for customer review of books.
>
> I think the need for this is rather urgent, in fact.  Most packages are
> very good, but I regret to say some are pretty inefficient and others
> downright dangerous.  You don't want to discourage people from
> submitting their work to CRAN, but at the same time you do want some
> mechanism that allows users to relate their experience with it, good or
> bad.
>
>
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary):  +61 7 3826 7304
> Mobile: +61 4 8819 4402
> Home Phone: +61 7 3286 7700
> mailto:[EMAIL PROTECTED]
> http://www.cmis.csiro.au/bill.venables/
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On Behalf Of Doran, Harold
> Sent: Saturday, 1 December 2007 6:13 AM
> To: R Help
> Subject: [R] Rating R Helpers
>
> Since R is open source and help may come from varied levels of
> experience on R-Help, I wonder if it might be helpful to construct a
> method that can be used to "rate" those who provide help on this list.
>
> This is something that is done on other comp lists, like
> http://www.experts-exchange.com/.
>
> I think some of the reasons for this are pretty transparent, but I
> suppose one reason is that one could decide to implement the advice of
> those with "superior" or "expert" levels. In other words, you can trust
> the advice of someone who is more experienced more than someone who is
> not. Currently, there is no way to discern who on this list is really an
> R expert and who is not. Of course, there is R core, but most people
> don't actually know who these people are (at least I surmise that to be
> true).
>
> If this is potentially useful, maybe one way to begin the development of
> such ratings is to allow the original poster to "rate" the level of help
> from those who responded. Maybe something like a very simple
> questionnaire on a likert-like scale that the original poster would
> respond to upon receiving help which would lead to the accumulation of
> points for the responders. Higher points would result in higher levels
> of expertise (e.g., novice, ..., wizaRd).
>
> Just a random thought. What do others think?
>
> Harold

Re: [R] Rating R Helpers

2007-12-01 Thread Gabor Grothendieck
I don't think r-help is really intended for package-specific questions,
although questions about some very popular packages do appear on it anyway.


Re: [R] R function for percentrank

2007-12-01 Thread John Kane
I don't see one, but that means nothing. I think you can write such a
function in a few minutes.

Will something like this work, or am I misunderstanding what Excel's
percentrank does?

aa <- rnorm(25);  aa  # data vector 
percentrank <- function(x) { 
var  <- sort(x)
p.rank <- 1:length(var)/length(var)*100
dd  <- cbind(var,p.rank)
}
pr <- percentrank(aa); pr


--- tom soyer <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> Does anyone know if R has a built-in function that
> is equvalent to Excel's
> percentrank, i.e., returns the rank of a value in a
> data set as a percentage
> of the data set?
> 
> Thanks,
> 
> -- 
> Tom
> 



Re: [R] Sweave: Variables in code chunk headers

2007-12-01 Thread Dieter Menne
Michael Hoffman <[EMAIL PROTECTED]> writes:

> 
> I would like to be able to do something like this:
> 
>    <<fig=TRUE, width=mywidth>>=
>...
>@
> 
> with mywidth set in a previous code chunk. Is there a way to do this in 
> Sweave?
> 

Not in the <<>>, but you could set a hook for fig:


From the Sweave docs:

If option "SweaveHooks" is defined as list(fig = foo), and foo is a function,
then it would be executed before the code in each figure chunk. This is
especially useful to set defaults for the graphical parameters in a series of
figure chunks.
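
A minimal instance of that pattern (a sketch; the par() settings are just
placeholders):

## in an early chunk: this hook runs before every figure chunk
options(SweaveHooks = list(fig = function() par(mar = c(4, 4, 1, 1))))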

Dieter



Re: [R] R function for percentrank

2007-12-01 Thread tom soyer
John,

Excel's percentrank function works like this: if one has a number, x for
example, and one wants to know the percentile of this number in a given
data set, one types =PERCENTRANK(dataset,x) in Excel to calculate the
percentile. So, for example, if the data set is c(1:10) and one wants to
know the percentile of 2.5 in the data set, percentrank would return 0.166,
i.e., 2.5 is in the 16.6th percentile.

I am not sure how to program this function in R, and I couldn't find it as
a built-in function either. It seems an obvious choice for a built-in
function, so I am very surprised, but maybe we both missed it.
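
A direct R translation of that description might look like this (a sketch
only: the interpolation rule is one reading of Excel's behavior, and values
outside the range of the data are not handled):

percentrank <- function(dataset, x) {
    s <- sort(dataset)
    n <- length(s)
    if (x %in% s) return((match(x, s) - 1) / (n - 1))
    lo <- max(which(s < x))              # nearest data value below x
    frac <- (x - s[lo]) / (s[lo + 1] - s[lo])
    (lo - 1 + frac) / (n - 1)
}
percentrank(1:10, 2.5)   # 0.1666..., matching the 0.166 above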





-- 
Tom



Re: [R] lmer and method call

2007-12-01 Thread Douglas Bates
On Dec 1, 2007 10:08 AM, Dieter Menne <[EMAIL PROTECTED]> wrote:
> Douglas Bates <[EMAIL PROTECTED]> writes:
>
> (lmer)
>
> > The default is PQL, to refine the
> > starting estimates, followed by optimization of the Laplace
> > approximation.  In some cases it is an advantage to suppress the PQL
> > iterations which can be done with one of the settings for the control
> > argument.
>
> I had found out the hard way that it is often better to let PQL
> play the game rather loosely.  Yet I never dared to tell someone, for fear
> the approximation could end up in the wrong slot,

> Any rules (beside trying variants) if I can trust such a result?

I'm not sure I understand the sense of your first statement.  Do you
mean that you have found that you should use PQL or you should not use
PQL?

I would advise using the Laplace approximation for the final
estimates.  At one time I thought it would be much slower than the PQL
iterations but it doesn't seem to be that bad.

I also thought that PQL would refine the starting estimates in the
sense that it would take comparatively crude starting values and get
you much closer to the optimum before you switched to Laplace.
However, because PQL is an algorithm that iterates on both the fixed
effects and the random effects with fixed weights, then updates the
weights, then goes back to the fixed effects and random effects, etc.
there is a possibility that the early weights can force poor values of
the fixed effects and later iterations do not recover.

I tend to prefer the Laplace approximation directly without any PQL
iterations.  That is

 method = "Laplace", control = list(usePQL = FALSE)

I would be interested in learning what experiences you or others have
had with the different approaches.

I am cc:ing this to the R-SIG-mixed-models list and suggest we switch
to that list only for further discussion.



[R] Need help on changing a table

2007-12-01 Thread gsmkb86

Hi all:
I'm kind of new to R and I need help changing a table. The thing is, I read
a file into R using the read.table command, and the table looks like this:

Item  3d Plot  XY plot
001         1        0
001         0        1
001         0        1
002         1        0
002         1        0
002         0        1
...       ...      ...

And what I want to do is generate a new table, by item, with the sums of
the numbers. The next one is an example:

Item  3d Plot  XY plot
001         1        2
002         2        1
003       ...      ...

Does anyone know how to do this? Thanks in advance, help is greatly
appreciated


Re: [R] Need help on changing a table

2007-12-01 Thread Prof Brian Ripley
Note that read.table() reads a table and returns a _data frame_.  So it is 
the _data frame_ that you want to change (a table is something else in R). 
This is a simple application of aggregate(), e.g.

> aggregate(z[-1], list(Item=z$Item), sum)
  Item X3d_Plot XY_plot
1    1        1       2
2    2        2       1

where I got z from read.table("clipboard", header=TRUE).


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] R function for percentrank

2007-12-01 Thread David Winsemius
"tom soyer" <[EMAIL PROTECTED]> wrote in
news:[EMAIL PROTECTED]: 

> John,
> 
> The Excel's percentrank function works like this: if one has a number,
> x for example, and one wants to know the percentile of this number in
> a given data set, dataset, one would type =percentrank(dataset,x) in
> Excel to calculate the percentile. So for example, if the data set is
> c(1:10), and one wants to know the percentile of 2.5 in the data set,
> then using the percentrank function one would get 0.166, i.e., 2.5 is
> in the 16.6th percentile. 
> 
> I am not sure how to program this function in R. I couldn't find it as
> a built-in function in R either. It seems to be an obvious choice for
> a built-in function. I am very surprised, but maybe we both missed it.
 
My nomination for a function with a similar result would be ecdf(), the 
empirical cumulative distribution function. It is of class "function" so 
efforts to index ecdf(.)[.] failed for me.

> df4$V2
[1] 1 1 1 1 1 5 6 7 9
> ecdf.V2<-ecdf(df4$V2)
> ecdf.V2(df4$V2)
 [1] 0.2 0.2 0.4 0.4 0.5 0.6 0.7 0.8 1.0 0.9

Don't have Excel, but the OpenOffice.org Calc program has the same 
function. It produces:
 x   percentrank(x)
 1            0.000
 1            0.000
 3            0.222
 3            0.222
 4            0.444
 5            0.556
 6            0.667
 7            0.778
10            1.000
 9            0.889

(Not that I am saying that the OO.o/Excel function is what one _should_ 
want. Its behavior seems pathological to me.)

-- 
David Winsemius




Re: [R] R function for percentrank

2007-12-01 Thread David Winsemius
David Winsemius <[EMAIL PROTECTED]> wrote in 
news:[EMAIL PROTECTED]:

> "tom soyer" <[EMAIL PROTECTED]> wrote in
> news:[EMAIL PROTECTED]: 
> 
>> John,
>> 
>> The Excel's percentrank function works like this: if one has a number,
>> x for example, and one wants to know the percentile of this number in
>> a given data set, dataset, one would type =percentrank(dataset,x) in
>> Excel to calculate the percentile. So for example, if the data set is
>> c(1:10), and one wants to know the percentile of 2.5 in the data set,
>> then using the percentrank function one would get 0.166, i.e., 2.5 is
>> in the 16.6th percentile. 
>> 
>> I am not sure how to program this function in R. I couldn't find it as
>> a built-in function in R either. It seems to be an obvious choice for
>> a built-in function. I am very surprised, but maybe we both missed it.
>  
> My nomination for a function with a similar result would be ecdf(), the 
> empirical cumulative distribution function. It is of class "function" 
so 
> efforts to index ecdf(.)[.] failed for me.
> 
>> df4$V2
> [1] 1 1 1 1 1 5 6 7 9  #copied wrong line in R session

Make that:
df4$V2 <- c(1,1,3,3,4,5,6,7,10,9)
 [1]  1  1  3  3  4  5  6  7 10  9

>> ecdf.V2<-ecdf(df4$V2)
>> ecdf.V2(df4$V2)
>  [1] 0.2 0.2 0.4 0.4 0.5 0.6 0.7 0.8 1.0 0.9
> 
> Don't have Excel, but the OpenOffice.org Calc program has the same 
> function. It produces:
>  x   percentrank(x)
>  1            0.000
>  1            0.000
>  3            0.222
>  3            0.222
>  4            0.444
>  5            0.556
>  6            0.667
>  7            0.778
> 10            1.000
>  9            0.889
> 
> (Not that I am saying that the OO.o/Excel function is what one _should_ 
> want. Its behavior seems pathological to me.)
>



Re: [R] Spellchecking Sweave documents

2007-12-01 Thread Duncan Murdoch
On 01/12/2007 11:11 AM, Michael Hoffman wrote:
> I have been using Aspell on a Linux system, but it doesn't
> understand the noweb chunks, which I'd rather it not spellcheck. I
> can run it on the generated .tex files, but then changes I make
> during the spellcheck will not be propagated back to the original
> source. Any suggestions on how to spellcheck Sweave documents?

If you want the concordance between the lines in the original file and 
those in the .tex file, you can get it (with the concordance=TRUE Sweave 
option).  Then it would be theoretically possible to convert reports 
about errors in the .tex into reports about the .Rnw.  I don't know if 
Aspell provides information in a form where you could actually make use 
of this, but if so, it might be a nice contribution to do so.
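
A minimal sketch of switching the concordance on (the .Rnw file name is 
hypothetical):

## In the preamble of the .Rnw source, request the concordance:
##   \SweaveOpts{concordance=TRUE}
## then weave as usual; the .tex-to-.Rnw line mapping comes along.
Sweave("doc.Rnw")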

(For sample code decoding the concordance, see my patchDVI package:  it 
converts source links in .dvi files to point to the .Rnw instead of the 
.tex file.)

Duncan Murdoch

> 
> I see from a search that some people seem to be trying Flyspell on 
> Emacs. I'd rather have a solution that runs outside of Emacs, but if 
> anyone is using Flyspell successfully, I'd love to know of their 
> experiences.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R function for percentrank

2007-12-01 Thread Marc Schwartz

On Sat, 2007-12-01 at 18:40 +, David Winsemius wrote:
> David Winsemius <[EMAIL PROTECTED]> wrote in 
> news:[EMAIL PROTECTED]:
> 
> > "tom soyer" <[EMAIL PROTECTED]> wrote in
> > news:[EMAIL PROTECTED]: 
> > 
> >> John,
> >> 
> >> The Excel's percentrank function works like this: if one has a number,
> >> x for example, and one wants to know the percentile of this number in
> >> a given data set, dataset, one would type =percentrank(dataset,x) in
> >> Excel to calculate the percentile. So for example, if the data set is
> >> c(1:10), and one wants to know the percentile of 2.5 in the data set,
> >> then using the percentrank function one would get 0.166, i.e., 2.5 is
> >> in the 16.6th percentile. 
> >> 
> >> I am not sure how to program this function in R. I couldn't find it as
> >> a built-in function in R either. It seems to be an obvious choice for
> >> a built-in function. I am very surprised, but maybe we both missed it.
> >  
> > My nomination for a function with a similar result would be ecdf(), the 
> > empirical cumulative distribution function. It is of class "function" so 
> > efforts to index ecdf(.)[.] failed for me.

You can use ls.str() to look into the function environment:

> ls.str(environment(ecdf(x)))
f :  num 0
method :  int 2
n :  int 25
x :  num [1:25] -2.215 -1.989 -0.836 -0.820 -0.626 ...
y :  num [1:25] 0.04 0.08 0.12 0.16 0.2 0.24 0.28 0.32 0.36 0.4 ...
yleft :  num 0
yright :  num 1



You can then use get() or mget() within the function environment to
return the requisite values. Something along the lines of the following
within the function percentrank():

percentrank <- function(x, val)
{
  env.x <- environment(ecdf(x))
  res <- mget(c("x", "y"), env.x)
  Ind <- which(sapply(seq(length(res$x)),
               function(i) isTRUE(all.equal(res$x[i], val))))
  res$y[Ind]
}


Thus:

set.seed(1)
x <- rnorm(25)

> x
 [1] -0.62645381  0.18364332 -0.83562861  1.59528080  0.32950777
 [6] -0.82046838  0.48742905  0.73832471  0.57578135 -0.30538839
[11]  1.51178117  0.38984324 -0.62124058 -2.21469989  1.12493092
[16] -0.04493361 -0.01619026  0.94383621  0.82122120  0.59390132
[21]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575


> percentrank(x, 0.48742905)
[1] 0.56
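
Worth noting, since the return value of ecdf() is itself a function, that 
calling it directly gives the same number here (for a value that occurs 
in the data):

Fn <- ecdf(x)   # Fn is a function
Fn(0.48742905)  # proportion of values <= 0.48742905
## [1] 0.56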


One other approach, which returns the values and their respective rank
percentiles is:

> cumsum(prop.table(table(x)))
   -2.2146998871775   -1.98935169586337  -0.835628612410047 
               0.04                0.08                0.12 
 -0.820468384118015  -0.626453810742333  -0.621240580541804 
               0.16                0.20                0.24 
 -0.305388387156356 -0.0449336090152308 -0.0161902630989461 
               0.28                0.32                0.36 
 0.0745649833651906   0.183643324222082   0.329507771815361 
               0.40                0.44                0.48 
  0.389843236411431   0.487429052428485   0.575781351653492 
               0.52                0.56                0.60 
  0.593901321217509    0.61982574789471   0.738324705129217 
               0.64                0.68                0.72 
  0.782136300731067   0.821221195098089   0.918977371608218 
               0.76                0.80                0.84 
    0.9438362106853    1.12493091814311    1.51178116845085 
               0.88                0.92                0.96 
   1.59528080213779 
               1.00 

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R function for percentrank

2007-12-01 Thread John Kane

--- David Winsemius <[EMAIL PROTECTED]> wrote:

> "tom soyer" <[EMAIL PROTECTED]> wrote in
>
news:[EMAIL PROTECTED]:
> 
> 
> > John,
> > 
> > The Excel's percentrank function works like this: if one has a number,
> > x for example, and one wants to know the percentile of this number in
> > a given data set, dataset, one would type =percentrank(dataset,x) in
> > Excel to calculate the percentile. So for example, if the data set is
> > c(1:10), and one wants to know the percentile of 2.5 in the data set,
> > then using the percentrank function one would get 0.166, i.e., 2.5 is
> > in the 16.6th percentile. 
> > 
> > I am not sure how to program this function in R. I couldn't find it as
> > a built-in function in R either. It seems to be an obvious choice for
> > a built-in function. I am very surprised, but maybe we both missed it.
>  
> My nomination for a function with a similar result
> would be ecdf(), the 
> empirical cumulative distribution function. It is of
> class "function" so 
> efforts to index ecdf(.)[.] failed for me.
> 
> > df4$V2
> [1] 1 1 1 1 1 5 6 7 9
> > ecdf.V2<-ecdf(df4$V2)
> > ecdf.V2(df4$V2)
>  [1] 0.2 0.2 0.4 0.4 0.5 0.6 0.7 0.8 1.0 0.9
> 
> Don't have Excel, but the OpenOffice.org Calc
> program has the same 
> function. It produces:
>  x  percentrank(x)
>  1           0.000
>  1           0.000
>  3           0.222
>  3           0.222
>  4           0.444
>  5           0.556
>  6           0.667
>  7           0.778
> 10           1.000
>  9           0.889
> 
> (Not that I am saying that the OO.o/Excel function
> is what one _should_ 
> want. Its behavior seems pathological to me.)
> 
Excel
 x  percentrank(x)
 1           0
 1           0
 3           0.222
 3           0.222
 4           0.444
 5           0.555
 6           0.666
 7           0.777
10           1
 9           0.888

It seems that OOo is following in Excel's distinguished
footsteps. 

How can one have a 0 percentile ranking? 


> -- 
> David Winsemius
> 
> > 
> > On 12/1/07, John Kane <[EMAIL PROTECTED]> wrote:
> >>
> >> I don't see one but that means nothing.   I think you
> >> can write such a function in a few minutes
> >>
> >> Will something like this work or am I
> >> misunderstanding what Excel's percentrank does ?
> >>
> >> aa <- rnorm(25);  aa  # data vector
> >> percentrank <- function(x) {
> >> var  <- sort(x)
> >> p.rank <- 1:length(var)/length(var)*100
> >> dd  <- cbind(var,p.rank)
> >> }
> >> pr <- percentrank(aa); pr
> >>
> >>
> >> --- tom soyer <[EMAIL PROTECTED]> wrote:
> >>
> >> > Hi,
> >> >
> >> > Does anyone know if R has a built-in function that
> >> > is equivalent to Excel's percentrank, i.e., returns the rank of a value
> >> > in a data set as a percentage of the data set?
> >> >
> >> > Thanks,
> >> >
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
> 




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave: Variables in code chunk headers

2007-12-01 Thread Michael Hoffman
Dieter Menne wrote:
> Michael Hoffman  sneakemail.com> writes:
> 
>> I would like to be able to do something like this:
>>
>><<fig=TRUE, width=mywidth>>=
>>...
>>@
>>
>> with mywidth set in a previous code chunk. Is there a way to do this in 
>> Sweave?
>>
> 
> Not in the <<>>, but you could set a hook for fig:
> 
> 
> From the Sweave docs:
> 
> If option "SweaveHooks" is defined as list(fig = foo), and foo is a function,
> then it would be executed before the code in each figure chunk. This is
> especially useful to set defaults for the graphical parameters in a series of
> figure chunks.

Thanks. I guess what I really want to do is switch between two 
settings. Only one value can be in the defaults, and I would like some 
way of setting the other value. This might not be easily possible, but I 
thought I would ask.
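
Untested sketch of one way to get two settings out of the hook approach 
quoted above; the flag name and par() values are only illustrative:

## Register a fig hook once, early in the document; it runs before the
## code in every figure chunk and consults a flag that ordinary chunks
## can flip.  (This covers graphical parameters, not the width= chunk
## option itself.)
wide <- FALSE
options(SweaveHooks = list(fig = function()
  if (wide) par(mar = c(4, 8, 1, 1)) else par(mar = c(4, 4, 1, 1))))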

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R function for percentrank

2007-12-01 Thread Gabor Grothendieck
It's a bit tricky if you want to get it to work exactly the same as
Excel in the presence of runs, but in terms of the R approx function
I think percentrank corresponds to ties = "min" if the value is among those
in the table and ties = "ordered" otherwise, so:

percentrank <- function(table, x = table) {
  table <- sort(table)
  ties <- ifelse(match(x, table, nomatch = 0), "min", "ordered")
  len <- length(table)
  f <- function(x, ties)
    (approx(table, seq(0, len = len), x, ties = ties)$y) / (len - 1)
  mapply(f, x, ties)
}

# test
tab <- c(1, 1, 2, 2, 3, 3)
percentrank(tab, 2:6/2) # c(0, .3, .4, .7, .8)

which is the same result as Excel 2007 gives.

On Dec 1, 2007 6:37 AM, tom soyer <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Does anyone know if R has a built-in function that is equivalent to Excel's
> percentrank, i.e., returns the rank of a value in a data set as a percentage
> of the data set?
>
> Thanks,
>
> --
> Tom
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help on changing a table

2007-12-01 Thread John Kane
That's no table, that's a data.frame :). 

For what you want, probably the easiest way is 
aggregate.  Type ?aggregate for the help information.

Based on your example, and assuming the data is called 
"mydata", this should do what you want:

aggregate(mydata[, 2:3], by = list(item = mydata$Item), sum)
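
A self-contained sketch with made-up data along the lines of your example 
(the column names are guesses):

mydata <- data.frame(Item   = c("001", "001", "001", "002", "002", "002"),
                     Plot3d = c(1, 0, 0, 1, 1, 0),
                     PlotXY = c(0, 1, 1, 0, 0, 1))
aggregate(mydata[, 2:3], by = list(item = mydata$Item), sum)
##   item Plot3d PlotXY
## 1  001      1      2
## 2  002      2      1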


--- gsmkb86 <[EMAIL PROTECTED]> wrote:

> 
> Hi all:
> Im kind of new on R and I need help changing a
> table. The thing is, i read a
> file on R using the read.table command and the table
> looks like this:
> Item  3d Plot XY plot
> 001  1 0
> 001  0 1
> 001  0 1  
> 002  1 0
> 002  1 0
> 002  0 1
> ..... ..
> And what I want to do is generate a new table by
> item with the sum of the
> numbres, the next one is an example:
> 
> Item 3d Plot   XY plot
> 001  1  2
> 002  2  1
> 003  ......
> 
> Does anyone know how to do this? Thanks in advance,
> help is greatly
> appreciated

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] creating conditional means

2007-12-01 Thread Sherri Heck
Hi all-

I have a dataset (year, month, hour, co2(ppm), num1,num2)


[49,] 2006   11    0 383.3709   28   28
[50,] 2006   11    1 383.3709   28   28
[51,] 2006   11    2 383.3709   28   28
[52,] 2006   11    3 383.3709   28   28
[53,] 2006   11    4 383.3709   28   28
[54,] 2006   11    5 383.3709   28   28
[55,] 2006   11    6 383.3709   28   28
[56,] 2006   11    7 383.3709   28   28
[57,] 2006   11    8 383.3709   28   28
[58,] 2006   11    9 383.3709   27   27
[59,] 2006   11   10 383.3709   28   28

that repeats in this style for each month.  I would like to compute the 
mean for each hour in three-month intervals,
i.e. average all the 2 pm values across the days of March, April and May, 
and then do the same for each hour interval.
I have been messing around with 'for' loops but can't seem to get the 
output I want.

thanks in advance for any help-

s.heck
CU, Boulder

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rating R Helpers

2007-12-01 Thread Johannes Hüsing
Mark Kimpel <[EMAIL PROTECTED]> [Sat, Dec 01, 2007 at 05:28:28PM CET]:
> What I would find useful would be some sort of tagging system for messages.

Hrm. I find tags immensely useful for entities which do not primarily contain
text, such as photos. I am doubtful how useful keywords are when they are
not found in the text itself. There are situations where the first keyword that
comes to mind is tiptoed around in the message, but I don't know if this is
often the case.

> I can't count the times I've remembered seeing a message that addresses a
> question I have down the road but, when Googled, I can't find it. 

Is it a problem of way too many false positives or a problem of false negatives?
Tags may help out in the second case, but in my experience it is rare.

[...]
> 
> Of course, this would be work to set up, but how many of our "experts" who
> so kindly give of their time, get exasperated when similar questions keep
> popping up on the list? 

Do you think that people who keep asking similar questions do so because they
didn't do their homework first, or because they Googled and failed?

-- 
Johannes Hüsing   There is something fascinating about science. 
  One gets such wholesale returns of conjecture 
mailto:[EMAIL PROTECTED]  from such a trifling investment of fact.  
  
http://derwisch.wikidot.com (Mark Twain, "Life on the Mississippi")

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Specify var-covar matrix in mixed linear model using lme?

2007-12-01 Thread Pengyuan Liu
Hi,All:
I have a question about specifying var-cov matrix in
mixed linear model using lme. For example, for
single-level mixed linear model:
yi = Xi B + Zi ui + ei,
ui ~ N(0, D), ei ~ N(0, sigma^2 R),
var(yi) = sigma^2 (Zi A Zi^T + R). R is an identity matrix
and A is a known var-covar matrix in my data.
In my data, there is only one random effect besides
ei. But this random effect is dependent among
different subjects within group (this dependence is
characterized in A which is known). corStruct class in
nlme can specify correlation structure for within
group errors (that is, specify R for ei). But I don't
know how to specify A for a specific random effect. In
other words, I want to fix matrix A in the analysis. 
Thanks a lot for your help.



Pengyuan Liu 
Dept of Surgery 
Washington Univ in St Louis


  


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating conditional means

2007-12-01 Thread Gabor Grothendieck
Try aggregate:


Lines <- "Year Month Hour co2 num1 num2
 2006   11    0 383.3709   28   28
 2006   11    1 383.3709   28   28
 2006   11    2 383.3709   28   28
 2006   11    3 383.3709   28   28
 2006   11    4 383.3709   28   28
 2006   11    5 383.3709   28   28
 2006   11    6 383.3709   28   28
 2006   11    7 383.3709   28   28
 2006   11    8 383.3709   28   28
 2006   11    9 383.3709   27   27
 2006   11   10 383.3709   28   28
"
DF <- read.table(textConnection(Lines), header = TRUE)
aggregate(DF[4:6],
          with(DF, data.frame(Year, Qtr = (Month - 1) %/% 3 + 1, Hour)),
          mean)

On Dec 1, 2007 3:57 PM, Sherri Heck <[EMAIL PROTECTED]> wrote:
> Hi all-
>
> I have a dataset (year, month, hour, co2(ppm), num1,num2)
>
>
> [49,] 2006   11    0 383.3709   28   28
> [50,] 2006   11    1 383.3709   28   28
> [51,] 2006   11    2 383.3709   28   28
> [52,] 2006   11    3 383.3709   28   28
> [53,] 2006   11    4 383.3709   28   28
> [54,] 2006   11    5 383.3709   28   28
> [55,] 2006   11    6 383.3709   28   28
> [56,] 2006   11    7 383.3709   28   28
> [57,] 2006   11    8 383.3709   28   28
> [58,] 2006   11    9 383.3709   27   27
> [59,] 2006   11   10 383.3709   28   28
>
> that repeats in this style for each month.  I would like to compute the
> mean for each hour in three month intervals.
> i.e.  average all 2pms for each day for months march, april and may. and
> then do this for each hour interval.
> i have been messing around with 'for loops' but can't seem to get the
> output I want.
>
> thanks in advance for any help-
>
> s.heck
> CU, Boulder
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating conditional means

2007-12-01 Thread Sherri Heck
Hi Gabor,

Thank you for your help.  I think I need to clarify a bit more.  I am 
trying to say

average all the 2 pm values for the months March + April + May (for example). I hope this is 
clearer.  

here's a larger subset of my data set:

year, month, hour, co2(ppm), num1,num2

2006 1 0 384.2055 14 14
2006 1 1 384.0304 14 14
2006 1 2 383.9672 14 14
2006 1 3 383.8452 14 14
2006 1 4 383.8594 14 14
2006 1 5 383.7318 14 14
2006 1 6 383.6439 14 14
2006 1 7 383.7019 14 14
2006 1 8 383.7487 14 14
2006 1 9 383.8376 14 14
2006 1 10 383.8684 14 14
2006 1 11 383.8301 14 14
2006 1 12 383.8058 14 14
2006 1 13 383.9419 14 14
2006 1 14 383.7876 14 14
2006 1 15 383.7744 14 14
2006 1 16 383.8566 14 14
2006 1 17 384.1014 14 14
2006 1 18 384.1312 14 14
2006 1 19 384.1551 14 14
2006 1 20 384.099 14 14
2006 1 21 384.1408 14 14
2006 1 22 384.3637 14 14
2006 1 23 384.1491 14 14
2006 2 0 384.7082 27 27
2006 2 1 384.6139 27 27
2006 2 2 384.7453 26 26
2006 2 3 384.9224 28 28
2006 2 4 384.8581 28 28
2006 2 5 384.9208 28 28
2006 2 6 384.9086 28 28
2006 2 7 384.837 28 28
2006 2 8 384.6163 27 27
2006 2 9 384.7406 28 28
2006 2 10 384.7468 28 28
2006 2 11 384.6992 28 28
2006 2 12 384.6388 28 28
2006 2 13 384.6346 28 28
2006 2 14 384.6037 28 28
2006 2 15 384.5295 28 28
2006 2 16 384.5654 28 28
2006 2 17 384.6466 28 28
2006 2 18 384.6344 28 28
2006 2 19 384.5911 28 28
2006 2 20 384.6084 28 28
2006 2 21 384.6318 28 28
2006 2 22 384.6181 27 27
2006 2 23 384.6087 27 27


thanks you again for your assistance-

s.heck


Gabor Grothendieck wrote:
> Try aggregate:
>
>
> Lines <- "Year Month Hour co2 num1 num2
>  2006   11    0 383.3709   28   28
>  2006   11    1 383.3709   28   28
>  2006   11    2 383.3709   28   28
>  2006   11    3 383.3709   28   28
>  2006   11    4 383.3709   28   28
>  2006   11    5 383.3709   28   28
>  2006   11    6 383.3709   28   28
>  2006   11    7 383.3709   28   28
>  2006   11    8 383.3709   28   28
>  2006   11    9 383.3709   27   27
>  2006   11   10 383.3709   28   28
> "
> DF <- read.table(textConnection(Lines), header = TRUE)
> aggregate(DF[4:6],
>with(DF, data.frame(Year, Qtr = (Month - 1) %/% 3 + 1, Hour)),
>mean)
>
> On Dec 1, 2007 3:57 PM, Sherri Heck <[EMAIL PROTECTED]> wrote:
>   
>> Hi all-
>>
>> I have a dataset (year, month, hour, co2(ppm), num1,num2)
>>
>>
>> [49,] 2006   11    0 383.3709   28   28
>> [50,] 2006   11    1 383.3709   28   28
>> [51,] 2006   11    2 383.3709   28   28
>> [52,] 2006   11    3 383.3709   28   28
>> [53,] 2006   11    4 383.3709   28   28
>> [54,] 2006   11    5 383.3709   28   28
>> [55,] 2006   11    6 383.3709   28   28
>> [56,] 2006   11    7 383.3709   28   28
>> [57,] 2006   11    8 383.3709   28   28
>> [58,] 2006   11    9 383.3709   27   27
>> [59,] 2006   11   10 383.3709   28   28
>>
>> that repeats in this style for each month.  I would like to compute the
>> mean for each hour in three month intervals.
>> i.e.  average all 2pms for each day for months march, april and may. and
>> then do this for each hour interval.
>> i have been messing around with 'for loops' but can't seem to get the
>> output I want.
>>
>> thanks in advance for any help-
>>
>> s.heck
>> CU, Boulder
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating conditional means

2007-12-01 Thread Gabor Grothendieck
Just adjust the formula for Qtr appropriately if your quarters
are not Jan/Feb/Mar, Apr/May/Jun, Jul/Aug/Sep, Oct/Nov/Dec
as I assumed.
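
For instance, a sketch assuming the quarters should instead be Dec/Jan/Feb, 
Mar/Apr/May, Jun/Jul/Aug and Sep/Oct/Nov (DF as read in above):

## Dec/Jan/Feb -> 1, Mar/Apr/May -> 2, Jun/Jul/Aug -> 3, Sep/Oct/Nov -> 4
aggregate(DF[4:6],
          with(DF, data.frame(Year, Season = (Month %% 12) %/% 3 + 1, Hour)),
          mean)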

On Dec 1, 2007 5:21 PM, Sherri Heck <[EMAIL PROTECTED]> wrote:
> Hi Gabor,
>
> Thank you for your help.  I think I need to clarify a bit more.  I am
> trying to say
>
> average all 2pms for months march + april + may (for example). I hope this is 
> clearer.
>
> here's a larger subset of my data set:
>
> year, month, hour, co2(ppm), num1,num2
>
> 2006 1 0 384.2055 14 14
> 2006 1 1 384.0304 14 14
> 2006 1 2 383.9672 14 14
> 2006 1 3 383.8452 14 14
> 2006 1 4 383.8594 14 14
> 2006 1 5 383.7318 14 14
> 2006 1 6 383.6439 14 14
> 2006 1 7 383.7019 14 14
> 2006 1 8 383.7487 14 14
> 2006 1 9 383.8376 14 14
> 2006 1 10 383.8684 14 14
> 2006 1 11 383.8301 14 14
> 2006 1 12 383.8058 14 14
> 2006 1 13 383.9419 14 14
> 2006 1 14 383.7876 14 14
> 2006 1 15 383.7744 14 14
> 2006 1 16 383.8566 14 14
> 2006 1 17 384.1014 14 14
> 2006 1 18 384.1312 14 14
> 2006 1 19 384.1551 14 14
> 2006 1 20 384.099 14 14
> 2006 1 21 384.1408 14 14
> 2006 1 22 384.3637 14 14
> 2006 1 23 384.1491 14 14
> 2006 2 0 384.7082 27 27
> 2006 2 1 384.6139 27 27
> 2006 2 2 384.7453 26 26
> 2006 2 3 384.9224 28 28
> 2006 2 4 384.8581 28 28
> 2006 2 5 384.9208 28 28
> 2006 2 6 384.9086 28 28
> 2006 2 7 384.837 28 28
> 2006 2 8 384.6163 27 27
> 2006 2 9 384.7406 28 28
> 2006 2 10 384.7468 28 28
> 2006 2 11 384.6992 28 28
> 2006 2 12 384.6388 28 28
> 2006 2 13 384.6346 28 28
> 2006 2 14 384.6037 28 28
> 2006 2 15 384.5295 28 28
> 2006 2 16 384.5654 28 28
> 2006 2 17 384.6466 28 28
> 2006 2 18 384.6344 28 28
> 2006 2 19 384.5911 28 28
> 2006 2 20 384.6084 28 28
> 2006 2 21 384.6318 28 28
> 2006 2 22 384.6181 27 27
> 2006 2 23 384.6087 27 27
>
>
> thanks you again for your assistance-
>
> s.heck
>
>
>
> Gabor Grothendieck wrote:
> > Try aggregate:
> >
> >
> > Lines <- "Year Month Hour co2 num1 num2
> >  2006   11    0 383.3709   28   28
> >  2006   11    1 383.3709   28   28
> >  2006   11    2 383.3709   28   28
> >  2006   11    3 383.3709   28   28
> >  2006   11    4 383.3709   28   28
> >  2006   11    5 383.3709   28   28
> >  2006   11    6 383.3709   28   28
> >  2006   11    7 383.3709   28   28
> >  2006   11    8 383.3709   28   28
> >  2006   11    9 383.3709   27   27
> >  2006   11   10 383.3709   28   28
> > "
> > DF <- read.table(textConnection(Lines), header = TRUE)
> > aggregate(DF[4:6],
> >with(DF, data.frame(Year, Qtr = (Month - 1) %/% 3 + 1, Hour)),
> >mean)
> >
> > On Dec 1, 2007 3:57 PM, Sherri Heck <[EMAIL PROTECTED]> wrote:
> >
> >> Hi all-
> >>
> >> I have a dataset (year, month, hour, co2(ppm), num1,num2)
> >>
> >>
> >> [49,] 2006   11    0 383.3709   28   28
> >> [50,] 2006   11    1 383.3709   28   28
> >> [51,] 2006   11    2 383.3709   28   28
> >> [52,] 2006   11    3 383.3709   28   28
> >> [53,] 2006   11    4 383.3709   28   28
> >> [54,] 2006   11    5 383.3709   28   28
> >> [55,] 2006   11    6 383.3709   28   28
> >> [56,] 2006   11    7 383.3709   28   28
> >> [57,] 2006   11    8 383.3709   28   28
> >> [58,] 2006   11    9 383.3709   27   27
> >> [59,] 2006   11   10 383.3709   28   28
> >>
> >> that repeats in this style for each month.  I would like to compute the
> >> mean for each hour in three month intervals.
> >> i.e.  average all 2pms for each day for months march, april and may. and
> >> then do this for each hour interval.
> >> i have been messing around with 'for loops' but can't seem to get the
> >> output I want.
> >>
> >> thanks in advance for any help-
> >>
> >> s.heck
> >> CU, Boulder
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide 
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with tables

2007-12-01 Thread T.K.
I guess you can get the result by
1) concatenating all the variables (P2_A, P2_B, P2_C) into one variable,
2) replicating the segment membership accordingly, and
3) making a table of 1) and 2).

For example, the following may do the job.

> ## (1) Generate data set
> # Set random seed
> set.seed(0)
>
> n.obs <- 15 # Number of patients
> disease.lv  <- 1:10 ## different types of disease
> data1 <- as.data.frame(t(replicate(n.obs, sample(disease.lv, 3)))) # Create data set
> ## Create segment membership
> segment <- sample(LETTERS[1:3], n.obs, replace=TRUE)
>
> cbind(data1, segment)
   V1 V2 V3 segment
1   9  3 10   B
2   6  9  2   C
3   9 10  6   A
4   7  1  2   B
5   2  7  4   C
6   8  5  6   C
7  10  4  7   B
8  10  2  6   C
9   2  3  4   B
10  1  4  7   A
11  4  5  9   A
12  5  2  7   A
13  7  8  1   A
14  8  4  7   B
15  7  8  5   B
>
>
> ## (2) Table
> infec.all <- unlist(data1) # Concatenate all variables into one variable
> segment.all <- rep(segment, ncol(data1)) # Replicate the segment membership as necessary
> table(infec.all, segment.all)
 segment.all
infec.all A B C
   1  2 1 0
   2  1 2 3
   3  0 2 0
   4  2 3 1
   5  2 1 1
   6  1 0 3
   7  3 4 1
   8  1 2 1
   9  2 1 1
   10 1 2 1



On Nov 30, 2007 10:10 AM, Alejandro Rodríguez <[EMAIL PROTECTED]>
wrote:

> Hello,  I'm new using R and developing tables.
>
> I have a problem in developing a table.  In a questionnaire I made, I ask
> this
> question "Please tell me the first three symptoms caused by respiratory
> tract infection you've caught this year", then the people answer three
> symptoms; the first mention (Top of mind) is saved in a variable called
> "P2_A", the second mention in a variable called "P2_B" and the third
> mention
> in "P2_C".  Each answer is coded with numbers like this:
>
> 1 = Flu
> 2 = Cough
> 3 = Asthma
> 
>
> 13 = Fever
>
> I've already done a K-cluster analysis and segmented my data base.  So my
> first task is to develop tables of frequencies crossing
> Cluster vs. "P2_A" in order to know which are the top of mind products and
> their frequencies by cluster, then the second mention and third mention.
> I've used this instruction which worked well:
>
> > table(infec1,aglomera)
>        aglomera
> infec1   1   2   3   4
>      1 117  88  76  83
>      2  10  10   9   7
>      3  15  11  14  14
>      4   2   0   1   1
>      5   2   3   1   0
>      6   1   0   1   0
>      8   3   3   0   1
>      9   3   1   1   0
>      11  0   0   1   1
>
> Where "infec1" is a factor of "P2_A" and "aglomera" is a factor of the
> variable "Cluster" I made.  It worked well when I study them
> separately...however I would like to know the TOTAL mentions of each
> sympthom by cluster.  I've done this exercise in SPSS using the "Multiple
> Response" instruction first grouping my three variables (i.e. "P2_A",
> "P2_B"
> and "P2_C") into a variable called "sick" and cross tabulating it vs.
> QCL_1
> (my cluster variable) and it gave me the table I need in this way (showed
> at
> the bottom of this mail):
>
> How can I make a table like this in R???  I've tried combining my
> variables
> in a matrix and using xtabs, ftable, table, structable and a lot of
> combinations of them, but I haven't had success with any of them.
>
> Please help me with this issue, I don't want to keep using SPSS any more.
>
> Thanx in advance.
>
> P.D. Result from SPSS is shown below.
>
>
>
>* * *  C R O S S T A B U L A T I O N  * * *
>
>   $SICK (group)  mr sick
> by QCL_1  Cluster Number of Case
>
>
>   QCL_1
>
>Count  I
>   I  Row
>   I Total
>   I 1  I 2  I 3  I 4  I
> $SICK  +++++
>1  I   130  I97  I83  I89  I   399
>  Gripe, influenza, ca IIIII  83.1
>   +++++
>2  I53  I55  I42  I46  I   196
>  Tos de cualquier tip IIIII  40.8
>   +++++
>3  I33  I36  I36  I39  I   144
>  Dolor irritación IIIII  30.0
>   +++++
>4  I 5  I 1  I 2  I 3  I11
>  Bronquitis   IIIII   2.3
>   +++++
>5  I 5  I 4  I 1  I 0  I10
>  SinusitisIIIII   2.1
>   +++++
>6  I 1  I 1  I 1  I 3