[R] random effects in logistic regression (lmer)-- identification question

2007-06-14 Thread Paul Johnson
Hello R users!

I've been experimenting with lmer to estimate a  mixed model with a
dichotomous dependent variable.  The goal is to fit a hierarchical
model in which we compare the effect of individual and city-level
variables.  I've run up against a conceptual problem that I expect one
of you can clear up for me.

The question is about random effects in the context of a model fit
with a binomial family and logit link.  Unlike an ordinary linear
regression, there is no way to estimate an individual level random
error in the linear predictor

z_i = a + b*x_i + e_i

because the variance of e_i is unidentified.  The standard deviation
of the logistic with scale s is pi*s/sqrt(3); we assume s=1, so the
standard deviation is fixed at pi/sqrt(3), about 1.81 (somewhat bigger
than 1, if you are comparing against the Standard Normal).  The
logistic fitting process fixes the error variance, and the parameters
a and b are rescaled accordingly.
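Here is a small simulation of that point (my own sketch, not from
Lancaster): doubling the latent scale s roughly halves the recovered
a and b, so s itself is not identified.

set.seed(42)
n <- 10000; a <- -1; b <- 0.5
x <- rnorm(n)
for (s in c(1, 2)) {
    ystar <- a + b * x + rlogis(n, scale = s)   # latent propensity
    y <- as.integer(ystar > 0)
    print(coef(glm(y ~ x, family = binomial)))  # roughly (a/s, b/s)
}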

As a result, there is an implicit individual-level random effect
within a logistic model.  There is a good explanation of this issue in
Tony Lancaster's textbook An Introduction to Modern Bayesian
Econometrics.

So we usually end up thinking about a linear predictor in a logistic
regression like so

z_i = a + b*x_i

Random effects can be estimated for groups or clusters of
observations.  If j is a grouping variable, then we estimate

z_i = a + b*x_i + u_j

The variance component here is, as far as I understand, measured on
the same scale as the logistic distribution's standard deviation.

Currently, I'm working on a project in which there are observations
collected in many cities, represented by a variable PLACE.  We are
comparing the effect of several variables on a response for each of
the values of a RACE variable.  RACE is dichotomized into White and
Nonwhite by the people who collect the data.

For Nonwhites only, we can estimate the effect of individual level
predictors (x) on the output (y).

fm1 <- lmer(y ~ x + (1 | PLACE), data = dat, family = binomial,
            subset = Race %in% c("Nonwhite"))

The random effect in this model captures the variation in the response
that is attributable to the city in which a Nonwhite respondent lives.

Random effects:
 Groups  Name         Variance  Std.Dev.
 PLACE   (Intercept)  0.047326  0.21754

Suppose I estimate models for the 2 races in a combined model like this:

fm1 <- lmer(y ~ -1 + Race / (x) + (-1 + Race | PLACE), data = dat,
            family = binomial)

This gives fixed effects estimates that are pretty easy for
nonstatisticians to understand.  One can look and see the effect of a
variable x on people of different races.

But the random effect is a bit hard to understand.  Since the Race
variable is dichotomous, my aim was to see whether the variance of the
random effect differs between the 2 racial categories.  Here are the
estimates:

Random effects:
 Groups  Name              Variance   Std.Dev.  Corr
 PLACE   RACE_ALLWhite     0.0095429  0.097688
         RACE_ALLNonwhite  0.1286597  0.358692  1.000

I can't quite grasp what the correlation means.  I BELIEVE the
variance values indicate that the experiences of whites are
homogeneous across cities, because the variance is negligible for
them, while the experiences of Nonwhites are much more city-dependent.

What does it mean when the correlation is 1.0?  The correlation takes
on that value when there are no city-level variables in this model, so
I GUESS that it means that all city-level variation is attributed to
the random effect. What do you think?

If I put in some city-level predictors, then the estimates of the
variance components change--they essentially collapse to their minimum
values--and the correlation is no longer 1.0.

Random effects:
 Groups  Name          Variance  Std.Dev.    Corr
 PLACE   RACEWhite     5e-10     2.2361e-05
         RACENonwhite  5e-10     2.2361e-05  0.001

This indicates that, after adding in the city-level variables, the
unaccounted-for city-level variation is very small.  Correct?

Now, back to the idea that there is always an implicit individual-level
random effect in a logistic regression.  Is it meaningful to ask whether
that individual-level random effect differs for people of different
races?  If so, how can I estimate it?  If e_i is the implicit random
error, can I ask for an additional random effect for Nonwhites only,
say u_iN, in a model like so:

z_i = a + b*x_i + e_i + u_iN

Suppose the unique respondent number is ID and we create a new variable

NonwhiteID = 0  for Whites
           = ID for Nonwhites
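In code, the construction would be something like this (a sketch, using
the ID and Race names from my data):

dat$NonwhiteID <- ifelse(dat$Race == "Nonwhite", dat$ID, 0)
dat$NonwhiteID <- factor(dat$NonwhiteID)  # all Whites share the level "0"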

Here's my idea for checking whether the individual-level variance
component for Nonwhites differs from the baseline of Whites, by fitting
this:

fm1 <- lmer(y ~ -1 + Race / (x) + (-1 + Race | PLACE) + (1 | NonwhiteID),
            data = dat, family = binomial)

Here, again, I leave out the city-level variables.

Random effects:
 Groups      Name          Variance    Std.Dev.    Corr
 NonwhiteID  (Intercept)   5e-10       2.2361e-05
 PLACE       RACEWhite     1.0575e-02  1.0283e-01
             RACENonwhite  1.2880e-01

[R] Can I access the filename of the active editor window?

2007-06-11 Thread Paul Johnson
When I run a script from an open editor window (using Ctrl-A, Ctrl-R), I
would like the filename of the script to be automatically written into the
program output, to keep up with frequent version changes. Is there a way to
access the filename (+ path) of the open script (the active one, if there is
more than one editor window open)? 

 

I'm using R 2.4.1 on Windows XP.

 

Paul





Re: [R] Follow-up about ordinal logit with mixtures: how about 'continuation ratio' strategy?

2007-05-11 Thread Paul Johnson
On 5/10/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
 Paul Johnson wrote:
  This is a follow up to the message I posted 3 days ago about how to
  estimate mixed ordinal logit models.  I hope you don't mind that I am
  just pasting in the code and comments from an R file for your
  feedback.  Actual estimates are at the end of the post.

 . . .

 Paul,

 lrm does not give an incorrect sign on the intercepts.  Just look at how
 it states the model in terms of Prob(Y >= j) so that its coefficients are
 consistent with the way people state binary models.

 I'm not clear on your generation of simulated data.  I specify the
 population logit, anti-logit that, and generate binary responses with
 those probabilities.  I don't use rlogis.

Thank you.

I don't think I'm telling you anything you don't already know, but for
the record, here goes.  I think the difference in signs is just
convention within fields.  In choice models (the econometric
tradition), we usually write that the response is in a higher category
if

eta + random > cutpoint

and that's how I created the data--rlogis supplies the random noise.  Then

eta - cutpoint > -random

or, flipping signs (the logistic is symmetric),

cutpoint - eta < random

and so

Prob ( higher outcome ) = Prob ( random > cutpoint - eta )

In the docs on polr from MASS, VR say they have the logit equal to

cutpoint - eta

so their parameterization is consistent with mine.  On the other hand,
your approach is to say the response is in a higher category if

 intercept + eta > random,

where I think your intercept is -cutpoint. So the signs in your
results are reversed.

-cutpoint + eta > random
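As a quick sanity check of my convention (my own sketch): simulate the
latent response with rlogis, cut it, and polr should recover b and put
zeta near the true cutpoints, since polr models
logit P(Y <= k) = zeta_k - eta.

library(MASS)
set.seed(1)
n <- 5000; b <- 1.5; cuts <- c(-1, 0.5)
x <- rnorm(n)
ystar <- b * x + rlogis(n)              # eta plus logistic noise
y <- cut(ystar, c(-Inf, cuts, Inf), labels = c("lo", "mid", "hi"),
         ordered_result = TRUE)
fit <- polr(y ~ x)
coef(fit)   # should be near b = 1.5
fit$zeta    # should be near the cutpoints -1 and 0.5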


But this is aside from the major question I am asking.  Do we think
that the augmented data frame approach described in Cole, Allison, and
Ananth is a good alternative to maximum likelihood estimation of
ordinal logit models, whether they are interpreted as proportional
odds, continuation, or stopping models?   In the cases I've tested,
the parameter estimates from the augmented data frame are consistent
with polr or lrm, but the standard errors and other diagnostic
information are quite different.

I do not think I can follow your suggestion to use penalties in lrm
because I have to allow for the possibilities that there are random
effects across clusters of observations, possibly including random
slope effects, but certainly including random intercepts for 2 levels
of groupings (in the HLM sense of these things).

Meanwhile, I'm studying how to use optim and numeric integration to
see if the results are comparable.
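Here is the skeleton I am starting from -- a hand-coded
proportional-odds log-likelihood with no random effects yet (a sketch;
x and an ordinal y coded 1:K, with K >= 3, are assumed to exist):

negLL <- function(par, x, y, K) {
    b <- par[1]
    zeta <- cumsum(c(par[2], exp(par[3:K])))  # strictly increasing cutpoints
    ## cum[i, k+1] = P(Y <= k | x_i) = plogis(zeta_k - b*x_i), k = 0..K
    cum <- cbind(0, plogis(outer(-b * x, zeta, "+")), 1)
    p <- cum[cbind(seq_along(y), y + 1)] - cum[cbind(seq_along(y), y)]
    -sum(log(p))
}
## fit <- optim(c(0, -1, rep(0, K - 2)), negLL, x = x, y = y, K = K,
##              hessian = TRUE)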

pj

 See if using the PO model with lrm with penalization on the factor does
 what you need.

 lrm is not set up to omit an intercept with the -1 notation.

 My book goes into details about the continuation ratio model.

 Frank Harrell






-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Follow-up about ordinal logit with mixtures: how about 'continuation ratio' strategy?

2007-05-09 Thread Paul Johnson
This is a follow up to the message I posted 3 days ago about how to
estimate mixed ordinal logit models.  I hope you don't mind that I am
just pasting in the code and comments from an R file for your
feedback.  Actual estimates are at the end of the post.

### Subject: mixed ordinal logit via augmented data setup.

### I've been interested in estimating an ordinal logit model with
### a random parameter.  I asked in r-help about it. It appears to be
### a difficult problem because even well established commercial
### programs like SAS are prone to provide unreliable estimates.

### So far, I've found 3 avenues for research.  1) Go Bayesian and use
### MCMC to estimate the model.  2) Specify a likelihood function and
### then use R's optim function (as described in Laura A. Thompson,
### 2007, S-PLUS (and R) Manual to Accompany Agresti's Categorical
### Data Analysis (2002) 2nd edition).  My guess is that either of
### those approaches would be worth the while, but I might have
### trouble persuading a target audience that they have good
### properties.  3) Adapt a continuation ratio approach.

### This latter approach was suggested by a post in r-help by Daniel
### Farewell farewelld_at_Cardiff.ac.uk
### http://tolstoy.newcastle.edu.au/R/help/06/08/32398.html#start
### It pointed me in the direction of continuation ratio logit models
### and one way to estimate an ordinal logit model with random
### parameters.

### Farewell's post gives working example code that shows a way to
### convert a K category ordinal variable into K-1 dichotomous
### indicators (a continuation ratio model). Those K-1 indicators
### can be stacked into one column and then a logistic regression
### program that is written for a two-valued output can be used.
### Farewell reasoned that one might then use a program for two-valued
### outputs including mixed effects.  In his proposal, one would use
### the program lmer (package: lme4) ( a binomial family with a logit
### link) to estimate parameters for a dichotomous logit model with
### random parameters.

### This is the sort of magic trick I had suspected might be possible.
### Still, it is hard to believe it would work.  But in the r-help
### response to the post by Farewell, there is no general objection
### against his modeling strategy.

### I had not studied continuation ratio logit models before, so I
### looked up a few articles on estimation of ordinal models by
### re-coding the output as a sequence of binary comparisons (stop
### ratios, continuation ratios, etc).  The article that is most clear
### on how this can be done to estimate a proportional odds logistic
### model is

### Stephen R. Cole, Paul D. Allison, and Cande V. Ananth,
### Estimation of Cumulative Odds Ratios
### Ann Epidemiol 2004;14:172–178.

### They claim that one can recode an n-chotomy into n-1 dichotomous
### indicators.  Each observation in the original dataset begets n-1
### lines in the augmented version.  After creating the dichotomous
### indicator, one uses an ordinary dichotomous logit model to
### estimate parameters and cutpoints for an ordinal logit
### model. Cole, et al., are very clear.

### There is an additional benefit to the augmented data approach.
### One can explicitly test the proportional odds assumption by checking
### for interactions between the included independent variables and the
### level of the dependent variable being considered.  The Cole
### article shows some examples where the proportional assumption appears
### to be violated.
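
### For concreteness, here is a minimal sketch of the expansion as I
### read Cole et al. (the data frame dat, an ordinal y coded 1:K, and
### a predictor x are hypothetical names):

augment <- function(dat, K) {
    ## each original row begets K-1 rows; ind = 1 when y exceeds cut j
    do.call(rbind, lapply(1:(K - 1), function(j) {
        d <- dat
        d$ind <- as.integer(d$y > j)
        d$cut <- factor(j, levels = 1:(K - 1))
        d
    }))
}
aug <- augment(dat, K = 4)
glm(ind ~ cut + x, family = binomial, data = aug)          # cumulative odds
glm(ind ~ cut + x + cut:x, family = binomial, data = aug)  # PO check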

### To test it, I created the following example.  This shows the
### results of maximum likelihood estimation with the programs polr
### (package:MASS) and lrm (package: Design).  The estimates from
### the augmented data approach are not exactly the same as polr or
### lrm, but they are close.  It appears to me the claims about the
### augmented data approach are mostly correct.  The parameter
### estimates are pretty close to the true values, while the estimates
### of the ordinal cutpoints are a bit difficult to interpret.

### I don't know what to make of the model diagnostics for the augmented
### data model. Should I have confidence in the standard errors?
### How to interpret the degrees of freedom when 3 lines
### of data are manufactured from 1 observation?  Are likelihood-ratio
### (anova) tests valid in this context?  Are these estimates from the
### augmented data equivalent to maximum likelihood?  What does it
### mean that the t-ratios are so different?  That seems to be prima-facie
### evidence that the estimates based on the augmented data set are not
### trustworthy.

### Suppose I convince myself that the estimates of the ordinal model
### are as good as maximum likelihood.  Is it reasonable to take the
### next step, and follow Farewell's idea of using this kind of model
### to estimate a mixture model?  There are K-1 lines per case
### in the augmented data set. Suppose the observations were grouped
### into M sets 

[R] ordered logistic regression with random effects. Howto?

2007-05-07 Thread Paul Johnson
I'd like to estimate an ordinal logistic regression with a random
effect for a grouping variable.   I do not find a pre-packaged
algorithm for this.  I've found methods glmmML (package: glmmML) and
lmer (package: lme4) both work fine with dichotomous dependent
variables. I'd like a model similar to polr (package: MASS) or lrm
(package: Design) that allows random effects.

I was thinking there might be a trick that might allow me to use a
program written for a dichotomous dependent variable with a mixed
effect to estimate such a model.  The proportional odds logistic
regression is often written as a sequence of dichotomous comparisons.
But it seems to me that, if it would work, then somebody would have
proposed it already.

I've found some commentary about methods of fitting ordinal logistic
regression with other procedures, however, and I would like to ask for
your advice and experience with this. In this article,

Ching-Fan Sheu, Fitting mixed-effects models for repeated ordinal
outcomes with the NLMIXED procedure, Behavior Research Methods,
Instruments, & Computers, 2002, 34(2): 151-157,

the author gives an approach that works in SAS's NLMIXED procedure.  In
this approach, one explicitly writes down the probability that each
level will be achieved (using the linear predictor and constants for
each level).  I THINK I could find a way to translate this approach
into a model that can be fitted with either nlme or lmer.  Has someone
done it?

What other strategies for fitting mixed ordinal models are there in R?

Finally, a definitional question.  Is a multi-category logistic
regression (either ordered or not) a member of the glm family?  I had
thought the answer is no, mainly because glm and other R functions for
glms never mention multi-category qualitative dependent variables and
also because the distribution does not seem to fall into the
exponential family.  However, some textbooks do include the
multicategory models in the GLM treatment.


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Segmentation fault/buffer overflow with fix() in Fedora Core 5 from Extras repository

2006-10-19 Thread Paul Johnson
The Fedora Extras update of R found its way onto my systems today and
I noted that fix() and edit() no longer work. There is a program crash
that closes R, but it does not leave a core file.  I've tested
turning off SELinux; it had no effect.

Do you see it too?  What do you think?  It happens on both systems
I've tested. As far as I know, both of these systems are up-to-date.

I restarted with R -d gdb to try to get a backtrace, but gdb says
the debugging symbols have been removed, and I don't see the
debuginfo package in the Extras archive.  I'm attaching the gdb info
below, but I don't think it helps much without line numbers.

I think my next step will be to re-build R on these systems and see if
the problem disappears. Right? If it still crashes, I'll make sure I
have debugging symbols and give you a full backtrace.  If it does not
crash, I'll let you know as well.


Here's the session that crashes


> library(car)
> data(Chile)
> edit(Chile)
*** buffer overflow detected ***: /usr/lib/R/bin/exec/R terminated
=== Backtrace: =
/lib/libc.so.6(__chk_fail+0x29)[0xa8079d]
/lib/libc.so.6[0xa8195d]
/usr/lib/R/modules//R_X11.so[0x7c094a]
/usr/lib/R/modules//R_X11.so[0x7c20dd]
/usr/lib/R/modules//R_X11.so[0x7c3428]
/usr/lib/R/modules//R_X11.so(RX11_dataentry+0xa25)[0x7c4b15]
/usr/lib/R/lib/libR.so[0x2bf4c5]
/usr/lib/R/lib/libR.so[0x1dfd26]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x1b0973]
/usr/lib/R/lib/libR.so[0x1b4d28]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x1b0973]
/usr/lib/R/lib/libR.so[0x1b1887]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x1b0973]
/usr/lib/R/lib/libR.so(Rf_applyClosure+0x2a7)[0x1b2f67]
/usr/lib/R/lib/libR.so[0x1e146f]
/usr/lib/R/lib/libR.so(Rf_usemethod+0x609)[0x1e28d9]
/usr/lib/R/lib/libR.so[0x1e30ae]
/usr/lib/R/lib/libR.so(Rf_eval+0x483)[0x1b0973]
/usr/lib/R/lib/libR.so(Rf_applyClosure+0x2a7)[0x1b2f67]
/usr/lib/R/lib/libR.so(Rf_eval+0x2f4)[0x1b07e4]
/usr/lib/R/lib/libR.so(Rf_ReplIteration+0x311)[0x1d01b1]
/usr/lib/R/lib/libR.so[0x1d03c1]
/usr/lib/R/lib/libR.so(run_Rmainloop+0x60)[0x1d0710]
/usr/lib/R/lib/libR.so(Rf_mainloop+0x1c)[0x1d073c]
/usr/lib/R/bin/exec/R(main+0x46)[0x8048696]
/lib/libc.so.6(__libc_start_main+0xc6)[0x9c41fe]
/usr/lib/R/bin/exec/R[0x8048591]
=== Memory map: 
0011-00329000 r-xp  08:05 553625 /usr/lib/R/lib/libR.so
00329000-00336000 rwxp 00219000 08:05 553625 /usr/lib/R/lib/libR.so
00336000-003cd000 rwxp 00336000 00:00 0
003cd000-003d5000 r-xp  08:05 683486 /lib/libnss_files-2.4.90.so
003d5000-003d6000 r-xp 7000 08:05 683486 /lib/libnss_files-2.4.90.so
003d7000-003f5000 r-xp  08:05 1045723
/usr/lib/R/library/grDevices/libs/grDevices.so
003f5000-003f6000 rwxp 0001d000 08:05 1045723
/usr/lib/R/library/grDevices/libs/grDevices.so
003f6000-003fc000 r-xp  08:05 1046746
/usr/lib/R/library/methods/libs/methods.so
003fc000-003fd000 rwxp 5000 08:05 1046746
/usr/lib/R/library/methods/libs/methods.so
003fd000-0040 r-xp  08:05 1050384
/usr/lib/R/library/tools/libs/tools.so
0040-00401000 rwxp 2000 08:05 1050384
/usr/lib/R/library/tools/libs/tools.so
00413000-0043d000 r-xp  08:05 553410 /usr/lib/R/lib/libRblas.so
0043d000-0043e000 rwxp 00029000 08:05 553410 /usr/lib/R/lib/libRblas.so
0043e000-004b9000 r-xp  08:05 2868184/usr/lib/libgfortran.so.1.0.0
004b9000-004ba000 rwxp 0007b000 08:05 2868184/usr/lib/libgfortran.so.1.0.0
004ba000-0050b000 r-xp  08:05 1049782
/usr/lib/R/library/stats/libs/stats.so
0050b000-0050d000 rwxp 0005 08:05 1049782
/usr/lib/R/library/stats/libs/stats.so
0051-00511000 r-xp 0051 00:00 0  [vdso]
00511000-0060a000 r-xp  08:05 2868912/usr/lib/libX11.so.6.2.0
0060a000-0060e000 rwxp 000f9000 08:05 2868912/usr/lib/libX11.so.6.2.0
00664000-0067b000 r-xp  08:05 683622 /lib/libpcre.so.0.0.1
0067b000-00692000 rwxp 00017000 08:05 683622 /lib/libpcre.so.0.0.1
007bb000-007d4000 r-xp  08:05 1050764/usr/lib/R/modules/R_X11.so
007d4000-007d5000 rwxp 00018000 08:05 1050764/usr/lib/R/modules/R_X11.so
007d5000-007e1000 rwxp 007d5000 00:00 0
00896000-008eb000 r-xp  08:05 2876525/usr/lib/libXt.so.6.0.0
008eb000-008ef000 rwxp 00054000 08:05 2876525/usr/lib/libXt.so.6.0.0
0099-009a7000 r-xp  08:05 683431 /lib/ld-2.4.90.so
009a7000-009a8000 r-xp 00017000 08:05 683431 /lib/ld-2.4.90.so
009a8000-009a9000 rwxp 00018000 08:05 683431 /lib/ld-2.4.90.so
009ab000-00acf000 r-xp  08:05 683432 /lib/libc-2.4.90.so
00acf000-00ad1000 r-xp 00124000 08:05 683432 /lib/libc-2.4.90.so
00ad1000-00ad2000 rwxp 00126000 08:05 683432 /lib/libc-2.4.90.so
00ad2000-00ad5000 rwxp 00ad2000 00:00 0
00ad7000-00afc000 r-xp  08:05 683433 /lib/libm-2.4.90.so
00afc000-00afd000 r-xp 00024000 08:05 683433 /lib/libm-2.4.90.so
00afd000-00afe000 rwxp 00025000 08:05 683433 /lib/libm-2.4.90.so
00b0-00b02000 r-xp  08:05 683435   

Re: [R] Generate object names from variables

2006-07-14 Thread Paul Johnson
I collected some advice about this question a couple of years ago.
This might help:

http://pj.freefaculty.org/R/Rtips.html#2.1 ("Add variables to a data
frame (or list)"), and the next entry after that.


On 7/14/06, Marc Schwartz (via MN) [EMAIL PROTECTED] wrote:
 On Fri, 2006-07-14 at 16:57 +0200, Georg Otto wrote:
  Hi,
 
  I want to generate object names out of variables in a sort of variable
  substitution.
 
  first i generate some vectors and an empty list:
 
   vector.a <- c("a","b")
   vector.b <- c("c","d")
   vector.c <- c("e","f")
   vectors <- c("vector.a", "vector.b", "vector.c")
   vectors
  [1] "vector.a" "vector.b" "vector.c"
 
   vectorlist <- list()
 
  What I would then like to do is to generate elements of the list by
  using variables, somehow like this (does not work):
 
  for (i in vectors) {
+ list$i <- i
  + }
 
  To end up with a list like this:
 
   list
  $vector.a
   [1] "a" "b"
  $vector.b
   [1] "c" "d"
  $vector.c
   [1] "e" "f"
 
 
  Any hint will be appreciated.
 
  Cheers,
 
  Georg


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Can't there be a cd command?

2006-05-10 Thread Paul Johnson
It is a FAQ in our Linux lab.  People start emacs and fire up R via
ESS, and then they have no idea where they are.  For computer
experts, it is not a problem, but for people who don't know much about
computers, it is a pretty big problem.  They have data in some
subdirectory, but almost invariably they don't start emacs & R
from that same place.

Unfortunately, for our users, it does not help to simply re-label
setwd as cd.  Both commands imply a deeper understanding of the OS
than they have.  Also, unfortunately, these are the same people who
don't understand that FAQs exist and should be consulted. These people
are so new/timid that asking in r-help would be the last thing to
cross their mind.

I've wondered if it would not help to have the R prompt include the
directory name, as in an x terminal.
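Something along those lines might work with a task callback (a sketch
using base R's addTaskCallback, which runs after every top-level
command):

invisible(addTaskCallback(function(expr, value, ok, visible) {
    options(prompt = paste(basename(getwd()), "> ", sep = ""))
    TRUE   # returning TRUE keeps the callback installed
}))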

pj

On 5/10/06, Prof Brian Ripley [EMAIL PROTECTED] wrote:


 Exactly.  I don't think I have ever used setwd() on Linux.

 Also, I have never seen this asked before for a Unix-alike, so it seems
 not to be a FAQ.  There is a common tendency for users who run into a
 problem to think everyone does too, and it isn't necessarily so.
 Frequently asked questions do make it to the FAQs: it is a defence
 mechanism for the volunteers supporting R.

 help.search("directory") gets you there.

 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595




--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Making R talk to Win/OpenBUGS in Linux (again)

2006-05-04 Thread Paul Johnson
 that programs can use, and I've
 fiddled this lots of ways, but it fails, saying
 
 
 Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found
 library(R2WinBUGS)
 
 I hope that by giving you this small not-yet-working example, you can
 spot where I'm going wrong.
 
 ##Paul Johnson 2006-04-29
 
 
 library(R2WinBUGS)
 # Copied from Prof Andrew Gelman's example
 model.file <- system.file(package = "R2WinBUGS", "model", "schools.txt")
 #
 file.show(model.file)
 data(schools)
 schools
 
 J <- nrow(schools)
 y <- schools$estimate
 sigma.y <- schools$sd
 data <- list("J", "y", "sigma.y")
 inits <- function(){
     list(theta = rnorm(J, 0, 100), mu.theta = rnorm(1, 0, 100),
          sigma.theta = runif(1, 0, 100))
 }
 parameters <- c("theta", "mu.theta", "sigma.theta")
 
 
 schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
     n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
     working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
     newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/winepath")
 
 
 
 I do have the binary /usr/bin/winepath, but can't tell how to give
 R2WinBUGS what it wants.  Many failed efforts below:
 
 
 
 
 schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
     n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
     working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
     newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/")
 
 Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found
 
 
 schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
     n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
     working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
     newWINE = TRUE, WINE = "/usr/bin/wine")
 
 Error in if (!nchar(WINEPATH)) { : argument is of length zero
 
 
 schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
     n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
     working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
     newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin")
 
 Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found
 
 
 
 I know there are many unknowns here, but hope somebody has at least
 one working example that we can build on.
 
 
 




--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] using latex() in R for Unix

2006-05-04 Thread Paul Johnson
This thread piqued my interest in how to use Hmisc latex() to produce
the tables that I actually want, rather than the ones that come out by
default.  Actually, I'd be glad to use R2HTML or any other tool if I
can make the output suit my taste.

 Here's a small working example that does not require any special libraries.


x <- rnorm(100)
y <- rnorm(100)
mylm <- lm(y ~ x)
Estimates <- cbind(coef(mylm), sqrt(diag(vcov(mylm))))
colnames(Estimates) <- c("OLS estimate", "std. error")
rownames(Estimates) <- c("Intercept", "Education")
latex(Estimates, file = "/home/pauljohn/test4.tex", digits = 4)

That creates the top part of the table like I want it, except there
are no stars on the standard errors.

I want to add the diagnostic information, sample size, the R-square,
and root mean square error:

sumobj <- summary(mylm)
## tedious pasting alert:
sumStat <- paste("N =", sum(sumobj$df[-1]), ", R^2 =",
                 sumobj$r.squared, ", RMSE =", sumobj$sigma)

# The following does not have the desired effect, because it creates
# and appends a whole new table:

latex(sumStat, file = "/home/pauljohn/test4.tex", append = TRUE)

How can I snug up sumStat right at the end of the table?
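One possibility I have been meaning to try (an assumption on my part --
check whether your Hmisc version has it): latex() takes an
insert.bottom argument that places text just below the tabular, which
sounds like exactly this kind of snugging-up:

latex(Estimates, file = "/home/pauljohn/test4.tex", digits = 4,
      insert.bottom = sumStat)   # insert.bottom assumed available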

pj


On 4/5/06, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
 Peter Dalgaard wrote:
  Brian Quinif [EMAIL PROTECTED] writes:
 
 
 Yes, Peter, I do mean the latex() function from the Hmisc package.
 Below is a reproducible example.  Also, does anyone know how to stop R
 from automatically running LaTeX on the file produced by latex()?
 
 
  The last bit is easy. It's the print method that does that, so just
  don't print. E.g.
 
   invisible(latex())

 Or do w <- latex(..., file='...')

 Frank
 


 --
 Frank E Harrell Jr   Professor and Chair   School of Medicine
   Department of Biostatistics   Vanderbilt University




--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Making R talk to Win/OpenBUGS in Linux (again)

2006-05-01 Thread Paul Johnson
Dear Jun:

How about telling us which version of Linux you use and how you make
linbugs run?  As far as I can tell, the OpenBUGS people have
intentionally removed linbugs from their version 2.2.  That leaves us
with various scripts that people have posted & tried; none work for
me.  On Fedora Core 5, I cannot get any of the linbugs scripts that
are floating around to work.

I tried re-building the program thus:


# gcc -o cbugs CBugs.c  -ldl

That creates an executable cbugs, but there is no joy for me.

$ ./cbugs
failed to install signal
32
failed to install signal
33


* BlackBox
* illegal memory read [ad = ]
- HostFiles.NewFileRef  (pc=0D9B, fp=BFC73BE4)
- HostFiles.OpenFile  (pc=0E0E, fp=BFC73BFC)
- HostFiles.Directory.Old  (pc=29B2, fp=BFC73EA4)
- StdLoader.ThisObjFile  (pc=031D, fp=BFC744C0)
- StdLoader.ReadHeader  (pc=075F, fp=BFC746E0)
- StdLoader.LoadMod  (pc=107B, fp=BFC747F8)
- StdLoader.LoadMod  (pc=11E0, fp=BFC74910)
- StdLoader.LoadMod  (pc=11E0, fp=BFC74A28)
- StdLoader.Hook.ThisMod  (pc=1355, fp=BFC74A3C)
- Kernel.ThisMod  (pc=1224, fp=BFC74B58)
- Init.Init  (pc=004B, fp=BFC74B70)
- Init.$$  (pc=000A, fp=BFC74B80)




# export LD_ASSUME_KERNEL=2.4.1


# ./cbugs
./cbugs: error while loading shared libraries: libdl.so.2: cannot open
shared object file: No such file or directory

$ ldd ./cbugs
linux-gate.so.1 =>  (0x00a2b000)
libdl.so.2 => /lib/libdl.so.2 (0x00ba9000)
libc.so.6 => /lib/libc.so.6 (0x00a4d000)
/lib/ld-linux.so.2 (0x00a2c000)

And after doing that LD_ASSUME_KERNEL thingie, nothing in the shell
works anymore...

$ which cbugs
/usr/bin/which: error while loading shared libraries: libc.so.6:
cannot open shared object file: No such file or directory

On 5/1/06, jun yan [EMAIL PROTECTED] wrote:
 I have used linbugs with the rbugs package for recent work. It might be
 worthwhile trying.

  Jun



 On 5/1/06, Uwe Ligges [EMAIL PROTECTED]
 wrote:
  Paul Johnson wrote:
 
   Thank you very much.  With the insertion of WINEPATH declaration, then
   the following example program does run.  And really fast, too!
  
  
  
  
  
   library(R2WinBUGS)
  
    WINEPATH <- "/usr/bin/winepath"
 
 
  Will be fixed in the package real soon now.
  Gregor, many thanks for the patches.
 
  Best,
  Uwe Ligges
 
 
    # An example model file is given in:
    model.file <- system.file(package = "R2WinBUGS", "model", "schools.txt")
    # Let's take a look:
    file.show(model.file)
   
    # Some example data (see ?schools for details):
    data(schools)
    schools
   
    J <- nrow(schools)
    y <- schools$estimate
    sigma.y <- schools$sd
    data <- list("J", "y", "sigma.y")
    inits <- function(){
        list(theta = rnorm(J, 0, 100), mu.theta = rnorm(1, 0, 100),
             sigma.theta = runif(1, 0, 100))
    }
    parameters <- c("theta", "mu.theta", "sigma.theta")
   
   
    schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
        n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
        working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
        newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/winepath")
  
  
  
  
   On 4/30/06, Gregor Gorjanc  [EMAIL PROTECTED] wrote:
  
  Hello Paul,
  
  thank you very much for this report. You caught a bug in R2WinBUGS that
  was introduced by me. I added support for winepath in 1.1-1 version.
  Since I switch between Windows and Linux I always set WINEPATH and then
  use it in bugs(). That's why I forgot to add it in further calls in
  bugs(). How silly :( This actually means that nobody else besides you
  tried to run R2WinBUGS under Linux. Too bad. Please report a summary to
  the BUGS list, as you have also asked there for help and I did not have
  time to answer you. I have been successfully running R2WinBUGS under Linux
  for quite some time now.
  
  Now you have the following options:
  
  A) Set WINEPATH before you call bugs(),
  
  i.e.
  
  WINEPATH <- "/usr/bin/winepath"
  
  bugs(..., WINEPATH = WINEPATH)
  
  This is the fastest approach for you.
  
  B) Apply the following diffs, build and install R2WinBUGS by yourself.
  
  I am also sending them to the maintainer of the package. So this should be
  fixed soon also on CRAN. Uwe?
  
  --- R2WinBUGSOrig/R/bugs.R  2006-04-29 21:44:59.0 +0200
  +++ R2WinBUGS/R/bugs.R  2006-04-30 12:52:36.0 +0200
  @@ -44,7 +44,7 @@
 } else new.model.file - model.file
 bugs.script(parameters.to.save, n.chains, n.iter, n.burnin, n.thin,
   bugs.directory, new.model.file , debug=debug,
 is.inits=!is.null(inits),
  -bin = bin, DIC = DIC, useWINE = useWINE, newWINE = newWINE)
  +bin = bin, DIC = DIC, useWINE = useWINE, newWINE = newWINE,
  WINEPATH=WINEPATH)
 #if (useWINE  missing(bugs.directory))
 #  bugs.directory - winedriveTr(bugs.directory)
 bugs.run(n.burnin, bugs.directory, WINE = WINE)
  
  --- R2WinBUGSOrig/R/bugs.script.R   2006-04-29 21:44:59.0 +0200
  +++ R2WinBUGS/R

Re: [R] Making R talk to Win/OpenBUGS in Linux (again)

2006-04-30 Thread Paul Johnson
 of
  R2WinBUGS, and R2WinBUGS is now buildable on Linux, it looks like
  R2WinBUGS may be the way to go.  But it fails.  The error centers
  around the WINEPATH setting.  I've learned that wine has a function
  winepath that returns information that programs can use, and I've
  fiddled this lots of ways, but it fails, saying
 
 
   Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found
  library(R2WinBUGS)
 
  I hope that by giving you this small not-yet-working example, you can
  spot where I'm going wrong.
 
  ##Paul Johnson 2006-04-29
 
 
  library(R2WinBUGS)
  # Copied from Prof Andrew Gelman's example
  model.file <- system.file(package = "R2WinBUGS", "model", "schools.txt")
  #
  file.show(model.file)
  data(schools)
  schools
 
  J <- nrow(schools)
  y <- schools$estimate
  sigma.y <- schools$sd
  data <- list("J", "y", "sigma.y")
  inits <- function(){
      list(theta = rnorm(J, 0, 100), mu.theta = rnorm(1, 0, 100),
           sigma.theta = runif(1, 0, 100))
  }
  parameters <- c("theta", "mu.theta", "sigma.theta")
 
 
  schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
      n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
      working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
      newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/winepath")
 
 
 
  I do have the binary /usr/bin/winepath, but can't tell how to give
  R2WinBUGS what it wants.  Many failed efforts below:
 
 
 
  schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
      n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
      working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
      newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/")
 
  Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found
 
  schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
      n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
      working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
      newWINE = TRUE, WINE = "/usr/bin/wine")
 
  Error in if (!nchar(WINEPATH)) { : argument is of length zero
 
  schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
      n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
      working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
      newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin")
 
  Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found
 
 
 
 
  I know there are many unknowns here, but hope somebody has at least
  one working example that we can build on.


 --
 Lep pozdrav / With regards,
 Gregor Gorjanc

 --
 University of Ljubljana PhD student
 Biotechnical Faculty
 Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan
 Groblje 3   mail: gregor.gorjanc at bfro.uni-lj.si

 SI-1230 Domzale tel: +386 (0)1 72 17 861
 Slovenia, Europefax: +386 (0)1 72 17 888

 --
 One must learn by doing the thing; for though you think you know it,
  you have no certainty until you try. Sophocles ~ 450 B.C.
 --





--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Making R talk to Win/OpenBUGS in Linux (again)

2006-04-29 Thread Paul Johnson
I'm back!

I've just learned that, on a fully updated Fedora Core Linux 5 system,
the working solution to access Winbugs under wine via the R package
rbugs no longer works.  Here was my last post on this topic (with
the formerly working solution) from January.

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/68497.html

Currently, what happens is that WinBUGS starts up, but just sits there
doing nothing.  I do not know if the problem is due to a change in
wine or rbugs, and since both of them are updated, it is hard to say.
I'm thinking it is a wine or perhaps even a kernel-related problem.
Inside WinBUGS running under wine, it does not even work to manually
open the bugs.script file and then choose Model/Script.  It just
returns a screen full of errors saying the commands fail.  I don't
think rbugs gets even that far, however, since the WinBUGS window does
pop up, but nothing happens.

rbugs has been retooled to emphasize use of linbugs, the
OpenBUGS-based thing for linux, but I can't get linbugs to run at all
on my system, and the linbugs program itself appears to have been
removed from OpenBUGS distribution altogether. (I'm following that
with separate email to the OpenBUGS group).

I think JAGS is the right long term solution, but currently I'm in the
middle of a semester trying to teach about Bayesian stats and the
students have some familiarity with WinBUGS and I'm interested in
making it work. So while I'm learning more about JAGS and the bayesmix
package that supports access to it from R, I still would like a way to
interact with Winbugs through Wine.

If anybody has rbugs working in current Linux, please tell me
how--give me a working example?

In the meanwhile, I've noticed that nightly updates have successfully
installed R2WinBUGS on linux systems and I've got a small test case
that I'd like to ask about.  Since rbugs is a linux adaptation of
R2WinBUGS, and R2WinBUGS is now buildable on Linux, it looks like
R2WinBUGS may be the way to go.  But it fails.  The error centers
around the WINEPATH setting.  I've learned that wine has a function
winepath that returns information that programs can use, and I've
fiddled this lots of ways, but it fails, saying


Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found
library(R2WinBUGS)

I hope that by giving you this small not-yet-working example, you can
spot where I'm going wrong.

##Paul Johnson 2006-04-29


library(R2WinBUGS)
# Copied from Prof Andrew Gelman's example
model.file <- system.file(package = "R2WinBUGS", "model", "schools.txt")
#
file.show(model.file)
data(schools)
schools

J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
data <- list("J", "y", "sigma.y")
inits <- function(){
    list(theta = rnorm(J, 0, 100), mu.theta = rnorm(1, 0, 100),
         sigma.theta = runif(1, 0, 100))
}
parameters <- c("theta", "mu.theta", "sigma.theta")


schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
    n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
    working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
    newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/winepath")



I do have the binary /usr/bin/winepath, but can't tell how to give
R2WinBUGS what it wants.  Many failed efforts below:


 schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
     n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
     working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
     newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin/")
Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found
 schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
     n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
     working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
     newWINE = TRUE, WINE = "/usr/bin/wine")
Error in if (!nchar(WINEPATH)) { : argument is of length zero
 schools.sim <- bugs(data, inits, parameters, model.file, n.chains = 3,
     n.iter = 1000, bugs.directory = "/usr/local/share/WinBUGS14/",
     working.directory = "/tmp", clearWD = FALSE, useWINE = TRUE,
     newWINE = TRUE, WINE = "/usr/bin/wine", WINEPATH = "/usr/bin")
Error in paste(WINEPATH, "-w", x) : object "WINEPATH" not found


I know there are many unknowns here, but hope somebody has at least
one working example that we can build on.


--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] need automake/autoconf help to build RNetCDF and ncdf packages

2006-04-25 Thread Paul Johnson
I imagine this "where are your header files" problem comes up in other
packages, so I'm asking this as a general R question: how should
configure scripts be rewritten so they look in more places?


Briefly, the problem is that Fedora-Extras installs the header files
in a subdirectory /usr/include/netcdf-3 rather than /usr/include:

# rpm -ql netcdf-devel
/usr/include/netcdf-3
/usr/include/netcdf-3/ncvalues.h
/usr/include/netcdf-3/netcdf.h
/usr/lib/netcdf-3/libnetcdf.a
/usr/lib/netcdf-3/libnetcdf_c++.a
/usr/lib/netcdf-3/libnetcdf_g77.a

Last week I posted in this list that I re-built the Fedora-Extras
netcdf rpm so that it would have more standard installation, and then
I was able to make RNetCDF work.

In the meanwhile,  I posted in bugzilla.redhat.com asking if they
might use the standard packaging, but their response is an adamant
refusal:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189734

When netcdf updates are issued in the Fedora-Extras network, the
special hacks I put in to un-do their special hacks are lost, and
netcdf programs don't work anymore.

The attempt to build ncdf fails inside R or on the command line, but
it gives a GOOD HINT about a command-line workaround:
# R CMD INSTALL ncdf_1.5.tar.gz
[...]

checking /sw/include/netcdf.h presence... no
checking for /sw/include/netcdf.h... no

Fatal error: I cannot find the directory that holds the netcdf include
file netcdf.h!
You can specify it as follows:
  ./configure --with-netcdf_incdir=directory_with_file_netcdf.h

 *** Special note for R CMD INSTALL users: *
 The syntax for specifying multiple --configure-args does not seem to be
 well documented in R.  If you have installed the netcdf include and library
 directories in some non-standard location, you can specify BOTH these
 during the R CMD INSTALL process using the following syntax:

   R CMD INSTALL
     --configure-args="-with-netcdf_incdir=/path/to/netcdf/incdir
     -with-netcdf_libdir=/path/to/netcdf/libdir" ncdf_1.1.tar.gz

 where you should, of course, specify your own netcdf include and library
 directories, and the actual package name.
 ***


I found that the following did work!

# R CMD INSTALL \
    --configure-args="-with-netcdf_incdir=/usr/include/netcdf-3 \
    -with-netcdf_libdir=/usr/lib/netcdf-3" ncdf_1.5.tar.gz

It is not the best solution, because special administrative effort is
required. And the install.packages approach inside R won't work.

However, with RNetCDF, the problem is slightly worse, and no such
helpful message appears:

# R CMD INSTALL  RNetCDF_1.1-3.tar.gz
* Installing *source* package 'RNetCDF' ...
checking for gcc... gcc
checking for C compiler default output... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for executable suffix...
checking for object suffix... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for main in -lnetcdf... no
configure: error: netcdf library not found
ERROR: configuration failed for package 'RNetCDF'
** Removing '/usr/lib/R/library/RNetCDF'


I have no reason to doubt that the Fedora-Extras authors are right,
and that some changes in the configure scripts for these packages are
required.

In RNetCDF's configure.ac file, I see the place where it specifies
NETCDF_INCDIR:

if test -z "${NETCDF_PATH}"; then
    AC_CHECK_FILE(/usr/local/include/netcdf.h,
        [USR_LOCAL_NETCDF_H=TRUE], [USR_LOCAL_NETCDF_H=FALSE])
    if test "${USR_LOCAL_NETCDF_H}" = "TRUE"; then
        NETCDF_INCDIR=/usr/local/include
        NETCDF_LIBDIR=/usr/local/lib
        NETCDF_LIBNAME=netcdf
        HAVE_NETCDF_H=TRUE
    elif test "${HAVE_NETCDF_H}" = "FALSE"; then
        AC_CHECK_FILE(/usr/include/netcdf.h,
            [USR_NETCDF_H=TRUE], [USR_NETCDF_H=FALSE])
        if test "${USR_NETCDF_H}" = "TRUE"; then
            NETCDF_INCDIR=/usr/include
            NETCDF_LIBDIR=/usr/lib
            NETCDF_LIBNAME=netcdf
            HAVE_NETCDF_H=TRUE
        fi
    fi
else
    NETCDF_INCDIR=${NETCDF_PATH}/include
    NETCDF_LIBDIR=${NETCDF_PATH}/lib
    NETCDF_LIBNAME=netcdf
    AC_CHECK_FILE(${NETCDF_INCDIR}/netcdf.h,
        [INCDIR_NETCDF_H=TRUE], [INCDIR_NETCDF_H=FALSE])
    if test "${INCDIR_NETCDF_H}" = "TRUE"; then
        HAVE_NETCDF_H=TRUE
    fi
fi

I've tried fiddling around in this, and then typing

# autoconf configure.ac > newconfigure

# sh ./newconfigure

But it always ends the same:

checking for main in -lnetcdf... no
: error: netcdf library not found

So, is there somebody here who knows how configure scripts ought to be
written to accommodate this?



--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


[R] using betareg: problems with anova and predict

2006-04-17 Thread Paul Johnson
Dear R-helpers:

We have had fun using betareg to fit models with proportions as
dependent variables.

However, in the analysis of these models we found some wrinkles and
don't know where is the best place to start looking for a fix.

The problems we see (so far) are that

1. predict ignores newdata
2. anova does not work

Here is the small working example:


x <- c(1, 3, 1, 4, 5, 3, 5, 2, 5, 2)
y <- c(.3, .4, .4, .5, .7, .4, .3, .4, .3, .5)
x2 <- c(4, 2, 1, 5, 1, 2, 4, 2, 1, 3)
library(betareg)
mybreg <- betareg(y ~ x)
summary(mybreg)

predict(mybreg, newdata = data.frame(x = c(2, 2)))

mybreg2 <- betareg(y ~ x + x2)
summary(mybreg2)
anova(mybreg, mybreg2)

---
Example output:

> predict(mybreg, newdata=data.frame(x=c(2,2)))
 [1] 0.3903155 0.4207632 0.3903155 0.4362319 0.4518258 0.4207632 0.4518258
 [8] 0.4054484 0.4518258 0.4054484
...
> anova(mybreg, mybreg2)
Error in as.double.default(lapply(objects, df.residual)) :
(list) object cannot be coerced to 'double'

I have been digging in this a little bit and notice betareg does not
return the log likelihood at the maximum likelihood estimates, but I
am able to hack the file /usr/lib/R/library/betareg/R/betareg and save
the value returned by the optim call and print it out, so at least I
can do a likelihood-ratio test by hand in place of the anova.  But I'm
stumped on why predict ignores newdata.
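For the record, the by-hand LR test is short (a sketch; ll1 and ll2
stand for the maximized log-likelihoods recovered from the hacked
betareg runs):

LR <- 2 * (ll2 - ll1)                   # mybreg2 adds one parameter (x2)
pchisq(LR, df = 1, lower.tail = FALSE)  # chi-square p-value, 1 df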

I'm using R-2.1.1 on Fedora Core Linux 5 with the betareg package from CRAN.

In case you are interested in betareg, we have found this article to
be extremely readable and helpful: Ferrari, S.L.P. and Cribari-Neto, F.
(2004). Beta regression for modeling rates and proportions. Journal of
Applied Statistics, 31(7), 799-815.


--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Solution: Making RNetCDF work on Fedora Linux

2006-04-12 Thread Paul Johnson
Dear R users who might like to use the package RNetCDF on Fedora Linux:

Fedora (versions 4 and 5) users might have noticed that the default 
install of the netcdf and netcdf-devel packages from the Fedora Extra 
archive is inconsistent with the R package RNetCDF.  The attempt to 
install RNetCDF fails in the configure stage because the
header & library info for netcdf cannot be found.

In here,

http://pj.freefaculty.org/software/PolsFC5Updates/

I just deposited an updated set of RPMS for netcdf version 3.6.1.  You 
need the netcdf and netcdf-devel packages in order to install the 
RNetCDF package inside R.  You don't need debuginfo, unless you are 
trying to use the GNU debugger to find bugs in netcdf itself. 

The problem installing RNetCDF was caused by the Fedora-Extras package 
maintainer's use of the install directory /usr/include/netcdf-3 rather 
than /usr/include.   I had several very productive emails with RNetCDF's 
packager Pavel Michna michna at giub.unibe.ch and after a while, I 
concluded that the Fedora-Extras packaging is just wrong, and there's no 
point in trying to hack RNetCDF to work around it. So  I removed the 
unusual packaging. While I was in the repackaging mood, I built the RPMS 
for the netcdf version 3.6.1, one notch newer than Fedora-extra 
currently offers.   If you install these instead of the netcdf that 
Fedora extras provides, then it will work fine to do the RNetCDF install.

Ordinarily, you would face the trouble that the automatic updates for 
Fedora (via yum) might obliterate the netcdf that I give you. You could 
exclude netcdf in your global yum settings, but in this case you might 
not have to worry.  The package version I give is 3.6.1, whereas the 
Extras has only 3.6.0, and I gave the package release number 10.  So 
until the extras folk get up to 3.6.1 and release 10, I think your yum 
updates will leave this one alone.  Of course, if they package up 3.6.2, 
which is supposed to be released soon, then you will be in trouble.

I've posted the SPEC file and SRPM file in case you want to build your
own RPMS.  It's fun!

-- 
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700



[R] Problems in package management after Linux system upgrade

2006-04-05 Thread Paul Johnson
I upgraded from Fedora Core 4 to Fedora Core 5 and I find a lot of
previously installed packages won't run because shared libraries or
other system things have changed out from under the installed R
libraries.  I do not know for sure if the R version now from
Fedora-Extras (2.2.1) is exactly the same one I was using in FC4.

I see problems in many packages. Example, Hmisc:
unable to load shared library '/usr/lib/R/library/Hmisc/libs/Hmisc.so':
  libgfortran.so.0: cannot open shared object file: No such file or directory
Error in library(Hmisc) : .First.lib failed for 'Hmisc'


If I manually do a re-install, then Hmisc and the other packages are fine.

I ?THINK? that if I had been using version 2.2.0, and now I have
2.2.1, then the checkBuilt option would force a rebuild on all
packages. Right?

I'm pasting in below the script that I run nightly to update all
packages and install any new ones in CRAN.  Can anybody suggest
changes that might cause a rebuild of packages that need rebuilding?



## PJ 2005-11-05
options(repos = "http://cran.cnr.berkeley.edu/")

#failPackages is the black list.  Things get inserted for various reasons

#rejected because they don't build on my system as of July, 2005, or
are obviously not needed
failPackages1 <-
    c("BRugs", "tclkt2", "cyclones", "rpvm", "ncdf", "gtkDevice", "gap",
      "gnomeGUI", "mimR", "pathmix", "rcdd", "rgdal", "rpvm", "Rmpi",
      "RQuantLib", "RMySQL", "RNetCDF", "RODBC", "ROracle", "rsprng",
      "RWinEdt", "taskPR")

#rejected because I subjectively think we don't need them
failPackages2 <-
    c("aaMI", "AlgDesign", "bim", "caMassClass", "CGIwithR", "CDNmoney",
      "clac", "clim.pact", "compositions", "cyclones", "hapassoc",
      "haplo.score", "haplo.stats", "hapsim", "httpRequest", "labdsv",
      "kza", "LMGene", "Malmig", "magic", "negenes", "oz", "papply",
      "spe", "wavethresh", "waveslim", "tdthap")


failPackages3 <- c("rcom", "Rlsf")

#put the 3 sets of rejects together
failPackages <- union(failPackages1, union(failPackages2, failPackages3))

#list of all currently installed packages
installedPackages <- rownames(installed.packages())

#do any installed packages need removal because they are on the blacklist?
needRemoval <- installedPackages %in% failPackages

# remove any blacklisted packages if they are already installed.
if (sum(needRemoval) > 0) remove.packages(installedPackages[needRemoval])


#update the ones you want to keep
update.packages(ask = FALSE, checkBuilt = TRUE)

#get list of all new packages on CRAN
theNew <- new.packages()


#do any of the new packages belong to the black list?
shouldFail <- theNew %in% failPackages

#install non blacklisted packages that are in theNew list
if (sum(!shouldFail) > 0) install.packages(theNew[!shouldFail], dependencies = TRUE)

# VGAM is not in CRAN yet, but Zelig will use it.

if ("VGAM" %in% installedPackages)
  update.packages(repos = "http://www.stat.auckland.ac.nz/~yee", ask = FALSE) else
  install.packages("VGAM", repos = "http://www.stat.auckland.ac.nz/~yee")

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Want to fit random intercept in logistic regression (testing lmer and glmmML)

2006-03-08 Thread Paul Johnson
Greetings.  Here is sample code, with some comments. It shows how I
can simulate data and estimate glm with binomial family when there is
no individual level random error, but when I add random error into the
linear predictor, I have a difficult time getting reasonable estimates
of the model parameters or the variance component.

There are no clusters here, just individual level responses, so
perhaps I am misunderstanding the translation from my simple mixed
model to the syntax of glmmML and lmer.  I get roughly the same
(inaccurate) fixed effect parameter estimates from glmmML and lmer,
but the information they give on the variance components is quite
different.

Thanks in advance.

Now I paste in the example code

### Paul Johnson [EMAIL PROTECTED]
### 2006-03-08

N <- 1000
A <- -1
B <- 0.3


x <- 1 + 10 * rnorm(N)
eta <- A + B * x

pi <- exp(eta)/(1 + exp(eta))

myunif <- runif(N)

y <- ifelse(myunif < pi, 1, 0)

plot(x, y, main = bquote(eta[i] == .(A) + .(B) * x[i]))

text(0.5 * max(x), 0.5, expression(Prob(y[i] == 1) == frac(1, 1 +
     exp(-eta[i]))))


myglm1 <- glm(y ~ x, family = binomial(link = "logit"))
summary(myglm1)

## Just for fun
myglm2 <- glm(y ~ x, family = quasibinomial)
summary(myglm2)



### Mixed model: random intercept with large variance

eta <- A + B * x + 5 * rnorm(N)
pi <- exp(eta)/(1 + exp(eta))
myunif <- runif(N)
y <- ifelse(myunif < pi, 1, 0)

plot(x, y, main = bquote(eta[i] == .(A) + .(B) * x[i] + u[i]))

text(0.5 * max(x), 0.5, expression(Prob(y[i] == 1) == frac(1, 1 +
     exp(-eta[i]))))

### Parameter estimates go to hell, as expected
myglm3 <- glm(y ~ x, family = binomial(link = "logit"))
summary(myglm3)

### Why doesn't quasibinomial show more evidence of the random intercept?
myglm4 <- glm(y ~ x, family = quasibinomial)
summary(myglm4)



# Can I estimate with lmer?


library(lme4)

### With no repeated observations, what does lmer want?
### Clusters of size 1 ?
### In lme, I'd need random= ~ 1

idnumber <- 1:length(y)

mylmer1 <- lmer(y ~ x + (1 | idnumber), family = binomial, method = "Laplace")
summary(mylmer1)


### Glmm wants clusters, and I don't have any clusters with more than
1 observation
###

library(glmmML)


myglmm1 <- glmmML(y ~ x, family = binomial, cluster = idnumber)

summary(myglmm1)

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] how to make plotmath expression work together with paste

2006-03-05 Thread Paul Johnson
Recent questions about using plotmath have renewed my interest in this question.

I want to have expressions take values of variables from the
environment. I am able to use expressions, and I am able to use paste
to put text and values of variables into
plots.  But the two things just won't work together.

Here is some example code that shows what I mean.


plot(NA,xlim=c(0,100),ylim=c(0,100))

# show plotmath works
text(16, 22, expression(slope == frac(partialdiff * f(hat(theta)),
                                      partialdiff * hat(theta))))

# I want to put values of variables into the middle of expressions
avar1 <- 10

# prove I am able to use paste!
text(40, 40, paste("a word", avar1, "other words"))

# But I'm completely frustrated by the problem of making paste and
# expression work together.

amath1 <- expression(slope == frac(partialdiff * f(hat(theta)),
                                   partialdiff * hat(theta)))

# still works
text(10,60,amath1)


# paste breaks math output
text(60, 60, paste(amath1, "=", avar1))


# Can't paste expression
text(20, 30, paste("a word", avar1, expression( partialdiff )))



# paste does not work anymore--value of avar1 not replaced
text(50,80, expression(paste(slope == frac(partialdiff *
f(hat(theta)),partialdiff * hat(theta)), avar1)))

n <- 12

# This does get the avar1 value in the string and put it in, but it
# piles the labels on top of each other. Wish they went side by side.
text(12, 15, labels=c(expression(bar(x) == sum(frac(x[i], n), i==1, n)),
     paste("gamma =", avar1)), cex = .8)
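
# For the record, one work-around that does combine values and plotmath
# (a sketch): bquote() substitutes the value of anything wrapped in .(),
# which paste() cannot do.
text(50, 90, bquote(gamma == .(avar1)))
text(70, 45, bquote(slope == .(avar1) ~ "(from avar1)"))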




--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] glm gives t test sometimes, z test others. Why?

2006-03-04 Thread Paul Johnson
I just ran example(glm) and happened to notice that models based on
the Gamma distribution gives a t test, while the Poisson models give a
z test. Why?

Both are b/s.e., aren't they?

I can't find documentation supporting the claim that the distribution
is more like t in one case than another, except in the Gaussian case
(where it really is t).

Aren't all of the others approximations based on the Wald idea that

   bhat^2 / Var(bhat)

is asymptotically Chi-square?  And that sqrt(Chi-square) is Normal.
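
A quick check of the b/s.e. reading (a sketch with invented Poisson data):

xx <- rnorm(50)
cnt <- rpois(50, lambda = exp(1 + 0.2 * xx))
pfit <- glm(cnt ~ xx, family = poisson)
ctab <- coef(summary(pfit))
ctab[, "Estimate"] / ctab[, "Std. Error"]  # reproduces the z value column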


While I'm asking, I wonder if glm should report them at all.  I've
followed up on Prof Ripley's advice to read the Hauck & Donner article
and its successors, and I'm persuaded that we ought to just use the
likelihood ratio test to decide about individual parameters.

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] predict.glm - how to?

2006-03-02 Thread Paul Johnson
On 3/2/06, Laurits Søgaard Nielsen [EMAIL PROTECTED] wrote:
 Hi

 I have a little R problem. I have created a GLM model in R and now I want to
 predict some values outside the values I have in the model (extrapolate).


myglm <- glm( some stuff here )
whatever <- some-new-hypothetical-data-you-create
predict(myglm, newdata=whatever, type="response")

I have hints on this in Rtips

http://pj.freefaculty.org/R/Rtips.html#7.5
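
Here is a self-contained sketch of the same idea (data invented for
illustration):

dat <- data.frame(x = 1:10, y = c(0, 0, 0, 1, 0, 1, 1, 1, 1, 1))
myglm <- glm(y ~ x, family = binomial, data = dat)
whatever <- data.frame(x = c(12, 15, 20))  # outside the fitted range
predict(myglm, newdata = whatever, type = "response")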


--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Specification decisions in glm and lmer

2006-02-28 Thread Paul Johnson
I have been reviewing GLM and LMER to sharpen up some course notes and
would like to ask for your advice.

1. Is there a test that would be used to check whether a particular
functional form--say Gaussian, Gamma, or Inverse Gaussian, is more
appropriate in a Generalized Linear Model?  A theoretical reason to
choose one over the other is enough for me, but I've got some
skeptical students who want a significance test.  I've done some
simulation tests and find that the AIC is smaller if you guess the
correct distribution, but as long as you have the link and the
functional form of the right hand side correct, then your parameter
estimates are not horribly affected by the choice of the distribution.
 If your data is Gamma, for example, the estimates from GLM-Normal are
just about as good.  Do you think so?

A draft of my notes is on my new web site:

http://pj.freefaculty.org/ps707/GLM/GammaGLM-01.pdf

If you look down to sections 7 and 8, (or the graphs on p. 9 and p.
17), you might see what I mean.  I still have work to do on this and
if you have pointers, please email me.

2. Suppose you fit a random effects model with lmer and you wonder if
the fit is better (I suppose in the sense of anova) than that of a glm
that is the same, except for the lack of a random effect.

Is there a test for that?  I've been reading the thread in the r-help
list from December, 2005, in which people are asking for a standard
error or confidence interval for a variance component and the answer
seems to be that such a thing is not statistically meaningful.

Would you consider an example using survey data about attitudes toward
the police in US cities?  The variable PLACE indicates which city
people live in. We wonder if there should be a random intercept to
represent diversity across cities.  I want something that works like
anova() might.  What to do?


gl1 <- glm( RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH + TRBLK
            + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR + RPCTBLAK,
            family=binomial(link="logit"), data=eldat)
summary(gl1)

Call:
glm(formula = RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI +
GRPETH + TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL +
RCRIM3YR + RPCTBLAK, family = binomial(link = logit), data = eldat)

Deviance Residuals:
Min   1Q   Median   3Q  Max
-1.5129  -0.5515  -0.4072  -0.2807   2.7975

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.549801   0.666762  -2.324 0.020106 *
AGE -0.020215   0.005423  -3.728 0.000193 ***
POLINT  -0.148035   0.075748  -1.954 0.050666 .
HAPPY   -0.309656   0.114871  -2.696 0.007024 **
EFFCOM  -0.150475   0.088664  -1.697 0.089670 .
TRNEI0.376409   0.091701   4.105 4.05e-05 ***
GRPETH   0.593949   0.205169   2.895 0.003792 **
TRBLK0.575473   0.114896   5.009 5.48e-07 ***
BLKCHIEF-0.233197   0.188521  -1.237 0.216093
RCOPDIFF 0.051088   0.150306   0.340 0.733938
RCOPRATE 0.169789   0.097540   1.741 0.081733 .
RCOPBKIL 0.147662   0.089269   1.654 0.098102 .
RCRIM3YR 0.189701   0.097590   1.944 0.051913 .
RPCTBLAK-0.504626   0.175897  -2.869 0.004119 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1364.4  on 1694  degrees of freedom
Residual deviance: 1197.5  on 1681  degrees of freedom
AIC: 1225.5

Number of Fisher Scoring iterations: 5


me1 <- lmer( RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH
             + TRBLK + BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR
             + RPCTBLAK + (1 | PLACE), family=binomial(link="logit"),
             data=eldat, method="Laplace")
Loading required package: lattice
 summary(me1)
 Generalized linear mixed model fit using Laplace
Formula: RE2TRCOP ~ AGE + POLINT + HAPPY + EFFCOM + TRNEI + GRPETH +
TRBLK +  BLKCHIEF + RCOPDIFF + RCOPRATE + RCOPBKIL + RCRIM3YR +
RPCTBLAK +  (1 | PLACE)
   Data: eldat
 Family: binomial(logit link)
      AIC      BIC    logLik deviance
 1227.446 1308.977 -598.7229 1197.446
Random effects:
 Groups  Name         Variance  Std.Dev.
 PLACE   (Intercept)  0.0056343 0.075062
# of obs: 1695, groups: PLACE, 18

Estimated scale (compare to 1)  1.014432

Fixed effects:
              Estimate Std. Error z value  Pr(>|z|)
(Intercept) -1.5478617  0.6708198 -2.3074 0.0210315 *
AGE -0.0202473  0.0054271 -3.7308 0.0001909 ***
POLINT  -0.1481644  0.0757952 -1.9548 0.0506069 .
HAPPY   -0.3102757  0.1149718 -2.6987 0.0069609 **
EFFCOM  -0.1501713  0.0887251 -1.6925 0.0905419 .
TRNEI0.3757690  0.0918107  4.0929 4.261e-05 ***
GRPETH   0.5903173  0.2053184  2.8751 0.0040386 **
TRBLK0.5754894  0.1149954  5.0045 5.602e-07 ***
BLKCHIEF-0.2275911  0.1935195 -1.1761 0.2395698
RCOPDIFF 0.0483413  0.1541093  0.3137 0.7537626
RCOPRATE 0.1681531  0.1003416  1.6758 0.0937760 .
RCOPBKIL 0.1437347  0.0924614  1.5545 0.1200562
RCRIM3YR 
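
For what it is worth, one commonly suggested approximation for this
comparison (a sketch using the deviances printed in the two outputs
above; the p-value is halved because the null hypothesis puts the
variance on the boundary of its parameter space):

lrt <- 1197.5 - 1197.446    # glm residual deviance minus lmer deviance
0.5 * pchisq(lrt, df = 1, lower.tail = FALSE)
# a large p-value here, consistent with the tiny PLACE variance (0.0056)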

[R] Sweave: trouble controlling Design package startup output

2006-01-30 Thread Paul Johnson
Hello, everybody:

I'm experimenting more with Sweave, R, and LyX. There's now an entry
in the LyX wiki on using R, so anybody can do it!

http://wiki.lyx.org/LyX/LyxWithRThroughSweave

Now I notice this frustrating thing.  I think I've done everything
possible to make the Design library start up without any fanfare:


<<echo=F,quiet=T,print=F>>=
library(Design, verbose=F)
@

But in the output there is still one page of banner information, which
starts like this:

Loading required package: Hmisc
Hmisc library by Frank E Harrell Jr
Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview)
to see overall documentation.
NOTE:Hmisc no longer redefines [.factor to drop unused levels when
subsetting. To get the old behavior of Hmisc type dropUnusedLevels().
Attaching package: 'Hmisc'
The following object(s) are masked from package:stats :
[and so forth]

Can you tell me how to make it stop?
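
One thing I have been meaning to try (a sketch, untested against Design
itself): capture.output() swallows whatever the startup code prints to
standard output, so a chunk like

<<echo=F,results=hide>>=
invisible(capture.output(library(Design, verbose=F)))
@

might silence at least the part of the banner that goes to stdout.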

I've seen this kind of output with a couple of other packages over the
past 2 months, but I can't find the files that produced them, so I
can't cite the examples.  But I don't think it is confined to Design.

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] cron job install/update problems: tcltk can't find display (installing e.g., pbatR)

2006-01-20 Thread Paul Johnson
On Fedora Core Linux 4, I have a cron job that causes R to update all
packages and install new ones.  Lately, I notice in the log that some
packages fail to install.  These are ones that assume X is running.
For example, the pbatR install requires tcltk to be loaded, and then
the install fails because in a cron job there is no DISPLAY
environment variable.  I suppose the same happens if you try to install R
packages in the console, without X running?

Error output is pasted here.  I wonder if you can advise me whether
this is the kind of thing that can be fixed in the cron job or not. 
I've verified that pbatR does install interactively (because tcltk
does start).  If you think this is a pbatR-specific problem, i will
contact the author directly.  When I have the repos option set, the
interactive install does not cause any tcltk widgets to pop up, so I
wonder if it is really necessary.
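
One idea I have not fully tested (a sketch; it assumes the Xvfb virtual
X server from the xorg-x11 packages is installed) is to give the cron
job a throwaway display so tcltk can load:

#!/bin/bash
# start a virtual framebuffer X server on display :5
Xvfb :5 -screen 0 1024x768x16 &
XVFB_PID=$!
DISPLAY=:5 R CMD BATCH /usr/local/bin/R_installAll.R
kill $XVFB_PID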

* Installing *source* package 'pbatR' ...
** libs
g++ -I/usr/lib/R/include  -I/usr/local/include   -fPIC  -O2 -g -pipe
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m32 -march=i386 -mtune=pentium4
-fasynchronous-unwind-tables -c wait.cpp -o wait.o
wait.cpp: In function 'int runCommands()':
wait.cpp:77: warning: ignoring return value of 'int system(const
char*)', declared with attribute warn_unused_result
g++ -shared -L/usr/local/lib -o pbatR.so wait.o   -L/usr/lib/R/lib -lR
** R
** save image
Loading required package: survival
Loading required package: splines
Loading required package: tcltk
Loading Tcl/Tk interface ... Error in fun(...) : no display name and
no $DISPLAY environment variable
Error: .onLoad failed in 'loadNamespace' for 'tcltk'
Error: package 'tcltk' could not be loaded
Execution halted
ERROR: execution of package source for 'pbatR' failed
** Removing '/usr/lib/R/library/pbatR'
** Restoring previous '/usr/lib/R/library/pbatR'
--

Here's the R code that runs from the Cron job

# Paul Johnson [EMAIL PROTECTED] 2005-08-31
# This should update and then install all packages, except for
# ones I exclude because they don't work or we don't want them.


#options(repos = "http://lib.stat.cmu.edu/R/CRAN/")
options(repos = "http://cran.cnr.berkeley.edu/")



# failPackages is the black list.
# Things that don't build cleanly with FC4, but I
# hope it will build someday soon.
failPackages <-
c("deldir","frailtypack","fSeries","fCalendar","fExtremes","fPortfolio",
  "hmm.discnp","knncat","labdsv","survrec","SciViews","RGrace","uroot",
  "fMultivar","fOptions","gcmrec","rcom","Rlsf")

# list of all currently installed packages
installedPackages <- rownames( installed.packages() )

# do any installed packages need removal because they are on the blacklist?
needRemoval <- installedPackages %in% failPackages

# remove any blacklisted packages if they are already installed.
if (sum(needRemoval) > 0) remove.packages(installedPackages[needRemoval])


# update the ones you want to keep
update.packages(ask=F, checkBuilt=T)

# get list of all new packages on CRAN
theNew <- new.packages()


# do any of the new packages belong to the black list?
shouldFail <- theNew %in% failPackages

# install non-blacklisted packages that are in theNew list
if (sum(!shouldFail) > 0) install.packages( theNew[!shouldFail], dependencies=T)
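
A possible hardening (a sketch): install the new packages one at a time
inside try(), so a single failure like pbatR's does not abort the rest
of the batch.

for (p in theNew[!shouldFail]) {
    try( install.packages(p, dependencies=T) )
}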






--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Current state of support for BUGS access for Linux users?

2006-01-17 Thread Paul Johnson
Thanks, Uwe

that clears up why I can't make R2WinBUGS work with OpenBUGS and WinBUGS1.5 :)
Both work pretty well with Wine in a GUI.  I noticed that when I tried
rbugs, it does succeed in starting the WinBUGS GUI, but then nothing
happens.  I'll get WinBUGS1.4 and see what happens.

In the meanwhile, I'm going to try to see what BRugs is good for.  In
Linux, when I try to install BRugs, the install fails with an error
saying that, at the current time, BRugs works only in Windows.


* Installing *source* package 'BRugs' ...
Package 'BRugs' currently only works under Windows.\nIt is supposed to
work under Linux in future releases.

I'd like to stop that check and see what happens!  The way I read the
source code from OpenBUGS and BRugs, I need to replace the Windows DLL
install and instead put in an .so file (as in OpenBUGS).

If anybody has done this, please let me know of your experience.


On 1/17/06, Uwe Ligges [EMAIL PROTECTED] wrote:
 Paul Johnson wrote:
  Greetings:
 
  I'm going to encourage some students to try Bayesian ideas for
  hierarchical models.
  I want to run the WinBUGS and R examples in Tony Lancaster's An
  Introduction to Modern Bayesian Econometrics.  That features MS
  Windows and bugs from R2WinBUGS.
 
  Today, I want to ask how people are doing this in Linux? I have found
  a plethora of possibilities, some of which are not quite ready, some
  of which work only under MS Windows.  Right now I just want to know
  what actually works.
 
  Here's where I stand now in Fedora Core 4 Linux.
  1. OpenBUGS-2.1.1 runs in Linux.  I can run linbugs (the console
  version similar to the old BUGS) and also I can run--under wine--the
  newest version of winbugs.exe that is circulated with OpenBUGS.  As
  far as I can tell, the graphical interface in wine/winbugs works in
  almost all elements.  A few things seem not quite right in the GUI
  (can't initialize more than one chain, difficult to specify variables
  for monitoring), but it does work.
 
  It is easier to install and work with OpenBUGS's version of
  winbugs.exe than with Winbugs-1.4 because the Open version does not
  have that annoying license registration and winbugs.exe is not
  wrapped inside an installation script.   I'm a little confused about
  WinBUGS versions because the BRugs documents
  http://www.biostat.umn.edu/~brad/software/BRugs/BRugs_install.html
  refer to WinBUGS-1.5, which refers to
  http://www.biostat.umn.edu/~brad/software/BRugs/WinBUGS15.zip, which
  can be downloaded without any of the registration steps, but WinBUGS15
  is not mentioned in the WinBUGS site (where 1.4.1 appears to be the
  newest).
 
  Supposing I get the winbugs.exe question settled:
 
  2. How to most dependably send jobs from R to linbugs or winbugs.exe?
 
  The BRugs package is preferred?
 
  For a long time, R2WinBUGS was Windows-only, but toward the end of
  last fall I noticed that R2WinBUGS now does compile and install under
  R in Linux.
 
  however, its help still says:
  SystemRequirements:   WinBUGS 1.4 on Windows
 
  I'd appreciate any advice.

 [resend to less recipients in order to save Martin's spare time to
 approve message;
 CCing Andrew Thomas, Bob O'Hara and Sibylle Sturtz separately]




 Re BUGS:
 WinBUGS-1.5 never got really released, AFAIK - Andrew or Bob might want
 to correct me. It has been renamed to OpenBUGS. The current version is
 the GPL'ed OpenBUGS 2.1.1 available from
 http://mathstat.helsinki.fi/openbugs/.

 Re R packages:
 - R2WinBUGS is compatible with WinBUGS-1.4.x only, its newest version
 can speak with WinBUGS under wine thanks to user contributions. But it
 still depends on WinBUGS-1.4.x, hence Windows only (considering wine as
 Windows).
 - BRugs contains the BRugs interface, R functions and the whole OpenBUGS
 installation. Unfortunately, due to serious compiler problems, we were
 not able to get a Linux version running using the interface. Hence it
 was not possible to release any non-Windows version up to now.
 I haven't tested BRugs under wine yet (in which case R has to run under
 wine as well, of course) ... and I do not know if there are any serious
 performance penalties.
 Note that even in the long term, OpenBUGS will only run on x86 based
 platforms.

 Due to the much more flexible interface, I prefer BRugs.

 BTW: Real programmers won't consider R2WinBUGS to be an interface at
 all - it might be useful, though. ;-)


 Uwe Ligges



  --
  Paul E. Johnson
  Professor, Political Science
  1541 Lilac Lane, Room 504
  University of Kansas
 






--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


Re: [R] Current state of support for BUGS access for Linux users?

2006-01-17 Thread Paul Johnson
   [034DH]
.h  Services.StdHook[0101E380H]
 HostWindows.Idle   [4A86H]
.focus  BOOLEAN FALSE
.tick   Controllers.TickMsg Fields
.w  HostWindows.Window  NIL
 HostMenus.TimerTick   [3422H]
.lParam INTEGER 0
.opsControllers.PollOpsMsg  Fields
.wParam INTEGER 1
.wndINTEGER 65574
 Kernel.Try   [3A61H]
.a  INTEGER 65574
.b  INTEGER 1
.c  INTEGER 0
.h  PROCEDURE   HostMenus.TimerTick
 HostMenus.ApplWinHandler   [3841H]
.Proc   PROCEDURE   NIL
.hitBOOLEAN Undefined70
.lParam INTEGER 0
.messageINTEGER 275
.resINTEGER 1180843108
.s  ARRAY 256 OF SHORTCHAR  2X   ...
.w  INTEGER 538416909
.wParam INTEGER 1
.wndINTEGER 65574
system   (pc=465EDB29H,  fp=7FC8FB00H)
system   (pc=465EE418H,  fp=7FC8FB3CH)
system   (pc=465F1DF0H,  fp=7FC8FB80H)
system   (pc=465BD031H,  fp=7FC8FBB0H)
 HostMenus.Loop   [3BDEH]
.done   BOOLEAN FALSE
.f  SET {0..5}
.n  INTEGER 0
.resINTEGER 0
.w  HostWindows.Window  NIL
 Kernel.Start   [2B8CH]
.code   PROCEDURE   HostMenus.Loop


3. Accessing OpenBUGS with wine from rbugs gets as far as opening
the OpenBUGS GUI, but nothing ever appears in the log file and there
are no computations going on (according to system monitors, anyway);
everything in OpenBUGS just seems stuck.


pj


On 1/17/06, Uwe Ligges [EMAIL PROTECTED] wrote:
 Paul Johnson wrote:
  Thanks, Uwe
 
  that clears up why I can't make R2WinBUGs work with OpenBUGS and WinBUGS1.5 
  :)
  Both work pretty good with Wine in a GUI.  I noticed that when I tried
  rbugs, it does succeed in starting WinBUGS GUI, but then nothing
  happens. I'll get WinBUGS1.4 and see what happens.
 
  In the meanwhile, I'm going to try to see what BRugs is good for. In
  Linux, when I try to install BRugs, the install fails with an error
  saying that, at the current time, BRugs works only in Windows.


 Yes, I have added that particular line in order not to confuse users,
 and I thought I told you in my last message that it works only under
 Windows.


  * Installing *source* package 'BRugs' ...
  Package 'BRugs' currently only works under Windows.\nIt is supposed to
  work under Linux in future releases.
 
  I'd like to stop that check and see what happens!

 OK, just remove the configure file.


  The way I read the
  sourcecode from OpenBUGS and BRugs, I need to replace the windows dll
  install and instead put in an so file (as in OpenBUGS).

 Yes, it is already shipped, and the infrastructure in the package is
 ready (hopefully), but the brugs.so file does not work as expected.


  If anybody has done this, please let me know of your experience.

 Yes, several tried, among them Andrew Thomas and Uwe Ligges, and then I
 invited Andrew Thomas to Dortmund and we tried together (I have to admit
 that I was clueless all the time and in fact Andrew tried).
 Andrew's conclusion was that there is some compiler problem on Linux
 with the BlackBox framework (Component Pascal compiler from Oberon
 microsystems) in which WinBUGS/OpenBUGS is written ...

 If you get it to work, please let us know!

 Uwe


--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Current state of support for BUGS access for Linux users?

2006-01-17 Thread Paul Johnson
Before I forget, I found the working recipe for rbugs.  My mistake
before was not realizing that the n.iter value is total iterations,
including burnin, and so by setting
n.iter=1000 and n.burnin=1000, I was leaving 0 iterations for the updates.

### Paul Johnson 2006-01-18. This does work!
### Works in Fedora Core 4 Linux with wine-0.9.5 and WinBUGS-1.4.1
### (that's patched WinBUGS 1.4).  The first part is from Andrew Gelman's
### web site http://www.stat.columbia.edu/~gelman/bugsR/, but is basically
### the same as the rbugs example code.



schools <- read.table("schools.dat", header=T)
J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
data <- list("J", "y", "sigma.y")
inits <- function() {list (theta=rnorm(J,0,100),
    mu.theta=rnorm(1,0,100), sigma.theta=runif(1,0,100))}
parameters <- c("theta", "mu.theta", "sigma.theta")

library(rbugs)


# Note that n.iter - n.burnin = N of observations collected in CODA
schools.sim <- rbugs( data,
  inits,
  parameters,
  "schools.bug",
  n.chains = 1,
  n.iter = 3000,
  n.burnin = 1000,
  n.thin = 1,
  workingDir="/home/pauljohn/.wine/fake_windows/temp",
  bugs="/usr/local/share/WinBUGS14/WinBUGS14.exe",
  bugsWorkingDir="z:/home/pauljohn/.wine/fake_windows/temp",
  useWine=TRUE,
  wine="/usr/bin/wine",
  debug=F)


myResults <- getBugsOutput(workingDir =
    "/home/pauljohn/.wine/fake_windows/temp", n.chains=1)
# Creates an ordinary data frame:
str(myResults)




library(coda)
# Another way to retrieve text file data, using the coda library
c1 <- read.coda("coda1.txt",
    "/home/pauljohn/.wine/fake_windows/temp/codaIndex.txt")
# That creates a much more complicated mcmc data structure, however.
# But the output is pretty nice! And there are diagnostic functions.
summary(c1)

heidel.diag(c1)

geweke.diag(c1)

### need more chains to run gelman.diag

### Note R2WinBUGS has a read.bugs command that is very similar. I suspect
### it works like a simple wrapper around the read.coda function

c2 <- read.bugs("/home/pauljohn/.wine/fake_windows/temp/coda1.txt")

heidel.diag(c2)

geweke.diag(c2)


pj

On 1/17/06, Paul Johnson [EMAIL PROTECTED] wrote:
 Thanks!  Let me ask this question again.  Clearly, if you experts
 can't make R talk to WinBUGS, then I can't either.  So, if bugs()
 doesn't work, what is the next best thing?

 With wine, I can run OpenBUGS and WinBUGS, but I cannot send jobs from
 R to a BUGS program (still trying, some people say they have seen
 rbugs work in Linux).

 I want to follow along with the instructions in Prof Gelman's site for
 R2WinBUGS, or the examples in the rbugs package.  I just noticed that
 rbugs works by writing the data, model, inits, and so forth into text
 files, along with a script that drives WinBUGS.  Although there is
 something wrong with the script file that causes a fatal WinBUGS
 crash, I have just manually started the GUI and pointed and clicked my
 way to a working set of  BUGS estimates (see details on script flaw
 and trap below).

 That makes me feel a bit confident that I will be able to create
 working BUGS sims and get results.  I need to learn how to bring the
 CODA output into R.  In the WinBUGS output, I find a codaIndex.txt
 file that looks promising.


 Now, about efforts to run WinBUGS from within R. In case other people
 are trying this, here is what I learned.

 1. bugs() from R2WinBUGS to WinBUGS14 under wine-0.9.5 is not able to
 start WinBUGS14.exe

 I've just tried the newest example from
 http://www.stat.columbia.edu/~gelman/bugsR/ with WinBUGS14 under wine.

 WinBUGS never shows on the screen before the error appears:

  library(R2WinBUGS)
  schools <- read.table("schools.dat", header=T)
  J <- nrow(schools)
  y <- schools$estimate
  sigma.y <- schools$sd
  data <- list("J", "y", "sigma.y")
  inits <- function() {list (theta=rnorm(J,0,100), mu.theta=rnorm(1,0,100),
  sigma.theta=runif(1,0,100))}
  parameters <- c("theta", "mu.theta", "sigma.theta")
  schools.sim <- bugs (data,
 +  inits,
 +  parameters,
 +  "schools.bug",
 +  bugs.directory = "/usr/local/share/WinBUGS14/",
 +  n.chains=3,
 +  n.iter=1000,
 +  useWINE= T,
 +  WINE = "/usr/bin/wine"
 +  )
 Loading required package: tools
 Error in pmatch(x, table, duplicates.ok) :
 argument is not of mode character

 Humphf!

 I tried running this under debug, but can't understand the output. I
 get through about 8 steps, when bugs.script() seems to cause the
 error:
 Browse[1] n
 debug: bugs.script(parameters.to.save, n.chains, n.iter, n.burnin, n.thin,
 bugs.directory, new.model.file, debug = debug, is.inits = !is.null(inits),
 bin = bin, DIC = DIC, useWINE = useWINE)
 Browse[1] n
 Error

[R] Current state of support for BUGS access for Linux users?

2006-01-16 Thread Paul Johnson
Greetings:

I'm going to encourage some students to try Bayesian ideas for
hierarchical models.
I want to run the WinBUGS and R examples in Tony Lancaster's An
Introduction to Modern Bayesian Econometrics.  That features MS
Windows and bugs from R2WinBUGS.

Today, I want to ask how people are doing this in Linux? I have found
a plethora of possibilities, some of which are not quite ready, some
of which work only under MS Windows.  Right now I just want to know
what actually works.

Here's where I stand now in Fedora Core 4 Linux.
1. OpenBUGS-2.1.1 runs in Linux.  I can run linbugs (the console
version similar to the old BUGS) and also I can run--under wine--the
newest version of winbugs.exe that is circulated with OpenBUGS.  As
far as I can tell, the graphical interface in wine/winbugs works in
almost all elements.  A few things seem not quite right in the GUI
(can't initialize more than one chain, difficult to specify variables
for monitoring), but it does work.

It is easier to install and work with OpenBUGS's version of
winbugs.exe than with Winbugs-1.4 because the Open version does not
have that annoying license registration and winbugs.exe is not
wrapped inside an installation script.   I'm a little confused about
WinBUGS versions because the BRugs documents
http://www.biostat.umn.edu/~brad/software/BRugs/BRugs_install.html
refer to WinBUGS-1.5, which refers to
http://www.biostat.umn.edu/~brad/software/BRugs/WinBUGS15.zip, which
can be downloaded without any of the registration steps, but WinBUGS15
is not mentioned in the WinBUGS site (where 1.4.1 appears to be the
newest).

Supposing I get the winbugs.exe question settled:

2. How to most dependably send jobs from R to linbugs or winbugs.exe?

The BRugs package is preferred?

For a long time, R2WinBUGS was Windows-only, but toward the end of
last fall I noticed that R2WinBUGS now does compile and install under
R in Linux.

However, its help still says:
SystemRequirements:   WinBUGS 1.4 on Windows

I'd appreciate any advice.

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] multiple lowess line in one plot

2006-01-04 Thread Paul Johnson
It appears to me lowess has no subset or strata option. The
brute-force way (which my students like best) is just to create 4
columns, one for each group (with unstack or such) and then run one
lines(lowess()) command for each one.

I think there is a bit more art in using by, which will create subsets
on the fly for you.

Here is a self-contained example that I just worked out to illustrate.
grp is the grouping factor; x and y are just illustrative data. The by
function creates the data subsets, and they are accessed in the FUN as
sub1.

x <- rnorm(100)
grp <- gl(4, 25)
y <- rpois(100, lambda=4)
mydf <- data.frame(x, y, grp)
with(mydf, plot(x, y))
by(mydf, list(grp), function(sub1) lines(lowess(sub1$x, sub1$y)))

Hope that helps

On 1/4/06, Dean Sonneborn [EMAIL PROTECTED] wrote:
 I'm using this code to plot a smoothed line.  These two columns of data
 really represent 4 groups and I'd like to plot a separate line for each
 group but have them all in the same plot. The R-Docs for lowess do not
 seem to indicate some type of GROUPS=var_name option. What would be
 the syntax for this?

 plot(AWGT ~ lipid )
 lines(lowess(lipid , AWGT, f=.8))


 --
 Dean Sonneborn, MS
 Programmer Analyst
 Department of Public Health Sciences
 University of California, Davis
 (530) 754-9516




--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Hmisc latex function

2005-12-15 Thread Paul Johnson
Can anybody suggest a work-around for this problem?

pj
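
In case it helps the archives, a sketch of the contrast that Marc
diagnoses below: under a UTF-8 locale, dQuote() emits directional
Unicode quotes that sh does not understand, while shQuote() produces
shell-safe quoting.

paste('cd', dQuote(tempdir()))   # cd “/tmp/...”  -- the shell chokes
paste('cd', shQuote(tempdir()))  # cd '/tmp/...'  -- safe to pass to sh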

Marc Schwartz (via MN) wrote:
 On Wed, 2005-10-12 at 08:33 -0500, Charles Dupont wrote:
 
Marc Schwartz (via MN) wrote:

On Tue, 2005-10-11 at 10:01 -0400, Rick Bilonick wrote:


I'm using R 2.2.0 on an up-to-date version of Fedora Core 4 with the
latest version of Hmisc. When I run an example from the latex function I
get the following:



x <- matrix(1:6, nrow=2,
            dimnames=list(c('a','b'), c('c','d','e\nLine 2')))


x

  c d e\nLine 2
a 1 3         5
b 2 4         6


latex(x)   # creates x.tex in working directory

sh: line 0: cd: “/tmp/Rtmpl10983”: No such file or directory
This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
entering extended mode
! I can't find file `“/tmp/Rtmpl10983/file643c9869”'.
* “/tmp/Rtmpl10983/file643c9869”

Please type another input file name: q
(/usr/share/texmf/tex/latex/tools/q.tex
LaTeX2e 2003/12/01
Babel v3.8d and hyphenation patterns for american, french, german,
ngerman, b
ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
esperanto, e
stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
norsk, polis
h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
swedish, tur
kish, ukrainian, nohyphenation, loaded.
File ignored
xdvi-motif.bin: Fatal error: /tmp/Rtmpl10983/file643c9869.dvi: No such
file.


How can I fix this?

Rick B.


I get the same results, also on FC4 with R 2.2.0.

I am cc:ing Frank here for his input, but a quick review of the code and
created files suggests that there may be conflict between the locations
of some of the resultant files during the latex system call. Some files
appear in a temporary R directory, while others appear in the current R
working directory.

For example, if I enter the full filename:
 
  /tmp/RtmpC12100/file643c9869.tex

at the latex prompt, I get:



latex(x)

sh: line 0: cd: “/tmp/RtmpC12100”: No such file or directory
This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
entering extended mode
! I can't find file `“/tmp/RtmpC12100/file643c9869”'.
* “/tmp/RtmpC12100/file643c9869”

Please type another input file name: *** loading the extensions
datasource
/tmp/RtmpC12100/file643c9869.tex
(/tmp/RtmpC12100/file643c9869.tex
LaTeX2e 2003/12/01
Babel v3.8d and hyphenation patterns for american, french, german,
ngerman, b
ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
esperanto, e
stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
norsk, polis
h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
swedish, tur
kish, ukrainian, nohyphenation, loaded.
(/usr/share/texmf/tex/latex/base/report.cls
Document Class: report 2004/02/16 v1.4f Standard LaTeX document class
(/usr/share/texmf/tex/latex/base/size10.clo))
(/usr/share/texmf/tex/latex/geometry/geometry.sty
(/usr/share/texmf/tex/latex/graphics/keyval.sty)
(/usr/share/texmf/tex/latex/geometry/geometry.cfg))
No file file643c9869.aux.
[1] (./file643c9869.aux) )
Output written on file643c9869.dvi (1 page, 368 bytes).
Transcript written on file643c9869.log.
xdvi-motif.bin: Fatal error: /tmp/RtmpC12100/file643c9869.dvi


H,  It works for me.  Interesting.

It almost looks like the temp dir is not being created, but thats not 
possible because R does that.  It might be a Unicode issue with you 
system shell.  Can you run this statement in R

sys(paste('cd', dQuote(tempdir()), ';',
    'echo "Hello BOB" > test.test',
    ';', 'cat test.test'))


What version of Hmisc are you using?  What locale are you using?

Charles
 
 
 Hmisc version 3.0-7, Dated 2005-09-15, which is the latest according to
 CRAN.
 
 
sys(paste('cd', dQuote(tempdir()), ';',
 
 + 'echo "Hello BOB" > test.test',
 + ';', 'cat test.test'))
 sh: line 0: cd: “/tmp/RtmpGY5553”: No such file or directory
 [1] Hello BOB
 
 
From a bash console:
 
 $ cd /tmp/RtmpGY5553
 $ pwd
 /tmp/RtmpGY5553
 
 
 $ locale
 LANG=en_US.UTF-8
 LC_CTYPE=en_US.UTF-8
 LC_NUMERIC=en_US.UTF-8
 LC_TIME=en_US.UTF-8
 LC_COLLATE=en_US.UTF-8
 LC_MONETARY=en_US.UTF-8
 LC_MESSAGES=en_US.UTF-8
 LC_PAPER=en_US.UTF-8
 LC_NAME=en_US.UTF-8
 LC_ADDRESS=en_US.UTF-8
 LC_TELEPHONE=en_US.UTF-8
 LC_MEASUREMENT=en_US.UTF-8
 LC_IDENTIFICATION=en_US.UTF-8
 LC_ALL=
 
 
 On the creation of the sys() call, it looks like the backquotes are
 causing the problem:
 
 
paste('cd',dQuote(tempdir()))
 
 [1] cd “/tmp/RtmpGY5553”
 
 
From a bash shell:
 
 $ cd “/tmp/RtmpGY5553”
 bash: cd: “/tmp/RtmpGY5553”: No such file or directory
 $ cd /tmp/RtmpGY5553
 $ pwd
 /tmp/RtmpGY5553
 
 
 According to ?dQuote:
 
 By default, sQuote and dQuote provide undirectional ASCII quotation
 style. In a UTF-8 locale (see l10n_info), the Unicode directional quotes
 are used.
 
 The See Also points to shQuote for quoting OS commands.
 
 
 HTH,
 
 Marc
 


-- 
Paul E. Johnson

[R] Predicting and Plotting hypothetical values of factors

2005-11-17 Thread Paul Johnson
Last Friday, I noticed that it is difficult to work with regression 
models in which there are factors.  It is easier to do the old fashioned 
thing of coding up dummy variables with 0-1 values.  The predict 
function's newdata argument is not suited to insertion of hypothetical 
values for the factor, whereas it has no trouble with numeric variables. 
  For example, if one uses a factor as a predictor in a logistic 
regression, then it is tough to plot the S-shaped curve that describes 
the probabilities.

Here is some code with comments describing the difficulties that we have 
found.  Are there simpler, less frustrating approaches?

-
# Paul Johnson  pauljohn at ku.edu 2005-11-17
# factorFutzing-1.R


myfac <- factor(c(1.1, 4.5, 1.1, 1.1, 4.5, 1.1, 4.5, 1.1))

y <- c(0,1,0,1,0,0,1,0)

mymod1 <- glm(y ~ myfac, family=binomial)

p1 <- predict(mymod1, type="response")

plot(myfac, p1)

# Wait, that's not the picture I wanted. I want to see the
# proverbial S-shaped curve!
#

# The contrast variable that R creates is coded 0 and 1.
# I'd like to have a sequence from 0 to 1 (seq(0,1,length.out=10)) and
# get predicted values for each.  That would give the S-shaped curve.


# But the use of predict does not work because the factor has 2
# values and the predict function won't take my new data:

predict(mymod1, newdata=data.frame(myfac=seq(0,1,length.out=8)))

# Error: variable 'myfac' was fitted with class "factor" but class
# "numeric" was supplied
# In addition: Warning message:
# variable 'myfac' is not a factor in: model.frame.default(Terms,
# newdata, na.action = na.action, xlev = object$xlevels)

# Isn't there an easier way than this?

c1 <- coef(mymod1)

myseq <- seq(0, 1, length.out=10)

newdf <- data.frame(1, myseq)

linearPredictor <- as.matrix(newdf) %*% c1

p2 <- 1/(1 + exp(-linearPredictor))
# But there is a big disaster if you just try the obvious thing
# of plotting with
# lines(myseq,p2)
# The line does not show up where you hope in the plot.
# The plot appears to have the x-axis on the scale
# 1.1 to 4.5, So in order to see the s-shaped curve, it appears we have
# to re-scale. However, this is a big illusion.  To see what I mean,
# do

points(2,.4)

# you expected to see the point appear at (2,.4), but in the plot
# it appears at (4.5,.4).  Huh?

# The actual values being plotted are the integer-valued levels that
# R uses for factors, the numbers you get from as.numeric(myfac).
# So it is only necessary to re-scale the sequence by adding one.

myseq2 <- 1 + myseq

lines( myseq2, p2, type="l" )

# It's not all that S-shaped, but you have to take what you can get! :)
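
# For completeness, a sketch of the newdata that predict() will accept:
# a factor carrying the same levels as the original.  Of course this only
# gives fitted probabilities at the two observed levels, not the
# continuous S-curve sought above.
newdat <- data.frame(myfac = factor(c(1.1, 4.5), levels=levels(myfac)))
predict(mymod1, newdata=newdat, type="response")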
-

-- 
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700



[R] Need advice about models with ordinal input variables

2005-11-08 Thread Paul Johnson
Dear colleagues:

I've been storing up this question for a long time and apologize for the 
length and verbosity of it.  I am having trouble in consulting with 
graduate students on their research projects.  They are using surveys to 
investigate the sources of voter behavior or attitudes.  They have 
predictors that are factors, some ordered, but I am never confident in 
telling them what they ought to do.  Usually, they come in with a 
regression model fitted as though these variables are numerical, and if 
one looks about in the social science literature, one finds that many 
people have published doing the same.

I want to ask your advice about some cases.

1. An ordered factor that masquerades as a numerical interval level score.

In the research journals, these are the ones most often treated as 
numerical variables in regressions. For example: Thermometer scores 
for Presidential candidates range from 0 to 100 in integer units.

What's a better idea?  In an OLS model with just one input variable, a 
plot will reveal if there is a significant nonlinearity. One can 
recode the assigned values to linearize the final model or take the 
given values and make a nonlinear model.

In the R package acepack I found avas, which works like a rubber 
ruler and recodes variables in order to make relationships as linear 
and homoskedastic as possible.  I've never seen this used in the social 
science literature.  It works like magic.  Take an ugly scatterplot and 
shazam, out come transformed variables that have a beautiful plot.  But 
what do you make of these things?  There is so much going on in these 
transformations that interpretation of the results is very difficult. 
You can't say "a one unit increase in x causes a b increase in y."
Furthermore, if the model is a survival model, a logistic regression, or 
other non-OLS model, I don't see how the avas approach will help.

I've tried fiddling about with smoothers, treating the input scores as 
if they were numerical.  I got this idea from Prof. Harrell's Regression 
Modeling Strategies.  In his Design package for R, one can include a 
cubic spline for a variable in a model by replacing x with rcs(x). Very 
convenient. If the results say the relationship is mostly linear, then 
we might as well treat the input variable as a numerical score and save 
some degrees of freedom.

But if the higher order terms are statistically significant, it is 
difficult to know what to do. The best strategy I have found so far is 
to calculate fitted values for particular inputs and then try to tell a 
story about them.


2. Ordinal variables with less than 10 values.

Consider variables like self-reported ideology, where respondents are 
asked to place themselves on a 7 point scale ranging from very 
conservative to very liberal.  Or Party Identification on a 7 point 
scale, ranging (in the US) from Strong Democrat to Strong Republican.

It has been quite common to see these thrown into regression models as 
if they were numerical.

I've sometimes found it useful to run a regression treating them as 
unordered factors, and then I attempt to glean a pattern in the 
coefficients.  If the parameter estimates step up by a fixed proportion, 
then one might think there's no damage from treating them as numerical 
variables.

Yesterday, it occurred to us that there should be a significance test to 
determine if one loses predictive power by replacing the 
factor-treatment of x with x itself.  Is there a non-nested model test 
that is most appropriate?
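
A sketch of one answer we have toyed with (note that the linear score is 
actually nested inside the unordered-factor coding, so an ordinary 
anova() F test can serve; data invented for illustration):

x <- rep(1:7, each = 20)       # a 7-point scale
y <- 0.5 * x + rnorm(140)
m.linear <- lm(y ~ x)
m.factor <- lm(y ~ factor(x))
anova(m.linear, m.factor)      # a significant F says linearity is too strict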

3. Truly numerical variables that are reported as grouped ordinal 
scales.  These variables are awful in many ways.

Income is often reported in a form like this:

1) Less than 20000
2) 20000 to 35000
3) 35001 to 50000
4) 50001 to 100000
5) above 100000

Education often appears in a form that has
1) 8 years or less
2) 9 years
3) 10 years
4) 11 years
5) 12 years
6) some college completed
7) undergraduate degree completed
8) graduate degree completed

These predictors pose many problems.  We have dissimilar people grouped 
together, so there are errors in variables and it seems obvious that 
the scores should be recoded somehow to reflect the substance of the 
differences among groups. But how?


4. Ordered variables with a small number of scores.

For example, has your economic situation been
1) worse
2) same
3) better

or how do you feel when you see the American flag?
1) no effect
2) OK
3) great
4) ecstatic

Anyway, in an R model, I think the right thing to do is to enter them 
into a regression with as.ordered(x).

But I don't know what to say about the results.  Has anybody written an 
idiot's guide to orthogonal polynomials?  Aside from calculating fitted 
values, how do you interpret these things?  Is there ever a point when 
you would say we should treat that as a numerical variable with scores 
1-2-3-4 rather than as an ordered factor?


If you have advice, I would be delighted to hear it.

-- 
Paul E. 

Re: [R] Select varying LS digits in long numbers?

2005-10-04 Thread Paul Johnson
Maybe this is just the brute force you want to avoid, but it works to 
coerce the integer into a string, and then back to a number.  One could 
reduce the number of lines by nesting functions, of course.

y <- 1234131431
n <- 3
ychar <- as.character(y)
ydigits <- nchar(ychar)
result <- as.numeric( substr(ychar, ydigits - n + 1, ydigits) )
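
Wrapped as a function (same logic; the name lastDigits is just
illustrative):

lastDigits <- function(y, n) {
    ychar <- as.character(y)
    as.numeric(substr(ychar, nchar(ychar) - n + 1, nchar(ychar)))
}
lastDigits(1234131431, 3)   # 431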

pj

(Ted Harding) wrote:
 Hi Folks,
 
 I'm trying to find a neat solution to an apparently simple
 problem, but one which turns out to be a bit more intricate
 and tricky than one might expect.
 
 Suppose I have numbers given to a large number of digits.
 For example
 
   1234567021
 
 where (though I don't know this beforehand) only the last
 3 digits will be varying (and all 3 will vary).
 
 What I want is, give a vector x of such numbers, to extract
 the minimal set of final digits which will include the varying
 digits (i.e. in this case the last 3 digits). And there may be
 a decimal point somewhere along the line (though again I won't
 know where, nor whether).
 



-- 
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700



[R] controlling usage of digits scientific notation in R plots; postscript margins

2005-09-21 Thread Paul Johnson
Dear R users:

I assigned students to make some graphs and I'm having trouble answering 
some questions that they have.  We are all working on R 2.1 on Fedora 
Core Linux 4 systems.

1. In the plot, the axis is not labeled by numbers, but rather by 
scientific notation like -2e+08 or such.  We realize that means 
-200,000,000.  We want to beautify the plot.  We would rather just print 
out the whole, big number.  But if we can't have that, we would like 
something more beautiful and mathematical, such as

   -2.0 x 10^8

Once the numbers are big, R automagically switches to scientific 
notation, and turning the values horizontal does not help on the y axis.

Example:

x <- rnorm(100, mean=10, sd=1)
y <- rnorm(100, mean=100, sd=1000)
plot(x, y, axes=F)
axis(2, las=2)  # turns y axis numbers perpendicular to y axis
axis(1)  # x axis

I realize a person could just divide all numbers by 1 million and then 
have a graph with -200 and an annotation (in millions).

On the x axis, we'd even settle for labeling just the two outermost 
points with full numerical values, and then having only ticks between.  
I was looking for some way to use axis() to draw an unlabeled axis and 
then put in text labels after that.  However, I can't understand how to 
get the values of the tick marks placed by axis from the figure in order 
to place text by some tick marks.
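
One possibility (a sketch using axTicks(), which reports the tick 
positions that axis() would choose, so the labels can be hand-formatted):

x <- rnorm(100, mean=10, sd=1)
y <- rnorm(100, mean=100, sd=1000)
plot(x, y, axes=F)
ticks <- axTicks(2)            # positions axis(2) would use
axis(2, at=ticks, las=2,
     labels=format(ticks, big.mark=",", scientific=FALSE))
axis(1)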

2. Some students make postscript figures that fit just right into 
LaTeX documents, while some make figures that have huge, 
inches-and-inches of margins around the outside.  I'm not watching how 
they make these figures all the time, but I think I'm figuring out the 
cause of big margins.

Is this right: the margins issue relates to the use of the onefile=F 
option in a dev.copy command?  I think the figures turn out properly 
with this:

dev.copy(postscript, file="myfig.eps", height=5, width=5, horizontal=F, onefile=F)
dev.off()

If you mistakenly omit the onefile=F option, the margins stretch out 
from the center where the figure is placed to the edges of an entire 
sheet of paper. In other words, the default settings for margins in 
inches that I see in output from

  par()
...
$mai
[1] 1.0066821 0.8092935 0.8092935 0.4145162
...

have no effect if a person forgets the onefile=F option.  We fiddled a 
while, trying to force margins down, par(mai=c(0,0,0,0)), but no matter 
what we did, the figures still had the massive margins.

We don't have these margin problems with other devices. Margins in jpeg 
or png pictures are sized appropriately.


-- 
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700



[R] controlling where *.Rout gets printed. Possible?

2005-09-01 Thread Paul Johnson
OK, my journey to make lab machines automagically install & update all 
desirable R packages is nearing an end!  The only question I have now is 
this: how can I control where the system prints the *.Rout file that is 
created automatically when the R batch program runs?  In man R I don't 
find any information about it.  When the cron job runs R_installAll.sh 
(see below), I'd like to re-direct it to someplace that ordinary users 
can read.

Here's the shell script I will schedule with cron

R_installAll.sh --
#!/bin/bash
R CMD BATCH /usr/local/bin/R_installAll.R
--
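
One fix that appears to work (a sketch): R CMD BATCH accepts an optional 
second argument naming the output file, so the script can point it at a 
world-readable location:

#!/bin/bash
R CMD BATCH /usr/local/bin/R_installAll.R /var/log/R_installAll.Rout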

And here's the R program

-R_installAll.R-

# Paul Johnson pauljohn _AT_ ku.edu 2005-08-31
# This should update and then install all packages, except for
# ones I exclude because they don't work or we don't want them.


options(repos = "http://lib.stat.cmu.edu/R/CRAN/")

update.packages(ask=F)
theNew <- new.packages()
failPackages <-
c("BRugs","GDD","gtkDevice","gap","gnomeGUI","mimR","ncdf","pathmix",
  "rcdd","rgdal","rpvm","Rmpi","RQuantLib","RMySQL","RNetCDF","RODBC",
  "ROracle","RScaLAPACK","rsprng","RWinEdt","taskPR")

shouldFail <- theNew %in% failPackages

install.packages( theNew[!shouldFail], dependencies=T)


# VGAM is not in CRAN yet, but Zelig wants it.
# install.packages("VGAM", CRAN="http://www.stat.auckland.ac.nz/~yee")


update.packages(CRAN="http://www.stat.auckland.ac.nz/~yee")
--




-- 
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700



[R] Advice about system for installing updating all R package in a Linux Lab?

2005-08-19 Thread Paul Johnson
Good day:

I'm administering 6 linux systems (FC4) in a student lab and worry that 
users may want packages that are not installed.  I get tired of adding 
them one by one.  Then I happened upon this page

http://support.stat.ucla.edu/view.php?supportid=30

about installing all R packages from CRAN.  That did not run as it was, 
but after some fiddling I arrived at the following script, which does 
run and it builds many packages and reports failures on the rest:

#R_installAll.R
options(repos = "http://lib.stat.cmu.edu/R/CRAN/")
update.packages(ask=F)
x <- packageStatus(repositories="http://cran.r-project.org/src/contrib")
st <- x$avai["Status"]
install.packages(rownames(st)[which(st$Status == "not installed")],
                 dependencies=T)

If I run that in batch mode (as root, of course)

   R CMD BATCH R_installAll.R

It produces some informative output. Some packages don't build because 
they are for Windows.  As Prof Ripley mentioned recently, some packages 
don't build because of gcc-4.0.1. Some fail because I don't have some 
requisite libraries installed.  I try to deduce which FC packages may be 
used to fix that and iterate the process now and then.

But, for the most part, the packages seem to be OK (as far as I can tell). 
The output of a recent update is posted on the net here, in case you are 
interested to see it (this lists the ones that don't build plus the 
successful updates):

http://lark.cc.ku.edu/~pauljohn/software/R/R_installAll.Rout

I can't see how this does any damage, since the packages that don't 
build are very graceful about erasing themselves, and the ones that do 
build are automatically available for the users.

Can you see any downside to scheduling this process to run as a cron 
job, say once per week, to keep packages up to date?


-- 
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700



Re: [R] LyX and Sweave

2005-08-18 Thread Paul Johnson
I just wanted to point out that I was there first :)  on the Lyx List 
(Nov 2004):

http://www.mail-archive.com/lyx-users@lists.lyx.org/msg36262.html

Perhaps somebody who is trying to put all of this together can benefit 
from both sets of explanations.

pj

[EMAIL PROTECTED] wrote:
On Mon, 25 Jul 2005 14:12:41 +0200,
Gorjanc Gregor (GG) wrote:
 
 
Hello R-users!
I have tried to use Sweave within LyX* and found two ways to accomplish
this. I have attached LyX source file for both ways as well as generated 
PDFs.
 
 I have copied Gregor's files at
 
   http://www.ci.tuwien.ac.at/~leisch/Sweave/LyX
 
  for those who didn't get the attachments. LyX actually looks much
  better and more stable than when I last had a look a couple of years ago.



-- 
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700



Re: [R] Time Series Count Models

2005-07-18 Thread Paul Johnson
Dear Brett:

There are books for this topic that are more narrowly tailored to your 
question. Lindsey's Models for Repeated Measurements and Diggle, et al's 
Analysis of Longitudinal Data.  Lindsey offers an R package on his web 
site. If you dig around, you will find many modeling papers on this, 
although in my mind none coalesced into a completely clear path such as 
throw in these variables and you will get the right estimates.

The problem, as you will see, is that there are many possible 
mathematical descriptions of the idea that there is time dependence in a 
count model.

My political science colleagues John Williams and Pat Brandt published 2 
articles on time series with counts.  My favorite is the second one 
here.  There is R code for the Pests model. 
http://www.utdallas.edu/~pbrandt/pests/pests.htm

Brandt, Patrick T., John T. Williams, Benjamin O. Fordham and Brian 
Pollins. 2000. Dynamic Modelling For Persistent Event Count Time 
Series. American Journal of Political Science 44(4): 823-843.

Brandt, Patrick T. and John T. Williams. 2001. A Linear Poisson 
Autoregressive Model: the Poisson AR(p) Model. Political Analysis 9(2): 
164-184.

I worked really hard on TS counts a while ago because a student was 
trying that.  If you look at J. Lindsey's book Models for Repeated 
Measurements you will make some progress on understanding his method 
kalcount. That's in the repeated library you get from his web site.

Here are the notes I made a couple of years ago

http://lark.cc.ku.edu/~pauljohn/stats/TimeSeries/

Look for files called TSCountData*.pdf.


It all boils down to the fact that you can't just act like it is an OLS 
model and throw Y_{t-1} or something like that on the right hand side. 
Instead, you have to think in a more delicate way about the process you 
are modeling and hit it from that other direction.

Here are some of the articles for which I kept copies.

U. Bokenholt, Mixed INAR(1) Poisson regression models Journal of 
Econometrics, 89 (1999): 317-338

A.C. Harvey and C. Fernandes, Time Series Models for Count or 
Qualitative Observations, Journal of Business & Economic Statistics, 4 
(1989): 407-


I recall liking this one a lot

J.E. Kelsall, Scott Zeger, and J.M. Samet, Frequency Domain Log-linear 
Models: air pollution and mortality, Applied Statistics 48 (1999): 331-344.

Good luck, let me know what you find out.

pj

Brett Gordon wrote:
 Hello,
 
 I'm trying to model the entry of certain firms into a larger number of
 distinct markets over time. I have a short time series, but a large
 cross section (small T, big N).
 
 I have both time varying and non-time varying variables. Additionally,
 since I'm modeling entry of firms, it seems like the number of
 existing firms in the market at time t should depend on the number of
 firms at (t-1), so I would like to include the lagged cumulative count.
 
 My basic question is whether it is appropriate (in a statistical
 sense) to include both the time varying variables and the lagged
 cumulative count variable. The lagged count aside, I know there are
 standard extensions to count models to handle time series. However,
 I'm not sure if anything changes when lagged values of the cumulative
 dependent variable are added (i.e. are the regular standard errors
 correct, are estimates consistent, etc).
 
 Can I still use one of the time series count models while including
 this lagged cumulative value?
 
 I would greatly appreciate it if anyone can direct me to relevant
 material on this. As a note, I have already looked at Cameron and
 Trivedi's book.
 
 Many thanks,
 
 Brett
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


-- 
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] shared library configuration; Gnome GUI

2005-04-22 Thread Paul Johnson
Hello, everybody:
On a Fedora Core 3 Linux system, I built R-2.1 using an updated version 
of the spec file that was used to make the RPMs for version 2.0.1 on the 
CRAN system.  The build was fine, and package updates work perfectly.  Thanks!

Then I got curious about the package gnomeGUI. While trying to build 
that, I see errors
=
* Installing *Frontend* package 'gnomeGUI' ...
Using R Installation in R_HOME=/usr/lib/R
R was not built as a shared library
Need a shared R library
ERROR: configuration failed for package 'gnomeGUI'
===

So then I look back at re-building R, and see
 ./configure --help
I see these two items that seem to contradict each other.  Why is the 
first defaulted to no and the second one yes?  What's the difference?

  --enable-R-shlib          build R as a shared library [no]
[...snip...]
  --enable-shared[=PKGS]
  build shared libraries [default=yes]
I built with --enable-R-shlib and all seemed fine.
Anyway, it turns out it was all for nothing, because the Gnome package 
wants the Gnome-1.4 libraries, whereas I have 2.0X. So, I'm going to 
forget about gnomeGUI, but I wonder: did I do any harm by building R 
with the non-default --enable-R-shlib?  Can it potentially break something?

As far as I can see, new R runs great.
--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] lists: removing elements, iterating over elements,

2005-04-05 Thread Paul Johnson
I'm writing R code to calculate Hierarchical Social Entropy, a diversity 
index that Tucker Balch proposed.  One article on this was published in 
Autonomous Robots in 2000. You can find that and others through his web 
page at Georgia Tech.

http://www.cc.gatech.edu/~tucker/index2.html
While I work on this, I realize (again) that I'm a C programmer 
masquerading in R, and it's really tricky working with R lists.  Here are 
things that surprise me; I wonder what your experience/advice is.

I need to calculate overlapping U-diametric clusters of a given radius. 
  (Again, I apologize this looks so much like C.)

## Returns a list of all U-diametric clusters of a given radius.
## Give it an R distance matrix.
## Clusters may overlap.  Clusters may be identical (redundant).
getUDClusters <- function(distmat, radius){
  mem <- list()
  nItems <- dim(distmat)[1]
  for ( i in 1:nItems ){
    mem[[i]] <- c(i)
  }
  for ( m in 1:nItems ){
    for ( n in 1:nItems ){
      if (m != n && (distmat[m,n] <= radius)){
        ## item n is within the radius, so add it to collection m
        mem[[m]] <- sort(c( mem[[m]], n))
      }
    }
  }
  return(mem)
}
That generates the list, like this:
[[1]]
[1]  1  3  4  5  6  7  8  9 10
[[2]]
[1]  2  3  4 10
[[3]]
[1]  1  2  3  4  5  6  7  8 10
[[4]]
[1]  1  2  3  4 10
[[5]]
[1]  1  3  5  6  7  8  9 10
[[6]]
[1]  1  3  5  6  7  8  9 10
[[7]]
[1]  1  3  5  6  7  8  9 10
[[8]]
[1]  1  3  5  6  7  8  9 10
[[9]]
[1]  1  5  6  7  8  9 10
[[10]]
 [1]  1  2  3  4  5  6  7  8  9 10
The next task is to eliminate the redundant elements.  unique() does not 
apply to lists, so I have to scan one by one.

  cluslist <- getUDClusters(distmat, radius)
  ## find redundant (identical) clusters
  redundantCluster <- c()
  for (m in 1:(length(cluslist)-1)) {
    for ( n in (m+1):length(cluslist) ){
      if ( m != n && length(cluslist[[m]]) == length(cluslist[[n]]) ){
        ## all() rather than a bare sum(), so every element must match
        if ( all(cluslist[[m]] == cluslist[[n]]) ){
          redundantCluster <- c( redundantCluster, n)
        }
      }
    }
  }
  ## make sure they are sorted in reverse order
  if (length(redundantCluster) > 0)
    {
      redundantCluster <- unique(sort(redundantCluster, decreasing=T))
      ## remove redundant clusters (must do in reverse order to preserve
      ## the indexing of cluslist)
      for (i in redundantCluster) cluslist[[i]] <- NULL
    }

Question: am I deleting the list elements properly?
I do not find explicit documentation for R on how to remove elements 
from lists, but trial and error tells me

myList[[5]] <- NULL
will remove the 5th element and then close up the hole caused by 
deletion of that element.  That shuffles the index values, so I have to 
be careful in dropping elements: I must work from the back of the list 
to the front.

Is there an easier or faster way to remove the redundant clusters?
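
(For reference, a sketch of one shortcut: collapse each cluster to a 
character key, then drop duplicates with duplicated().  Untested against 
the real data, but the idea is:)

## hash each cluster to a string key; keep the first of each duplicate set
keys <- sapply(cluslist, paste, collapse = ",")
cluslist <- cluslist[!duplicated(keys)]

(Note also that cluslist[redundantCluster] <- NULL drops several elements 
in one step, without the reverse-order bookkeeping.)
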
Now, the next question.  After eliminating the redundant sets from the 
list, I need to calculate the total number of items present in the whole 
list, figure how many are in each subset--each list item--and do some 
calculations.

I expected this would iterate over the members of the list--one step for 
each subcollection

for (i in cluslist){
}
but it does not.  It iterates over the items within the subsets of the 
list cluslist.  I mean, if cluslist has 5 sets, each with 10 elements, 
this for loop takes 50 steps, one for each individual item.

I find this does what I want
for (i in 1:length(cluslist))
But I found out the hard way :)
Oh, one more quirk that fooled me.  Why does unique() applied to a 
distance matrix throw away the 0's?  I think that's really bad!

 x <- rnorm(5)
 myDist <- dist(x, diag=T, upper=T)
 myDist
  1 2 3 4 5
1 0.000 1.2929976 1.6658710 2.6648003 0.5494918
2 1.2929976 0.000 0.3728735 1.3718027 0.7435058
3 1.6658710 0.3728735 0.000 0.9989292 1.1163793
4 2.6648003 1.3718027 0.9989292 0.000 2.1153085
5 0.5494918 0.7435058 1.1163793 2.1153085 0.000
 unique(myDist)
 [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058
 [8] 0.9989292 1.1163793 2.1153085
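
(A likely explanation, for reference: a "dist" object stores only the 
n(n-1)/2 lower-triangle distances; the diagonal zeros exist only in the 
printed display requested by diag=T, so unique() never sees any 0's. 
A hedged sketch of a check:)

length(myDist)                        ## choose(5,2) = 10 stored values
unique(as.vector(as.matrix(myDist)))  ## the matrix form keeps the 0's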

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] working with pairlists imported from HDF5, converting to data frames?

2005-03-16 Thread Paul Johnson
I've used the HDF5 library to bring some data into R. The verbose output 
looks like this:

 
hdf5load("hdfGraphWed_Mar_16_13_33_37_2005.hdf", load=T, verbosity=1, tidy=T)
Processing object: cprSeats .. which is a Group
Processing object: Seats 0 .. its a dataset..Finished dataset
Processing object: Seats 1 .. its a dataset..Finished dataset
Processing object: Seats 2 .. its a dataset..Finished dataset
Processing object: Seats 3 .. its a dataset..Finished dataset
Processing object: Seats 4 .. its a dataset..Finished dataset
... Done group cprSeats
Processing object: effective .. which is a Group
Processing object: AggComp .. its a dataset..Finished dataset
Processing object: AggNonComp .. its a dataset..Finished dataset
Processing object: CPR .. its a dataset..Finished dataset
Processing object: PR .. its a dataset..Finished dataset
Processing object: SMD .. its a dataset..Finished dataset
... Done group effective

Each item inside the group "effective" is a vector of numbers.  I want 
to convert "effective" into a data frame or matrix for use with matplot.

However, R sees effective not as a collection of vectors, but as a 
pairlist.  I'm not a Lisp programmer, so don't understand the 
significance of the list help page's comment about dotted lists.

 class(effective)
[1] "pairlist"
 attributes(effective)
$names
[1] "SMD"        "PR"         "CPR"        "AggNonComp" "AggComp"
I can access the elements in effective as list elements, either with 
effective$SMD or effective[[1]], as in:

 effective[[1]]
  [1] 1.00 1.00 1.00 1.00 1.00 1.00 1.00 
1.00
  [9] 1.00 1.00 1.00 1.00 1.00 1.00 1.00 
1.00
 [17] 1.00 1.00 1.00 1.00 1.00 1.00 1.00 
1.00
 [25] 1.00 1.00 1.00 1.00 1.00 1.00 4.761905 
4.668534
 [33] 4.743833 4.743833 4.694836 4.672897 4.612546 4.612546 4.612546 
4.950495
 [41] 4.766444 4.761905 4.329004 4.990020 4.930966 4.906771 4.378284 
4.686036
 [49] 4.935834 4.793864 4.793864 4.541326 4.849661 4.730369 4.960317 
4.159734

But I can't force it into a data frame
 as.data.frame(effective)
Error in as.data.frame.default(effective) :
can't coerce pairlist into a data.frame
But I can manually build the dataframe
 
data.frame(effective$SMD,effective$PR,effective$CPR,effective$AggNonComp,effective$AggComp)

But that is a bit frustrating for every individual dataset coming in.
Is there a short cut?
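
(A sketch of one shortcut, assuming every element of the pairlist has the 
same length: convert the pairlist to an ordinary list first, and 
as.data.frame() then works.)

eff.df <- as.data.frame(as.list(effective))
matplot(eff.df, type = "l")
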
--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] here's why it makes sense

2005-03-04 Thread Paul Johnson
if you go
x[i]
you are giving x an index vector, which we had mistakenly thought was 
an integer. Rather, it is a vector of indices for observations.  Here's data:

x <- c(1, 4, 3, 2, 5)
x[1] would be 1
x[2] would be 4
but if you put an index vector in the brackets, you have
x[ c(1,2,1,2) ]
it means take the first, the second, the first, the second, so
x[ c(1,2,1,2) ]
is
1, 4, 1, 4
So in the procedure diff.ttest, when we have x[i] and y[i], we mean 
take the vectors x and y after grabbing the index values selected for 
each resample.
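
(A hedged sketch of that resampling idea with invented data -- this is not 
the actual diff.ttest code, just the index-vector mechanics:)

x <- rnorm(20); y <- rnorm(20)
i <- sample(1:length(x), replace = TRUE)  ## one bootstrap resample of indices
mean(x[i]) - mean(y[i])                   ## statistic on the resampled data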

That page I sent you a minute ago is from R by Example, which I think is 
quite great.

http://www.mayin.org/ajayshah/KB/R/index.html
pj
--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Users in Ukraine cyrillic support

2005-03-01 Thread Paul Johnson
Hello, everybody:
My friends in Ukraine are starting a research lab at a national
university and they asked what programs to use.  I said R of course,
and they then asked me 'what support does it have for Cyrillic?'
I've done some snooping in the R website and all the references I find
to foreign languages concern C and Fortran, not Ukrainian or Russian.
Since I'm an English-only user, I have no idea what is involved here, or
what difficulties might be caused with non-English character sets and
file systems.
Not to mention the problem that the documentation/manuals are in English.
If you have any advice that I can collect up for my friends, I would
appreciate it.
--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] dotplot lattice problems: y axis values and bg color output in jpg

2004-10-22 Thread Paul Johnson
I have a linux system with Fedora Core 2 and R-2.0.
I was comparing plots made with plot() and dotplot() and discovered a 
problem. Although the dots are positioned correctly, the numerical 
labels in the dotplot y axis are not correct.

I put copies here:
http://lark.cc.ku.edu/~pauljohn/R/plotTrouble1.jpg
That is the correct one from plot(), with the highest value on y showing 
at 18.

http://lark.cc.ku.edu/~pauljohn/R/plotTrouble2.jpg
That is the dotplot one. The picture is basically the same, except the 
numbers on the y axis only go up to 8.  But the dots are in the correct 
spots and the x axis is labeled correctly.

On the screen, the plots have white backgrounds, but the picture from 
the dotplot turns out gray. Why is that?  Sometimes, if I run another 
lattice plot in the same session, the colors change to the dark 
background, and I suppose the same trouble is happening.  Isn't 
trellis.par.set() going to remain in place for all following trellis plots?

Here's the code I used to make these 2 graphs.
x11(height=6.5, width=6.5)
data2003 <- subset(elaine1, POLCYEAR==2003)
plot(data2003$OLDCRASH, data2003$RENUCYC)
modall <- lm(RENUCYC ~ OLDCRASH, data=data2003)
abline(modall)
dev.copy(device=jpeg, file="plotTrouble1.jpg")
dev.off()
dev.off()
trellis.par.set(theme=col.whitebg(), height=9, width=6.5)
dotplot(RENUCYC ~ OLDCRASH, data=elaine1, xlab="ECR",
        as.table=T, subset=POLCYEAR==2003,
        panel=function(x,y)
        {
          panel.dotplot(x, y, cex=0.2)
          panel.lmline(x, y)
        }
       )

jpeg(file="plotTrouble2.jpg", bg="white", height=480, width=480)
trellis.last.object()
dev.off()
dev.off()
I tried to force the ylim on the dotplot up to 18, but it just produced 
this even uglier result.

http://lark.cc.ku.edu/~pauljohn/R/plotTrouble3.jpg
--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Diagnosing trouble with R-2.0, Fedora Core 2, and Rcmdr

2004-10-11 Thread Paul Johnson
Greetings, R-help!
On 2 Fedora Core 2 Linux systems, i've completely erased the previous R 
and all packages and then installed R-2.0 and installed fresh packages.

In using Rcmdr, I see some trouble and I wonder if other people see this 
and if it is due to the tcl/tk, or R, or Rcmdr.  (If readers have not 
yet tried Rcmdr, I recommend it not just because of the GUI it provides, 
but also because Prof. Fox has created several very handy commands, like 
scatter3d, which make it possible to use advanced packages like rgl. 
Rcmdr provides several very handy wrapper commands that just work and 
users don't have to fuss over so many details! And, if you are new at R, 
you can use pull down menus and then Rcmdr displays the commands that 
correspond with those menu options. Very educational!)

In no particular order, here are the issues I see with R-2.0 and Rcmdr
1. In the panel to select a dataset from a package (menu: Data/Data in 
packages), I can type in the name of a dataset, but the GUI will not let 
me choose the name of a package from which to select a dataset. Nothing 
happens when I pick a package name.  (With previous R/Rcmdr, I did not 
have this trouble).

2. When using Rcmdr, some of the output from commands shows in the Rcmdr 
bottom window, but some is sent to the xterm in which R was started. For 
example, with a scatter3d() command, if I add the option 
model.summary=T, the model summary shows in the xterm.  But some other 
models show results in the bottom pane of Rcmdr.  And, it seems to me, 
there are some commands in which I've noticed some of the output going 
to the Rcmdr bottom buffer and  some to the xterm.

3. When I want to exit Rcmdr and choose the pull down exit from 
Commander and R, I get a Segmentation Fault that closes Rcmdr and R.

4. 3d plots do not display on the screen until I click in the rgl window 
that displays the graph.  This might be a video card/AGP driver problem, 
I suppose.  These systems have the NVidia FX5200 cards and the NVidia 
proprietary X driver.  If other users don't see the same on other kinds 
of systems, I guess that will tell me where the problem lies.

5. At random times, I get an error tcl/tk window popping up with a 
message about MASS and categorical variables.  It says

Error in data(package=parse(text=packageName));
'MASS' must be character string or NULL
It says that over and over, sometimes it has other package names, as in:
Error in data(package=parse(text=packageName));
'relimp' must be character string or NULL
These seem to be harmless?
--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] polr (MASS) and lrm (Design) differences in tests of statistical signifcance

2004-09-30 Thread Paul Johnson
Greetings:
I'm running R-1.9.1 on Fedora Core 2 Linux.
I tested a proportional odds logistic regression with MASS's polr and 
Design's lrm.  Parameter estimates between the 2 are consistent, but the 
standard errors are quite different, and the conclusions from the t and 
Wald tests are dramatically different. I cranked the abstol argument 
up quite a bit in the polr method and it did not make the differences go 
away.

So
1. Can you help me see why the std. errors in the polr are so much 
smaller, and

2. Can I hear more opinions on the question of t vs. Wald in making 
these signif tests. So far, I understand the t is based on the 
asymptotic Normality of the estimate of b, and for finite samples b/se 
is not exactly distributed as a t. But I also had the impression that 
the Wald value was an approximation as well.

 summary(polr(as.factor(RENUCYC) ~ DOCS + PCT65PLS*RANNEY2 + OLDCRASH 
+  FISCAL2 + PCTMETRO + ADMLICEN, data=elaine1))

Re-fitting to get Hessian
Call:
polr(formula = as.factor(RENUCYC) ~ DOCS + PCT65PLS * RANNEY2 +
OLDCRASH + FISCAL2 + PCTMETRO + ADMLICEN, data = elaine1)
Coefficients:
Value  Std. Error   t value
DOCS  0.004942217 0.002952001  1.674192
PCT65PLS  0.454638558 0.113504288  4.005475
RANNEY2   0.110473483 0.010829826 10.200855
OLDCRASH  0.139808663 0.042245692  3.309418
FISCAL2   0.025592117 0.011465812  2.232037
PCTMETRO  0.018184093 0.007792680  2.333484
ADMLICEN -0.028490387 0.011470999 -2.483688
PCT65PLS:RANNEY2 -0.008559228 0.001456543 -5.876400
Intercepts:
       Value    Std. Error  t value
2|3    6.6177   0.3019      21.9216
3|4    7.1524   0.2773      25.7938
4|5   10.5856   0.2149      49.2691
5|6   12.2132   0.1858      65.7424
6|8   12.2704   0.1856      66.1063
8|10  13.0345   0.2184      59.6707
10|12 13.9801   0.3517      39.7519
12|18 14.6806   0.5587      26.2782
Residual Deviance: 587.0995
AIC: 619.0995
 lrm(RENUCYC ~ DOCS + PCT65PLS*RANNEY2 + OLDCRASH +  FISCAL2 + 
PCTMETRO + ADMLICEN, data=elaine1)

Logistic Regression Model
lrm(formula = RENUCYC ~ DOCS + PCT65PLS * RANNEY2 + OLDCRASH +
FISCAL2 + PCTMETRO + ADMLICEN, data = elaine1)
Frequencies of Responses
  2   3   4   5   6   8  10  12  18
 21  12 149  46   1  10   6   2   2
Frequencies of Missing Values Due to Each Variable
 RENUCYC DOCS PCT65PLS  RANNEY2 OLDCRASH  FISCAL2 PCTMETRO ADMLICEN
 RENUCYC  DOCS  PCT65PLS  RANNEY2  OLDCRASH  FISCAL2  PCTMETRO  ADMLICEN
       5     0         0        6         0        5         0         5
   Obs  Max Deriv  Model L.R.  d.f.  P  C      Dxy
   249  7e-05      56.58       8     0  0.733  0.465
 Gamma  Tau-a  R2    Brier
  0.47  0.278  0.22  0.073

   Coef   S.E. Wald Z P
y=3-6.617857 6.716688 -0.99  0.3245
y=4-7.152561 6.716571 -1.06  0.2869
y=5   -10.585705 6.74 -1.57  0.1164
y=6   -12.213340 6.755656 -1.81  0.0706
y=8   -12.270506 6.755571 -1.82  0.0693
y=10  -13.034584 6.756829 -1.93  0.0537
y=12  -13.980235 6.767724 -2.07  0.0389
y=18  -14.680760 6.786639 -2.16  0.0305
DOCS 0.004942 0.002932  1.69  0.0918
PCT65PLS 0.454653 0.552430  0.82  0.4105
RANNEY2  0.110475 0.076438  1.45  0.1484
OLDCRASH 0.139805 0.042104  3.32  0.0009
FISCAL2  0.025592 0.011374  2.25  0.0245
PCTMETRO 0.018184 0.007823  2.32  0.0201
ADMLICEN-0.028490 0.011576 -2.46  0.0138
PCT65PLS * RANNEY2  -0.008559 0.006417 -1.33  0.1822

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


followup: Re: [R] Issue with predict() for glm models

2004-09-23 Thread Paul Johnson
I have a follow up question that fits with this thread.
Can you force an overlaid plot showing predicted values to follow the 
scaling of the axes of the plot over which it is laid?

Here is an example based on linear regression, just for clarity.  I have 
followed the procedure described below to create predictions and now 
want to plot the predicted values on top of a small section of the x-y 
scatterplot.

x <- rnorm(100, 10, 10)
e <- rnorm(100, 0, 5)
y <- 5 + 10 * x + e
myReg1 <- lm(y ~ x)
plot(x, y)
newX <- seq(1, 10, 1)
myPred <- predict(myReg1, data.frame(x=newX))
Now, if I do this, I get 2 graphs overlaid but their axes do not line 
up.

par(new=T)
plot(newX, myPred)
The problem is that the second one uses the whole width of the graph 
space, when I'd rather just have it go from the small subset where its x 
is defined, from 1 to 10.  It's stretching the range (1,10) for newX to 
use the same scale that goes from (-15, 35) where it plots x.

I know abline() can do this for lm, but for some other kinds of models, 
no  lines() method is provided, and so I am doing this the old fashioned 
way.
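
(A sketch of the simpler route, assuming the objects above: lines() draws 
into the current plot and therefore respects its axes, unlike a second 
plot() call with par(new=T).)

plot(x, y)
lines(newX, myPred, lwd = 2)  ## drawn in the existing coordinate system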

pj
John Fox wrote:
Dear Uwe, 


-Original Message-
From: Uwe Ligges [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 23, 2004 8:06 AM
To: John Fox
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] Issue with predict() for glm models

John Fox wrote:

Dear Uwe,
Unless I've somehow messed this up, as I mentioned 
yesterday, what you 

suggest doesn't seem to work when the predictor is a 
matrix. Here's a 

simplified example:

X <- matrix(rnorm(200), 100, 2)
y <- (X %*% c(1,2) + rnorm(100)) > 0
dat <- data.frame(y=y, X=X)
mod <- glm(y ~ X, family=binomial, data=dat)
new <- data.frame(X = matrix(rnorm(20), 2))
predict(mod, new)
Dear John,
the questioner had a 2 column matrix with 40 and one with 50 
observations (not a 100 column matrix with 2 observation) and 
for those matrices it works ...


Indeed, and in my example the matrix predictor X has 2 columns and 100 rows;
I did screw up the matrix for the new data to be used for predictions (in
the example I sent today but not yesterday), but even when this is done
right -- where the new data has 10 rows and 2 columns -- there are 100 (not
10) predicted values:

X <- matrix(rnorm(200), 100, 2)  # original predictor matrix with 100 rows
y <- (X %*% c(1,2) + rnorm(100)) > 0
dat <- data.frame(y=y, X=X)
mod <- glm(y ~ X, family=binomial, data=dat)
new <- data.frame(X = matrix(rnorm(20), 10, 2)) # corrected -- note 10 rows
predict(mod, new) # note 100 predicted values
   12345
6 
  5.75238091   0.31874587  -3.00515893  -3.77282121  -1.97511221
0.54712914 
   789   10   11
12 
  1.85091226   4.38465524  -0.41028694  -1.53942869   0.57613555
-1.82761518 

 . . .
  91   92   93   94   95
96 
  0.36210780   1.71358713  -9.63612775  -4.54257576  -5.29740468
2.64363405 
  97   98   99  100 
 -4.45478627  -2.44973209   2.51587537  -4.09584837 

Actually, I now see the source of the problem:
The data frames dat and new don't contain a matrix named X; rather the
matrix is split columnwise:

names(dat)
[1] "y"   "X.1" "X.2"
names(new)
[1] "X.1" "X.2"
Consequently, both glm and predict pick up the X in the global environment
(since there is none in dat or new), which accounts for why there are 100
predicted values.
Using list() rather than data.frame() produces the originally expected
behaviour:

new <- list(X = matrix(rnorm(20), 10, 2))
predict(mod, new)
 1  2  3  4  5  6  7
 5.9373064  0.3687360 -8.3793045  0.7645584 -2.6773842  2.4130547  0.7387318
 8  9 10 
-0.4347916  8.4678728 -0.8976054 

Regards,
 John

Best,
Uwe




  12345
6 
 1.81224443  -5.92955128   1.98718051 -10.05331521   2.6506
-2.50635812 
  789   10   11
12 
 5.63728698  -0.94845276  -3.61657377  -1.63141320   5.03417372
1.80400271 
 13   14   15   16   17
18 
 9.32876273  -5.32723406   5.29373023  -3.90822713 -10.95065186
4.90038016

. . .
  97   98   99  100 
-6.92509812   0.59357486  -1.17205723   0.04209578 

Note that there are 100 rather than 10 predicted values.
But with individuals predictors (rather than a matrix),

x1 <- X[,1]
x2 <- X[,2]
dat.2 <- data.frame(y=y, x1=x1, x2=x2)
mod.2 <- glm(y ~ x1 + x2, family=binomial, data=dat.2)
new.2 <- data.frame(x1=rnorm(10), x2=rnorm(10))
predict(mod.2, new.2)
1  2  3  4  5  
   6  7
6.5723823  0.6356392  4.0291018 -4.7914650  2.1435485 -3.1738096 
-2.8261585

8  9 10 
-1.5255329 -4.7087592  4.0619290

works as expected (?).
Regards,
John



Re: [R] R glm

2004-09-23 Thread Paul Johnson
No!
 ?family
 The 'gaussian' family accepts the links 'identity', 'log' and
  'inverse';
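
(A minimal demonstration with invented data that gaussian errors can be 
combined with a log link:)

## gaussian family, log link: E[y] = exp(a + b*x), additive normal error
x <- runif(100, 1, 3)
y <- exp(1 + 0.5 * x) + rnorm(100, sd = 0.2)
fit <- glm(y ~ x, family = gaussian(link = "log"))
coef(fit)
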
Kahra Hannu wrote:
In Venables & Ripley, Modern Applied Statistics with S (MASS), 4th edition, on page 184 there 
is a table "Families and link functions" that gives you the available links with different 
families. The default and the only link with the gaussian family is identity.
ciao,
Hannu Kahra
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Shuangge Ma
Sent: Thursday, September 23, 2004 6:26 PM
To: [EMAIL PROTECTED]
Subject: [R] R glm
Hello:
would you please help me with the following glm question?
for the R function glm, what I understand is: once you specify the
family, the link function is fixed.
My question is: is it possible to use, for example, a log link function,
but the estimation approach for the gaussian family?
Thanks,
Shuangge Ma, Ph.D.

* CHSCC, Department of Biostatistics   *
* University of Washington *
* Building 29, Suite 310, 6200 NE 74th ST. *
* Seattle, WA 98115*
* Tel: 206-685-7123 Fax: 206-616-4075  *
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Ever see a stata import problem like this?

2004-09-21 Thread Paul Johnson
Greetings Everybody:
I generated a 1.2MB dta file based on the general social survey with 
Stata8 for linux. The file can be re-opened with Stata, but when I bring 
it into R, it says all the values are missing for most of the variables.

This dataset is called morgen.dta and I dropped a copy online in case 
you are interested

http://www.ku.edu/~pauljohn/R/morgen.dta
looks like this to R (I tried various options on the read.dta command):
 myDat <- read.dta("morgen.dta")
 summary(myDat)
 CASEID  yearid hrs1 hrs2
Min.   :   19721   Min.   :1972   Min.   :   1   NAP :0   NAP :0
1st Qu.: 1983475   1st Qu.:1978   1st Qu.: 445   DK  :0   DK  :0
Median : 1996808   Median :1987   Median : 905   NA  :0   NA  :0
Mean   : 9963040   Mean   :1986   Mean   : 990   NA's:40933   NA's:40933
 3rd Qu.:19872187   3rd Qu.:1994   3rd Qu.:1358
 Max.   :20002817   Max.   :2000   Max.   :3247
  prestige  agewedage  educpaeduc
 DK,NA,NAP:0   NAP :0   DK  :0   NAP :0   NAP :0
 NA's :40933   DK  :0   NA  :0   DK  :0   DK  :0
   NA  :0   NA's:40933   NA  :0   NA  :0
   NA's:40933NA's:40933   NA's:40933

  maeduc   speduc income
 NAP :0   NAP :0   $25000 OR MORE:14525
 DK  :0   DK  :0   $1 - 14999: 5022
 NA  :0   NA  :0   $15000 - 1: 3869
 NA's:40933   NA's:40933   $2 - 24999: 3664
   REFUSED   : 1877
   (Other)   : 8523
   NA's  : 3453

Here's what Stata sees when I load the same thing:
summarize, detail
 Case identification number
-
  Percentiles  Smallest
 1%   197432  19721
 5%   199649  19722
10%  1974116  19723   Obs   40933
25%  1983475  19724   Sum of Wgt.   40933
50%  1996808  Mean9963040
Largest   Std. Dev.   9006352
75% 1.99e+07   2.00e+07
90% 2.00e+07   2.00e+07   Variance   8.11e+13
95% 2.00e+07   2.00e+07   Skewness .18931
99% 2.00e+07   2.00e+07   Kurtosis   1.045409
GSS YEAR FOR THIS RESPONDENT
-
  Percentiles  Smallest
 1% 1972   1972
 5% 1973   1972
10% 1974   1972   Obs   40933
25% 1978   1972   Sum of Wgt.   40933
50% 1987  Mean   1986.421
Largest   Std. Dev.   8.61136
75% 1994   2000
90% 1998   2000   Variance   74.15552
95% 2000   2000   Skewness  -.0789223
99% 2000   2000   Kurtosis   1.799939
RESPONDENT ID NUMBER
-
  Percentiles  Smallest
 1%   18  1
 5%   89  1
10%  178  1   Obs   40933
25%  445  1   Sum of Wgt.   40933
50%  905  Mean   989.9129
Largest   Std. Dev.  689.0596
75% 1358   3244
90% 2027   3245   Variance   474803.2
95% 2437   3246   Skewness   .8359211
99% 2867   3247   Kurtosis   3.311248
  NUMBER OF HOURS WORKED LAST WEEK
-
  Percentiles  Smallest
 1%6  0
 5%   15  0
10%   21  0   Obs   23279
25%   37  0   Sum of Wgt.   23279
50%   40  Mean   41.05206
Largest   Std. Dev.  13.95931
75%   48 89
90%   60 89   Variance   194.8624
95%   65 89   Skewness.195045
99%   82 89   Kurtosis   4.448998
 NUMBER OF HOURS USUALLY WORK A WEEK
-
  Percentiles  Smallest
 1%4  0
 5%   15  0
10%   20  1   Obs 774
25%   38  2   Sum of Wgt. 774
50%   40  Mean   39.79199
Largest   Std. Dev.  13.43383
75%   45 89
90%   55 89   Variance   180.4677
95%   60 89 

[R] R-release.diff + R-1.9 - success on Fedora Core 2, R RPM available; ess-emacs-5.1.20 also available

2004-05-21 Thread Paul Johnson
Dear Everybody:
I have Fedora Core 2, and R-1.9.0 does not build out of the box. After 
applying the daily patch file R-release.diff, I find it does build, and 
I've made RPMs and an SRPM.

In case you want to save yourself a recompile, you can find Fedora Core 
RPMs in here:

http://lark.cc.ku.edu/~pauljohn/software/R
These are based on the standard R distribution SPEC file, only the patch 
was added.

Also, I have created an RPM for ESS for Emacs on Fedora Core 2 (not 
Xemacs) and you can find that here:

http://lark.cc.ku.edu/~pauljohn/software/favoriteEmacsFiles
--
While I'm on the RPM subject, can I ask an R RPM packaging question?  I 
want the R RPM to install so that, post-install, R starts and runs

 update.packages()

as well as

 install.packages(c("Design", "Hmisc", "lmtest", "car", "rgl", "effects"))

Well, you get the idea. I want to add in more packages, of course.  Can 
I ask what might be the best way to package this?
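
(One hedged possibility, untested: have the spec file's %post scriptlet 
pipe a small script into R.  Everything here beyond the package list is 
an assumption, and it requires network access at install time.)

## R script that a %post scriptlet could run, e.g. via
##   echo 'source("postinstall.R")' | R --vanilla
update.packages()
install.packages(c("Design", "Hmisc", "lmtest", "car", "rgl", "effects"))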

pj
ps: Here's the error that results if you do not patch R-1.9.0 on FC2:
gcc -I. -I../../../src/include -I../../../src/include 
-I/usr/X11R6/include -I/usr/local/include -DHAVE_CONFIG_H 
-D__NO_MATH_INLINES -mieee-fp -fPIC -O2 -g -pipe -march=i386 -mcpu=i686 
-c dataentry.c -o dataentry.lo
In file included from dataentry.c:31:
/usr/X11R6/include/X11/Xlib.h:1400: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1488: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:1516: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1520: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:1542: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1577: error: syntax error before '*' token
/usr/X11R6/include/X11/Xlib.h:1586: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1611: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1661: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1667: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1714: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:1753: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:1994: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2078: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2331: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2341: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2413: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2423: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2581: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2596: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2789: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2856: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:2861: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:2975: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3001: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3012: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3037: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3046: error: syntax error before char
/usr/X11R6/include/X11/Xlib.h:3059: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3202: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3251: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3283: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3374: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3381: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3401: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3407: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3419: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3429: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3770: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3781: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3792: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3803: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3814: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xlib.h:3825: error: syntax error before _Xconst
In file included from dataentry.c:32:
/usr/X11R6/include/X11/Xutil.h:566: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xutil.h:606: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xutil.h:666: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xutil.h:678: error: syntax error before _Xconst
/usr/X11R6/include/X11/Xutil.h:801: error: syntax error before _Xconst
dataentry.c: In function `GetKey':
dataentry.c:1272: warning: passing arg 4 of `XLookupString' from 
incompatible pointer type
dataentry.c: In function `GetCharP':
dataentry.c:1281: warning: passing arg 4 of 

Re: [R] Explaining Survival difference between Stata and R

2004-05-13 Thread Paul Johnson
Dear Goran (and others)

I did not know about the eha package, but reading the docs, I see many 
things I've been looking for, including the parametric hazard model with 
Weibull baseline.  Thanks for the tip, and the package.

I still don't quite understand your point about the reason that coxph 
crashes.  Why does the difference of scale between the variables cause 
trouble?  I understand that a redundant set of variables would cause 
singularity, but do not see why differing scales for variables would 
cause that.  Does the explanation lie in the optimization algorithm 
itself?  I'll read the eha package coxreg source code. I bet that will 
show me what's going on in your fix, anyway.

Still, I wish coxph just handled all that stuff.
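
(In the meantime, a sketch of the rescaling workaround using scale(), 
which centers and standardizes the covariate before the Newton-Raphson 
iterations start; the coefficient then refers to the standardized 
variable.  Assuming dat as elsewhere in this thread:)

library(survival)
coxph(Surv(yrs2, ratify) ~ scale(haz.wst) + pol.free, data = dat)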

Just for fun, here are the Stata results for the same small model that 
you report.  If I ask Stata to use the Efron method of breaking ties, 
then the results do almost exactly match what you found.

stset yrs2, failure (ratify)
stcox haz_wst  pol_free  , robust nohr efron
Iteration 0:   log pseudo-likelihood = -45.380139
Iteration 1:   log pseudo-likelihood =  -45.00981
Iteration 2:   log pseudo-likelihood = -45.001196
Iteration 3:   log pseudo-likelihood = -45.001193
Refining estimates:
Iteration 0:   log pseudo-likelihood = -45.001193
Cox regression -- Efron method for ties

No. of subjects   =   21          Number of obs   =   21
No. of failures   =   21
Time at risk  =   78
                                           Wald chi2(2)    =     1.09
Log pseudo-likelihood =   -45.001193       Prob > chi2     =   0.5785

--
         |               Robust
      _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------+----------------------------------------------------------------
 haz_wst |   8.48e-08    8.63e-08   0.98    0.326    -8.43e-08    2.54e-07
pol_free |   .0089626    .1047656   0.09    0.932    -.1963741    .2142994
--



Göran Broström wrote:

Paul and Thomas,

'coxph' or the data (I got it from Paul) indeed has a problem:

Call:
coxph(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
             coef  exp(coef)  se(coef)      z     p
haz.wst  8.53e-08          1  9.47e-08  0.901  0.37
pol.free       NA         NA  0.00e+00     NA    NA
Likelihood ratio test=0.76  on 1 df, p=0.385  n= 21 
Warning message: 
X matrix deemed to be singular; variable 2 in: 
coxph(Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat) 
---

Is it the data? Let's try 'coxreg' (eha):
---
Call:
coxreg(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat)
Covariate        Mean   Coef     RR  Wald p
haz.wst       2054901  0.000  1.000   0.372
pol.free        2.090  0.009  1.009   0.958

Events21 
Total time at risk78 
Max. log. likelihood  -45.001 
LR test statistic 0.76 
Degrees of freedom2 
Overall p-value   0.684583


This worked just fine (Paul, same results as in Stata?). But we 
seem to have a scaling problem; look at the means of the covariates!
Some rescaling gives:

Call:
coxph(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, 
data = dat)

                       coef  exp(coef)  se(coef)       z     p
I(haz.wst * 1e-06)  0.08479       1.09     0.095  0.8920  0.37
pol.free            0.00896       1.01     0.170  0.0526  0.96
Likelihood ratio test=0.76  on 2 df, p=0.685  n= 21 

and now 'coxph' gets the same results as 'coxreg'. I don't know about coxph
for sure, but I do know that coxreg centers all covariates before the NR
procedure starts. Maybe we also should rescale to unit variance? And of
course scale back the coefficients and se:s at the end? 

BTW, Paul's data is heavily tied. Could be an idea to use a discrete time
version of the PH model: you can do that with 'mlreg' (eha): 
--- 
Call:
mlreg(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, 
data = dat)

Covariate             Mean   Coef     RR  Wald p
I(haz.wst * 1e-06)   2.055  0.097  1.102   0.320
pol.free             2.090  0.003  1.003   0.985

Events21 
Total time at risk78 
Max. log. likelihood  -36.324 
LR test statistic 0.93 
Degrees of freedom2 
Overall p-value   0.627056

[R] Explaining Survival difference between Stata and R

2004-05-10 Thread Paul Johnson
Dear Everybody:

I'm doing my usual "how does that work in R" thing with some Stata 
projects.  I find a gross gap between Stata and R in Cox PH models, 
and I hope you can give me some pointers about what goes wrong.  I'm 
getting signals from R/survival that the model just can't be estimated, 
but Stata spits out numbers just fine.

I wonder if I should specify initial values for coxph?

I got a dataset from a student who uses Stata and try to replicate in R. 
I will share data to you in case you want to see for yourself.  Let me 
know if you want text or Stata data file.

In R, I try this:

 cox2 <- coxph(Surv(yrs2, ratify) ~ accession + haz.wst + haz.in + haz.out + 
wefgov + rle + rqe + pol.free + tai.2001 + ny.gdp.pcap.pp.cd + eio, 
data=dat3, control=coxph.control(iter.max=1000), singular.ok=T)
Warning message:
Ran out of iterations and did not converge in: fitter(X, Y, strats, 
offset, init, control, weights = weights,

So I wrote out the file exactly as it was in R into a Stata dataset:

 write.dta(dat3, "cleanBasel.dta")
Warning message:
Abbreviating variable names in: write.dta(dat3, "cleanBasel.dta")
Here's the Stata output:

. use /home/pauljohn/ps/ps909/AdvancedRegression/duration_2/cleanBasel.dta
(Written by R.  )
. stset yrs2, failure (ratify)

 failure event:  ratify != 0 & ratify < .
obs. time interval:  (0, yrs2]
 exit on or before:  failure

 --
   21  total obs.
0  exclusions

 --
   21  obs. remaining, representing
   21  failures in single record/single failure data
   78  total analysis time at risk, at risk from t = 0
 earliest observed entry t = 0
. stcox accessin haz_wst  haz_in  haz_out  wefgov  rle  rqe  pol_free 
tai_2001 ny_gd  eio, robust
  nohr

 failure _d:  ratify
   analysis time _t:  yrs2
Iteration 0:   log pseudo-likelihood = -49.054959
Iteration 1:   log pseudo-likelihood = -45.021682
Iteration 2:   log pseudo-likelihood = -44.525187
Iteration 3:   log pseudo-likelihood = -44.521588
Iteration 4:   log pseudo-likelihood = -44.521586
Refining estimates:
Iteration 0:   log pseudo-likelihood = -44.521586
Cox regression -- Breslow method for ties

No. of subjects   =   21          Number of obs   =   21
No. of failures   =   21
Time at risk  =   78
                                           Wald chi2(11)   =    81.64
Log pseudo-likelihood =   -44.521586       Prob > chi2     =   0.0000

--
         |               Robust
      _t |      Coef.   Std. Err.      z    P>|z|
---------+---------------------------------------
accessin |  -1.114101    .6343663  -1.76    0.079
 haz_wst |   2.32e-08    1.08e-07   0.22    0.829
  haz_in |   3.78e-06    2.46e-06   1.54    0.124
 haz_out |  -3.80e-07    3.76e-07  -1.01    0.312
  wefgov |   2.139127    .9136992   2.34    0.019
     rle |   1.827482    1.500878   1.22    0.223
     rqe |  -3.126696    1.332069  -2.35    0.019
pol_free |  -.4498276     .291764  -1.54    0.123
tai_2001 |  -2.895922    2.577401  -1.12    0.261
ny_gd___ |  -.0003223    .0002194  -1.47    0.142
     eio |      -.053    .0726064  -0.80    0.426
--
.
 last observed exit t = 7
--



Paul Johnson
Dept. of Political Science
University of Kansas
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] One inflated Poisson or Negative Binomal regression

2004-04-19 Thread Paul Johnson
Dear Peter:

I notice there is R code for a zero-inflated Poisson/NB process on the 
Stanford Political Science Computational Lab (Prof. Simon Jackman) web 
page.  If I wanted to do a one-inflated model, I would start with 
that because, at least to my eye, it is very easy to follow.  Mind you, 
I did not try this myself, but I bet you could make it go.  In the file 
zeroinfl.r, look at the function:

zeroinflNegBin <- function(parms){

it is pretty clear you'd have to supply a probability model for the 
outcomes valued 1 and then fit them into the overall likelihood.
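
(To make that concrete, here is an untested sketch of a one-inflated 
Poisson negative log-likelihood that optim() could minimize; y stands for 
the count vector and covariates are omitted for brevity:)

## one-inflated Poisson: with probability p the response is a structural 1,
## otherwise it is Poisson(lambda); parms = c(qlogis(p), log(lambda))
oipLL <- function(parms, y) {
  p <- plogis(parms[1])
  lambda <- exp(parms[2])
  ll <- ifelse(y == 1,
               log(p + (1 - p) * dpois(1, lambda)),
               log(1 - p) + dpois(y, lambda, log = TRUE))
  -sum(ll)
}
## fit <- optim(c(0, 0), oipLL, y = y)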

pj

http://pscl.stanford.edu/content.html
Peter Flom wrote:
Hello

I am interested in Poisson or (ideally) negative binomial regression
with an inflated number of '1' responses.
I have seen JK Lindsey's fmr function in the gnlm library, which fits
zero-inflated Poisson (ZIP) or zero-inflated negative binomial
regression, but the help file states that for 'Poisson or related
distributions the mixture involves the zero category'.
I had thought of perhaps subtracting 1 from all the counts and then
fitting the ZIP or ZINB models, and then adding 1, but am not sure if
this is legitimate, or if there is some better method.
Contextual details:
The dependent variable is number of primary sexual partners in the last
year.  The independent variables include a) Being married or in a
committed relationship  b) using hard drugs  c) sex  d) age
N is c. 500

Not surprisingly, there are a large number of 1 responses, especially
for those who are married or in a relationship.  More surprisingly, the
mean number of partners is the same (1.05 vs. 1.02) for people in and
not in relationships, but the variances are very different, mostly
because those in a relationship are much more likely to say exactly 1.
Thanks in advance

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
 

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] solved mystery of difference between glmmPQL and lme

2004-03-22 Thread Paul Johnson
I asked a few days ago about the difference in results I saw between the 
MASS function glmmPQL (due to Venables and Ripley) and the lme function 
from the package nlme (due to Pinheiro and Bates).  When the two tools 
apply to the same model (gaussian, link=identity, correlation=AR1), I 
was getting different results and wondered if there was an argument in 
favor of one or the other.

Several list readers emailed me to point out that glmmPQL is repeatedly 
calling lme, so if a model really can be estimated by lme, then lme is 
the more appropriate one because it is maximum likelihood, rather than 
quasi-likelihood.

That did not explain the difference in results, so I read the source 
code for glmmPQL and learned that it sets the method for lme fitting to 
"ML".  In contrast, lme defaults to "REML".  The estimates from 
glmmPQL and lme (method="ML") are identical in my test case.
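
(A sketch of the comparison, with invented names for the longitudinal 
setup described in my earlier post -- y, x, unit, and dat are assumptions:)

library(MASS)   ## glmmPQL
library(nlme)   ## lme, corAR1
m1 <- glmmPQL(y ~ x, random = ~ 1 | unit, family = gaussian,
              data = dat, correlation = corAR1())
m2 <- lme(y ~ x, random = ~ 1 | unit, data = dat,
          correlation = corAR1(), method = "ML")  ## matches glmmPQL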

pj
--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] contrast lme and glmmPQL and getting additional results...

2004-03-20 Thread Paul Johnson
I have a longitudinal data analysis project.  There are 10 observations 
on each of 15 units, and I'm estimating this with randomly varying 
intercepts along with an AR1 correction for the error terms within 
units.  There is no correlation across units.  Blundering around in R 
for a long time, I found that for linear/gaussian models, I can use 
either the MASS method glmmPQL (thanks to Venables and Ripley) or the 
lme from nlme (thanks to Pinheiro and Bates).  (I also find that the 
package lme4 has GLMM, but I can't get the correlation structure to work 
with that, so I gave up on that one.)

The glmmPQL and lme results are quite similar, but not identical.

Here are my questions.

1. I believe that both of these offer consistent estimates. Does one 
have preferable small-sample properties?  Is lme the preferred 
method in this case because it is more narrowly designed for this 
gaussian family model with an identity link?  If there's an argument in 
favor of PQL, I'd like to know it, because a couple of the hypothesis 
tests based on t-statistics are affected.

2. Is there a pre-made method for calculation of the robust standard errors?

I notice that model.matrix() command does not work for either lme or 
glmmPQL results, and so I start to wonder how people calculate sandwich 
estimators of the standard errors.

3. Are the AIC (or BIC) statistics comparable across models?  Can one 
argue in favor of the glmmPQL results (with, say, a log link) if the AIC 
is more favorable than the AIC from an lme fit?  In JK Lindsey's Models 
for Repeated Measurements, the AIC is sometimes used to make model 
selections, but I don't know where the limits of this application might 
lie.

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] using unstack inside my function: that old scope problem again

2004-03-19 Thread Paul Johnson
I've been reading the R mail archives and I've found  a lot of messages 
with this same kind of problem, but I can't understand the answers.  Can 
one of you try to explain this to me?

Here's my example. Given a regression model and a variable, I want to 
use unstack() on the vector of residuals and make some magic with the 
result. But unstack hates me.

PCSE <- function (tmodel, groupVar) {
  myres1 <- resid(tmodel)
  resUnstacked <- unstack(myres1, form = myres1 ~ groupVar)
  E <- as.matrix(resUnstacked)
  SIGMA <- (1/nrow(E)) * (t(E) %*% E)
  OMEGA <- diag(x=1, nrow=nrow(E), ncol=nrow(E)) %x% SIGMA
  X <- model.matrix(tmodel)
  XPRIMEXINV <- solve(t(X) %*% X)
  PCSECOVB <- XPRIMEXINV %*% (t(X) %*% OMEGA %*% X) %*% XPRIMEXINV
  PCSECOVB   ## return the covariance matrix
}
The error is:
PCSE(eld.ols1, dat2$STATEID)
Error in eval(expr, envir, enclos) : Object "groupVar" not found
Here's what I don't understand the most.

If I hack this so that resUnstacked is created by a matrix() 
command, then the function works. Why would matrix() work when unstack() 
does not?  And why does model.matrix() work if unstack() does not?
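
(For reference, an untested sketch of one workaround: hand unstack() a 
data frame holding both vectors, so the formula is evaluated there rather 
than in the calling frame.)

## inside PCSE(), replace the unstack() line with:
tmp <- data.frame(myres1 = resid(tmodel), groupVar = groupVar)
resUnstacked <- unstack(tmp, form = myres1 ~ groupVar)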

Thanks in advance, as usual.

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] glm questions --- saturated model

2004-03-16 Thread Paul Johnson
I'm confused going back and forth between the textbooks and these 
emails.  Please pardon me that I seem so pedantic.

I am pretty certain that -2lnL(saturated) is not 0 by definition.  In 
the binomial model with groups of size=1, the observed scores will 
be {0,1} but the predicted mean will be some number in [0,1], and so 
-2lnL will not be 0.  I'm reading, for example, Annette Dobson, An 
Introduction to Generalized Linear Models, 2nd ed. (2002, p. 77), where it 
gives the formula one can use for -2lnL(saturated) in the binomial model.

For the Normal distribution, Dobson says

-2lnL(saturated) = N log(2 pi sigma^2)

She gives the saturated model -2lnL(saturated) for lots of 
distributions, actually.

I thought the point in the first note from Prof. Firth was that the 
deviance is defined up to an additive constant because you can add or 
subtract from lnL in the deviance formula

D = -2[lnL(full) - lnL(subset)]

and the deviance is unaffected.  But I don't think that means there is a 
completely free quantity in lnL(saturated).

I agree that the deviance of the saturated model is 0 by definition, if 
by that one means to say

-2[lnL(saturated)-lnL(saturated)]

but of course, that's just a tautology.

Respectfully yours,

pj

Peter Dalgaard wrote:

BXC (Bendix Carstensen) [EMAIL PROTECTED] writes:

 

It's important to remember that lnL is defined only up to an additive 
constant.  For example a Poisson model has lnL contributions -mu + 
y*log(mu) + constant, and the constant is arbitrary.  The 
differencing 
in the deviance calculation eliminates it.  What constant would you 
like to use??

 

I have always been under the impression that the constant chosen by glm is
that which makes the deviance of the saturated model 0, the saturated
model being the one with one parameter per observation in the dataset.
   

As David pointed out, the deviance of a saturated model is zero by
definition. However, there's nothing arbitrary about the constant in a
likelihood either since it is supposed to be a density if seen as a
function of y (well, if you *really* want to quibble, it's a density
with respect to an arbitrary measure, so you could get an arbitrary
constant in if you insist, I suppose). The point is that the constant
is *uninformative* since it depends on y only, not mu, and hence that
people tend to throw some bits of the likelihood away, and not always
the same bits.
 



--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Science    http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] glm questions

2004-03-15 Thread Paul Johnson
Greetings, everybody. Can I ask some glm questions?

1. How do you find out -2*lnL(saturated model)?

In the output from glm, I find:

Null deviance:  which I think is  -2[lnL(null) - lnL(saturated)]
Residual deviance:   -2[lnL(fitted) - lnL(saturated)]
The Null model is the one that includes the constant only (plus offset 
if specified). Right?

I can use the Null and Residual deviance to calculate the usual model 
Chi-squared statistic
-2[lnL(null) - lnL(fitted)].

But, just for curiosity's sake, what's the saturated model's -2lnL?

2. Why no 'scaled deviance' in output?  Or, how are you supposed to tell 
if there is over-dispersion?
I just checked, and SAS gives us the scaled and unscaled deviance. 

I have read the Venables  Ripley (MASS 4ed) chapter on GLM . I believe 
I understand the cautionary point about overdispersion toward the end 
(p. 408).  Since I'm comparing lots of other books at the moment, I 
believe I see people using the practice that is being criticized.  The 
Pearson Chi-square based estimate of dispersion is recommended and one 
uses an F test to decide if the fitted model is significantly worse than 
the saturated model.  But don't we still assess over-dispersion by 
looking at the scaled deviance (after it is calculated properly)?

Can I make a guess why glm does not report scaled deviance?  Are the glm 
authors trying to discourage us from making the lazy assessment in which 
one concludes over-dispersion is present if the scaled deviance exceeds 
the degrees of freedom?

3. When I run example(glm) at the end there's a Gamma model and the 
printout says:

(Dispersion parameter for Gamma family taken to be 0.001813340) 

I don't find an estimate for the Gamma distribution's shape parameter in 
the output.  I'm uncertain what the reported dispersion parameter refers 
to.  It's the denominator (phi) in the exponential family formula, isn't 
it?

   exp[ (y*theta - c(theta)) / phi ] * h(y, phi)
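
(On that Gamma question, a hedged sketch: the reported dispersion is the 
moment estimate of phi, and for the Gamma family phi = 1/shape, so MASS's 
gamma.shape() gives an ML estimate of the shape.  The fit object here is 
a stand-in, not the actual example(glm) model:)

library(MASS)
fit <- glm(y ~ x, family = Gamma, data = dat)  ## hypothetical Gamma fit
gamma.shape(fit)            ## ML estimate of the shape parameter
1 / gamma.shape(fit)$alpha  ## roughly the dispersion phi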

4. For GLM teaching purposes, can anybody point me at some good examples 
of GLM that do not use Normal, Poisson, Negative Binomial, and/or 
Logistic Regression?  I want to justify the effort to understand the GLM 
as a framework.  We have already in previous semesters followed the 
usual econometric approach in which OLS, Poisson/Count, and Logistic 
regression are treated as special cases.  Some of the students don't see 
any benefit from tackling the GLM's new notation/terminology.

What I'm lacking is some persuasive evidence that the effort to master 
the details of the GLM is worthwhile.  I could really use some data and 
reference articles that have applications of Gamma distributed (or 
exponential) variables, say, or Weibull, or whatever.

I've been dropping my course notes in this directory:

http://lark.cc.ku.edu/~pauljohn/ps909/AdvancedRegression. 

The documents GLM1 and GLM2 are pretty good theoretical surveys (patting 
self on back).  But I need to work harder to justify the effort by 
providing examples.

I'd appreciate any feedback, if you have any. And, of course, if you 
want to take these documents and use them for your own purposes, be my 
guest.

5. Is it possible to find all methods that an object inherits?

I found out by reading the source code for J Fox's car package that model.matrix() returns the X matrix of coded input variables, so one can do fun things like calculate robust standard errors and such. That's really useful, because before I found that, I was recoding up a storm to re-create the X matrix used in a model.

Is there a direct way to find a list of all the other methods that would apply to an object?
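
(A sketch of the direct route, using a throwaway lm fit:)

fm <- lm(y ~ x, data = data.frame(x = 1:10, y = rnorm(10)))
class(fm)              ## "lm"
methods(class = "lm")  ## all S3 methods defined for class "lm"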

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R: Including R plots in a Microsoft Word document

2004-02-20 Thread Paul Johnson
I have wrestled with this problem a lot. I use Linux, coauthors use 
Windows, and the eps files I make from R don't work with MS Word.  Well, 
they don't ever have previews and they sometimes won't print at all 
when I use CrossOver Office with MS Office 2000 in Linux.  My coauthor 
says he can often wrestle my eps files into word on his system with 
Office 2003.  People keep telling me to use gsview to insert the preview 
panes into eps files, and that does work, but more than one half of the 
time my system creates eps files that look significantly worse than the 
originals.  Sometimes it inserts a blank page at the top of the eps or 
it reshapes a figure.  I don't care enough about MS to try to track that 
down.  It just pisses me off.

As a result, I think the answer is more complicated than other people 
make it seem.

I don't think it does any good to output a pdf file because, as  I 
learned yesterday, MS Word users can't import a pdf file into a doc.

Clearly, if you are an MS Windows user of R, you can save graphics in 
the Windows metafile format (wmf) (or is it enhanced metafile, emf?). 
That will go more or less seamlessly into Word.  If you really must 
make an image that works well with Word, boot into Windows, run your R 
there, and make the wmf file.
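
Something like this, I believe (file name hypothetical):

win.metafile("fig1.wmf", width = 5, height = 4)   # size in inches
plot(rnorm(100))
dev.off()                                         # finishes the file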

If you are a Linux/Unix user, and you are too proud to use Windows,  the 
problem is much more difficult to deal with.

If you are ABSOLUTELY SURE that your image does not need to be resized 
in any way, you could output from R into a bitmap format, such as 
png.  As long as the image is never resized, that will be fine.  If it 
is resized, then all bets are off.
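
So the safest route is to pick the final size up front and never 
rescale, e.g. (file name hypothetical):

png("fig1.png", width = 480, height = 480)   # size in pixels
plot(rnorm(100))
dev.off()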

I find that R's output in the xfig format is quite good, and I can 
edit the files in xfig: you can add text and so on, so it's very 
handy.  So right now I'm looking for a good bridge from the xfig 
format to Word, but I have only just started investigating that.
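
The device call itself is simple enough (file name hypothetical); the 
bridge from there to Word is the part I have not solved:

xfig("fig1.fig")   # xfig's native format, editable in xfig afterwards
plot(rnorm(100))
dev.off()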

[EMAIL PROTECTED] wrote:

On Fri, Feb 20, 2004 at 05:54:33PM +0200, Mahmoud K. Okasha wrote:
 

Greetings List,

I am conducting some large simulations using R.  As a result, I get many plots, but I'm having some trouble including some of them in a Microsoft Word document.  Can anyone tell me the easiest method of getting copies of the R graphs into Word documents?
   

R can produce at least PostScript, PDF, png, jpeg/jpg

see:

help(postscript)
help(pdf)
help(png)
help(jpeg)
I don't use Word; for me the PostScript format (more precisely 
Encapsulated PostScript, .eps) is the best/easiest/most powerful 
format if you don't have thousands of points or lines :-)
For instance, to print a simple plot:

postscript(file = "somefile.eps")

plot(whatever)   # your plotting commands here

dev.off()        # important: closes the device and writes the file

other formats are similar

regards

	Ulisses

   Debian GNU/Linux: a dream come true
-
Computers are useless. They can only give answers. (Pablo Picasso)
Humans are slow, inaccurate, and brilliant.
Computers are fast, accurate, and dumb. 
Together they are unbeatable

--- Visit http://www.valux.org/ to learn about the  ---
--- Asociación Valenciana de Usuarios de Linux  ---
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 



--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] diagnostic information in glm. How about N of missing observations?

2003-12-18 Thread Paul Johnson
I handed out some results from glm(), and the students asked how many 
observations were dropped due to missing values.

How would I know? 

In other stat programs, the results will typically include N and the 
number dropped because of missing values.  Without going back to R and 
fiddling about to find the total number of rows in the data frame, 
there is no way to tell.  Somewhat inconvenient.  Do you agree?
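
The best workaround I can think of, assuming a hypothetical fit built 
from a data frame dat:

fit <- glm(y ~ x, data = dat, family = binomial)
nrow(model.frame(fit))    # N actually used in the fit
length(na.action(fit))    # rows dropped due to missing values (0 if none)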

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Basic question on function identical

2003-12-15 Thread Paul Johnson
I hope I am not telling you things you already know. If so, I apologize 
in advance.

There are several C-library addons available to try to deal with the 
problem that comparisons of floating point numbers can be 
unpredictable.  I think your example with the greater than sign would 
not be a source of the trouble, but if you ended it with a comparison like

if (xmax == 10.7)

then you would be in trouble because the internal representation of the 
float xmax might not be precisely equal to 10.7.

Until I hear otherwise, I am thinking that ordinal comparisons like 
(x > xmax) are accurate, but that equality comparisons like 
(x == xmax) are not.
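
In R, the usual tool for tolerant comparison is all.equal(); a tiny 
illustration:

x <- 0.1 + 0.2
x == 0.3                                  # FALSE on most platforms
isTRUE(all.equal(x, 0.3))                 # TRUE: equal within a tolerance
abs(x - 0.3) < .Machine$double.eps^0.5    # the same idea done by hand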

Here's one of the C library projects dealing with the subject:

http://fcmp.sourceforge.net/

The author of that library, Ted Belding, wrote the paper "Numerical 
Replication of Computer Simulations: Some Pitfalls...", which is 
informative (IMHO):
http://alife.ccp14.ac.uk/ftp-mirror/alife/zooland/pub/research/ci/EC/GA/papers/gecco2000.ps.gz

Also check this web page, which I bookmarked:
http://vision.eng.shu.ac.uk/C++/c/c-faq/cfaq14.html#r14.6
I know I've seen more, but can't remember where.

Ted Harding wrote:

For instance,
in C-speak, to find the maximum of an array of double x[], of
length n, something like the following code could be written:
 xmax = x[0];
 for (i = 1; i < n; i++) if (x[i] > xmax) xmax = x[i];
Regardless of the accuracy of the comparison, each assignment
xmax = ... should make xmax an internally exact copy of the
thing on the righthand side. However, your reply suggests that this
may not happen, as a result of unpredictable use of 10-byte wide
floating point registers on Intel chips.
 



--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] packaging standards for rda files?

2003-12-11 Thread Paul Johnson
Dear everybody:

We used the fine foreign library to bring in an SPSS dataset that was 
about 9 megabytes and I can squeeze it into a much smaller R object 
using compression with

save(ndat, file = "NatAnnES2000.rda", compress = TRUE)

I can use load() to get the ndat data frame back; that's all good as 
far as I can see.  If I put that file in the data subdirectory, then 
the data() command finds it and I can load it.

Seems fine, but then I started wondering if there is not some more 
sophisticated way of packaging these things. For example, how do people 
put in the meta information that appears in the right side of the data() 
output, as in:

Data sets in package '.':

NatAnnES2000 

Data sets in package 'base':

Formaldehyde    Determination of Formaldehyde
HairEyeColor    Hair and Eye Color of Statistics Students
...
Are there other attributes that I should specify if I want to package an 
.rda file for other users?

An rda file created in this way will translate across platforms, won't it?
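
My best guess so far: the one-line descriptions come from the \title 
field of each dataset's Rd help page in the package's man directory, 
and promptData() (if your version of R has it) writes a skeleton you 
can fill in.  A sketch, using my file names:

ndat <- data.frame(x = 1:3)   # stand-in for the real data frame
promptData(ndat, filename = "NatAnnES2000.Rd")
# then edit the \title{} line and put the file in the package's man/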

pj

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] read SAS format file from R

2003-11-21 Thread Paul Johnson
Gabor Grothendieck wrote:

kan_liu2 wrote:
 

Can you please point me to how to read a SAS-format file from R (is 
it possible?)?
   

There was a thread on this last month.  Check out the replies to:

http://maths.newcastle.edu.au/~rking/R/help/03b/5450.html

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 

I just worked on this. Care to check my example?

http://lark.cc.ku.edu/~pauljohn/ANES/2002/

This shows how we took a big American National Election Study dataset 
in either SPSS or SAS format and got it over to R.
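
A minimal sketch of the approach with the foreign package (file names 
hypothetical):

library(foreign)
dat <- read.xport("anes2002.xpt")    # SAS XPORT (transport) file
dat <- read.spss("anes2002.sav", to.data.frame = TRUE)   # or from SPSS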

--
Paul E. Johnson   email: [EMAIL PROTECTED]
Dept. of Political Sciencehttp://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504  
University of Kansas  Office: (785) 864-9086
Lawrence, Kansas 66044-3177   FAX: (785) 864-5700

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help