[R] lm equivalent of Welch-corrected t-test?

2018-11-13 Thread Paul Johnson
Long ago, when R's t.test had var.equal=TRUE by default, I wrote some
class notes showing that the result was equivalent to a one predictor
regression model.  Because t.test does not default to var.equal=TRUE
these days, I'm curious to know if there is a way to specify weights
in an lm to obtain the same result as the Welch-adjusted values
reported by t.test at the current time.  Is there a WLS equivalent
adjustment with lm?

Here's example code to show that lm gives the same result as t.test with
var.equal = TRUE. The signs come out differently, but otherwise the effect
estimate, standard error, and t value are the same:


set.seed(234234)
dat <- data.frame(x = gl(2, 50, labels = c("F", "M")))
dat$err <- rnorm(100, 0, 1)
dat$y <- ifelse(dat$x == "F", 40 + dat$err, 44 + dat$err)
m1 <- lm(y ~ x, dat)
summary(m1)
m1.t <- t.test(y ~ x, dat, var.equal = TRUE)
m1.t
## diff matches regression coefficient
(m1.t.effect <- diff(m1.t$estimate))
## standard error matches regression se
m1.t.stderr <- m1.t.effect/m1.t$statistic

If you run that, you see lm output:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  39.9456     0.1180  338.65   <2e-16 ***
xM            3.9080     0.1668   23.43   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8341 on 98 degrees of freedom
Multiple R-squared:  0.8485,  Adjusted R-squared:  0.8469
F-statistic: 548.8 on 1 and 98 DF,  p-value: < 2.2e-16

and t.test:

> m1.t <- t.test(y ~ x, dat, var.equal = TRUE)
> m1.t

Two Sample t-test

data:  y by x
t = -23.427, df = 98, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.239038 -3.576968
sample estimates:
mean in group F mean in group M
       39.94558        43.85358

> (m1.t.effect <- diff(m1.t$estimate))
mean in group M
   3.908003
> m1.t.effect/m1.t$statistic
mean in group M
 -0.1668129
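
For reference, here is the Welch-corrected output I want to reproduce through
lm, with the same extraction of the effect and standard error (only the calls
are shown; the pieces line up the same way as above):

m2.t <- t.test(y ~ x, dat)              ## default: var.equal = FALSE (Welch)
m2.t
(m2.t.effect <- diff(m2.t$estimate))    ## same difference in means
m2.t.effect/m2.t$statistic              ## Welch-adjusted standard error
m2.t$parameter                          ## Satterthwaite df, generally not n1 + n2 - 2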




-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA in R

2018-10-10 Thread Paul Johnson
We cannot read your message. Please post plain text, not HTML. (Hmm, my
phone may now be posting HTML; I must try to stop that.) Your R code is not
legible; it seems to be output, and the lines all run together. I tried to
find the articles you mention, but got "not found".

You should use aov() for fitting, then get post hoc comparisons.

In the car package, the Anova function will help. I may teach ANOVA soon;
we'll see if I have a better answer then.
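
A minimal sketch of what I mean, with the car package (the model formula and
data frame name here are only a guess at your setup; substitute your own):

library(car)
twoway <- lm(KIC ~ Tem + Ac + Av + Thick + Ac:Av + Ac:Thick + Av:Thick,
             data = mydata)     ## "mydata" is a stand-in for your data frame
anova(twoway)                   ## sequential ("Type I") sums of squares
Anova(twoway, type = "II")      ## partial, "adjusted" (Type II) sums of squares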

Paul Johnson
University of Kansas

On Wed, Oct 10, 2018, 1:14 AM Thanh Tran  wrote:

> Hi eveyone,
> I'm studying about variance (ANOVA) in R and have some questions to share.
> I read an article investigating the effect of factors (temperature, Asphalt
> content, Air voids, and sample thickness) on the hardness of asphalt
> concrete in the tensile test (abbreviated as Kic). Each condition was
> repeated four times (4 samples). In the paper, the authors used MINITAB to
> analyze Anova. The authors use "adjusted sums of squares" calculate the
> p-value I try to use ANOVA in R to analyze this data and get the result as
> shown in Figure 4. The results are different from the results in the
> article. Some papers say that in R, the default for ANOVA analysis is to
> use "sequential sums of squares" to calculate the p-value.
> So please help the following two questions: 1 / Introduction to code in R
> for anova analysis uses "adjusted sums of squares". The main part of the
> command in R / myself is as follows: > Tem = as.factor (temperature) > Ac =
> as.factor (AC) > Av = as.factor (AV) > Thick = as.factor (Thickness) >
> Twoway = lm (KIC ~ Tem + Ac + Av + Thick + Stamp + Ac + Stamp + Av + Stamp
> + Thick + Ac * Av + Ac * Thick + Av * Thick) > anova (twoway) 2/ When to
> use "sequential sums of squares" and when to use "adjusted sums of
> squares". Some papers recommend using the "oa.design"
> function in R to check for "orthogonal" designs. If not, use "adjusted sums
> of squares". I am still vague about this command, so look forward to
> everyone's suggestion. If you could answer all two of my questions, I would
> be most grateful. Ps: I have added a CSV file and the paper for practicing
> R. http://www.mediafire.com/file/e5oe54p2c2wd4bc/Saha+research.csv
>
> http://www.mediafire.com/file/39jlf9h539y9mdz/Homothetic+behaviour+investigation+on+fracture+toughness+of+asphalt+mixtures+using+semicircular+bending+test.pdf
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems to obtain standardized betas in multiply-imputed data

2018-10-05 Thread Paul Johnson
Greetings.
I would adjust the approach: calculate standardized estimates for each
imputed data set, then summarize them. The way you are doing it here implies
that the standardization concept applies to the list of models, which seems
doubtful. The empirical standard deviations of the variables differ among
the imputed data sets, after all.

I suppose I mean to say that lm.beta is not intended to receive a list of
regressions. Put the standardization in the with() work done on each imputed
set; I suspect it is as easy as putting lm.beta in there. If there is
trouble, I have a standardize function in the rockchalk package. Unlike
lm.beta, it actually standardizes the variables and runs the regression;
lm.beta rescales coefficients instead.
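
A rough sketch of what I have in mind (assuming your mice object and the
lm.beta package, and assuming lm.beta's coef() method returns standardized
coefficients; I have not run this against your data):

library(mice)
library(lm.beta)
imp <- mice(data, 5, maxit = 10, seed = 42, print = FALSE)
## standardize within each imputed data set, not on the list of fits
FitImp <- with(imp, lm.beta(lm(y ~ x1 + x2 + x3)))
## one column of standardized coefficients per imputed data set
betas <- sapply(FitImp$analyses, coef, standardized = TRUE)
rowMeans(betas)   ## a crude summary across the imputations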

Paul Johnson
University of Kansas

On Wed, Sep 26, 2018, 5:03 AM CHATTON Anne via R-help 
wrote:

> Dear all,
>
> I am having problems in obtaining standardized betas on a multiply-imputed
> data set. Here are the codes I used :
> imp = mice(data, 5, maxit=10, seed=42, print=FALSE)
> FitImp <- with(imp,lm(y ~ x1 + x2 + x3))
> Up to here everything is fine. But when I ask for the standardized
> coefficients of the multiply-imputed regressions using this command :
> sdBeta <- lm.beta(FitImp)
> I get the following error message:
> Error in b * sx : argument non numérique pour un opérateur binaire
>
> Can anyone help me with this please?
>
> Anne
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] looking for formula parser that allows coefficients

2018-08-21 Thread Paul Johnson
Can you point me at any packages that allow users to write a
formula with coefficients?

I want to write a data simulator that has a matrix X with lots
of columns, and then users can generate predictive models
by entering a formula that uses some of the variables, allowing
interactions, like

y ~ 2 + 1.1 * x1 + 3 * x3 + 0.1 * x1:x3 + 0.2 * x2:x2

Currently, in the rockchalk package, I have a function that simulates
data (genCorrelatedData2), but my interface for entering the beta
coefficients is poor.  I assumed the user would always enter 0s as
placeholders for the unused coefficients, and that the intercept always
comes first. The unnamed vector is too confusing.  I have them specify:

c(2, 1.1, 0, 3, 0, 0, 0.2, ...)

In the documentation I say (ridiculously) that it is easy to figure out
from the examples, but it really isn't.  The function prints out the
equation it thinks you intended; that's minimal protection against user
error, but still not very good:

dat <- genCorrelatedData2(N = 10, rho = 0.0,
  beta = c(1, 2, 1, 1, 0, 0.2, 0, 0, 0),
  means = c(0,0,0), sds = c(1,1,1), stde = 0)
[1] "The equation that was calculated was"
y = 1 + 2*x1 + 1*x2 + 1*x3
 + 0*x1*x1 + 0.2*x2*x1 + 0*x3*x1
 + 0*x1*x2 + 0*x2*x2 + 0*x3*x2
 + 0*x1*x3 + 0*x2*x3 + 0*x3*x3
 + N(0,0) random error

But still, it is not very good.

As I look at this now, I realize I expect just the vech, not the whole
vector of all interaction terms, so it is even more difficult than I
thought to get the correct input.  Hence, I'd like to let the user write a
formula.

The alternative for the user interface is to have named coefficients.
I can more or less easily allow a named vector for beta

beta = c("(Intercept)" = 1, "x1" = 2, "x2" = 1, "x3" = 1, "x2:x1" = 0.1)

I could build a formula from that.  That's not too bad. But I still think
it would be cool to allow formula input.
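
For what it is worth, here is a rough sketch of the direction I imagine
(purely illustrative; the function name is made up, and it only handles
"coefficient * term" pieces joined by +):

parseBeta <- function(f) {
    rhs <- f[[3]]
    terms <- list()
    while (is.call(rhs) && identical(rhs[[1]], as.name("+"))) {
        terms <- c(terms, list(rhs[[3]]))
        rhs <- rhs[[2]]
    }
    terms <- c(terms, list(rhs))
    beta <- numeric(0)
    for (tm in terms) {
        if (is.numeric(tm)) {
            beta["(Intercept)"] <- tm               ## bare number: intercept
        } else if (is.call(tm) && identical(tm[[1]], as.name("*"))) {
            beta[deparse(tm[[3]])] <- eval(tm[[2]]) ## "coefficient * term"
        } else {
            beta[deparse(tm)] <- 1                  ## unadorned term: coefficient 1
        }
    }
    rev(beta)
}

parseBeta(y ~ 2 + 1.1 * x1 + 3 * x3 + 0.1 * x1:x3)
## (Intercept)          x1          x3       x1:x3
##         2.0         1.1         3.0         0.1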

Have you ever seen it done?
pj
-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Transforming data for nice output table

2018-08-21 Thread Paul Johnson
On Mon, Aug 20, 2018 at 2:17 PM David Doyle  wrote:
>
> Hello everyone,
>
> I'm trying to generate tables of my data out of R for my report.
>
> My data is setup in the format as follows and the example can be found at:
> http://doylesdartden.com/R/ExampleData.csv
>
> Location  Date       Year  GW_Elevation
> 127(I)    5/14/2006  2006  752.46
> 119(I)    5/14/2006  2006  774.67
> 127(I)    6/11/2007  2007  752.06
> 119(I)    6/11/2007  2007  775.57
>
> I would like to generate a table that showed
>
> Location  GW_Elevation 2006  GW_Elevation 2007  GW_Elevation xxx
>
> 119(I)    774.67             775.57
>
> 127(I)    752.46             752.06
>
> XX        XX                 XX
>
>  Any thoughts on how to transform the data so it would be in this format??
>
> Thank you for your time
>
> David Doyle

Dear David

I'd consider studying R's reshape function; it was intended exactly
for this purpose. There is no reason to venture into any user-contributed
tidy places to get this done.

dta <- read.csv("http://doylesdartden.com/R/ExampleData.csv")
dta <- dta[c("Location", "Year", "GW_Elevation")]
dta.wide <- reshape(dta, direction = "wide", idvar = "Location",
v.names = "GW_Elevation", timevar = "Year")
head(dta.wide)

  Location GW_Elevation.2006 GW_Elevation.2007 GW_Elevation.2008
1   127(I)            752.46                NA            757.50
2   119(S)            774.67            778.76            776.40
3   132(I)            759.45            761.68            764.27
4   132(S)            761.77            761.04            765.44
5   111(I)            753.52            763.24            764.24
6   111(S)            766.18            772.84            767.41
  GW_Elevation.2009 GW_Elevation.2010 GW_Elevation.2011 GW_Elevation.2012
1            759.90            756.40            759.05            759.31
2            777.59            777.45            778.21            778.13
3            761.90            764.03            763.63            763.99
4            761.21            763.12            762.69            759.57
5            750.85            764.37            762.99            763.90
6            769.77            767.88            767.95            767.19
  GW_Elevation.2013 GW_Elevation.2014 GW_Elevation.2015 GW_Elevation.2016
1            756.07            756.66            757.72            757.66
2            778.88            778.28            775.16            778.28
3            761.22            762.81            762.36            764.46
4            763.19            763.87            761.94            763.90
5            764.42            761.65            764.02            762.93
6            770.20            767.25            767.74            766.87

The main difference between this and your stated target is that your
target column names have spaces in them, which are not legal syntactic
names for data frame columns. Here R used a period for joining strings.
You can override that if you want to with the reshape function, but
usually I'd let the periods happen.

If you do want to replace the periods with spaces, it can be done, but you
break the warranty on other uses of the data frame. (You could get rid of
the underscore after GW in the same way.)

colnames(dta.wide) <- sub("Elevation.", "Elevation ",
colnames(dta.wide), fixed = TRUE)

I'd not try to use that wide frame for many other purposes because of
the spaces, but it works well if you want to make a pleasant table out
of it. For example, xtable is my favorite:

library(xtable)

xt <- xtable(dta.wide)
print(xt)

The latex from that prints out beautifully in a document. The print
method for xtable has a file parameter if you want to save the file.
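
Something like this, where the file name is just an illustration:

print(xt, file = "gw_elevation_wide.tex")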

Good Luck

pj



>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [ESS] M-x R gives no choice of starting dir

2017-12-13 Thread Paul Johnson
If Emacs is not asking for the starting directory, it is very likely your
init file has this somewhere:

(setq ess-ask-for-ess-directory nil)


On Mon, Sep 11, 2017 at 3:23 PM, Enrico Schumann  
wrote:
> On Mon, 11 Sep 2017, Christian writes:
>
>> Hi,
>>
>> I experienced a sudden change in the behavior of M-x R in not giving
>> me the choice where to start R. May be that I botched my
>> preferences. I am using Aquamacs 3.3 on MacOS 10.12.6
>>
>> Christian
>
> I suppose you are using ESS? There is a variable called
> 'ess-ask-for-ess-directory', which controls whether
> M-x R prompts for a directory. Perhaps you have set
> this to nil?
>
> I also Cc the ESS-help mailing list and suggest that
> follow-up be sent there.
>
>
> Kind regards
>  Enrico
>
> --
> Enrico Schumann
> Lucerne, Switzerland
> http://enricoschumann.net
>
> __
> ess-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/ess-help



-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] function pointers?

2017-11-22 Thread Paul Johnson
We have a project that calls for the creation of a list of many
distribution objects.  Distributions can be of various types, with
various parameters, but we ran into some problems. I started testing
on a simple list of rnorm-based objects.

I was a little surprised at the RAM storage requirements, here's an example:

N <- 10000
closureList <- vector("list", N)
nsize = sample(x = 1:100, size = N, replace = TRUE)
for (i in seq_along(nsize)){
closureList[[i]] <- list(func = rnorm, n = nsize[i])
}
format(object.size(closureList), units = "Mb")

Output says
22.4 MB

I noticed that if I do not name the objects in the list, then the
storage drops to 19.9 MB.

That seemed like a lot of storage for a function's name. Why so much?
My colleagues think the RAM use is high because this is a closure
(hence closureList).  I can't even convince myself it actually is a
closure. The R source has

rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)

The storage is holding 10000 copies of rnorm, but we really only need 1,
which we can use in the objects.

Thinking of this like C,  I am looking to pass in a pointer to the
function.  I found my way to the idea of putting a function in an
environment in order to pass it by reference:

rnormPointer <- function(inputValue1, inputValue2){
object <- new.env(parent=globalenv())
object$distr <- inputValue1
object$n <- inputValue2
class(object) <- 'pointer'
object
}

## Experiment with that
gg <- rnormPointer(rnorm, 33)
gg$distr(gg$n)

ptrList <- vector("list", N)
for(i in seq_along(nsize)) {
ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
}
format(object.size(ptrList), units = "Mb")

The required storage is reduced to 2.6 Mb. That's about 1/10 of the RAM
required for closureList.  This thing works in the way I expect:

## can pass in the unnamed arguments for n, mean and sd here
ptrList[[1]]$distr(33, 100, 10)
## Or the named arguments
ptrList[[1]]$distr(1, sd = 100)

This environment trick mostly works, so far as I can see, but I have
these questions.

1. Is the object.size() return accurate for ptrList?  Do I really
reduce storage to that amount, or is the required storage someplace
else (in the new environment) that is not included in object.size()?

2. Am I running with scissors here? Unexpected bad things await?

3. Why is the storage for closureList so great? It looks to me like
rnorm is just this little thing:

function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)


4. Could I learn (you show me?) to store the bytecode address as a
thing and use it in the objects?  I'd guess that is the fastest
possible way. In an Objective-C problem in the olden days, we found
the method-lookup was a major slowdown and one of the programmers
showed us how to save the lookup and use it over and over.
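
On question 1, a rough check I have considered (on the theory that
object.size() does not look inside environments) is to pull the contents out
and measure them directly:

## convert each environment to an ordinary list, then measure that
envContents <- lapply(ptrList, as.list)
format(object.size(envContents), units = "Mb")

If that figure jumps back up near the closureList number, the apparent
savings are mostly an artifact of what object.size() counts.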

pj



-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Install package "diagram"

2017-08-17 Thread Paul Johnson
On Wednesday, August 16, 2017, AbouEl-Makarim Aboueissa <
abouelmakarim1...@gmail.com> wrote:

> Dear All:
>
> I am trying to install the package "diagram". It is in the list. But when I
> selected the package to install it, it says:
>
> Question: "would you like to use a personal library instead?"
>
You should say yes here. Let it go into a directory where you are allowed to
write.

The other one fails because you are not the administrator.
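
If the personal-library prompt does not work out, the manual route is
something like this (the directory is only an illustration; any folder you
can write to will do):

## create a personal library and install there explicitly
dir.create("~/Rlibs", showWarnings = FALSE)
.libPaths(c("~/Rlibs", .libPaths()))
install.packages("diagram", lib = "~/Rlibs")
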
PJ

> I selected No.
>
> Then it says
>
> > utils:::menuInstallPkgs()
> Warning in install.packages(NULL, .libPaths()[1L], dependencies = NA, type
> = type) :
>   'lib = "C:/Program Files/R/R-3.4.1/library"' is not writable
> Error in install.packages(NULL, .libPaths()[1L], dependencies = NA, type =
> type) :
>   unable to install packages
> >
>
>
> Any help will be appreciated.
>
>
> with many thanks
> abou
> --
> __
> AbouEl-Makarim Aboueissa, PhD
> Department of Mathematics and Statistics
> University of Southern Maine
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org  mailing list -- To UNSUBSCRIBE and
> more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] workaround for "package building problem Centos 7 & R 3.4.1"

2017-08-14 Thread Paul Johnson
This is for the problem I posted about last Friday.

First, the happy part, a workaround:

$ cd ~/R/x86_64-redhat-linux-gnu-library/3.4
$ ln -sf /usr/share/R/library/* .

After that, all of the packages are found by R CMD check.  R CMD check
looks in the ~/R/x86_64-redhat-linux-gnu-library/3.4 folder for
packages, but not in "/usr/share/R/library" (for whatever reason, I do
not know yet).

Why packages are not found in /usr/share/R/library is the unsolved
mystery. I don't see any difference in configuration between Centos
and Debian. In  Centos /usr/lib64/R/etc/Renviron we have the EPEL
version of the settings that Dirk Eddelbuettel worked out with R Core
for Debian packaging way back in the beginning of, well, the century:

R_LIBS_USER=${R_LIBS_USER-'~/R/x86_64-redhat-linux-gnu-library/3.4'}
R_LIBS_SITE=${R_LIBS_SITE-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib64/R/library:/usr/share/R/library'}

R_LIBS_SITE looks good in that file. I know (for sure) these settings
were put in as adaptation to Dirk's Debian package because I sent them
to Spot Calloway back in the early days of the EPEL package.

I've found a similar report about this in the EPEL R packaging bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=1457395

It appears there that the RPM package builds, even though the package
does not survive the check. Interesting :)

Summary of the problem:

R CMD check has unexpected .libPaths().  It does not use the
R_LIBS_SITE settings in Renviron.

The error message from R CMD check without "--as-cran" is informative,
as it plainly states that .libPaths() does not seem to include the
location where the packages are installed:

###
$ R CMD check kutils_1.19.tar.gz
* using log directory ‘/home/pauljohn/GIT/CRMDA/kutils/package/kutils.Rcheck’
* using R version 3.4.1 (2017-06-30)
* using platform: x86_64-redhat-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘kutils/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘kutils’ version ‘1.19’
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘kutils’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... WARNING
Error: package or namespace load failed for ‘kutils’ in
loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck =
vI[[j]]):
 there is no package called ‘openxlsx’
Execution halted

It looks like this package has a loading problem when not on .libPaths:
see the messages for details.
* checking dependencies in R code ... OK
##

However, I promise all of the packages are installed, in "/usr/share/R/library".

Interactive R sessions do find the required packages, and they also
seem to be found by R programs run with R CMD BATCH and R with or
without "--vanilla".

$ R --vanilla -e ".libPaths()"

R version 3.4.1 (2017-06-30) -- "Single Candle"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

> .libPaths()
[1] "/home/pauljohn/R/x86_64-redhat-linux-gnu-library/3.4"
[2] "/usr/lib64/R/library"
[3] "/usr/share/R/library"
>

I don't understand why the Renviron settings are found, actually, but
I'm counting my good fortune that they are.

Before I found the simple, easy workaround, I went through the tedious
process of building and installing all of the packages in
~/R/x86_64-redhat-linux-gnu-library/3.4. That solves the problem as
well.  It seems possible to me that many people have packages
installed in that location anyway, so they are never bothered by the
problem in the first place.

Anyway, we have a workaround, but I don't understand what's wrong to
begin with or how to fix it the right way.

If you keep Googling, you see that this problem was experienced as
early as 2012.  It has happened to some of my personal R heroes,
actually. You know who you are :)

pj
-- 
Paul E. Johnson

Re: [R] Plotting log transformed predicted values from lme

2017-08-12 Thread Paul Johnson
In the rockchalk package, I have a function called newdata that will help
with this. Plenty of examples. Probably my predictOmatic function will just
work. Motivation is in the vignette.
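
By hand, the newdata object for your model would look roughly like this
(column names are taken from your message; predictOmatic builds this sort of
grid for you):

nd <- expand.grid(x = seq(min(meta$x), max(meta$x), length.out = 100),
                  y = levels(as.factor(meta$y)))   ## one curve per level of y
nd$pred <- predict(mod, newdata = nd, level = 0)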

Paul Johnson
University of Kansas

On Aug 9, 2017 11:23 AM, "Alina Vodonos Zilberg" 
wrote:

> Hi,
>
> I am performing meta-regression using linear mixed-effect model with the
> lme() function  that has two fixed effect variables;one as a log
> transformed variable (x)  and one as factor (y) variable, and two nested
> random intercept terms.
>
> I want to save the predicted values from that model and show the log curve
> in a plot ; predicted~log(x)
>
> mod<-lme(B~log(x)+as.factor(y), random=~1|cohort/Study,
> weights=varFixed(~I(SE^2)), na.action=na.omit, data=subset(meta),
>   control = lmeControl(sigma = 1, apVar = FALSE))
> summary(mod)
>
> newdat <- data.frame(x=seq(min(meta$x), max(meta$x),,118))  # I have 118
> observations. #How do I add the factor variable to my newdat?
> newdat$pred <- predict(mod, newdat,level = 0,type="response")
>
> plot(B ~ x, data=meta)
> lines(B ~ x, data=newdat)
>
> Can you please assist me ?
>
> Thank you!
>
> Alina
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] package building problem Centos 7 & R 3.4.1

2017-08-12 Thread Paul Johnson
On Aug 12, 2017 11:58 AM, "José Abílio Matos"  wrote:

On Friday, 11 August 2017 22.51.12 WEST Paul Johnson wrote:
> Dear everybody:
>
> Packages that DO pass the package check on my Ubuntu 17.04 laptop with
> R 3.4.1 and Mac OSX do not build on Centos 7 with R 3.4.1. I'm pretty
> sure I have an environment defect, but cannot find it.

Hi Paul,
the first error that you get is that ggplot2 is not installed:
Error in library(ggplot2) : there is no package called ‘ggplot2’

IIRC that package is not available in the EPEL repositories.


That's not it.
All of the concerned packages are installed and run fine. I demonstrated
that in the original post. EPEL has nothing to do with this; install.packages
works fine.

PJ


I think that most of the other errors are similar.

Regards,
--
José Matos

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] package building problem Centos 7 & R 3.4.1

2017-08-11 Thread Paul Johnson
Dear everybody:

Packages that DO pass the package check on my Ubuntu 17.04 laptop with
R 3.4.1 and Mac OSX do not build on Centos 7 with R 3.4.1. I'm pretty
sure I have an environment defect, but cannot find it.

I find posts from various people about this problem since 2012.  But
I've checked the likely suspects. Among all the runs I pasted in
below, the most informative error message I found is the very last
one, which ends like this:

It looks like this package has a loading problem when not on .libPaths:
see the messages for details.

The messages don't really help me understand (all pasted in full
detail below), but I hope you'll say "aha", there's that environment
thing again.

I notice that package check proceeds differently without "--as-cran".
In some cases, package check passes. Other times, it fails
differently.

Here are some details, I hope you have guesses about where to check for trouble.

I suspected I had an error in my packages, so I picked some at random
from the CRAN list, things which depend on openxlsx. It turns out they
don't survive check either. Hopefully this makes it easy for you to
replicate (copy/paste :) )

$ wget http://rweb.quant.ku.edu/cran/src/contrib/reproducer_0.1.9.tar.gz
$ R CMD check --as-cran reproducer_0.1.9.tar.gz
* using log directory ‘/tmp/pj/reproducer.Rcheck’
* using R version 3.4.1 (2017-06-30)
* using platform: x86_64-redhat-linux-gnu (64-bit)
* using session charset: UTF-8
* using option ‘--as-cran’
* checking for file ‘reproducer/DESCRIPTION’ ... OK
* this is package ‘reproducer’ version ‘0.1.9’
* checking CRAN incoming feasibility ... WARNING
Maintainer: ‘Lech Madeyski ’

Insufficient package version (submitted: 0.1.9, existing: 0.1.9)
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘reproducer’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking use of S3 registration ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking examples ... ERROR
Running examples in ‘reproducer-Ex.R’ failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: boxplotAndDensityCurveOnHistogram
> ### Title: boxplotAndDensityCurveOnHistogram
> ### Aliases: boxplotAndDensityCurveOnHistogram
>
> ### ** Examples
>
> library(ggplot2)
Error in library(ggplot2) : there is no package called ‘ggplot2’
Execution halted
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Complete output:
  > library(testthat)
  Error: package or namespace load failed for 'testthat' in
loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck =
vI[[j]]):
   there is no package called 'R6'
  Execution halted
* checking PDF version of manual ... OK
* DONE

Status: 2 ERRORs, 1 WARNING
See
  ‘/tmp/pj/reproducer.Rcheck/00check.log’
for details.

Contents of 00check.log are same:

The error most likely occurred in:
> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: boxplotAndDensityCurveOnHistogram
> ### Title: boxplotAndDensityCurveOnHistogram
> ### Aliases: boxplotAndDensityCurveOnHistogram
>
> ### ** Examples
>
> library(ggplot2)
Error in library(ggplot2) : there is no package called ‘ggplot2’

Re: [R] Interesting quirk with fractions and rounding

2017-04-22 Thread Paul Johnson
On Apr 21, 2017 12:01 PM, "JRG"  wrote:

A good part of the problem in the specific case you initially presented
is that some non-integer numbers have an exact representation in the
binary floating point arithmetic being used.  Basically, if the
fractional part is of the form 1/2^k for some integer k > 0, there is an
exact representation in the binary floating point scheme.

> options(digits=20)

> (100*23)/40
[1] 57.5

> 100*(23/40)
[1] 57.499999999999992895

So the two operations give a slightly different result because the
fractional part of the division of 100*23 by 40 is 0.5.  So the first
operations gives, exactly, 57.5 while the second operation does not
because 23/40 has no exact representation.

Thanks for answering.

This case seemed fun because it was not a contrived example.  We found this
one by comparing masses of report tables from 2 separate programs. It
happened 1 time in about 10,000 calculations.

Guidelines for R coders, though, would be welcome. So far, all I am sure of
is

1 Don't use == for floating point numbers.

Your 1/2^k point helps me understand why == does seem to work correctly
sometimes.

I wonder if we should be suspicious of >=. Imagine the horror if a = w/x >
b = y/z in exact fractions, but digitally a < b. Blech. Can that happen?
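
A canned example of exactly that worry (the classic one, not from this
thread):

a <- 0.1 + 0.2
b <- 0.3
a > b                     ## TRUE, although the exact fractions are equal
a - b                     ## about 5.6e-17
isTRUE(all.equal(a, b))   ## TRUE, tolerance-based comparison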


But, change the example's divisor from 40 to 30 [the fractional part
from 1/2 to 2/3]:

> (100*23)/30
[1] 76.666666666666671404

> 100*(23/30)
[1] 76.666666666666671404

Now the two operations give the same answer to the full precision
available.  So, it isn't "generally true in R that (100*x)/y is
more accurate than 100*(x/y), if x > y."


The good news here is that round() gives the same answer in both cases :)

I am looking for a case where the first method is less accurate than the
second. I expect that multiplying integers before dividing is never less
accurate. Sometimes it is more accurate.

Following your 1/2^k insight, you see why multiplying first is helpful in
some cases. The question is whether the situation can get worse.

But Bert is right. I have to read more books.

I studied Golub and van Loan and came away with a healthy fear of matrix
inversion. But when you look at user-contributed regression packages, what
do you find? Matrix inversion and lots of X'X.


Paul Johnson
University of Kansas


The key (in your example) is a property of the way that floating point
arithmetic is implemented.


---JRG



On 04/21/2017 08:19 AM, Paul Johnson wrote:
> We all agree it is a problem with digital computing, not unique to R. I
> don't think that is the right place to stop.
>
> What to do? The round example arose in a real funded project where 2 R
> programs differed in results and cause was  that one person got 57 and
> another got 58. The explanation was found, but its less clear how to
> prevent similar in future. Guidelines, anyone?
>
> So far, these are my guidelines.
>
> 1. Insert L on numbers to signal that you really mean INTEGER. In R,
> forgetting the L in a single number will usually promote whole calculation
> to floats.
> 2. S3 variables are called 'numeric' if they are integer or double
storage.
> So avoid "is.numeric" and prefer "is.double".
> 3. == is a total fail on floats
> 4. Run print with digits=20 so we can see the less rounded number. Perhaps
> start sessions with "options(digits=20)"
> 5. all.equal does what it promises, but one must be cautious.
>
> Are there math habits we should follow?
>
> For example, Is it generally true in R that (100*x)/y is more accurate
than
> 100*(x/y), if x > y?   (If that is generally true, couldn't the R
> interpreter do it for the user?)
>
> I've seen this problem before. In later editions of the game theory
program
> Gambit, extraordinary effort was taken to keep values symbolically as
> integers as long as possible. Avoid division until the last steps. Same in
> Swarm simulations. Gary Polhill wrote an essay about the Ghost in the
> Machine along those lines, showing accidents from trusting floats.
>
> I wonder now if all uses of > or < with numeric variables are suspect.
>
> Oh well. If everybody posts their advice, I will write a summary.
>
> Paul Johnson
> University of Kansas
>
> On Apr 21, 2017 12:02 AM, "PIKAL Petr"  wrote:
>
>> Hi
>>
>> The problem is that people using Excel or probably other such
spreadsheets
>> do not encounter this behaviour as Excel silently rounds all your
>> calculations and makes approximate comparison without telling it does so.
>> Therefore most people usually do not have any knowledge of floating point
>> numbers representation.
>>
>>  Cheers
>> Petr
>>


__
R-help@r-project.org mailing list --

Re: [R] Interesting quirk with fractions and rounding

2017-04-21 Thread Paul Johnson
We all agree it is a problem with digital computing, not unique to R. I
don't think that is the right place to stop.

What to do? The round example arose in a real funded project where 2 R
programs differed in results, and the cause was that one person got 57 and
another got 58. The explanation was found, but it's less clear how to
prevent something similar in the future. Guidelines, anyone?

So far, these are my guidelines.

1. Insert L on numbers to signal that you really mean INTEGER. In R,
forgetting the L in a single number will usually promote whole calculation
to floats.
2. S3 variables are called 'numeric' if they are integer or double storage.
So avoid "is.numeric" and prefer "is.double".
3. == is a total fail on floats
4. Run print with digits=20 so we can see the less rounded number. Perhaps
start sessions with "options(digits=20)"
5. all.equal does what it promises, but one must be cautious.

Are there math habits we should follow?

For example, is it generally true in R that (100*x)/y is more accurate than
100*(x/y), if x > y?   (If that is generally true, couldn't the R
interpreter do it for the user?)

I've seen this problem before. In later editions of the game theory program
Gambit, extraordinary effort was taken to keep values symbolically as
integers as long as possible. Avoid division until the last steps. Same in
Swarm simulations. Gary Polhill wrote an essay about the Ghost in the
Machine along those lines, showing accidents from trusting floats.

I wonder now if all uses of > or < with numeric variables are suspect.

Oh well. If everybody posts their advice, I will write a summary.

Paul Johnson
University of Kansas

On Apr 21, 2017 12:02 AM, "PIKAL Petr"  wrote:

> Hi
>
> The problem is that people using Excel or probably other such spreadsheets
> do not encounter this behaviour as Excel silently rounds all your
> calculations and makes approximate comparison without telling it does so.
> Therefore most people usually do not have any knowledge of floating point
> numbers representation.
>
>  Cheers
> Petr
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Paul
> Johnson
> Sent: Thursday, April 20, 2017 11:56 PM
> To: R-help 
> Subject: [R] Interesting quirk with fractions and rounding
>
> Hello, R friends
>
> My student unearthed this quirk that might interest you.
>
> I wondered if this might be a bug in the R interpreter. If not a bug, it
> certainly stands as a good example of the dangers of floating point numbers
> in computing.
>
> What do you think?
>
> > 100*(23/40)
> [1] 57.5
> > (100*23)/40
> [1] 57.5
> > round(100*(23/40))
> [1] 57
> > round((100*23)/40)
> [1] 58
>
> The result in the 2 rounds should be the same, I think.  Clearly some
> digital number devil is at work. I *guess* that when you put in whole
> numbers and group them like this (100*23), the interpreter does integer
> math, but if you group (23/40), you force a fractional division and a
> floating point number. The results from the first 2 calculations are not
> actually 57.5, they just appear that way.
>
> Before you close the books, look at this:
>
> > aa <- 100*(23/40)
> > bb <- (100*23)/40
> > all.equal(aa,bb)
> [1] TRUE
> > round(aa)
> [1] 57
> > round(bb)
> [1] 58
>
> I'm putting this one in my collection of "difficult to understand"
> numerical calculations.
>
> If you have seen this before, I'm sorry to waste your time.
>
> pj
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis
> http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> 

Re: [R] Peformance question

2017-04-21 Thread Paul Johnson
I don't understand your code, but I do have a suggestion: run the functions
in the profiler; maybe the differences will point at the enemy.

Know what I mean?

Rprof('check.out')
#run code
Rprof(NULL)
summaryRprof('check.out')

Do that for each method. That may be uninformative.

I wondered whether you have tried to compile your functions. In some cases
that helps erase differences like this; Norman Matloff has examples along
those lines in The Art of R Programming.
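
In case it is worth a try, the compile experiment is just this (compiler
ships with base R; "f" is a stand-in for whichever queue function you are
timing). Recent R versions byte-compile functions automatically, so it may
show no difference at all:

library(compiler)
f_cmp <- cmpfun(f)   ## byte-compile, then re-run the timing with f_cmp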

I keep a list of things that are slow; if we can put a finger on the
problem, I will add this one to the list. I suspect the slowness here is in
runtime object lookup. The environment versions have their info located
more quickly by the runtime, I expect. Also, passing info back and forth
from the R runtime system using [ is a common cause of slowness. It is why
everybody yells 'vectorize' and 'use lapply' all the time.  Then again, I'm
guessing, because I don't understand your code :)

Good luck,
PJ


On Apr 11, 2017 7:44 PM, "Thomas Mailund"  wrote:

Hi y’all,

I’m working on a book on how to implement functional data structures in R,
and in particular on a chapter on implementing queues. You get get the
current version here https://www.dropbox.com/s/9c2yk3a67p1ypmr/book.pdf?dl=0
and the relevant pages are 50-59. I’ve implemented three versions of the
same idea, implementing a queue using two linked lists. One list contains
the elements you add to the end of a list, the other contains the elements
at the front of the list, and when you try to get an element from a list
and the front-list is empty you move elements from the back-list to the
front. The asymptotic analysis is explained in this figure
https://www.dropbox.com/s/tzi84zmyq16hdx0/queue-
amortized-linear-bound.png?dl=0 and all my implementations do get a linear
time complexity when I evaluate them on a linear number of operations.
However, the two implementations that uses environments seem to be almost
twice as fast as the implementation that gives me a persistent data
structure (see https://www.dropbox.com/s/i9dyab9ordkm0xj/queue-
comparisons.png?dl=0), and I cannot figure out why.

The code below contains the implementation of all three versions of the
queue plus the code I use to measure their performances. I’m sorry it is a
little long, but it is a minimal implementation of all three variants, the
comments just make it look longer than it really is.

Since the three implementations are doing basically the same things, I am a
little stumped about why the performance is so consistently different.

Can anyone shed some light on this, or help me figure out how to explore
this further?

Cheers

Thomas



## Implementations of queues ##

#' Test if a data structure is empty
#' @param x The data structure
#' @return TRUE if x is empty.
#' @export
is_empty <- function(x) UseMethod("is_empty")

#' Add an element to a queue
#' @param x A queue
#' @param elm An element
#' @return an updated queue where the element has been added
#' @export
enqueue <- function(x, elm) UseMethod("enqueue")

#' Get the front element of a queue
#' @param x A queue
#' @return the front element of the queue
#' @export
front <- function(x) UseMethod("front")

#' Remove the front element of a queue
#' @param x The queue
#' @return The updated queue
#' @export
dequeue <- function(x) UseMethod("dequeue")

## Linked lists #

#' Add a head item to a linked list.
#' @param elem  The item to put at the head of the list.
#' @param lst   The list -- it will become the tail of the new list.
#' @return a new linked list.
#' @export
list_cons <- function(elem, lst)
  structure(list(head = elem, tail = lst), class = "linked_list")

list_nil <- list_cons(NA, NULL)

#' @method is_empty linked_list
#' @export
is_empty.linked_list <- function(x) identical(x, list_nil)

#' Create an empty linked list.
#' @return an empty linked list.
#' @export
empty_list <- function() list_nil


#' Get the item at the head of a linked list.
#' @param lst The list
#' @return The element at the head of the list.
#' @export
list_head <- function(lst) lst$head

#' Get the tail of a linked list.
#' @param lst The list
#' @return The tail of the list
#' @export
list_tail <- function(lst) lst$tail

#' Reverse a list
#' @param lst A list
#' @return the reverse of lst
#' @export
list_reverse <- function(lst) {
  acc <- empty_list()
  while (!is_empty(lst)) {
acc <- list_cons(list_head(lst), acc)
lst <- list_tail(lst)
  }
  acc
}


## Environment queues #

queue_environment <- function(front, back) {
  e <- new.env(parent = emptyenv())
  e$front <- front
  e$back <- back
  class(e) <- c("env_queue", "environment")
  e
}

#' Construct an empty closure based queue
#' @return an empty queue
#' @export
empty_env_queue <- function()
  queue_environment(empty_list(), empty_list())

#' @method is_empty env_queue
#' @export
is_empty.env_queue <- function(x)
  is_empty(x$front) && is_empty(x$back)

#' @method enqueue env_queue
#' @export
enqueu

Re: [R] Prediction plots

2017-04-21 Thread Paul Johnson
I have done this a lot. Would you mind installing my pkg rockchalk and then
running example(plotSlope) and example(plotCurve)? If the output is close to
what you want, you can adjust my code. The vignette explains.

1. Create newdata object
2. Run that through predict
3. Make plot

None of this is rocket science, but it does help students here.
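
By hand, for your two-predictor model it is roughly this (object names are
taken from your message; holding haut at its mean is just one reasonable
choice):

nd <- data.frame(dbh = seq(min(dat$dbh), max(dat$dbh), length.out = 200),
                 haut = mean(dat$haut))
plot(Btot ~ dbh, data = dat, xlab = "Dbh (cm)", ylab = "Biomass (t)")
lines(nd$dbh, predict(M2, newdata = nd), col = "red")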

PJ

On Apr 18, 2017 1:17 PM, "Santiago Bueno"  wrote:

> Thanks Boris, the following is an extract of my data. I have developed
> biomass models using codes like:
>
> start <- coef (lm(log(Btot)~I(log(dbh**2*haut)),data=dat[dat$Btot>0,]))
>
> start[1] <- exp(start[1])
>
> names(start) <- c("a","b")
>
> M1 <- nls(Btot~a*(dbh**2*haut)**b,data=dat,start=start,weights=
> 1/dat$dbh**4)
>
>
> start <- coef(lm(log(Btot)~I(log(dbh))+I(log(haut)),data=dat[dat$
> Btot>0,]))
>
> start[1] <- exp(start[1])
>
> names(start) <- c("a","b1","b2")
>
> M2 <- nls(Btot~a*dbh**b1*haut**b2,data=dat,start=start,weights=
> 1/dat$dbh**4)
>
>
> Tree No dbh haut Btot
> 1 35.00 18.90 0.535
> 2 25.00 16.60 0.248
> 3 23.00 19.50 0.228
> 4 13.50 15.60 0.080
> 5 20.00 18.80 0.172
> 6 23.00 17.40 0.190
> 7 29.00 19.90 0.559
> 8 17.60 18.20 0.117
> 9 31.00 25.30 0.645
> 10 26.00 23.50 0.394
> 11 13.00 13.00 0.038
> 12 32.00 20.70 0.443
> It is my interest to get prediction plots for the models. I have tried to
> use the following codes with no success: Let m be one of the fitted models
> with dbh as the only entry. To construct a plot of the predictions made by
> this model I have tried:
> with(dat,plot(dbh,Btot,xlab="Dbh(cm)",ylab="Biomass (t)"))
> D <- seq(par("usr")[1],par("usr")[2],length=200)
> lines(D,predict(m,newdata=data.frame(dbh=D)),col="red")
> For a model m that has dbh and height as entries, I have tried to get its
> predictions as follows:
> D <- seq(0,180,length=20)
> H <- seq(0,61,length=20)
> B <- matrix(predict(m,newdata=expand.grid(dbh=D,height=H)),length(D))
>
> Can someone provide help please!!!
>
>
> Best regards,
>
> Santiago Bueno
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Interesting quirk with fractions and rounding

2017-04-20 Thread Paul Johnson
Hello, R friends

My student unearthed this quirk that might interest you.

I wondered if this might be a bug in the R interpreter. If not a bug,
it certainly stands as a good example of the dangers of floating point
numbers in computing.

What do you think?

> 100*(23/40)
[1] 57.5
> (100*23)/40
[1] 57.5
> round(100*(23/40))
[1] 57
> round((100*23)/40)
[1] 58

The result in the 2 rounds should be the same, I think.  Clearly some
digital number devil is at work. I *guess* that when you put in whole
numbers and group them like this (100*23), the interpreter does
integer math, but if you group (23/40), you force a fractional
division and a floating point number. The results from the first 2
calculations are not actually 57.5, they just appear that way.

Before you close the books, look at this:

> aa <- 100*(23/40)
> bb <- (100*23)/40
> all.equal(aa,bb)
[1] TRUE
> round(aa)
[1] 57
> round(bb)
[1] 58
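
One more wrinkle: printing with more digits shows what round() is actually
looking at (the digit strings in the comments assume standard IEEE doubles):

print(aa, digits = 20)   ## 57.499999999999992895
print(bb, digits = 20)   ## 57.5
bb - aa                  ## about 7.1e-15, one unit in the last place here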

I'm putting this one in my collection of "difficult to understand"
numerical calculations.

If you have seen this before, I'm sorry to waste your time.

pj
-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] xvfb? cron job updates R packages, fails on some requiring X11

2017-01-19 Thread Paul Johnson
On Centos 7 systems, I wrote a script that runs from cron, and I notice
that some package updates and installs fail like this:


Error : .onLoad failed in loadNamespace() for 'iplots', details:
  call: .jnew("org/rosuda/iplots/Framework")
  error: java.awt.HeadlessException:
No X11 DISPLAY variable was set, but this program performed an
operation which requires it.
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/usr/share/R/library/iplots’
* restoring previous ‘/usr/share/R/library/iplots’


I can log in interactively and run  the same install successfully.

I understand I need something like xvfb to simulate an X11 session,
but I don't understand how to make it work.  Can one of you give me an
idiot's guide on what to do?

pj


-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error installing packages

2016-10-20 Thread Paul Johnson
On Wed, Oct 19, 2016 at 10:31 AM, David Winsemius
 wrote:
>
>> On Oct 19, 2016, at 4:54 AM, Kevin E. Thorpe  
>> wrote:
>>
>> Hello.
>>
>> I am posting this on behalf of one of my students who is getting error 
>> messages when installing some packages. I have not seen this before nor have 
>> I been able to replicate it. I'm including the relevant (I think) 
>> information. I get the students to install rms with dependencies. As you can 
>> see, rms does get installed but when the attempt is made to attach it, 
>> ggplot2 cannot be loaded. Thus I tried explicitly installing ggplot2 and you 
>> can see then ensuing errors below. I have included the sessionInfo() at the 
>> end.
>>
>> I hope someone can point me at a solution.
>>

I have just seen this on Win10 for the first time, on a student's new
laptop.  Anti-virus software is a possible reason for this, but I don't
know.

I did find a very direct solution.

When you run the package install and it says "cannot move from
temporary C:\Users\yourname\AppData\XYZ to user directory
C:\Users\yourname\Documents\R\site-library", just move the folder
manually.  It works every time.

Unfortunately, Windows hides AppData.  In the Windows Explorer view
options, tell it not to hide protected files.  Then in Explorer,
navigate into the user's AppData\Temp\Rxxx folder; you'll see the
downloaded zip files there for Rcpp and such.  Double-click the zip
file, right-click to copy the directory "Rcpp", and paste it into the
user's R folder, under C:/Users/yourname/Documents/R/3.3/site-library
(or whatever that's called).

We ran into about 2 packages that have this failure to copy from
temporary space, but this old-fashioned copy-the-folder-over method
worked fine. After this, R was able to load Rcpp and RcppEigen.

There is no indication that those R files are in a virus quarantine,
so I can't say for sure the security software is the cause.  At first,
I thought the problem was the usual one that we have been aware of for
some time--Windows thinks those files are in use and will not replace
them.

 We are trying to remove a layer of virus protection programs
installed by Dell to see if this happens later. If I hear back from
that student, I'll let you know. Another layer in this story is that
her 30-day free trials had expired, and I have a feeling this means
that not only is McAfee still installed, but it is also running while
refusing to let you interact with it.

pj

-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

I only use this account for email list memberships. To write directly,
address me at pauljohn at ku.edu.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Ask if an object will respond to a function or method

2016-03-31 Thread Paul Johnson
In the rockchalk package, I want to provide functions for regression
objects that are "well behaved." If an object responds to the methods
that lm or glm objects can handle, like coef(), nobs(), and summary(),
I want to be able to handle the same thing.

It is more difficult than expected to ask a given fitted model object
"do you respond to these functions: coef(), nobs(), summary()." How
would you do it?

I tried this with the methods() function but learned that all methods
that a class can perform are not listed.  I'll demonstrate with a
regression "zz" that is created by the example in the plm package.
The coef() function succeeds on the zz object, but coef is not listed
in the list of methods that the function can carry out.

> library(plm)
> example(plm)

> class(zz)
[1] "plm""panelmodel"
> methods(class = "plm")
 [1] ercomp  fixef   has.intercept   model.matrix
 [5] pFtest  plmtest plotpmodel.response
 [9] pooltestpredict residuals   summary
[13] vcovBK  vcovDC  vcovG   vcovHC
[17] vcovNW  vcovSCC
see '?methods' for accessing help and source code
> methods(class = "panelmodel")
 [1] deviance  df.residual   fittedhas.intercept index
 [6] nobs  pbgtest   pbsytest  pcdtest   pdim
[11] pdwtest   phtestprint pwartest  pwfdtest
[16] pwtestresiduals terms updatevcov
see '?methods' for accessing help and source code
> coef(zz)
   log(pcap)  log(pc) log(emp)unemp
-0.026149654  0.292006925  0.768159473 -0.005297741

I don't understand why coef(zz) succeeds but coef is not listed as a method.

Right now, I'm contemplating this:

zz1 <- try(coef(zz))
if (inherits(zz1, "try-error")) stop("Your model has no coef method")

This seems like a bad workaround because I have to actually run the
function in order to find out if the function exists. That might be
time consuming for some summary() methods.
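
One hedged alternative I am also contemplating: getS3method() with
optional = TRUE looks for a method without calling it. Including the
"default" class in the search may also explain part of the puzzle above,
since coef(zz) can succeed through coef.default() even when
methods(class = "plm") does not list coef. A sketch, not a finished
solution:

## Hedged sketch: does `object` have an S3 method (or a default method)
## for a generic, without actually calling it?
respondsTo <- function(object, generic) {
    for (cl in c(class(object), "default")) {
        if (!is.null(getS3method(generic, cl, optional = TRUE))) return(TRUE)
    }
    FALSE  ## S4-only objects would need hasMethod()/existsMethod() checks too
}
## e.g. respondsTo(zz, "coef"); respondsTo(zz, "nobs"); respondsTo(zz, "summary")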

pj

-- 
Paul E. Johnson
Professor, Political ScienceDirector
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org  http://crmda.ku.edu

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] premature evaluation of symbols. Is that the way to describe this problem?

2014-04-10 Thread Paul Johnson
Dear eveRybody

In the package rockchalk, I have functions that take regressions and
make tables and plots.  Sometimes I'll combine a model with various
arguments and pass the resulting list around to be executed, usually
using do.call.  While debugging a function on a particularly large
dataset, I noticed that I've caused inconvenience for myself. I get
the correct calculations, but the arguments into functions are not
symbols when the function call happens. They are fully "written out".

So the debugger does not see function(x); it sees
function(c(1, 3, 1, 3, 45, 2, 4, 2)).  Debugging begins with a
comprehensive listing of the elements in every object, which freezes
Emacs/ESS and generally ruins my quality of life.

I don't know the right words for this, but it seems to me object names
are parsed and evaluated before they need to be. Function calls
include the evaluated arguments in them, not just the symbols for
them.  I have some guesses on fixes, wonder what you think. And I
wonder if fixing this problem might generally make my functions faster
and more efficient because I'm not passing gigantic collections of
numbers through the function call.  I know, I don't have all the right
words here.

I've built a toy example that will illustrate the problem, you tell me
the words for it.

## Paul Johnson 2014-04-10
dat <- data.frame(x = rnorm(50),y = rnorm(50))
m1 <- lm(y ~ x, dat)

myRegFit <- function(model, nd) predict(model, nd)

mySpecialFeature <- function(model, ci){
pargs <- list(model, model.frame(model)[1:3, ])
res <- do.call(myRegFit, pargs)
print(res)
}

mySpecialFeature (m1)

debug(myRegFit)

mySpecialFeature (m1)

Note when the debugger enters, it has the whole model's structure
"splatted" into the function call.

> dat <- data.frame(x = rnorm(50),y = rnorm(50))
> m1 <- lm(y ~ x, dat)
> myRegFit <-function(model, nd) predict(model, nd)
> mySpecialFeature <- function(model, ci){
+ pargs <- list(model, model.frame(model)[1:3, ])
+ res <- do.call(myRegFit, pargs)
+ print(res)
+ }
> mySpecialFeature (m1)
  1   2   3
-0.04755431  0.35162844 -0.11715522
> debug(myRegFit)
> mySpecialFeature (m1)
debugging in: (function (model, nd)
predict(model, nd))(list(coefficients = c(0.0636305741709566,
-0.177786836929453), residuals = c(-0.0152803151982162, -0.885875162659858,
-1.23645319405006, -1.77900639571943, -1.9952045397527, 1.38150266407176,
-2.27403449262599, 0.0367524776530579, -0.881037818467492, -1.10816713568432,
-0.55749829201921, -0.372526253742828, -0.353208893775679, 0.531708523456415,
-0.43187124865558, 1.03973431972897, 0.849170115617157, 1.11227803262189,
0.47216440383252, 0.920060697785203, -0.374672861268964, 2.94683565121636,
0.514112041811711, -0.52321362055969, -0.0412387814196237, 0.983863448669766,
0.534230127442599, -0.869960511196742, 1.90586406082412, -1.84705932449576,
0.806425475391075, 1.90939977897903, 0.41030042787483, 0.994503041407507,
0.715719209301158, -0.538096591457249, -0.482411304681239, 0.0323998214753804,
0.551162374882342, -0.618989357027834, 1.08996565055366, -0.697423620816604,
1.38170655971013, 1.55752893685726, -0.0929258405664267, -1.00210610433922,
-1.51879925258188, -1.57050250989563, -1.06868502360026, 0.458860605094578
), effects = c(-0.661274203468574, -1.07255577360914, -1.36123824096605,
-1.71875465303308, -1.8806486154155, 1.38636416232103, -2.29259163100096,
0.153263315278269, -0.950879523052079, -0.963705724647863, -0.62175976245114,
-0.423965256680951, -0.350662885659068, 0.469175412149025, -0.400505083448679,
1.03440116252973, 0.878739280288923, 1.16574672001397, 0.358222935858666,
1.00514946836967, -0.316592303881481, 2.8507611924072, 0.573209391002668,
-0.393720180068215, 0.0971873363200073, 1.23818281311352, 0.449576129222722,
-0.929511151618747, 1.97922180250824, -1.80820009744905, 0.877996855335966,
1.86871623414376, 0.226023354471842, 0.814892815951223, 0.821400980265047,
-0.536299037896556, -0.358703204255386, 0.105714598012197, 0.543301738010905,
-0.643659132172249, 1.26412624281219, -0.808498804261978, 1.32273956796476,
1.57655585529458, 0.022266343917185, -1.20958321975888, -1.52288584310647,
-1.60904879400386, -1.08384090772898, 0.567729611277337), rank = 2L,
fitted.values = c(-0.0475543078742839, 0.351628437239565,
-0.11715522155, 0.165041007767092, 0.247859323684996,
0.0805663573239046, 0.0448510221995737, 0.250840726246576,
-0.0333621333428176, 0.293467635417987, -0.0248518201238109,
-0.00529650906918994, 0.0770350456001708, -0.0222159311803766,
0.120988140963125, 0.0650186744394092, 0.118247568257686,
0.154696293986828, -0.100617877288265, 0.202919505525394,
0.161729772710581, -0.0733692278375946, 0.163280463324874,
0.270640256833884, 0.284263319806951, 0.461009994308065,
-0.0559520919143943, -0.0176674192887905, 0.18502872

[R] [R-pkgs] update: rockchalk 1.8.0

2013-07-30 Thread Paul Johnson
This will appear on CRAN mirrors soon. It's my update for Spring, 2013. I keep
track of R problems that arise in the regression course and try to facilitate
them. There are functions for describing data, presenting regression plots and
tables, some regression diagnostics.

Most of the usages are illustrated in the vignette "rockchalk".  That has plenty
of illustrations if you care to take a quick look.

If you did not try this before, here is why you might want to. The function that
the package was organized around originally was plotSlopes: draw predicted value
lines on the same scatterplot. Now this is generalized a bit: the moderator
variable can be numeric or a factor, and I have bent over backwards to make this
flexible for the end users. If you run the examples for "predictOMatic" and
"plotSlopes" and "plotCurves," you will get the idea.

The rest is details.

I started a NEWS file using Emacs org mode, here it is:

* version 1.8

This is the end of the Spring semester, so it's time for the new rockchalk
release.

** New, general, flexible framework for calculating marginal effects
   in regression models, linear or otherwise.

*** newdata function works. It can scan a regression, isolate the
   predictors, and then make a "mix and match" new data object for use
   with a predict function.  This is convenient for users but also very
   flexible.

*** The newdata framework is built on top of "divider" methods that can
check whether a variable is numeric or categorical, and select
example values according to user-specified criteria.

*** predictOMatic works dependably! Please try
example(predictOMatic). The problem with single predictor models
that bugged users of rockchalk 1.6.2 has been solved.

*** predictOMatic argument interval = c("none", "confidence",
"prediction").  Guess what that is supposed to do? For glm,
which does not provide a confidence interval, I've written code
for an approximate Wald type CI, and hope to do better in future.

** Regression diagnostics.

*** getPartialCor: get partial correlations from a fitted model
(student convenience).

*** getDeltaRsquare: Returns the change in estimated R-square observed
when each predictor is removed.  This is the squared semi-partial
correlation coefficients (student convenience).

*** mcDiagnose for multicollinearity diagnostics (student convenience)

** MeanCenter: add arguments to make selection of variables for
centering more convenient when users don't like the automatic
options centerOnlyInteractors.

** plotSlopes, plotCurves:
 *** Added intervals argument, for confidence and prediction intervals.

*** Added opacity argument to determine darkness of interval regions
(which use the transparency "alpha" layer).

*** A lot of fiddling under the hood to make colors consistent when
levels of modx are altered to compare plots of a given model.

*** Can produce a simple regression prediction plot if modx argument
is omitted. This is a widely requested feature.

Please run example(plotSlopes) and example(plotCurves)

*** Changes under the hood. The common plotting functions of
plotSlopes and plotCurves are abstracted into a function
plotFancy, so now this will be easier for me to maintain. The
plotting ritual is the same, so why keep 2 functions, you ask?
plotCurves tolerates more complicated regression
formulas. plotSlopes leads to testSlopes, and hence to
plot.testSlopes.

** addLines: communication between 2 dimensional regression plots and 3
dimensional plots from plotPlane. Run example(addLines).

** plot.testSlopes. Run testSlopes on a model with an interaction. For a
model with 2 continuous predictors that interact, this will generate
an ABSOLUTELY EXCELLENT and highly informative plot displaying the
effect of the interaction.

** outreg: LaTeX tables from regression models.

*** Reworked with new arguments to make tables for more types
of regressions. There's quite a bit more room for users to customize
the type of diagnostics they want to report.

The wide variety of output types from regression models is very
bothersome. I refuse to write a separate outreg method for each
different regression package. If you want to use a package
from an author who is willing to do that, consider the "texreg"
package.

*** outreg2HTML. converts outreg to HTML markup and into a file.
Intended for importation into word processor programs.

** New vignette Rstyle. Most of the source-code files have been reformatted
to comply with my advice.

** genCorrelatedData2

*** genCorrelatedData2. For regression examples, suppose you want to
have 6 columns of an MVN with a certain mean and covariance
structure. And you want the regression formula to have
interactions and squared-terms. No more hassle. This is a
framework that works. Users set the mean, standard deviations, and
correlation values in various ways. Run
example(genCorrelatedData2).

*** To support that, there are more

Re: [R] 2SLS / TSLS / SEM non-linear

2013-06-29 Thread Paul Johnson
Please consider using the R package systemfit. It has existed for about 10
years, and I've used it many times happily :)

I've not used the gmm package, and I'm not criticizing it. But I have good results
from systemfit. It has good documentation.

"This package contains functions for fitting
 simultaneous systems of linear and nonlinear
 equations using Ordinary Least Squares (OLS),
 Weighted Least Squares (WLS), Seemingly Unrelated
 Regressions (SUR), Two-Stage Least Squares (2SLS),
 Weighted Two-Stage Least Squares (W2SLS), and
 Three-Stage Least Squares (3SLS)."
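
A minimal hedged sketch of the sort of call I have in mind; the equation
and variable names below are invented, not taken from your model:

## Hedged sketch: a two-equation 2SLS fit with systemfit
library(systemfit)
eqDemand <- q ~ p + income
eqSupply <- q ~ p + cost
fit <- systemfit(list(demand = eqDemand, supply = eqSupply),
                 method = "2SLS", inst = ~ income + cost, data = mydata)
summary(fit)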

If that doesn't work for you, please show us your example code, along with
the error messages.





On Sat, Jun 29, 2013 at 11:18 AM, John Fox  wrote:

> Dear Hans-Christian
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> > project.org] On Behalf Of hck
> > Sent: Saturday, June 29, 2013 11:19 AM
> > To: r-help@r-project.org
> > Subject: Re: [R] 2SLS / TSLS / SEM non-linear
> >
> > The challenge is to firstly calculate the reduced form. As far as I
> > know, the
> > SEM package does not do this automatically. Am I correct?
> >
>
> Right. If you look at the code in sem:::tsls.default, you'll see that it
> formulates and solves the 2SLS estimating equations directly (using a
> Cholesky decomposition). Moreover, tsls() is for linear structural
> equations.
>
> Best,
>  John
>
> ---
> John Fox
> Senator McMaster Professor of Social Statistics
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada
>
>
> >
> >
> >
> >
> > --
> > View this message in context: http://r.789695.n4.nabble.com/2SLS-TSLS-
> > SEM-non-linear-tp4670123p4670595.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Windows R_LIBS_USER confusion under R-3.0.1

2013-06-12 Thread Paul Johnson
I would appreciate ideas about MS Windows install issues. I'm at our stats
summer camp and have been looking at a lot of Windows R installs and there
are some wrinkles about R_LIBS_USER.

On a clean Win7 or Win8 system, with R-3.0.1, we see the user library for
packages defaulting to  $HOME/R/win-library.

I think that's awesome, the way it should be. Yea! But it does not appear
that way on all systems, and I think it is because of lingering
after-effects of previous R installs.

In previous versions of R, R_LIBS_USER defaulted to
$HOME/AppData/Roaming/... That was not so great because it was (by default)
hidden to the students and they were disoriented about how something that
does not exist (or show) could hold packages. Aside from teaching them how
to configure the file manager, we could navigate that.

The problem is that with R-3.0.1, sometimes we are seeing the user package
installs going into $HOME/AppData/Roaming/

In the user account, there is no $HOME/.Rprofile or $HOME/.Renviron.

I hate to tell non-expert users that they ought to go fishing in the
Windows registry, but I'm starting to suspect that is what they ought to
do.  What do you think?
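
In the meantime, a hedged diagnostic sketch that avoids the registry
entirely; the path in the comment is an example, not a verified default:

## Hedged sketch: see where R wants the user library, and pin it down
Sys.getenv("R_LIBS_USER")  ## what R thinks the user library should be
.libPaths()                ## where packages will actually be searched/installed

## One way to pin it: put a line such as
##   R_LIBS_USER=C:/Users/yourname/Documents/R/win-library/3.0
## in the user's .Renviron, then make sure that folder exists, because
## R only adds R_LIBS_USER to .libPaths() when the directory is there:
dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE, showWarnings = FALSE)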

PJ

-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recode: how to avoid nested ifelse

2013-06-10 Thread Paul Johnson
Thanks, guys.


On Sat, Jun 8, 2013 at 2:17 PM, Neal Fultz  wrote:

> rowSums and Reduce will have the same problems with bad data you alluded
> to earlier, eg
> cg = 1, hs = 0
>
> But that's something to check for with crosstabs anyway.
>
>
This "wrong data" thing is a distraction here.  I guess I'd have to craft 2
solutions, depending on what the researcher says. (We can't assume es = 0
or es = NA and cg = 1 is bad data. There are some people who finish college
without doing elementary school (wasn't Albert Einstein one of those?) or
high school. I once went to an eye doctor who didn't finish high school,
but nonetheless was admitted to optometrist school.)

I did not know about the Reduce function before this. If we enforce the
ordering and clean up the data in the way you imagine, it would work.

I think the pmax is the most teachable and dependably not-getting-wrongable
approach if the data is not wrong.
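
For the record, here is a hedged sketch of that pmax() recode with the
toy indicators from my original post, assuming clean 0/1 data:

## Hedged sketch: recode completion indicators to highest level reached
es <- c(0, 0, 1, 0, 1, 0, 1, 1)  ## completed elementary school
hs <- c(0, 0, 1, 0, 1, 0, 1, 0)  ## completed high school
cg <- c(0, 0, 0, 0, 1, 0, 1, 0)  ## completed college
ed  <- pmax(3 * cg, 2 * hs, es, 0, na.rm = FALSE)
edf <- factor(ed, levels = 0:3, labels = c("none", "es", "hs", "cg"))
table(edf)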


> Side note: you should check out the microbenchmark pkg, it's quite handy.
>
>
Perhaps the working example of microbenchmark is the best thing in this
thread! I understand the idea behind it, but it seems like I can never get
it to work right. It helps to see how you do it.

>
> R>require(microbenchmark)
> R>microbenchmark(
> +   f1(cg,hs,es),
> +   f2(cg,hs,es),
> +   f3(cg,hs,es),
> +   f4(cg,hs,es)
> + )
> Unit: microseconds
>expr   min lq median uq   max neval
>  f1(cg, hs, es) 23029.848 25279.9660 27024.9640 29996.6810 55444.112   100
>  f2(cg, hs, es)   730.665   755.5750   811.7445   934.3320  6179.798   100
>  f3(cg, hs, es)85.029   101.6785   129.8605   196.2835  2820.187   100
>  f4(cg, hs, es)   762.232   804.4850   843.7170  1079.0800 24869.548   100
>
> On Fri, Jun 07, 2013 at 08:03:26PM -0700, Joshua Wiley wrote:
> > I still argue for na.rm=FALSE, but that is cute, also substantially
> faster
> >
> > f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3))
> > f2 <- function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE)
> > f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
> > f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3))
> >
> > es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000)
> > hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000)
> > cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000)
> >
> > system.time(replicate(1000, f1(cg, hs, es)))
> > system.time(replicate(1000, f2(cg, hs, es)))
> > system.time(replicate(1000, f3(cg, hs, es)))
> > system.time(replicate(1000, f4(cg, hs, es)))
> >
> > > system.time(replicate(1000, f1(cg, hs, es)))
> >user  system elapsed
> >   22.730.03   22.76
> > > system.time(replicate(1000, f2(cg, hs, es)))
> >user  system elapsed
> >0.920.040.95
> > > system.time(replicate(1000, f3(cg, hs, es)))
> >user  system elapsed
> >0.190.020.20
> >  > system.time(replicate(1000, f4(cg, hs, es)))
> >user  system elapsed
> >0.950.030.98
> >
> >
> > R version 3.0.0 (2013-04-03)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> >
> >
> >
> > On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz  wrote:
> > > I would do this to get the highest non-missing level:
> > >
> > > x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE)
> > >
> > > rock chalk...
> > >
> > > -nfultz
> > >
> > > On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
> > >> Hi Paul,
> > >>
> > >> Unless you have truly offended the data generating oracle*, the
> > >> pattern: NA, 1, NA, should be a data entry error --- graduating HS
> > >> implies graduating ES, no?  I would argue fringe cases like that
> > >> should be corrected in the data, not through coding work arounds.
> > >> Then you can just do:
> > >>
> > >> x <- do.call(paste0, list(es, hs, cg))
> > >>
> > >> > table(factor(x, levels = c("000", "100", "110", "111"), labels =
> c("none", "es","hs", "cg")))
> > >> none   es   hs   cg
> > >>4112
> > >>
> > >> Cheers,
> > >>
> > >> Josh
> > >>
> > >> *Drawn from comments by Judea Pearl one lively session.
> > >>
> > >>
> > >> On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson 
> wrote:
> > >> > In our Summer Stats Institute, I was asked a question that 

[R] recode: how to avoid nested ifelse

2013-06-07 Thread Paul Johnson
In our Summer Stats Institute, I was asked a question that amounts to
reversing the effect of the contrasts function (reconstruct an ordinal
predictor from a set of binary columns). The best I could think of was to
link together several ifelse functions, and I don't think I want to do this
if the example became any more complicated.

I'm unable to remember a less error prone method :). But I expect you might.

Here's my working example code

## Paul Johnson 
## 2013-06-07

## We need to create an ordinal factor from these indicators
## completed elementary school
es <- c(0, 0, 1, 0, 1, 0, 1, 1)
## completed high school
hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
## completed college graduate
cg <- c(0, 0, 0, 0, 1, 0, 1, 0)

ed <- ifelse(cg == 1, 3,
 ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))

edf <- factor(ed, levels = 0:3,  labels = c("none", "es", "hs", "cg"))
data.frame(es, hs, cg, ed, edf)

## Looks OK, but what if there are missings?
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
ed <- ifelse(cg == 1, 3,
 ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
cbind(es, hs, cg, ed)

## That's bad, ifelse returns NA too frequently.
## Revise (becoming tedious!)

ed <- ifelse(!is.na(cg) & cg == 1, 3,
      ifelse(!is.na(hs) & hs == 1, 2,
             ifelse(!is.na(es) & es == 1, 1,
                    ifelse(is.na(es), NA, 0))))
cbind(es, hs, cg, ed)


## Does the project director want us to worry about
## logical inconsistencies, such as es = 0 but cg = 1?
## I hope not.

Thanks in advance, I hope you are having a nice summer.

pj

-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inverse for formula transformations on LHS

2013-05-21 Thread Paul Johnson
On Sat, May 18, 2013 at 7:05 AM, Stephen Milborrow  wrote:

> Paul Johnson  wrote:
>>
>> m1 <- lm(log(y) ~ log(x), data = dat)
>>
>> termplot shows log(y) on the vertical.  What if I want y on the vertical?
>>
>
> plotmo in the plotmo package has an inverse.func argument,
> so something like the following might work for you?
>
> Thanks, I will read plotmo. I did not hear of that one before.

It looks like we have been working on the same problem. Take a look at my
package "rockchalk" and run the examples for "plotSlopes" and
"plotCurves."  Exact same ideas you are thinking about.

I don't think it will help in this case because it still relies on the user
to know about "exp" as the inverse of log. When users call predict on a glm,
it "just knows" how to put predictions on the response or link scales, and
I am aiming for something automagical like that.




> library(MASS)
> library(plotmo)
> log.brain <- log(Animals$brain)
> log.body  <- log(Animals$body)
> m2 <- lm(log.brain ~ log.body)
>
> myplot <- function(...)
>plotmo(m2, do.par=F, nrug=-1, col.resp=2, pch=20,se=1, main="",
>  xlab="log(body)", ...)
>
> par(mfrow = c(2, 2))
> myplot(ylab="log(brain)")
> myplot(ylab="brain", inverse.func=exp)
> termplot(m2, se=T, rug=T) #  for comparison
>
>
> --
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] inverse for formula transformations on LHS

2013-05-17 Thread Paul Johnson
This is an R formula handling question. It arose in class. We were working
on the Animals data in the MASS package. In order to see a relationship,
you need to log brain and body weight.  It's a fun one for teaching
regression, if you did not try it yet.  There are outliers too!

Students wanted to make a predicted value plot in the non-logged values of
y, for comparison, and I wondered if I couldn't automate this somehow for
them.

It made me wonder how R manages formulae and whether a transformation like
log(y) can be mechanically inverted.

So we have something concrete to talk about, suppose x and y are variables
in dat, a person fits

m1 <- lm(log(y) ~ log(x), data = dat)

termplot shows log(y) on the vertical.  What if I want y on the vertical?
Similarly, predict gives values on the log(y) scale; there's no argument
like type = "untransformed".

I want my solution to be a bit general, so that it would give back
predicted y for formulae like

sqrt(y)

or

exp(y)

or

log(y + d)

or whatever other math people might throw in there.

Here's what I can tell so far about R's insides.  The formula handler makes
a list out of the formula, I can get that from the terms object that the
model generates. The formula list has "~" as element [[1]], the left-hand
side "log(y)" as element [[2]], and the right-hand side "log(x)" as element [[3]].

Where in the R source code can I see how R "looks at" the symbol log(y) and
discerns that there is a variable y that needs to be logged? If I could
understand that, and if R has a table of inverse functions, then maybe I
could see what to do.
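
Here is a hedged sketch of how far I can poke at the formula from the
prompt; the missing piece is still the table of inverse functions:

## Hedged sketch: pulling apart the left-hand side of a model formula
f <- log(y) ~ log(x)
f[[1]]                  ## `~`
f[[2]]                  ## log(y) -- the left-hand side, itself a call
f[[3]]                  ## log(x)
lhs <- f[[2]]
as.character(lhs[[1]])  ## "log"  -- the transformation wrapped around y
all.vars(lhs)           ## "y"    -- the underlying variable name
## Mapping "log" to exp, "sqrt" to function(z) z^2, and so on still has to
## be supplied by hand; so far as I know, base R keeps no such table.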

If you have ideas, I'm very grateful if you share them.

pj
-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R installation error

2013-05-10 Thread Paul Johnson
Meenu:

You have an elementary Linux setup and configuration problem to understand
first, before you worry about configuring and compiling your own R.  I
agree strongly that this is something that all Linux users should learn to
do, but compiling R itself is like climbing Mt Everest as your first
mountain climb.

So, for whatever Linux you use, join its email support lists. Find out
more about how to install packages there. Not all Linux systems use "yum"
or "apt", so we can't help you with that.  So learn about whatever Linux you
have and find out whether it separately installs the packages required for
compiling programs.  It may even be that you have a Linux system without sudo
installed, so learn how to install that.

After you succeed in compiling something simple, then get serious.

My course notes on this:

http://pj.freefaculty.org/guides/Computing-HOWTO/IntroTerminal-3/terminal-3.pdf




On Fri, May 10, 2013 at 7:33 AM, Meenu Chopra wrote:

> Thanks to all
>
> The main problem with ma linux system is that its not able to install any
> software using sudo command.
> like i used command yum search libX11 , it shown that yum is not installed
> and when i use sudo apt-get install yum
> its giving error
> E: Unable to locate package yum
> same problem with all software.
>
> even when u use sudo apt-get update it shows error like
>
> W: Failed to fetch
>
> http://old.releases.ubuntu.com/ubuntu/dists/maverick-security/universe/binary-amd64/Packages.gz
> 404  Not Found
>
> I am not getting whats wrong with my system
>
> Please help me out
>
>
>
> On Fri, May 10, 2013 at 2:32 AM, Jim Lemon  wrote:
>
> > On 05/09/2013 11:06 PM, Meenu Chopra wrote:
> >
> >> Hiii
> >>
> >> I am trying to install R-2.15.2
> >> after doing ./configure its showing error: --with-x=yes (default) and
> X11
> >> headers/libs are not available
> >>
> >> and when i am running make its showing
> >> make: *** No targets specified and no makefile found. Stop.
> >> Even I read the install file also but i am not getting any solution.
> >>
> >>  Hi Meenu,
> > The second error is due to the first, so don't worry about it for the
> > moment. Since you are compiling, you are probably on a Linux system. Try
> > this:
> >
> > su
> > 
> > yum search libX11
> >
> > 
> > 
> >
> > yum install libX11-devel.i686 libX11.i686 libX11-common.noarch
> >
> > 
> >
> > yum install libX11-devel.x86_64 libX11.x86_64 libX11-common.noarch
> >
> > exit
> >
> > This should get the necessary stuff. You may also have problems with
> > "readline". If so, let us know.
> >
> > Jim
> >
>
>
>
> --
> Meenu Chopra
> Research Associate
> Animal Genomics Lab
> NDRI, Karnal
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Automatic way to embed fonts into all pdf output?

2013-05-10 Thread Paul Johnson
This is a feature request. Or else a howto request.  Can there be some
simple, automatic way to make fonts embed into pdf output? Could this be in
options() or par()?

Why?

I recently wanted to create a paper for a conference in a format called
AAAI (which I had never heard of before because I live in a cave).

http://www.aaai.org/Publications/Author/author.php

In their instructions for LaTeX, it has the statement:

All fonts must be embedded in the PDF file — this includes your figures.

I remember that after one creates a pdf file in R with the pdf device, one
can then add fonts to it with the embedFonts() function. But I forget!  And
this is difficult to explain to non-programmers, who just want to put some
figures in a document.   I wish there were something I could put in
options() or par() or somewhere to just make this always happen.  I
understand this will make pdf files larger, but I'm willing to spend the
disk.
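
For reference, a hedged sketch of the two-step ritual I mean;
embedFonts() shells out to Ghostscript, so Ghostscript must be installed
and findable:

## Hedged sketch: make a pdf figure, then embed the fonts as a second step
pdf("fig1.pdf")
plot(pressure, main = "Vapor Pressure of Mercury")
dev.off()
embedFonts("fig1.pdf", outfile = "fig1-embedded.pdf")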

-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] significantly different from one (not zero) using lm

2013-04-30 Thread Paul Johnson
It is easy to construct your own test. I test against null of 0 first so I
can be sure I match the right result from summary.lm.

## get the standard error
seofb <- sqrt(diag(vcov(lm1)))
## calculate t. Replace 0 by your null
myt <- (coef(lm1) - 0)/seofb
mypval <- 2*pt(abs(myt), lower.tail = FALSE, df = lm1$df.residual)

## Note you can pass a vector of different nulls for the coefficients
myt <- (coef(lm1)  - c(0,1))/seofb

We could write this into a function if we wanted to get busy.  Not a bad
little homework exercise, I think.
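
A hedged sketch of what that homework answer might look like; coefTest is
just a name I made up here:

## Hedged sketch: t tests of coefficients against arbitrary null values
coefTest <- function(model, null = 0) {
    b  <- coef(model)
    se <- sqrt(diag(vcov(model)))
    tstat <- (b - null)/se
    pval  <- 2*pt(abs(tstat), lower.tail = FALSE, df = model$df.residual)
    cbind(estimate = b, null = null, t = tstat, p = pval)
}
## e.g. coefTest(lm1, null = c(0, 1))  ## intercept against 0, slope against 1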




> dat <- data.frame(x = rnorm(100), y = rnorm(100))
> lm1 <- lm(y ~ x, data = dat)
> summary(lm1)

Call:
lm(formula = y ~ x, data = dat)

Residuals:
Min  1Q  Median  3Q Max
-3.0696 -0.5833  0.1351  0.7162  2.3229

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.001499   0.104865  -0.014    0.989
x           -0.039324   0.113486  -0.347    0.730

Residual standard error: 1.024 on 98 degrees of freedom
Multiple R-squared: 0.001224,  Adjusted R-squared: -0.008968
F-statistic: 0.1201 on 1 and 98 DF,  p-value: 0.7297

> seofb <- sqrt(diag(vcov(lm1)))
> myt <- (coef(lm1) - 0)/seofb
> mypval <- 2*pt(abs(myt), lower.tail = FALSE, df = lm1$df.residual)
> myt
(Intercept)   x
-0.01429604 -0.34650900
> mypval
(Intercept)   x
  0.9886229   0.7297031
> myt <- (coef(lm1) - 1)/seofb
> mypval <- 2*pt(abs(myt), lower.tail = FALSE, df = lm1$df.residual)
> myt
(Intercept)   x
  -9.550359   -9.158166
> mypval
 (Intercept)x
1.145542e-15 8.126553e-15


On Tue, Apr 30, 2013 at 9:07 PM, Elaine Kuo  wrote:

> Hello,
>
>
>
> I am work with a linear regression model:
>
> y=ax+b with the function of lm.
>
> y= observed migration distance of butterflies
>
> x= predicted migration distance of butterflies
>
>
>
> Usually the result will show
>
> if the linear term a is significantly different from zero based on the
> p-value.
>
> Now I would like to test if the linear term is significantly different from
> one.
>
> (because I want to know if the regression line (y=ax+b) is significantly
> from the line with the linear term =1 and the intercept =0)
>
>
>
> Please kindly advise if it is possible
>
> to adjust some default parameters in the function to achieve the goal.
>
> Thank you.
>
>
> Elaine
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Trouble with methods() after loading gdata package.

2013-04-30 Thread Paul Johnson
Greetings to r-help land.

I've run into some program crashes and I've traced them back to methods()
behavior after the package gdata is loaded.  I now provide a minimal
reproducible example. This seems buggish to me. How about you?

dat <- data.frame(x = rnorm(100), y = rnorm(100))
lm1 <- lm(y ~ x, data = dat)

methods(class = "lm")

## OK so far

library(gdata)
methods(class = "lm")
## epic fail



## OUTPUT.

> dat <- data.frame(x = rnorm(100), y = rnorm(100))
> lm1 <- lm(y ~ x, data = dat)
>
> methods(class = "lm")
 [1] add1.lm*   alias.lm*  anova.lm   case.names.lm*
 [5] confint.lm*cooks.distance.lm* deviance.lm*   dfbeta.lm*
 [9] dfbetas.lm*drop1.lm*  dummy.coef.lm* effects.lm*
[13] extractAIC.lm* family.lm* formula.lm*hatvalues.lm
[17] influence.lm*  kappa.lm   labels.lm* logLik.lm*
[21] model.frame.lm model.matrix.lmnobs.lm*   plot.lm
[25] predict.lm print.lm   proj.lm*   qr.lm*
[29] residuals.lm   rstandard.lm   rstudent.lmsimulate.lm*
[33] summary.lm variable.names.lm* vcov.lm*

   Non-visible functions are asterisked
>
> library(gdata)
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.

gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.

Attaching package: ‘gdata’

The following object is masked from ‘package:stats’:

nobs

The following object is masked from ‘package:utils’:

object.size

> methods(class = "lm")
Error in data.frame(visible = rep.int(FALSE, n2), from = rep.int(msg,  :
  duplicate row.names: nobs.lm

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] gdata_2.12.0.2

loaded via a namespace (and not attached):
[1] gtools_2.7.1 tcltk_3.0.0  tools_3.0.0


gdata is one of my favorite packages, so it's worth the effort to get to the
bottom of this.

--
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pglm package: fitted values and residuals

2013-04-25 Thread Paul Johnson
On Wed, Apr 24, 2013 at 4:37 PM, Achim Zeileis  wrote:
> On Wed, 24 Apr 2013, Paul Johnson wrote:
>
>> On Wed, Apr 24, 2013 at 3:11 AM,  
>> wrote:
>>
>>> I'm using the package pglm and I'have estimated a "random probit model".
>>> I need to save in a vector the fitted values and the residuals of the model
>>> but I can not do it.
>>>
>>> I tried with the command fitted.values using the following procedure
>>> without results:
>>>
>> This is one of those "ask the pglm authors" questions. You should take it
>> up with the authors of the package.  There is a specialized email list
>> R-sig-mixed where you will find more people working on this exact same
>> thing.
>>
>> pglm looks like fun to me, but it is not quite done, so far as I can tell.
>
> I'm sure that there are many. One of my attempts to write up a list is in
> Table 1 of vignette("betareg", package = "betareg").

Yes! That's exactly the list I was thinking of.  It was driving me
crazy I could not find it.

Thanks for the explanation.  I don't think I should have implied that
the pglm author must actually implement all the methods; it is
certainly acceptable to leverage the methods that exist.  It just
happened that the ones I tested were not implemented by any of the
affiliated packages.

But this thread leads me to one question I've wondered about recently.

Suppose I run somebody's regression function and out comes an object.

Do we have a way to ask that object "what are all of the methods that
might apply to you?"  Here's why I wondered. You've noticed that
predict.lm has the interval="confidence" argument, but predict.glm
does not. So if I receive a regression model, I'd like to say to it
"do you have a predict method" and if I could get that predict method,
I could check to see if there is a formal argument interval. If it
does not, maybe I'd craft one for them.
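
A hedged sketch of the kind of interrogation I mean, using only base
tools; this only checks S3 methods, so S4 models would need
hasMethod()/showMethods() as well:

## Hedged sketch: does the object have a predict method, and does that
## method have an `interval` argument?
m <- lm(dist ~ speed, data = cars)
pm <- NULL
for (cl in class(m)) {
    pm <- getS3method("predict", cl, optional = TRUE)
    if (!is.null(pm)) break
}
!is.null(pm)                        ## TRUE: a predict method was found
"interval" %in% names(formals(pm))  ## TRUE for predict.lm, FALSE for predict.glm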

pj



> Personally, I don't write anova() methods for my model objects because I can
> leverage lrtest() and waldtest() from "lmtest" and linearHypothesis() and
> deltaMethod() from "car" as long as certain standard methods are available,
> including coef(), vcov(), logLik(), etc.
>
> Similarly, an AIC() method is typically not needed as long as logLik() is
> available. And BIC() works if nobs() is available in addition.
>
> Best,
> Z
>
>>
>> pj
>>
>>> library(pglm)
>>>
>>> m1_S<-pglm(Feed ~ Cons_PC_1 + imp_gen_1 + LGDP_PC_1 + lnEI_1 +
>>>
>>> SH_Ren_1,data,family=binomial(probit),model="random",method="bfgs",index=c("Year","IDCountry"))
>>>
>>> m1_S$fitted.values
>>> residuals(m1)
>>>
>>>
>>> Can someone help me about it?
>>>
>>> Thanks
>>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



--
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pglm package: fitted values and residuals

2013-04-24 Thread Paul Johnson
On Wed, Apr 24, 2013 at 3:11 AM,   wrote:
> I'm using the package pglm and I'have estimated a "random probit model".
> I need to save in a vector the fitted values and the residuals of the model
> but I can not do it.
>
> I tried with the command fitted.values using the following procedure without
> results:
>
This is one of those "ask the pglm authors" questions. You should take
it up with the authors of the package.  There is a specialized email
list R-sig-mixed where you will find more people working on this exact
same thing.

pglm looks like fun to me, but it is not quite done, so far as I can
tell. Well, the authors have not gone the "extra step" to make their
regression objects behave like other R regression objects.   In case
you need alternative software, ask in R-sig-mixed. You'll learn that
most of these can be estimated with other packages. But I really like
the homogeneous user interface that is spelled out in pglm, and I
expect my students will run into the same questions that you have.

I just downloaded their source code, you probably ought to do that so
you can understand what they are doing.   They provide the fitting
functions, but they do not do any of the other work necessary to make
these functions fit together with the R class framework.  There are no
methods for "predict", anova, and so forth.

I'm in their R folder looking for implementations:

pauljohn@pols110:/tmp/pglm/R$ grep summary *
pauljohn@pols110:/tmp/pglm/R$ grep predict *
pauljohn@pols110:/tmp/pglm/R$ grep class *
pauljohn@pols110:/tmp/pglm/R$ grep fitted *
pglm.R:  # glm models can be fitted

Run

> example(pglm)

what can we do after that?

> plot(anb)
Error in xy.coords(x, y, xlabel, ylabel, log) :
  'x' is a list, but does not have components 'x' and 'y'

## Nothing.
## We do get a regression summary object, that's better than some
packages provide:

> anbsum <- summary(anb)

## And a coefficient table

> coef(anbsum)
 Estimate  Std. error   t value  Pr(> t)
(Intercept) -6.933764e-01 0.061391429 -11.294351205 1.399336e-29
wage 1.517009e-02 0.006375966   2.379261231 1.734738e-02
exper1.314229e-03 0.007400129   0.177595444 8.590407e-01
ruralyes-8.594328e-05 0.051334716  -0.001674175 9.986642e-01

> model.matrix(anb)
Error in terms.default(object) : no terms component nor attribute
> anova(anb)
Error in UseMethod("anova") :
  no applicable method for 'anova' applied to an object of class
"c('maxLik', 'maxim')"
> predict(anb)
Error in UseMethod("predict") :
  no applicable method for 'predict' applied to an object of class
"c('maxLik', 'maxim')"

So, if you want those features with these models, you'll have to get
busy and do a lot of coding!

While working on regression support lately, I've reached the
conclusion that if an R package claims to "do regression" but
does not provide methods for summary, predict, anova, nobs, fitted,
logLik, AIC, and so forth, then it is not done yet. Otherwise, users
like you who expect to be able to run methods like fitted or such have
a bad experience, as you are having now.

Maybe somebody reading this will remind us where the common list of R
regression methods is listed. I know for sure I've seen a document
about these things, but I'm baffled now trying to find it. But I'm
sure there is one.


pj

> library(pglm)
>
> m1_S<-pglm(Feed ~ Cons_PC_1 + imp_gen_1 + LGDP_PC_1 + lnEI_1 +
> SH_Ren_1,data,family=binomial(probit),model="random",method="bfgs",index=c("Year","IDCountry"))
>
> m1_S$fitted.values
> residuals(m1)
>
>
> Can someone help me about it?
>
> Thanks
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Needed: Beta Testers and Summer Camp Students

2013-04-23 Thread Paul Johnson
Greetings.

I'm teaching linear regression this semester and that means I write
more functions for my regression support package "rockchalk".  I'm at
a point now were some fresh eyes would help, so if you are a student
in a course on regression, please consider looking over my package
overview document here:

http://pj.freefaculty.org/R/rockchalk.pdf

That tells how you can grab the test version, which is now at 1.7. I'm
leaving rockchalk on CRAN at 1.6.3, but as the document linked above
explains, you can download the test version from our local repository
called "kran". I have been making new packages every 3 days or so. If
you are a github user, you can clone a copy of the source if you like
(http://

The functions that have gotten the biggest workover are predictOMatic,
newdata, plotSlopes, plotCurves, and testSlopes. If you just install
the package and run those examples, you will be able to tell right
away if you are interested in trying to adapt these to your needs.
Generally speaking, I watch the students each semester to see which R
things frustrate them the most and then I try to automate them.
That's how the regression table function (outreg) started, and
difficulties in plotting predicted values for various kinds of
regression drive most of the rest. I guess the rockchalk vignette
explains all that.

If you are interested in that flavor of R, or know other people who
might be, spread the word:

we are offering a one-week summer course at the University of Kansas.

There is a discount for enrollment before the end of the month.  The R
course is part of our larger Summer Statistical Institute, which has
been growing rapidly for the past decade. We've had very popular
courses on structural equations modeling and hierarchical models.

Here's the more formal announcement.

Stats Camp 2013 registration is now in full swing. Last year we had
over 300 participants attend 11 different courses. This coming June we
have 15 different courses being offered. Please visit our web pages at
http://crmda.ku.edu/statscamp for information on the courses, a brief
syllabus for each, and registration information.

pj

--
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Make barplot with error bars, anova out of a table

2013-04-11 Thread Paul Johnson
Hi

On Thu, Apr 11, 2013 at 6:43 AM, Simza  wrote:
> Helo everybody,
> I'm new to R and have some issues with my data in R.
>
> My raw data look like that:
>
> ID Day size 1 1 7 1 1 7.2 1 1 7.1 2 1 7.3 2 1 7.4 2 1 7.2 3 1 7 3 1 7.1 3 1
> 7.5 4 1 7.3 4 1 7.2 4 1 7.6 1 2 7 1 2 7.2 1 2 7.1 2 2 7.1 2 2 7.4 2 2 7.2 3
> 2 7.5 3 2 7.1 3 2 7.5 4 2 7.2 4 2 7.2 4 2 7.3 1 3 7.4 1 3 7.2 1 3 7.1 2 3
> 7.2 2 3 7.4 2 3 7.2 3 3 7.4 3 3 7.2 3 3 7.5 4 3 7.4 4 3 7.2 4 3 7.7
>
> What I want to do is:
> 1) calculate the significant difference of the size for the 4 groups but
> each day separately!)? E.g. difference in size between ID 1:2, 1:3, 1:4,
> 2:3, 2:4, 3:4 for each day.
>
> I already tried lm() and anova() but just get the difference between day, ID
> and size. I also tried to separate the data per hand but I think there
> should be a way in "R" to do that automatically!? Moreover, I was searching
> the web for similar examples but couln't find anything useful so far.
>
> 2) to make a barplot with error bars of the standart error (from the mean).
> So far I used:
> barplot(matrix(c(Rtest.dat$pH.mean),nr=3), beside=T,
> col=c("black","grey","white"), main="pH", names.arg=c("Green", "Yellow",
> "Blue", "Red"), ylab="pH") legend("topright", c("Day 1","Day 2","Day 3"),
> cex=0.6, bty="n", fill=c("black","grey","white"))
> But I have problems to add the error bars. Also here I already searched the
> web but couldn't manage to get a code working for my data.
>
R has a function called "arrows" that we use to draw intervals like
that. You have to calculate the intervals yourself, of course, and then
draw them on the plot.
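
Here is a hedged, self-contained sketch of the arrows() idea with made-up
means and standard errors; barplot() returns the bar midpoints, and those
are the x coordinates for the error bars:

## Hedged sketch: barplot with standard-error bars drawn by arrows()
means <- c(7.10, 7.25, 7.30)  ## made-up group means
ses   <- c(0.05, 0.08, 0.06)  ## made-up standard errors
mids  <- barplot(means, ylim = c(0, max(means + ses) * 1.1),
                 names.arg = c("Day 1", "Day 2", "Day 3"), ylab = "pH")
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.05)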

As luck would have it, I've been working on a vaguely similar type of plot,

http://pj.freefaculty.org/scraps/plot-factor-proto.pdf

and I'm going to upload the R code that produces that plot. It shows
how to do arrows and such. I'm calling this a WorkingExample

http://pj.freefaculty.org/R/WorkingExamples/plot-regression-factors-1.R

> 3) and the last thing I would need is to add * or letters (a,b,...) to the
> graph, indicating the significant difference between them.
>
Try text() to insert annotations on plots; "*" and letters can both be written by text().


> Hope someone can help me!
>
> Thanks for your help!!
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Make-barplot-with-error-bars-anova-out-of-a-table-tp4663979.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Superscript

2013-04-04 Thread Paul Johnson
On Thu, Apr 4, 2013 at 4:39 AM, Shane Carey  wrote:

> Hi William,
>
> for (i in one:length(DATA_names))
>   if ((grepl("_",DATA_names[i]))=="TRUE")
> DATA_names[i]<-f(DATA_names[i]))
>
> I keep getting an error saying: incompatible types (from symbol to
> character) in subassignment type fix
>
>
I would urge you not to obliterate DATA_names by writing on top of itself.
That creates bugs that are really tough to solve.

Remember that a character variable is not the same thing as an expression.
I mean, an expression is generally a list structure, not interchangeable
with a character string.  If that f() function is creating expressions, and
you are trying to obliterate a character vector with expressions, you will
get errors, as you have seen.

I'm pretty sure that to catch the results of a function that creates
expressions, your receiver object has to be a list.  My first try would be

myList <- vector("list", length(DATA_names))

for (i in one:length(DATA_names))
  if ((grepl("_", DATA_names[i])) == "TRUE")
myList[[i]] <- f(DATA_names[i]))

I copied your code to my system and found I couldn't run it because you had
one more paren than was needed and the word "one" was undefined. But after
fixing that, here's evidence my way works:

Note I inserted some spaces in your code, since it is inhumane to smash
together all those letters around a poor little <- :).


DATA_names <- c("A_mgkg","B_mgkg","C_mgkg","D_mgkg","E_mgkg","F_mgkg","G_mgkg","Hmgkg")

f <- function (name)
{
  # add other suffices and their corresponding plotmath expressions to the
list
  env <- list2env(list(mgkg = bquote(mg ~ kg^{-1}),
   ugkg = bquote(mu * g ~ kg^{-1})),
  parent = emptyenv())
  pattern <- paste0("_(", paste(objects(env), collapse="|"), ")")
  bquoteExpr <- parse(text=gsub(pattern,
"~(.(\\1))",
name))[[1]]
## I use do.call() to work around the fact that bquote's first argument is
## not evaluated.
  do.call(bquote, list(bquoteExpr, env))
}

myList <- vector("list", length(DATA_names))

for (i in 1:length(DATA_names))
  if ((grepl("_", DATA_names[i])) == "TRUE")
myList[[i]] <- f(DATA_names[i])


myList


> myList
[[1]]
A ~ (mg ~ kg^{
-1
})

[[2]]
B ~ (mg ~ kg^{
-1
})

[[3]]
C ~ (mg ~ kg^{
-1
})

[[4]]
D ~ (mg ~ kg^{
-1
})

[[5]]
E ~ (mg ~ kg^{
-1
})

[[6]]
F ~ (mg ~ kg^{
-1
})

[[7]]
G ~ (mg ~ kg^{
-1
})

[[8]]
NULL


I asked a very similar question here last year and got many interesting
suggestions. I was so fascinated I wrote a little programming vignette
called Rchaeology in my package "rockchalk". As soon as you settle on the
most direct route from start to end, please post, I might learn new tricks.

In your case, I've been wondering if you ought to edit the string before it
becomes an expression, or whether you should edit the expression after. I
needed a correct expression to start, though.  This writes the correct
thing in the middle of the graph. Correct?

xcorrect <- expression(X ~~ "mg kg"^{-1})
plot(1:10, 1:10, type ="n")
text(5, 5, xcorrect)

Supposing that's correct, watch this.

mgkg <- quote("mg kg"^{-1})

 text(4, 2, bquote(A ~~ .(mgkg)))
 text(4, 3, bquote(B ~~ .(mgkg)))
 text(4, 6, bquote(C ~~ .(mgkg)))

I've got the "units" captured in mgkg, and then when I want to draw that
into an expression, I used bquote. substitute works as well, maybe more clearly.

text(3, 8, substitute(B ~~ myUnit, list(myUnit = mgkg)))

In any case, I thought this was a fun question.

pj



> Have you any ideas on how to get around this, thanks again for your help,
> much appreciated.
> Cheers
>
>
> On Wed, Apr 3, 2013 at 5:33 PM, William Dunlap  wrote:
>
> > Are you trying to convert a column name like "Na_mgkg" to a plot label
> > like Na (mg kg^-1) ?
> > If so you will have to use both string manipulation functions like gsub()
> > and expression manipulating
> > functions like bquote().  E.g.,
> >
> > f <- function (name)
> > {
> ># add other suffices and their corresponding plotmath expressions to
> > the list
> >env <- list2env(list(mgkg = bquote(mg ~ kg^{-1}),
> > ugkg = bquote(mu * g ~ kg^{-1})),
> >parent = emptyenv())
> >pattern <- paste0("_(", paste(objects(env), collapse="|"), ")")
> >bquoteExpr <- parse(text=gsub(pattern,
> >  "~(.(\\1))",
> >  name))[[1]]
> ># I use do.call() to work around the fact that bquote's first argument
> > is not evaluated.
> >do.call(bquote, list(bquoteExpr, env))
> > }
> >
> > d <- data.frame("Na_mgkg"=1:10, "K_ugkg"=10:1)
> > plot(Na_mgkg ~ K_ugkg, data=d, xlab=f("K_ugkg"), ylab=f("Na_mgkg"))
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> > > -Original Message-
> > > From: r-help-boun...@r-project.org [mailto:
> r-help-boun...@r-project.org]
> > On Behalf
> > > Of Shane Carey
> > > S

Re: [R] DUD (Does not Use Derivatives) for nonlinear regression in R?

2013-04-02 Thread Paul Johnson
On Apr 1, 2013 1:10 AM, "qi A"  wrote:
>
> Hi, All
>
> SAS has DUD (Does not Use Derivatives)/Secant Method for nonlinear
> regression, does R offer this option for nonlinear regression?
>
> I have read the helpfile for nls() and could not find such option, any
> suggestion?
>

Nelder-Mead is the default algorithm in optim(). It does not use derivatives.
DUD is from the same generation, but John Nash recommended the Nelder-Mead method.
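
A hedged sketch of the derivative-free approach via optim(), with toy
data made up for the purpose:

## Hedged sketch: nonlinear least squares by minimizing the SSE with Nelder-Mead
set.seed(1)
x <- 1:25
y <- 10 * exp(-0.3 * x) + rnorm(25, sd = 0.2)
sse <- function(par) sum((y - par[1] * exp(-par[2] * x))^2)
fit <- optim(c(a = 1, b = 0.1), sse)  ## method = "Nelder-Mead" is the default
fit$par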

Pj

> Thanks,
>
> Derek
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] example to demonstrate benefits of poly in regression?

2013-04-01 Thread Paul Johnson
On Mon, Apr 1, 2013 at 12:42 PM, Gabor Grothendieck  wrote:

> On Mon, Apr 1, 2013 at 1:20 PM, Paul Johnson  wrote:
> > Here's my little discussion example for a quadratic regression:
> >
> > http://pj.freefaculty.org/R/WorkingExamples/regression-quadratic-1.R
> >
> > Students press me to know the benefits of poly() over the more obvious
> > regression formulas.
> >
> > I think I understand the theory on why poly() should be more numerically
> > stable, but I'm having trouble writing down an example that proves the
> > benefit of this.
> >
> > I am beginning to suspect that because R's lm uses a QR decomposition
> under
> > the hood, the instability that we used to see when we inverted (X'X) is
> no
> > longer so serious as it used to be. When the columns in X are
> >
> > x1 x2 x3 x3squared
> >
> > then the QR decomposition is doing about the same thing as we would do if
> > we replaced x3 and x3squared by the poly(x3, 2) results.
> >
> > Or not. I'm not arguing that, I'm thinking out loud, hoping you'll
> correct
> > me.
> >
>
> # 1. One benefit is if you fit a higher degree polynomial using
> poly(x, n) the lower degree coefficients will agree with the fit using
> a lower n.
> # For example, compare these two:
> set.seed(123)
> y <- rnorm(10); x <- 1:10
> lm(y ~ poly(x, 2))
> lm(y ~ poly(x, 3))
>
> # however, that is not true if you don't use orthogonal polynomials.
> Compare these two:
> lm(y ~ poly(x, 2, raw = TRUE))
> lm(y ~ poly(x, 3, raw = TRUE))
>
>
Thanks, Gabor:

As usual, you are very helpful. I can't thank you enough.

I'm pasting your comments to the end of that R example file I mentioned
before, changing the examples to fit with the variable names I use there.

I think your reason #2 is the prize winner; I can't see anything wrong
with that one.

Reason #1 is not quite as strong; it only holds strictly if the poly() term
is the only predictor.  Users usually have other variables in the model.

If you--or anybody else--has other ideas about this, I would be delighted
to hear them.  I still hold out hope that the "numerical stability" argument
will find some traction.
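
Here is the sort of demonstration I have been fiddling with (a sketch with
made-up data, so the exact numbers will vary on your machine):

set.seed(1)
x <- 2001:2100                                 # large, uncentered values, e.g. years
y <- 0.5 * (x - 2050) + rnorm(100)
## raw powers of x are nearly collinear; the orthogonal columns are not
kappa(model.matrix(~ poly(x, 6, raw = TRUE)))  # enormous condition number
kappa(model.matrix(~ poly(x, 6)))              # modest condition number
coef(lm(y ~ poly(x, 6, raw = TRUE)))           # may show NA for dropped terms
coef(lm(y ~ poly(x, 6)))                       # all terms estimated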

pj



> # 2. A second advantage is that the t statistics are invariant under
> shift if you use orthogonal polynomials
> # compare these two:
> summary(lm(y ~ poly(x, 2)))
> summary(lm(y ~ poly(x+1, 2)))
>
> # however, that is not true if you don;t use orthogonal polynomials,
> Compare these two:
> summary(lm(y ~ poly(x, 2, raw = TRUE)))
> summary(lm(y ~ poly(x+1, 2, raw = TRUE)))
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>



-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] example to demonstrate benefits of poly in regression?

2013-04-01 Thread Paul Johnson
Here's my little discussion example for a quadratic regression:

http://pj.freefaculty.org/R/WorkingExamples/regression-quadratic-1.R

Students press me to know the benefits of poly() over the more obvious
regression formulas.

I think I understand the theory on why poly() should be more numerically
stable, but I'm having trouble writing down an example that proves the
benefit of this.

I am beginning to suspect that because R's lm uses a QR decomposition under
the hood, the instability that we used to see when we inverted (X'X) is no
longer so serious as it used to be. When the columns in X are

x1 x2 x3 x3squared

then the QR decomposition is doing about the same thing as we would do if
we replaced x3 and x3squared by the poly(x3, 2) results.

Or not. I'm not arguing that, I'm thinking out loud, hoping you'll correct
me.

pj


-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] good documentation on use of Rscript. where to find?

2012-10-18 Thread Paul Johnson
What is the correct format for the shebang line and what options are
allowed or necessary along with this?

I find plenty of blogs and opinions, but few authoritative answers.

I want an R script to run and update packages periodically, with a
cron job that launches it. What is necessary to put in
line one of the R program. Aside from the basic part

#!/usr/bin/Rscript

what else can there be, or should there be?

Various people suggest adding things like "--vanilla" but those seem
not to be accepted, at least on RedHat EL 6 with R-2.15.1.  I'd like
to run this with a command line like:

$ R-labSelectInstall-02.R > /tmp/Rupdate.txt 2>&1

so that all the standard output and err goes to a text file, but I've
not found a way to make it work comparably to the R command line
option --vanilla or such.

# ./R-labSelectInstall-02.R --vanilla
WARNING: unknown option '--vanilla'

See what I mean?
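
One thing that may be worth trying (untested here, just a sketch): put the
single option on the shebang line itself, since the kernel typically passes
everything after the interpreter path as one argument, so one option can
work where several do not.

#!/usr/bin/Rscript --vanilla
## minimal unattended update script; the repos URL is an assumption,
## substitute whatever mirror your site uses
options(repos = c(CRAN = "http://cran.r-project.org"))
update.packages(ask = FALSE)

and then the cron line can do the redirection, e.g.
/usr/bin/Rscript --vanilla /path/to/R-update.R > /tmp/Rupdate.txt 2>&1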

-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] shade overlapping portions of circles (or other shapes)

2012-08-15 Thread Paul Johnson
I'm making some illustrations and it would be convenient to
automatically shade the overlapping portions of circles.  These
illustrations are for Social Choice theory, a field in political
science and economics.  I've wrestled together some examples so you
can see what I mean, but have not mastered the "color overlapping
sections" problem (as you will see):

http://pj.freefaculty.org/scraps/t-m15g.pdf

The shaded area does not overlap perfectly, mainly because I isolated
the points manually with locator(). Tedious!

I've not found R functions just for this.  Are there any packages?

You can get an idea of the problem if you run, for example

library(plotrix)
example(draw.circle)

See the red and the blue circles overlap?  What's the best way to
"shade in" that overlapping area.

I don't have Mathematica anymore, but they seem to be able to do this
more-or-less easily

http://mathematica.stackexchange.com/questions/2554/how-to-plot-venn-diagrams-with-mathematica

I want to make R do same ( or develop package to do it).

I was under some pressure to get a graph finished that I linked to
above. But I would not want to do it that way again. The shapes won't
always  be perfect circles, they may be ellipses or other convex
shapes.  But if I could understand the problem for circles, maybe I
could solve for others.

Do you have suggestions?

I am considering this: collect the coordinates from the circle points

 C1 <- draw.circle(3.5,7,0.8,border="blue",lty=2,lwd=2)
 C2 <- draw.circle(2.5,8,0.6,border="red",lty=3,lwd=3)

Can I isolate the relevant arcs from C1 and C2? If I could, I'd join them

Cjoin <- list(x = c(C1$x, C2$x), y = c(C1$y, C2$y))

And use that for plotting.
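
In the meantime, here is the brute-force thing I have been sketching (toy
code with made-up centers and radii, not the figure above): sample points
on each circle, keep the ones that fall inside the other circle, and hand
the ordered result to polygon().

circPts <- function(ctr, r, n = 400) {
    th <- seq(0, 2 * pi, length.out = n)
    cbind(ctr[1] + r * cos(th), ctr[2] + r * sin(th))
}
lensPoly <- function(c1, r1, c2, r2, n = 400) {
    p1 <- circPts(c1, r1, n)
    p2 <- circPts(c2, r2, n)
    ## keep the arc of each circle that lies inside the other circle
    keep1 <- p1[(p1[, 1] - c2[1])^2 + (p1[, 2] - c2[2])^2 <= r2^2, , drop = FALSE]
    keep2 <- p2[(p2[, 1] - c1[1])^2 + (p2[, 2] - c1[2])^2 <= r1^2, , drop = FALSE]
    pts <- rbind(keep1, keep2)
    ## the overlap of two circles is convex, so ordering the boundary points
    ## by angle around their centroid traces the lens correctly
    ctr <- colMeans(pts)
    pts[order(atan2(pts[, 2] - ctr[2], pts[, 1] - ctr[1])), ]
}
plot(c(1, 5), c(5, 10), type = "n", xlab = "", ylab = "", asp = 1)
lines(circPts(c(3.5, 7.0), 0.8), col = "blue", lty = 2)
lines(circPts(c(2.7, 7.6), 0.6), col = "red", lty = 3)
polygon(lensPoly(c(3.5, 7.0), 0.8, c(2.7, 7.6), 0.6), col = "gray", border = NA)

It will not handle ellipses, but the same keep-the-points-inside idea might
extend to other convex outlines.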

pj


-- Paul E. Johnson
Professor, Political ScienceAssoc. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas   University of Kansas
http://pj.freefaculty.orghttp://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] device "mismatch", coordinates trouble with X11 and pdf devices

2012-08-10 Thread Paul Johnson
Greetings.

I'm trying to understand a problem on a Dell Laptop.  Details below,
also uploaded the R working example that I pasted below.

http://pj.freefaculty.org/scraps/testSymbols.R

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base


## Paul Johnson 
## 2012-08-10
##
## I've got trouble with a mis-match between screen and pdf devices.
## Please run this and tell me if the point z's "marker" is on the
## intersection of the 2 circles for you.

## Why do I ask? On my Dell M4600 laptop, the marker (the pch=12) is
## on the intersection in the x11 device. See the screenshot:
##
## http://pj.freefaculty.org/scraps/testSymbol-01.png
##
## But the pdf output of same is not
##
## http://pj.freefaculty.org/scraps/testSymbol.pdf
##
## Notice one other weird thing. The circle sizes change. Maybe
## that's just the same weird thing.

## The X11 device is 7in x 7in. Right? Same as pdf.
## Why would coordinates from locator work in one display, but not the
## other?  I've abused the symbols function somehow? locator is device
## dependant?

## Maybe its the video driver & LCD.
## This is a Dell Precision M4600. The LCD display is 13.6 inches x
## 7.6 inches (345.4mm x 193mm). By my reckoning, it is not precisely
## 16:9.  The Nvidia card says the native resolution is 1920x1080,
## which is exactly 16:9.




saveFile <- FALSE
if(saveFile){
pdf(file="testSymbol.pdf", onefile=FALSE, paper="special", height = 7,
width = 7)
}

plot(c(0,1), c(0,1), type = "n", xlab = "", ylab = "")


x <- c(0.3, 0.3)
y <- c(0.7, 0.7)
points(x[1], x[2])
points(y[1], y[2])

symbols(x[1], x[2], circles=c(0.3), lty=2, inches = FALSE, add=TRUE)

symbols(y[1], y[2], circles=c(0.3), lty=2, inches = FALSE, add=TRUE)

##Here's what I get from locator(1) on the intersection
z <- list(x=0.4028, y=0.6485)
if (is.null(z)) z <- locator(1) ## click on intersection of lines


points(z, pch=12)
text(x =0.96*z[[1]], y = 1.05*z[[2]], label="z")

if (saveFile) dev.off()  ## close the pdf device only if we opened one




-- 
Paul E. Johnson
Professor, Political ScienceAssoc. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas   University of Kansas
http://pj.freefaculty.orghttp://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] complexity of operations in R

2012-07-19 Thread Paul Johnson
On Thu, Jul 19, 2012 at 11:11 AM, Bert Gunter  wrote:
> Hadley et. al:
>
> Indeed. And using a loop is a poor way to do it anyway.
>
> v <- as.list(rep(FALSE,dotot))
>
> is way faster.
>
> -- Bert
>

It's not entirely clear to me what we are supposed to conclude from this.

I can confirm Bert's claim that his way is much  faster.  I ran
Johan's code, then Bert's suggestion.  The speedup is almost
unbelievable.


> system.time(f(dotot))

   user  system elapsed
 25.249   0.332  25.606
> #system.time(g(dotot))
> system.time(h(dotot))
   user  system elapsed
 15.753   0.404  16.173
> p <- function(dotot){v <- as.list(rep(FALSE,dotot))}
> system.time(p(dotot))
   user  system elapsed
  0.004   0.000   0.004


Why is this so much faster?


> f
function(dotot){
  v<-matrix(ncol=1,nrow=0)
  for(i in 1:dotot){
v<-rbind(v,FALSE)
  }
  return(v)
}

Obvious problems.

1. The return() at the end is unnecessary. Cutting that off saves 4%
or so on time.

>
> f2 <- function(dotot){
+   v<-matrix(ncol=1,nrow=0)
+   for(i in 1:dotot){
+ v<-rbind(v,FALSE)
+   }
+   v
+ }
> system.time(f2(dotot))
   user  system elapsed
 24.150   0.220  24.391

2. Use of rbind is SLOW.  Calling rbind over and over again is VERY
slow (http://pj.freefaculty.org/R/WorkingExamples/stackListItems.Rout).

3. Calling matrix over and over to create nothing is SLOW.


If we just want a giant empty list, do this:

mylst <- vector(mode="list", length=dotot)

> system.time(mylst <- vector(mode="list", length=dotot))
   user  system elapsed
  0.000   0.000   0.001

In a practical application, where something has to be created to go
into the list, I think some speed considerations should be:

1. When a 'thing' is created and put in the list, do we need to
allocate memory twice like this

x <- do.whatever()

mylst[[999]] <- x

This will create x, and then make a duplicate of x to go into mylst. BORING!

if x is a big thing, I think it is better to avoid copying by making it anonymous:

mylst[[999]] <- do.whatever()

If memory is a consideration at all, this is better, and it avoids the
problem of allocating memory twice.


Anyway, it seems to me the point of this thread should be that
allocating a big giant list of nothings is arbitrarily fast, but it is
still not known

1.  what is the efficient way to create things to go into the list,

2. what is the best way to make use of the results once they are collected.

I'm sure that looping and calling rbind lots of times is a really bad
idea.  Everything else is on the table for discussion. It's not the for
loop that makes the original f function slow, it's the repeated use of
matrix and rbind.
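
To put a number on that last claim, here is a little timing sketch (mine,
with an arbitrary n):

grow <- function(n) {
    v <- matrix(ncol = 1, nrow = 0)
    for (i in 1:n) v <- rbind(v, FALSE)   # reallocates and copies every time
    v
}
prealloc <- function(n) {
    v <- vector(mode = "list", length = n)
    for (i in 1:n) v[[i]] <- FALSE        # fills a slot that already exists
    do.call("rbind", v)                   # bind once at the end
}
n <- 20000
system.time(grow(n))       # slow, and it gets worse roughly quadratically
system.time(prealloc(n))   # a small fraction of that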

pj
-- 
Paul E. Johnson
Professor, Political ScienceAssoc. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas   University of Kansas
http://pj.freefaculty.orghttp://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] EM algorithm to find MLE of coeff in mixed effects model

2012-07-03 Thread Paul Johnson
On Tue, Jul 3, 2012 at 12:41 PM, jimmycloud  wrote:
> I have a general question about coefficients estimation of the mixed model.
>

I have 2 ideas for you.

1. Fit with lme4 package, using the lmer function. That's what it is for.

2. If you really want to write your own EM algorithm, I don't feel
sure that very many R and EM experts are going to want to read through
the code you have because you don't follow some of the minimal
readability guidelines.  I accept the fact that there is no officially
mandated R style guide, except for "indent with 4 spaces, not tabs."
But there are some almost universally accepted basics like

1. Use <-, not =, for assignment
2. put a space before and  after symbols like <-, = , + , * , and so forth.
3. I'd suggest you get used to putting squiggly braces in the K&R style.

I have found the formatR package's tool tidy.source can do this
nicely. From tidy.source, here's what I get with your code

http://pj.freefaculty.org/R/em2.R

Much more readable!
(I inserted library(MASS) for you at the top, otherwise this doesn't
even survive the syntax check.)

When I copy from email to Emacs, some line-breaking monkey-business
occurs, but I expect you get the idea.

Now, looking at your cleaned up code, I can see some things to tidy up
to improve the chances that some expert will look.

First, R functions don't require "return" at the end, many experts
consider it distracting. (Run "lm" or such and review how they write.
No return at the end of functions).

Second, about that big for loop at the top, the one that goes from m 1:300

I don't know what that loop is doing, but there's some code in it that
I'm suspicious of. For example, this part:

 W.hat <- psi.old - psi.old %*% t(Zi) %*% solve(Sig.hat) %*%  Zi %*% psi.old

Sigb <- psi.old - psi.old %*% t(Zi) %*% solve(Sig.hat) %*%  Zi %*% psi.old


First, you've caused the possibly slow calculation of solve(Sig.hat)
to run two times.  If you really need it, run it once and save the
result.


Second, a for loop is not necessarily slow, but it may be easier to
read if you re-write this:

 for (j in 1:n) {
tempaa[j]=Eh4new(datai=data.list[[j]],weights.m=weights.mat,beta=beta.old)
tempbb[j] <- Eh4newv2(datai = data.list[[j]], weights.m =
weights.mat, beta = beta.old)
 }

like this:

tempaa <- lapply(data.list, Eh4new, weights.m = weights.mat, beta = beta.old)
tempbb <- lapply(data.list, Eh4newv2, weights.m = weights.mat, beta = beta.old)

Third, here is a no-no

tempbb <- c()
for (j in 1:n) {
tempbb <- cbind(tempbb, Eh2new(data.list[[j]], weights.m = weights.mat))
}

That will call cbind over and over, causing a relatively slow memory
re-allocation.  See
(http://pj.freefaculty.org/R/WorkingExamples/stackListItems.R)

Instead, do this to apply Eh2new to each item in data.list

tempbbtemp <- lapply(data.list, Eh2new, weights.m = weights.mat)

and then smash the results together in one go

tempbb <- do.call("cbind", tempbbtemp)


Fourth, I'm not sure on the matrix algebra. Are you sure you need
solve to get the full inverse of Sig.hat?  Once you start checking
into how estimates are calculated in R, you find that the
paper-and-pencil matrix algebra style of formula is generally frowned
upon. OLS (in lm.fit) doesn't do solve(X'X), it uses a QR
decomposition of matrices.  OR look in MASS package ridge regression
code, where the SVD is used to get the inverse.
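
A small illustration of that point (my own example, nothing to do with your
Sig.hat): when an inverse only ever multiplies something, solve(A, b) is
usually faster and more accurate than solve(A) %*% b, because it solves the
linear system without forming the full inverse.

set.seed(42)
A <- crossprod(matrix(rnorm(400 * 400), 400, 400))   # a positive definite matrix
b <- rnorm(400)
x1 <- solve(A) %*% b   # forms the full inverse, then multiplies
x2 <- solve(A, b)      # solves A x = b directly
all.equal(as.vector(x1), x2)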

I wish I knew enough about the EM algorithm to gaze at your code and
say "aha, error in line 332", but I'm not.  But if you clean up the
presentation and tighten up the most obvious things, you improve your
chances that somebody who is an expert will exert him/herself to do
it.

pj


>  b follows
> N(0,\psi)  #i.e. bivariate normal
> where b is the latent variable, Z and X are ni*2 design matrices, sigma is
> the error variance,
> Y are longitudinal data, i.e. there are ni measurements for object i.
> Parameters are \beta, \sigma, \psi; call them \theta.
>
> I wrote a regular EM, the M step is to maximize the log(f(Y,b;\theta))  by
> taking first order derivatives, setting to 0 and solving the equation.
> The E step involves the evaluation of E step, using Gauss Hermite
> approximation. (10 points)
>
> All are simulated data. X and Z are naive like cbind(rep(1,m),1:m)
> After 200 iterations, the estimated \beta converges to the true value while
> \sigma and \psi do not. Even after one iteration, the \sigma goes up from
> about 10^0 to about 10^1.
> I am confused since the \hat{\beta} requires \sigma and \psi from previous
> iteration. If something wrong then all estimations should be incorrect...
>
> Another question is that I calculated the logf(Y;\theta) to see if it
> increases after updating \theta.
> Seems decreasing.
>
> I thought the X and Z are linearly dependent would cause some issue but I
> also changed the X and Z to some random numbers from normal distribution.
>
> I also tried ECM, which converges fast but sigma and psi are not close to
> th

[R] non ascill characters in plots. no alternative but plotmath?

2012-06-06 Thread Paul Johnson
A student entered some data with text characters like epsilon and
alpha.   On her Windows system, the Greek letters did not display
properly in a plot.  There were some ordinary ASCII instead.

I asked her to send me the code so I could test. For me, the plot
looks ok on the screen.

Format1 <- c(320,500,700,1000,500,320,700,500,320)
Format2 <- c(800,1000,1150,1400,1500,1650,1800,2300,2500)
Vowel <- c("u","o", "α", "a","ø", "y", "ε", "e","i")
V1 <- data.frame(Format1,Format2,Vowel)
plot(Format1 ~ Format2, data = V1, type="n")
text(V1$Format2, V1$Format1, labels=V1$Vowel)

On my Debian linux system, the plot shows the Greek letters just fine
in the screen device.

However, I turned on a pdf device to run the same  code and see signs
of trouble.

> text(V1$Format2, V1$Format1, labels=V1$Vowel)
Warning messages:
1: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for 
2: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for 
3: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  font metrics unknown for Unicode character U+03b1
4: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for 
5: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for 
6: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for 
7: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for 
8: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  font metrics unknown for Unicode character U+03b5
9: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for 
10: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for 

The alpha and epsilon characters don't appear in the pdf.   I don't
know the proper terminology to describe the situation, thus I don't
know where to start reading. Until very recently, I didn't even know
it was possible to directly enter these characters in Emacs, but I've
learned that part.

I understand you might answer "use plotmath", and if that's the only
workable thing, I will teach her how. But that's a little bit of an up
hill climb (from where we are now standing). It will be a lot more
work for me to teach about expressions and whatnot, so if there is a
direct route from a column of non ASCII characters to a plot that has
those characters in it, I'd be glad to know.
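
One thing I plan to test next (only a sketch, I have not confirmed it on her
Windows machine): the cairo-based pdf device, where available, embeds the
UTF-8 glyphs that plain pdf() fails to map to its single-byte encoding.

Format1 <- c(320, 500, 700, 1000, 500, 320, 700, 500, 320)
Format2 <- c(800, 1000, 1150, 1400, 1500, 1650, 1800, 2300, 2500)
Vowel <- c("u", "o", "\u03b1", "a", "\u00f8", "y", "\u03b5", "e", "i")
V1 <- data.frame(Format1, Format2, Vowel, stringsAsFactors = FALSE)
cairo_pdf("vowels.pdf")    # instead of pdf()
plot(Format1 ~ Format2, data = V1, type = "n")
text(V1$Format2, V1$Format1, labels = V1$Vowel)
dev.off()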

pj

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Partial R-square in multiple linear regression

2012-06-03 Thread Paul Johnson
On Fri, Jun 1, 2012 at 2:05 PM, Jin Choi  wrote:
> Hello,
>
> I am trying to obtain the partial r-square values (r^2 or R2) for
> individual predictors of an outcome variable in multiple linear
> regression. I am using the 'lm' function to calculate the beta
> coefficients, however, I would like to know the individual %
> contributions of several indepenent variables. I tried searching for
> this function in many R packages, but it has proven elusive to me. I
> am an R beginner, and I am hoping that I will find a solution!
>
> Thank you very much.
>
> Sincerely,

This is the kind of practical user request I've been trying to answer
in the rockchalk package. I have a function called "getDeltaRsquare".
It takes a fitted regression and calculates the change in Rsquare when
each variable is removed.

It does not achieve the "Partition" idea for Rsquare that you might
want because there is no logically meaningful way to partition Rsquare
among predictors if there is any multicollinearity.  I could point you
at some stat books that try to achieve that partition, mostly they
seem to be by psychologists (who like that kind of thing.)  You will
go down a rabbit hole of "semi-partial correlation coefficients" and
so forth.

But if you just want the "how much does R-square drop if I leave out
this variable," I got that for you :)  Whether that is meaningful to
you, well, that's a bigger methodological question.
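
The underlying idea is simple enough to sketch in base R (this is not the
rockchalk code itself, just the flavor of it, with made-up data):

deltaRsq <- function(model) {
    r2full <- summary(model)$r.squared
    tl <- attr(terms(model), "term.labels")
    sapply(tl, function(v) {
        ## refit without term v and record the drop in R-square
        r2less <- summary(update(model, as.formula(paste(". ~ . -", v))))$r.squared
        r2full - r2less
    })
}
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(100)
deltaRsq(lm(y ~ x1 + x2, data = dat))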

pj

>
> Jin Choi
> MSc Epidemiology student
> McGill University, Montreal, Quebec, CANADA
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] memory usage benefit from "anonymous" variable constructions.

2012-06-03 Thread Paul Johnson
This is an "I was just wondering" question.

When the package "dataframe" was announced, the author claimed to
reduce the number of times a data frame was copied, and I started to
wonder if I should care about this in my projects.  Has anybody
written a general guide for how to write R code that doesn't
needlessly exhaust RAM?

In Objective-C, we used to gain some considerable advantages by
avoiding declaring objects separately, using anonymous variables
instead. The storage was allocated on the stack, I think, and I think
there was talk that the numbers might stay 'closer' to the CPU
(register?) for immediate usage.

Does this benefit in R as well?  For example, instead of the way I
would usually do this:

 mf <- model.frame(model)
  y <- model.response(mf)

Here is the anonymous alternative, where mf is never declared:

y <- model.response(model.frame(model))

On the face of it, I can imagine this might be better because no
permanent thing "mf" is created, the garbage collector wouldn't be
called into play if all the data is local and disappears immediately.
But, then again, R is doing lots of stuff "under the hood" that I've
never bothered to learn about.
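
One way I know to peek at this (a small sketch, not a definitive answer):
tracemem() prints a message whenever R actually duplicates an object, so you
can see whether naming the intermediate forces a copy. It needs an R built
with memory profiling, which the distributed binaries are.

m <- lm(mpg ~ wt, data = mtcars)
mf <- model.frame(m)
tracemem(mf)                 # report any duplication of mf from here on
y <- model.response(mf)      # if nothing prints, mf was not copied
untracemem(mf)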


pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] trouble automating formula edits when log or * are present; update trouble

2012-05-29 Thread Paul Johnson
>>
>
> Try substitute:
>
>> do.call("substitute", list(newFmla, setNames(list(as.name("x1c")), "x1")))
> y ~ log(x1c) + x2 * x3
>
Damn. That's pretty. I'd say "setNames" is a magic bullet too.

Thanks very much.

The approach suggested by Michael and Bert has the little shortcoming
that grepping for "x1" also picks up similarly named variables like
"x1new" and "x1old".  I've not yet found a similar downside with
Gabor's method.
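
A quick check of that, with my own toy formula:

fmla <- y ~ log(x1) + x1new + x2 * x3
do.call("substitute", list(fmla, setNames(list(as.name("x1c")), "x1")))
## y ~ log(x1c) + x1new + x2 * x3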

pj
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] trouble automating formula edits when log or * are present; update trouble

2012-05-29 Thread Paul Johnson
Greetings

I want to take a fitted regression and replace all uses of a variable
in a formula. For example, I'd like to take

m1 <- lm(y ~ x1, data=dat)

and replace x1 with something else, say x1c, so the formula would become

m1 <- lm(y ~ x1c, data=dat)

I have working code to finish that part of the problem, but it fails
when the formula is more complicated. If the formula has log(x1) or
x1:x2, the update code I'm testing doesn't get it right.

Here's the test code:

##PJ
## 2012-05-29
dat <- data.frame(x1=rnorm(100,m=50), x2=rnorm(100,m=50),
x3=rnorm(100,m=50), y=rnorm(100))

m1 <- lm(y ~ log(x1) + x1 + sin(x2) + x2 + exp(x3), data=dat)
m2 <- lm(y ~ log(x1) + x2*x3, data=dat)

suffixX <- function(fmla, x, s){
upform <- as.formula(paste0(". ~ .", "-", x, "+", paste0(x, s)))
update.formula(fmla, upform)
}

newFmla <- formula(m2)
newFmla
suffixX(newFmla, "x2", "c")
suffixX(newFmla, "x1", "c")

The last few lines of the output. See how the update misses x1 inside
log(x1) or in the interaction?


> newFmla <- formula(m2)
> newFmla
y ~ log(x1) + x2 * x3
> suffixX(newFmla, "x2", "c")
y ~ log(x1) + x3 + x2c + x2:x3
> suffixX(newFmla, "x1", "c")
y ~ log(x1) + x2 + x3 + x1c + x2:x3

It gets the target if the target is all by itself, but not otherwise.

After messing with this for quite a while, I conclude that update was
the wrong way to go because it is geared to replacement of individual
bits, not editing all instances of a thing.

So I started studying the structure of formula objects.  I noticed
this really interesting thing. the newFmla object can be probed
recursively to eventually reveal all of the individual pieces:


> newFmla
y ~ log(x1) + x2 * x3
> newFmla[[3]]
log(x1) + x2 * x3
> newFmla[[3]][[2]]
log(x1)
> newFmla[[3]][[2]][[2]]
x1

So, if you could tell me of a general way to "walk" through a formula
object, couldn't I use "gsub" or something like that to recognize each
instance of "x1" and replace with "x1c"??

I just can't figure how to automate the checking of each possible
element in a formula, to get the right combination of [[]][[]][[]].
See what I mean? I need to avoid this:

> newFmla[[3]][[2]][[3]]
Error in newFmla[[3]][[2]][[3]] : subscript out of bounds
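
The closest I have come is a little recursive walker like this (a sketch,
not tested beyond this toy case):

replaceSym <- function(e, target, replacement) {
    if (is.name(e)) {
        if (identical(e, as.name(target))) as.name(replacement) else e
    } else if (is.call(e)) {
        ## rebuild the call with every element visited recursively
        as.call(lapply(as.list(e), replaceSym, target, replacement))
    } else {
        e
    }
}
newFmla <- y ~ log(x1) + x2 * x3
eval(replaceSym(newFmla, "x1", "x1c"))   # evaluate the rebuilt `~` call
## y ~ log(x1c) + x2 * x3

but I suspect there is a cleaner built-in way.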

pj

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R Memory Issues

2012-05-22 Thread Paul Johnson
Dear Emiliano:

When they say to read the posting guide, mostly they mean read the
posting guide. But I'll tell you the short version.

1. Include a full runable R program that causes the trouble you are
concerned about.  Include the data or a link to the data, usually the
smallest possible example is what they want.  They don't want 1000
lines of your dissertation project, they want 10 lines needed to
produce the problem you are concerned about.

The point here is this: Don't make people guess about what commands
you ran or what your data actually was. You are going to get the
attention of these folks one time, and you waste it by not reading the
guide and not giving the full details.

2. Include the output from sessionInfo() whenever you ask a question
of this sort.


> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base


I can tell  you my best guess about what is wrong: I suspect you have
a corrupted R install. If you had given us the full working code, I
could have tested that theory. But, alas, I can't.

Why do I think so? I've taught a course this term with 45 students and
about 1 time per week, a student would turn up with that "can't
allocate vector..." error you see.  On Windows, sometimes it seems the
problem is due to installing R as an administrator and then trying to
update some packages as a non-administrator.  In one really
frustrating case, a student had installed "car" both as admin and as the
user, and the one that was at the front of the search path was
damaged, but we kept removing and re-installing the other one and
nothing was fixed.  Until I noticed there were 2
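
These days I check for that situation with something like this (a sketch;
substitute whatever package you suspect):

ip <- installed.packages()
ip[ip[, "Package"] == "car", c("LibPath", "Version"), drop = FALSE]  # two rows is a bad sign
.libPaths()   # the first library on this list wins when both copies exist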

pj

On Tue, May 22, 2012 at 11:40 AM, Emiliano Zapata  wrote:
> As a continuation to my original question, here is the massage that I get:
>
> Error in glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  :
>  cannot allocate memory block of size 2.1 Gb
>
> The model "glm.fit" is a logistic type (in the family of GLM) model. Maybe
> this is not enough information; again!, but some feedback will be
> appreciated. To me the issues appears to be associated with manipulation of
> large dataset. Howeverl the algorithm runs fine in Unix; but not in Windows
> (64 bits windows 7).
>
> EZ
>
> On Sun, May 20, 2012 at 4:09 PM, Emiliano Zapata wrote:
>
>> Already then, thank you everyone. This information was extremly useful,
>> and I'll do a better job on the web next time.
>>
>> On Sun, May 20, 2012 at 2:10 PM, Prof Brian Ripley 
>> wrote:
>>
>>> On 20/05/2012 18:42, jim holtman wrote:
>>>
 At the point in time that you get the error message, how big are the
 objects that you have in memory?  What does 'memory.size()' show as
 being used?  What does 'memory.limit()' show?  Have you tried using
 'gc()' periodically to do some garbage collection?  It might be that
 you memory is fragmented.  You need to supply some additional
 information.

>>>
>>> Either this is a 32-bit version of R in which case the wrong version is
>>> being used, or your advice is wrong: there are no credible fragmentation
>>> issues (and no need to use gc()) on a 64-bit build of R.
>>>
>>> But, we have a posting guide, we require 'at a minimum information', and
>>> the OP failed to give it to us so we are all guessing, completely
>>> unnecessarily.
>>>
>>>
>>>
 On Sun, May 20, 2012 at 12:09 PM, Emiliano Zapata
  wrote:

> -- Forwarded message --
> From: Emiliano Zapata
> Date: Sun, May 20, 2012 at 12:09 PM
> Subject:
> To: R-help@r-project.org
>
>
> Hi,
>
> I have a 64 bits machine (Windows) with a total of 192GB of physical
> memory
> (RAM), and total of 8 CPU. I wanted to ask how can I make R make use of
> all
> the memory. I recently ran a script requiring approximately 92 GB of
> memory
> to run, and got the massage:
>
>  cannot allocate memory block of size 2.1 Gb
>
>
>
> I read on the web that if you increase the memory you have to reinstall
> R;
> would that be enough. Could I just increase the memory manually.
>
>
> Take you for any comments, or links on the web.
>
>
> EZ
>
>        [[alternative HTML version deleted]]
>
> __**
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/**listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/**
> posting-guide.html 

[R] Build error on RedHat EL 5.5: byte-compiling package 'compiler'

2012-05-22 Thread Paul Johnson
Greetings R-help!

In case anybody has worked on an old Redhat system lately, can you
remember what is the cause of this problem?  I don't think this is a
fatal problem, because I can get around it by uninstalling all of the
R RPM packages and re-running the build. But if R is installed, the
build fails while byte-compiling, as seen below.  I *guessed* that removing
the installed R stuff might make a difference, and it did!

The build directory is /tmp/BUILD/R-2.15.0/ but the compile process
jumps out and tries to grab files in /usr/lib64...


make[6]: Entering directory `/tmp/BUILD/R-2.15.0/src/library/tools/src'
make[6]: `Makedeps' is up to date.
make[6]: Leaving directory `/tmp/BUILD/R-2.15.0/src/library/tools/src'
make[6]: Entering directory `/tmp/BUILD/R-2.15.0/src/library/tools/src'
mkdir -p -- ../../../../library/tools/libs
make[6]: Leaving directory `/tmp/BUILD/R-2.15.0/src/library/tools/src'
make[5]: Leaving directory `/tmp/BUILD/R-2.15.0/src/library/tools/src'
make[4]: Leaving directory `/tmp/BUILD/R-2.15.0/src/library/tools'
make[3]: Leaving directory `/tmp/BUILD/R-2.15.0/src/library/tools'
make[3]: Entering directory `/tmp/BUILD/R-2.15.0/src/library/compiler'
building package 'compiler'
mkdir -p -- ../../../library/compiler
make[4]: Entering directory `/tmp/BUILD/R-2.15.0/src/library/compiler'
mkdir -p -- ../../../library/compiler/R
mkdir -p -- ../../../library/compiler/po
make[4]: Leaving directory `/tmp/BUILD/R-2.15.0/src/library/compiler'
make[4]: Entering directory `/tmp/BUILD/R-2.15.0/src/library/compiler'
byte-compiling package 'compiler'
Warning in file(datafile, "wb") :
  cannot open file '/usr/lib64/R/library/compiler/R/compiler.rdb':
Permission denied
Error in file(datafile, "wb") : cannot open the connection
Calls:  -> code2LazyLoadDB -> makeLazyLoadDB -> close -> file
Execution halted
make[4]: *** [../../../library/compiler/R/compiler.rdb] Error 1
make[4]: Leaving directory `/tmp/BUILD/R-2.15.0/src/library/compiler'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/tmp/BUILD/R-2.15.0/src/library/compiler'
make[2]: *** [R] Error 1
make[2]: Leaving directory `/tmp/BUILD/R-2.15.0/src/library'
make[1]: *** [R] Error 1
make[1]: Leaving directory `/tmp/BUILD/R-2.15.0/src'
make: *** [R] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.5874 (%build)


The command line that initiates the RPM build is

rpmbuild -ba R-2.15-pj.spec --define "dist=kurocks" > Rbuild.txt 2>&1 &

and I uploaded the full script of the build process here:

http://pj.freefaculty.org/R/Rbuild.txt

I searched the archive and find a 2006 mention of the same problem,
and the solution was that the user had R_LIBS defined in the
environment.  I don't have anything like that in the output of "env",
but obviously, I'm getting something that confuses the build from the
R envionment.  I am just a little curious if somebody can explain
what's going wrong.

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Names of Greek letters stored as character strings; plotmath.

2012-05-19 Thread Paul Johnson
On Sat, May 19, 2012 at 11:07 AM, William Dunlap  wrote:
> parse(text=paste(...)) works in simple cases but not in others.  The
> fortune about it is there because it is tempting to use but if you bury it
> in a general purpose function it will cause problems when people
> start using nonstandard names for variables.  bquote(), substitute(),
> call(), and relatives work in all cases.  E.g.,
>
>  > par(mfrow=c(2,1))
>  > power <- "gamma" ; x <- "Waist" ; y <- "Weight" # valid R variable names
>  > plot(0, main=bquote(.(as.name(x))^.(as.name(power))/.(as.name(y))))
>  > plot(0, main=parse(text=paste0(x, "^", power, "/", y))) # same as previous
>  >
>  > power <- "gamma" ; x <- "Waist Size (cm)" ; y <- "Weight (kg)" # invalid R 
> names
>  > plot(0, main=bquote(.(as.name(x))^.(as.name(power))/.(as.name(y))))
>  > plot(0, main=parse(text=paste0(x, "^", power, "/", y))) # whoops
>  Error in parse(text = paste0(x, "^", power, "/", y)) :
>    :1:7: unexpected symbol
>  1: Waist Size
>           ^
>
> Now you might say that serves me right for using weird variable names,
> but some of us use R as a back end to a GUI system (one not designed
> around R) and don't want to inflict on users R's rules for names when
> we do not have to.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
Bill's point is on the money here. We should aim to teach one way that
works always. But finding that one way is the hard part for me.

Lately, I'm bothered that R (or computers in general?) allows too many
ways to do the same thing that work SOME of the time. Without a very
deep understanding of the terminology, users are bewildered.

Go read ?plotmath.  Do we see "as.name"?   NO.  Why does the idiom
expression(paste()) work as well?   (Don't answer, that's a rhetorical
question).

There are too many ways to do the same thing. Or, well, too many of us
know one way that works sometimes and don't find out it doesn't always
work until, as Bill says, it is buried in the bottom of some big
function that behaves erratically.

Gabor notes this works (sometimes):

plot(0, xlab = parse(text = xNm))

Bert observes this works (sometimes)

xnm <- quote(gamma)
## makes xnm the name gamma not the string "gamma"
plot(0, xlab = bquote(.(xnm)))

Bill observes this works:
xnm <- "gamma"
plot(0, xlab = bquote(.(as.name(xnm))))

In line with that, Bill suggests:

power <- "gamma" ; x <- "Waist Size (cm)" ; y <- "Weight (kg)" # invalid R names
plot(0, main=bquote(.(as.name(x))^.(as.name(power))/.(as.name(y))))

It appears to me that 2/3 of the as.name usages are not needed, at
least not in R 2.15 on Debian Linux.

This works just as well

plot(0, main=bquote(.(x)^.(as.name(power))/.(y)))


Is there any important reason to wrap x and y in this command with as.name?

It does work; maybe I should get in the general habit of wrapping as.name
around variables in bquote()?  I'm more confused.

The good new for me is that I cannot find any replacements for the
usage of "as.name" in that particular expression. So there is just one
way.

Oh, wait, I spoke too soon. as.symbol is identical to as.name.

plot(0, main=bquote(.(x)^.(as.symbol(power))/.(y)))

And then, logically, this works:

plot(0, main=bquote(.(x)^.(as.vector(power, "symbol"))/.(y)))

I personally think "as.symbol" is more likely to be understood by
students, I may stick to that. So I think the most succinct, best
approach is


plot(0, main=bquote(.(x)^.(as.symbol(power))/.(y)))

or perhaps the most consistent-looking strategy is:

power <- as.symbol("gamma") ; x <- "Waist Size (cm)" ; y <- "Weight (kg)"
plot(0, main=bquote(.(x)^.(power)/.(y)))

Speaking of commands that have identical results and exist separately
for no apparently good reason:

library(help=rockchalk)

help(package=rockchalk)

pj

>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
>> Behalf
>> Of Bert Gunter
>> Sent: Saturday, May 19, 2012 7:24 AM
>> To: Gabor Grothendieck
>> Cc: r-help
>> Subject: Re: [R] Names of Greek letters stored as character strings; 
>> plotmath.
>>
>> ... and here is another incantation that may be  informative.
>>
>> xnm <- as.name("gamma")  ## This does the parsing
>> plot(0, xlab = bquote(.(xnm)))
>>
>> The initial puzzle is that if you just set
>> xnm <- "gamma"
>>
>> bquote will insert the string "gamma" rather than the symbol. After
>> all, that's what plotmath sees for xnm. So the key is telling plotmath
>> that it's a symbol, not a string. This can either be done before, as
>> above, or inline, as you and Gabor showed. Unsurprisingly. this also
>> does it, since as.name() is doing the parsing:
>>
>> xnm <- "gamma"
>>  plot(0, xlab = bquote(.(as.name(xnm))))
>>
>> AND we are adhering to Thomas's dictum: bquote is a wrapper for
>> substitute(), which is what he recommends as the preferable
>> alternative to eval(parse(...)) . But, heck -- all such software
>> principles are just guidelines. Whatever works (robustly).
>>
>> HTH.
>>
>> Cheers,
>> Be

[R] Windows Task Scheduler and R updates. Need basic tips

2012-05-17 Thread Paul Johnson
This is a basic Windows system administrator problem, asked by a Linux
guy who is helping out in a Windows lab.

I want to keep R packages up to date on MS Windows 7 with a job in the
"Task Scheduler".  I have an R program that I can run (as
administrator) that updates the existing packages and then installs
all the new ones.

I do not understand how to run that in a dependable way in the scheduler.

If I put the update script "R-update.R" in, for example, in

C:\Program Files\R\R-update.R

Then what?  Do I need a CMD batch script to run the R script?

I can't tell where Windows wants to write the standard output and
error for my R job.

And while I'm asking, does Windows care if I run

R CMD BATCH C:\Program Files\R\R-update.R

or

R --vanilla -f C:\Program Files\R\R-update.R

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glmmADMB

2012-05-09 Thread Paul Johnson
On Tue, May 8, 2012 at 9:06 PM, rbuxton  wrote:

> Update!
>
> I changed the "site" categories.  I noticed that I had coded them as
> "North,
> South, East, West" on different islands, which may have caused confusion in
> the model.
>
> [...]



> mod <- glmmadmb(LESP.CHUCKLE~ 1+(1|ISLAND), data=callsna,
> zeroInflation=TRUE, family="nbinom")
>
> Any thoughts?
>
> YES.

POST THE ENTIRE R PROGRAM YOU ARE RUNNING as well as a link to the EXACT
data set that is causing the problem.

pj




> Thanks so so much!
>
> Rachel Buxton
>
>



-- 
Paul E. Johnson
Professor, Political ScienceAssoc. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas   University of Kansas
http://pj.freefaculty.orghttp://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] low R square value from ANCOVA model

2012-05-08 Thread Paul Johnson
On Tue, May 8, 2012 at 3:45 PM, array chip  wrote:
> Thanks again Peter. What about the argument that because low R square (e.g. 
> R^2=0.2) indicated the model variance was not sufficiently explained by the 
> factors in the model, there might be additional factors that should be 
> identified and included in the model. And If these additional factors were 
> indeed included, it might change the significance for the factor of interest 
> that previously showed significant coefficient. In other word, if R square is 
> low, the significant coefficient observed is not trustworthy.
>
> What's your opinion on this argument?

I think that argument is silly. I'm sorry if that is too blunt. It's
just plain superficial. It reflects a poor understanding of what the
linear model is all about. If you have other variables that might
"belong" in the model, run them and test. The R-square, either low or
high, does not say anything directly about whether those other
variables exist.

Here's my authority.

Arthur Goldberger (A Course in Econometrics, 1991, p.177)
“Nothing in the CR (Classical Regression) model requires that R2 be high. Hence,
a high R2 is not evidence in favor of the model, and a low R2 is not evidence
against it.”

I found that reference in Anders Skrondal and  Sophia Rabe-Hesketh,
Generalized Latent Variable Modeling: Multilevel, Longitudinal,
and Structural Equation Models, Boca Raton, FL: Chapman and Hall/CRC, 2004.

From Section 8.5.2:

"Furthermore, how badly the baseline model fits the data depends greatly
on the magnitude of the parameters of the true model. For instance, consider
estimating a simple parallel measurement model. If the true model is a
congeneric measurement model (with considerable variation in factor loadings
and measurement error variances between items), the fit index could be high
simply because the null model fits very poorly, i.e. because the
reliabilities of
the items are high. However, if the true model is a parallel measurement model
with low reliabilities the fit index could be low although we are estimating the
correct model. Similarly, estimating a simple linear regression model can yield
a high R2 if the relationship is actually quadratic with a considerable linear
trend and a low R2 when the model is true but with a small slope (relative to
the overall variance)."

For a detailed explanation of the argument that the R-square is not
a way to decide if a model is "good" or "bad", see

King, Gary. (1986). How Not to Lie with Statistics: Avoiding Common Mistakes in
Quantitative Political Science. American Journal of Political Science,
30(3), 666–687. doi:10.2307/2111095
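
A quick simulation makes the same point (my own toy example): the model is
exactly true, the slope estimate is near its true value and clearly
significant, yet R-square is tiny simply because the error variance is
large relative to the slope.

set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 5)     # the model is exactly true
m <- lm(y ~ x)
summary(m)$r.squared                    # about 0.01
coef(summary(m))["x", ]                 # slope near 0.5, small p-value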

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glmmADMB

2012-05-08 Thread Paul Johnson
On Tue, May 8, 2012 at 5:16 PM, rbuxton  wrote:
> http://r.789695.n4.nabble.com/file/n4618871/Data_for_list_serve.csv
> Data_for_list_serve.csv
>
> Here is my data, hope this helps.
>

> The "LESP CHUCKLE" , "FTSP FLIGHT", and "ANMU CHIRRUP" are the dependent
> variables, I want to run one model for each.
>

I've not succeeded in replicating your work; the data set has
something funny about it, perhaps.

I saved  your file "data.csv" and ran this:

callsna <- read.table("data.csv", sep=",", header=T)

library(glmmADMB)

mod <- glmmadmb(LESP.CHUCKLE ~ Years.Erad + IS + Ref + Dist.Buldir +
Food+Moon+Wind.Speed + (1|SITE/ISLAND), data=callsna,
zeroInflation=TRUE, family="nbinom")

I don't get as far as you do.

> mod <- glmmadmb(LESP.CHUCKLE ~ Years.Erad + IS + Ref + Dist.Buldir + 
> Food+Moon+Wind.Speed + (1|SITE/ISLAND), data=callsna, zeroInflation=TRUE, 
> family="nbinom")
Error in II[, ii] = II[, ii] + REmat$codes[[i]] :
  number of items to replace is not a multiple of replacement length
In addition: Warning messages:
1: In glmmadmb(LESP.CHUCKLE ~ Years.Erad + IS + Ref + Dist.Buldir +  :
  NAs removed in constructing fixed-effect model frame: you should
probably remove them manually, e.g. with na.omit()
2: In II[, ii] + REmat$codes[[i]] :
  longer object length is not a multiple of shorter object length
> callsna <- na.omit(callsna)
> mod <- glmmadmb(LESP.CHUCKLE ~ Years.Erad + IS + Ref + Dist.Buldir + 
> Food+Moon+Wind.Speed + (1|SITE/ISLAND), data=callsna, zeroInflation=TRUE, 
> family="nbinom")
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels


I wondered if the function you are using is supposed to be able to do
that.  I looked back at the web page for glmmADMB (note
capitalization), http://glmmadmb.r-forge.r-project.org/ and it says:

"Zero-inflation (currently only as a single constant term across all groups)"

But if you post the R code that you ran with the CSV file you
posted, I will try again.

pj

> So, again the desired model is:
>
> mod <- glmmadmb(LESP.CHUCKLE~
> Years.Erad+IS+Ref+Dist.Buldir+Food+Moon+Wind.Speed+(1|SITE/ISLAND),
> data=callsna,
> zeroInflation=TRUE, family="nbinom")
>
> cheers!
> Rachel
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/glmmADMB-tp4616701p4618871.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] what folder to run write_PACKAGES in?

2012-05-08 Thread Paul Johnson
I set up a local repo for testing packages. My packages are not
showing up from the repository when viewed by Linux clients. I suspect
this is a web administrator/firewall issue, but it could be I created
the repo wrongly.  I am supposed to run write_PACKAGES separately in
each R-version folder. Right?

Maybe other novices can use these scripts, if they are not wrong :)

Here's the file structure. On the file space that the Web server can
see, I create a folder "/tools/kran" and directories

 bin
macosx
 leopard
  contrib
2.13
2.14
2.15

windows
 contrib
   2.13
   2.14
   2.15
src
contrib
   2.13
   2.14
   2.15

That's created by this:
#
create_repo_tree <- function(local.repos, rversions){

folders  <-  c("/bin/windows/contrib",
"/bin/macosx/leopard/contrib", "/src/contrib")
for(dir in folders){
dirs <- paste(local.repos, dir, "/", rversions, sep='')
lapply(dirs, dir.create, recursive = TRUE, showWarnings = TRUE)
}
}

create_repo_tree("/tools/kran", c(2.13, 2.14, 2.15))
###

My CRAN mirror is in a sister folder /tools/cran and that works
properly to be served at the address http://rweb.quant.ku.edu/cran.

I want our local testing thing to show at similar
http://rweb.quant.ku.edu/kran.  Supposing the Apache web server magic
is done, I *believe* the following should work.

I dropped packages in the right version folders, and I wrote a script
that goes separately to each version number folder and runs
write_PACKAGES.

### Researchers can upload
### packages into the appropriate folder.
### Administratively, we schedule this to run every night
write_PACKAGES_wrapper <- function(local.repos) {
require(tools)

rversions <-  dir(path = paste(local.repos,
"/bin/windows/contrib", sep=""), full.names = TRUE)
for (i in rversions) write_PACKAGES(dir = i, subdirs=TRUE,
type="win.binary")

#repeat
rversions <-  dir(path = paste(local.repos,
"/bin/macosx/leopard/contrib", sep=""), full.names = TRUE)
for (i in rversions) write_PACKAGES(dir = i, subdirs=TRUE,
type="mac.binary")

rversions <-  dir(path = paste(local.repos, "/src/contrib",
sep=""), full.names = TRUE)
for (i in rversions) write_PACKAGES(dir = i, subdirs=TRUE, type="source")
}


write_PACKAGES_wrapper("/tools/kran")

#

Right?

After running that, I do see the PACKAGES files appear under the
version number directories.

However, from the linux clients I see this:

> install.packages("rockchalk", repos="http://rweb.quant.ku.edu/kran";)
Installing package(s) into ‘/home/pauljohn/R/x86_64-pc-linux-gnu-library/2.15’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository
http://rweb.quant.ku.edu/kran/src/contrib
Warning message:
package ‘rockchalk’ is not available (for R version 2.15.0)

The Web administrator here suggests I've done the write_PACKAGES
incorrectly because there is no PACKAGES file in
/tools/kran/src/contrib.  But I do have PACKAGES files in the
subfolders 2.15.


However, on a windows system, it does work.

> install.packages("rockchalk", repos="http://rweb.quant.ku.edu/kran";)
trying URL 
'http://rweb.quant.ku.edu/kran/bin/windows/contrib/2.15/rockchalk_1.5.5.06.zip'
Content type 'application/zip' length 486682 bytes (475 Kb)
opened URL
downloaded 475 Kb

package ‘rockchalk’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\pauljohn32\AppData\Local\Temp\Rtmpq0m3Id\downloaded_packages

The Web admin folks say to me, "if we did it wrong, nothing would
work. Some does, so it is your fault."

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Solve an ordinary or generalized eigenvalue problem in R?

2012-04-20 Thread Paul Johnson
On Fri, Apr 20, 2012 at 2:18 PM, Jonathan Greenberg  wrote:
> Ok, I figured out a solution and I'd like to get some feedback on this from
> the R-helpers as to how I could modify the following to be "package
> friendly" -- the main thing I'm worried about is how to dynamically set the


There's documentation on this in the "Writing R Extensions" manual,
see 1.2.1 "Using Makevars" and Section 6.7 "Numerical analysis
subroutines." They recommend looking at the package fastICA. It appears
to me you will use some platform neutral Makefile variables.

If you give a full working example from windows--your function, data
to use it--I'll see what I can do on the Linux side.  I've never had
need to worry about this question before, but it is about time I
learned.

Before you think about packaging something like this, I'd say step one
is to clean up your code so it is easier to read.  R CMD check won't
let you get by with T or F in place of TRUE and FALSE, and you seem
to mix <- and = in your code, which makes it difficult to read.

pj
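
Since the question that started this thread (quoted below) was really
whether R can produce generalized eigenvalues without compiled code,
here is one sketch that works in base R, assuming A is symmetric and B
is symmetric positive definite, as in the test matrices quoted further
down:

geneig <- function(A, B) {
    L <- t(chol(B))                   ## B = L %*% t(L)
    M <- solve(L, t(solve(L, A)))     ## L^{-1} A L^{-T}, using symmetry of A
    ev <- eigen(M, symmetric = TRUE)
    ## eigenvectors, if needed, are solve(t(L), ev$vectors)
    ev$values
}

For the general non-symmetric case the dggev route below (or, if I
remember right, the CRAN package geigen, which wraps it) is the way to go.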


> "dyn.load" statement below correctly (obviously, its hard coded to my
> particular install, and would only work with windows since I'm using a
> .dll):
>
> Rdggev <- function(JOBVL=F,JOBVR=T,A,B)
> {
> # R implementation of the DGGEV LAPACK function (with generalized
> eigenvalue computation)
> # See http://www.netlib.org/lapack/double/dggev.f
>  # coded by Jonathan A. Greenberg 
>  dyn.load("C:\\Program Files\\R\\R-2.14.2\\bin\\x64\\Rlapack.dll")
>
> if(JOBVL)
> {
> JOBVL="V"
> } else
> {
> JOBVL="N"
> }
>  if(JOBVR)
> {
> JOBVR="V"
> } else
> {
> JOBVR="N"
> }
>  # Note, no error checking is performed.
>  # Input parameters
> N=dim(A)[[1]]
> LDA=N
> LDB=N
> LDVL=N
> LDVR=N
> LWORK=as.integer(max(1,8*N))
>  Rdggev_out <- .Fortran("dggev", JOBVL, JOBVR, N, A, LDA, B, LDB,
> double(N), double(N), double(N),
> array(data=0,dim=c(LDVL,N)), LDVL, array(data=0,dim=c(LDVR,N)), LDVR,
> double(max(1,LWORK)), LWORK, integer(1))
>
> names(Rdggev_out)=c("JOBVL","JOBVR","N","A","LDA","B","LDB","ALPHAR","ALPHAI",
> "BETA","VL","LDVL","VR","LDVR","WORK","LWORK","INFO")
>
> Rdggev_out$GENEIGENVALUES=(Rdggev_out$ALPHAR+Rdggev_out$ALPHAI)/Rdggev_out$BETA
>  return(Rdggev_out)
> }
>
>
>
> On Fri, Apr 20, 2012 at 12:01 PM, Jonathan Greenberg wrote:
>
>> Incidentally, if you want to test this out:
>>
>> > A
>>          [,1]     [,2]     [,3]
>> [1,] 1457.738 1053.181 1256.953
>> [2,] 1053.181 1213.728 1302.838
>> [3,] 1256.953 1302.838 1428.269
>>
>> > B
>>          [,1]     [,2]     [,3]
>> [1,] 4806.033 1767.480 2622.744
>> [2,] 1767.480 3353.603 3259.680
>> [3,] 2622.744 3259.680 3476.790
>>
>> I THINK this is correct for the other parameters:
>>         JOBVL="N"
>>         JOBVR="N"
>> N=dim(A)[[1]]
>>  LDA=N
>> LDB=N
>> LDVL=N
>>  LDVR=N
>> LWORK=max(1,8*N)
>>
>> .Call("La_dgeev", JOBVL, JOBVR, N, A, LDA, B, LDB, ALPHAR=NA, ALPHAI=NA,
>> BETA=NA,
>>  VL=NA, LDVL, VR=NA, LDVR, WORK=NA, LWORK, INFO=NA, PACKAGE="base")
>>  LAPACK's documentation is:
>> http://www.netlib.org/lapack/double/dggev.f
>>
>> --j
>>
>> On Fri, Apr 20, 2012 at 11:58 AM, Jonathan Greenberg 
>> wrote:
>>
>>> So I am a complete noob at doing this type of linking.  When I write this
>>> statement (all the input values are assigned, but I have to confess I don't
>>> know what to do with the output variables -- in this call they are all
>>> assigned "NA"):
>>>
>>> .Call("La_dgeev", JOBVL, JOBVR, N, A, LDA, B, LDB, ALPHAR=NA, ALPHAI=NA,
>>> BETA=NA,
>>> + VL=NA, LDVL, VR=NA, LDVR, WORK=NA, LWORK, INFO=NA, PACKAGE="base")
>>>
>>> I get:
>>>
>>> Error in .Call("La_dgeev", JOBVL, JOBVR, N, A, LDA, B, LDB, ALPHAR = NA,
>>>  :
>>>   C symbol name "La_dgeev" not in DLL for package "base"
>>>
>>> I'm running this on Windows 7 x64 version of R 2.14.2.  The
>>> R_ext/Lapack.h file states:
>>>
>>> /* DGEEV - compute for an N-by-N real nonsymmetric matrix A, the */
>>> /* eigenvalues and, optionally, the left and/or right eigenvectors */
>>> La_extern void
>>> F77_NAME(dgeev)(const char* jobvl, const char* jobvr,
>>> const int* n, double* a, const int* lda,
>>> double* wr, double* wi, double* vl, const int* ldvl,
>>>  double* vr, const int* ldvr,
>>> double* work, const int* lwork, int* info);
>>>
>>> Any ideas?
>>>
>>> --j
>>>
>>>
>>>
>>> On Fri, Apr 20, 2012 at 1:37 AM, Berend Hasselman  wrote:
>>>

 On 19-04-2012, at 20:50, Jonathan Greenberg wrote:

 > Folks:
 >
 > I'm trying to port some code from python over to R, and I'm running
 into a
 > wall finding R code that can solve a generalized eigenvalue problem
 > following this function model:
 >
 >
 http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.eig.html
 >
 > Any ideas?  I don't want to call python from within R for various
 reasons,
 > I'd prefer a "native" R solution if one exists.  Thanks!

 An old thread is here:
 http://tolstoy.newcastle.edu.au/R/help/05/06/6802.html

 R does provid

[R] predictOMatic for regression. Please try and advise me

2012-04-20 Thread Paul Johnson
I'm pasting below a working R file featuring a function I'd like to polish up.

I'm teaching regression this semester and every time I come to
something that is very difficult to explain in class, I try to
simplify it by writing an R function (eventually into my package
"rockchalk").  Students have a difficult time with predict and newdata
objects, so right now I'm concentrated on that problem.

Fitting regressions is pretty easy, but getting predicted values for
particular combinations of input values can be difficult when the
model is complicated. If you agree, please try out the following and
let me know how its use could be enhanced, or what other features you
might want.

I know folks are busy, so to save you the trouble of actually running
the code, I also paste in a session demonstrating one run through.

Here's the code:

##Paul Johnson
## 2012-04-20
## Facilitate creation of "newdata" objects for use in predict
## statements (for interpretation of regression models)

## newdataMaker creates the newdata frame required in predict.
## It supplies a data frame with all input variables at a
## central value (mean, mode) except for some variables that
## the user wants to focus on. User should
## supply a fitted model "model" and a focus list "fl" of variable
## values. See examples below.  The list must be a named list,
## using names of variables from the regression formula.
## It is not needed to call this
## directly if one is satisfied with the results from
## predictOMatic

newdataMaker <- function (model = NULL, fl = NULL){
mf <- model.frame(model)
mterms <- terms(model)
mfnames <- colnames(mf)
modelcv <- centralValues(mf)
if (sum(!names(fl) %in% mfnames) > 0) stop(paste("Error. The focus
list (fl) requests variables that are not included in the original
model. The names of the variables in the focus list must be drawn from
this list: ", paste(mfnames, collapse = ", ")))
## Consider "padding" range of fl for numeric variables so that we
## get newdata objects including the min and max values.
mixAndMatch <- expand.grid(fl)
mixAndMatch$combo <- 1:nrow(mixAndMatch)
newdf <- cbind(mixAndMatch, modelcv[  ,!colnames(modelcv) %in%
colnames(mixAndMatch)])
newdf
}



predictOMatic <- function(model = NULL, fl = NULL, ...){
nd <- newdataMaker(model, fl)
fit <- predict(model, newdata=nd, ...)
cbind(fit, nd)
}



set.seed(12345)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- rnorm(100)
x4 <- rnorm(100)
x5 <- rpois(100, 5)
x6 <- rgamma(100, 2,1)
xcat1 <- gl(2,50, labels=c("M","F"))
xcat2 <- cut(rnorm(100), breaks=c(-Inf, 0, 0.4, 0.9, 1, Inf),
labels=c("R", "M", "D", "P", "G"))
dat <- data.frame(x1, x2, x3, x4, x5, xcat1, xcat2)
rm(x1, x2, x3, x4, xcat1, xcat2)
dat$y <- with(dat, 0.03 + 0.1*x1 + 0.1*x2 + 0.4*x3 -0.1*x4 + 0.01*x5 +
0.1*x6 + 2*rnorm(100))

##ordinary regression.
m1 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + xcat1 + xcat2, data=dat)
summary(m1)
## Use the fitted model's data frame for analysis
m1mf <- model.frame(m1)

## If you have rockchalk 1.5.4, run these:
library(rockchalk)
summarize(m1mf)
## otherwise, run
summary(m1mf)

## First, overview for values of xcat1
newdataMaker(m1, fl = list(xcat1 = levels(m1mf$xcat1)))

## mix and match all combinations of xcat1 and xcat2
newdataMaker(m1, fl = list(xcat1 = levels(m1mf$xcat1), xcat2 =
levels(m1mf$xcat2)))

## Pick some particular values for focus
newdataMaker(m1, fl = list(x2 = c(0.25, 1.0), xcat2 = c("M","D")))

## Generate a newdata frame and predictions in one step
predictOMatic(m1, fl = list(x2 = c(0.25, 1.0), xcat2 = c("M","D")))


predictOMatic(m1, fl = list(x2 = c(0.25, 1.0), xcat2 = c("M","D")),
interval="conf")

newdf <- predictOMatic(m1, fl = list(x2 = c(0.25, 1.0), xcat2 =
c("M","D"), x1=plotSeq(dat$x1)))

plot(y ~ x1, data= model.frame(m1))
by(newdf, list(newdf$x2, newdf$xcat2), function(x) {lines(x$x1, x$fit)})

newdataMaker(m1, fl = list(x2 = c(-1,0, 1), xcat2 = c("M","D")))

predictOMatic(m1, fl = list(x2 = range(dat$x2), xcat2 = c("M","D")))

newdf <- predictOMatic(m1, fl = list(x2 = quantile(dat$x2), xcat2 = c("M","D")))
plot(y ~ x2 , data=model.frame(m1))

lines(fit ~ x2,  newdf)


predictOMatic(m1, fl = list(x2 = c(50, 60), xcat2 = c("M","D")),
interval="conf")

## just gets the new data
nd <- newdataMaker(m1, fl = list(x2 = c(50, 60), xcat2 = c("M","D")))

pr <- predictOMatic(m1, fl = list(x2 = c(50, 60), xcat2 = c("M","D")),
interval="conf")

##
m2 <- glm(xcat2 ~ x1 + x2 + x3 + xcat1, data=dat, family=binomial(logit))
summary(m2)
m2mf &l

[R] [R-pkgs] rockchalk_1.5.4 posted

2012-04-10 Thread Paul Johnson
Greetings:

rockchalk is a collection of functions to facilitate presentation of regression 
models.
It includes some functions that I have been circulating for quite some time 
(such as
"outreg") as well as several others. The main aim is to allow people who do not
understand very much R to survive a course in intermediate regression analysis.
The examples included with the functions include more than the usual amount of 
detail.

This version features

1) a full vignette called "rockchalk" that illustrates many of the
functions in the package.  The vignette includes an explanation of why 
mean-centering
and residual-centering do not help with the "inessential multicollinearity" 
problem
that concerns users of regression models in which there are interactions or
squared terms.

2) a function "summarize" that is intended to remedy some of the shortcomings I
perceive in R's summary function.

The package is not coordinated with any particular textbook and should help in
any class in which students are expected to estimate regression models, 
summarize
them in professional-looking tables, and conduct various follow up diagnostics.

Highlights:

plotSlopes & testSlopes: draw "simple slopes" graphs and conduct hypothesis 
tests
for moderator variables.

plotCurves: an attempt to provide a "termplot" replacement that allows plots 
that
incorporate mediators.

plotPlane: a wrapper for persp to plot regressions in 3d (similar to 
scatterplot3d)

standardize, meanCenter, residualCenter: after fitting a non-centered model, 
use these
to experiment with the benefits (or lack thereof) that flow from centering.

-- 
Paul E. Johnson email: paulj...@ku.edu
http://pj.freefaculty.org   Assoc. Director
Professor, Political ScienceCenter for Research Methods and Data Analysis
1541 Lilac Lane, Rm 504 1425 Jayhawk Blvd.  
University of KansasWatson Library, Rm. 470 
Lawrence, Kansas 66045-3129 Lawrence, Kansas 66045-7555
Ph: (785) 864-3523  Ph: (785) 864-3353

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] beginner's loop issue

2012-03-13 Thread Paul Johnson
On Tue, Mar 13, 2012 at 11:27 AM, aledanda  wrote:
> Dear All,
>
> I hope you don't mind helping me with this small issue. I haven't been using
> R in years and I'm trying to fill in a matrix
> with the output of a function (I'm probably using the Matlab logic here and
> it's not working).
> Here is my code:
>
> for (i in 1:length(input)){
>  out[i,1:3] <- MyFunction(input[i,1],input[i,2], input[i,3])
>    out[i,4:6] <- MyFunction(input[i,5],input[i,7], input[i,6])
>      out[i,7:9] <- MyFunction(input[i,8],input[i,10], input[i,9])
> }
>
> 'input' is a matrix
>> dim(input)
> [1] 46 10
>
> and each raw corresponds to a different subject.
> The error I get here is
>
> /Error in out[i, 1:3] <- get.vaTer(input[i, 2], input[i, 4], input[i, 3],  :
>  object 'out' not found/

out has to exist first, as previous commenter said.

Furthermore, suggestions:

Consider making MyFunction accept a vector of 3 arguments, rather than
separate arguments.

Consider making out 3 columns, with one row for each call, as in

N <- nrow(input)
out <- matrix(0, nrow=3*N, ncol=3)
for(i in 1:N){
out[3*i - 2, ] <- MyFunction(input[i, 1:3])
out[3*i - 1, ] <- MyFunction(input[i, 4:6])
out[3*i,     ] <- MyFunction(input[i, 7:9])
}

If you could re-shape your input "thing" as a list with one element
that needs to go into MyFunction, this could get easier still:

lapply(input, MyFunction)

Or if input were an array with 3 columns, you could revise MyFunction
to accept a 3-vector:

apply(input, 1, MyFunction)

Hardly ever in R does one need to specify inputs as you have done in
your example.
pj



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simulate an gradually increase of the number of subjects based on two variables

2012-03-13 Thread Paul Johnson
Suggestion below:

On Tue, Mar 13, 2012 at 1:24 PM, guillaume chaumet
 wrote:
> I omit to precise that I already try to generate data based on the mean and
> sd of two variables.
>
> x=rnorm(20,1,5)+1:20
>
> y=rnorm(20,1,7)+41:60
>
> simu<-function(x,y,n) {
>    simu=vector("list",length=n)
>
>    for(i in 1:n) {
>        x=c(x,rnorm(1,mean(x),sd(x)))
>        y=c(y,rnorm(1,mean(y),sd(y)))
>        simu[[i]]$x<-x
>        simu[[i]]$y<-y
>
>
>    }
>
>    return(simu)
> }
>
> test=simu(x,y,60)
> lapply(test, function(x) cor.test(x$x,x$y))
>
> As you could see, the correlation is disappearing with increasing N.
> Perhaps, a bootstrap with lm or cor.test could solve my problem.
>

In this case, you should consider creating the LARGEST sample first,
and then remove cases to create the smaller samples.

The problem now is that you are drawing a completely fresh sample
every time, so you are getting not only the effect of sample size, but
also that extra randomness when case 1 is replaced every time.

 I am fairly confident (80%)  that if you approach it my way, the
mystery you see will start to clarify itself.  That is, draw the big
sample with the desired characteristic, and once you understand the
sampling distribution of cor for that big sample,  you will also
understand what happens when each large sample is reduced  by a few
cases.
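
To make that concrete, here is a small sketch of the one-big-draw idea
(mine, not the poster's code):

set.seed(1)
nMax <- 80
x <- rnorm(nMax, 1, 5) + 1:nMax
y <- rnorm(nMax, 1, 7) + 41:(40 + nMax)
## correlation of nested subsamples of the same draw
sapply(20:nMax, function(n) cor(x[1:n], y[1:n]))

The only randomness left is the one big sample, so the path of the
correlation as n grows is easy to interpret.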

BTW, if you were doing this on a truly massive scale, my way would run
much faster.  Allocate memory once; then you don't need to manually delete
lines, just trim down the index on the rows.  (Same data access
concept as bootstrap).

pj



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multi-platform equivalent of x11() ?

2012-03-06 Thread Paul Johnson
I want to write an R help example that throws up 2 graphs in separate
windows, for comparison. In Linux I plot one, then run

x11()

to spawn a new on-screen device.

Is there some generic equivalent so I can write an example that will
work for Windows and Mac users as well?

If there is none, don't you think it would be fun if there were?

pj
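
A note for anyone finding this in the archive: dev.new() is documented
as the platform-independent way to open a new on-screen device, so a
sketch like

plot(rnorm(10))
dev.new()     ## second window on Windows, OS X, and X11 alike
plot(runif(10))

should behave the same way everywhere.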

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] replacing characters in matrix. substitute, delayedAssign, huh?

2012-01-30 Thread Paul Johnson
Henrik's proposal works well, so far.  Thanks very much. I could not
have figured that out (without much more suffering).

Here's the working example
in case future googlers find their way to this thread.


## Paul Johnson 
## 2012-01-30

## Special thanks to r-help email list contributors,
## especially Henrik Bengtsson


BM <- matrix("0.1", 5, 5)

BM[2,1] <- "a"
BM[3,2] <- "b"

BM

parseAndEval <- function(x, ...) eval(parse(text=x))

a <- 0.5
b <- 0.4

realBM <- apply(BM, MARGIN=c(1,2), FUN=parseAndEval)

BM[4,5] <- "rnorm(1, m=7, sd=1)"

BM

realBM <- apply(BM, MARGIN=c(1,2), FUN=parseAndEval)

realBM

## Now, what about gui interaction with that table?
## The best "nice looking" options are not practical at the moment.

## Try this instead

data.entry(BM)

## That will work on all platforms, so far as I know, without
## any special effort from us. Run that, make some changes, then
## make sure you insert new R variables to match in your environment.

## Suppose you inserted the letter z in there somewhere

## set z out here

z <- rpois(1, lambda=10)

realBM <- apply(BM, MARGIN=c(1,2), FUN=parseAndEval)


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] replacing characters in matrix. substitute, delayedAssign, huh?

2012-01-30 Thread Paul Johnson
A user question today has me stumped.  Can you advise me, please?

User wants a matrix that has some numbers, some variables, possibly
even some function names.  So that has to be a character matrix.
Consider:

> BM <- matrix("0.1", 5, 5)

Use data.entry(BM) or similar to set some to more abstract values.

> BM[3,1] <- "a"
> BM[4,2] <- "b"
> BM[5,2] <- "b"
> BM[5,3] <- "d"
> BM
 var1  var2  var3  var4  var5
[1,] "0.1" "0.1" "0.1" "0.1" "0.1"
[2,] "0.1" "0.1" "0.1" "0.1" "0.1"
[3,] "a"   "0.1" "0.1" "0.1" "0.1"
[4,] "0.1" "b"   "0.1" "0.1" "0.1"
[5,] "0.1" "b"   "d" "0.1" "0.1"

Later on, user code will set values, e.g.,

a <- rnorm(1)
b <- 17
d <- 4

Now, push those into "BM", convert whole thing to numeric

newBM <- apply(BM, c(1,2), as.numeric)

and use newBM for some big calculation.

Then re-set new values for a, b, d, do the same over again.

I've been trying lots of variations on parse, substitute, and eval.

The most interesting function I learned about this morning was delayedAssign.
If I had only to work with one scalar, it does what I want

> delayedAssign("a", whatA)
> whatA <- 91
> a
[1] 91

I can't see how to make that work in the matrix context, though.

Got ideas?

pj

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.14.1

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on model.matrix

2012-01-30 Thread Paul Johnson
Greetings

On Sat, Jan 28, 2012 at 2:43 PM, Daniel Negusse
 wrote:
>>
>>

>> while reading some tutorials, i came across this and i am stuck. i want to 
>> understand it and would appreciate if anyone can tell me.
>>
>> design <- model.matrix(~ -1+factor(c(1,1,2,2,3,3)))
>>
>> can someone break down this code and explain to me what the "~", and the 
>> "-1+factor" are doing?

A formula would be y ~ x, so when you don't include y, it means you
only want the right hand side variables.  The term design matrix
generally means the numeric coding that is fitted in a statistical
procedure.

The -1 in the formula means "do not insert an intercept for me."  It
affects the way the factor variable is converted to numeric contrasts
in the design matrix.   If there is an intercept, then the contrasts
have to be adjusted to prevent perfect multicollinearity.

If you run a few examples, you will see. This uses lm, but the formula
and design matrix ideas are same. Note, with an intercept, I get 3
dummy variables from x2, but with no intercept, I get 4 dummies:

> x1 <- rnorm(16)
> x2 <- gl(4, 4, labels=c("none","some","more","lots"))
> y <- rnorm(16)
> m1 <- lm(y ~ x1 + x2)
> model.matrix(m1)
   (Intercept)          x1 x2some x2more x2lots
1            1 -0.2567          0      0      0
2            1  0.94963659      0      0      0
3            1  0.06915561      0      0      0
4            1  0.89971204      0      0      0
5            1  0.73817482      1      0      0
6            1  2.92451195      1      0      0
7            1 -0.80682449      1      0      0
8            1  1.07472998      1      0      0
9            1  1.34949123      0      1      0
10           1 -0.42203984      0      1      0
11           1 -1.66316740      0      1      0
12           1 -2.83232063      0      1      0
13           1  1.26177313      0      0      1
14           1  0.10359857      0      0      1
15           1 -1.85671242      0      0      1
16           1 -0.25140729      0      0      1
attr(,"assign")
[1] 0 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"

> m2 <- lm(y ~ -1 + x1 + x2)
> model.matrix(m2)
x1 x2none x2some x2more x2lots
1  -0.2567  1  0  0  0
2   0.94963659  1  0  0  0
3   0.06915561  1  0  0  0
4   0.89971204  1  0  0  0
5   0.73817482  0  1  0  0
6   2.92451195  0  1  0  0
7  -0.80682449  0  1  0  0
8   1.07472998  0  1  0  0
9   1.34949123  0  0  1  0
10 -0.42203984  0  0  1  0
11 -1.66316740  0  0  1  0
12 -2.83232063  0  0  1  0
13  1.26177313  0  0  0  1
14  0.10359857  0  0  0  1
15 -1.85671242  0  0  0  1
16 -0.25140729  0  0  0  1
attr(,"assign")
[1] 1 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"

I think you'll need to mess about with R basics like plot and lm
before you go off using the formulas that you really care about.
Otherwise, well, you'll always be lost about stuff like "~" and "-1".

I've started posting all my lecture notes (source code, R code, pdf
output) http://pj.freefaculty.org/guides.  That might be a quick start
for you.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] fix and edit don't work: unable to open X Input Method->segfault

2012-01-08 Thread Paul Johnson
I can't run fix() or edit() anymore. Did I break my system?

I'm running Debian Linux with R-2.14.1. As far as I can tell, the R
packages came from Debian's testing "wheezy" repository.  I would like
to know if users on other types of systems see the same problem. If
no, then, obviously, it is a Debian-only issue and I can approach it
from that point of view.  And if no other Debian users see the same, it
means it is a me-only problem, and that's discouraging :)

I get this same R crash whether I try fix when R is running in a
terminal or in Emacs with ESS. I've not seen this before, but Google
leads to some bug reports on Ubuntu in 2007, where it was claimed that
the problem was fixed.

The really bad part is that the second try causes a segmentation fault
in R itself.


> library(ggplot2)
Loading required package: reshape
Loading required package: plyr

Attaching package: ‘reshape’

The following object(s) are masked from ‘package:plyr’:

rename, round_any

Loading required package: grid
Loading required package: proto

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.6
> fix(mpg)
Error in dataentry(datalist, modes) : invalid device
In addition: Warning message:
In edit.data.frame(get(subx, envir = parent), title = subx, ...) :
  unable to open X Input Method
> fix(mpg)

 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: edit.data.frame(get(subx, envir = parent), title = subx, ...)
 2: edit(get(subx, envir = parent), title = subx, ...)
 3: fix(mpg)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:


Same happens no matter what packages are loaded, so far as I can tell.
 Here it is without ggplot2, in case you were suspicious of those
particular datasets.


> library(datasets)
> datasets()
Error: could not find function "datasets"
> help(package=datasets)
> fix(CO2)
Error in dataentry(datalist, modes) : invalid device
In addition: Warning message:
In edit.data.frame(get(subx, envir = parent), title = subx, ...) :
  unable to open X Input Method



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to properly re-set a saved seed? I've got the answer, but no explanation

2012-01-06 Thread Paul Johnson
Hello, happy new year.

I've come back to a problem from last spring. I need to understand
what what is the technically correct method to save a seed and then
re-set it.  Although this example arises in the context of MT19937,
I'm needing to do this with other generators as well, including
L'Ecuyer streams.

The puzzle is this comment in ?Random: "‘set.seed’ is the recommended
way to specify seeds."

What I did not understand before, and can't make sense of now, is that
set.seed does not "re-set" a saved seed.  Here's my working example:


## Paul Johnson
## April 20, 2011

## If you've never dug into the R random number generator and its use of the
## default MT19937, this might be informative.  It will also help you
## avoid a time-consuming mistake that I've made recently.

## I've been developing a simulation and I want to save and restore random
## streams exactly. I got that idea from John Chambers Software for
## Data Analysis and I studied his functions to see how he did it.
## The problem, as we see below, is that set.seed() doesn't do what I expected,
## and I feel lucky to have noticed it now, rather than later.

## I wish set.seed did work the way I want, though, and that's why I'm
## writing here, to see if anybody else wishes the same.

## Here's a puzzle.  Try this:
set.seed(444)
s1 <- .Random.seed
runif(1)
rnorm(1)

set.seed(444)
runif(1)
rnorm(1)
## Those matched. Good

## Re-insert the "saved seed" s1
set.seed(s1)
runif(1)
rnorm(1)

## Why don't the draws match? I "put back" the seed, didn't I?
## Hm. Try again
set.seed(s1)
runif(1)
rnorm(1)

## Why did those match? But neither matches cases 1 and 2.

## I was baffled & discouraged.

## The help page for random numbers says:
##  ‘set.seed’ is the recommended way to specify seeds.

## But, it doesn't say something like

# set.seed(s1)

## is supposed to work. Ah. My mistake.

#### Here's the fix and the puzzle ####
## To re-set the generator to its position at s1, it is required
## to do what the help page seems to say we should not.

.Random.seed <- s1

runif(1)
#
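
A postscript for archive readers, my reading of ?Random rather than
gospel: set.seed() uses its argument as a single integer seed (in
effect only the first element of a vector) and scrambles it into a
brand new state, so it cannot restore a saved state.  .Random.seed is
the complete state, and putting it back in the workspace is what
restores the stream; from inside a function the assignment has to
target the global environment, e.g.

assign(".Random.seed", s1, envir = .GlobalEnv)
runif(1)   ## matches the draw made right after s1 was saved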

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 3d plotting alternatives. I like persp, but regret the lack of plotmath.

2011-12-29 Thread Paul Johnson
I have been making simple functions to display regressions in a new
package called "rockchalk".  For 3d illustrations, my functions use
persp, and I've grown to like working with it.  As an example of the
kind of things I like to do, you might consult my lecture on
multicollinearity, which is by far the most detailed illustration I've
prepared.

http://pj.freefaculty.org/guides/stat/Regression/Multicollinearity/Multicollinearity-1-lecture.pdf

I used persp mainly because I can understand it, and it can be made to
work like plot in R, with additional tools like lines and points and
such.  I don't want to interact with these plots, I just need to put
them into lectures & documents relatively easily.  And I've also
succeeded in turning them into flash video with R2SWF, which works
great!

Last summer, I put together some lecture notes illustrating persp,
scatterplot3d, and persp3d.
http://pj.freefaculty.org/guides/Rcourse/plot-3d/plots-3d.pdf.  As I
review that now, I see I did not make any progress on the lattice
based plotters, or I did not write it down, anyway.  scatterplot3d did
almost everything I needed to do, but not everything, and so I used
persp in my newer effort.

Then I put some plot math in an axis label and ran into the problem
that everybody else who uses persp finds, eventually. persp doesn't
allow expressions in axis labels.  From searching in r-help, I see
that many people have run up against the same trouble. Most people say
"too bad", I'll switch to some other tool.


Suggested alternatives.

1. Use wireframe or cloud in lattice. They can handle plotmath.

I've been studying that, and it can handle plot math, but I just can't
get the kinds of graphs I want from it.  In the persp framework, I can
draw the 3d plot, and then add details to it one by one.  I can
comprehend the little steps.  In wireframe and cloud, it *appears* to
me i have to pack all of that work into one single command, and it is,
well, either too difficult or impossible.  Or perhaps I'm just not
understanding the documentation.  If I could make the sorts of plots I
need with lattice tools, I would do it.  But I'm really blocked at the
front door by the "write one giant function call" nature of it.

I realize that's vague because I have not told you specifically what I
want to do.  If there is a lattice expert reading, can you give me
some HOWTO hints?  Take, for example, Slide 19 in this one:

http://pj.freefaculty.org/guides/stat/Regression/Multicollinearity/Multicollinearity-1-lecture.pdf

gray dots in the x1-x2 plane, blue points hovering above, red pointed
arrows from gray to blue.

And then later on, say slide 36-60, I have regression planes drawn in
with the arrows from the plane to the points.


2. Use rgl based tools. These ones are especially neat if you want to
interact with the graph--"spin" a 3d graph or get a display with
lively colors.

It works more like plot, in the sense that you can draw the figure and
then add components with points3d and so forth.  And there are nice
working examples of extensions in misc3d.  And also, the R CMDR
interface has a working pull down menu system that can make rgl 3d
graphs.  That's a big plus.

After playing with some examples, I see these troubles. The only
output device that runs well (finishes and does not generate massive
files) is png.  The output quality on the screen is quite beautiful,
but transition to black-and-white is not as nice looking as persp, in
my opinion.  These graphs draw much more slowly.  They are more
difficult to script out in an Sweave document, it seems to me.

If those criticisms are wrong, I would be glad to know.

So I'm back wondering why persp can't be "updated".

Nobody has explained why it is not possible to revise persp to allow
expressions in axis labels.  Perhaps nobody has done it because people
think that persp has no fans :)   But I'm a fan.
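
In the meantime, the workaround I keep seeing (a sketch, not a fix to
persp itself) is to suppress the offending label and place the
expression by hand with trans3d(), using the transformation matrix
that persp() returns:

x <- y <- seq(-2, 2, length.out = 30)
z <- outer(x, y, function(a, b) a^2 + b^2)
pmat <- persp(x, y, z, xlab = "", ticktype = "detailed")
lab <- trans3d(0, -3.2, min(z), pmat)    ## a spot just beyond the x axis
text(lab$x, lab$y, expression(beta[1] * x[1]), xpd = NA)

Clumsy, but it keeps the rest of the persp workflow intact.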

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pls help to print out first row of terms(model) output in example program

2011-12-19 Thread Paul Johnson
Greetings.

I've written a convenience function for multicollinearity diagnosis.
I'd like to report to the user the formula that is used in a
regression.  I get output like this:

> mcDiagnose(m1)
[1] "The following auxiliary models are being estimated and returned in a list:"
[1] "`x1` ~ ."
formula(fmla)()
[1] "`x2` ~ ."

I'd like to fill in the period with the variable names that are in there.

I know the information is in there, I just need to get it.  The terms
output for a fitted model has the output at the very top, in the first
line, above the attributes.  I just want to print out that first line,
here:

> terms(m2)
y ~ log(10 + x1) + poly(x2, 2)
attr(,"variables")
list(y, log(10 + x1), poly(x2, 2))
attr(,"factors")
 log(10 + x1) poly(x2, 2)
y   0   0
log(10 + x1)1   0
poly(x2, 2) 0   1
attr(,"term.labels")
[1] "log(10 + x1)" "poly(x2, 2)"
[snip]
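
(A side note for anyone searching the archive later: deparse() will
pull that first line out as a character string, e.g.

deparse(formula(m2))
## [1] "y ~ log(10 + x1) + poly(x2, 2)"

wrapped in paste(..., collapse = "") if there is any chance of a long
formula being split across several strings.)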

In my working example code below, I need help where I have "##fix
me fix me##".


##Paul Johnson
## 2011-12-19
## mcDiagnose.R

lmAuxiliary <- function(model){
  dat <- as.data.frame(model.matrix(model))
  ## ivnames <- attr(delete.response(terms(model)), "term.labels")
  ## previous does not work with transforms like poly
  hasIntercept <- attr(terms(model), "intercept")

  if (hasIntercept) dat <- dat[ , -1] # removes intercept. assumes
intercept in column 1
  ivnames <- colnames(dat)
  print("The following auxiliary models are being estimated and
returned in a list:")

  results <- list()

  for (i in ivnames) {
fmla <- paste( "`", i, "`", " ~ ." , sep="");
print(fmla)
maux <- lm( formula(fmla), data=dat)
results[[ i ]] <- maux
print(maux$call[2])
###fix me fix me ##
  }
  results
}


mcDiagnose <- function(model){
  auxRegs <- lmAuxiliary(model)
  auxRsq <- numeric(length=length(auxRegs))
  j <- 0
  for ( i in auxRegs ){
j <- j + 1
sumry <- summary(i)
auxRsq[j] <- sumry$r.squared
  }
  print("Drum roll please!")
  print("And your R_j Squareds are (auxiliary Rsq)")
  print(names(auxRegs))
  print(auxRsq)
}

x1 <- rnorm(100)
x2 <- rnorm(100)
y <- rnorm(100)

m1 <- lm(y~x1+x2)

mcDiagnose(m1)


m2 <- lm(y~log(10+x1) + poly(x2,2))

mcDiagnose(m2)

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] fundamental guide to use of numerical optimizers?

2011-12-15 Thread Paul Johnson
I was in a presentation of optimizations fitted with both MPlus and
SAS yesterday.  In a batch of 1000 bootstrap samples, between 300 and
400 of the estimations did not converge.  The authors spoke as if this
were the ordinary cost of doing business, and pointed to some
publications in which the nonconvergence rate was as high or higher.

I just don't believe that's right, and if some problem is posed so
that the estimate is not obtained in such a large sample of
applications, it either means the problem is badly asked or badly
answered.  But I've got no traction unless I can actually do
better.

Perhaps I can use this opportunity to learn about R functions like
optim, or perhaps maxLik.

From reading r-help, it seems to me there are some basic tips for
optimization, such as:

1. It is wise to scale the data so that all columns have the same
range before running an optimizer.

2. With estimates of variance parameters, don't try to estimate sigma
directly, instead estimate log(sigma) because that puts the domain of
the solution onto the real number line.

3. With estimates of proportions, estimate instead the logit, for the
same reason.

Are these mistaken generalizations?  Are there other tips that
everybody ought to know?

I understand this is a vague question, perhaps the answers are just in
the folklore. But if somebody has written them out, I would be glad to
know.
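
To make point 2 above concrete, here is a toy sketch (mine, not from
the presentation): fit the mean and standard deviation of a normal
sample with optim(), searching over log(sigma) so the optimizer can
never wander into sigma <= 0.  The same trick with qlogis()/plogis()
handles point 3.

set.seed(42)
y <- rnorm(200, mean = 5, sd = 2)
negll <- function(par, y) {
    mu <- par[1]
    sigma <- exp(par[2])    ## par[2] is log(sigma), unconstrained
    -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}
fit <- optim(c(0, 0), negll, y = y, method = "BFGS")
c(mu = fit$par[1], sigma = exp(fit$par[2]))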

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how "big" (in RAM and/or disk storage) is each of these objects in a list?

2011-11-26 Thread Paul Johnson
Greetings, friends (and others :) )

We generated a bunch of results and saved them in an RData file. We
can open, use, all is well, except that the size of the saved file is
quite a bit larger than we expected.  I suspect there's something
floating about in there that one of the packages we are using puts in,
such as a spare copy of a data frame that is saved in some subtle way
that has escaped my attention.

Consider a list of objects. Are there ways to do these things:

1. ask R how much memory is used by the things inside the list?

2.   Does "as.expression(anObject)" print everything in there? Or, is
there a better way to convert each thing to text or some other format
that you can actually read line by line to see what is in there, to
"see" everything?

If there's no giant hidden data frame floating about, I figure I'll
have to convert symmetric matrices to lower triangles or such to save
space.  Unless R already is automatically saving a matrix in that way
but just showing me the full matrix, which I suppose is possible. If
you have other ideas about general ways to make saved objects smaller,
I'm open for suggestions.
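
On question 1, for later readers: object.size() in utils is the usual
tool.  A sketch, with the list name made up:

sapply(myResults, object.size)                  ## bytes used by each element
format(object.size(myResults), units = "Mb")    ## the whole list at once

Keep in mind it measures objects in memory, which can differ quite a
bit from the size of the compressed .RData file on disk.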

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Models with ordered and unordered factors

2011-11-15 Thread Paul Johnson
On Tue, Nov 15, 2011 at 9:00 AM, Catarina Miranda
 wrote:
> Hello;
>
> I am having a problems with the interpretation of models using ordered or
> unordered predictors.
> I am running models in lmer but I will try to give a simplified example
> data set using lm.
> Both in the example and in my real data set I use a predictor variable
> referring to 3 consecutive days of an experiment. It is a factor, and I
> thought it would be more correct to consider it ordered.
> Below is my example code with my comments/ideas along it.
> Can someone help me to understand what is happening?

Dear Catarina:

I have had the same question, and I hope my answers help you
understand what's going on.

The short version:

http://pj.freefaculty.org/R/WorkingExamples/orderedFactor-01.R

The longer version, "Working with Ordinal Predictors"

http://pj.freefaculty.org/ResearchPapers/MidWest09/Midwest09.pdf

HTH
pj
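
The short version, for archive readers: an ordered factor is coded
with polynomial contrasts, so Day.L and Day.Q are the linear and
quadratic trend contrasts across the three days, not dummies for Day 2
and Day 3.  Notice the R-squared and F statistic are identical in the
two fits quoted below; only the parameterization changed.  A quick check:

Day <- ordered(rep(c("Day 1", "Day 2", "Day 3"), each = 6))
contrasts(Day)         ## contr.poly: columns .L and .Q
options("contrasts")   ## contr.treatment for factors, contr.poly for ordered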

>
> Thanks a lot in advance;
>
> Catarina Miranda
>
>
> y<-c(72,25,24,2,18,38,62,30,78,34,67,21,97,79,64,53,27,81)
>
> Day<-c(rep("Day 1",6),rep("Day 2",6),rep("Day 3",6))
>
> dataf<-data.frame(y,Day)
>
> str(dataf) #Day is not ordered
> #'data.frame':   18 obs. of  2 variables:
> # $ y  : num  72 25 24 2 18 38 62 30 78 34 ...
> # $ Day: Factor w/ 3 levels "Day 1","Day 2",..: 1 1 1 1 1 1 2 2 2 2 ...
>
> summary(lm(y~Day,data=dataf))  #Day 2 is not significantly different from
> Day 1, but Day 3 is.
> #
> #Call:
> #lm(formula = y ~ Day, data = dataf)
> #
> #Residuals:
> #    Min      1Q  Median      3Q     Max
> #-39.833 -14.458  -3.833  13.958  42.167
> #
> #Coefficients:
> #            Estimate Std. Error t value Pr(>|t|)
> #(Intercept)   29.833      9.755   3.058 0.00797 **
> #DayDay 2      18.833     13.796   1.365  0.19234
> #DayDay 3      37.000     13.796   2.682  0.01707 *
> #---
> #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> #
> #Residual standard error: 23.9 on 15 degrees of freedom
> #Multiple R-squared: 0.3241,     Adjusted R-squared: 0.234
> #F-statistic: 3.597 on 2 and 15 DF,  p-value: 0.05297
> #
>
> dataf$Day<-ordered(dataf$Day)
>
> str(dataf) # "Day 1"<"Day 2"<"Day 3"
> #'data.frame':   18 obs. of  2 variables:
> # $ y  : num  72 25 24 2 18 38 62 30 78 34 ...
> # $ Day: Ord.factor w/ 3 levels "Day 1"<"Day 2"<..: 1 1 1 1 1 1 2 2 2 2 ...
>
> summary(lm(y~Day,data=dataf)) #Significances reversed (or "Day.L" and
> "Day.Q" are not synonymous with "Day 2" and "Day 3"?): Day 2 (".L") is
> significantly different from Day 1, but Day 3 (.Q) isn't.
>
> #Call:
> #lm(formula = y ~ Day, data = dataf)
> #
> #Residuals:
> #    Min      1Q  Median      3Q     Max
> #-39.833 -14.458  -3.833  13.958  42.167
> #
> #Coefficients:
> #            Estimate Std. Error t value Pr(>|t|)
> #(Intercept)  48.4444     5.6322   8.601 3.49e-07 ***
> #Day.L        26.1630     9.7553   2.682   0.0171 *
> #Day.Q        -0.2722     9.7553  -0.028   0.9781
> #---
> #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> #
> #Residual standard error: 23.9 on 15 degrees of freedom
> #Multiple R-squared: 0.3241,     Adjusted R-squared: 0.234
> #F-statistic: 3.597 on 2 and 15 DF,  p-value: 0.05297
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tinn-R

2011-10-05 Thread Paul Johnson
On Tue, Oct 4, 2011 at 1:25 PM, Charles McClure  wrote:
> I am new to R and have recently tried Tinn-R with very mixed and unexpected
> results.  Can you point me to a Tinn-R tutorial on the web or a decent
> reference book?
>

In my experience, TINN-R does not work so well, and most new users are
recommended to try instead Notepad++ with the addon components
R2notepad++ or Rstudio.  I have MS Windows setup tips here

http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Windows:AdminTips

Until I see evidence otherwise, I'm concluding that TINN-R was the
best in 2008, but it is harder to configure now and there's no reason
to prefer it over Notepad++.  "Real Men"[tm] still use Emacs, but new
users may not have enough muscles :)

pj

> Thank you for your help;
>
> Charles McClure

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Running a GMM Estimation on dynamic Panel Model using plm-Package

2011-10-05 Thread Paul Johnson
On Sun, Jun 12, 2011 at 2:43 PM, bstudent  wrote:
> Hello,
>
> although I searched for a solution related to my problem I didn´t find one,
> yet. My skills in R aren´t very large, however.
> For my Diploma thesis I need to run a GMM estimation on a dynamic panel
> model using the "pgmm" - function in the plm-Package.
>
> The model I want to estimate is: "Y(t) = Y(t-1) + X1(t) + X2(t) + X3(t)" .
>
> There are no "normal" instruments in this model. There just should be the
> "gmm-instruments" I need for the model.
> In order to estimate it, I tried the following code:
>
>>
>> library(plm)
>>
>> test <- pgmm(Y ~ lag(Y, 1) + X1 + X2 + X3 | lag(Y, 1), data=Model,
>> effect="individual", model="onestep")
>>
>>
>
> I tried "Model" as "Modelp <- pdata.frame(..." and as "Model <-
> read.table(..." but in both cases there´s an error-massage:
>
> Error in solve.default(Reduce("+", A2)) :
>  system is computationally singular: reciprocal condition number =
> 4.08048e-22
>

Hello,

I have students working on similar problems.  Here is what I would say to them:

Without a dataset and code that is supposed to work, nobody can figure
out what's wrong and help you around it.

2 suggestions

1. directly contact Yves Croissant, the plm principal author, and
give him your R code and the data set. Show him the error output you
get.  Here's the contact information:   Yves Croissant


If he answers, please let us know.

If you don't want to (or can't) give real data, make some up that
causes the same crash.

2. post in here a link to your data and the full code and I will try
to debug it to at least find out where this is going wrong.  I've been
studying debugging with R functions and this is a good  opportunity
for me.

I stopped focusing on panel estimator details in 2000, so I'm rusty,
but will probably recognize most of what is going on.  If you don't
want to broadcast this to everybody, you can feel free to contact me
directly, pauljohn at ku.edu is my university address.

PJ

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Testing for arguments in a function

2011-09-30 Thread Paul Johnson
Just for the record, following Bill Dunlap's advice, I think this is
the best answer to the question as originally posed.


myfun <- function(vec, i=stop("'i' must be supplied")){
vec[i]
}

> myfun(1:40,10)
[1] 10
> myfun(1:10)
Error in myfun(1:10) : 'i' must be supplied
>

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Testing for arguments in a function

2011-09-28 Thread Paul Johnson
On Mon, Sep 26, 2011 at 2:39 PM, Gene Leynes  wrote:
> I don't understand how this function can subset by i when i is missing
>
> ## My function:
> myfun = function(vec, i){
>    ret = vec[i]
>    ret
> }
>
> ## My data:
> i = 10
> vec = 1:100
>
> ## Expected input and behavior:
> myfun(vec, i)
>
> ## Missing an argument, but error is not caught!
> ## How is subsetting even possible here???
> myfun(vec)
>

Hello, Gene:

It seems to me the discussion of your question launches off into a
more abstract direction than we need here.  I've found it is wise to
name arguments to functions differently than variables in the
environment, so you don't have the function looking for i outside
itself.  And you can put each variable to a ridiculous default to
force an error when that option is not given.  "NAN" is nothing here,
just a string that has no meaning, so we get an error as you said you
wanted.


myfun <- function(ivec, ii=NAN){
ivec[ii]
}
myfun(1:40,10)

works

myfun(1:40)

Produces

Error in myfun(1:40) : object 'NAN' not found

If you are happy enough to just plop out an error, there's no need to worry.

Note the function can be written more succinctly than you originally
had it, and you are generally advised to use "<-" rather than "=" by
the R developers.
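
For completeness (not part of the original exchange), base R's
missing() is the other common idiom for catching a forgotten argument:

myfun <- function(ivec, ii) {
    if (missing(ii)) stop("'ii' must be supplied")
    ivec[ii]
}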




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] seeking advice about rounding error and %%

2011-08-13 Thread Paul Johnson
A client came into our consulting center with some data that had been
damaged by somebody who opened it in MS Excel.  The columns were
supposed to be integer valued, 0 through 5, but some of the values
were mysteriously damaged. There were scores like 1.18329322 and such
in there.  Until he tracks down the original data and finds out what
went wrong, he wants to take all fractional valued scores and convert
to NA.

As a quick hack, I suggest an approach using %%

> x <- c(1,2,3,1.1,2.12131, 2.001)
> x %% 1
[1] 0.0 0.0 0.0 0.1 0.12131 0.00100
> which(x %% 1 > 0)
[1] 4 5 6
> xbad <- which(x %% 1 > 0)
>  x[xbad] <- NA
>  x
[1]  1  2  3 NA NA NA

I worry about whether x %% 1 may ever return a non zero result for an
integer because of rounding error.

Is there a recommended approach?

What about zapsmall on the left, but what on the right of >?

which( zapsmall(x %% 1) >  0 )
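
One tolerance-based variant, offered for later readers along the lines
of what all.equal() does rather than as an official recommendation:

tol <- sqrt(.Machine$double.eps)
xbad <- which(abs(x - round(x)) > tol)
x[xbad] <- NA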


Thanks in advance

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fit.mult.impute() in Hmisc

2011-08-13 Thread Paul Johnson
On Thu, Mar 31, 2011 at 2:56 PM, Yuelin Li  wrote:
> I tried multiple imputation with aregImpute() and
> fit.mult.impute() in Hmisc 3.8-3 (June 2010) and R-2.12.1.
>
> The warning message below suggests that summary(f) of
> fit.mult.impute() would only use the last imputed data set.
> Thus, the whole imputation process is ignored.
>
>  "Not using a Design fitting function; summary(fit)
>   will use standard errors, t, P from last imputation only.
>   Use vcov(fit) to get the correct covariance matrix,
>   sqrt(diag(vcov(fit))) to get s.e."
>

Hello.  I fiddled around with rms & multiple imputation when I was
preparing these notes from our R summer course. I ran into that same
thing you did, and my conclusion is slightly different from yours.

http://pj.freefaculty.org/guides/Rcourse/multipleImputation/multipleImputation-1-lecture.pdf

Look down to slide 80 or so, where I launch off into that question.
It appears to me that aregImpute will give the "right answer" for
fitters from rms, but if you want to feel confident about the results
for other fitters, you should use mitools or some other parameter
combining approach. My conclusion (slide 105) is

"Please note: the standard errors in the output based on lrm match
the std.errors estimated by MItools. Thus I conclude
sqrt(diag(cov(fit.mult.impute.object))) did not give correct results"




> But the standard errors in summary(f) agree with the values
> from sqrt(diag(vcov(f))) to the 4th decimal point.  It would
> seem that summary(f) actually adjusts for multiple
> imputation?
>
> Does summary(f) in Hmisc 3.8-3 actually adjust for MI?
>
> If it does not adjust for MI, then how do I get the
> MI-adjusted coefficients and standard errors?
>
> I can't seem to find answers in the documentations, including
> rereading section 8.10 of the Harrell (2001) book  Googling
> located a thread in R-help back in 2003, which seemed dated.
> Many thanks in advance for the help,
>
> Yuelin.
> http://idecide.mskcc.org
> ---
>> library(Hmisc)
> Loading required package: survival
> Loading required package: splines
>> data(kyphosis, package = "rpart")
>> kp <- lapply(kyphosis, function(x)
> +       { is.na(x) <- sample(1:length(x), size = 10); x })
>> kp <- data.frame(kp)
>> kp$kyp <- kp$Kyphosis == "present"
>> set.seed(7)
>> imp <- aregImpute( ~ kyp + Age + Start + Number, dat = kp, n.impute = 10,
> +                      type = "pmm", match = "closest")
> Iteration 13
>> f <- fit.mult.impute(kyp ~ Age + Start + Number, fitter=glm, xtrans=imp,
> +                 family = "binomial", data = kp)
>
> Variance Inflation Factors Due to Imputation:
>
> (Intercept)         Age       Start      Number
>       1.06        1.28        1.17        1.12
>
> Rate of Missing Information:
>
> (Intercept)         Age       Start      Number
>       0.06        0.22        0.14        0.10
>
> d.f. for t-distribution for Tests of Single Coefficients:
>
> (Intercept)         Age       Start      Number
>    2533.47      193.45      435.79      830.08
>
> The following fit components were averaged over the 10 model fits:
>
>  fitted.values linear.predictors
>
> Warning message:
> In fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp,  :
>  Not using a Design fitting function; summary(fit) will use
> standard errors, t, P from last imputation only.  Use vcov(fit) to get the
> correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e.
>
>
>> f
>
> Call:  fitter(formula = formula, family = "binomial", data = completed.data)
>
> Coefficients:
> (Intercept)          Age        Start       Number
>    -3.6971       0.0118      -0.1979       0.6937
>
> Degrees of Freedom: 80 Total (i.e. Null);  77 Residual
> Null Deviance:      80.5
> Residual Deviance: 58   AIC: 66
>> sqrt(diag(vcov(f)))
> (Intercept)         Age       Start      Number
>  1.5444782   0.0063984   0.0652068   0.2454408
>> -0.1979/0.0652068
> [1] -3.0350
>> summary(f)
>
> Call:
> fitter(formula = formula, family = "binomial", data = completed.data)
>
> Deviance Residuals:
>   Min      1Q  Median      3Q     Max
> -1.240  -0.618  -0.288  -0.109   2.409
>
> Coefficients:
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -3.6971     1.5445   -2.39   0.0167
> Age           0.0118     0.0064    1.85   0.0649
> Start        -0.1979     0.0652   -3.03   0.0024
> Number        0.6937     0.2454    2.83   0.0047
>
> (Dispersion parameter for binomial family taken to be 1)
>
>    Null deviance: 80.508  on 80  degrees of freedom
> Residual deviance: 57.965  on 77  degrees of freedom
> AIC: 65.97
>
> Number of Fisher Scoring iterations: 5
>
>
>     =
>
>     Please note that this e-mail and any files transmitted with it may be
>     privileged, confidential, and protected from disclosure under
>     applicable law. If the reader of this message is not the intended
>     recipient, or an employee or agent responsible

[R] Question re the plotrix package

2011-07-27 Thread Paul Johnson

Dear list,

I am using the  clock24.plot command in this excellent package to plot animal 
activity data. 

Does anyone know if both symbols and a line can be plotted on the same plot to 
show both raw data (symbols) and a line (describing a statistical model of the 
pattern)? Or if more than one line, etc., can be plotted?

Thanks

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] graphics: 3D regression plane

2011-04-27 Thread Paul Johnson
Hi. Comments below

On Wed, Apr 27, 2011 at 2:32 AM, agent dunham  wrote:
> Hi, thanks, I think I've changed the previous as you told me but I'm having
> this error, what does it mean?
>
>
> model<- lm(log(v1)~log(v2)+v3, data=dat)
>
> newax<- expand.grid(
>    v2 = seq(min(log(dat$v2)), max(log(dat$v2)), length=100),
>    v3= seq(min(dat$v3), max(dat$v3), length=100))
>
> fit <- predict(model,newax)
>
> graph <- regr2.plot(x=seq(min(log(dat$v2)), max(log(dat$v2)), length=100),
>                 y=seq(min(dat$v3), max(dat$v3), length=100),
>                 z= matrix(fit, 100, 100), main="Least-squares",
>             resid.plot= TRUE, plot.base.points= TRUE, expand=0.5,
> ticktype="detailed", theta=-45)
>
>
> Error in cbind(x, y, z, 1) %*% pmat : incompatible arguments
>
> Thanks again, u...@host.com
>

I've struggled with these 3d things before.  You should supply us the
full testable program to fix.  Here I've fiddled around and can get a
plot from persp or regr2.plot, but I understand it is not informative.
 But the plot does run, and maybe you will see how to fix.

Generally, I would suggest you do the separate work in separate steps,
not all jammed together inside regr2.plot.  If you do each bit
separately, then you can inspect the result and be sure it is good.

See:

##PJ 2011-04-27
## testing to address question in r-help on persp and regr2.plot

v1 <- rnorm(100,m=100,s=10)
v2 <- rnorm(100, m=1000, s=100)
v3 <- rnorm(100)
dat <- data.frame(v1,v2,v3)
rm(v1,v2,v3)
model<- lm(log(v1)~log(v2)+v3, data=dat)


rv2 <- range(dat$v2)
plv2 <- seq(rv2[1], rv2[2], length.out=100)

rv3 <- range(dat$v3)
plv3 <- seq(rv3[1], rv3[2], length.out=100)

newax<- expand.grid(
   v2 = plv2,
   v3= plv3)

fit <- predict(model,newax)

zmat <- matrix(fit,100,100)

persp(x=plv2, y=plv3, z=zmat)



require(HH)
graph <- regr2.plot(x= plv2,
y=plv3,
z=zmat, main="Least-squares",
resid.plot= TRUE, plot.base.points= TRUE, expand=0.5,
ticktype="detailed", theta=-45)









> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boostrap an nls regression

2011-02-05 Thread Paul Johnson
Hello!

On Thu, Feb 3, 2011 at 6:27 AM, m234  wrote:
>
> II functional response for 2 data sets:
>
> nls(eaten~(a*suppl)/(1+a*h*suppl)
>
> where eaten is the number of prey eaten by a predator and suppl is the
> number of prey initially supplied to the same predator.
>
> I have parameter estimates of 'a' and 'h' for the two populations studied
> and would like to know if there is a significant different in the estimates
> between the two i.e. a1 vs a2, h1 vs h2. I would like to bootstap the data
> to get multiple (~1000) estimates and compare then via a ttest or
> equivalent. Is it possible to do this and obtain multiple estimations which

Hi,

Please read the posting guide--a complete example of your code a link
to a data frame will get you more helpful answers.

I can tell you how to do bootstrap re-sampling on a model that is
estimated from one data set. I wrote up some notes on that last summer
(go to middle of this
http://pj.freefaculty.org/R/SummerCamp2010/PJ-Lectures/functions1.pdf).
I have other intro R material floating about under the R part of that
link.
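
In case that link goes stale, here is a bare-bones sketch of what I mean by
case resampling for a single nls fit.  The data, noise level, and start
values below are all made up, so treat it as an outline, not a recipe:

set.seed(1)
dat <- data.frame(suppl = rep(c(5, 10, 20, 40, 80), each = 10))
dat$eaten <- with(dat, (0.5 * suppl)/(1 + 0.5 * 0.1 * suppl)) + rnorm(50, 0, 0.5)
m0 <- nls(eaten ~ (a * suppl)/(1 + a * h * suppl), data = dat,
          start = list(a = 0.5, h = 0.1))
B <- 1000
boot.a <- rep(NA, B)
for (b in 1:B) {
  idx <- sample(nrow(dat), replace = TRUE)       ## resample rows with replacement
  bfit <- try(nls(eaten ~ (a * suppl)/(1 + a * h * suppl),
                  data = dat[idx, ], start = coef(m0)), silent = TRUE)
  if (!inherits(bfit, "try-error")) boot.a[b] <- coef(bfit)["a"]
}
quantile(boot.a, c(0.025, 0.975), na.rm = TRUE)  ## rough interval for a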

But I'm a bit stumped by your question for statistical reasons--
nothing to do with R.  From a statistical point of view, how do you
test the difference between coefficients from 2 different fitted nls
models, which are based on separate data samples?  I don't think
that's a trivial stat question, even if you have large samples and you
only need to calculate that estimated difference one time.

Then I wonder, how would a person do re-sampling when there are 2 data
sets involved. If you know of a cite on that, I'd like to see it.

One avenue I'd consider is to stack the 2 data sets together and
rewrite my nls to estimate one equation with indicator variables
(dummy variables) to separate the estimates for the 2 separate
equations. But there would be some "pooling" tests you'd have to run
first, to justify the idea that the 2 sets of data belong in the same
analysis in the first place.

Know what I mean? Suppose eaten is one long column for both sets
combined, and you create suppl1 and suppl2 that are 0 for the "other"
data set's cases and equal to suppl for the right one.

Fit this:

combmod <-  nls(eaten~(a1*suppl1)/(1+a1*h1*suppl1) +
(a2*suppl2)/(1+a2*h2*suppl2))

This would conceivably allow comparison of a1 and a2. I think. I'm
trying to remember sampling theory on nls.
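
To make the bookkeeping concrete, here is a sketch.  dat1 and dat2 below are
simulated stand-ins for your two data sets, and the start values are pure
guesses, so adapt as needed:

mk <- function(a, h) {
  suppl <- rep(c(5, 10, 20, 40, 80), each = 10)
  data.frame(suppl = suppl,
             eaten = (a * suppl)/(1 + a * h * suppl) + rnorm(length(suppl), 0, 0.5))
}
dat1 <- mk(0.6, 0.08)      ## stand-in for population 1
dat2 <- mk(0.4, 0.12)      ## stand-in for population 2
dat1$grp <- 1; dat2$grp <- 2
both <- rbind(dat1, dat2)
both$suppl1 <- ifelse(both$grp == 1, both$suppl, 0)
both$suppl2 <- ifelse(both$grp == 2, both$suppl, 0)
combmod <- nls(eaten ~ (a1 * suppl1)/(1 + a1 * h1 * suppl1) +
                       (a2 * suppl2)/(1 + a2 * h2 * suppl2),
               data = both,
               start = list(a1 = 0.5, h1 = 0.1, a2 = 0.5, h2 = 0.1))
summary(combmod)           ## look at a1 vs a2, h1 vs h2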

Well, in summary, I think you've got a harder stat problem than an R
problem.  If you write out the code you use for a whole
exercise to do this 1 time, we might see what to do.  But remember to
post the full working example--as much as you have, anyway.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] different results in MASS's mca and SAS's corresp

2011-02-05 Thread Paul Johnson
On Sat, Feb 5, 2011 at 9:19 AM, David Winsemius  wrote:
>
> On Feb 4, 2011, at 7:06 PM, Gong-Yi Liao wrote:
>
>> Dear list:
>>
>>  I have tried MASS's mca function and SAS's PROC corresp on the
>>  farms data (included in MASS, also used as mca's example), the
>>  results are different:
>>
>>  R: mca(farms)$rs:
>>             1             2
>> 1   0.059296637  0.0455871427
>> 2   0.043077902 -0.0354728795
>> 3   0.059834286  0.0730485572
>> 4   0.059834286  0.0730485572
[snip]
>>
>>     And in SAS's corresp output:
>>
>>                               Row Coordinates
>>
>>                                       Dim1       Dim2
>>
>>                         1           1.0607    -0.8155
>>                         2           0.7706     0.6346
>>                         3           1.0703    -1.3067
>>                         4           1.0703    -1.3067
>>                         5           0.2308     0.9000
[snip]
>>       Is MASS's mca developed with different definition to SAS's
>>       corresp ?
>
> No, it's just that the values can only be defined up to a scaling factor
> (the same situation as with eigenvector decompostion). Take a look at the
> two dimensions, when each is put on the same scale:
>
>> cbind(scale(rmca$D1),scale(smca$Dim1) )
>            [,1]        [,2]
>  [1,]  1.2824421  1.28242560
>  [2,]  0.9316703  0.93168561
>  [3,]  1.2940701  1.29403231
>  [4,]  1.2940701  1.29403231
>  [5,]  0.2789996  0.27905048

>> cbind(scale(rmca$D2),scale(smca$Dim2) )
>             [,1]        [,2]
>  [1,]  1.06673426 -1.06677626
>  [2,] -0.83006158  0.83012474
>  [3,]  1.70932841 -1.70932351
>  [4,]  1.70932841 -1.70932351
>  [5,] -1.17729983  1.17729909
>
> David Winsemius, MD
> West Hartford, CT


When I came to David's comment, I understood the theory, but not the
numbers in his answer.  I wanted to see the MASS mca answers "match
up" with SAS, and the example did not (yet).

Below, see that if you scale the mca output and then multiply column 1
of the scaled results by 0.827094, you DO reproduce the SAS
column 1 results exactly.  Just rescale item 1 in mca's first column
to match the SAS output.  Repeat the same with column 2, multiplying by
-0.7644828, and you reproduce column 2.

> rmca <- mca(farms)
> scalermca <- scale(rmca$rs)
> scalermca[1,]
       1        2
1.282442 1.066734
> 1.0607/1.282442
[1] 0.827094
> -0.8155/1.06673426
[1] -0.7644828
> cbind(scalermca[,1] * 0.827094, scalermca[,2] *  -0.7644828)
          [,1]        [,2]
1   1.06070017 -0.8154
2   0.77057891  0.63456780
3   1.07031764 -1.30675217
4   1.07031764 -1.30675217
5   0.23075886  0.90002547
6   0.6943  0.60993995
7   0.10530240  0.78445402
8  -0.27026650  0.44225049
9   0.13426089  1.15670532
10  0.11861965  0.64778456
11  0.23807570  1.21775202
12  1.01156703 -0.01927226
13  0.28051938 -0.59805897
14 -1.17343686 -0.27122981
15 -0.83838041 -0.64003061
16 -0.05453708 -0.22925816
17 -0.91732401 -0.49899374
18 -0.92694148 -0.00774156
19 -1.30251038 -0.34994509
20 -1.30251038 -0.34994509

So, that does reproduce SAS exactly.  And I'm a little frustrated I
can't remember the matrix command to get that multiplication done
without cbinding the 2 columns together that way.
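
(Answering my own aside: I think the one-shot version I was groping for is
either sweep() or multiplication by a diagonal matrix, using the two
multipliers derived above.)

sweep(scalermca, 2, c(0.827094, -0.7644828), "*")
## or, equivalently,
scalermca %*% diag(c(0.827094, -0.7644828))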

Question: I don't use mca, but to people who do, how are results
"supposed" to be scaled?  Is there a "community accepted method" or is
every user on his/her own to fiddle up the numbers however?

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] very basic HLM question

2011-02-05 Thread Paul Johnson
2011/2/5 Sebastián Daza :
> Hi everyone,
>
> I need to get a between-component variance (e.g. random effects Anova), but
> using lmer I don't get the same results (variance component) than using
> random effects Anova. I am using a database of students, clustered on
> schools (there is not the same number of students by school).
>
> According to the ICC1 command, the interclass correlation is .44
>
>> ICC1(anova1)
> [1] 0.4414491

If you don't tell us exactly what model you are calculating in
"anova1", how would we guess if there is something wrong?

Similarly, I get this
> ICC1
Error: object 'ICC1' not found

so it must mean you've loaded a package or written a function, which
you've not shown us.

I googled my way to a package called "multilevel" that has ICC1, and
its code for ICC1 shows a formula that does not match the one you used
to calculate ICC from lmer.

function (object)
{
MOD <- summary(object)
MSB <- MOD[[1]][1, 3]
MSW <- MOD[[1]][2, 3]
GSIZE <- (MOD[[1]][2, 1] + (MOD[[1]][1, 1] + 1))/(MOD[[1]][1, 1] + 1)
OUT <- (MSB - MSW)/(MSB + ((GSIZE - 1) * MSW))
return(OUT)
}

I'm not saying that's right or wrong, just not obviously identical to
the formula you proposed.

>
> However, I cannot get the same ICC from the lmer output:
>
>> anova2 <- lmer(math ~ 1 + (1|schoolid), data=nels88)
>> summary(anova2 <- lmer(math ~ 1 + (1|schoolid), data=nels88))
>

Instead, do this (same thing, fits model only once):

> anova2 <- lmer(math ~ 1 + (1|schoolid), data=nels88)
> summary(anova2)

Note that lmer is going to estimate a normally distributed random
effect for each school, as well as an individual observation random
effect (usual error term) that is assumed independent of the
school-level effect.  What is "anova1" estimating?
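
If what you want is the ICC implied by the lmer fit itself, something along
these lines should do it.  I am going from memory of lme4's VarCorr
structure here, so double-check the pieces:

library(lme4)
anova2 <- lmer(math ~ 1 + (1 | schoolid), data = nels88)
vc <- VarCorr(anova2)
school.var <- as.numeric(vc$schoolid)    ## between-school variance component
resid.var  <- attr(vc, "sc")^2           ## residual (within-school) variance
school.var / (school.var + resid.var)    ## ICC from the mixed model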


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems with "pdf" device using "plot" "glht" function on "multcomp" library.

2010-09-18 Thread Paul Johnson
Hi, Kenneth

It is not clear if you mean that your pdf output usually works, but it
does not in this special case, or that this is a first effort with
pdf.  The answer might depend on which is the case.

If you are just getting started, can I refer you to some lecture notes
I made about saving graphs in R?

http://pj.freefaculty.org/R/SummerCamp2010/PJ-Lectures/plotting2.pdf

If you cut off the file name at the end,  you will see the list of the
other lectures I prepared for our university's "Summer Stats Camp" in
2010.

If pdf does usually work, could I suggest you take this code and
rearrange it? Where you have this:

pdf("plot1.pdf")
m1<-aov(Nitratos~Descripcion-1,data=Sx)
vect1<-table(Sx$Descripcion)
K<-contrMat(vect1,base=4)
dnk<-glht(m1,linfct=K)
summary(dnk)

old.par<-par(no.readonly = TRUE)
par(mai=c(1,2,1.25,1),mgp=c(3,1,0))
print(plot(dnk,las=1,xlab=""))
print(abline(v=0,lty=2))
par(old.par)

dev.off()



Instead, separate the "stats" part from the plot part


m1 <- aov(Nitratos~Descripcion-1, data=Sx)
vect1 <- table(Sx$Descripcion)
K <- contrMat(vect1, base=4)
dnk <- glht(m1, linfct=K)
summary(dnk)

pdf("plot1.pdf", onefile=F, paper="special", height=6, width=6)
### old.par<-par(no.readonly = TRUE)
par(mai = c(1, 2, 1.25, 1), mgp = c(3,1,0))
plot(dnk, las = 1, xlab = "")
abline(v = 0, lty = 2)
### par(old.par)   ## not needed; see the note about par() below
dev.off()

I've made changes to cut your "print" statements and moved your pdf
command down to "sandwich" the plot commands.  I've added some
embellishment which I almost guarantee will make your pdf output more
useful.  (Spaces are inserted around <- and after commas for
readability.  Check the R examples, they generally recommend that).

To the R experts, it may be that the "par" stuff seems obvious, but to
the rest of us, well, it is not completely so.  You do not need to
save the par values because the par setting you change is instantly
forgotten when you do dev.off().   So you don't need to worry about
saving the default settings and then returning them.  Run par() to see
for yourself after dev.off().  par's defaults will be back where they
were at the beginning.
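
A quick way to convince yourself (the file name is a throwaway):

pdf("scratch.pdf")
par("mai")                        ## device default margins
par(mai = c(1, 2, 1.25, 1))
par("mai")                        ## changed, but only for this device
dev.off()
par("mai")                        ## a fresh device: back to the defaults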

You would only need to do "remember my par" thing if you were trying
to put several graphs into a single output device, which you aren't.
And I've made that clear by putting onefile=F in the pdf statement.

I'd try it without the "par" at all if there is still trouble in the
output file.

I added the paper="special" option in the pdf command in order to make
the margins in the output more reasonable. If you don't do that, the
pdf output assumes you are wanting a whole sheet of paper, so there
will be some massive margins in your output.

Good luck.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coxph and ordinal variables?

2010-09-10 Thread Paul Johnson
Hi, everybody

>On Wed, Sep 8, 2010 at 5:43 PM, Min-Han Tan wrote:


David said my R code text attachment got rejected by the mailing list.

Pooh.   I don't think that's nice.  I don't see anything in the
posting guide about a limit on text attachments.

Well, if you are still trying to understand what 'orthogonal
polynomial' means, I suggest you run the following through.  I thought
it was an
enlightening experience.

# Paul Johnson  Nov. 16, 2005

# Ordinal predictors with a small number of possible values

# Here is R code and commentary about ordinal predictors.
# Please let me know if you have insights to explain this more clearly.

set.seed(19)

sampleSize <- 100
subgroupSize <- sampleSize/5
# One of those "5 point opinion" questions: Strongly Disagree...
# Strongly agree with values assigned 1,3,5,7,9
surveyQuestion <-
c(rep(1,subgroupSize),rep(3,subgroupSize),rep(5,subgroupSize),rep(7,subgroupSize),rep(9,subgroupSize))

### Make this an unordered factor
myufac <- factor(surveyQuestion)
### Study the contrasts:
contrasts(myufac)

### Make an ordered  factor
myofac <- ordered(surveyQuestion)
contrasts(myofac)

# another independent variable
x <- rnorm(sampleSize)
# a random error
e <- rnorm(sampleSize)


# First, suppose the output data is really just a
# linear reflection of the surveyQuestion. It is created
# by multiplying against the evenly spaced values
# observed in the survey

y1 <- -5 +  4*surveyQuestion + 6*x + 10 * e


mod0 <- lm(y1~x + surveyQuestion)
summary(mod0)

# Question: are you making a mistake by just "throwing"
# surveyQuestion into the analysis? You should not be
# making a mistake, because you (luckily) guessed the right model

# You might check by running the model with the unordered factor,
# which amounts to running "dummy variables"

mod1 <- lm(y1~x + myufac)
summary(mod1)

# or with the ordered factor, which estimates the orthogonal
# polynomials

mod2 <- lm(y1~x + myofac)
summary(mod2)

# Does either of those models appear to be "better" than
# the original mod0?

# If I did not know for sure the factor was ordered, I'd
# be stuck with the treatment contrasts in mod1.  And what
# I would really like to know is this: are the coefficients
# in mod1 "stepping up" in a clear, ordinal, evenly spaced pattern?

# Since we know the data is created with a coefficient of 4
# we would expect that the coefficients would "step up" like this
# myufac3   8
# myufac5   16
# myufac7   24
# myufac9   32

# See why we expect this pattern? The intercept "gobbles up" myufac1,
# so each "impact" of the surveyQuestion has to be reduced by 4 units.
# The impact of myufac3, which you expect might be 3*4=12, is reduced
# to 3*4 - 4 = 8, and so forth.

# But that does not happen with a smallish sample. You can run this
# code a few times. It appears to me that a sample of more than
# 10,000 is necessary to get any faithful reproduction of the "steps"
# between values of surveyQuestion.

# Question: would we mistakenly think that the uneveness observed in
# summary(mod1) is evidence of the need to treat surveyQuestion as a
# nominal factor, even though we know (since we created the data) that
# we might as well just throw surveyQuestion in as a single variable?
# How to decide?

# We need a hypothesis test of the conjecture that
# the coefficient estimates in mod1 fall "along a line"
# i.e., myufac-i = (b2 * i) - b2

# I believe that the correct test results from this command:

anova(mod0, mod1, test="Chisq")

# If that is significant, it means you are losing predictive
# power by throwing in surveyQuestion as if it were a numerical
# variable.



# Now, what if we are sure that the data gathered in surveyQuestion is
# really ordinal, and so we estimate mod2.

# What do those parameters mean? Here I'm just reasoning/guessing
# because I cannot find any complete idiot's guide to orthogonal
# polynomials and their use in regression and/or R.

# Take a look at the contrasts themselves
# > ord1 <- contrasts(myofac)
# ord1
#              .L         .Q            .C         ^4
# 1 -6.324555e-01  0.5345225 -3.162278e-01  0.1195229
# 3 -3.162278e-01 -0.2672612  6.324555e-01 -0.4780914
# 5 -3.287978e-17 -0.5345225  1.595204e-16  0.7171372
# 7  3.162278e-01 -0.2672612 -6.324555e-01 -0.4780914
# 9  6.324555e-01  0.5345225  3.162278e-01  0.1195229

# What does this mean?  I believe: we act "as though" there are 4
# independent variables in the regression, L, Q, C, and ^4, and for
# each respondent in the dataset, we choose a row of these numbers
# to act as independent variables.  A person who
# answered 1 on the survey would have the input values (-.63, .53,
# -.32, .12) for those four variables.

# This begs the question, what are L, Q, C, and ^4?

### This is tough to explain.
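# (An aside you can check directly: the matrix printed above is just
# contr.poly(5).  Its columns are orthonormal polynomial contrasts:
# crossprod() of the matrix is the identity, and every column sums to 0.)
round(crossprod(contr.poly(5)), 10)
round(colSums(contr.poly(5)), 10)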

# Background Recall 

[R] Fast / dependable way to "stack together" data frames from a list

2010-09-08 Thread Paul Johnson
Hi, everybody:

I asked about this in r-help last week and promised
a summary of answers. Special thanks to the folks
that helped me understand do.call and pointed me
toward plyr.

We face this problem all the time. A procedure
generates a list of data frames. How to stack them
together?

The short answer is that the plyr package's rbind.fill
method is probably the fastest method that is not
prone to trouble and does not require much user caution.

result <- rbind.fill(mylist)

A slower alternative that also works is

result <-  do.call("rbind", mylist)

That is always available in R and it works well enough, even
though it is not quite as fast. Both of these are much faster than
a loop that repeatedly applies "rbind".

Truly blazing speed can be found if we convert this into
matrices, but that is not possible if the list actually
contains data frames.

I've run this quite a few times, and the relative speed of the
different approaches has never differed much.

If you run this, I hope you will feel smarter, as I do!
:)


## stackListItems.R
## Paul Johnson 
## 2010-09-07

## Here is a test case

df1 <- data.frame(x=rnorm(100),y=rnorm(100))
df2 <- data.frame(x=rnorm(100),y=rnorm(100))
df3 <- data.frame(x=rnorm(100),y=rnorm(100))
df4 <- data.frame(x=rnorm(100),y=rnorm(100))

mylist <-  list(df1, df2, df3, df4)

## Here's the way we have done it. We understand this,
## we believe the result, it is easy to remember. It is
## also horribly slow for a long list.

resultDF <- mylist[[1]]
for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])


## It works better to just call rbind once, as in:

resultDF2 <- rbind( mylist[[1]],mylist[[2]],mylist[[3]],mylist[[4]])


## That is faster because it calls rbind only once.

## But who wants to do all of that typing? How tiresome.
## Thanks to Erik Iverson in r-help, I understand that

resultDF3 <- do.call("rbind", mylist)

## is doing the EXACT same thing.
## Erik explained that "do.call( "rbind", mylist)"
## is *constructing* a function call from the list of arguments.
## It is shorthand for "rbind(mylist[[1]], mylist[[2]], mylist[[3]])"
## assuming mylist has 3 elements.

## Check the result:
all.equal( resultDF2, resultDF3)

## You often see people claim it is fast to allocate all
## of the required space in one shot and then fill it in.
## I got this algorithm from code in the
## "complete" function in the "mice" package.
## It allocates a big matrix of 0's and
## then it places the individual data frames into that matrix.

m <- 4
nr <- nrow(df1)
nc <- ncol(df1)
resultDF4 <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
for (j in  1:m) resultDF4[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]

## This is a bit error prone for my taste. If the data frames have
## different numbers of rows, some major code surgery will be needed.

##
## Dennis Murphy pointed out the plyr package, by Hadley Wickham.
## Dennis said " ldply() in the plyr package. The following is the same
## idea as do.call(rbind, l), only faster."

library("plyr")
resultDF5  <- ldply(mylist, rbind)
all.equal(resultDF, resultDF5)



## Plyr author Hadley Wickham followed up with "I think all you want
## here is rbind.fill:"

resultDF6 <- rbind.fill(mylist)
all.equal(resultDF, resultDF6)


## Gabor Grothendieck noted that if the elements in mylist were
## matrices, this would all work faster.

mylist2 <- lapply(mylist, as.matrix)

matrixDoCall <-  do.call("rbind", mylist2)

all.equal(as.data.frame(matrixDoCall), resultDF)


## Gabor also showed a better way than 'system.time' to find out how
## long this takes on average using the rbenchmark package. Awesome!

#> library(rbenchmark)
#> benchmark(
#+ df = do.call("rbind", mylist),
#+ mat = do.call("rbind", L),
#+ order = "relative", replications = 250
#+ )



## To see the potentially HUGE impact of these changes, we need to
## make a bigger test case. I just used system.time to evaluate, but
## if this involved a close call, I'd use rbenchmark.

phony <- function(i){
  data.frame(w=rnorm(1000), x=rnorm(1000),y=rnorm(1000),z=rnorm(1000))
}
mylist <- lapply(1:1000, phony)




### First, try my usual way
resultDF <- mylist[[1]]
system.time(
for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
   )
## wow, that's slow:
## user  system elapsed
## 168.040   4.770 173.028


### Now do.call method:
system.time( resultDF3 <- do.call("rbind", mylist) )
all.equal(resultDF, resultDF3)

## Faster! Takes one-twelfth as long
##   user  system elapsed
##  14.64    0.85   15.49


### Third, my adaptation of the complete function in the mice
### package:
m <- length(mylist)
nr <- nrow(mylist[[1]])
nc <- ncol(mylist[[1]])

system.time(
   resultDF4 <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
)

Re: [R] coxph and ordinal variables?

2010-09-08 Thread Paul Johnson
run it with factor() instead of ordered().  You don't want the
"orthogonal polynomial" contrasts that result from ordered if you need
to compare against Stata.

I attach an R program that I wrote to explore ordered factors a while
ago. I believe this will clear everything up if you study the
examples.
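
Something like this is what I have in mind.  It is untested, since I don't
have your data; 'yourdata' is just a placeholder for the data frame name:

library(survival)
yourdata$grade <- factor(yourdata$Hist_Grade_4grp)   ## treatment contrasts, grade 1 as baseline
summary(coxph(Surv(FFR_month, FFR_censor) ~ grade,
              data = yourdata, method = "efron"))
## the three grade coefficients should then line up with Stata's _IHist_Grad_2..4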

pj

On Wed, Sep 8, 2010 at 5:43 PM, Min-Han Tan  wrote:
> Dear R-help members,
>
> Apologies - I am posting on behalf of a colleague, who is a little puzzled
> as STATA and R seem to be yielding different survival estimates for the same
> dataset when treating a variable as ordinal. Ordered() is used  to represent
> an ordinal variable) I understand that R's coxph (by default) uses the Efron
> approximation, whereas STATA uses (by default) the Breslow. but we did
> compare using the same approximations. I am wondering if this is a result of
> how coxph manages an ordered factor?
>
> Essentially, this is a survival dataset using tumor grade (1, 2, 3 and 4) as
> the risk factor. This is more of an 'ordinal' variable, rather than a
> continuous variable. For the same data set of 399 patients, when treating
> the vector of tumor grade as a continuous variable (range of 1 to 4),
> testing the Efron and the Breslow approximations yield the same result in
> both R and STATA.
>
> However, when Hist_Grade_4 grp is converted into an ordered factor using
> ordered(), and the same scripts are applied, rather different results are
> obtained, relative to the STATA output. This is tested across the different
> approximations, with consistent results. The comparison using Efron
> approximation and ordinal data is is below.
>
> Your advice is very much appreciated!
>
> Min-Han
>
> Apologies below for the slightly malaligned output.
>
> STATA output
>
> . xi:stcox i.Hist_Grade_4grp, efr
> i.Hist_Grade_~p   _IHist_Grad_1-4     (naturally coded; _IHist_Grad_1
> omitted)
>
>        failure _d:  FFR_censor
>  analysis time _t:  FFR_month
>
> Iteration 0:   log likelihood =  -1133.369
> Iteration 1:   log likelihood = -1129.4686
> Iteration 2:   log likelihood = -1129.3196
> Iteration 3:   log likelihood = -1129.3191
> Refining estimates:
> Iteration 0:   log likelihood = -1129.3191
>
> Cox regression -- Efron method for ties
>
> No. of subjects =          399                     Number of obs   =
> 399
> No. of failures =          218
> Time at risk    =  9004.484606
>                                                  LR chi2(3)      =
>  8.10
> Log likelihood  =   -1129.3191                     Prob > chi2     =
>  0.0440
>
> --
>         _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf.
> Interval]
> -+
> _IHist_Gra~2 |   1.408166   .3166876     1.52   0.128     .9062001
>  2.188183
> _IHist_Gra~3 |    1.69506   .3886792     2.30   0.021     1.081443
>  2.656847
> _IHist_Gra~4 |   2.540278   .9997843     2.37   0.018      1.17455
> 5.49403
>
>
>
> R Output using
>> summary ( coxph( Surv(FFR_month,FFR_censor) ~ Hist_Grade_4grp,
> method=c("breslow")))
>> summary ( coxph( Surv(FFR_month,FFR_censor) ~ Hist_Grade_4grp,
> method=c("exact")))
>> summary ( coxph( Surv(FFR_month,FFR_censor) ~ Hist_Grade_4grp,
> method=c("efron")))
>
>
>
>  n=399 (21 observations deleted due to missingness)
>
>                    coef exp(coef) se(coef)     z Pr(>|z|)
> Hist_Grade_4grp.L 0.66685   1.94809  0.26644 2.503   0.0123 *
> Hist_Grade_4grp.Q 0.03113   1.03162  0.20842 0.149   0.8813
> Hist_Grade_4grp.C 0.08407   1.08771  0.13233 0.635   0.5252
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>                 exp(coef) exp(-coef) lower .95 upper .95
> Hist_Grade_4grp.L     1.948     0.5133    1.1556     3.284
> Hist_Grade_4grp.Q     1.032     0.9693    0.6857     1.552
> Hist_Grade_4grp.C     1.088     0.9194    0.8392     1.410
>
> Rsquare= 0.02   (max possible= 0.997 )
> Likelihood ratio test= 8.1  on 3 df,   p=0.044
> Wald test            = 8.02  on 3 df,   p=0.0455
> Score (logrank) test = 8.2  on 3 df,   p=0.04202
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Save data as .pdf or .JPG

2010-09-04 Thread Paul Johnson
On Wed, Sep 1, 2010 at 7:56 AM, khush   wrote:
> Hi all ,
>
> I have following script to plot some data.
>
> plot( c(1,1100), c(0,15), type='n', xlab='', ylab='', ylim=c(0.1,25) ,
> las=2)
> axis (1, at = seq(0,1100,50), las =2)
> axis (2, at = seq(0,25,1), las =2)
>>
> When I source("script.R"), I got the image on interface but I do not want to
> use screenshot option to save the image? How can save the output to .pdf or
> .jpg format?
>
> Thank you
> Khushwant

Hi!  This is one of the things that is difficult for newcomers. I've
written down a pretty thorough answer:

http://pj.freefaculty.org/R/Rtips.html#5.2
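
The very short version, wrapped around your own plotting code (the file
names are arbitrary):

pdf("myplot.pdf", height = 6, width = 6)   ## or jpeg("myplot.jpg", width = 600, height = 600)
plot(c(1, 1100), c(0, 15), type = "n", xlab = "", ylab = "",
     ylim = c(0.1, 25), las = 2)
axis(1, at = seq(0, 1100, 50), las = 2)
axis(2, at = seq(0, 25, 1), las = 2)
dev.off()                                  ## nothing is written until the device is closed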


pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Please explain "do.call" in this context, or critique to "stack this list faster"

2010-09-04 Thread Paul Johnson
I've been doing some consulting with students who seem to come to R
from SAS.  They are usually pre-occupied with do loops and it is tough
to persuade them to trust R lists rather than keeping 100s of named
matrices floating around.

Often it happens that there is a list with lots of matrices or data
frames in it and we need to "stack those together".  I thought it
would be a simple thing, but it turns out there are several ways to
get it done, and in this case, the most "elegant" way using do.call is
not the fastest, but it does appear to be the least prone to
programmer error.

I have been staring at ?do.call for quite a while and I have to admit
that I just need some more explanations in order to interpret it.  I
can't really get why this does work

do.call( "rbind", mylist)

but it does not work to do

sapply ( mylist, rbind).
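
A tiny illustration of the contrast I mean:

small <- list(data.frame(x = 1:2), data.frame(x = 3:4))
identical(do.call("rbind", small), rbind(small[[1]], small[[2]]))  ## TRUE: one call sees all the pieces
lapply(small, rbind)   ## rbind sees each piece by itself, so each comes back unchanged
## sapply(small, rbind) makes the same one-at-a-time calls and then tries to
## simplify the result, so it does not produce the stacked data frame either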

Anyway, here's the self contained working example that compares the
speed of various approaches.  If you send yet more ways to do this, I
will add them on and then post the result to my Working Example
collection.

## stackMerge.R
## Paul Johnson 
## 2010-09-02


## rbind is neat,but how to do it to a lot of
## data frames?

## Here is a test case

df1 <- data.frame(x=rnorm(100),y=rnorm(100))
df2 <- data.frame(x=rnorm(100),y=rnorm(100))
df3 <- data.frame(x=rnorm(100),y=rnorm(100))
df4 <- data.frame(x=rnorm(100),y=rnorm(100))

mylist <-  list(df1, df2, df3, df4)

## Usually we have done a stupid
## loop  to get this done

resultDF <- mylist[[1]]
for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])

## My intuition was that this should work:
## lapply( mylist, rbind )
## but no! It just makes a new list

## This obliterates the columns
## unlist( mylist )

## I got this idea from code in the
## "complete" function in the "mice" package
## It uses brute force to allocate a big matrix of 0's and
## then it places the individual data frames into that matrix.

m <- 4
nr <- nrow(df1)
nc <- ncol(df1)
dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
for (j in  1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]



## I searched a long time for an answer that looked better.
## This website is helpful:
## http://stackoverflow.com/questions/tagged/r
## I started to type in the question and 3 plausible answers
## popped up before I could finish.

## The terse answer is:
shortAnswer <- do.call("rbind",mylist)

## That's the right answer, see:

shortAnswer == dataComplete
## But I don't understand why it works.

## More importantly, I don't know if it is fastest, or best.
## It is certainly less error prone than "dataComplete"

## First, make a bigger test case and use system.time to evaluate

phony <- function(i){
  data.frame(w=rnorm(1000), x=rnorm(1000),y=rnorm(1000),z=rnorm(1000))
}
mylist <- lapply(1:1000, phony)


### First, try the terse way
system.time( shortAnswer <- do.call("rbind", mylist) )


### Second, try the complete way:
m <- 1000
nr <- nrow(df1)
nc <- ncol(df1)

system.time(
   dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
 )

system.time(
   for (j in  1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
)


## On my Thinkpad T62 dual core, the "shortAnswer" approach takes about
## three times as long:


## > system.time( bestAnswer <- do.call("rbind",mylist) )
##user  system elapsed
##  14.270   1.170  15.433

## > system.time(
## +dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
## +  )
##user  system elapsed
##   0.000   0.000   0.006

## > system.time(
## + for (j in  1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
## + )
##user  system elapsed
##   4.940   0.050   4.989


## That makes the do.call way look slow, and I said "hey,
## our stupid for loop at the beginning may not be so bad."
## Wrong. It is a disaster.  Check this out:


## > resultDF <- phony(1)
## > system.time(
## + for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
## +)
##user  system elapsed
## 159.740   4.150 163.996


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What solve() does?

2010-09-04 Thread Paul Johnson
On Wed, Sep 1, 2010 at 5:36 AM, Petar Milin  wrote:
> Hello!
> Can anyone explain me what solve() function does: Gaussian elimination or
> iterative, numeric solve? In addition, I would need both the Gaussian
> elimination and iterative solution for the course. Are the two built in R?
>
> Thanks!

> PM

Hello, Petar:

I think you are assuming that solve uses an elementary linear algebra
"paper and pencil" procedure, but I don't think it does.  In a digital
computer, those things are not precise, and I think the folks here
will even say you shouldn't use solve to get an inverse, but I can't
remember all of the details.

To see how solve works ...

Let me show you a trick I just learned. Read

?solve

notice it is a "generic method", meaning it does not actually do the
calculations for you. Rather, there are specific implementations for
different types of cases. To find the implementations, run

methods(solve)

I get:

> methods(solve)
[1] solve.default solve.qr

Then if you want to read HOW solve does what it does (which I think
was your question), run this:

> solve.default

or

> solve.qr

In that code, you will see the chosen procedure depends on the linear
algebra libraries you make available.  I'm no expert on the details,
but it appears QR decomposition is the preferred method.  You can read
about that online or in numerical algebra books.
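
For completeness, a toy system you can poke at while you read the code:

A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)
solve(A, b)        ## solves A %*% x == b in one step
solve(A) %*% b     ## inverse first, then multiply: same answer, more rounding risk
qr.solve(A, b)     ## an explicitly QR-based route, also built in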



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotmath vector problem; full program enclosed

2010-07-07 Thread Paul Johnson
On Wed, Jul 7, 2010 at 5:41 AM, Duncan Murdoch  wrote:

>>>>
>>>
>>> You want "as.expression(b1)", not "expression(b1)".  The latter means
>>> "the
>>> expression consisting of the symbol b1".  The former means "take the
>>> object
>>> stored in b1, and convert it to an expression.".
>>>

Thanks to Duncan and Allen, who pointed out that I was not even
reading my own code carefully.  I apologize for trying your patience.

Before I stop swinging at this one, I still want to bother everybody
about one thing, which really was the original question, but I did not
know the words to ask it.

The full code below is a working example, but I don't understand why
it works. Focus on these two commands that produce 2 axes.  Both
produce the desired output, but, as far as I can see, they should not!

1:
   axis(1, line=6, at=mu+dividers*sigma,
labels=as.expression(c(b1,b2,b3,b4,b5), padj=-1))


2:
   axis(1, line=9, at=mu+dividers*sigma,
labels=c(as.expression(b1),b2,b3,b4,b5), padj=-1)

This second one shouldn't work, I think.
It has as.expression on only the first element, and yet they all come
out right. Is there a spill over effect?

My original question should not have asked why b1 does not print
correctly when I do this:

   axis(1, line=9, at=mu+dividers*sigma,
labels=c(expression(b1),b2,b3,b4,b5), padj=-1)

but the correct question should have been "why do b2, b3, b4, and b5
get processed properly into plotmath even though they are not
expressions?"

???
pj

### Filename: plotMathProblem.R
### Paul Johnson July 7, 2010
### email me 

  sigma <- 10.0
  mu <- 4.0

  myx <- seq( mu - 3.5*sigma,  mu+ 3.5*sigma, length.out=500)

  myDensity <- dnorm(myx,mean=mu,sd=sigma)

  ### xpd needed to allow writing outside strict box of graph
  ### Need big bottom margin to add several x axes
  par(xpd=TRUE, ps=10, mar=c(18,2,2,2))

  myTitle1 <- "Normal density"   ## added so this example runs; the original post left myTitle1 undefined
  plot(myx, myDensity, type="l", xlab="", ylab="Probability Density ",
main=myTitle1, axes=FALSE)
  axis(2, pos= mu - 3.6*sigma)
  axis(1, pos=0)

  lines(c(myx[1],myx[length(myx)]),c(0,0)) ### closes off axes


  addInteriorLine <- function(x, m, sd){
for (i in 1:(length(x))){
  lines( c(x[i],x[i]), c(0, dnorm(x[i],m=m,sd=sd)), lty= 14, lwd=.2)
}
  }


  dividers <- c(qnorm(0.025), -1, 0, 1, qnorm(0.975))
  addInteriorLine(mu+sigma*dividers, mu,sigma)

  # bquote creates an expression that text plotters can use
  t1 <-  bquote( mu== .(mu))
  mtext(bquote( mu == .(mu)), 1, at=mu, line=-1)


  addInteriorLabel <- function(pos1, pos2, m, s){
area <- abs(100*( pnorm(m+pos1*s,m,s)-pnorm(m+pos2*s, m,s)))
mid <- m+0.5*(pos1+pos2)*s
text(mid, 0.5*dnorm(mid,m,s),label=paste(round(area,2),"%"))
  }


  addInteriorLabel(dividers[1],dividers[2],  mu, sigma)
  addInteriorLabel(dividers[2],dividers[3],  mu, sigma)
  addInteriorLabel(dividers[3],dividers[4],  mu, sigma)
  addInteriorLabel(dividers[4],dividers[5],  mu, sigma)

   b1 <- substitute( mu - d*sigma, list(d=round(dividers[1],2)) )
   b2 <- substitute( mu - sigma )
   b3 <- substitute( mu )
   b4 <- substitute( mu + sigma )
   b5 <- substitute( mu + d*sigma, list(d=round(dividers[5],2)) )
   axis(1, line=4, at=mu+dividers*sigma, labels=c(b1,b2,b3,b4,b5), padj=-1)


   axis(1, line=6, at=mu+dividers*sigma,
labels=as.expression(c(b1,b2,b3,b4,b5), padj=-1))



   axis(1, line=9, at=mu+dividers*sigma,
labels=c(as.expression(b1),b2,b3,b4,b5), padj=-1)


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotmath vector problem; full program enclosed

2010-07-06 Thread Paul Johnson
On Tue, Jul 6, 2010 at 12:41 PM, Duncan Murdoch
 wrote:
> On 06/07/2010 10:54 AM, Paul Johnson wrote:
>>
>> Here's another example of my plotmath whipping boy, the Normal
>> distribution.
>>
> You want "as.expression(b1)", not "expression(b1)".  The latter means "the
> expression consisting of the symbol b1".  The former means "take the object
> stored in b1, and convert it to an expression.".
>
> It's not perfect, because you'll end up with "mu - -1.96sigma" (i.e. two
> minus signs), but it's closer than what you had.
>
> Duncan Murdoch
>


Hi, Duncan and David

Thanks for looking.  I suspect from the comment you did not run the
code.  The expression examples I give do work fine already.  But I
have to explicitly put in values like 1.96 to make them work.  I'm
trying to avoid that with substitute, which does work for b2, b3, b4,
and b5 -- all but b1.  Why just that one?

I'm uploading a picture of it so you can see for yourself:

http://pj.freefaculty.org/R/plotmathwrong.pdf

please look at the middle axis.

Why does only b1 not work, but the rest do?

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to save summary(lm) and anova (lm) in format?

2010-07-06 Thread Paul Johnson
There are R packages that can make nice regression tables in LaTeX
documents. I've used memisc and it's good; there is also "apsrtable"
and the old standby xtable.  Also I use my own function "outreg", but
that's just a 'not invented here' attitude.

Your problem is that you need this to go into Word, in which case I
think a reasonable strategy is to create html output in R and then in
Word you can use "paste special" HTML and it will bring in the html as
a Word table.
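
A bare-bones version of the html route, using xtable (the file names are
arbitrary):

library(xtable)
x <- rnorm(10)
y <- rnorm(10, mean = 3)
m <- lm(y ~ x)
print(xtable(summary(m)), type = "html", file = "lm-coef.html")    ## coefficient table
print(xtable(anova(m)), type = "html", file = "lm-anova.html")     ## anova table
## open the .html files in a browser, copy, then "paste special" as HTML in Word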

I recently made a presentation about this, you might scan down to the
end where I have the html example for the poor-pitiful Word users of
the world:

http://pj.freefaculty.org/SummerCamp2010/regression2.pdf

Look down around slide 75

pj


On Fri, Jul 2, 2010 at 12:34 PM, Yi  wrote:
> Hi, folks,
>
> I would like to copy the output of summary(lm) and anova (lm) in R to my
> word file. But the output will be a mess if I just copy after I call summary
> and anova.
>
> #
> x=rnorm(10)
> y=rnorm(10,mean=3)
> lm=lm(y~x)
> summary(lm)
>
> Call:
> lm(formula = y ~ x)
> Residuals:
>      Min        1Q    Median        3Q       Max
> -1.278567 -0.312017  0.001938  0.297578  1.310113
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)   2.5221     0.2272  11.103 3.87e-06 ***
> x            -0.5988     0.2731  -2.192   0.0597 .
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Residual standard error: 0.7181 on 8 degrees of freedom
> Multiple R-squared: 0.3753,     Adjusted R-squared: 0.2972
> F-statistic: 4.807 on 1 and 8 DF,  p-value: 0.0597
> 
>
> How can I get the exact ouput as shown in R but not as the above?
>
>
> Thanks.
>
> Yi
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plotmath vector problem; full program enclosed

2010-07-06 Thread Paul Johnson
Here's another example of my plotmath whipping boy, the Normal distribution.

A colleague asks for a Normal plotted above a series of axes that
represent various other distributions (T, etc).

I want to use vectors of equations in plotmath to do this, but have
run into trouble.  Now I've isolated the problem down to a relatively
small piece of working example code (below).  If you would please run
this, I think you will see the problem.  When plotmath meets one
vector of expressions, it converts all but one to math, so in the
figure output I get, in LaTeX speak

b1 $\mu-1.0 \sigma$$\mu$

All values except the first come out correctly.

This happens only when I try to use bquote or substitute to get
variables to fill in where the 1.96, 1.0, and so forth should be.  In
the figure output, you should see a second axis where all of the
symbols are resolved correctly.

As usual, thanks in advance for your help, sorry if I've made an
obvious mistake or overlooked a manual.

### Filename: plotMathProblem.R
### Paul Johnson July 5, 2010
### email me 

  sigma <- 10.0
  mu <- 4.0

  myx <- seq( mu - 3.5*sigma,  mu+ 3.5*sigma, length.out=500)

  myDensity <- dnorm(myx,mean=mu,sd=sigma)

  ### xpd needed to allow writing outside strict box of graph
  ### Need big bottom margin to add several x axes
  par(xpd=TRUE, ps=10, mar=c(18,2,2,2))

  myTitle1 <- "Normal density"   ## added so this example runs; the original post left myTitle1 undefined
  plot(myx, myDensity, type="l", xlab="", ylab="Probability Density ",
main=myTitle1, axes=FALSE)
  axis(2, pos= mu - 3.6*sigma)
  axis(1, pos=0)

  lines(c(myx[1],myx[length(myx)]),c(0,0)) ### closes off axes


  addInteriorLine <- function(x, m, sd){
for (i in 1:(length(x))){
  lines( c(x[i],x[i]), c(0, dnorm(x[i],m=m,sd=sd)), lty= 14, lwd=.2)
}
  }


  dividers <- c(qnorm(0.025), -1, 0, 1, qnorm(0.975))
  addInteriorLine(mu+sigma*dividers, mu,sigma)

  # bquote creates an expression that text plotters can use
  t1 <-  bquote( mu== .(mu))
  mtext(bquote( mu == .(mu)), 1, at=mu, line=-1)


  addInteriorLabel <- function(pos1, pos2, m, s){
area <- abs(100*( pnorm(m+pos1*s,m,s)-pnorm(m+pos2*s, m,s)))
mid <- m+0.5*(pos1+pos2)*s
text(mid, 0.5*dnorm(mid,m,s),label=paste(round(area,2),"%"))
  }


  addInteriorLabel(dividers[1],dividers[2],  mu, sigma)
  addInteriorLabel(dividers[2],dividers[3],  mu, sigma)
  addInteriorLabel(dividers[3],dividers[4],  mu, sigma)
  addInteriorLabel(dividers[4],dividers[5],  mu, sigma)




### Following is problem point: axis will
### end up with correct labels, except for first point,
### where we end up with "b1" instead of "mu - 1.96*sigma".
   b1 <- substitute( mu - d*sigma, list(d=round(dividers[1],2)) )
   b2 <- substitute( mu - sigma )
   b3 <- substitute( mu )
   b4 <- substitute( mu + sigma )
   b5 <- substitute( mu + d*sigma, list(d=round(dividers[5],2)) )
##   plot(-20:50,-20:50,type="n",axes=F)
   axis(1, line=4,at=mu+dividers*sigma,
labels=c(expression(b1),b2,b3,b4,b5), padj=-1)




### This gets "right result" but have to hard code the dividers
  b1 <- expression( mu - 1.96*sigma )
  b2 <- expression( mu - sigma )
  b3 <- expression( mu )
  b4 <- expression( mu + sigma )
  b5 <- expression( mu + 1.96*sigma )
  axis(1, line=8,at=mu+dividers*sigma, labels=c(b1,b2,b3,b4,b5), padj=-1)




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Does S inherit the enhancements in R language?

2010-03-19 Thread Paul Johnson
I don't know anybody who has S-plus these days, but I expect some of
you might, and perhaps you won't mind telling me something.

I'm working on my presentation about R for the general audience.  As I
survey the suggestions from this list about that project, I find
myself wondering whether S-plus benefits from R.  Does S-plus syntax
change like R's does?  I can take code for S and run it through R, but
if I took some R code to an S-plus system, would it work?

Example 1. _, <-, =

The old S documents I find emphasize assignment with "_".

When I started with R, that was deprecated, then forbidden. "_" was
not allowed in file names, now it is.  It was absolutely necessary to
use <-; assignment with = caused errors.  Around the time of R-2.1 or
so, it became possible to run R code that uses = for assignment. It's
not encouraged by the R core team, but = is allowed.

Does S+ now accept either

<-

or

=

?

For that matter, in S+ is it now forbidden to use underscore for
assignment, as it is in R?

Example 2. Semicolons now discouraged in R code.

We used to require ; to end commands.

Now the R parser is smart enough to spot line breaks and interpret
them accordingly. R's been that way for as long as I can remember, but
I think the ; was required in earliest R releases.

I rather liked the definitive ; at the end of every command.  That
looks right to me, probably because of my SAS and C background.

Would S+ have a panic attack?

Example 3.  Namespace.  Does S-plus get better as R does?

Years ago, I was modeling US Democrats and Republicans and I created
an indicator variable called "rep".  Regression models would not work
after that because the rep function had been masked. It was very
frustrating to me.
Professor Ripley spotted the error and posted a message called "Don't
use the names of R functions as variable names"
http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg11585.html.

After that, I was terrified that any name I used might conflict with a
built in R function name.

Last month, I saw a student here with a political model, and he used
rep as a variable in a regression model; it seemed to work just fine.
I surmise that the rise in usage of namespaces in R packages accounts
for that?
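
A tiny check of what I mean, with made-up data:

rep <- rbinom(100, 1, 0.5)    ## a data vector that masks base::rep in the workspace
y <- rnorm(100) + rep
coef(lm(y ~ rep))             ## the formula finds the data vector, so this runs fine
base::rep(1, 3)               ## the function is still reachable through its namespace
rm(rep)                       ## clean up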

I'm sorry if this is too OT for r-help.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Care to share an R presentation?

2010-03-17 Thread Paul Johnson
The R movement is picking up steam in the center of America.  People
that ignored my R-enthusiasm 10 years ago are now calling me up asking
for presentations.  I need to make a 2-hour presentation to a
collection of faculty and grad students who might like to use R. I
don't want to make it seem too complicated (as I often do), but I
don't want to mislead them into thinking it will be easy.

I expect other r-help readers have been in this same situation.  I
have a recollection (5, 6 years ago) that one of the R leaders had a
slideshow for this exact purpose.  But I can't find it now. There is a
similar R-help request, and both John Fox and Deepayan Sarkar offered
links to their materials.  However, the links aren't valid anymore.

If I don't find a pre-existing example to work from, I'll slap
together a Beamer/Sweave presentation and post it where future speech
givers can get it.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Shade area under curve

2010-03-10 Thread Paul Johnson
I use this to make illustrations for some calculus notes. There are
examples of shaded areas in there:

### Filename: Normal1_2009_plotmathExample.R
### Paul Johnson June 3, 2009
### This code should be available somewhere in
### http://pj.freefaculty.org/R.  If it is not
### email me 


###Set mu and sigma at your pleasure:
mu <- 10.03
sigma <- 12.5786786

myx <- seq( mu - 3.5*sigma,  mu+ 3.5*sigma, length.out=500)

myDensity <- dnorm(myx,mean=mu,sd=sigma)


# It is challenging to combine plotmath with values of mu and sigma in
# one expression.
# Either bquote or substitute can be used.  First use bquote:

myTitle1 <- bquote (paste("x ~ Normal(", mu== .(round(mu,2)), ',',
sigma== .(round(sigma,2)),")") )

### Using substitute:
### myTitle1 <-  substitute( "x ~ Normal" ~~ group( "(", list(mu==mu1,
###    sigma^2==sigma2), ")") ,  list(mu1=round(mu,2),
###    sigma2=round(sigma^2,2)))

### Or combine the two:
### t1 <- substitute ( mu == a ,  list (a = mu))
### t2 <- substitute ( sigma == a, list(a = round(sigma,2)))
### myTitle1 <- bquote(paste("x ~ Normal(", .(t1),",", .(t2),")" ) )


## To save the plot in a file, change FALSE to TRUE
saveplot <- FALSE

if (saveplot == TRUE){
  pdf(file="Normal-2009.pdf", height=6.5, width=6.5, onefile=F,
paper="special", pointsize=10)

}else {
  dev.new(height=6.5, width=6.5,pointsize=9)
}
### xpd needed to allow writing outside strict box of graph
par(xpd=TRUE, ps=10)

plot(myx, myDensity, type="l", xlab="x", ylab="Probability Density ",
main=myTitle1, axes=FALSE)
axis(2, pos= mu - 3.6*sigma)
axis(1, pos=0)
lines(c(myx[1],myx[length(myx)]),c(0,0)) ### closes off axes

# bquote creates an expression that text plotters can use
t1 <-  bquote( mu== .(mu))

# Find a value of myx that is "very close to" mu
centerX <- max(which (myx <= mu))
# plot light vertical line under peak of density
lines( c(mu, mu), c(0, myDensity[centerX]), lty= 14, lwd=.2)

# label the mean in the bottom margin
mtext(bquote( mu == .(mu)), 1, at=mu, line=-1)

### find position 20% "up" vertically, to use for arrow coordinate
ss = 0.2 * max(myDensity)
# Insert interval to represent width of one sigma
arrows( x0=mu, y0= ss, x1=mu+sigma, y1=ss, code=3, angle=90, length=0.1)

### Write the value of sigma above that interval
t2 <-  bquote( sigma== .(round(sigma,2)))
text( mu+0.5*sigma, 1.15*ss, t2)

### Create a formula for the Normal
normalFormula <- expression (f(x) == frac (1, sigma* sqrt(2*pi)) *
e^{~~ - ~~ frac(1,2)~~ bgroup("(", frac(x-mu,sigma),")")^2})
# Draw the Normal formula
text ( mu + 0.5*sigma, max(myDensity)- 0.10 * max(myDensity),
normalFormula, pos=4)

### Theory says we should have 2.5% of the area to the left of mu - 1.96 * sigma.
### Find the X coordinate of that "critical value"
criticalValue <- mu -1.96 * sigma
### Then grab all myx values that are "to the left" of that critical value.
specialX <-  myx[myx <= criticalValue]

### mark the critical value in the graph
text ( criticalValue, 0 , label= paste(round(criticalValue,2)), pos=1)
### Take sequence parallel to values of myx inside critical region
specialY <- myDensity[myx < criticalValue]
#  Polygon makes a nice shaded illustration
polygon(c(specialX[1], specialX, specialX[length(specialX )]), c(0,
specialY, 0), density=c(-110),col="lightgray" )

shadedArea <- round(pnorm(mu - 1.96 * sigma, mean=mu, sd=sigma),4)


### I want to insert annotation about area on left side.

al1 <- bquote(Prob(x <= .(round(criticalValue,2))))
al2 <- bquote(phantom(0) == F( .(round(criticalValue,2)) ))
al3 <- bquote(phantom(0) == .(round(shadedArea,3)))

### Hard to position this text "just right"
### Have tried many ideas, this may be least bad.
### Get center position in shaded area
medX <- median(specialX)
### density at that center point:
denAtMedX <- myDensity[indexMed <- max(which(specialX < medX))]

text(medX, denAtMedX+0.0055, labels=al1)
text(medX, denAtMedX+0.004, labels=al2)
text(medX, denAtMedX+0.0025, labels=al3)

### point from text toward shaded area
arrows( x0=medX, y0=myDensity[indexMed]+0.002 ,x1= mu-2.5 *sigma, y1=
0.40*myDensity[length(specialX)] ,   length=0.1)



ss <- 0.1 * max(myDensity)
### Mark interval from mu to mu-1.96*sigma
arrows( x0=mu, y0= ss, x1=mu-1.96*sigma, y1=ss, code=3, angle=90, length=0.1)
### Put text above interval
text( mu - 2.0*sigma,1.15*ss,
bquote(paste(.(round(criticalValue,2)),phantom(1)==mu-1.96 *
sigma,sep="")),pos=4 )




criticalValue <- mu +1.96 * sigma
### Then grab all myx values that are "to the left" of that critical value.
specialX <-  myx[myx >= criticalValue]

### mark the critical value in the graph
text ( criticalValue, 0 , label= paste(round(criticalValue,2)), pos=1)
### Take

Re: [R] PCA

2010-03-10 Thread Paul Johnson
On Wed, Mar 10, 2010 at 4:42 PM, Xanthe Walker  wrote:
> Hello,
>
> I am trying to complete a PCA on a set of standardized ring widths from 8
> different sites (T10, T9, T8, T7, T6, T5, T3, and T2).
> The following is a small portion of my data:
>
> T10     T9      T8      T7      T6      T5      T3      T2
> 1.33738 0.92669 0.91146 0.98922 0.9308  0.88201 0.92287 0.91775
> 0.82181 1.05319 0.92908 0.97971 0.95165 0.98029 1.14048 0.77803
> 0.88294 0.96413 0.90893 0.87957 0.9961  0.74926 0.71394 0.70877
> 1.07549 1.13311 1.23536 1.19382 1.2362  1.07741 1.20334 0.8727
> 0.77737 0.99292 0.92703 1.02384 0.99831 1.1317  0.93672 1.07909
> 0.88933 1.15587 1.20885 0.8983  1.06476 0.81845 1.09017 0.72909
> 0.75347 0.95826 0.90922 0.73869 0.74846 0.70481 0.49826 0.91824
> 1.39082 1.1281  1.05147 0.95839 1.20648 1.24587 0.65045 1.23251
> 0.80977 0.89714 0.90042 0.9543  0.86217 1.20818 0.82725 0.7666
> 1.11092 1.10328 1.16464 1.00707 1.09575 1.04647 0.79045 0.47331
> 0.88753 1.04699 1.0854  0.91803 1.03622 0.80624 0.905   1.28271
> 0.91963 0.90121 0.89136 0.97408 1.0449  1.00572 0.7703  1.48373
> 1.31837 0.97733 1.04229 1.23096 1.14002 1.09911 0.77523 1.31543
>
> This is what I have done in R:
>
> rm(list=ls(all=TRUE))
> PCA<-read.csv("PCAsites.csv", header=TRUE)
> PCA
> attach(PCA)
> names(PCA)
>
> ###using correlation matrix#
>
> p1 <- princomp(PCA, cor = TRUE)
> summary(p1)
> loadings(p1)
> plot(p1)
> biplot(p1)
> p1$scores
> screeplot(p1, npcs=4, type="lines")
>
> The problem is that the purpose of this analysis is to derive a new data
> set, using the first component, that I can work with. In other words, I am
> doing this to compress the data into one column of ring widths. How do I
> derive a new data set?
>

the output of p1$scores is what you want, isn't it?  Or just one column of it?

desiredScores <- p1$scores[ , 1]
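
And if you want it sitting next to the original columns, something like:

newPCA <- data.frame(PCA, PC1 = p1$scores[, 1])
head(newPCA)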

p



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

