[R] mgcv::gam() scale parameter estimates for quasibinomial error models

2021-04-14 Thread John Maindonald
For both glm() and mgcv::gam() quasibinomial error models, the summary
object has a dispersion value that has the same role as sigma^2 in the
summary object for lm() model fits.

Where some fitted probabilities are small, the `gam()` default scale parameter
estimates, returned as `scale` (and `sig2`) in the gam object and as
"dispersion" in the summary object, can differ wildly from the Pearson values
that `glm()` works with, and that can be obtained by calling `gam()` with a
suitable control setting (see the code below).

The following demonstrates the issue:

  ## 'mgcv' version 1.8-34
  Cholera <- HistData::Cholera
  library(mgcv)
  form <- cbind(cholera_deaths, popn-cholera_deaths) ~ 
 water + elevation + poor_rate
  default.gam <- gam(form, data=Cholera, family=quasibinomial)
  pearson.gam <- update(default.gam, control=gam.control(scale.est="pearson"))

  c(Pearson=pearson.gam$scale, Default=default.gam$scale)
  ##  Pearson Default 
  ## 33.545829  2.919535 
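For cross-checking, the Pearson estimate can be computed directly from the fitted model. A minimal sketch, assuming the same Cholera column names as above; the result should be close to the "pearson" value reported above:

```r
## Pearson dispersion computed by hand: sum of squared Pearson
## residuals divided by the residual degrees of freedom.
library(mgcv)
Cholera <- HistData::Cholera
form <- cbind(cholera_deaths, popn - cholera_deaths) ~
  water + elevation + poor_rate
fit <- gam(form, data = Cholera, family = quasibinomial)
sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
```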

My own calculation (from either fitted model) returns 30.07 for the
Fletcher (2012) version of the dispersion that was, according to 
Wood's "Generalized Additive Models" (2nd edn, 2017, p.111),
returned as the GAM scale estimate at the time when the book was 
written. 

The default scale estimates returned by `gam()` vary wildly, relative 
to the relatively stable "pearson" estimates, in data that are simulated
to have comparable dispersion estimates.

For the Cholera data, it would make good sense to fit a model
with quasipoisson errors to the death counts, using log(popn) as an
offset.  The GAM model then uses, with the default setting for `scale.est`, 
a scale parameter that is close to that returned by "pearson.gam". 
SEs (as well as coefficients) are similar to those returned by 
"pearson.gam". 
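A minimal sketch of that quasipoisson fit, again assuming the Cholera column names used above:

```r
## Quasipoisson model for the death counts, with log(popn) as offset.
library(mgcv)
Cholera <- HistData::Cholera
pois.gam <- gam(cholera_deaths ~ water + elevation + poor_rate,
                offset = log(popn),
                data = Cholera, family = quasipoisson)
pois.gam$scale  # default scale estimate for this model
```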

The detailed calculations that are documented in the following 
documents may be of some general interest.
  https://www.dropbox.com/s/vl9usat07urbgel/quasibin-gam.pdf?dl=0
  or https://www.dropbox.com/s/s83mh1mut5xc3gk/quasibin-gam.html?dl=0

I am posting this here now before posting a bug report in case I have
missed something important.  It would be useful to be directed to the 
mgcv code used for calculation of what is returned as the Fletcher statistic.

  Rmd file: https://www.dropbox.com/s/ghmsdcvgxp068bs/quasibin-gam.Rmd?dl=0
  .bib file: https://www.dropbox.com/s/r1yjqx0sni2pzjy/quasi.bib?dl=0

John Maindonald email: john.maindon...@anu.edu.au


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help Digest, Vol 217, Issue 20

2021-03-20 Thread John Maindonald
No, it is not distribution free.  Independent random sampling is assumed.
That is a non-trivial assumption, and one that is very often not true or not
strictly true.


John Maindonald email: 
john.maindon...@anu.edu.au<mailto:john.maindon...@anu.edu.au>


On 21/03/2021, at 00:00, r-help-requ...@r-project.org wrote:

From: Jiefei Wang <szwj...@gmail.com>
Subject: Re: [R] about a p-value < 2.2e-16
Date: 20 March 2021 at 04:41:33 NZDT
To: Spencer Graves <spencer.gra...@effectivedefense.org>
Cc: Bogdan Tanasa <tan...@gmail.com>, Vivek Das <vd4mm...@gmail.com>, r-help <r-help@r-project.org>


Hi Spencer,

Thanks for your test results. I do not know the answer, as I haven't
used wilcox.test for many years. I do not know whether it is possible to
compute the exact distribution of the Wilcoxon rank sum statistic, but I
think it very likely, as the documentation for `Wilcoxon` says:

This distribution is obtained as follows. Let x and y be two random,
independent samples of size m and n. Then the Wilcoxon rank sum statistic
is the number of all pairs (x[i], y[j]) for which y[j] is not greater than
x[i]. This statistic takes values between 0 and m * n, and its mean and
variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
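The stated mean and variance can be checked numerically against the exact distribution that R supplies as dwilcox(). A small sketch of my own, not part of the quoted documentation:

```r
## Check mean and variance of the Wilcoxon rank sum statistic
## against the exact distribution, for m = 4, n = 5.
m <- 4; n <- 5
w <- 0:(m * n)
p <- dwilcox(w, m, n)              # exact probabilities; sum(p) is 1
mean.W <- sum(w * p)
var.W  <- sum((w - mean.W)^2 * p)
c(mean.W, m * n / 2)               # both equal 10
c(var.W, m * n * (m + n + 1) / 12) # both equal 50/3
```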

As a nice feature of non-parametric statistics, the statistic is usually
distribution-free, so you can pick any distribution you like and compute
the same statistic. I wonder if this is the case here, but I might be wrong.

Cheers,
Jiefei






Re: [R] Regarding R licensing usage guidance

2019-07-23 Thread John Maindonald
I think this is unfair, certainly the 'odious' comment.  Very many of us
have been able to contribute because we were supported to do such work
while occupying relatively comfortable academic positions.  In the
process of getting there, we have benefited enormously from the
work of those who have gone before.  I do not know what Anamika's
position is, but she may well be trying to build a business from a
much less privileged starting point than others of us may have
enjoyed.  In any case, the comment strikes a tone that is out of
place in respectful discourse.


John Maindonald email: john.maindon...@anu.edu.au


On 23/07/2019, at 22:00, r-help-requ...@r-project.org wrote:

From: Jeff Newmiller <jdnew...@dcn.davis.ca.us>
Subject: Re: [R] Regarding R licensing usage guidance
Date: 23 July 2019 at 07:00:26 NZST
To: <r-help@r-project.org>, ANAMIKA KUMARI <anamika1...@gmail.com>


Your internet skills are pathetic. Search Google for "proprietary use gpl" and 
the first hit is

https://opensource.stackexchange.com/questions/7078/is-it-legal-to-use-gpl-code-in-a-proprietary-closed-source-program-by-putting-i

Note that there are (at least) three obvious alternatives if there is any 
question in your case: release the code under GPL but also sell it with support 
(a la RStudio); only use it yourself (don't distribute it at all); or only use 
R for setting up your models but re-engineer implementations of the run-time 
prediction calculations yourself (often much easier than the initial algorithm 
development).

I think your desperation to steal the hard work of the various R contributors 
seems quite odious.

On July 22, 2019 6:55:31 AM PDT, ANAMIKA KUMARI <anamika1...@gmail.com> wrote:
Hello Team,

This mail is in reference to understanding the R license, and the use
of the R language to develop a commercialised product.

I am working on a product of mine that uses the R and Python languages.
I am trying to understand any licensing issues related to R if I
commercialise my product and keep my work proprietary.

I need your help to understand it. R comes under GNU GPL 2.0. If I move
my product to production, do I need to share its source code, or can I
keep my code proprietary?

Please note that I am just using R and its packages to develop my own
statistical tool and API, and have not made any changes to existing R
code.
Please refer below for all R packages used in my code:

 1. R-3.4.4
 2. 'spacyr'
 3. 'jsonlite'
 4. 'lubridate'
 5. 'data.table'
 6. 'png'
 7. 'maps'
 8. 'countrycode'
 9. 'humaniformat'
 10. 'ngram'
 11. 'stringr'
 12. 'slam'
 13. 'tm'
 14. 'lsa'
 15. 'RTextTools'
 16. 'stringi'
 17. 'plumber'
 18. 'Rook'
 19. 'pdftools'
 20. 'tokenizers'
 21. 'zoo'
 22. 'tidyr'
 23. 'reqres'
 24. 'rJava'
 25. 'tiff'
 26. 'splitstackshape'
 27. 'stringdist'
 28. 'RJSONIO'
 29. 'ropensci/tabulizer'
 30. 'staplr'
 31. 'SparseM'
 32. 'randomForest'
 33. 'e1071'
 34. 'ipred'
 35. 'caTools'
 36. R CMD INSTALL maxent_1.3.3.1.tar.gz
 37. R CMD INSTALL tree_1.0-39.tar.gz
 38. R CMD INSTALL RTextTools_1.4.2.tar.gz


Any help from you will be highly appreciated, as I am literally stuck
at a dead end.

Regards
Anamika Kumari


__
R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To 
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.





Re: [R] p values from GLM

2016-04-03 Thread John Maindonald
How small does a p-value need to be to warrant attention, however?

Witness Fisher’s comment that:
“. . . we may, if we prefer it, draw the line at one in fifty (the 2 per cent 
point), or one in a hundred (the 1 per cent point). Personally, the writer 
prefers to set a low standard of significance at the 5 per cent point, and 
ignore entirely all results which fail to reach this level. A scientific fact 
should be regarded as experimentally established only if a properly designed 
experiment rarely fails to give this level of significance.”
[Fisher RA (1926), “The Arrangement of Field Experiments,” Journal of the 
Ministry of Agriculture of Great Britain, 33, 503-513.]

See the selection of Fisher quotes at http://www.jerrydallal.com/lhsp/p05.htm .

In contexts where p <= 0.05 becomes more likely under the null (not the case 
if the experiment might just as well have been a random number generator), 
small p-values shift the weight of evidence.  An alternative that is a priori 
highly unlikely takes a lot of shifting.


John Maindonald email: john.maindon...@anu.edu.au


On 3/04/2016, at 22:00, r-help-requ...@r-project.org wrote:

From: Heinz Tuechler <tuech...@gmx.at>
Subject: Re: [R] p values from GLM
Date: 3 April 2016 11:00:50 NZST
To: Bert Gunter <bgunter.4...@gmail.com>, Duncan Murdoch <murdoch.dun...@gmail.com>
Cc: r-help <R-help@r-project.org>



Bert Gunter wrote on 01.04.2016 23:46:
... of course, whether one **should** get them is questionable...

http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503#/ref-link-1

This paper repeats the commonplace statement that a small p-value does not 
necessarily indicate an important finding. Agreed, but maybe I have overlooked 
examples of important findings with large p-values.
If there are some, I would be happy to get to know them. Otherwise a 
small p-value is no guarantee of importance, but a prerequisite.

best regards,

Heinz


Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Apr 1, 2016 at 3:26 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
On 01/04/2016 6:14 PM, John Sorkin wrote:
How can I get the p values from a glm ? I want to get the p values so I
can add them to a custom report


  fitwean <- glm(data[,"JWean"] ~ data[,"Group"], data=data,
                 family=binomial(link="logit"))
  summary(fitwean)  # This lists the coefficients, SEs, z and p values,
                    # but I can't isolate the p values.
  names(summary(fitwean))  # I see the coefficients, but not the p values
  names(fitwean)  # p values are not found here.

Doesn't summary(fitwean) give a matrix? Then it's
colnames(summary(fitwean)$coefficients) you want, not names(fitwean).

Duncan Murdoch

P.S. If you had given a reproducible example, I'd try it myself.
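To extract the p-values programmatically, one can index the coefficient matrix that summary() returns. A minimal sketch with simulated stand-in data; the variable names simply echo the (hypothetical) ones in the original post:

```r
set.seed(1)
d <- data.frame(JWean = rbinom(40, 1, 0.5), Group = gl(2, 20))
fitwean <- glm(JWean ~ Group, data = d, family = binomial(link = "logit"))
## The coefficient table is a matrix; the p-values are its last column.
coef(summary(fitwean))[, "Pr(>|z|)"]
```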




Thank you!
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and
Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)





Re: [R] SWEAVE - a gentle introduction

2015-11-18 Thread John Maindonald
What’s more, for pdf output one can use R Markdown and judiciously
sneak in html and/or LaTeX (consider however what the processing
steps might do to such markup).
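A minimal .Rmd sketch of the sort of document meant here (the file name and title are of course arbitrary); rendering it with rmarkdown::render() produces a PDF in which the LaTeX fragment passes straight through:

````markdown
---
title: "Example"
output: pdf_document
---

Inline LaTeX such as $y = X\beta + \varepsilon$ passes through to pdflatex.

```{r}
summary(cars)
```
````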


John Maindonald email: john.maindon...@anu.edu.au


On 19/11/2015, at 00:00, r-help-requ...@r-project.org wrote:

From: Duncan Murdoch <murdoch.dun...@gmail.com>
Subject: Re: [R] SWEAVE - a gentle introduction
Date: 18 November 2015 08:09:34 NZDT
To: Marc Schwartz <marc_schwa...@me.com>, John Sorkin <jsor...@grecc.umaryland.edu>
Cc: R-help <r-help@r-project.org>


On 17/11/2015 10:42 AM, Marc Schwartz wrote:

On Nov 17, 2015, at 9:21 AM, John Sorkin <jsor...@grecc.umaryland.edu> wrote:

I am looking for a gentle introduction to SWEAVE, and would appreciate 
recommendations.
I have an R program that I want to run and have the output and plots in one 
document. I believe this can be accomplished with SWEAVE. Unfortunately I don't 
know HTML, but am willing to learn. . . as I said I need a gentle introduction 
to SWEAVE.
Thank you,
John



John,

A couple of initial comments.

First, you will likely get some recommendations to also consider using Knitr:

  http://yihui.name/knitr/

which I do not use myself (I use Sweave), but to be fair, is worth considering 
as an alternative.

He did, and I'd agree with them.  I've switched to knitr for all new projects 
and some old ones.  knitr should be thought of as Sweave version 2.

Duncan Murdoch




Re: [R] R wont accept my zero count values in the GLM with quasi_poisson dsitribution

2015-07-28 Thread John Maindonald
A further point to note is that with a log link, SEs for comparisons with
any factor level where counts are all zero will be huge and meaningless.
This phenomenon has the name Hauck-Donner effect, though it is more
commonly identified for comparisons with categories that have very low or
very high estimated proportions for binomial data.  For the poisson or
quasipoisson family, use of the sqrt link avoids this problem.
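A small simulated illustration (hypothetical data, not from the original post): with all counts zero at one factor level, the log-link SE for the level contrast explodes, while the sqrt link gives a finite, interpretable SE.

```r
set.seed(42)
d <- data.frame(y = c(rpois(20, lambda = 5), rep(0, 20)),
                g = gl(2, 20, labels = c("A", "B")))
fit.log  <- glm(y ~ g, data = d, family = quasipoisson(link = "log"))
fit.sqrt <- glm(y ~ g, data = d, family = quasipoisson(link = "sqrt"))
## Compare the standard errors for the B-vs-A contrast:
coef(summary(fit.log))["gB", "Std. Error"]   # huge
coef(summary(fit.sqrt))["gB", "Std. Error"]  # modest
```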


John Maindonald email: john.maindon...@anu.edu.au


On 28/07/2015, at 22:00, r-help-requ...@r-project.org wrote:

From: Andrew Robinson <a.robin...@ms.unimelb.edu.au>
Subject: Re: [R] R wont accept my zero count values in the GLM with quasi_poisson dsitribution
Date: 28 July 2015 18:59:51 NZST
To: Charlotte <charlotte.hu...@griffithuni.edu.au>
Cc: R help <r-help@r-project.org>


You have selected the binomial family in the call to glm.  You should
instead try something like

family = quasipoisson(link = "log")

I hope this helps

Andrew

On Tue, Jul 28, 2015 at 4:33 PM, Charlotte <charlotte.hu...@griffithuni.edu.au> wrote:

Hello

I have count values for abundance which follow a pattern of over-dispersal
with many zero values.  I have read a number of documents which suggest
that I don't use data-transforming methods but rather run the GLM with
the quasipoisson distribution.  So I have written my script, and R is
telling me that Y should be more than 0.

Everything I read tells me to do it this way, but I can't get R to agree.
Do I need to add something else to my script to get it to work and keep my
data untransformed? The script I wrote is as follows:

fit <- glm(abundance ~ Gender, data=teminfest, family=binomial())

then I get this error:
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

I don't use R a lot so I am having trouble figuring out what to do next.

I would appreciate some help

Many Thanks
Charlotte






--
View this message in context:
http://r.789695.n4.nabble.com/R-wont-accept-my-zero-count-values-in-the-GLM-with-quasi-poisson-dsitribution-tp4710462.html
Sent from the R help mailing list archive at Nabble.com.





--
Andrew Robinson
Deputy Director, CEBRA, School of Biosciences
Reader & Associate Professor in Applied Statistics   Tel: (+61) 0403 138 955
School of Mathematics and Statistics                 Fax: +61-3-8344 4599
University of Melbourne, VIC 3010 Australia
Email: a.robin...@ms.unimelb.edu.au
Website: http://www.ms.unimelb.edu.au/~andrewpr

MSME: http://www.crcpress.com/product/isbn/9781439858028
FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
SPuR: http://www.ms.unimelb.edu.au/spuRs/





[R] R: Re: Differences in output of lme() when introducing interactions

2015-07-23 Thread John Maindonald
Do you have legal advice that suing the university (if that is the right
context) would actually be a fruitful way forward, that it would achieve
anything useful within a reasonable time and without exposing the student
to severe financial risk?

What may work in that context is for students to collectively complain that
important aspects of their training and support are being neglected.
With the rapidity of recent technological change, the issue is widespread.
To an extent, able post-docs and PhDs have to lead the charge in getting
training and support updated and brought into the modern world.


John Maindonald email: john.maindon...@anu.edu.au

On 22/07/2015, at 22:00, r-help-requ...@r-project.org wrote:

From: li...@dewey.myzen.co.uk
Date: 21 July 2015 11:58
To: angelo.arc...@virgilio.it, bgunter.4...@gmail.com
Cc: r-help@r-project.org
Subject: Re: R: Re: [R] R: Re: Differences in output of lme() when introducing interactions

Dear Angelo

I suggest you do an online search for marginality which may help to
explain the relationship between main effects and interactions. As I
said in my original email this is a complicated subject which we are not
going to retype for you.

If you are doing this as a student I suggest you sue your university for
failing to train you appropriately and if it is part of your employment
I suggest you find a better employer.

On 21/07/2015 10:04, angelo.arc...@virgilio.it wrote:
Dear Bert,
thank you for your feedback. Can you please provide some references
online so I can improve my ignorance?
Anyway, please notice that it is not true that I do not know statistics
and regression at all, and I am strongly convinced that my question may
be of interest to someone else in the future.

This is what forums are for, isn't it? This is why people help each
other, isn't it?

Moreover, don't you think that I would not have asked this R forum if
I had the possibility to ask or pay a statistician?
Don't you think I have already done my best to study and learn before
posting this message? Trust me, I have read various online tutorials
on lme and lmer, and I am confident that I have got the basic concepts.
Still I have not found the answer to my problem, so if you know it,
can you please give me some suggestions that would help?

I do not have a book from which to learn, and unfortunately I have to
analyze the results soon. Any help? Any to-the-point online reference
that can help me solve this problem?

Thank you in advance

Best regards

Angelo





Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-29 Thread John Maindonald
There are two issues:
1) Damage was done to accuracy in the calculation of t(X) %*% X.
Where the mean to SD ratio is large, maybe even of the order of
100 or so, centre that column.

2) pseudoinverse() goes awry when columns of X that are of the
order of large negative (or, I expect, positive) powers of ten
are supplied to it. I'd guess that this has to do with the way that
a check for singularity is implemented.

Ben's example:

set.seed(101)
n_obs <- 1000
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, mean=1.235657e-14, sd=4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(rep(1, n_obs), x1, x2)

coef(lm(y ~ x1 + x2))
##   (Intercept)            x1            x2
##  1.155959e+01 -1.658420e+14  3.797342e-02

library(corpcor)
t(cbind(coef(lm(y ~ x1 + x2)),
        as.vector(pseudoinverse(t(X) %*% X) %*% t(X) %*% y)))
##      (Intercept)            x1         x2
## [1,]   11.559587 -1.658420e+14 0.03797342
## [2,]    9.511348  1.151908e-13 0.03788714

## Examine mean to SD ratios
round(c(mean(x1)/sd(x1), mean(x2)/sd(x2)), 2)
## [1] 263.65   3.28
## Notice that the coefficients for x2, where the mean/sd ratio
## is smaller, roughly agree.

## Damage was done to accuracy in the calculation of t(X) %*% X
## (but there is more to it than that, as we will see).

## Try
xc1 <- scale(x1, center=TRUE, scale=FALSE)
xc2 <- scale(x2, center=TRUE, scale=FALSE)
Xc <- cbind(rep(1, n_obs), xc1, xc2)
as.vector(pseudoinverse(t(Xc) %*% Xc) %*% t(Xc) %*% y)
## [1] 9.511348e+00 1.151908e-13 3.788714e-02

## Note now, however, that one should be able to dispense with
## the column of 1s, with no change to the coefficients of x1 & x2
Xc0 <- cbind(xc1, xc2)
as.vector(pseudoinverse(t(Xc0) %*% Xc0) %*% t(Xc0) %*% y)
## [1] 1.971167e-20 3.788714e-02

##
## Now try a more sensible scaling for xc1
Xcs <- cbind(rep(1, n_obs), xc1*1e14, xc2)
Xcs0 <- cbind(xc1*1e14, xc2)

t(cbind(coef(lm(y ~ I(1e14*xc1) + xc2)),
        as.vector(pseudoinverse(t(Xcs) %*% Xcs) %*% t(Xcs) %*% y)))
##      (Intercept) I(1e+14 * xc1)        xc2
## [1,]    9.899249       -1.65842 0.03797342
## [2,]    9.899249       -1.65842 0.03797342

## Eureka!

## cf also
as.vector(pseudoinverse(t(Xcs0) %*% Xcs0) %*% t(Xcs0) %*% y)
## [1] -1.65842037  0.03797342


John Maindonald email: john.maindon...@anu.edu.au

Centre for Mathematics & Its Applications,
Australian National University, Canberra ACT 0200.


On 29/03/2015, at 23:00, r-help-requ...@r-project.org wrote:

From: Ben Bolker <bbol...@gmail.com>
Subject: Re: [R] Error in lm() with very small (close to zero) regressor
Date: 29 March 2015 6:28:22 NZDT
To: r-h...@stat.math.ethz.ch


peter dalgaard <pdalgd at gmail.com> writes:



On 28 Mar 2015, at 00:32, RiGui <raluca.gui at business.uzh.ch> wrote:

Hello everybody,

I have encountered the following problem with lm():

When running lm() with a regressor close to zero, of the order of
1e-10, the value of the estimate is of huge absolute value, of the
order of millions.

However, if I write the formula for the OLS estimator in matrix notation,
pseudoinverse(t(X) %*% X) %*% t(X) %*% y, the results are correct, meaning
the estimate has value 0.

 How do you know this answer is correct?

here is the code:

y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.01235657, 0.45)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)

Eh?  You want
X  <- cbind(rep(1, n_obs), x1, x2)

bFE <- lm(y ~ x1 + x2)
bFE

bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS
bOLS


Note: I am applying a deviation-from-the-mean projection to the data;
that is why I have some regressors with such small values.

Thank you for any help!

Raluca

 Is there a reason you can't scale your regressors?


Example not reproducible:


 I agree that the OP's question was not reproducible, but it's
not too hard to make it reproducible. I bothered to use
library(sos); findFn(pseudoinverse) to find pseudoinverse()
in corpcor:

It is true that we get estimates with very large magnitudes,
but their

set.seed(101)
n_obs <- 1000
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, mean=1.235657e-14, sd=4.5e-17)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)

bFE <- lm(y ~ x1 + x2)
bFE
coef(summary(bFE))

##                  Estimate   Std. Error     t value  Pr(>|t|)
## (Intercept)  1.155959e+01 2.312956e+01  0.49977541 0.6173435
## x1          -1.658420e+14 1.872598e+15 -0.08856254 0.9294474
## x2           3.797342e-02 2.813000e-02  1.34992593 0.1773461

library(corpcor)
bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS

##              [,1]
## [1,] 9.807664e-16
## [2,] 8.880273e-01

And if we scale the predictor:

bFE2 <- lm(y ~ I(1e14*x1) + x2)
coef(summary(bFE2))

##                  Estimate Std. Error     t value  Pr(>|t|)
## (Intercept)   11.55958731   23.12956  0.49977541 0.6173435
## I(1e+14 * x1) -1.65842037   18.72598 -0.08856254 0.9294474
## x2

Re: [R] the making of _R_ eBooks

2015-03-24 Thread John Maindonald
Thanks for that.  Useful to have that question asked and to get that
information.


John Maindonald email: john.maindon...@anu.edu.au


On 25/03/2015, at 0:00, r-help-requ...@r-project.org wrote:

From: John McKown 
john.archie.mck...@gmail.commailto:john.archie.mck...@gmail.com
Subject: Re: [R] the making of _R_ eBooks
Date: 24 March 2015 1:36:38 NZDT
To: Dr. Wolfgang Lindner lindn...@t-online.demailto:lindn...@t-online.de
Cc: Help R r-help@r-project.orgmailto:r-help@r-project.org


On Mon, Mar 23, 2015 at 3:50 AM, Dr. Wolfgang Lindner <lindn...@t-online.de> wrote:
Dear list members,

I like the look and feel of the eBook versions of the R manuals very much.
So I would like to generate eBooks (teaching material etc) in that look.

I am not an expert. But I have looked at the source, so I can give you
some information.


Q1: is there a description how the _R_ ebooks have been produced?

Looking at the source, it appears that the source manuals are in a
document markup language called GNU Texinfo.
https://www.gnu.org/software/texinfo/
You can think of this as something akin to, but different from, HTML
or markdown encoding. Texinfo is an evolution of the system first
designed by Richard Stallman of MIT. He is the driving force behind
the GPL and most of the GNU software which forms the basis of the user
space commands for Linux and the *BSD operating systems. Texinfo is
then converted to TeX. TeX is the typesetting language designed by Dr.
Donald Knuth. TeX, nominally, is converted into a DVI printer control
language (DeVice Independent). But in the case of creating a PDF file,
there is a processor called pdftex,
http://en.wikipedia.org/wiki/PdfTeX, which produces a PDF file as
output. A good site for TeX is https://tug.org/

Texinfo has the plus of also having processors which will convert it to
UNIX man (manual) pages and HTML web pages. So one source document
can generate three different output document file types.

Most people use an enhanced TeX called LaTeX instead of plain TeX
when using TeX. LaTeX can be read up on here:
http://www.latex-project.org/ A good TeX document processor is
TeXstudio at http://texstudio.sourceforge.net/ . I use this one myself
(which is not necessary a strong endorsement because I'm nobody
special).

I feel the need to warn you that TeX is very powerful and, at least to
me, quite difficult, with a fairly steep learning curve. Which may be
why the R project uses Texinfo, because it is quite a bit easier to
learn.


Q2: which (free) software was used for them?

See the links above. On Fedora Linux, I get the TeX oriented software
from a bunch of packages which start with texlive. More information,
including the processors for Linux, Windows, and Mac are at
https://www.tug.org/texlive/
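As a concrete sketch of that pipeline (assuming the texinfo and texlive packages are installed; manual.texi is a hypothetical file name):

```shell
# Build several output formats from a single Texinfo source.
makeinfo --html manual.texi   # HTML pages
makeinfo manual.texi          # Info format
texi2pdf manual.texi          # PDF, via pdftex
```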

Q3: any other recommendations?

You might consider LyX.
http://www.lyx.org/
LyX is a document processor. It would likely be easier to use than the
above if you are used to MS Word or other word processing system. It
is cross platform: Linux, Windows, and Mac. It stores files in its own
textual format, which is somewhat human readable. LyX, like Texinfo,
translates its format into TeX as an intermediate on its way to its
ultimate destination. I am still learning LyX, but I personally like
it.

Your mention of LibreOffice is also a fairly good one. I personally
use LibreOffice, but I don't use it for big documents. I have a
learned aversion to word processors because it is so easy for them to
be misused. In my opinion, a good document needs good metadata in it
as well as just looking pretty. Word processor users tend to focus
on the format and not the content. That's just my opinion, based on
what I've seen where I work.


Searching the internet gives me e.g.
[1] https://sites.google.com/site/richardbyrnepdsite/ebooks-and-audiobooks/create-your-own-ebooks
[2] http://opensource.com/life/13/8/how-create-ebook-open-source-way
[3] http://scottnesbitt.net/ubuntublog/creating-a-ebook-with-libreoffice-writer/

but I'm not sure if there are better possibilities.

Thanks for any hint or link by expert R users.

Oh, well, that excludes me. I'm not an expert. But maybe it was helpful anyway.


Wolfgang Lindner
Leichlingen, Germany

--
If you sent twitter messages while exploring, are you on a textpedition?

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! 
John McKown




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rpart and randomforest results

2014-04-08 Thread John Maindonald
The function rpart may well overfit if the value of the complexity parameter
CP is left at its default.  Use the functions printcp() and plotcp() to
check how the cross-validation estimate of relative 'error' (xerror)
changes with the number of splits (NB that the CP value that leads to a
further split decreases monotonically with the number of splits).
The 'rel error' column from printcp() can be hopelessly optimistic.
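A minimal sketch of that check, using the kyphosis data set that ships with
rpart (the data set and cp value are my choices for illustration, not from
the original thread):

```r
library(rpart)
## Grow a deliberately over-large tree by lowering cp
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0.001))
printcp(fit)  # 'xerror' is cross-validated; 'rel error' can be optimistic
plotcp(fit)   # xerror plotted against cp / tree size
## Prune back to the cp value that minimises the cross-validated error
bestcp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = bestcp)
```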


John Maindonald             email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 8/04/2014, at 8:00 pm, r-help-requ...@r-project.org wrote:

From: r-help-boun...@r-project.org On Behalf Of Schillo, Sonja
Sent: Thursday, April 03, 2014 3:58 PM
To: Mitchell Maltenfort
Cc: r-help@r-project.org
Subject: Re: [R] rpart and randomforest results

Hi,

the random forest should do that, you're totally right. As far as I know it
does so by randomly selecting the variables considered for a split (but here we
set the option for how many variables to consider at each split to the number
of variables available, so that the random forest has no chance to randomly
select the variables). The next thing that randomForest does is bootstrapping.
But here again we set the option to the number of cases we have in the data
set, so that no bootstrapping should be done.
We tried to take all the randomness out of the randomForest fit.
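For reference, the settings described above might look as follows
(randomForest is on CRAN; the data frame here is invented for illustration,
and note that unless replace=FALSE is also set, sampling is still with
replacement, so some randomness remains):

```r
library(randomForest)
## Invented example data: binary response y, three predictors
set.seed(1)
dat <- data.frame(y = factor(rbinom(100, 1, 0.5)),
                  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
p <- ncol(dat) - 1   # number of predictor variables
n <- nrow(dat)       # number of cases
## mtry = p                       : every variable considered at every split
## sampsize = n, replace = FALSE  : every case used exactly once per tree
rf <- randomForest(y ~ ., data = dat, mtry = p,
                   sampsize = n, replace = FALSE)
```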

Is that plausible and does anyone have another idea?

Thanks
Sonja




Re: [R] Drawing a line in xyplot

2012-04-09 Thread John Maindonald
PS: The following shows possibilities that are available using latticeExtra 
layering:

## Best make type and attend factors
xdat <- data.frame(
  mortality = c(5,8,7,5,8,10,11,6,4,5,20,25,27,30,35,32,28,21,20,34,
                11,15,18,12,15,12,10,15,19,20),
  type = factor(rep(1:3, each = 10)),
  attend = factor(c(1,0,1,1,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,
                    1,1,1,0,1,0,1,0,0,0)))


gph <- xyplot(mortality ~ attend | type, pch = 16,
              data = xdat, aspect = 2:1, layout = c(3, 1))

one4all <- layer(avy <- median(xdat$mortality),
                 panel.segments(0.1, avy, 0.3, avy, col = "red", lwd = 4),
                 panel.segments(0.7, avy, 1, avy, col = "red", lwd = 4))

medbytype <- layer(avy <- median(y),
                   panel.segments(0.1, avy, 0.3, avy, col = "red", lwd = 4),
                   panel.segments(0.7, avy, 1, avy, col = "red", lwd = 4))

interact <- layer(panel.average(x, y, fun = median, col = "red", lwd = 4))

Compare

gph + one4all
(shows overall median lines in all 3 panels)

gph + medbytype
(shows separate median lines for the separate panels)

gph+interact
(gives a form of interaction plot)

NB x (if its values are used) and y are local to the individual panel

NB also that layer() accepts a data argument.  The following is an
alternative way to calculate one4all:

one4all <- layer(data = xdat, avy <- median(mortality),
                 panel.segments(0.1, avy, 0.3, avy, col = "red", lwd = 4),
                 panel.segments(0.7, avy, 1, avy, col = "red", lwd = 4))

John Maindonald.



Re: [R] Drawing a line in xyplot

2012-04-08 Thread John Maindonald
This can be simplified by using the layering abilities that Felix Andrews
made available in latticeExtra.  These abilities are too little known.  They
pretty much make it unnecessary to resort to trellis.focus(), at least in
cases such as this:

library(latticeExtra)
x11(height = 8, width = 11)
xdat <- data.frame(
  mortality = c(5,8,7,5,8,10,11,6,4,5,20,25,27,30,35,32,28,21,20,34,
                11,15,18,12,15,12,10,15,19,20),
  type = rep(1:3, each = 10),
  attend = c(1,0,1,1,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,
             1,1,1,0,1,0,1,0,0,0))
gph <- xyplot(mortality ~ attend | type,
  panel = function(x, y) {
    panel.grid(lty = 5)
    panel.xyplot(x, y, pch = 16, jitter.x = TRUE, col = 1)
    for (i in 1:3) {
      panel.segments(x0 = c(0.7, 1.7),
                     x1 = c(1.3, 2.3),
                     y0 = with(xdat, tapply(mortality[type == i],
                                            attend[type == i], median)),
                     y1 = with(xdat, tapply(mortality[type == i],
                                            attend[type == i], median)),
                     col = "red", lwd = 4)
    }
  },
  data = xdat, aspect = 2:1, layout = c(3, 1))

## Now create a layer object that will add the further segments.
addlayer <- layer(panel.segments(x0=1, y0=20, x1=1.2, y1=20),
                  panel.segments(x0=2, y0=30, x1=2.2, y1=30))

gph+addlayer

The code that produces the object gph would also be simpler
and easier to follow if some relevant part was separated out into 
a separate layer.

See also my notices on layering of lattice objects at:
http://www.maths.anu.edu.au/%7Ejohnm/r-book/add-graphics.html

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 08/04/2012, at 8:00 PM, r-help-requ...@r-project.org wrote:

 From: David Winsemius dwinsem...@comcast.net
 Subject: Re: [R] Drawing a line in xyplot
 Date: 8 April 2012 2:17:22 PM AEST
 To: wcheckle wchec...@jhsph.edu
 Cc: r-help@r-project.org
 
 
 
 On Apr 7, 2012, at 10:29 PM, wcheckle wrote:
 
 Thank you David, the bwplot option does what I need:
 
 x11(height=8,width=11)
 bwplot ( mortality~ attend|type,
 pch=95,cex=5,col=2,
 par.settings=list(
 box.rectangle = list(col = "transparent"),
 box.umbrella = list(col = "transparent"),
 plot.symbol = list(col = "transparent")
 ),
 panel=function(x,y,...){
 panel.grid(lty=5)
 panel.xyplot(x,y,pch=16,jitter.x=TRUE,col=1)
 panel.bwplot(x,y,...)
 },
 data = x,aspect=2:1,layout=c(3,1))
 
 
 However, I am interested in also learning how to do it in xyplot as well.  I
 wasn't able to follow the last two sets of instructions (condition on
 packet.number and loop over segments), and wonder if I can ask for your help
 with the correct code (my attempt resulted in all three mean lines within
 each panel):
 
 x11(height=8,width=11)
 xyplot ( mortality ~ attend|type,
 panel=function(x,y)
 {
 panel.grid(lty=5)
 panel.xyplot(x,y,pch=16,jitter.x=TRUE,col=1)
 for(i in 1:3)
 {
 panel.segments(x0=c(0.7, 1.7),
 x1=c(1.3, 2.3),
 y0=c(tapply(x$mortality[x$type==i], x$attend[x$type==i], median)),
 y1=c(tapply(x$mortality[x$type==i], x$attend[x$type==i],
 median)),col=red,lwd=4)
 }
 },
 data = x,aspect=2:1,layout=c(3,1))
 
 thank you again. I also found your info on data.frame useful.
 
 I haven't figured out how to do it with the interaction of 'attend' and 
 ''type' from inside xyplot. I'm thinking I might be able to succeed using 
 trellis.focus() to address separate columns within a particular panel.
 
 This will draw  segments at (1,20) and (2,30) without resorting to low level 
 grid/viewport stuff.
 
 trellis.unfocus(); trellis.focus("panel", 1, 1)
 do.call(panel.segments, list(x0=1, y0=20, x1=1.2, y1=20))
 trellis.unfocus()
 trellis.unfocus(); trellis.focus("panel", 1, 1)
 do.call(panel.segments, list(x0=2, y0=30, x1=2.2, y1=30))
 trellis.unfocus()
 
 -- 
 David
 
 
 David Winsemius, MD
 West Hartford, CT




[R] RC33 8th Int Conf on Social Science Methodology -- The R System ...

2011-11-02 Thread John Maindonald
I wish to draw attention to an R-related session that is planned for the
RC33 Eighth International Conference on Social Science Methodology,
to be held 9-13 July 2012 at the University of Sydney.


The focus of the conference is on innovations and current best practice in all 
aspects of social science research methodology. It provides an opportunity to 
reflect on contemporary methods, as applied in a range of settings and 
disciplinary contexts, to hear about emerging methods, tools, techniques and 
technologies, and to discover what resources are available to social science 
researchers and users of research.


The title for the planned session is:
The R System as a Platform for Analysis and Development of Analysis 
Methodology

http://conference.acspri.org.au/index.php/rc33/2012/schedConf/trackPolicies

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm



Re: [R] R-help Digest, Vol 97, Issue 11

2011-03-09 Thread John Maindonald
You might look at the function confusion() in the DAAGxtras package.
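A base-R sketch of the underlying idea (my illustration; confusion() packages
this up more conveniently): make both vectors factors with a common, explicit
set of levels, so that table() keeps the all-zero rows and columns.

```r
## Invented observed and predicted class labels
obs  <- c(20, 40, 50, 60)
pred <- c(20, 50, 50, 100)
levs <- seq(10, 100, by = 10)   # the full set of possible classes
## With explicit factor levels, table() retains all-zero rows/columns
tab <- table(obs  = factor(obs,  levels = levs),
             pred = factor(pred, levels = levs))
dim(tab)  # 10 x 10, including rows/columns for classes that never occur
```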

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 09/03/2011, at 10:00 PM, r-help-requ...@r-project.org wrote:

 From: Shi, Tao shida...@yahoo.com
 Date: 9 March 2011 7:22:49 AM AEDT
 To: r-help@r-project.org
 Subject: [R] confusion matrix
 
 
 Hi list,
 
 Is there already a function somewhere to output the confusion matrix from two
 input vectors?  table() always automatically deletes rows or columns with all
 0's.  For example, I would like the columns for 10 and 30 added back.
 
 Thanks!
 
 ...Tao
 
 
 
  20 40 50 60 70 80 90 100
  10   0  0  1  0  0  0  1   0
  20   1  2  0  4  0  0  0   1
  30   0  0  0  0  0  0  1   1
  40   0  0  1  1  2  0  0   0
  50   0  1  0  0  0  1  0   0
  60   0  0  0  0  0  0  1   1
  70   0  0  1  1  1  1  1   1
  80   0  1  0  0  0  0  0   1
  90   0  0  0  0  0  0  0   3
  100  0  0  0  0  1  0  0   3
 




Re: [R] cv.lm syntax error

2011-03-09 Thread John Maindonald
It means that dat$v1, dat$v2, . . . are not the names of columns in the
data frame that was supplied as df.

Specify the formula as v1 ~ v2+v3+v4+v5+factor.
Then (assuming that factor is a column object of the right length)
you should be fine.
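A sketch of a corrected call, following the column names in the original post
(CVlm() is in the DAAG package; the data frame dat is the poster's, and recent
DAAG versions may use data= rather than df= for the first argument, so check
?CVlm):

```r
library(DAAG)
## Build a data frame with plain column names, then use those names in
## the formula -- do not use dat$ inside the formula.
## ('factor' is a risky column name: it clashes with base::factor.)
dfmod.1 <- with(dat, data.frame(v1, v2, v3, v4, v5, factor))
CVlm(df = dfmod.1, form.lm = v1 ~ v2 + v3 + v4 + v5 + factor,
     m = 3, seed = 29, plotit = TRUE, printit = TRUE)
```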

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

 From: agent dunham crossp...@hotmail.com
 Date: 9 March 2011 3:47:10 AM AEDT
 To: r-help@r-project.org
 Subject: Re: [R] cv.lm syntax error
 
 
 Thanks for your answer, but I type the following: 
 
 dfmod.1 - data.frame(dat$v1,dat$v2,dat$v3,dat$v4, dat$v5,factor)
 
 CVlm(df= dfmod.1, form.lm = formula(dat$v1 ~ dat$v2+dat$v3+ dat$v4+
 dat$v5+ factor), m=3, seed=29, plotit=TRUE, printit =TRUE) 
 
 Error en `[.data.frame`(df, , ynam) : undefined columns selected
 
 What does it mean?
 
 u...@host.com
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/cv-lm-syntax-error-tp3334889p3341767.html
 Sent from the R help mailing list archive at Nabble.com.




Re: [R] \Sweaveopts error

2010-11-27 Thread John Maindonald
Actually, I spoke too soon.  The files process without obvious error, 
but keep.source=TRUE is ignored.  I have posted small files 
test1.Rnw and test2.Rnw that can be used to demonstrate the 
problems at:
  http://www.maths.anu.edu.au/~johnm/r/issues/

Sweave("test1")  ## includes \SweaveOpts settings

Sweave("test2", keep.source=TRUE)
  ## \SweaveOpts settings have been removed.

Comments do not appear in the LaTeX file that results, the code 
is reformatted and the graph from test1 goes into the working 
directory.

Notice also the NA that mysteriously appears in the second
line of code in output .tex file test2.tex

John.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm




Re: [R] \Sweaveopts error

2010-11-27 Thread John Maindonald
Actually, until I created this file I had had everything on one line,
or had used separate \SweaveOpts commands.

Putting everything on one line ensures that the graphics file
goes to the subdirectory snArt, but the file test1.tex is unchanged.

Note however the difference between the files test1.tex and
test2.tex, irrespective of the 1-line or 2-line issue.  The first of the
\SweaveOpts lines had an effect, unless I am missing something.

John.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 28/11/2010, at 12:40 PM, Duncan Murdoch wrote:

 On 27/11/2010 7:57 PM, John Maindonald wrote:
 Actually, I spoke too soon.  The files process without obvious error,
 but keep.source=TRUE is ignored.  I have posted small files
 test1.Rnw and test2.Rnw that can be used to demonstrate the
 problems at:
   http://www.maths.anu.edu.au/~johnm/r/issues/
 
 In the test1.Rnw file, your \SweaveOpts statement spans two lines. Sweave() 
 doesn't recognize it, so it's completely ignored.
 
 I don't think this is new behaviour; it's a consequence of the way Sweave 
 looks for \SweaveOpts via regular expression.
 
 Not quite sure what's going on in the second one yet...
 
 Duncan Murdoch
 
 


[R] \Sweaveopts error

2010-11-25 Thread John Maindonald
I have a file 4lmetc.Rnw, intended for inclusion in a LaTeX document,
that starts:

\SweaveOpts{engine=R, keep.source=TRUE}
\SweaveOpts{eps=FALSE, prefix.string=snArt/4lmetc}

The attempt to process the file through Sweave generates the error:

> Sweave("4lmetc")
Writing to file 4lmetc.tex
Processing code chunks ...
 1 : keep.source term verbatim
Error in file(srcfile$filename, open = "rt", encoding = encoding) :
  cannot open the connection
In addition: Warning message:
In file(srcfile$filename, open = "rt", encoding = encoding) :
  cannot open file '4lmetc': No such file or directory

The same file processes through Stangle() without problems.
If I comment out the \SweaveOpts lines, there is no problem,
except that I do not get the options that I want.

This processed fine in R-2.11.1

 sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  grid  methods  
[8] base 

other attached packages:
[1] lattice_0.19-13 DAAG_1.02   randomForest_4.5-36
[4] rpart_3.1-46MASS_7.3-8  reshape_0.8.3  
[7] plyr_1.2.1  proto_0.3-8

loaded via a namespace (and not attached):
[1] ggplot2_0.8.8   latticeExtra_0.6-14

Is there a workaround?

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm



Re: [R] \Sweaveopts error

2010-11-25 Thread John Maindonald
Yes, that has fixed the problem. (2010-11-24 r53659)

Thanks.

John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 25/11/2010, at 10:56 PM, Duncan Murdoch wrote:

 On 25/11/2010 6:34 AM, John Maindonald wrote:
 I have a file 4lmetc.Rnw, intended for inclusion in a LaTeX document,
 that starts:
 
 I think this may have been fixed in the patched version.  Could you give it a 
 try to confirm?  If not, please send me a simplified version of the file, and 
 I'll see what's going wrong.
 
 Duncan Murdoch
 
 
 
 



[R] L-shaped boxes with lattice graphs?

2010-11-14 Thread John Maindonald
Can anyone suggest an equivalent, for lattice graphs,
of the base graphics argument bty="l"?

NB that I am leaving off the box around the strip,
with a strip function:
stripfun <- function(which.given, which.panel,
                     factor.levels = as.expression(levlist), ...) {
  panel.text(x = 0, y = 0.5,
             lab = as.expression(levlist[which.panel[which.given]]), adj = 0)
}

e.g.
levlist <- list("A", "B")
xyplot(11:14 ~ 1:4 | rep(1:2, 2),
       scales = list(x = list(alternating = c(1, 1), relation = "sliced")),
       strip = stripfun, layout = c(1, 2))
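One possibility (a sketch, not an answer from the thread) is to suppress the
default panel box by making axis.line transparent, then supply a custom axis
function that draws ticks and labels, with a visible line, on the bottom and
left sides only:

```r
library(lattice)
## L-shaped "box": hide the default panel border, then redraw axes
## on the bottom and left sides only
xyplot(11:14 ~ 1:4,
       par.settings = list(axis.line = list(col = "transparent")),
       axis = function(side, ...) {
         if (side %in% c("bottom", "left"))
           axis.default(side, line.col = "black", ...)
       })
```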


John Maindonald email: john.maindon...@anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm



[R] Why software fails in scientific research

2010-02-28 Thread John Maindonald
I came across this notice of an upcoming webinar.  The issues identified in the
first paragraph below seem to me exactly those that the R project is designed
to address.  The claim that most research software is "barely fit for purpose
compared to equivalent systems in the commercial world" seems to me not
quite accurate!  Comments?


WEBINAR SERIES 
A Crack in the Code: Why software fails in scientific research, and how to fix 
it. 
Thursday, March 25, 2010, 3:00 PM GMT 
http://physicsworld.com/cws/go/webinar9


In the 60 years since the invention of the digital computer, millions of lines 
of code have been developed to support scientific research. Although an 
increasingly important part of almost all research projects, most research 
software is barely fit for purpose compared to equivalent systems in the 
commercial world. The code is hard to understand or maintain, lacking 
documentation and version control, and is continually ‘re-invented’ as the code 
writers move on to new jobs. This represents a tremendous waste of the already 
inadequate resources that are put into its development. We will investigate how 
this situation has come about, why it is important to the future of research, 
and what can be done about it. 

Robert McGreevy will draw on his extensive experience at the STFC ISIS 
Facility, and explain how these issues are being addressed for the benefit of 
research science globally. Nicholas Draper, consultant at Tessella, will then 
expand on this, using the example of the Mantid project at ISIS. 


Tessella (www.tessella.com) is a technology and consultancy firm, based in 
Oxford.

ISIS (International Species Information System) (www.isis.org) has as its 
mission the facilitation of international collaboration in the collection and 
sharing of knowledge on animals and their environments for zoos, aquariums and 
related organizations, and values the use of objective data to benefit 
conservation, science, animal welfare, education, and collection management.




Re: [R] parsing pdf files

2010-01-10 Thread John Maindonald
Oblivious to the problems that Barry notes, I have used pdftotext,
from Xpdf at http://www.foolabs.com/xpdf/download.html
without apparent problem; this under MacOS X.  For my purposes,
I need to retain the CTRL/L (form feed) characters that indicate page
throws.  Other converters that I have investigated seem not to retain
such information.  I use it with command line options thus:

pdftotext -layout -eol unix rnotes.pdf rnotes.txt

Given a pdf for a manuscript,  I have an R function that can then
create an index of functions that appear with their opening and 
closing parentheses.  (It may pick up a few strays, which can be
weeded out.)

The only issue I've found has been for use with the listings package.
In the LaTeX source, I set the option 'columns' to 'fullflexible', in order
to generate a pdf file that does not have unwanted hidden spaces,
which would otherwise be carried across to the text file.

{\lstset{language=R, xleftmargin=2pt,
 basicstyle=\ttfamily,
 columns=fullflexible,   % Omit for final manuscript
 showstringspaces=false}}
{}

The setting 'columns=fullflexible'  messes up the formatting
in places where I use tabbing, so that I need to change
'columns' back to the default for the final pdf.

Here is the function that does most of the work. Like the function
that follows, it could no doubt be greatly tidied up:

locatefun <-
function(txlines){
    idtxt <- "\\.?[a-zA-Z][a-zA-Z0-9]*(\\.[a-zA-Z]+)*\\("
    z <- regexpr("\014", txlines)   # \014 is the form feed (page throw)
    z[z > 0] <- 1
    z[z <= 0] <- 0
    page <- cumsum(z) + 1
    k <- 0
    findfun <- function(tx){
        mn <- t(sapply(tx, function(x)
                   {m <- regexpr(idtxt, x)
                    c(m, attr(m, "match.length"))}))
        mn[, 2] <- mn[, 1] + mn[, 2]
        rownames(mn) <- paste(1:dim(mn)[1])
        mn
    }
    for(i in 1:100){
        mn <- findfun(txlines)
        if(all(mn[, 1] == -1)) break
        here <- mn[, 1] > 0
        page <- page[here]
        txlines <- txlines[here]
        mn <- mn[here, , drop=FALSE]
        m1 <- regexpr("\\(", txlines) - 1
        tx1 <- substring(txlines, mn[, 1], m1)
        if(i == 1) xy <- data.frame(nam=I(tx1), page=page) else
            xy <- rbind(xy, data.frame(nam=I(tx1), page=page))
        txlines <- substring(txlines, mn[, 2])
        here2 <- nchar(txlines) > 0
        txlines <- txlines[here2]
        page <- page[here2]
        if(length(txlines) == 0) break
    }
    zz <- !xy[, 1] %in% c("", "al", "Pr", "T", "F", "n", "P", "y", "A",
                          "transformation", "left", "f", "site.", "a", "b", "II",
                          "ARCH", "ARMA", "MA")
    xy <- xy[zz, ]
    nam <- xy$nam
    ch <- substring(nam, 1, 1)
    nam[ch %in% c("=", ",")] <- substring(nam[ch %in% c("=", ",")], 2)
    xy$nam <- nam
    ord <- order(xy[, 2])
    xy[ord, ]
}

Here is the function that calls locatefun:

makeFunIndex <-
function(sourceFile="rnotes.txt",
         frompath="~/_notes/rnotes/", fileout=NULL,
         availfun=funpack,
         offset=0){
    ## pdftotext -layout -eol unix rnotes.pdf rnotes.txt
    len <- nchar(sourceFile)
    lfrom <- nchar(frompath)
    if(substring(frompath, lfrom, lfrom)=="/") frompath <-
        substring(frompath, 1, lfrom-1)
    if(is.null(fileout)){
        if (substring(sourceFile, len-3, len-3) == ".")
            fnam <- substring(sourceFile, 1, len-4) else fnam <-
                sourceFile
        fileout <- paste(fnam, ".fdx", sep="")
        fdxfile <- paste(fileout, sep="/")
        fndfile <- paste(fnam, ".fnd", sep="")
    }
    sourceFile <- paste(frompath, sourceFile, sep="/")
    print(paste("Send output to", fndfile))
    tx <- readLines(sourceFile, warn=FALSE)
    entrymat <- locatefun(tx)
    backn <- regexpr("\\n", entrymat[, 1], fixed=TRUE)
    entrymat <- entrymat[backn < 0, ]
    entrymat[, 2] <- entrymat[, 2] - offset
    entrymat[, 1] <- gsub("_", "\\_", entrymat[, 1], fixed=TRUE)
    nmatch <- match(entrymat[, 1], availfun[, 2], nomatch=0)
    use <- nmatch > 0
    print("Unmatched functions:")
    print(unique(entrymat[!use, 1]))
    entrymat[use, 1] <- paste(entrymat[use, 1], " ({\\em ",
                              availfun[nmatch, 1], "})", sep="")
    funentries <- paste("\\indexentry", "{", entrymat[, 1], "}{",
                        entrymat[, 2], "}", sep="")
    write(funentries, fdxfile)
    system(paste("makeindex -o", fndfile, fdxfile))
}


On 10/01/2010, at 10:00 PM, r-help-requ...@r-project.org wrote:

 From: Barry Rowlingson b.rowling...@lancaster.ac.uk
 Date: 10 January 2010 12:47:01 AM AEDT
 To: David Kane d...@kanecap.com
 Cc: r-help@r

Re: [R] R on Linux, and R on Windows , any difference in maturity+stability?

2009-10-06 Thread John Maindonald
I had a large job some time ago that ran fine under MacOS X.
I'd expect the same to be true under Linux.  It would run
under Windows XP only if XP had been freshly rebooted.



On 06/10/2009, at 9:00 PM, r-help-requ...@r-project.org wrote:

 From: Liviu Andronic landronim...@gmail.com
 Date: 6 October 2009 6:46:33 PM AEDT
 To: Robert Wilkins iwriteco...@gmail.com
 Cc: r-help@r-project.org
 Subject: Re: [R] R on Linux, and R on Windows , any difference in  
 maturity+stability?


 On 10/6/09, Robert Wilkins iwriteco...@gmail.com wrote:
 Will R have more glitches on one operating system as opposed to
 another,

 Probably not.

 or is it pretty much the same?

 Depending on the complexity of the code, it is pretty much the same. I
 recently had a (relatively simple) group project, with the three of us
 on different OSs: Win, Mac and Linux. We did not encounter one
 platform specific issue.
 Liviu





[R] Use of R in Schools

2009-09-18 Thread John Maindonald

I am looking for information on experimentation with the use
of R in the teaching of statistics and science in schools.  Any
leads would be very welcome.  I am certain that there is such
experimentation.

I've made this inquiry on r-sig-teaching, with no response.
John.




Re: [R] R User Group in Canberra

2009-07-31 Thread John Maindonald
Please add

AUSTRALIA

Canberra: Canberra R Users Group (CANRUG) http://canrug.togaware.com/

Cheers
john.








[R] Strip labels: use xyplot() to plot columns in parallel with outer=TRUE

2009-05-09 Thread John Maindonald

The following tinkers with the strip labels, where the
different panels are for different levels of a conditioning
factor.

library(lattice)
tau <- (0:5)/2.5; m <- length(tau); n <- 200; SD <- 2
x0 <- rnorm(n, mean=12.5, sd=SD)
matdf <- data.frame(
    x = as.vector(sapply((0:5)/2.5, function(s) x0 + rnorm(n, sd=2*s))),
    y = rep(15 + 2.5*x0, m), taugp = factor(rep(tau, rep(n, m))))
names(matdf) <- c("x", "y", "taugp")
lab <- c(list("0 (No error in x)"),
         lapply(tau[-1], function(x) substitute(A*s[z], list(A=x))))
xyplot(y ~ x | taugp, data=matdf,
       strip=strip.custom(strip.names=TRUE,
           var.name="Add error with SD", sep=expression(" = "),
           factor.levels=as.expression(lab)))
Is there any way to get custom labels when the same is done by
plotting variables in parallel?:

df <- unstack(matdf, x ~ taugp)
df$y <- 15 + 2.5*x0
lab2 <- c(list("0 (No error in x)"),
          lapply(tau[-1],
                 function(x) substitute("Add error with SD" == A*s[z],
                                        list(A=x))))
form <- formula(paste("y ~", paste(paste("X", tau, sep=""),
                                   collapse="+")))
xyplot(form, data=df, outer=TRUE)

I'd hoped that the following would do the trick, but the first label
is repeated in each panel, and the variable names are added:

xyplot(form, data=df, outer=TRUE, strip=strip.custom(strip.names=TRUE,
       var.name=as.expression(lab2)))




Re: [R] Accuracy of R and other platforms

2009-03-23 Thread John Maindonald
These comparisons are very simplistic.  In most contexts, it would
make much better sense to measure accuracy in standard error units,
rather than in number of digits.

There doubtless are specialist applications where the 15th digit (or
even the 10th!) is important.  But the check of accuracy then really
has to be tuned to that application.  Results in cases where accuracy
beyond (optimistically) the sixth digit is of no consequence are
unlikely to give much clue about performance in such specialist
applications.



On 23/03/2009, at 10:00 PM, r-help-requ...@r-project.org wrote:

 From: Karl Ove Hufthammer karl.huftham...@math.uib.no
 Date: 23 March 2009 2:58:23 AM
 To: r-h...@stat.math.ethz.ch
 Subject: Re: [R] Accuracy of R and other platforms


 Alejandro C. Frery:

 @ARTICLE{AlmironSilvaMM:2009,
 author = {Almiron, M. and Almeida, E. S. and Miranda, M.},
 title = {The Reliability of Statistical Functions in Four Software
 Packages Freely used in Numerical Computation},
 journal = {Brazilian Journal of Probability and Statistics},
 year = {in press},
 volume = {Special Issue on Statistical Image and Signal Processing},
 url = {http://www.imstat.org/bjps/}}

 is freely available under the Future Papers link. It makes a nice
 comparison of the numerical properties of R, Ox, Octave and Python.

 Thanks for posting this. I’m happy to see that the results for R were
 generally excellent, and almost always better than for the three other
 software packages.

 But there were a few cases where R did not turn out to be the winner.
 Rather surprising that Ox was better than R for computing the
 autocorrelation coefficient for two of the datasets, given its  
 terrible
 results for the standard deviation. Anybody have any ideas why?

 -- 
 Karl Ove Hufthammer





Re: [R] using a noisy variable in regression (not an R question)

2009-03-08 Thread John Maindonald
One can just include it in the regression, but the potential problems
for interpretation are surely greater than those indicated.  Inclusion
of X1 = T1+E1 may cause X2 to appear significant when in fact it is
having no effect at all.  Or the true effect can be reversed in sign.
This happens because X1 and X2 are correlated.  Maybe this is implicit
in what Jon is saying.

See Carroll, Ruppert and Stefanski, Measurement Error in Nonlinear
Models (2004, pp. 52-55).  The SD of E1 may need to be fairly large
relative to SD(T1) for this to be an issue.  My notes at
http://www.maths.anu.edu.au/%7Ejohnm/r-book/2edn/xtras/xtras.pdf
have brief comments, and code that can be used to illustrate the point.
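A minimal simulation in that spirit (a sketch only; this is not the code from my notes, and the variable names and SDs are made up for illustration):

```r
## X2 is correlated with T1 but, given T1, has no effect on y.
## Replacing T1 by the error-prone measure X1 = T1 + E1 lets X2
## soak up part of T1's effect, so X2 typically appears 'significant'.
set.seed(17)
n  <- 200
T1 <- rnorm(n)
X2 <- 0.8*T1 + rnorm(n, sd=0.6)   # correlated with T1
y  <- 2*T1 + rnorm(n)             # y depends on T1 only
X1 <- T1 + rnorm(n, sd=1.5)       # large measurement error
summary(lm(y ~ T1 + X2))$coef     # X2: no real effect
summary(lm(y ~ X1 + X2))$coef     # X2 typically looks significant
```

With the true T1 in the model, the X2 coefficient sits near zero; with the noisy X1 in its place, X2 is usually credited with part of T1's effect.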

I support Stephen Kolassa's suggestions re using simulation for
sensitivity analysis, though I think this can also be done analytically.



On 08/03/2009, at 10:00 PM, r-help-requ...@r-project.org wrote:

 From: Jonathan Baron ba...@psych.upenn.edu
 Date: 8 March 2009 5:21:55 AM
 To: Juliet Hannah juliet.han...@gmail.com
 Cc: r-help@r-project.org
 Subject: Re: [R] using a noisy variable in regression (not an R  
 question)


 If you form categories, you add even more error, specifically, the
 variation in the distance between each number and the category
 boundary.

 What's wrong with just including it in the regression?

 Yes, the measure X1 will account for less variance than the underlying
 variable of real interest (T1, each individual's mean, perhaps), but
 X1 could still be useful in two ways.  One, it might be a significant
 predictor of the dependent variable Y despite the error.  Two, it
 might increase the sensitivity of the model to other predictors (X2,
 X3...) by accounting for what would otherwise be error.

 What you cannot conclude in this case (when you measure a predictor
 with error) is that the effect of (say) X2 is not accounted for by its
 correlation with T1.  Some people try to conclude this when X2 remains
 a significant predictor of Y when X1 is included in the model.  The
 trouble is that X1 is an error-prone measure of T1, so the full effect
 of T1 is not removed by inclusion of X1.

 Jon

 On 03/07/09 12:49, Juliet Hannah wrote:
 Hi, This is not an R question, but I've seen opinions given on non R
 topics, so I wanted
 to give it a try. :)

 How would one treat a variable that was measured once, but is known  
 to
 fluctuate a lot?
 For example, I want to include a hormone in my regression as an
 explanatory variable. However, this
 hormone varies in its levels throughout a day. Nevertheless, its  
 levels differ
 substantially between individuals so that there is information  
 there to use.

 One simple thing to try would be to form categories, but I assume
 there are better ways to handle this. Has anyone worked with such  
 data, or could
 anyone suggest some keywords that may be helpful in searching for  
 this
 topic. Thanks
 for your input.

 Regards,

 Juliet

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 -- 
 Jonathan Baron, Professor of Psychology, University of Pennsylvania
 Home page: http://www.sas.upenn.edu/~baron
 Editor: Judgment and Decision Making (http://journal.sjdm.org)





[R] The Origins of R

2009-02-03 Thread John Maindonald
In another thread on this list, various wild allegations have been  
made, relating to the New York Times article on R.  I object both to  
the subject line and to the content of several of the messages, and  
will not repeat or quote any of that content.  It smacks to me of  
mischief making.


Discussion has centered around the following quote from the NY Times  
article:
“According to them, the notion of devising something like R sprang up  
during a hallway conversation. They both wanted technology better  
suited for their statistics students, who needed to analyze data and  
produce graphical models of the information. Most comparable software  
had been designed by computer scientists and proved hard to use.”
The comment that "the notion of devising something like R sprang up
during a hallway conversation" is strictly true.  Certainly, this
seems like a very plausible account.  I'd have more difficulty
believing that the notion was communicated to them in separate
dreams.  Part of the wanted technology was freedom for students to
take the software home, or copy it down from the web.
There was a further story to be told, about the origins of the  
language that Ross and Robert implemented and adapted.  The NY writer  
pretty much left out that part of the story (S did get a mention, but  
its connection with R did not), but did remedy this omission in a  
follow-up.
Nor did the article do much to acknowledge the workers and work that  
has gone into R's continuing development. Getting the attributions  
right is difficult.  Even if right according to common conventions  
(and one can argue as to just what the conventions are, especially in  
the matter of computer language development), they are unlikely to be  
totally fair.  Stigler's Law of Eponymy has wide sway!


In the preface to the first and second edition of Data Analysis and  
Graphics Using R, we have:
"The R system implements a dialect of the S language that was
developed at AT&T Bell Laboratories by Rick Becker, John Chambers and
Allan Wilks."
The only 1st edition attribution to Ihaka and Gentleman was in Chapter
12: "For citing R in a publication, use Ihaka and Gentleman (1996)."
[NB: Type citation() to see the form of citation that should now be
used.]
That was as it now strikes me unfair to Ross and Robert, but no-one  
complained.  Perhaps no-one ever read that far through the preface!


There's an excellent brief summary of the history of R, and its
connections with S, in Section 1.4 of John Chambers' "Software for
Data Analysis".  Appendix A has further details on the development
of S, a kind of pre-history of R.





[R] Sweave style file

2008-12-26 Thread John Maindonald
It would, I consider, be useful to note, on the help pages for Sweave
and RweaveLatex, that the path to Sweave.sty in the R installation can
be extracted as paste(R.home(), "share/texmf", sep="/") or its
equivalent.
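Equivalently, and without hard-coding the path separator (a sketch; the example result shown is for a typical Linux installation and will differ elsewhere):

```r
## Build the path to the texmf directory of the running R installation
file.path(R.home(), "share", "texmf")
## e.g. "/usr/lib/R/share/texmf"
```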





Re: [R] R] adding a new dataset to the default R distribution

2008-12-04 Thread John Maindonald
Making data, especially data that have been the subject of published  
papers, widely available, can be a useful spinoff from the R project,  
another gift to the scientific community beyond the provision of  
computing and analytic tools.  Nowadays, in a complete publication of  
a scientific result, there is every reason for the data to be part of  
that publication.  The Gentleman and Lang 2004 paper "Statistical
Analyses and Reproducible Research" takes this further still, making a
compelling case for opening the analysis to ready view
(http://www.bepress.com/bioconductor/paper2/).  How else can critics
know what analysis was done, and whether the data do really support
the claimed conclusions?

As I see it, the first recourse should be use of archives that  
individual communities may establish.  Instructions on how to input  
the data into R would be a useful small item of ancillary  
information.  Links to such archives (under Data Archives, maybe)  
might be included on CRAN.  The Open Archaeology project would seem a  
good umbrella for the archiving of archaeology data.

Where there is no available repository, or there are reasons for
putting the data into an R package, one possibility is to advertise on
this list: "Orphan dataset, looking for a good home".  In this case I
have offered to include the data in the DAAGxtras package, and I am
open to further such requests.  Perhaps, however, there should be a
"miscdata" or suchlike package to which such datasets can be
submitted?  All it would require is for someone to offer to act as
Keeper of the Miscellaneous Data.



On 04/12/2008, at 10:00 PM, [EMAIL PROTECTED] wrote:

 From: Stefano Costa [EMAIL PROTECTED]
 Date: 3 December 2008 9:29:13 PM
 To: r-help@r-project.org
 Subject: [R] adding a new dataset to the default R distribution


 Hi,
 I am a student in archaeology with some interest in statistics and R.

 Recently I've obtained the permission to distribute in the public  
 domain
 a small dataset (named spearheads with 40 obs. of  14 variables)  
 that
 was used in an introductory statistics book for archaeologists
 (published in 1994).

 I've rewritten most of the exercises of that book in R and made them
 available at http://wiki.iosa.it/diggingnumbers:start along with the
 original dataset, but I was wondering if there is a standard procedure
 for getting a new dataset included in the default R distribution, like
 cars and others.

 Please add me to CC if replying because I'm not subscribed to the  
 list.

 Best regards,
 Stefano

 -- 
 Stefano Costa
 http://www.iosa.it/ Open Archaeology





[R] Congratulations, Ross!

2008-11-11 Thread John Maindonald
Associate Professor Ihaka, of the Department of Statistics, was  
recognised with the Royal Society’s Pickering medal for his part in  
developing the statistical computing software R. R, created over 15  
years ago by Dr Ihaka and colleague Robert Gentleman (now at the Fred  
Hutchinson Cancer Research Center, Seattle), is free software used by  
academics, industry and government worldwide to analyse numerical data  
and present it in graphical forms. Computer programmers globally have  
contributed to R, with over 1000 industry-specific add-ons created.


from
http://www.auckland.ac.nz/uoa/about/news/articles/2008/11/rsnz_honours.cfm
12 November 2008.




Re: [R] R seven years ago

2008-10-09 Thread John Maindonald
So what is the answer to the question: Can success continue?

I suspect that R is now so firmly entrenched that it will
inevitably continue, in one or other incarnation, for a long
time to come.  The negative factors that John Fox lists
will surely, in time, make some changes inevitable.  Will
these come from force of circumstance rather than from
conscious planning?

In an August 12 message I posted details of R citation rates that
I had gleaned, following a lead from Simon Blomberg, from Web
of Science.  This, or some such measure, seems to me important
as giving a handle on the penetration of R into statistical
application areas.

The numbers I obtained [IG = Ihaka & Gentleman 1996; RSTAT is
the citation suggested by citation()] were:

IG:       1998=4, 1999=15, 2000=17, 2001=39, 2002=119, 2003=276
RSTAT+IG: 2004: 68+455 = 523
          2005: 433+512 = 945
          2006: 1049+426 = 1475
          2007: 1605+410 = 2015
          2008 (to ~Aug 10): 1389+255 = 1644

cit <- c("1998" = 4, "1999" = 15, "2000" = 17, "2001" = 39,
         "2002" = 119, "2003" = 276, "2004" = 523, "2005" = 945,
         "2006" = 1475, "2007" = 2015, "2008" = 1644)

These will not be all that accurate; there will be omissions
and duplications.

Growth is close to exponential.
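As a rough check on that claim (a sketch; it re-uses the cit counts above and drops the partial 2008 figure):

```r
## Near-exponential growth means log(counts) is roughly linear in year.
cit <- c("1998" = 4, "1999" = 15, "2000" = 17, "2001" = 39,
         "2002" = 119, "2003" = 276, "2004" = 523, "2005" = 945,
         "2006" = 1475, "2007" = 2015)   # 2008 is incomplete; omit it
year <- as.numeric(names(cit))
fit <- lm(log(cit) ~ year)
coef(fit)[["year"]]                  # average growth rate on the log scale
plot(year, log(cit)); abline(fit)    # points lie close to the line
```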



On 09/10/2008, at 9:00 PM, [EMAIL PROTECTED] wrote:

 From: Peter Dalgaard [EMAIL PROTECTED]
 Date: 9 October 2008 5:42:19 AM
 To: [EMAIL PROTECTED]
 Cc: r-help@R-project.org r-help@r-project.org
 Subject: Re: [R] R seven years ago


 (Ted Harding) wrote:
 On 08-Oct-08 18:00:27, Liviu Andronic wrote:

 Hello everyone,

 As some may know, today Google unveiled its 2001 search index [1]. I
 was curious to see how was R like at that time, and was not
 disappointed. Compared to today's main page [2], seven years ago the
 page looked [3] a bit rudimentary, especially the graphic. (It is  
 wort
 noting that structurally the pages are very similar.) What  
 definitely
 changed is the `Contributed packages' section. Then R featured 29
 contributed packages [4], while now it features 1500+ [5]. It was
 surprising to realize the growth of R during the past seven years.

 Regards,
 Liviu

 [1] http://www.google.com/search2001.html
 [2] http://www.r-project.org/
 [3] http://web.archive.org/web/20010722202756/www.r-project.org/
 [4]
 http://web.archive.org/web/20010525004023/cran.r-project.org/bin/macos/c
 ontrib/src/
 [5] http://cran.at.r-project.org/web/packages/


 Many thanks for this, Liviu! One might also compare the mailing list
 usage:

 [R-help 1997]:   484 messages
 [R-help 2001]:  4309 messages
 [R-help 2007]: 26250
   1721+1909+2196+2145+2210+2309+
   2142+2246+2028+2711+2602+2031

 So we now get more posts in a week than we did in the whole of 1997!


 Those not present at the useR in Dortmund might want to skim John  
 Fox's talk

 http://www.statistik.uni-dortmund.de/useR-2008/slides/Fox.pdf

 (Actually, he did something at the end to avoid ending on a negative
 note. Flipped back to one of the increasing graphs, I suppose.)

 -- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45)  
 35327918
 ~~ - ([EMAIL PROTECTED])  FAX: (+45)  
 35327907





[R] R Citation rates

2008-08-12 Thread John Maindonald

Following some discussion with Simon Blomberg, I've done
a few Web of Science citation searches.  A topic search for
R LANG* STAT* seems to turn up most of the references to
R: A Language and Environment for Statistical Computing
R Development Core Team gets transformed into an
astonishing variety of variations.  Searching for citations
of the 1996 Ihaka and Gentleman paper (most references
up to and including 2004) turns up many fewer quirks.

What other forms of reference should be investigated?

Anyway, here are the numbers by year (there may be some
duplication).
1998-2003 (IG):        4   15   17   39   119   276
2004-2008 (RSTAT+IG):  68+455  433+512  1049+426  1605+410  1389+255
            totals:    523     945      1475      2015      1644

cit <- c("1998" = 4, "1999" = 15, "2000" = 17, "2001" = 39,
         "2002" = 119, "2003" = 276, "2004" = 523, "2005" = 945,
         "2006" = 1475, "2007" = 2015, "2008" = 1644)
(~4550 references to "R LANG* STAT*"; ~2530 to IG)

On a rate per year basis, the 2008 figure scales up to 2691.
This does not however allow for growth over the course of the year.

The number of references grew by 37% from 2006 to 2007.
On current trends, the 2007-2008 increase seems likely to
be much larger than that.

The figures probably underestimate the contribution from
Bioconductor-related work.  A direct search for Bioconductor-related
papers did not however turn up enough papers to make much difference
to the numbers.

Here are some other summary figures, for graphing using
whatever form of presentation appeals most (the second
number is for the IG paper)

country <- c(usa=1540+903, germany=539+304, england=507+328,
             france=468+337, canada=345+147, australia=329+169,
             switzerland=279+121)

subj <- c(ecology=924+349, statsANDprob=488+270,
          geneticsANDheredity=488+279,
          envScience=298+119, CSapplications=269+108, zoology=267+111,
          plantSciences=250+108, biochemANDmolbio=229+200,
          mathANDcompBIO=224+143,
          biotechANDappliedmicrobiology=223+159,
          evolutionaryBIO=210+117)

There's a great deal more summary information that might
be extracted.  What is a good way, with readily available data,
to standardize the country data?

Environmental Science no doubt comes up tops because it
is a coarser grouping than many other areas.




Re: [R] Dividing by 0

2008-07-26 Thread John Maindonald

I suspect you should be smoothing the series, in a manner that
replaces zeros by some (usually small) positive number, before
you start.  Without more details on what you are trying to do, it
is impossible to know what is sensible.  You are proposing to
leave all smoothing (rolling?) till later; why not do some
smoothing at the start?
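As a sketch of the kind of thing I have in mind (the replacement value and window width are arbitrary choices, not a recommendation):

```r
x <- c(0, 0, 3, 0, 5, 8, 0, 2, 7, 4)   # made-up series with zeros
eps <- min(x[x > 0])/2                 # half the smallest positive value
x2 <- ifelse(x == 0, eps, x)           # zeros replaced before any division
xsm <- filter(x2, rep(1/3, 3))         # 3-point running mean (NA at the ends)
```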



On 26 Jul 2008, at 8:00 PM, [EMAIL PROTECTED] wrote:


From: nmarti [EMAIL PROTECTED]
Date: 26 July 2008 1:42:09 AM
To: r-help@r-project.org
Subject: Re: [R] Dividing by 0


I'm well aware these are not errors; I guess I mis-wrote.
I understand your concern.  Thanks for passionately looking out for
my well-being, you saved my life.

My variable has about 10,000 elements and sometimes for the first 100
to 500 elements there are lots of 0's, so I end up with lots of
NA/NaN/Inf's.  However, when I try to use Rolling calculations I
receive error messages because the Rolling functions reject the
NA/NaN/Inf's.  So, I need 0's in place of the NA/NaN/Inf's so I can
run the Rolling calculations.  I can't just delete these observations,
because it messes up lots of other things within these dataframes.

I'm well aware these Rolling calculations will be wrong in the
beginning of the dataframe, so I just throw these out.  The rolling
window is only about 50 observations, so out of 10,000, I still end up
with ample correct data and calculations.

So is this still idiotic?
Thanks again for your concern.  Now that you understand my situation a
little better, you might be less distracted today and be able to sleep
better tonight.




[R] predict lmer

2008-05-10 Thread John Maindonald
The following function is designed to work with a logit link.  It can
easily be generalized to work with any link.  The SEs and CIs are
evaluated accounting for all sources of random variation. The plot
may not be much help unless there is just one explanatory variate.

ciplot <-
  function(obj=glmm2, data=data.site, xcol=2, nam="litter"){
    cilim <- function(obj, xcol){
      b <- fixef(obj)
      vcov <- summary(obj)@vcov
      X <- unique(model.matrix(obj))
      hat <- X%*%b
      pval <- exp(hat)/(1+exp(hat))   # NB, designed for logit link
      U <- chol(as.matrix(summary(obj)@vcov))
      se <- sqrt(apply(X%*%t(U), 1, function(x)sum(x^2)))
      list(hat=hat, se=se, x=X[,xcol])
    }
    limfo <- cilim(obj, xcol)
    hat <- limfo$hat
    se <- limfo$se
    x <- limfo$x
    upper <- hat+2*se
    lower <- hat-2*se
    ord <- order(x)
    plot(x, hat, yaxt="n", type="l", xlab=nam, ylab="")
    rug(x)
    lines(x[ord], lower[ord])
    lines(x[ord], upper[ord])
    ploc <- c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.9)
    axis(2, at=log(ploc/(1-ploc)), labels=paste(ploc), las=2)
  }

## Usage
glmm2 <- lmer(rcr ~ litter + (1 | Farm), family=binomial,
              data=data.site)
ciplot(obj=glmm2)
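A side note on why the Cholesky step in cilim() gives the right SEs:
with V = t(U) %*% U, the squared row norms of X %*% t(U) equal
diag(X %*% V %*% t(X)).  A self-contained check with made-up matrices
(nothing below depends on lme4):

```r
set.seed(1)
X <- cbind(1, rnorm(5))               # a toy model matrix
A <- matrix(rnorm(4), 2, 2)
V <- crossprod(A)                     # a positive-definite stand-in for vcov
U <- chol(V)                          # V = t(U) %*% U
se1 <- sqrt(apply(X %*% t(U), 1, function(x) sum(x^2)))
se2 <- sqrt(diag(X %*% V %*% t(X)))   # the direct calculation
all.equal(se1, se2)                   # TRUE
```

The Cholesky form avoids building the full n-by-n matrix
X %*% V %*% t(X) when only its diagonal is wanted.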



On 8 May 2008, at 8:00 PM, [EMAIL PROTECTED] wrote:
 From: May, Roel [EMAIL PROTECTED]
 Date: 8 May 2008 12:23:15 AM
 To: r-help@r-project.org
 Subject: [R] predict lmer


 Hi,

 I am using lmer to analyze habitat selection in wolverines using the
 following model:

 (me.fit.of <-
   lmer(USED ~ 1 + STEP + ALT + ALT2 + relM + relM:ALT + (1|ID) +
          (1|ID:TRKPT2), data=vdata,
        control=list(usePQL=TRUE), family=poisson, method="Laplace"))

 Here, the habitat selection is calculated using a so-called discrete
 choice model where each used location has a certain number of
 alternatives which the animal could have chosen. These sets of
 locations are captured using the TRKPT2 random grouping. However,
 these sets are also clustered over the different individuals (ID).
 USED is my binary dependent variable which is 1 for used locations
 and zero for unused locations. The others are my predictors.

 I would like to predict the model fit at different values of the
 predictors, but does anyone know whether it is possible to do this? I
 have looked around at the R-sites and in help but it seems that there
 doesn't exist a predict function for lmer???

 I hope someone can help me with this; point me to the right  
 functions or
 tell me to just forget it

 Thanks in advance!

 Cheers Roel

 Roel May
 Norwegian Institute for Nature Research
 Tungasletta 2, NO-7089 Trondheim, Norway





[R] REvolution

2008-01-26 Thread John Maindonald
Does anyone know any more than is in the following press release
about REvolution Computing and their commercialization of R?

http://www.intel.com/capital/news/releases/080122.htm

“Intel Capital, the global investment arm of Intel Corporation, today
announced that it has invested in the Series A financing of REvolution
Computing, creator of parallel computing software for computational
statistics.”

“R has become the de facto statistical language and REvolution
Computing's R-based solutions provide the necessary scale and support
for its growing commercial usage,” said Andre M. Boisvert, Revolution
Computing board member. “With this investment, Revolution Computing
can deliver the type of performance that has been missing in existing
computational statistics offerings.”

For more information on Revolution Computing, RPro, and ParallelR,
visit www.revolution-computing.com

At present though, unless you have better success than me, all you  
will get from the Revolution Computing site is a revolutionary song!




Re: [R] Symbolic substitution in parallel; use infinity symbol?

2008-01-01 Thread John Maindonald
Actually this works beautifully as it stands:

breaks <- c(-Inf, -1, 1, Inf)

zz <- lapply(breaks, function(x) if(x==-Inf) quote(-infinity) else
             if (x==Inf) quote(infinity) else format(x))

lbl <- mapply(function(x,y)
              bquote("(" * .(x) * "," * .(y) * "]"),
              zz[-length(zz)], zz[-1], SIMPLIFY=FALSE)

y <- rnorm(40)
x0 <- cut(y, breaks=breaks)
plot(x0, y, xaxt="n")

axis(1, at=1:3, labels=as.expression(lbl))
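As a quick sanity check that the constructed labels carry the plotmath
infinity symbol rather than the literal string "Inf":

```r
breaks <- c(-Inf, -1, 1, Inf)
zz <- lapply(breaks, function(x) if (x == -Inf) quote(-infinity) else
             if (x == Inf) quote(infinity) else format(x))
lbl <- mapply(function(x, y) bquote("(" * .(x) * "," * .(y) * "]"),
              zz[-length(zz)], zz[-1], SIMPLIFY=FALSE)
length(lbl)                                # one label per interval, i.e. 3
any(grepl("infinity", deparse(lbl[[1]])))  # TRUE: -infinity, not "-Inf"
```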

Thanks, Peter.



On 31 Dec 2007, at 12:35 PM, Peter Dalgaard wrote:

 John Maindonald wrote:
 I'd like to be able to modify axlab in (C) below so that 'Inf'
 is replaced by the infinity symbol.

  y <- rnorm(40)
  breaks <- c(-Inf, -1, 1, Inf)
  x <- cut(y, breaks=breaks)
  plot(unclass(x), y, xaxt="n", xlab="")

  ## A: The following gives the axis labels (-Inf,-1], etc.
  axis(1, at=1:3, labels=expression("(-Inf,-1]", "(-1,1]", "(1, Inf]"))

  ## B: The following replaces Inf by the infinity symbol
  axis(1, at=1:3, labels=expression("(" * -infinity * "," * -1 * "]",
                                    "(" * -1 * ", 1]",
                                    "(1, " * infinity * "]"),
       line=1.25, lty=0)

  ## C: The following gives the axis labels (-Inf,-1], etc.,
  ## in a more automated manner.
  axlab <- lapply(levels(x), function(x) substitute(a, list(a=x)))
  # Can alternatively use bquote()
  axis(1, at=1:3, labels=as.expression(axlab), line=2.25, lty=0)

 However I have been unable to modify axlab so that the infinity
 symbol appears in place of 'Inf'.  Is there is some relatively
 straightforward way to do this?  The issue is of course more
 general than this specific example.

 Here's an idea, leaving some tinkering for you:

  breaks <- c(-Inf, -1, 1, Inf)

  zz <- lapply(breaks, function(x) if(x==-Inf) quote(-infinity) else
               if (x==Inf) quote(infinity) else format(x))

  lbl <- mapply(function(x,y)
                bquote("(" * .(x) * "," * .(y) * "]"),
                zz[-4], zz[-1], SIMPLIFY=FALSE)


 --
  O__   Peter Dalgaard             Øster Farimagsgade 5, Entr.B
 c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark   Ph:  (+45) 35327918
 ~~ - ([EMAIL PROTECTED])                         FAX: (+45) 35327907





[R] Symbolic substitution in parallel; use infinity symbol?

2007-12-30 Thread John Maindonald

I'd like to be able to modify axlab in (C) below so that 'Inf'
is replaced by the infinity symbol.

y <- rnorm(40)
breaks <- c(-Inf, -1, 1, Inf)
x <- cut(y, breaks=breaks)
plot(unclass(x), y, xaxt="n", xlab="")

## A: The following gives the axis labels (-Inf,-1], etc.
axis(1, at=1:3, labels=expression("(-Inf,-1]", "(-1,1]", "(1, Inf]"))

## B: The following replaces Inf by the infinity symbol
axis(1, at=1:3, labels=expression("(" * -infinity * "," * -1 * "]",
                                  "(" * -1 * ", 1]",
                                  "(1, " * infinity * "]"),
     line=1.25, lty=0)

## C: The following gives the axis labels (-Inf,-1], etc.,
## in a more automated manner.
axlab <- lapply(levels(x), function(x) substitute(a, list(a=x)))
# Can alternatively use bquote()
axis(1, at=1:3, labels=as.expression(axlab), line=2.25, lty=0)

However I have been unable to modify axlab so that the infinity
symbol appears in place of 'Inf'.  Is there is some relatively
straightforward way to do this?  The issue is of course more
general than this specific example.




[R] Reminiscing on 20 years using S

2007-12-27 Thread John Maindonald
My first exposure to S was on an AT&T 3B2 (a 3B2/100,
I think), at the Auckland (Mt Albert) Applied Mathematics
Division Station of the NZ Dept of Scientific and Industrial
Research.  The AMD Head Office in Wellington had one
also.  There may have been one or more others; I cannot
remember. This would have been in 1983, maybe.

It was a superbly engineered machine, but the software
(System V, version 3.2) had its problems.  If you back-deleted
too far along the command line, something
unpleasant (losing the line? or worse?) happened.
On typing 1+1 at the S command line, it took a second
to get an answer.



On 27 Dec 2007, at 10:00 PM, [EMAIL PROTECTED] wrote:

 From: roger koenker [EMAIL PROTECTED]
 Date: 27 December 2007 9:56:45 AM
 To: Greg Snow [EMAIL PROTECTED]
 Cc: R-help list [EMAIL PROTECTED]
 Subject: Re: [R] Reminiscing on 20 years using S

 On Dec 26, 2007, at 2:05 PM, Greg Snow wrote:

 I realized earlier this year (2007) that it was in 1987 that I first
 started using an early version of S (it was ported to VMS and was  
 called
 success).  That means that I have been using some variant of S (to
 various degrees) for over 20 years now (I don't feel that old).

 Boxing day somehow seems appropriate for this thread.  R.I.P. to all
 those old boxes of yesteryore and the software that ran on them --
 and yet there is always a residual archaeological curiosity.

 I discovered recently that the MIT athena network contains a circa
 1989 version of S:
 http://stuff.mit.edu/afs/athena/astaff/project/Sdev/S/  which made me
 wonder whether there was any likelihood that one could recreate
 "S Thu Dec  7 16:49:47 EST 1989".
 Curiosity is one thing, time to dig through the layers of ancient
 civilizations is quite another.  But if anyone would like to offer a
 (preferably educated) guess about the feasibility of such a project,
 like I said, I would be curious.


 url:  www.econ.uiuc.edu/~roger    Roger Koenker
 email: [EMAIL PROTECTED]          Department of Economics
 vox:  217-333-4558                University of Illinois
 fax:  217-244-6678                Champaign, IL 61820



Re: [R] Packages - a great resource, but hard to find the right one

2007-11-23 Thread John Maindonald
R as a whole could benefit from systematic attention.
There may be scope for several special issues of JSS.

- R overview and philosophy

- R penetration and influence, in the statistics community,
   in machine learning, and in a variety of application areas.

- R as a vehicle for fostering communication between
researchers in diverse areas.
[A great thing about R-help, though nowadays this role
 is passing across to other lists such as R-sig-ME, is that
 it facilitates and even forces communication between
 application area specialists, and between those
 specialists and statistics professionals.  This may be
 temporary; we may see the R community fragment into
 diverse communities that focus on their own specialist
 interests.  Scope for a sociological study, perhaps?]

- Who is using R?, as reflected in published scientific literature.
 (I'd like to see a wiki or somesuch where authors
 are encouraged to give details of published analyses
 that have used R.)

- Where is R headed?  How will the shaping of its direction
 proceed?  Will it be a matter of step by step change and
 improvement, or is it (will it be) possible to lay out in
 advance an outline of the directions that its future
 development can be expected to take?

- Traps for new (and old) users.

- Books and papers on R.

- Then onto packages!  I guess what may be in order is
 something like the expansion of a task view into an
 extended paper.




On 23 Nov 2007, at 10:00 PM, [EMAIL PROTECTED] wrote:

 From: Antony Unwin [EMAIL PROTECTED]
 Date: 23 November 2007 7:50:52 PM
 To: [EMAIL PROTECTED]
 Subject: Re: [R] Packages - a great resource, but hard to find the  
 right one


 Johannes Hüsing wrote

 Above all there are lots of packages.  As the software editor of the
 Journal of Statistical Software I suggested we should review R
 packages.

 You mean: prior to submission?

 No.

 No one has shown any enthusiasm for this suggestion, but I
 think it would help.  Any volunteers?

 Thing is, I may like to volunteer, but not in the "here's a
 package for you to review by week 32" way. Rather in the way that
 I search for a package which fits my problem.

 That's what I was hoping for.

 One package lets me down
 and I'd like to let other users and the maintainer know about it.
 The other one works black magic and I'd like to drop a raving
 review about it. This needs an infrastructure with a low barrier
 to entry. A wiki is not the worst idea if the initial infrastructure
 is geared at addressing problems rather than packages.

 We should differentiate between rave reviews of features that just
 happened to be very useful to someone and reviews of a package as a
 whole.  Both have their place and at the moment we don't have either.

 If you are willing to review an R package or aspects of R for JSS
 please let me know.

 Antony Unwin
 Professor of Computer-Oriented Statistics and Data Analysis,
 Mathematics Institute,
 University of Augsburg,
 Germany





Re: [R] [R-sig-ME] mixed model testing

2007-11-07 Thread John Maindonald
Whether or not you need a mixed model, e.g. random versus
fixed slopes, depends on how you intend to use results.

Suppose you have lines of depression vs lawn roller weight
calculated for a number of lawns. If the data will always be
used to make predictions for one of those same lawns, a
fixed slopes model is fine.

If you want to use the data to make a prediction for another
lawn from the same population (the population from which
this lawn is a random sample, right?), you need to model
the slope as a random effect.

Now for a more subtle point:

In the "prediction for another lawn" situation, it is possible that
the slope random effect can be zero, and analysts do very
commonly make this sort of assumption, maybe without
realizing that this is what they are doing.  You can test whether
the slope random effect is zero but, especially if you have data
from a few lawns only, failure to reject the null (zero random
effect) is not a secure basis for inferences that assume that
the slope is indeed zero. The "test for zero random effect, then
infer" strategy is open to Box's pithy objection that
"... to make preliminary tests on variances is rather like putting to
sea in a rowing boat to find out whether conditions are sufficiently
calm for an ocean liner to leave port."
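For what a "formal" comparison can look like in the fixed-effects
world, here is a simulated-data sketch: compare a common-slope fit
with a separate-slopes fit via anova().  This is not the mixed-model
likelihood-ratio test (and Box's caution applies to it just as much);
all numbers are invented for illustration.

```r
set.seed(42)
lawn <- factor(rep(1:6, each=8))
weight <- rep(1:8, 6)
slope <- rnorm(6, mean=2, sd=0.5)      # simulated lawn-specific slopes
depression <- 1 + slope[lawn]*weight + rnorm(48, sd=1)
fit.common  <- lm(depression ~ weight + lawn)   # one slope for all lawns
fit.varying <- lm(depression ~ weight*lawn)     # a slope per lawn
anova(fit.common, fit.varying)         # F-test for slope heterogeneity
```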




On 8 Nov 2007, at 1:55 AM, Irene Mantzouni wrote:

 Is there a formal way to prove the need of a mixed model, apart from  
 e.g. comparing the intervals estimated by lmList fit?
 For example, should I compare (with AIC ML?) a model with separately
 (unpooled) estimated fixed slopes (i.e. using an index for each
 group) with a model that treats this parameter as a random effect  
 (both models treat the remaining parameters as random)?

 Thank you!

 ___
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



Re: [R] postscript(), used from a pre R-2.6.0 workspace

2007-10-30 Thread John Maindonald
I find that if I start R-2.6.0 in a workspace with no .RData file,
load one of my workspaces from R-2.5.1 or earlier into R-2.6.0,
and then before doing anything else type postscript(file="z.ps"),
I get:

> ls()
character(0)
> load(url("http://www.maths.anu.edu.au/~johnm/r/test5.RData"))
> postscript(file="z.ps")
Error in postscript(file = "z.ps") : Invalid font type
In addition: Warning messages:
1: closing unused connection 3
(gzcon(http://www.maths.anu.edu.au/~johnm/r/test5.RData))
2: In postscript(file = "z.ps") :
   font family not found in PostScript font database
3: In postscript(file = "z.ps") :
   font family not found in PostScript font database

The same happens if R is started in a workspace that already holds
such a .RData file, or a .RData file from R-2.5.1 or earlier.

This behavior persists even if I take such an image file, load it,
clear the workspace, and save it.  An instance of such a .RData
file has the url:
http://www.maths.anu.edu.au/~johnm//r/test5.RData

It makes no difference whether I load the file into R-2.6.0
on a MacOS X or Windows system, or whether I use the current
R-patched (2007-10-27 r43288).  I do not know whether the
source of the original .RData file matters, or how it may depend
on what I have done in that earlier workspace.

For moving my .RData files across without messing up use of
postscript() [or pdf()],  a workaround is to dump the contents of the
earlier workspace, then start R in a working directory whose
.RData file, if any, has the proper R-2.6.0 pedigree, and source
the dumped file.  I do not want to have to directly run the function
that creates the database each time I start a session.
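The dump-and-source step in miniature (a sketch using a temporary
file; for the real case one would dump the whole workspace, e.g.
dump(ls(), ...), from the old R version):

```r
f <- tempfile(fileext=".R")           # stand-in for the dump file
x <- 1:10
df <- data.frame(a=letters[1:3])
dump(c("x", "df"), file=f)            # objects written out as parseable R code
rm(x, df)
source(f)                             # recreates them, with a clean pedigree
exists("x") && exists("df")           # TRUE
```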

Has anyone else hit this?

