(minor correction)

On 4/13/2014 7:41 PM, Gabor Grothendieck wrote:
> On Sun, Apr 13, 2014 at 1:26 PM, John Fox <j...@mcmaster.ca> wrote:
>> I've attached the most recent data I have, which are from mid-2012. My
>> package counts came from
>> https://svn.r-project.org/R/branches/R-*-branch/tests/internet.Rout.save
>> (where the * is the R version).
>
> It seems that the growth is exponential but at a lower slope (of the
> log curve) after 2008 than before. A linear fit to the log curve is
> shown in blue before 2008 and in red after 2008.  What happened to
> result in two such distinct regimes?


I got a great fit using a 4-parameter log-logistic model with drm{drc}; see below. This model suggests that CRAN will approach an asymptote of roughly 60,000 packages with a 95% confidence interval ranging from 31 to 117 thousand.


Obviously, the confidence interval for the asymptote assumes the 4-parameter log-logistic model is accurate. That's probably not realistic but is more accurate than assuming continued exponential growth. If I had time to develop more accurate predictions and confidence intervals, I'd try Bayesian Model Averaging with several different models.


      Thanks for the question and comments.


      Spencer


# Wait until "Build status: Current" at rev. 178 on Ecdat on R-Forge, then:
install.packages("Ecdat", repos="http://R-Forge.R-project.org")
library(Ecdat) # provides the CRANpackages data set

(day1 <- min(CRANpackages$Date)) # 2001-06-21
str(ddate <- CRANpackages$Date-day1)
CRANpackages$CRANdays <- as.numeric(ddate)

library(drc)
CRANlogLogis4. <- drm(log(Packages)~CRANdays, data=CRANpackages, fct=LL.4())
plot(CRANlogLogis4., log='y') # best I've found so far.

plot(resid(CRANlogLogis4.))
CRANlogLogis4.
# log(Packages) = c + (d-c)/(1 + (t/t0)^b)
# where
# b = -1.36
# c = 4.73
# d = 11.0 = log(60152)
# t0 = 3309 days since 2001-06-21
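
As a quick check of what this curve implies, here is a minimal sketch that plugs the rounded point estimates above into the model formula. The helper name ll4 and the argument names lower and upper (for c and d) are illustrative labels only, not part of drc, and the inputs are the rounded estimates, so the results are approximate.

```r
# Hypothetical helper: 4-parameter log-logistic curve on the log(Packages) scale.
# lower = c and upper = d in the notation above.
ll4 <- function(t, b, lower, upper, t0) {
  lower + (upper - lower) / (1 + (t / t0)^b)
}

# Rounded point estimates from the fit above (treated as exact for illustration only)
b <- -1.36; lower <- 4.73; upper <- 11.0; t0 <- 3309

days <- c(0, 3309, 10000, 1e6)   # days since 2001-06-21
logPk <- ll4(days, b, lower, upper, t0)
fitted_curve <- data.frame(days, logPackages = logPk, Packages = exp(logPk))
print(fitted_curve)
```

Because b is negative, (t/t0)^b shrinks toward 0 as t grows, so the curve starts at c for t = 0, sits halfway between c and d at t = t0, and approaches d for large t.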

(ci4 <- confint(CRANlogLogis4.))

      2.5%   97.5%
b    -1.49   -1.24   # power of time = rate at which t^b -> 0
c     4.67    4.80
d    10.34   11.67   # asymptote of log(Packages)
t0    2800    3818   # reference number of days

# Asymptotic number of CRAN packages
exp(ci4[3, ])
#   2.5 %    97.5 %
# approximately c(31, 117)*1000
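
The back-transformation can be checked by hand. The sketch below uses only the rounded estimate and limits for d copied from the table above (the vector names are illustrative), so it reproduces the asymptote only approximately.

```r
# Rounded values of d (estimate and 95% limits) on the log scale, copied from above
d_log <- c(lower = 10.34, estimate = 11.0, upper = 11.67)

# Back-transform to the number of packages
d_packages <- exp(d_log)
print(round(d_packages / 1000, 1))   # in thousands of packages

# Ratio of each limit to the estimate: symmetric on the log scale,
# hence multiplicative (asymmetric) on the package scale
print(round(d_packages / d_packages["estimate"], 2))
```

This is why the interval of about 31 to 117 thousand is not centered on the point estimate of roughly 60 thousand.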



Lines <- "version date        packages
1.3     2001-06-21   110
1.4     2001-12-17   129
1.5     2002-05-29   162
#1.6     2002-10-01   163
1.7     2003-05-27   219
1.8     2003-11-16   273
1.9     2004-06-05   357
2.0     2004-10-12   406
2.1     2005-06-18   548
2.2     2005-12-16   647
2.3     2006-05-31   739
2.4     2006-12-12   911
2.5     2007-04-12  1000
2.6     2007-11-16  1300
2.7     2008-03-18  1427
2.8     2008-10-18  1614  # updated
2.9     2009-04-17  1952
2.10    2009-10-26  2088
2.11    2010-04-22  2445
2.12    2010-10-15  2837
2.13    2011-04-13  3286
2.14    2011-06-20  3618
2.15    2012-07-07  4000
"
library(zoo)
zz <- read.zoo(text = Lines, header = TRUE, index = 2)[, 2]
plot(log(zz))
d <- as.Date("2008-01-01")
abline(v = d)
pre <- time(zz) < d
fo <- log(zz) ~ time(zz)
abline(lm(fo, subset = pre), col = "blue")
abline(lm(fo, subset = !pre), col = "red")
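
To put numbers on the two regimes, here is a minimal base-R sketch (no zoo) that refits the same two lines and reports each slope per year with the implied doubling time. It re-enters the data above so that it runs on its own; the commented-out 1.6 row is omitted, as in the original fit. The object names (cran, fit_pre, fit_post, slopes) are illustrative.

```r
# Same data as above, minus the commented-out 1.6 row
cran <- read.table(header = TRUE,
                   colClasses = c("character", "Date", "numeric"), text = "
version date packages
1.3  2001-06-21  110
1.4  2001-12-17  129
1.5  2002-05-29  162
1.7  2003-05-27  219
1.8  2003-11-16  273
1.9  2004-06-05  357
2.0  2004-10-12  406
2.1  2005-06-18  548
2.2  2005-12-16  647
2.3  2006-05-31  739
2.4  2006-12-12  911
2.5  2007-04-12 1000
2.6  2007-11-16 1300
2.7  2008-03-18 1427
2.8  2008-10-18 1614
2.9  2009-04-17 1952
2.10 2009-10-26 2088
2.11 2010-04-22 2445
2.12 2010-10-15 2837
2.13 2011-04-13 3286
2.14 2011-06-20 3618
2.15 2012-07-07 4000
")

# Time in years since the first observation, so slopes are per year
cran$years <- as.numeric(cran$date - min(cran$date)) / 365.25
pre <- cran$date < as.Date("2008-01-01")

# Separate log-linear fits before and after 2008
fit_pre  <- lm(log(packages) ~ years, data = cran, subset = pre)
fit_post <- lm(log(packages) ~ years, data = cran, subset = !pre)

slopes <- c(pre2008  = unname(coef(fit_pre)["years"]),
            post2008 = unname(coef(fit_post)["years"]))
doubling <- log(2) / slopes   # years needed to double at each growth rate
print(round(rbind(slope_per_year = slopes, doubling_time_years = doubling), 2))
```

A smaller slope after 2008 corresponds to a longer doubling time, which is the change in regime visible in the plot.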

