[R] reading a big file
Dear All, I am on WindowsXP with 512 MB of RAM, R 2.4.0, and I want to read in a big file mln100.txt. The file is 390MB big, it contains a column of 100 millions integers. mln100=scan(mln100.txt) Error: cannot allocate vector of size 512000 Kb In addition: Warning messages: 1: Reached total allocation of 511Mb: see help(memory.size) 2: Reached total allocation of 511Mb: see help(memory.size) In fact, I would be quite happy if I could read, say, every tenth integer (line) of the file. Is it possible to do this? Cheers, Rem __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] multivariate normality test
Hello, Could someone help me to explain the VERY big difference in applying two tests on multivariate normality: library(mvnormtest) data(EuStockMarkets) mshapiro.test(t(EuStockMarkets[15:29,1:4])) Shapiro-Wilk normality test data: Z W = 0.8161, p-value = 0.005955 and library(energy) mvnorm.etest( EuStockMarkets[15:29,1:4] ) Energy test of multivariate normality: estimated parameters data: x, sample size 15, dimension 4, replicates 999 E-statistic = 1.0041, p-value = 0.2482 Many thanks, Rem __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] survival with Weibull
Hello, I want to test if the Weibull distribution is appropriate for the failure time. When trying to reproduce an example from MASS (the book, Ch. 13.2), I type library(survival) library(MASS) leuk.wei - survreg( Surv(time)~ag+log(wbc),data=leuk) ntimes - leuk$time*exp(-leuk.wei$linear.predictors) plot(survfit(Surv(ntimes)),log=T) and get (almost) the same graph as in the book (which means that the distribution is close to exponential). On the other hand, if I want to test for the Weibull distribution, the recommended command plot(survfit(Surv(ntimes)),fun=cloglog) ends in an error message: Error in rep.default(2, n2 - 1) : invalid number of copies in rep() In addition: Warning message: 2 x values = 0 omitted from logarithmic plot in: xy.coords(x, y, xlabel, ylabel, log) I'd appreciate any hints, Rem __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] priceIts
Dear All, There is an example for the priceIts function (the its package) which does not work for me as expected. ?priceIts x1 - priceIts(instrument = c(^ftse), start = 1998-01-01, + quote = Close) Error in validObject(.Object) : invalid class its object: Missing values in dates x2 - priceIts(instrument = c(^gdax), start = 1998-01-01, + quote = Close) Error in download.file(url, destfile, method = method, quiet = quiet) : cannot open URL 'http://chart.yahoo.com/table.csv?s=^gdaxa=0b=01c=1998d=8e=28f=2005g=dq=qy=0z=^gdaxx=.csv' In addition: Warning message: cannot open: HTTP status was '404 Not Found' x - union(x1,x2) Error: Object x1 not found Error in union(x1, x2) : unable to find the argument 'x' in selecting a method for function 'union' names(x) - c(FTSE,DAX) plot(x,lab=TRUE) Any help appreciated. Best wishes, Remigijus __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] What is median in survfit
Dear All, A very simple question: library(survival) fit - coxph(Surv(time, status) ~ x, data=aml) survfit(fit) Call: survfit.coxph(object = fit) n events median 0.95LCL 0.95UCL 23 18 30 18 45 I believe I know what is median here, but how to extract it? Many thanks, Rem __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How to use lag?
I use the following two function for a lagged regression: lm.lag=function(y,lag=1) summary(lm(embed(y,lag+1)[,1]~embed(y,lag+1)[,2:(lag+1)])) lm.lag.x=function(y,x,lag=1) summary(lm(embed(y,lag+1)[,1]~embed(x,lag+1)[,2:(lag+1)])) for, respectively, y_t=a+b_1*y_t-1+...+b_lag*y_t-lag y_t=a+b_1*x_t-1+...+b_lag*x_t-lag I am not quite sure whether this an answer to your question, but here are two examples: set.seed(7) ar1=arima.sim(n=300,list(ar=0.8)) lm.lag(ar1) lm.lag.x(ar1,ar1) set.seed(8) ar3=arima.sim(n = 200, list(ar = c(0.4, -0.5, 0.7))) lm.lag(ar3,3) lm.lag.x(ar3,ar3,3) Best wishes, Rem Saturday, March 5, 2005, 6:14:15 PM, you wrote: SG Is it possible to fit a lagged regression, y[t]=b0+b1*x[t-1]+e, SG using the function lag? If so, how? If not, of what use is the SG function lag? I get the same answer from y~x as y~lag(x), whether SG using lm or arima. I found it using y~c(NA, x[-length(x)])). Consider SG the following: set.seed(1) x - rep(c(rep(0, 4), 9), len=9) y - (rep(c(rep(0, 5), 9), len=9)+rnorm(9)) # y[t] = x[t-1]+e lm(y~x) SG (Intercept)x SG 1.2872 -0.1064 lm(y~lag(x)) SG (Intercept) lag(x) SG 1.2872 -0.1064 arima(y, xreg=x) SG interceptx SG 1.2872 -0.1064 SG s.e. 0.9009 0.3003 SG sigma^2 estimated as 6.492: log likelihood = -21.19, aic = 48.38 arima(y, xreg=lag(x)) SG intercept lag(x) SG 1.2872 -0.1064 SG s.e. 0.9009 0.3003 arima(y, xreg=c(NA, x[-9])) SG intercept c(NA, x[-9]) SG 0.43920.8600 SG s.e. 0.23720.0745 SG sigma^2 estimated as 0.3937: log likelihood = -7.62, aic = 21.25 arima(ts(y), xreg=lag(ts(x))) SG arima(x = ts(y), xreg = lag(ts(x))) SG intercept lag(ts(x)) SG 1.2872 -0.1064 SG s.e. 0.9009 0.3003 SG sigma^2 estimated as 6.492: log likelihood = -21.19, aic = 48.38 SG Thanks for your help. SG Spencer Graves SG __ SG R-help@stat.math.ethz.ch mailing list SG https://stat.ethz.ch/mailman/listinfo/r-help SG PLEASE do read the posting guide! SG http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How can I simulate Pareto distribution in R?
If you define Pareto density as p(x)=c*x^(-(c+1)) for x1, then dpareto - function(x,c){ if(c=0)stop(c must be positive) # Diagnostic step ifelse(x1,0,c/x^(c+1))} ppareto - function(q,c){ if(c=0)stop(c must be positive 0) ifelse(q1,0,1-1/q^c)} qpareto - function(p,c){ if(c=0) stop(c must be positive 0) if(any(p0)|any(p1)) # Symbol | denotes logical OR stop(p must be between 0 and 1) q - (1-p)^(-1/c) q} rpareto - function(n,c){ if(c=0) stop(c must be positive) rp - runif(n)^(-1/c) rp} Good luck, Rem Sunday, January 9, 2005, 10:21:47 AM, you wrote: NX Hi, guys, NXI need to simulate Pareto distribution. But I found NX 'rpareto' didn't exist in R. And it seems that Pareto distribution NX don't have mathematical relationships with other distributions. NX What can I do? NX Thanks a lot. NX Ni __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] R vs EViews - serial correlation
Dear all, I met with some problems when dealing with a time series with serial correlation. FIRST, I generate a series with correlated errors set.seed(1) x=1:50 y=x+arima.sim(n = 50, list(ar = c(0.47))) SECOND, I estimate three constants (a, b and rho) in the model Y=a+b*X+u, where u=rho*u(-1)+eps library(nlme) gls(y~x,correlation = corAR1(0.5)) # Is it the right procedure? Coefficients: (Intercept) x 0.1410465 1.0023341 Correlation Structure: AR(1) Formula: ~1 Parameter estimate(s): Phi 0.440594 Degrees of freedom: 50 total; 48 residual Residual standard error: 0.9835158 THIRD, I do the same procedure with EViews as LS Y C X AR(1) and get Y = 0.1375 + 1.0024*X + [AR(1)=0.3915] My problem is actually connected with the fitting procedure. As far as I understand gls(y~x,correlation = corAR1(0.5))$fit is obtained through the linear equation 0.1410+1.0023*X while in EViews through the nonlinear equation Y=rho*Y(-1) + (1-rho)*a+(X-rho*X(-1))*b where either dynamic or static fitting procedures are applied. X YYF_DYF_S gls.fit 1 1 1.1592 NA NA 1.1434 2 2 3.5866 2.1499 2.1499 2.1457 3 3 4.1355 3.1478 3.7103 3.1480 4 4 3.9125 4.1484 4.5352 4.1504 5 5 2.7442 5.1502 5.0578 5.1527 6 6 6.0647 6.1523 5.2103 6.1551 7 7 6.9855 7.1547 7.1203 7.1574 . 47 47 49.4299 47.2521 47.5288 47.2507 48 48 48.7748 48.2545 49.1072 48.2531 49 49 48.3200 49.2570 49.4607 49.2554 50 50 50.2501 50.2594 49.8926 50.2578 All gls.fit values are on a line, YF_D (D for dynamic) soon begin to follow a line and YF_S (S for static) try to mimic Y. My question is: do R and EViews estimate the same model? If yes, why the estimates are different and which of the two (three?) procedures is correct? Thanking you in advance, Rem __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] predict(arima)
Dear All, R 1.9.1, Windows When copying and pasting a few lines from the 'predict.Arima' help, I get an error message: data(lh) predict(arima(lh, order = c(3,0,0)), n.ahead = 12) Error in eval(expr, envir, enclos) : Object xreg not found On the other hand, the following is OK: data(lh) predict(arima0(lh, order = c(3,0,0)), n.ahead = 12) $pred Time Series: Start = 49 End = 60 Frequency = 1 [1] 2.460173 2.270829 2.198597 2.260696 2.346933 2.414479 2.438918 2.431440 2.410223 2.391645 2.382653 2.382697 $se Time Series: Start = 49 End = 60 Frequency = 1 [1] 0.4226823 0.5029332 0.5245256 0.5247161 0.5305499 0.5369159 0.5388045 0.5388448 0.5391043 0.5395174 0.5396991 0.5397140 So, what is wrong with arima? Thanking you in advance, Rem __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] automate save.image
Dear all, Sometimes, during an R session, my computer hangs and I loose all the objects created during this session. Is there a way to automatically type save.image() and savehistory() every 5 minutes? Best regards, Remigijus __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re[2]: [R] optim
Sunday, March 02, 2003, 9:48:26 PM, you wrote: MK [EMAIL PROTECTED] wrote: You are not solving the same problem, which is more than a little `odd'. MK Why? MK want to choose the three parameters so that MK these three integrals are as close to, resp., 2300, 4600 and 5800 as MK possible. As I have three equations with three unknowns, I expect the MK exact fit, i.e., the SS (see below) to be zero. Good evening to all! In fact, I want to estimate three unknown parameters TETA[1],TETA[2] (the scale parameter) and TETA[3]. What I know is 10 numbers aa - c(2300,4600,5800, 7100,...,37700) and these are the empirical counterparts of the integrals over [0,0.1],...[0.9,1]. I don't know whether this is correct but in order to find the parameters I minimize the sum((integr-aa)^2) (kind of the moment method). As I had problems with optim, I reduced the problem to three intervals. I have to admit, I'm still fighting with optim. Following Prof Ripley's suggestion, I replaced res - optim(init,LSS,aa=aa,method = L-BFGS-B,lower=c(0,0,0.5)) by res - optim(init,LSS,aa=aa,method = L-BFGS-B,lower=c(0,0,0.5), control=list(parscale=c(1,1000,1))) Now I have source(C:/Program Files/R/integral.R) [1] 2.50 7000.00 0.84 # initial [1] 2.052587 66734.47682242.110597 # final [1] 42820.43 instead of what I had earlier [1]2.50 7000.00 0.84# initial [1]2.3487221 6999.8230.5623628 # final [1] 75613.05 Now SS=42820, but it is still far from zero. I agree, I did not implemented all what I was advised to, but it could take some time. Many thanks for the help. Remigijus *** *** Do read the help page. It says: [...] `parscale' A vector of scaling values for the parameters. Optimization is performed on `par/parscale' and these should be comparable in the sense that a unit change in any element produces about a unit change in the scaled value. ** ** Dear all, I have a function MYFUN which depends on 3 positive parameters TETA[1], TETA[2], and TETA[3]; x belongs to [0,1]. I integrate the function over [0,0.1], [0.1,0.2] and [0.2,0.3] and want to choose the three parameters so that these three integrals are as close to, resp., 2300, 4600 and 5800 as possible. As I have three equations with three unknowns, I expect the exact fit, i.e., the SS (see below) to be zero. However, the optim function never gives me what I expect, the minimal SS value(=res$value) never comes close to zero, the estimates of the parameters, res$par, wildly depends on init etc. I would be grateful for any comments on this miserable situation. aa - c(2300,4600,5800) init - c(2.5,8000,0.84) # initial values of parameters print(init) ### myfun - function(x,TETA) TETA[2]*(((1-x)^(-1/TETA[3]))- 1)^(1/TETA[1]) ### x - seq(0,0.3,by=0.01) plot(x,myfun(x,init),type=l) ### LSS - function(teta,aa) { integr - numeric(3) for(i in 1:3) {integr[i] - 10*integrate(myfun, lower=(i-1)/10,upper=i/10,TETA=teta)$value } SS - # SS=Sum of Squares SS } res - optim(init,LSS,aa=aa, method = L-BFGS-B,lower=c(0,0,0.5)) print(res$par) print(res$value) source(C:/Program Files/R/integral.R) [1]2.50 7000.00 0.84# initial [1]2.3487221 6999.8230.5623628 # final [1] 75613.05 # minSS source(C:/Program Files/R/integral.R) [1] 2.5 150000.84 # initial [1] 2.125804 14999.999747 2.241179 # final [1] 50066.35 # minSS __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] optim
Dear all, I have a function MYFUN which depends on 3 positive parameters TETA[1], TETA[2], and TETA[3]; x belongs to [0,1]. I integrate the function over [0,0.1], [0.1,0.2] and [0.2,0.3] and want to choose the three parameters so that these three integrals are as close to, resp., 2300, 4600 and 5800 as possible. As I have three equations with three unknowns, I expect the exact fit, i.e., the SS (see below) to be zero. However, the optim function never gives me what I expect, the minimal SS value(=res$value) never comes close to zero, the estimates of the parameters, res$par, wildly depends on init etc. I would be grateful for any comments on this miserable situation. aa - c(2300,4600,5800) init - c(2.5,8000,0.84) # initial values of parameters print(init) ### myfun - function(x,TETA) TETA[2]*(((1-x)^(-1/TETA[3]))- 1)^(1/TETA[1]) ### x - seq(0,0.3,by=0.01) plot(x,myfun(x,init),type=l) ### LSS - function(teta,aa) { integr - numeric(3) for(i in 1:3) {integr[i] - 10*integrate(myfun, lower=(i-1)/10,upper=i/10,TETA=teta)$value } SS - sum((integr-aa)^2) # SS=Sum of Squares SS } res - optim(init,LSS,aa=aa, method = L-BFGS-B,lower=c(0,0,0.5)) print(res$par) print(res$value) source(C:/Program Files/R/integral.R) [1]2.50 7000.00 0.84# initial [1]2.3487221 6999.8230.5623628 # final [1] 75613.05 # minSS source(C:/Program Files/R/integral.R) [1] 2.5 150000.84 # initial [1] 2.125804 14999.999747 2.241179 # final [1] 50066.35 # minSS Best regards, Remigijus mailto:[EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re[2]: [R] Interpolation
Many thanks to all who replied to my e-mail. My problem was that I had not known about the approx function. By the way, if I have x - c(1990,1994,1995,1997), is there an automated way to fill in the gaps, i.e., to get c(1991,1992,1993,1996)? Cheers, Remigijus Wednesday, February 12, 2003, 12:09:33 PM, you wrote: PDB Remigijus Lapinskas [EMAIL PROTECTED] writes: Dear all, I have two vectors and connect the points by line segments: x - c(1990,1994,1995,1997) y - c(30,40,80,20) plot(x,y,type=l) I want to get the values of y's on these lines at missing x's, i.e., at 1991,1992,1993,and 1996. Is there a simple way to do this? PDB approx(x,y,c(1991,1992,1993,1996)) PDB plot(x,y,type=l) PDB points(approx(x,y,c(1991,1992,1993,1996)),pch=*,col=red,cex=5) PDB -- PDBO__ Peter Dalgaard Blegdamsvej 3 PDB c/ /'_ --- Dept. of Biostatistics 2200 Cph. N PDB (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 PDB ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Plotting coloured histograms...
Try the H. Bengtsson's function plot.histogram from http://www.maths.lth.se/matstat/staff/hb/mypackages/R/plot.histogram.R Remigijus Saturday, January 25, 2003, 10:21:30 PM, you wrote: FHFPdRHi, I am having some trouble trying to plot a histogram in more than one FHFPdR colour. What I want to do is, plot two vectors in the same histogram, but FHFPdR with different colours, for instance: FHFPdR x - rnorm(1000,20,4); FHFPdR y - rnorm(1000,10,2); FHFPdR Then I'd like to have x and y ploted on the same hist (I can do that FHFPdR already doing w - c(x,y) then hist(w)) but the bars representing the x's should FHFPdR be in one colour and the bars representing the y should be in another one, FHFPdR so that I could see the overlaping areas of the two distributions etc. FHFPdR Is there any way to do that? I've read through the hist docummentation (help(hist)) and also googled for R colour histogram but didn't find FHFPdR anything helpfull. FHFPdR Thank you for your attention, FHFPdR -- FHFPdR __ FHFPdR [EMAIL PROTECTED] mailing list FHFPdR http://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help