Hi all,

I am getting inconsistent results from the mean() function in the ffbase
package and cannot figure out what is wrong.

I have a simulated ff vector containing 1,000,000,000 numbers and want to
calculate its mean, but different methods give quite different results.

With the mean() function from ffbase, the mean is 152.6858.
But with base R's mean(), or by summing the chunks directly, I get 667.5595.

Any ideas? Thank you in advance!

Bayes Chen

# F1 is an ffdf , F1$X1 is an ff vector
> length(F1$X1)
[1] 1000000000

# Use the mean() function from the ffbase package
> mean(F1$X1)
[1] 152.6858

> X2 = F1$X1[]    # X2 is now an ordinary (non-ff) vector
> length(X2)
[1] 1000000000
> mean(X2)          # base R's mean() for ordinary vectors
[1] 667.5595

# Calculate the sum chunk by chunk, then divide by the length
> chunks = chunk(F1$X1, by=5000000)
> sumx = 0
> for (i in chunks) {
+     sumx = sumx + sum(F1$X1[i])
+ }
> sumx/length(F1$X1)
[1] 667.5595
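For reference, the chunk-sum loop above can be wrapped into a small, self-contained base-R function. This is only a sketch: `chunked_mean` is a hypothetical name (not part of ff or ffbase), it works on an ordinary in-memory vector (for an ff vector the index ranges would come from `chunk()` as in the loop above), and the simulated data are made up purely for illustration.

```r
# Hypothetical helper (not part of ff or ffbase): mean computed from
# running per-chunk sums, mirroring the chunk loop above.
chunked_mean <- function(x, chunk_size) {
  n <- length(x)
  total <- 0
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    # as.numeric() so that integer chunks are summed in double precision
    total <- total + sum(as.numeric(x[start:end]))
    start <- end + 1
  }
  total / n
}

# Illustrative data only (not the data from this post)
set.seed(1)
x <- rexp(1e6, rate = 1 / 60)
print(chunked_mean(x, chunk_size = 1e5))
print(mean(x))
```

Base R's mean() accumulates in extended precision and applies a second refinement pass, so the two printed values may differ in the last few digits, but they should agree far more closely than 152.69 vs 667.56.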

----------------------------------- Below are some other trials:
> X2 = F1$X1[1:1000000]
> mean(X2)
[1] 59.43149
> mean(as.ff(X2))
[1] 59.43149

> X2 = F1$X1[1:100000000]
> mean(X2)
[1] 59.41978
> mean(as.ff(X2))
[1] 59.42128

> X2 = F1$X1[1:500000000]
> mean(X2)
[1] 60.53615
> mean(as.ff(X2))
[1] 57.72168

> X2 = F1$X1[1:750000000]
> mean(X2)
[1] 59.37562
> mean(as.ff(X2))
[1] 57.81179

> X2 = F1$X1[1:900000000]
> mean(X2)
[1] 57.0867
> mean(as.ff(X2))
[1] 57.44862

> X3 = F1$X1[900000000:1000000000]
> mean(X3)
[1] 6161.814
> mean(as.ff(X3))
[1] 6161.797
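As a rough consistency check on the numbers above, the length-weighted average of the two base-R subset means should reproduce the overall base-R mean. This sketch treats the subsets as 900,000,000 and 100,000,000 elements; strictly, X3 starts at index 900000000 and so overlaps the first subset by one element, which is ignored here.

```r
# Subset means reported above by base R's mean()
mean_first <- 57.0867    # F1$X1[1:900000000]
mean_last  <- 6161.814   # F1$X1[900000000:1000000000]
n_first <- 9e8
n_last  <- 1e8           # approximate: ignores the one overlapping element

# Length-weighted combination of the two subset means
combined <- (n_first * mean_first + n_last * mean_last) / (n_first + n_last)
print(combined)
```

This comes out at about 667.56, in line with the base-R and chunk-sum results rather than the 152.6858 returned by ffbase's mean().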

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
