Hi all,

I am getting inconsistent results from the mean() function in the ffbase package and cannot figure out what is wrong.
I have a simulated ff vector of 1,000,000,000 numbers and want to calculate its mean, but the results differ substantially depending on the method. With the mean() function in the ffbase package, the mean is 152.6858. With R's own mean(), or by summing over chunks and dividing by the length, I get 667.5595. Any idea?

Thank you in advance!

Bayes Chen

# F1 is an ffdf; F1$X1 is an ff vector
> length(F1$X1)
[1] 1000000000

# Use the mean() function in the ffbase package
> mean(F1$X1)
[1] 152.6858

> X2 = F1$X1[]    # X2 is now a non-ff vector
> length(X2)
[1] 1000000000
> mean(X2)        # R's original mean function for ordinary vectors
[1] 667.5595

# Calculate the sum, and then the mean, by chunks
> chunks = chunk(F1$X1, by=5000000)
> sumx = 0
> for (i in chunks) {
+   sumx = sumx + sum(F1$X1[i])
+ }
> sumx/length(F1$X1)
[1] 667.5595

-----------------------------------
Below are some other trials:

> X2 = F1$X1[1:1000000]
> mean(X2)
[1] 59.43149
> mean(as.ff(X2))
[1] 59.43149

> X2 = F1$X1[1:100000000]
> mean(X2)
[1] 59.41978
> mean(as.ff(X2))
[1] 59.42128

> X2 = F1$X1[1:500000000]
> mean(X2)
[1] 60.53615
> mean(as.ff(X2))
[1] 57.72168

> X2 = F1$X1[1:750000000]
> mean(X2)
[1] 59.37562
> mean(as.ff(X2))
[1] 57.81179

> X2 = F1$X1[1:900000000]
> mean(X2)
[1] 57.0867
> mean(as.ff(X2))
[1] 57.44862

> X3 = F1$X1[900000000:1000000000]
> mean(X3)
[1] 6161.814
> mean(as.ff(X3))
[1] 6161.797
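For anyone who wants to check the chunked-sum calculation above without the ff or ffbase packages, here is a minimal base-R sketch of the same idea. It is only an illustration: the data, the seed, and the chunk size are made-up stand-ins (not the real F1$X1), and it does not reproduce the discrepancy. It only shows that summing by chunks and dividing by the length should agree with mean() up to floating-point rounding, which is why I treat 667.5595 as the reference value.

```r
# Base-R sketch of the "sum by chunks, then divide" check (no ff/ffbase needed).
# All values below are hypothetical stand-ins, not the data from the post.
set.seed(1)                       # hypothetical seed, for repeatability only
x <- rexp(1e6, rate = 1 / 60)     # hypothetical simulated data

chunk_size <- 50000               # hypothetical chunk size
starts <- seq(1, length(x), by = chunk_size)   # first index of each chunk

sumx <- 0
for (s in starts) {
  e <- min(s + chunk_size - 1, length(x))      # last index of this chunk
  sumx <- sumx + sum(x[s:e])                   # accumulate the chunk sum
}
chunked_mean <- sumx / length(x)

direct_mean <- mean(x)            # R's mean() on the ordinary vector

# The two values should agree up to floating-point tolerance.
print(c(chunked = chunked_mean, direct = direct_mean))
print(isTRUE(all.equal(chunked_mean, direct_mean)))
```

The comparison uses all.equal() rather than == because the two calculations add the numbers in a different order, so tiny rounding differences are expected.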
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.