Yes, I'm also strongly in favor of having an option for this. If there were an option in base R for controlling this, we would just use it and get rid of the separate RProtoBuf.int64AsString option that the RProtoBuf package on CRAN uses to control whether 64-bit integer types from C++ are returned to R as numerics or as character vectors.
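To make the numeric-versus-character trade-off concrete, here is a minimal sketch in base R. The variable names are illustrative only and are not taken from RProtoBuf; the sketch assumes nothing beyond the fact that R numerics are IEEE doubles, which cannot represent every integer above 2^53.

```r
# Two distinct 64-bit integer values (2^56 and 2^56 + 1), held as strings
# so that no digits are lost. These names are hypothetical, for illustration.
ids <- c("72057594037927936", "72057594037927937")

# As a character vector the two values remain distinct.
n_as_character <- length(unique(ids))

# As doubles they collapse to the same value, because a double has only
# 53 bits of mantissa and cannot tell 2^56 from 2^56 + 1.
n_as_numeric <- length(unique(as.numeric(ids)))

print(c(character = n_as_character, numeric = n_as_numeric))
```

Returning such values as character vectors keeps them distinguishable at the cost of arithmetic convenience, which is the choice an option like this has to make.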
I agree that reasonable people can disagree about the default, but I found my original bug report about this, so I will counter Robert's example with my favorite example of what was wrong with the previous behavior:

tmp <- data.frame(n=c("72057594037927936", "72057594037927937"), name=c("foo", "bar"))
length(unique(tmp$n))  # 2
write.csv(tmp, "/tmp/foo.csv", quote=FALSE, row.names=FALSE)
data <- read.csv("/tmp/foo.csv")
length(unique(data$n))  # 1

- Murray

On Sat, Apr 19, 2014 at 10:06 AM, Simon Urbanek <simon.urba...@r-project.org> wrote:

> On Apr 19, 2014, at 9:00 AM, Martin Maechler <maech...@stat.math.ethz.ch>
> wrote:
>
>>>>>>> McGehee, Robert <robert.mcge...@geodecapital.com>
>>>>>>> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>>
>>>> This is all application specific and
>>>> sort of beyond the scope of type.convert(), which now behaves as it
>>>> has been documented to behave.
>>
>>> That's only a true statement because the documentation was changed to
>>> reflect the new behavior! The new feature in type.convert certainly does
>>> not behave according to the documentation as of R 3.0.3. Here's a snippit:
>>
>>> The first type that can accept all the
>>> non-missing values is chosen (numeric and complex return values
>>> will represented approximately, of course).
>>
>>> The key phrase is in parentheses, which reminds the user to expect a
>>> possible loss of precision. That important parenthetical was removed from
>>> the documentation in R 3.1.0 (among other changes).
>>
>>> Putting aside the fact that this introduces a large amount of unnecessary
>>> work rewriting SQL / data import code, SQL packages, my biggest conceptual
>>> problem is that I can no longer rely on a particular function call
>>> returning a particular class. In my example querying stock prices, about 5%
>>> of prices came back as factors and the remaining 95% as numeric, so we had
>>> random errors popping in throughout the morning.
>>
>>> Here's a short example showing us how the new behavior can be unreliable. I
>>> pass a character representation of a uniformly distributed random variable
>>> to type.convert. 90% of the time it is converted to "numeric" and 10% it is
>>> a "factor" (in R 3.1.0). In the 10% of cases in which type.convert converts
>>> to a factor the leading non-zero digit is always a 9. So if you were
>>> expecting a numeric value, then 1 in 10 times you may have a bug in your
>>> code that didn't exist before.
>>
>>>> options(digits=16)
>>>> cl <- NULL; for (i in 1:10000) cl[i] <-
>>>> class(type.convert(format(runif(1))))
>>>> table(cl)
>>> cl
>>>  factor numeric
>>>     990    9010
>>
>> Yes.
>>
>> Murray's point is valid, too.
>>
>> But in my view, with the reasoning we have seen here,
>> *and* with the well known software design principle of
>> "least surprise" in mind,
>> I also do think that the default for type.convert() should be what
>> it has been for > 10 years now.
>>
>
> I think there should be two separate discussions:
>
> a) have an option (argument to type.convert and possibly read.table) to
> enable/disable this behavior. I'm strongly in favor of this.
>
> b) decide what the default for a) will be. I have no strong opinion, I can
> see arguments in both directions
>
> But most importantly I think a) is better than the status quo - even if the
> discussion about b) drags out.
>
> Cheers,
> Simon
>
>
>
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel