On 04/17/2014 02:21 PM, Murray Stokely wrote:
On Thu, Apr 17, 2014 at 6:42 AM, McGehee, Robert
<robert.mcge...@geodecapital.com> wrote:
Here's my use case: I have a function that pulls arbitrary
financial data from a web service call such as a stock's industry,
price, volume, etc. by reading the web output as a text table. The
data may be either character (industry, stock name, etc.) or
numeric (price, volume, etc.), and the function generally doesn't
know the class in advance. The problem is that we frequently get
numeric values represented with more precision than actually
exists, for instance a price of "2.6999999999999999" rather than
"2.70". The numeric representation is exactly one digit too much
for type.convert which (in R 3.10.0) converts it to character
instead of numeric (not what I want). This caused a bunch of
"non-numeric argument to binary operator" errors to appear today as
numeric data was now being represented as characters.

I have no doubt that this probably will cause some unwanted RODBC
side effects for us as well. IMO, getting the class right is more
important than infinite precision. What use is a character
representation of a number anyway if you can't perform arithmetic
on it? I would favor at least making the new behavior optional, but
I think many packages (like RODBC) potentially need to be patched
to code around the new feature if it's left in.

The uses of character representation of a number are many: unique
identifiers/user ids, hash codes, timestamps, or other values where
rounding results to the nearest value that can be represented as a
numeric type would completely change the results of any data
analysis performed on that data.

Database join operations are certainly an area where R's previous
behavior of silently dropping precision of numbers with type.convert
can get you into trouble.  For example, things like join operations
or group by operations performed in R code would produce erroneous
results if you are joining/grouping by a key without the full
precision of your underlying data.  Records can get joined up
incorrectly or aggregated with the wrong groups.

I don't understand this. Assuming you are sending the SQL statement to the database engine, none of this erroneous matching is happening in R. The calculations all happens on the database.

But, for the case where the database does know that numbers are double precision, it would be nice if they got transmitted by ODBC to R as numerics (the usual translation) just as they are by the native interfaces like RPostgreSQL. Do you get the erroneous results when you use a native interface?

( from second response:)
You want a casting operation in your SQL query or similar if you want
a rounded type that will always fit in a double.  Cast or Convert
operators in SQL, or similar for however you are getting the data you
want to use with type.convert().  This is all application specific and
sort of beyond the scope of type.convert(), which now behaves as it
has been documented to behave.

This seems to suggests I need to use different SQL statements depending on which interface I use to talk to the database.

If you do 1/3 in a database calculation and that ends up being represented as something more accurate than double precision on the database, then it needs to be transmitted as something with higher precision (character/factor?). If the result is double precision it should be sent as double precision, not as something pretending to be more accurate.

I suspect the difficulty with ODBC may be that type.convert() really should not be called when both ends of the communication know that a double precision number is being exchanged.

Paul

If you later want to do arithmetic on them, you can choose to lose
precision by using as.numeric() or use one of the large number
packages on CRAN (GMP, int64, bit64, etc.).  But once you've dropped
the precision with as.numeric you can never get it back, which is
why the previous behavior was clearly dangerous.

I think I had some additional examples in the original bug/patch I
filed about this issue a few years ago, but I'm unable to find it on
bugs.r-project.org and its not referenced in the cl descriptions or
news file.

- Murray


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to