[ https://issues.apache.org/jira/browse/ARROW-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Cook updated ARROW-11455:
-----------------------------
    Description: 
R’s {{integer}} range is 1 smaller than the normal 32-bit integer range of C++, Java, etc. In R, it’s {{-2^31 + 1}} to {{2^31 - 1}}. Elsewhere, it’s {{-2^31}} to {{2^31 - 1}}. So R's native {{integer}} type cannot represent {{-2^31}} ({{-2147483648}}). I believe this is because R uses {{-2^31}} as the sentinel value to represent {{NA_integer_}}.

If you run {{-2147483648L}} in R, it converts the value to {{numeric}} type and issues a warning:
{code:java}
Warning message:
non-integer value 2147483648L qualified with L; using numeric value
{code}
In the {{arrow}} R package, when a 32-bit integer Arrow field containing the value {{-2147483648}} is converted to an R {{integer}} vector, that value becomes {{NA_integer_}}. No warning is given.

A simple command to reproduce this:
{code:r}
as.vector(Scalar$create(-2^31, int32()))
{code}
Consider whether we should handle this case differently and whether it is feasible to do so without performance regressions. Other possible behaviors might be:
 * Converting the value to {{NA_integer_}} with a warning
 * Converting the field to {{bit64::integer64}} with a warning
 * Converting the field to {{base::numeric}} with a warning
 * Allowing the user to specify an argument or option to control the behavior

> [R] Improve handling of -2^31 in 32-bit integer fields
> ------------------------------------------------------
>
>                 Key: ARROW-11455
>                 URL: https://issues.apache.org/jira/browse/ARROW-11455
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 3.0.0
>            Reporter: Ian Cook
>            Assignee: Ian Cook
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
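
To make the first option in the issue's list concrete (converting the value to {{NA_integer_}} with a warning), here is a minimal base-R sketch. It does not use or reflect the arrow package's actual conversion code. The names {{to_r_integer}} and {{int32_min}} are hypothetical and introduced only for illustration, and the input is assumed to be a double vector holding values in the int32 range.

```r
# Base R only; no arrow dependency. `to_r_integer` and `int32_min` are
# hypothetical names used for illustration, not part of any package.

int32_min <- -2^31  # held as a double, because an R integer cannot store it

# R's largest integer; the smallest non-NA integer is its negation.
stopifnot(.Machine$integer.max == 2147483647)

# Sketch of the "convert to NA_integer_ with a warning" option.
# `x` is assumed to be a double vector of values in the int32 range.
to_r_integer <- function(x) {
  hits <- !is.na(x) & x == int32_min
  if (any(hits)) {
    warning(sum(hits), " value(s) equal to -2^31 converted to NA_integer_",
            call. = FALSE)
    x[hits] <- NA_real_  # make the NA explicit before coercion
  }
  as.integer(x)
}

# Capture the warning so that it is reported once as a message.
result <- withCallingHandlers(
  to_r_integer(c(1, int32_min, -2147483647)),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(result)
```

A check like this scans every value before coercion, so its cost on large fields would need to be measured against the performance concern raised in the issue.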