[ 
https://issues.apache.org/jira/browse/ARROW-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Cook updated ARROW-11455:
-----------------------------
    Description: 
R’s {{integer}} range is 1 smaller than the normal 32-bit integer range of C++, 
Java, etc. In R, it’s {{-2^31 + 1}} to {{2^31 - 1}}. Elsewhere, it’s {{-2^31}} 
to {{2^31 - 1}}. So R's native {{integer}} type cannot represent {{-2^31}} 
({{-2147483648}}). I believe this is because R uses {{-2^31}} as the sentinel 
value to represent {{NA_integer_}}.

If you run {{-2147483648L}} in R, it converts the vector it to {{numeric}} type 
and issues a warning:
{code:java}
Warning message:
non-integer value 2147483648L qualified with L; using numeric value 
{code}
In the {{arrow}} R package, when a 32-bit integer Arrow field containing the 
value {{-2147483648}} is converted to an R {{integer}} vector, that value 
becomes {{NA_integer_}}. No warning is given.

Simple command to repro this: 
{code:r}
as.vector(Scalar$create(-2^31, int32())){code}
Consider whether we should handle this case differently and whether it is 
feasible to do so without performance regressions. Other possible behaviors 
might be:
 * Converting the value to {{NA_integer_}} with a warning
 * Converting the field to {{bit64::integer64}} with a warning
 * Converting the field to {{base::numeric}} with a warning
 * Allowing the user to specify an argument or option to control the behavior

{code:r}
{code}

  was:
R’s {{integer}} range is 1 smaller than the normal 32-bit integer range of C++, 
Java, etc. In R, it’s {{-2^31 + 1}} to {{2^31 - 1}}. Elsewhere, it’s {{-2^31}} 
to {{2^31 - 1}}. So R's native {{integer}} type cannot represent {{-2^31}} 
({{-2147483648}}). I believe this is because R uses {{-2^31}} as the sentinel 
value to represent {{NA_integer_}}.

If you run {{-2147483648L}} in R, it converts the vector it to {{numeric}} type 
and issues a warning:
{code:java}
Warning message:
non-integer value 2147483648L qualified with L; using numeric value 
{code}
In the {{arrow}} R package, when a 32-bit integer Arrow field containing the 
value {{-2147483648}} is converted to an R {{integer}} vector, that value 
becomes {{NA_integer_}}. No warning is given. Consider whether we should handle 
this case differently and whether it is feasible to do so without performance 
regressions. Other possible behaviors might be:
 * Converting the value to {{NA_integer_}} with a warning
 * Converting the field to {{bit64::integer64}} with a warning
 * Converting the field to {{base::numeric}} with a warning
 * Allowing the user to specify an argument or option to control the behavior


> [R] Improve handling of -2^31 in 32-bit integer fields
> ------------------------------------------------------
>
>                 Key: ARROW-11455
>                 URL: https://issues.apache.org/jira/browse/ARROW-11455
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 3.0.0
>            Reporter: Ian Cook
>            Assignee: Ian Cook
>            Priority: Major
>
> R’s {{integer}} range is 1 smaller than the normal 32-bit integer range of 
> C++, Java, etc. In R, it’s {{-2^31 + 1}} to {{2^31 - 1}}. Elsewhere, it’s 
> {{-2^31}} to {{2^31 - 1}}. So R's native {{integer}} type cannot represent 
> {{-2^31}} ({{-2147483648}}). I believe this is because R uses {{-2^31}} as 
> the sentinel value to represent {{NA_integer_}}.
> If you run {{-2147483648L}} in R, it converts the vector it to {{numeric}} 
> type and issues a warning:
> {code:java}
> Warning message:
> non-integer value 2147483648L qualified with L; using numeric value 
> {code}
> In the {{arrow}} R package, when a 32-bit integer Arrow field containing the 
> value {{-2147483648}} is converted to an R {{integer}} vector, that value 
> becomes {{NA_integer_}}. No warning is given.
> Simple command to repro this: 
> {code:r}
> as.vector(Scalar$create(-2^31, int32())){code}
> Consider whether we should handle this case differently and whether it is 
> feasible to do so without performance regressions. Other possible behaviors 
> might be:
>  * Converting the value to {{NA_integer_}} with a warning
>  * Converting the field to {{bit64::integer64}} with a warning
>  * Converting the field to {{base::numeric}} with a warning
>  * Allowing the user to specify an argument or option to control the behavior
> {code:r}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to