[jira] [Commented] (ARROW-18241) [R] as.integer can't handle empty character cells (e.g. c(''))

2022-11-04 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628905#comment-17628905
 ] 

Neal Richardson commented on ARROW-18241:
-

> "I agree this would be a nice option to have." > Sure. But shouldn't it be 
> the default behavior, since that is what happens in base R?

Sorry, that was ambiguous. We would need the C++ cast function to support an 
option to return nulls for values that can't be converted, rather than just 
raising an error. If/when that option exists, then yes, we would make that the 
default in R.

I'll rename this ticket to be about that feature.
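
To make the two behaviors concrete outside of Arrow, here is a minimal base R 
sketch. Both helpers (`strict_as_integer`, `lenient_as_integer`) and the 
`int_pattern` regex are hypothetical names made up for this illustration; none 
of them exist in arrow or in the C++ library. The strict helper mimics the 
current cast (error on the first value that doesn't parse), the lenient one 
mimics the requested option (NA for values that don't parse). The digits-only 
pattern is an assumption and is stricter than what base R's `as.integer()` 
accepts.

```r
# Hypothetical helpers for illustration only; neither exists in arrow.
int_pattern <- "^-?[0-9]+$"  # assumed definition of "parses as an integer"

# Mimics the current cast: stop at the first value that does not parse.
strict_as_integer <- function(x) {
  bad <- !is.na(x) & !grepl(int_pattern, x)
  if (any(bad)) {
    stop(sprintf("Failed to parse string: '%s' as a scalar of type int32",
                 x[bad][1]))
  }
  as.integer(x)
}

# Mimics the requested option: values that do not parse become NA.
lenient_as_integer <- function(x) {
  out <- rep(NA_integer_, length(x))
  ok <- grepl(int_pattern, x)  # grepl() returns FALSE for NA input
  out[ok] <- as.integer(x[ok])
  out
}

lenient_as_integer(c("", "1", "2", "a"))
```

With the lenient helper, the empty string and "a" both come back as NA, 
which is the result base R users would expect from `as.integer()`.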

> [R] as.integer can't handle empty character cells (e.g. c(''))
> 
>
> Key: ARROW-18241
> URL: https://issues.apache.org/jira/browse/ARROW-18241
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Lucas Mation
> Priority: Major
>
> I am importing a dataset with arrow, and then converting variable types. But 
> I got an error message because the `arrow` implementation of `as.integer` 
> can't handle empty strings (which are legal in base R). Is this a bug?
> {code:r}
> #In R
> '' %>% as.integer()
> [1] NA
>  
> #in arrow
> q <- data.table(x=c('','1','2'))
> q %>% write_dataset('q')
> q2 <- 'q' %>% open_dataset %>% mutate(x=as.integer(x)) %>% collect
> Error in `collect()`:
> ! Invalid: Failed to parse string: '' as a scalar of type int32
> Run `rlang::last_error()` to see where the error occurred.
> {code}
> Update: tried to preprocess x with `ifelse` but it also did not work.
> {code:r}
> 'q' %>% open_dataset %>% mutate(x= ifelse(x=='',NA,x)) %>% 
> mutate(x=as.integer(x)) %>% collect
> Error in `collect()`:
> ! NotImplemented: Function 'if_else' has no kernel matching input types 
> (bool, bool, string)
> Run `rlang::last_error()` to see where the error occurred.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-18241) [R] as.integer can't handle empty character cells (e.g. c(''))

2022-11-04 Thread Lucas Mation (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628902#comment-17628902
 ] 

Lucas Mation commented on ARROW-18241:
--

[~npr], thanks.

1)

"I agree this would be a nice option to have." > Sure. But shouldn't it be the 
default behavior, since that is what happens in base R?

2)

"it should work as you typed it on the development version" > Tested. Works. 
Thanks.

"On the released version, you can make it work by explicitly making the NA 
be a string so the types match" > Tested. Works.

 






[jira] [Commented] (ARROW-18241) [R] as.integer can't handle empty character cells (e.g. c(''))

2022-11-03 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628622#comment-17628622
 ] 

Neal Richardson commented on ARROW-18241:
-

Two observations:

1. This isn't just about empty strings: casting a string to an integer raises 
an error on any string that doesn't parse. I believe this was raised before 
but I can't seem to find an issue about it (that is, adding an option to 
return NA for values that don't parse instead of erroring). I agree this would 
be a nice option to have.

{code}
> arrow_table(x="a") %>% mutate(x = as.integer(x)) %>% collect()
Error in `compute.arrow_dplyr_query()`:
! Invalid: Failed to parse string: 'a' as a scalar of type int32
{code}

2. The ifelse workaround will work, and it should work as you typed it on the 
development version of the package. On the released version, you can make it 
work by explicitly making the NA be a string so the types match:

{code}
arrow_table(a=c("1", "", "3")) %>% 
  mutate(x = as.integer(ifelse(a == "", NA_character_, a))) %>% 
  collect()

# A tibble: 3 × 2
  a         x
  <chr> <int>
1 "1"       1
2 ""       NA
3 "3"       3
{code}
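
The same workaround can be widened from empty strings to any value that 
doesn't look like an integer, which covers the case in observation 1. This is 
a minimal sketch, not a tested recipe: it assumes `grepl()` is translated by 
the arrow dplyr backend on your version, and that a digits-only pattern is an 
acceptable definition of "parses". Values that match the pattern but overflow 
int32 would presumably still raise the cast error.

```r
library(arrow)
library(dplyr)

# Keep only values that look like integers; everything else becomes a
# string NA, so both branches of ifelse() have the same (string) type.
res <- arrow_table(a = c("1", "", "x", "3")) %>%
  mutate(x = as.integer(ifelse(grepl("^-?[0-9]+$", a), a, NA_character_))) %>%
  collect()

res
```

The NA is still written as `NA_character_` so that this also works on the 
released version, where the branch types have to match.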




