[jira] [Commented] (ARROW-13117) [R] Retain schema in new Expressions

Neal Richardson (Jira) Mon, 21 Jun 2021 17:05:06 -0700


    [ https://issues.apache.org/jira/browse/ARROW-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366908#comment-17366908 ]


Neal Richardson commented on ARROW-13117:
-----------------------------------------

I think we should distinguish two things: (1) is.* functions that operate on 
the data, as in mutate(); and (2) is.* functions used as column predicates in 
dplyr's across() and related methods. {{is.double(x + y)}} doesn't make much 
sense to me as case (1) because it evaluates to a constant: every row of a 
column has the same type. It might make sense in the context of a Union type, 
where each row could be a different type. Either way, if this is something we 
want to support, it should probably be a C++ compute kernel (a unary scalar 
one), and it seems low priority (though maybe I'm missing something). 

In case (2), I would think we could handle this in a more targeted way inside 
where(), across(), etc. 
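To make the two cases concrete, here is a minimal sketch using plain dplyr on an in-memory data.frame. It does not go through arrow's dplyr bindings at all, and the data frame `df` and its columns are made up for illustration. In case (1) the predicate runs on the data and collapses to one value per column; in case (2) it runs once per column to decide which columns across() touches.

```r
library(dplyr)

# Hypothetical example data: one double column and one integer column
df <- data.frame(x = c(1.5, 2.5), y = c(1L, 2L))

# Case (1): is.double() applied to data inside mutate().
# x + y is a double vector, so is.double() returns a single TRUE,
# which mutate() recycles to every row: a constant, not a per-row test.
case1 <- df %>% mutate(z = is.double(x + y))

# Case (2): is.double() used as a column predicate via where() inside across().
# It is evaluated once per column to select columns, so only `x` is rounded.
case2 <- df %>% mutate(across(where(is.double), round))

print(case1)
print(case2)
```

Only case (2) actually depends on per-column type information, which is why it seems better handled inside where()/across() than by a general is.* expression.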

Re: ARROW-12055, I see where you're going with it, but it feels like we'd be 
working around our own code to support that, and we shouldn't have to. I'd 
personally prefer to add is_nan methods for all other types in C++ (always 
returning false). 

My pushback comes from various past experiences of trying to hack together 
interfaces that seemingly need to track their state, and trying to get certain 
APIs to conform to expectations from R. Sometimes that's the right choice, but 
it's a slippery slope and we should spend some extra time looking for a cleaner 
solution before going down it.

> [R] Retain schema in new Expressions
> ------------------------------------
>
>                 Key: ARROW-13117
>                 URL: https://issues.apache.org/jira/browse/ARROW-13117
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Assignee: Ian Cook
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a new Expression is created, {{schema}} should be retained from the 
> expression(s) it was created from. That way, the {{type()}} and {{type_id()}} 
> methods of the new Expression will work. For example, currently this happens:
> {code:r}
> > x <- Expression$field_ref("x")
> > x$schema <- Schema$create(x = int32())
> > 
> > y <- Expression$field_ref("y")
> > y$schema <- Schema$create(y = int32())
> > 
> > Expression$create("add_checked", x, y)$type()
> Error: !is.null(schema) is not TRUE
> {code}
> This is what we want to happen:
> {code:r}
> > Expression$create("add_checked", x, y)$type()
> Int32
> int32
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
