[jira] [Commented] (ARROW-5718) [R] Add as_record_batch()

2019-06-25 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872497#comment-16872497
 ] 

Neal Richardson commented on ARROW-5718:


Makes sense to me.

> [R] Add as_record_batch()
> -
>
> Key: ARROW-5718
> URL: https://issues.apache.org/jira/browse/ARROW-5718
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Assignee: Romain François
> Priority: Minor
> Fix For: 0.14.0
>
>
> ARROW-3814 / 
> [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
> changed the API of `record_batch()` and `arrow::table()` so that you can 
> no longer pass a data.frame to the function without [massaging it 
> yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
> That broke sparklyr integration tests with an opaque `cannot infer type from 
> data` error, and it is unfortunate that there is no longer a direct way to go 
> from a data.frame to a record batch, which sounds like a common need.
> In order to follow best practices (cf. the 
> [tibble|https://tibble.tidyverse.org/] package, for example), we should (1) 
> add an {{as_record_batch}} function, whose data.frame method is probably 
> just {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and 
> (2) if a user supplies a single, unnamed data.frame as the argument to 
> {{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We 
> may later decide that we should automatically call as_record_batch(), but in 
> case that is too magical and prevents some legitimate use case, let's hold 
> off for now. It's easier to add magic than remove it.
> Once this function exists, sparklyr tests can try to use {{as_record_batch}}, 
> and if that function doesn't exist, fall back to {{record_batch}} (because 
> that means it has an older released version of arrow that doesn't have 
> as_record_batch, so record_batch(df) should work).
> cc [~javierluraschi]
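
The two-part proposal in the description can be sketched in base R. This is only an illustration under stated assumptions: `toy_record_batch()` and `as_toy_record_batch()` are hypothetical stand-ins rather than arrow functions, and `do.call()` is used as a base-R analogue of the `!!!x` splice so that the sketch runs without rlang or arrow.

```r
# Hypothetical stand-in for record_batch(): takes columns through `...`.
# It is NOT arrow's implementation; it only models the proposed error.
toy_record_batch <- function(...) {
  columns <- list(...)
  arg_names <- names(columns)
  single_unnamed_df <- length(columns) == 1 &&
    (is.null(arg_names) || !nzchar(arg_names[1])) &&
    is.data.frame(columns[[1]])
  if (single_unnamed_df) {
    # Part (2) of the proposal: point the user at the converter.
    stop("Cannot pass a data.frame directly; ",
         "use as_toy_record_batch() instead.", call. = FALSE)
  }
  structure(columns, class = "toy_record_batch")
}

# Part (1) of the proposal: an S3 generic with a data.frame method.
as_toy_record_batch <- function(x, ...) UseMethod("as_toy_record_batch")

as_toy_record_batch.data.frame <- function(x, ...) {
  # Pass every column as its own named argument (analogue of !!!x).
  do.call(toy_record_batch, as.list(x))
}
```

With these definitions, `as_toy_record_batch(df)` builds the batch from the columns of `df`, while `toy_record_batch(df)` stops with a message naming the converter.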



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5718) [R] Add as_record_batch()

2019-06-25 Thread JIRA


[ 
https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872287#comment-16872287
 ] 

Romain François commented on ARROW-5718:


I think it would be fine to auto-splice, i.e.: 

{code:r}
record_batch(mtcars)
{code}

would be the same as 

{code:r}
record_batch(!!!mtcars)
{code}

because the argument is unnamed. This is the direction we'll take in dplyr 
too, e.g. for summarise and mutate. 

However, something like: 

{code:r}
record_batch(x = mtcars)
{code}

will create a struct array, aka a data frame column. 
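
A minimal base-R sketch of the two cases above, for illustration only. `toy_batch()` is a hypothetical stand-in that returns a plain named list of columns; it is not arrow's `record_batch()` and does not build Arrow arrays. It shows the rule being discussed: an unnamed data.frame argument is spliced into its columns, while a named one is kept as a single nested column.

```r
# Toy stand-in for record_batch(): returns a named list of columns.
# Assumption: only the splicing rule is modelled, nothing Arrow-specific.
toy_batch <- function(...) {
  args <- list(...)
  arg_names <- names(args)
  if (is.null(arg_names)) arg_names <- rep("", length(args))

  columns <- list()
  for (i in seq_along(args)) {
    value <- args[[i]]
    if (!nzchar(arg_names[i]) && is.data.frame(value)) {
      # Unnamed data.frame: splice its columns into the result.
      columns <- c(columns, as.list(value))
    } else {
      # Named argument or plain vector: keep it as one column.
      columns <- c(columns, stats::setNames(list(value), arg_names[i]))
    }
  }
  columns
}

df <- data.frame(a = 1:3, b = c("x", "y", "z"))
spliced <- toy_batch(df)      # columns "a" and "b"
nested  <- toy_batch(x = df)  # one column "x" holding the data.frame
```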







[jira] [Commented] (ARROW-5718) [R] Add as_record_batch()

2019-06-24 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871884#comment-16871884
 ] 

Wes McKinney commented on ARROW-5718:
-

Is there a link to the related discussion or some other cross-reference?

> [R] Add as_record_batch()
> -
>
> Key: ARROW-5718
> URL: https://issues.apache.org/jira/browse/ARROW-5718
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Priority: Minor
> Fix For: 0.14.0
>
>
> ARROW-3814 / 
> [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
> changed the API of `record_batch()` and `arrow::table()` so that you can 
> no longer pass a data.frame to the function without [massaging it 
> yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
> That broke sparklyr integration tests with an opaque `cannot infer type from 
> data` error, and it is unfortunate that there is no longer a direct way to go 
> from a data.frame to a record batch, which sounds like a common need.
> After some discussion, we resolved that a solution would be to (1) add an 
> {{as_record_batch}} function, whose data.frame method is probably just 
> {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and (2) if 
> a user supplies a single, unnamed data.frame as the argument to 
> {{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We 
> may later decide that we should automatically call as_record_batch(), but in 
> case that is too magical and prevents some legitimate use case, let's hold 
> off for now. It's easier to add magic than remove it.
> Once this function exists, sparklyr tests can try to use {{as_record_batch}}, 
> and if that function doesn't exist, fall back to {{record_batch}} (because 
> that means it has an older released version of arrow that doesn't have 
> as_record_batch, so record_batch(df) should work).
> cc [~javierluraschi]
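
The fallback described for the sparklyr tests could look roughly like the sketch below. `pick_converter()` is a hypothetical helper, not part of sparklyr or arrow. It takes the environment to search as an argument so that it can be tried without arrow installed; in real use one would presumably pass `asNamespace("arrow")`.

```r
# Hypothetical helper: prefer as_record_batch() when the given namespace
# provides it, otherwise fall back to record_batch() (older releases).
pick_converter <- function(ns) {
  has_new_api <- exists("as_record_batch", envir = ns,
                        mode = "function", inherits = FALSE)
  if (has_new_api) {
    get("as_record_batch", envir = ns)
  } else {
    get("record_batch", envir = ns)
  }
}

# Fake namespaces standing in for a newer and an older arrow release.
newer <- new.env()
assign("as_record_batch", function(x) "as_record_batch", envir = newer)
assign("record_batch", function(x) "record_batch", envir = newer)

older <- new.env()
assign("record_batch", function(x) "record_batch", envir = older)

chosen_newer <- pick_converter(newer)(NULL)
chosen_older <- pick_converter(older)(NULL)
```

Checking for the function itself, rather than comparing version numbers, keeps the test code independent of which release first ships `as_record_batch()`.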


