[jira] [Commented] (ARROW-5718) [R] Add as_record_batch()
[ https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872497#comment-16872497 ]

Neal Richardson commented on ARROW-5718:

Makes sense to me.

> [R] Add as_record_batch()
> -------------------------
>
>                 Key: ARROW-5718
>                 URL: https://issues.apache.org/jira/browse/ARROW-5718
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Romain François
>            Priority: Minor
>             Fix For: 0.14.0
>
> ARROW-3814 /
> [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
> changed the API of `record_batch()` and `arrow::table()` such that you can
> no longer pass a data.frame to either function, at least not without
> [massaging it yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
> That broke sparklyr integration tests with an opaque `cannot infer type from
> data` error, and it's unfortunate that there's no longer a direct way to go
> from a data.frame to a record batch, which sounds like a common need.
>
> In order to follow best practices (cf. the
> [tibble|https://tibble.tidyverse.org/] package, for example), we should
> (1) add an {{as_record_batch}} function, whose data.frame method is probably
> just {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and
> (2) if a user supplies a single, unnamed data.frame as the argument to
> {{record_batch()}}, raise an error that says to use {{as_record_batch()}}.
> We may later decide that we should automatically call as_record_batch(),
> but in case that is too magical and prevents some legitimate use case,
> let's hold off for now. It's easier to add magic than remove it.
>
> Once this function exists, sparklyr tests can try to use
> {{as_record_batch}}, and if that function doesn't exist, fall back to
> {{record_batch}} (because that means it has an older released version of
> arrow that doesn't have as_record_batch, so record_batch(df) should work).
>
> cc [~javierluraschi]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
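
The two-part proposal in the issue description can be sketched in plain R roughly as follows. This is only an illustration under stated assumptions, not arrow's implementation: `record_batch()` here is a toy stand-in that collects its arguments into a classed list (the class name `toy_record_batch` is made up), and `do.call()` is used as a base-R substitute for the `!!!` splice so that the block runs without arrow or rlang.

```r
# Toy stand-in for arrow's record_batch(): it only collects its arguments
# (the columns) into a classed list. Assumption for illustration only.
record_batch <- function(...) {
  cols <- list(...)
  nms <- names(cols)
  single_unnamed <- length(cols) == 1L && (is.null(nms) || !nzchar(nms[[1L]]))
  # Part (2) of the proposal: a single, unnamed data.frame is rejected with
  # an error that points the user at as_record_batch().
  if (single_unnamed && is.data.frame(cols[[1L]])) {
    stop("record_batch() got a single unnamed data.frame; ",
         "use as_record_batch() instead", call. = FALSE)
  }
  structure(cols, class = "toy_record_batch")
}

# Part (1) of the proposal: an S3 generic with a data.frame method.
as_record_batch <- function(x, ...) UseMethod("as_record_batch")

# Base-R counterpart of record_batch(!!!x): each column becomes one argument.
as_record_batch.data.frame <- function(x, ...) {
  do.call(record_batch, as.list(x))
}

rb <- as_record_batch(mtcars)
names(rb)  # the column names of mtcars

# The direct call fails with the pointer to as_record_batch().
msg <- tryCatch(record_batch(mtcars), error = function(e) conditionMessage(e))
msg
```

A named argument such as `record_batch(x = mtcars)` is deliberately not rejected in this sketch, since the error is only proposed for the single, unnamed case.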
[jira] [Commented] (ARROW-5718) [R] Add as_record_batch()
[ https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872287#comment-16872287 ]

Romain François commented on ARROW-5718:

I think it would be fine to auto-splice, i.e.:

{code:r}
record_batch(mtcars)
{code}

would be the same as

{code:r}
record_batch(!!!mtcars)
{code}

because the argument is unnamed. This is the direction we'll take in dplyr too, e.g. for summarise and mutate.

However, something like:

{code:r}
record_batch(x = mtcars)
{code}

will create a struct array, aka a data frame column.
[jira] [Commented] (ARROW-5718) [R] Add as_record_batch()
[ https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871884#comment-16871884 ]

Wes McKinney commented on ARROW-5718:
-------------------------------------

Is there a link to the related discussion or some other cross-reference?

> [R] Add as_record_batch()
> -------------------------
>
>                 Key: ARROW-5718
>                 URL: https://issues.apache.org/jira/browse/ARROW-5718
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Minor
>             Fix For: 0.14.0
>
> ARROW-3814 /
> [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
> changed the API of `record_batch()` and `arrow::table()` such that you can
> no longer pass a data.frame to either function, at least not without
> [massaging it yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
> That broke sparklyr integration tests with an opaque `cannot infer type from
> data` error, and it's unfortunate that there's no longer a direct way to go
> from a data.frame to a record batch, which sounds like a common need.
>
> After some discussion, we resolved that a solution would be to (1) add an
> {{as_record_batch}} function, whose data.frame method is probably just
> {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and
> (2) if a user supplies a single, unnamed data.frame as the argument to
> {{record_batch()}}, raise an error that says to use {{as_record_batch()}}.
> We may later decide that we should automatically call as_record_batch(),
> but in case that is too magical and prevents some legitimate use case,
> let's hold off for now. It's easier to add magic than remove it.
>
> Once this function exists, sparklyr tests can try to use
> {{as_record_batch}}, and if that function doesn't exist, fall back to
> {{record_batch}} (because that means it has an older released version of
> arrow that doesn't have as_record_batch, so record_batch(df) should work).
>
> cc [~javierluraschi]
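
The fallback described for the sparklyr tests (use {{as_record_batch}} when the installed arrow has it, otherwise {{record_batch}}) could look something like the sketch below. The helper `pick_function()` is hypothetical and is not part of arrow or sparklyr; it is demonstrated on base R's stats package so that the block runs without arrow installed.

```r
# Hypothetical helper (not part of arrow or sparklyr): look up `preferred`
# in the namespace of package `pkg`, and fall back to `fallback` when the
# installed version of the package does not define it.
pick_function <- function(pkg, preferred, fallback) {
  ns <- asNamespace(pkg)
  name <- if (exists(preferred, envir = ns, mode = "function", inherits = FALSE)) {
    preferred
  } else {
    fallback
  }
  get(name, envir = ns, mode = "function")
}

# Demonstration on the stats package: "no_such_function" is not defined
# there, so the fallback median() is returned.
f <- pick_function("stats", "no_such_function", "median")
f(c(1, 5, 3))  # 3
```

With arrow installed, the corresponding call would presumably be `pick_function("arrow", "as_record_batch", "record_batch")`, with the result then applied to the data.frame.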