[jira] [Created] (ARROW-16833) [R] how to enforce type conversion in open_dataset()
Zsolt Kegyes-Brassai created ARROW-16833:

             Summary: [R] how to enforce type conversion in open_dataset()
                 Key: ARROW-16833
                 URL: https://issues.apache.org/jira/browse/ARROW-16833
             Project: Apache Arrow
          Issue Type: Improvement
    Affects Versions: 8.0.0
            Reporter: Zsolt Kegyes-Brassai

Here is a small example:

{code:java}
library(arrow)

df_numbers <- tibble::tibble(number = c(1,2,3,"error", 4, 5, NA, 6))
str(df_numbers)
#> tibble [8 x 1] (S3: tbl_df/tbl/data.frame)
#>  $ number: chr [1:8] "1" "2" "3" "error" ...

write_parquet(df_numbers, "numbers.parquet")

open_dataset("numbers.parquet")
#> FileSystemDataset with 1 Parquet file
#> number: string

open_dataset("numbers.parquet", schema(number = int8())) |> dplyr::collect()
#> Error in `dplyr::collect()`:
#> ! Invalid: Failed to parse string: 'error' as a scalar of type int8
{code}

The expected result is an integer column in which the non-integer values are converted to NA.

How can this type conversion be enforced using a schema definition in {{open_dataset()}}?

Rationale: I would like to include this in a code chunk that imports a CSV dataset and saves it as a Parquet dataset (open_dataset -> write_dataset), with the type conversion based on a preset schema done at the same time, and all of these steps performed without loading all the data into memory.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[GitHub] [arrow-adbc] lidavidm opened a new pull request, #17: Add skeleton of Python bindings
lidavidm opened a new pull request, #17:
URL: https://github.com/apache/arrow-adbc/pull/17

These bindings are structured as a low-level module that mostly mirrors the ADBC API, and a TBD high-level module that will implement PEP 249 (except with Turbodbc-style extensions). This PR is just to get the module set up; features will follow in future PRs.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (ARROW-16832) [C++] Remove cpp/src/arrow/dbi/hiveserver2
Kouhei Sutou created ARROW-16832:

             Summary: [C++] Remove cpp/src/arrow/dbi/hiveserver2
                 Key: ARROW-16832
                 URL: https://issues.apache.org/jira/browse/ARROW-16832
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Kouhei Sutou
            Assignee: Kouhei Sutou

It's not maintained.

No objection on the mailing list: https://lists.apache.org/thread/70qv1q9krx7ztk35tzxq8jp11vq5b5zt
[jira] [Created] (ARROW-16831) [Go] ipc.Reader should panic for invalid string array offsets
Chris Hoff created ARROW-16831:

             Summary: [Go] ipc.Reader should panic for invalid string array offsets
                 Key: ARROW-16831
                 URL: https://issues.apache.org/jira/browse/ARROW-16831
             Project: Apache Arrow
          Issue Type: Bug
          Components: Go
    Affects Versions: 8.0.0
            Reporter: Chris Hoff
            Assignee: Chris Hoff

ipc.Reader will silently accept string columns with invalid offsets. This results in a panic later when attempting to access the table or write it with ipc.Writer.
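
A minimal sketch of the kind of check ARROW-16831 describes, assuming the Arrow columnar format rule that the offsets of a string array are non-negative, non-decreasing, and bounded by the length of the data buffer. `validateStringOffsets` is a hypothetical helper written for illustration only; it is not part of the Arrow Go API and is not necessarily how ipc.Reader handles this. It returns an error rather than panicking, so the caller decides how to fail.

```go
package main

import "fmt"

// validateStringOffsets is a hypothetical helper (not an Arrow Go API).
// It checks an int32 offsets buffer against the byte length of the
// corresponding data buffer of a string array.
func validateStringOffsets(offsets []int32, dataLen int) error {
	if len(offsets) == 0 {
		// An empty offsets buffer describes a zero-length array.
		return nil
	}
	if offsets[0] < 0 {
		return fmt.Errorf("offset 0 is negative: %d", offsets[0])
	}
	// Each offset must be at least as large as the one before it.
	for i := 1; i < len(offsets); i++ {
		if offsets[i] < offsets[i-1] {
			return fmt.Errorf("offset %d (%d) is smaller than offset %d (%d)",
				i, offsets[i], i-1, offsets[i-1])
		}
	}
	// The final offset must not point past the end of the data buffer.
	if last := int(offsets[len(offsets)-1]); last > dataLen {
		return fmt.Errorf("last offset %d exceeds data length %d", last, dataLen)
	}
	return nil
}

func main() {
	// Well-formed offsets for the values "abc" and "de" (5 data bytes).
	fmt.Println(validateStringOffsets([]int32{0, 3, 5}, 5))
	// Decreasing offsets.
	fmt.Println(validateStringOffsets([]int32{0, 6, 5}, 8))
	// Last offset beyond the end of the data buffer.
	fmt.Println(validateStringOffsets([]int32{0, 3, 9}, 5))
}
```

Running a check like this while reading means a malformed column is reported where it enters the program, instead of surfacing as a panic on later access or when writing.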
[GitHub] [arrow-adbc] lidavidm merged pull request #16: Fix Windows build
lidavidm merged PR #16: URL: https://github.com/apache/arrow-adbc/pull/16
[GitHub] [arrow-adbc] lidavidm merged pull request #15: Fix MacOS build
lidavidm merged PR #15: URL: https://github.com/apache/arrow-adbc/pull/15
[GitHub] [arrow-adbc] lidavidm opened a new pull request, #16: Fix Windows build
lidavidm opened a new pull request, #16: URL: https://github.com/apache/arrow-adbc/pull/16 On top of #15
[jira] [Created] (ARROW-16830) [Website] Mention Ursa Labs Zulip and ASF Slack on Community page
Ian Cook created ARROW-16830:

             Summary: [Website] Mention Ursa Labs Zulip and ASF Slack on Community page
                 Key: ARROW-16830
                 URL: https://issues.apache.org/jira/browse/ARROW-16830
             Project: Apache Arrow
          Issue Type: Wish
          Components: Website
            Reporter: Ian Cook
            Assignee: Ian Cook

The Arrow developer community uses the Ursa Labs Zulip instance and some Arrow-related channels in the ASF Slack instance for synchronous communication. We should document this on the [Community page|https://arrow.apache.org/community/] of the Arrow website.

Some considerations:
* Can we provide public links that people can use to join both Zulip and Slack? (From what I understand, yes for both.)
* The ASF Slack has very low message volume. Is there any harm done by directing people toward a very quiet Slack channel? (IMO it is probably not a problem as long as we set expectations appropriately.)
* By making these channels more accessible, will we increase misuse of these channels? For example, will people start reporting issues here instead of through the appropriate channels? (IMO it will probably not create a significant problem as long as we communicate what the appropriate use of these channels is.)
* How committed are we in the long term to the use of these Zulip and Slack instances? (This is unclear to me, but even if we are not committed and might soon abandon these and move to different synchronous communication channels, this ticket and the associated PR should help to motivate more conversation about this.)
[jira] [Created] (ARROW-16829) [R] Add link to new contributors guide to developer guide
Nicola Crane created ARROW-16829:

             Summary: [R] Add link to new contributors guide to developer guide
                 Key: ARROW-16829
                 URL: https://issues.apache.org/jira/browse/ARROW-16829
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Nicola Crane
            Assignee: Nicola Crane
[jira] [Created] (ARROW-16828) [R][Packaging] Turn on all compression libs for binaries
Will Jones created ARROW-16828:

             Summary: [R][Packaging] Turn on all compression libs for binaries
                 Key: ARROW-16828
                 URL: https://issues.apache.org/jira/browse/ARROW-16828
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Packaging, R
    Affects Versions: 8.0.0
            Reporter: Will Jones
             Fix For: 9.0.0

We notably don't ship brotli for MacOS.