[jira] [Created] (ARROW-16833) [R] how to enforce type conversion in open_dataset()

2022-06-14 Thread Zsolt Kegyes-Brassai (Jira)
Zsolt Kegyes-Brassai created ARROW-16833:


 Summary: [R] how to enforce type conversion in open_dataset()
 Key: ARROW-16833
 URL: https://issues.apache.org/jira/browse/ARROW-16833
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 8.0.0
Reporter: Zsolt Kegyes-Brassai


Here is a small example:

{code:java}
library(arrow)
df_numbers <- tibble::tibble(number = c(1,2,3,"error", 4, 5, NA, 6))
str(df_numbers)
#> tibble [8 x 1] (S3: tbl_df/tbl/data.frame)
#>  $ number: chr [1:8] "1" "2" "3" "error" ...
write_parquet(df_numbers, "numbers.parquet")
open_dataset("numbers.parquet") 
#> FileSystemDataset with 1 Parquet file
#> number: string
open_dataset("numbers.parquet", schema(number = int8())) |> dplyr::collect()
#> Error in `dplyr::collect()`:
#> ! Invalid: Failed to parse string: 'error' as a scalar of type int8

{code}
The expected result is an integer input column in which the non-integer 
values are converted to NA.

How can this type conversion be enforced through the schema definition in 
{{open_dataset()}}?

Rationale: I would like to include this in a code chunk that imports a CSV 
dataset and saves it as a Parquet dataset (open_dataset -> write_dataset), with 
the type conversion based on a preset schema done in the same step, and all of 
this without loading all the data into memory.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [arrow-adbc] lidavidm opened a new pull request, #17: Add skeleton of Python bindings

2022-06-14 Thread GitBox


lidavidm opened a new pull request, #17:
URL: https://github.com/apache/arrow-adbc/pull/17

   These bindings are structured as a low-level module that mostly
   mirrors the ADBC API, and a TBD high-level module that will
   implement PEP 249 (except with Turbodbc-style extensions).
   
   This PR is just to get the module set up, with features in future PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (ARROW-16832) [C++] Remove cpp/src/arrow/dbi/hiveserver2

2022-06-14 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-16832:


 Summary: [C++] Remove cpp/src/arrow/dbi/hiveserver2
 Key: ARROW-16832
 URL: https://issues.apache.org/jira/browse/ARROW-16832
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


It's not maintained.

No objection on the mailing list: 
https://lists.apache.org/thread/70qv1q9krx7ztk35tzxq8jp11vq5b5zt





[jira] [Created] (ARROW-16831) [Go] ipc.Reader should panic for invalid string array offsets

2022-06-14 Thread Chris Hoff (Jira)
Chris Hoff created ARROW-16831:

 Summary: [Go] ipc.Reader should panic for invalid string array offsets
 Key: ARROW-16831
 URL: https://issues.apache.org/jira/browse/ARROW-16831
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Affects Versions: 8.0.0
Reporter: Chris Hoff
Assignee: Chris Hoff


ipc.Reader will silently accept string columns with invalid offsets. This 
results in a panic later when attempting to access the table or write it with 
ipc.Writer.
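
A minimal sketch of the kind of up-front check this implies, assuming the usual layout of a variable-length string array (an int32 offsets buffer with one more entry than the array length, plus a data buffer). The helper name validateStringOffsets is hypothetical and is not part of the Arrow Go API; it only illustrates rejecting bad offsets at read time instead of failing later on access.

```go
package main

import "fmt"

// validateStringOffsets is a hypothetical helper (not an Arrow API).
// It checks that the offsets of a string array are non-negative,
// non-decreasing, and do not point past the end of the data buffer.
func validateStringOffsets(offsets []int32, dataLen int) error {
	if len(offsets) == 0 {
		// Assumption: an empty offsets buffer is treated as a valid
		// zero-length array in this sketch.
		return nil
	}
	if offsets[0] < 0 {
		return fmt.Errorf("first offset %d is negative", offsets[0])
	}
	for i := 1; i < len(offsets); i++ {
		if offsets[i] < offsets[i-1] {
			return fmt.Errorf("offset %d at index %d is smaller than previous offset %d",
				offsets[i], i, offsets[i-1])
		}
	}
	if last := offsets[len(offsets)-1]; int(last) > dataLen {
		return fmt.Errorf("last offset %d exceeds data length %d", last, dataLen)
	}
	return nil
}

func main() {
	data := []byte("abcde")
	// Two strings, "abc" and "de": accepted, so the error is nil.
	fmt.Println(validateStringOffsets([]int32{0, 3, 5}, len(data)))
	// Offsets that go backwards: rejected with a non-nil error.
	fmt.Println(validateStringOffsets([]int32{0, 7, 5}, len(data)))
}
```

Whether the reader should return an error or panic on such input is a separate design choice; the sketch returns an error so the caller can decide.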





[GitHub] [arrow-adbc] lidavidm merged pull request #16: Fix Windows build

2022-06-14 Thread GitBox


lidavidm merged PR #16:
URL: https://github.com/apache/arrow-adbc/pull/16





[GitHub] [arrow-adbc] lidavidm merged pull request #15: Fix MacOS build

2022-06-14 Thread GitBox


lidavidm merged PR #15:
URL: https://github.com/apache/arrow-adbc/pull/15





[GitHub] [arrow-adbc] lidavidm opened a new pull request, #16: Fix Windows build

2022-06-14 Thread GitBox


lidavidm opened a new pull request, #16:
URL: https://github.com/apache/arrow-adbc/pull/16

   On top of #15





[jira] [Created] (ARROW-16830) [Website] Mention Ursa Labs Zulip and ASF Slack on Community page

2022-06-14 Thread Ian Cook (Jira)
Ian Cook created ARROW-16830:


 Summary: [Website] Mention Ursa Labs Zulip and ASF Slack on Community page
 Key: ARROW-16830
 URL: https://issues.apache.org/jira/browse/ARROW-16830
 Project: Apache Arrow
  Issue Type: Wish
  Components: Website
Reporter: Ian Cook
Assignee: Ian Cook


The Arrow developer community uses the Ursa Labs Zulip instance and some 
Arrow-related channels in the ASF Slack instance for synchronous communication. 
We should document this on the [Community 
page|https://arrow.apache.org/community/] of the Arrow website.

Some considerations:
 * Can we provide public links that people can use to join both Zulip and 
Slack? (From what I understand, yes for both.)
 * The ASF Slack has very low message volume. Is there any harm done by 
directing people toward a very quiet Slack channel? (IMO it is probably not a 
problem as long as we set expectations appropriately.)
 * By making these channels more accessible, will we increase misuse of these 
channels? For example will people start reporting issues here instead of 
through the appropriate channels? (IMO it will probably not create a 
significant problem as long as we communicate what the appropriate use of these 
channels is.) 
 * How committed are we in the long term to using these Zulip and Slack 
instances? (This is unclear to me, but even if we are not committed and might 
soon abandon these and move to different synchronous communication channels, 
this ticket and the associated PR should help to motivate more conversation 
about this.)





[jira] [Created] (ARROW-16829) [R] Add link to new contributors guide to developer guide

2022-06-14 Thread Nicola Crane (Jira)
Nicola Crane created ARROW-16829:


 Summary: [R] Add link to new contributors guide to developer guide
 Key: ARROW-16829
 URL: https://issues.apache.org/jira/browse/ARROW-16829
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Nicola Crane
Assignee: Nicola Crane








[jira] [Created] (ARROW-16828) [R][Packaging] Turn on all compression libs for binaries

2022-06-14 Thread Will Jones (Jira)
Will Jones created ARROW-16828:

 Summary: [R][Packaging] Turn on all compression libs for binaries
 Key: ARROW-16828
 URL: https://issues.apache.org/jira/browse/ARROW-16828
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, R
Affects Versions: 8.0.0
Reporter: Will Jones
 Fix For: 9.0.0


Notably, we don't ship brotli for macOS.


