[jira] [Created] (ARROW-18378) MIGRATION: Disable issue reporting in ASF Jira
Todd Farmer created ARROW-18378:

Summary: MIGRATION: Disable issue reporting in ASF Jira
Key: ARROW-18378
URL: https://issues.apache.org/jira/browse/ARROW-18378
Project: Apache Arrow
Issue Type: Task
Reporter: Todd Farmer

ARROW-18364 enabled issue reporting for Apache Arrow in GitHub issues. Even though existing Jira issues have not yet been migrated and are still being worked in the Jira system, we should assess disabling creation of new issues in ASF Jira, and instead pointing users to GitHub issues. This may benefit the project by reducing the need to monitor inflow in two discrete systems.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-18377) MIGRATION: Automate component labels from issue form content
Todd Farmer created ARROW-18377:

Summary: MIGRATION: Automate component labels from issue form content
Key: ARROW-18377
URL: https://issues.apache.org/jira/browse/ARROW-18377
Project: Apache Arrow
Issue Type: Task
Reporter: Todd Farmer

ARROW-18364 added the ability to report issues in GitHub, and includes GitHub issue templates with a drop-down component(s) selector. These form elements drive the resulting issue markdown only, and cannot dynamically drive issue labels. Applying labels requires GitHub Actions, which also have a few limitations. First, the issue form does not produce any structured data; it only produces the issue description markdown, so a parser is required. Second, ASF restricts GitHub Actions to a selection of approved actions. Although community actions exist to generate structured data from issue forms, the Apache Arrow project will likely need to write its own parser and label application action.
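A minimal sketch of the parsing step described above, in Python. It assumes that GitHub renders each issue-form field as a `### <label>` heading followed by the submitted value, that an empty field is rendered as `_No response_`, and that multiple selections are comma-separated. The field label `Component(s)` and the function names are hypothetical; the `Component: ` label prefix follows ARROW-18376. This is an illustration only, not the action the project will ship.

```python
import re

# Assumption: each issue-form field is rendered as a "### <label>" heading
# followed by the submitted value; "_No response_" marks an empty field.
HEADING = re.compile(r"^###[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def parse_issue_form(body: str) -> dict[str, str]:
    """Split a rendered issue body into {field label: raw value}."""
    # With one capturing group, split() yields
    # [preamble, label1, value1, label2, value2, ...].
    parts = HEADING.split(body)
    fields = {}
    for label, value in zip(parts[1::2], parts[2::2]):
        value = value.strip()
        fields[label] = "" if value == "_No response_" else value
    return fields


def component_labels(body: str, field: str = "Component(s)") -> list[str]:
    """Map the selected components to "Component: <name>" labels."""
    raw = parse_issue_form(body).get(field, "")
    # Assumption: multiple selections are joined with commas.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return [f"Component: {name}" for name in names]


if __name__ == "__main__":
    sample = (
        "### Describe the bug\n\nCrash on read.\n\n"
        "### Component(s)\n\nC++, Python\n"
    )
    print(component_labels(sample))
```

A workflow step could pass the issue body to `component_labels` and apply the returned names through the GitHub API; that part is omitted here because it depends on which actions ASF approves.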
[jira] [Created] (ARROW-18376) MIGRATION: Add component labels to GitHub
Todd Farmer created ARROW-18376:

Summary: MIGRATION: Add component labels to GitHub
Key: ARROW-18376
URL: https://issues.apache.org/jira/browse/ARROW-18376
Project: Apache Arrow
Issue Type: Task
Reporter: Todd Farmer

Similar to ARROW-18375, component labels have been established based on existing component values defined in ASF Jira. The following labels are needed:

* Component: Archery
* Component: Benchmarking
* Component: C
* Component: C#
* Component: C++
* Component: C++ - Gandiva
* Component: C++ - Plasma
* Component: Continuous Integration
* Component: Dart
* Component: Developer Tools
* Component: Documentation
* Component: FlightRPC
* Component: Format
* Component: GLib
* Component: Go
* Component: GPU
* Component: Integration
* Component: Java
* Component: JavaScript
* Component: MATLAB
* Component: Packaging
* Component: Parquet
* Component: Python
* Component: R
* Component: Ruby
* Component: Swift
* Component: Website
* Component: Other
[jira] [Created] (ARROW-18375) MIGRATION: Enable GitHub issue type labels
Todd Farmer created ARROW-18375:

Summary: MIGRATION: Enable GitHub issue type labels
Key: ARROW-18375
URL: https://issues.apache.org/jira/browse/ARROW-18375
Project: Apache Arrow
Issue Type: Task
Reporter: Todd Farmer

As part of enabling GitHub issue reporting, the following labels have been defined and need to be added to the repository label options. Without these labels added, [new issues|https://github.com/apache/arrow/issues/14692] do not get the issue template-defined issue type labels set properly.

Labels:

* Type: bug
* Type: enhancement
* Type: usage
[jira] [Created] (ARROW-18374) [Go][CI][Benchmarks] Fix Go Bench Script after conbench change
Matthew Topol created ARROW-18374:

Summary: [Go][CI][Benchmarks] Fix Go Bench Script after conbench change
Key: ARROW-18374
URL: https://issues.apache.org/jira/browse/ARROW-18374
Project: Apache Arrow
Issue Type: Bug
Components: Benchmarking, Continuous Integration, Go
Reporter: Matthew Topol
Assignee: Matthew Topol

Change [https://github.com/conbench/conbench/pull/417/files#] now requires passing an explicit {{github=None}} argument to {{BenchmarkResult}} so that it gets the GitHub info from the locally cloned repo.
[jira] [Created] (ARROW-18373) MIGRATION: Enable multiple component selection in issue templates
Todd Farmer created ARROW-18373:

Summary: MIGRATION: Enable multiple component selection in issue templates
Key: ARROW-18373
URL: https://issues.apache.org/jira/browse/ARROW-18373
Project: Apache Arrow
Issue Type: Task
Reporter: Todd Farmer

Per comments in [this merged PR|https://github.com/apache/arrow/pull/14675], we would like to enable selection of multiple components when reporting issues via GitHub issues.

Additionally, we may want to add the needed Apache license to the issue templates and remove the exclusion rules from rat_exclude_files.txt.
[jira] [Created] (ARROW-18372) [R] "Error in `collect()`: ! Invalid: negative malloc size" after large computation returning one cell
Lucas Mation created ARROW-18372:

Summary: [R] "Error in `collect()`: ! Invalid: negative malloc size" after large computation returning one cell
Key: ARROW-18372
URL: https://issues.apache.org/jira/browse/ARROW-18372
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 10.0.0
Reporter: Lucas Mation

I have a large Parquet dataset (900 million rows, 40 columns), subdivided into folders for each year. I was trying to calculate how many unique combinations of id1+id2+id3+id4 there are in the dataset.

Notice that the "collected" dataset is supposed to be only one row and one cell, containing the count. (I've confirmed this by subsetting the dataset with "%>% head(10^6)" before computing the count, and it works.) That is why the error below is so weird.

```
fa <- 'myparteq folder' #huge
va <- open_dataset(fa)
tic()
d <- va %>% head(10^6) %>% count(id1,id2,id3,id4) %>% count %>% collect
toc()

Error in `collect()`:
! Invalid: negative malloc size
Run `rlang::last_error()` to see where the error occurred.

> rlang::last_error()
Error in `collect()`:
! Invalid: negative malloc size
---
Backtrace:
 1. ... %>% collect
 3. arrow:::collect.arrow_dplyr_query(.)
Run `rlang::last_trace()` to see the full context.

> rlang::last_trace()
Error in `collect()`:
! Invalid: negative malloc size
---
Backtrace:
    x
 1. +-... %>% collect
 2. +-dplyr::collect(.)
 3. \-arrow:::collect.arrow_dplyr_query(.)
 4.   \-base::tryCatch(...)
 5.     \-base (local) tryCatchList(expr, classes, parentenv, handlers)
 6.       \-base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 7.         \-value[[3L]](cond)
 8.           \-arrow:::augment_io_error_msg(e, call, schema = x$.data$schema)
 9.             \-rlang::abort(msg, call = call)
```

I am running this on a Windows server with 512 GB of RAM.

sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 R2 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252 LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] arrow_10.0.0 data.table_1.14.4 forcats_0.5.2 dplyr_1.0.10 purrr_0.3.5 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8
 [9] ggplot2_3.3.6 tidyverse_1.3.2 gt_0.7.0 xtable_1.8-4 ggthemes_4.2.4 collapse_1.8.6 pryr_0.1.5 janitor_2.1.0
[17] tictoc_1.1 lubridate_1.8.0 stringr_1.4.1 readxl_1.4.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9 assertthat_0.2.1 digest_0.6.30 utf8_1.2.2 R6_2.5.1 cellranger_1.1.0 backports_1.4.1
 [8] reprex_2.0.2 httr_1.4.4 pillar_1.8.1 rlang_1.0.6 googlesheets4_1.0.1 rstudioapi_0.14 googledrive_2.0.0
[15] bit_4.0.4 munsell_0.5.0 broom_1.0.1 compiler_4.2.1 modelr_0.1.9 pkgconfig_2.0.3 htmltools_0.5.3
[22] tidyselect_1.2.0 codetools_0.2-18 fansi_1.0.3 crayon_1.5.2 tzdb_0.3.0 dbplyr_2.2.1 withr_2.5.0
[29] grid_4.2.1 jsonlite_1.8.3 gtable_0.3.1 lifecycle_1.0.3 DBI_1.1.3 magrittr_2.0.3 scales_1.2.1
[36] cli_3.4.1 stringi_1.7.8 fs_1.5.2 snakecase_0.11.0 xml2_1.3.3 ellipsis_0.3.2 generics_0.1.3
[43] vctrs_0.5.0 tools_4.2.1 bit64_4.0.5 glue_1.6.2 hms_1.1.2 parallel_4.2.1 fastmap_1.1.0
[50] colorspace_2.0-3 gargle_1.2.1 rvest_1.0.3 haven_2.5.1

arrow_info()
Arrow package version: 10.0.0

Capabilities:
dataset TRUE
substrait FALSE
parquet TRUE
json TRUE
s3 TRUE
gcs TRUE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc FALSE
mimalloc TRUE

Arrow options():
arrow.use_threads FALSE

Memory:
Allocator mimalloc
Current 74.82 Gb
Max 97.75 Gb

Runtime:
SIMD Level avx2
Detected SIMD Level avx2

Build:
C++ Library Version 10.0.0
C++ Compiler GNU
C++ Compiler Version 10.3.0
Git ID aa7118b6e5f49b354fa8a93d9cf363c9ebe9a3f0
[jira] [Created] (ARROW-18371) [C++] Expose *FromJSON helpers
Rok Mihevc created ARROW-18371:

Summary: [C++] Expose *FromJSON helpers
Key: ARROW-18371
URL: https://issues.apache.org/jira/browse/ARROW-18371
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Rok Mihevc

The ArrayFromJSON, ExecBatchFromJSON, and RecordBatchFromJSON helper functions would be useful when testing in projects that use Arrow. BatchesWithSchema and MakeBasicBatches could be considered as well.