[jira] [Created] (ARROW-7553) Unable to free all memory allocated when creating a GArrowSchema
Mike Pontillo created ARROW-7553:

Summary: Unable to free all memory allocated when creating a GArrowSchema
Key: ARROW-7553
URL: https://issues.apache.org/jira/browse/ARROW-7553
Project: Apache Arrow
Issue Type: Bug
Components: C, GLib
Affects Versions: 0.15.1
Environment: Ubuntu Focal with Linuxbrew packages installed
Reporter: Mike Pontillo

I was not able to run the test code in [this gist|https://gist.github.com/mpontillo/cfbe7ebbf0b0f2acf31063512439bde7] (and corresponding [GitHub issue|https://github.com/apache/arrow/issues/6164]) without encountering a memory leak via {{garrow_schema_get_type_once}} (according to {{valgrind}}):

{code:java}
24 bytes in 1 blocks are possibly lost in loss record 192 of 486
   at 0x4A391AF: realloc (vg_replace_malloc.c:836)
   by 0x53B5D97: g_realloc (gmem.c:164)
   by 0x5128108: type_node_any_new_W (gtype.c:502)
   by 0x512D9EC: g_type_register_static (gtype.c:2766)
   by 0x512DDB4: g_type_register_static_simple (gtype.c:2719)
   by 0x4CB3CF8: garrow_schema_get_type_once (in /home/linuxbrew/.linuxbrew/Cellar/apache-arrow-glib/0.15.1_1/lib/libarrow-glib.so.15.0.1)
   by 0x4CB3E40: garrow_schema_get_type (in /home/linuxbrew/.linuxbrew/Cellar/apache-arrow-glib/0.15.1_1/lib/libarrow-glib.so.15.0.1)
   by 0x4CB4098: garrow_schema_new_raw(std::shared_ptr*) (in /home/linuxbrew/.linuxbrew/Cellar/apache-arrow-glib/0.15.1_1/lib/libarrow-glib.so.15.0.1)
   by 0x4CB5860: garrow_schema_new (in /home/linuxbrew/.linuxbrew/Cellar/apache-arrow-glib/0.15.1_1/lib/libarrow-glib.so.15.0.1)
   by 0x401219: main (main.c:13)
{code}

I'm not positive if this is an actual bug in {{apache-arrow-glib}} or if my test code is incorrect, but I thought I would file an issue here (in addition to my question posted as a GitHub issue) for greater visibility.

It looks like the {{realloc}} call actually happens via {{glib}}; is this a one-time type registration that isn't expected to be cleaned up until the process exits? (I couldn't find a reference to {{garrow_schema_get_type_once}} in the {{arrow}} code.) I wanted to be certain, since I expect to use {{arrow-glib}} within a long-running process.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7552) [C++] TestSlowInputStream is flaky
Neal Richardson created ARROW-7552:

Summary: [C++] TestSlowInputStream is flaky
Key: ARROW-7552
URL: https://issues.apache.org/jira/browse/ARROW-7552
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Neal Richardson
Fix For: 0.16.0

See https://github.com/apache/arrow/pull/6160/checks?check_run_id=384146741#step:5:1556 for example

{code}
[ RUN ] TestSlowInputStream.Basics
/arrow/cpp/src/arrow/io/memory_test.cc:308: Failure
Expected: (dt) < (latency * 3), actual: 4.96068 vs 1.8
[ FAILED ] TestSlowInputStream.Basics (4961 ms)
[--] 1 test from TestSlowInputStream (4961 ms total)
{code}

Tests that rely on timing are pretty tough to do on public CI. We should consider moving this somewhere that doesn't run on CI.
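One alternative to relocating a timing-based test is to avoid measuring wall-clock time at all. The following is a minimal Python sketch of that idea, not the Arrow C++ implementation: {{SlowReader}} and {{RecordingSleep}} are hypothetical names, and the 0.6 s latency is only inferred from the log above (1.8 = latency * 3). The sleep function is injected, so the test asserts on the delay that was requested rather than on how long a loaded CI machine actually took.

```python
import io


class SlowReader:
    """Hypothetical stand-in for a stream wrapper that adds latency to reads."""

    def __init__(self, raw, latency_s, sleep):
        self._raw = raw
        self._latency_s = latency_s
        # The sleep callable is injected so tests can avoid real waiting.
        self._sleep = sleep

    def read(self, nbytes):
        # Delay first, then delegate to the wrapped stream.
        self._sleep(self._latency_s)
        return self._raw.read(nbytes)


class RecordingSleep:
    """Test double that records requested delays instead of sleeping."""

    def __init__(self):
        self.calls = []

    def __call__(self, seconds):
        self.calls.append(seconds)


def check_slow_reader():
    sleep = RecordingSleep()
    reader = SlowReader(io.BytesIO(b"abcdefgh"), latency_s=0.6, sleep=sleep)
    assert reader.read(4) == b"abcd"
    assert reader.read(4) == b"efgh"
    # Deterministic: one requested delay per read, independent of machine load.
    assert sleep.calls == [0.6, 0.6]


check_slow_reader()
```

In non-test code the same class would be constructed with {{sleep=time.sleep}}; the trade-off is that this checks the wrapper's logic but not real elapsed time.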
[jira] [Created] (ARROW-7551) [C++][Flight] Flight test on macOS periodically fails on master
Neal Richardson created ARROW-7551:

Summary: [C++][Flight] Flight test on macOS periodically fails on master
Key: ARROW-7551
URL: https://issues.apache.org/jira/browse/ARROW-7551
Project: Apache Arrow
Issue Type: Bug
Components: C++, FlightRPC
Reporter: Neal Richardson
Fix For: 0.16.0

See [https://github.com/apache/arrow/runs/380443548#step:5:179] for example.

{code}
64/96 Test #64: arrow-flight-test .***Failed 0.46 sec
Running arrow-flight-test, redirecting output into /Users/runner/runners/2.163.1/work/arrow/arrow/build/cpp/build/test-logs/arrow-flight-test.txt (attempt 1/1)
Running main() from /Users/runner/runners/2.163.1/work/arrow/arrow/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
[==] Running 42 tests from 11 test cases.
[--] Global test environment set-up.
[--] 2 tests from TestFlightDescriptor
[ RUN ] TestFlightDescriptor.Basics
[ OK ] TestFlightDescriptor.Basics (0 ms)
[ RUN ] TestFlightDescriptor.ToFromProto
[ OK ] TestFlightDescriptor.ToFromProto (0 ms)
[--] 2 tests from TestFlightDescriptor (0 ms total)
[--] 6 tests from TestFlight
[ RUN ] TestFlight.UnknownLocationScheme
[ OK ] TestFlight.UnknownLocationScheme (0 ms)
[ RUN ] TestFlight.ConnectUri
Server running with pid 15977
/Users/runner/runners/2.163.1/work/arrow/arrow/cpp/build-support/run-test.sh: line 97: 15971 Segmentation fault: 11 $TEST_EXECUTABLE "$@" 2>&1
15972 Done | $ROOT/build-support/asan_symbolize.py
15973 Done | ${CXXFILT:-c++filt}
15974 Done | $ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
15975 Done | $pipe_cmd 2>&1
15976 Done | tee $LOGFILE
~/runners/2.163.1/work/arrow/arrow/build/cpp/src/arrow/flight
{code}

It's not failing every time but I'm seeing it fail frequently.
[jira] [Created] (ARROW-7550) [R][CI] Run donttest examples in CI
Neal Richardson created ARROW-7550:

Summary: [R][CI] Run donttest examples in CI
Key: ARROW-7550
URL: https://issues.apache.org/jira/browse/ARROW-7550
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 0.16.0

We wrap the examples in {{\donttest{}}} so that they aren't run on CRAN, where the Arrow C++ library isn't present and thus the examples would error. But locally and on our CI, where we *are* testing with the C++ library, we should run the examples to ensure that they're valid. (As ARROW-7543 revealed, they weren't.)
[jira] [Created] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized
Jacques Nadeau created ARROW-7549:

Summary: [Java] Reorganize Flight modules to keep top level clean/organized
Key: ARROW-7549
URL: https://issues.apache.org/jira/browse/ARROW-7549
Project: Apache Arrow
Issue Type: Task
Components: Java
Reporter: Jacques Nadeau

Let's create a flight parent module and then create the following below:

- flight-core (existing flight module)
- flight-grpc (existing flight-grpc module)
[jira] [Created] (ARROW-7548) [Ruby] Can't install Ruby 2.7 - Can't update Ruby 2.6 Windows
Dominic Sisneros created ARROW-7548:

Summary: [Ruby] Can't install Ruby 2.7 - Can't update Ruby 2.6 Windows
Key: ARROW-7548
URL: https://issues.apache.org/jira/browse/ARROW-7548
Project: Apache Arrow
Issue Type: Bug
Components: Ruby
Reporter: Dominic Sisneros

Can't install in Ruby 2.7 on Windows

C:\Users\Dominic E Sisneros\source\repos\ruby\try_arrow>ruby -v
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x64-mingw32]

C:\Users\Dominic E Sisneros\source\repos\ruby\try_arrow>gem install red-arrow
Fetching glib2-3.4.1.gem
Fetching extpp-0.0.8.gem
Fetching native-package-installer-1.0.9.gem
Fetching gio2-3.4.1.gem
Fetching gobject-introspection-3.4.1.gem
Fetching red-arrow-0.15.1.gem
Building native extensions. This could take a while...
Successfully installed extpp-0.0.8
Successfully installed native-package-installer-1.0.9
Installing required msys2 packages: mingw-w64-x86_64-glib2
warning: mingw-w64-x86_64-glib2-2.62.4-1 is up to date -- skipping
Building native extensions. This could take a while...
ERROR: Error installing red-arrow:
ERROR: Failed to build gem native extension.

current directory: C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/glib2-3.4.1/ext/glib2
C:/Ruby27-x64/bin/ruby.exe -I C:/Ruby27-x64/lib/ruby/2.7.0 -r ./siteconf20200110-28684-mef9su.rb extconf.rb
checking for --enable-debug-build option... no
checking for -Wall option to compiler... yes
checking for -Waggregate-return option to compiler... yes
checking for -Wcast-align option to compiler... yes
checking for -Wextra option to compiler... yes
checking for -Wformat=2 option to compiler... yes
checking for -Winit-self option to compiler... yes
checking for -Wlarger-than-65500 option to compiler... yes
checking for -Wmissing-declarations option to compiler... yes
checking for -Wmissing-format-attribute option to compiler... yes
checking for -Wmissing-include-dirs option to compiler... yes
checking for -Wmissing-noreturn option to compiler... yes
checking for -Wmissing-prototypes option to compiler... yes
checking for -Wnested-externs option to compiler... no
checking for -Wold-style-definition option to compiler... yes
checking for -Wpacked option to compiler... yes
checking for -Wp,-D_FORTIFY_SOURCE=2 option to compiler... yes
checking for -Wpointer-arith option to compiler... yes
checking for -Wundef option to compiler... yes
checking for -Wout-of-line-declaration option to compiler... no
checking for -Wunsafe-loop-optimizations option to compiler... yes
checking for -Wwrite-strings option to compiler... yes
checking for Homebrew... no
checking for gobject-2.0 version (>= 2.12.0)... yes
checking for gthread-2.0... yes
checking for unistd.h... yes
checking for io.h... yes
checking for g_spawn_close_pid() in glib.h... yes
checking for g_thread_init() in glib.h... yes
checking for g_main_depth() in glib.h... yes
checking for g_listenv() in glib.h... yes
checking for rb_check_array_type() in ruby.h... yes
checking for rb_check_hash_type() in ruby.h... yes
checking for rb_exec_recursive() in ruby.h... yes
checking for rb_errinfo() in ruby.h... yes
checking for rb_thread_call_without_gvl() in ruby.h... yes
checking for ruby_native_thread_p() in ruby.h... yes
checking for rb_thread_call_with_gvl() in ruby.h... yes
checking for rb_gc_register_mark_object() in ruby.h... yes
checking for rb_exc_new_str() in ruby.h... yes
checking for rb_enc_str_new_static() in ruby.h... yes
checking for curr_thread in ruby.h,node.h... no
checking for rb_curr_thread in ruby.h,node.h... no
creating ruby-glib2.pc
creating glib-enum-types.c
creating glib-enum-types.h
creating Makefile

current directory: C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/glib2-3.4.1/ext/glib2
make "DESTDIR=" clean

current directory: C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/glib2-3.4.1/ext/glib2
make "DESTDIR="
compiling glib-enum-types.c
cc1.exe: warning: C:/Ruby26-x64/msys64/Ruby26-x64/msys64/mingw64/include: No such file or directory [-Wmissing-include-dirs]
cc1.exe: warning: C:/Ruby26-x64/msys64/Ruby26-x64/msys64/mingw64/lib/libffi-3.2.1/include: No such file or directory [-Wmissing-include-dirs]
cc1.exe: warning: C:/Ruby26-x64/msys64/Ruby26-x64/msys64/mingw64/include/glib-2.0: No such file or directory [-Wmissing-include-dirs]
cc1.exe: warning: C:/Ruby26-x64/msys64/Ruby26-x64/msys64/mingw64/lib/glib-2.0/include: No such file or directory [-Wmissing-include-dirs]
cc1.exe: warning: C:/Ruby26-x64/msys64/Ruby26-x64/msys64/mingw64/include: No such file or directory [-Wmissing-include-dirs]
cc1.exe: warning: C:/Ruby26-x64/msys64/Ruby26-x64/msys64/mingw64/include/glib-2.0: No such file or directory [-Wmissing-include-dirs]
cc1.exe: warning: C:/Ruby26-x64/msys64/Ruby26-x64/msys64/mingw64/lib/glib-2.0/include: No such file or directory [-Wmissing-include-dirs]
In file included from rbgprivate.h:31,
                 from glib-enum-types.c:4:
[NIGHTLY] Arrow Build Report for Job nightly-2020-01-10-0
Arrow Build Report for Job nightly-2020-01-10-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0

Failed Tasks:
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-travis-gandiva-jar-osx

Succeeded Tasks:
- centos-6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-centos-6
- centos-7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-centos-7
- centos-8:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-debian-buster
- debian-stretch:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-azure-debian-stretch
- gandiva-jar-trusty:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-travis-gandiva-jar-trusty
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-10-0-circle-test-conda-python-3.8-dask-master
- test-conda-python-3.8-pandas-latest:
  URL:
[jira] [Created] (ARROW-7547) [C++] [Python] [Dataset] Additional reader options in ParquetFileFormat
Joris Van den Bossche created ARROW-7547:

Summary: [C++] [Python] [Dataset] Additional reader options in ParquetFileFormat
Key: ARROW-7547
URL: https://issues.apache.org/jira/browse/ARROW-7547
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Dataset, Python
Reporter: Joris Van den Bossche

[looking into using the datasets machinery in the current python parquet code]

In the current Python API, we expose several options that influence reading the Parquet file (e.g. {{read_dictionary}}, which indicates that certain BYTE_ARRAY columns should be read directly into a dictionary type, or {{memory_map}} and {{buffer_size}}). Those could be added to {{ParquetFileFormat}}.
[jira] [Created] (ARROW-7546) [Java] Use new implementation to concat vectors values in batch
Ji Liu created ARROW-7546:

Summary: [Java] Use new implementation to concat vectors values in batch
Key: ARROW-7546
URL: https://issues.apache.org/jira/browse/ARROW-7546
Project: Apache Arrow
Issue Type: Bug
Components: Java
Reporter: Ji Liu
Assignee: Ji Liu

Per the discussion at https://github.com/apache/arrow/pull/5945#discussion_r365108806: in ARROW-7284, we wrote a simple method to concatenate vectors. However, ARROW-7073 is about concatenating vector values efficiently; once that PR is merged, we should use the new implementation in {{ArrowReader}}.
[jira] [Created] (ARROW-7545) [C++] Scanning dataset with dictionary type hangs
Joris Van den Bossche created ARROW-7545:

Summary: [C++] Scanning dataset with dictionary type hangs
Key: ARROW-7545
URL: https://issues.apache.org/jira/browse/ARROW-7545
Project: Apache Arrow
Issue Type: Bug
Components: C++ - Dataset
Reporter: Joris Van den Bossche

I assume this is an issue on the C++ side of the datasets code, but the reproducer is in Python.

I create a small Parquet file with a single column of dictionary type. Reading it with {{pq.read_table}} works fine, but reading it with the datasets machinery hangs when scanning:

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'a': pd.Categorical(['a', 'b']*10)})
arrow_table = pa.Table.from_pandas(df)

filename = "test.parquet"
pq.write_table(arrow_table, filename)

from pyarrow.fs import LocalFileSystem
from pyarrow.dataset import ParquetFileFormat, Dataset, FileSystemDataSourceDiscovery, FileSystemDiscoveryOptions

filesystem = LocalFileSystem()
format = ParquetFileFormat()
options = FileSystemDiscoveryOptions()

discovery = FileSystemDataSourceDiscovery(
    filesystem, [filename], format, options)
inspected_schema = discovery.inspect()
dataset = Dataset([discovery.finish()], inspected_schema)

# dataset.schema works fine and gives correct schema
dataset.schema

scanner_builder = dataset.new_scan()
scanner = scanner_builder.finish()
# this hangs
scanner.to_table()
{code}
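Because the last call in the reproducer never returns, it is awkward to run unattended. The following is a minimal sketch of one way to turn a hang into a visible failure, using only the Python standard library. {{call_with_timeout}} is a hypothetical helper, not part of pyarrow, and the approach assumes the blocking call releases the GIL while it waits, which has not been verified for this case.

```python
import threading


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread; raise TimeoutError if it does not finish.

    Hypothetical helper for illustration. The worker thread is not killed
    on timeout; it is a daemon thread, so it will not block interpreter exit.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # forward any failure to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

With the reproducer above, the final line could then be written as {{call_with_timeout(scanner.to_table, 30)}}, where the 30-second limit is an arbitrary choice.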
[jira] [Created] (ARROW-7544) pyarrow.lib.ArrowNotImplementedError: gRPC returned unimplemented error, with message: clear is not implemented
Ji Wong Park created ARROW-7544:

Summary: pyarrow.lib.ArrowNotImplementedError: gRPC returned unimplemented error, with message: clear is not implemented
Key: ARROW-7544
URL: https://issues.apache.org/jira/browse/ARROW-7544
Project: Apache Arrow
Issue Type: Bug
Reporter: Ji Wong Park

/arrow/python/examples/flight$ py client.py do 0.0.0.0:5005 clear
Running action clear
Traceback (most recent call last):
  File "client.py", line 162, in <module>
    main()
  File "client.py", line 158, in main
    commands[args.action](args, client)
  File "client.py", line 69, in do_action
    for result in client.do_action(action):
  File "pyarrow/_flight.pyx", line 1068, in do_action
  File "pyarrow/_flight.pyx", line 75, in pyarrow._flight.check_flight_status
  File "pyarrow/error.pxi", line 105, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: gRPC returned unimplemented error, with message: clear is not implemented.. Detail: Python exception: NotImplementedError
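The "Detail: Python exception: NotImplementedError" part of the message suggests that the server side raised Python's {{NotImplementedError}} for the {{clear}} action, which the client then saw as a gRPC unimplemented error. The following is a minimal sketch of how a server might dispatch named actions so that {{clear}} is handled. It is plain Python with hypothetical names ({{ActionServer}}, {{_tables}}) and does not use or reproduce the pyarrow Flight API.

```python
class ActionServer:
    """Hypothetical action dispatcher; not the pyarrow Flight server class."""

    def __init__(self):
        # Stand-in for whatever state the server holds.
        self._tables = {"example": [1, 2, 3]}
        # Map each supported action name to its handler.
        self._handlers = {
            "clear": self._clear,
            "healthcheck": self._healthcheck,
        }

    def do_action(self, action_type):
        handler = self._handlers.get(action_type)
        if handler is None:
            # Unknown actions fail in the same style as the reported message.
            raise NotImplementedError(f"{action_type} is not implemented.")
        return handler()

    def _clear(self):
        self._tables.clear()
        return "cleared"

    def _healthcheck(self):
        return "ok"
```

Keeping the handlers in a dictionary means adding an action is a one-line change, and every unsupported action produces the same kind of error.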