[jira] [Created] (ARROW-15500) parquet link undefined reference to base64_encode(), unpack32(), etc
Ryan Seghers created ARROW-15500:
------------------------------------

             Summary: parquet link undefined reference to base64_encode(), unpack32(), etc
                 Key: ARROW-15500
                 URL: https://issues.apache.org/jira/browse/ARROW-15500
             Project: Apache Arrow
          Issue Type: Bug
          Components: Parquet
    Affects Versions: 6.0.1
         Environment: ubuntu 20.04
vcpkg master latest as of 2022-01-08, and tag 2022.01.01
gcc 9 and 10 latest
cmake 3.22.1
            Reporter: Ryan Seghers


I'm trying to build on ubuntu 20.04, using vcpkg master latest, both gcc-9 and gcc-10 latest, and cmake 3.22.1. I can build and link Arrow in a small test program and write a csv, but when I try to build with parquet I get several link-time errors.

Here is the first full linker error:

/usr/bin/ld: /home/ryans/src/vcpkg/installed/x64-linux/lib/libparquet.a(writer.cc.o): in function `parquet::arrow::GetSchemaMetadata(arrow::Schema const&, arrow::MemoryPool*, parquet::ArrowWriterProperties const&, std::shared_ptr*) [clone .localalias]': writer.cc:(.text+0x179): undefined reference to `arrow::util::base64_encode[abi:cxx11](nonstd::sv_lite::basic_string_view >)'

Here are snippets from the rest:

undefined reference to `arrow::internal::unpack32(unsigned int const*, unsigned int*, int, int)'
undefined reference to `arrow::internal::unpack64(unsigned char const*, unsigned long*, int, int)'
undefined reference to `arrow::io::BufferedInputStream::Create(long, arrow::MemoryPool*, std::shared_ptr, long)'
undefined reference to `arrow::util::base64_decode[abi:cxx11](nonstd::sv_lite::basic_string_view >)'

I have also tried vcpkg tag 2022.01.01 (I think that is Arrow 6.0.0) and it showed the same set of undefined symbols.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
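Undefined references from libparquet.a into libarrow.a are usually a static-library link-order problem: the linker resolves symbols left to right, so the library that depends on the other must come first. A minimal sketch of a CMakeLists.txt for a test program like the one described, assuming the vcpkg toolchain file is in use and using the `parquet_static`/`arrow_static` targets Arrow's CMake packages export (project and target names here are placeholders):

```cmake
cmake_minimum_required(VERSION 3.22)
project(arrow_parquet_test CXX)

# Arrow and Parquet each install a CMake package config via vcpkg.
find_package(Arrow CONFIG REQUIRED)
find_package(Parquet CONFIG REQUIRED)

add_executable(main main.cpp)

# parquet_static must precede arrow_static: parquet references Arrow
# symbols (base64_encode, unpack32, ...), not the other way around.
target_link_libraries(main PRIVATE parquet_static arrow_static)
```

If the link line is assembled by hand instead of through the exported targets, the same rule applies: `-lparquet -larrow`, not `-larrow -lparquet`.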
[jira] [Created] (ARROW-15499) [Python] Fix import error in pyarrow._orc
Krisztian Szucs created ARROW-15499:
------------------------------------

             Summary: [Python] Fix import error in pyarrow._orc
                 Key: ARROW-15499
                 URL: https://issues.apache.org/jira/browse/ARROW-15499
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Krisztian Szucs
            Assignee: Krisztian Szucs
[GitHub] [arrow-testing] westonpace opened a new pull request #74: ARROW-15425: [Integration] Add delta dictionaries in file format to integration tests
westonpace opened a new pull request #74:
URL: https://github.com/apache/arrow-testing/pull/74

This adds an example IPC file containing a delta dictionary for both the file and the streaming IPC format. It requires a small change to the integration programs (https://github.com/apache/arrow/pull/12291) to work correctly.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Created] (ARROW-15498) Implement Bloom filter pushdown between hash joins
Sasha Krassovsky created ARROW-15498:
------------------------------------

             Summary: Implement Bloom filter pushdown between hash joins
                 Key: ARROW-15498
                 URL: https://issues.apache.org/jira/browse/ARROW-15498
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Sasha Krassovsky
            Assignee: Sasha Krassovsky


When there is a chain of hash joins, it's often worthwhile to create Bloom filters and push them to the earliest possible point in the chain of joins to minimize the number of materialized rows.
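To illustrate the idea (not Arrow's actual implementation), here is a minimal pure-Python sketch: a Bloom filter is built from the build-side keys of a later join and applied at the probe input of an earlier operator, so rows that cannot survive the final join are dropped before they are materialized. The filter may pass occasional false positives, which the real hash join then discards; it never produces false negatives.

```python
import hashlib


class BloomFilter:
    """A toy Bloom filter: k hash positions derived from one SHA-256 digest."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode()).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# Build side of the *last* join in the chain.
build_keys = [2, 4, 6]
bf = BloomFilter()
for k in build_keys:
    bf.add(k)

# Pushed down to the probe input feeding the *first* join: most rows
# whose key is absent from the final build side are dropped early.
probe_rows = [(k, f"row{k}") for k in range(10)]
prefiltered = [row for row in probe_rows if bf.might_contain(row[0])]
```

The payoff grows with the length of the join chain: every intermediate join between the filter's origin and the pushdown point avoids materializing the dropped rows.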
[jira] [Created] (ARROW-15497) [C++][Homebrew] Use Clang Tools 12
Kouhei Sutou created ARROW-15497:
------------------------------------

             Summary: [C++][Homebrew] Use Clang Tools 12
                 Key: ARROW-15497
                 URL: https://issues.apache.org/jira/browse/ARROW-15497
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Developer Tools
            Reporter: Kouhei Sutou
            Assignee: Kouhei Sutou
[jira] [Created] (ARROW-15496) [Python] Log warning when user tries to write parquet table with incompatible type
Grant Williams created ARROW-15496:
------------------------------------

             Summary: [Python] Log warning when user tries to write parquet table with incompatible type
                 Key: ARROW-15496
                 URL: https://issues.apache.org/jira/browse/ARROW-15496
             Project: Apache Arrow
          Issue Type: Wish
          Components: Parquet, Python
            Reporter: Grant Williams


Could we get a logged warning when a user tries to `pyarrow.parquet.write_table()` with `version=1.0` and a schema that contains an incompatible `uint32()` type? I don't think the behavior to upcast to an `int64()` is immediately obvious (although the docs are clear on it) and I think it would help prevent some confusion for other users.
[jira] [Created] (ARROW-15495) [C++][FlightRPC] Ensure system gRPC is only used with system Protobuf
David Li created ARROW-15495:
------------------------------------

             Summary: [C++][FlightRPC] Ensure system gRPC is only used with system Protobuf
                 Key: ARROW-15495
                 URL: https://issues.apache.org/jira/browse/ARROW-15495
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, FlightRPC
            Reporter: David Li


See Kou's post on the ML: https://lists.apache.org/thread/dg2nm7r9vpo42toygg8o8rzf8gkg6knb

We should ensure system gRPC doesn't get mixed with bundled Protobuf, which can cause test failures. (This is not really a valid combination anyway: it would likely link in two copies of Protobuf.)
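As a hedged configuration sketch: Arrow's C++ build selects each third-party dependency's source via `<dependency>_SOURCE` options (`AUTO` | `BUNDLED` | `SYSTEM`), so pinning both to the same source is one way to avoid the mixed combination described above (exact option spellings should be checked against the build docs for the version in use):

```shell
# Force gRPC and Protobuf to come from the same place (the system),
# rather than letting one fall back to the bundled copy.
cmake -S cpp -B build \
  -DARROW_FLIGHT=ON \
  -DgRPC_SOURCE=SYSTEM \
  -DProtobuf_SOURCE=SYSTEM
```

The issue proposes enforcing this pairing in the build system itself, so a system gRPC with a bundled Protobuf is rejected instead of silently linking two Protobuf copies.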
[jira] [Created] (ARROW-15494) [Docs] Clarify {{existing_data_behavior}} docstring
Martin Thøgersen created ARROW-15494:
------------------------------------

             Summary: [Docs] Clarify {{existing_data_behavior}} docstring
                 Key: ARROW-15494
                 URL: https://issues.apache.org/jira/browse/ARROW-15494
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 7.0.1
            Reporter: Martin Thøgersen


Clarify the wording slightly of the {{pyarrow.dataset.write_dataset()}} parameter {{existing_data_behavior}}:

https://github.com/apache/arrow/blob/a27c55660e575a3987283d5d9e443642db48f215/python/pyarrow/dataset.py#L812-L827

Proposed wording:

{noformat}
existing_data_behavior : 'error' | 'overwrite_or_ignore' | \
'delete_matching'
    Controls how the dataset will handle data that already exists in
    the destination. The default behavior ('error') is to raise an
    error if any data exists in the `base_dir` destination.
    'overwrite_or_ignore' will ignore any existing data and will
    overwrite files with the same name as an output file. Other
    existing files will be ignored. This behavior, in combination
    with a unique basename_template for each write, will allow for
    an append workflow.
    'delete_matching' is useful when you are writing a partitioned
    dataset. The first time each partition leaf-level directory is
    encountered the entire leaf-level directory will be deleted.
    This allows you to overwrite old partitions completely.
{noformat}

I.e. clarify that:
- {{error}} applies to the base_dir level.
- {{delete_matching}} applies to the leaf-level directory.
[jira] [Created] (ARROW-15493) [C++][Gandiva] Uninitialized data member causes random gandiva-filter-test failures
Yibo Cai created ARROW-15493:
------------------------------------

             Summary: [C++][Gandiva] Uninitialized data member causes random gandiva-filter-test failures
                 Key: ARROW-15493
                 URL: https://issues.apache.org/jira/browse/ARROW-15493
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++ - Gandiva
            Reporter: Yibo Cai
            Assignee: Yibo Cai


gandiva-filter-test {{TestFilter.TestFilterCache}} fails on Arm, though the bug is not architecture dependent. The class member *mode_* is not initialized in one of the ExpressionCacheKey constructors [1], but it is used to compare equality of two instances [2]. This causes flaky gandiva-filter-test failures.

[1] https://github.com/apache/arrow/blob/master/cpp/src/gandiva/expression_cache_key.h#L55
[2] https://github.com/apache/arrow/blob/master/cpp/src/gandiva/expression_cache_key.h#L92