This is an automated email from the ASF dual-hosted git repository.
wjones127 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 8732b27858 GH-34546: [C++] Support casting from large string to string scalar (#34549)
8732b27858 is described below
commit 8732b2785804d3c75c2e7c551ebec5e787214b47
Author: Will Jones <[email protected]>
AuthorDate: Tue Mar 14 12:29:21 2023 -0700
GH-34546: [C++] Support casting from large string to string scalar (#34549)
### Rationale for this change
We rely on scalar casting to create partition values. Some systems, such as
Polars, always use LargeString arrays instead of String, so we need to make
sure we can handle those partition values correctly.
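To illustrate why every partition value must be convertible to a plain string, here is a minimal sketch of Hive-style path formatting. It is not Arrow's implementation: `FormatHivePath` is a hypothetical helper, the values are assumed to have already been cast to strings, and null handling and any escaping of values are left out.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): joins already-stringified partition
// values into a Hive-style path such as "x=hello/y=1".
// Assumption: values need no escaping and are never null.
std::string FormatHivePath(
    const std::vector<std::pair<std::string, std::string>>& key_values) {
  std::string path;
  for (const auto& [key, value] : key_values) {
    if (!path.empty()) path += '/';  // segments are separated by '/'
    path += key;
    path += '=';
    path += value;
  }
  return path;
}

// Small self-check mirroring the "x=hello" expectation in the added test.
inline void CheckFormatHivePath() {
  assert(FormatHivePath({{"x", "hello"}}) == "x=hello");
}
```

The formatter only ever sees strings, so a value held as a large string has to be cast to a regular string first. That cast is the step this change adds.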
### What changes are included in this PR?
Adds casting functions from `LargeStringScalar` and `LargeBinaryScalar` to
`StringScalar`.
Adds a test that Hive partition values are formatted correctly for large string fields.
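The constrained overload can be sketched in isolation as follows. The scalar structs and `Status` below are simplified stand-ins written for this example (the real Arrow scalars hold a `std::shared_ptr<Buffer>`, not a `std::string`), so treat it as an illustration of the `enable_if` technique rather than the actual code in `scalar.cc`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in for arrow::Status (assumption: only the OK state is modelled).
struct Status {
  static Status OK() { return Status{}; }
  bool ok() const { return true; }
};

// Simplified stand-ins for the Arrow scalar hierarchy (assumption).
struct BaseBinaryScalar { std::string value; };
struct BinaryScalar : BaseBinaryScalar {};
struct StringScalar : BinaryScalar {};
struct LargeBinaryScalar : BaseBinaryScalar {};
struct LargeStringScalar : LargeBinaryScalar {};

// One template covers binary, large binary and large string sources.
// StringScalar itself is excluded so that a same-type cast does not
// select this overload.
template <typename ScalarType>
std::enable_if_t<std::is_base_of_v<BaseBinaryScalar, ScalarType> &&
                     !std::is_same_v<ScalarType, StringScalar>,
                 Status>
CastImpl(const ScalarType& from, StringScalar* to) {
  to->value = from.value;  // the payload is copied as-is; no validation here
  return Status::OK();
}

// Small self-check: a large string scalar casts to a string scalar.
inline void CheckLargeStringCast() {
  LargeStringScalar from;
  from.value = "hello";
  StringScalar to;
  assert(CastImpl(from, &to).ok());
  assert(to.value == "hello");
}
```

Using a single constrained template instead of one overload per source type keeps the three conversions in one place, at the cost of a slightly denser signature.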
### Are these changes tested?
The casting functions are implicitly tested in other places (like the test
added). There are no existing tests for `CastTo` for binary/string scalars.
### Are there any user-facing changes?
Fixes a bug.
I guess it is a breaking change, given that Ruby was testing that the
serialized value of a LargeBinaryScalar was `"[\n{value}\n]"`. But I don't
think that was actually something anyone really wanted, was it?
* Closes: #34546
Authored-by: Will Jones <[email protected]>
Signed-off-by: Will Jones <[email protected]>
---
c_glib/test/test-large-binary-scalar.rb | 6 +-----
c_glib/test/test-large-string-scalar.rb | 6 +-----
cpp/src/arrow/dataset/partition_test.cc | 3 +++
cpp/src/arrow/scalar.cc | 8 ++++++--
4 files changed, 11 insertions(+), 12 deletions(-)
diff --git a/c_glib/test/test-large-binary-scalar.rb b/c_glib/test/test-large-binary-scalar.rb
index d716e13f3e..820a9b451d 100644
--- a/c_glib/test/test-large-binary-scalar.rb
+++ b/c_glib/test/test-large-binary-scalar.rb
@@ -38,11 +38,7 @@ class TestLargeBinaryScalar < Test::Unit::TestCase
end
def test_to_s
- assert_equal(<<-BINARY.strip, @scalar.to_s)
-[
- 030102
-]
- BINARY
+ assert_equal("\x03\x01\x02", @scalar.to_s)
end
def test_value
diff --git a/c_glib/test/test-large-string-scalar.rb b/c_glib/test/test-large-string-scalar.rb
index 42e24a601b..826b95000a 100644
--- a/c_glib/test/test-large-string-scalar.rb
+++ b/c_glib/test/test-large-string-scalar.rb
@@ -38,11 +38,7 @@ class TestLargeStringScalar < Test::Unit::TestCase
end
def test_to_s
- assert_equal(<<-STRING.strip, @scalar.to_s)
-[
- "Hello"
-]
- STRING
+ assert_equal("Hello", @scalar.to_s)
end
def test_value
diff --git a/cpp/src/arrow/dataset/partition_test.cc b/cpp/src/arrow/dataset/partition_test.cc
index 3e681a9cb7..38dbb2d750 100644
--- a/cpp/src/arrow/dataset/partition_test.cc
+++ b/cpp/src/arrow/dataset/partition_test.cc
@@ -529,6 +529,9 @@ TEST_F(TestPartitioning, HivePartitioningFormat) {
AssertFormatError<StatusCode::TypeError>(
and_(equal(field_ref("alpha"), literal("0.0")),
equal(field_ref("beta"), literal("hello"))));
+
+ partitioning_ = std::make_shared<HivePartitioning>(schema({field("x", large_utf8())}));
+ AssertFormat(equal(field_ref("x"), literal("hello")), "x=hello");
}
TEST_F(TestPartitioning, FilenamePartitioningFormat) {
diff --git a/cpp/src/arrow/scalar.cc b/cpp/src/arrow/scalar.cc
index 4e0a3c13b6..1f467ad93b 100644
--- a/cpp/src/arrow/scalar.cc
+++ b/cpp/src/arrow/scalar.cc
@@ -1049,8 +1049,12 @@ Status CastImpl(const StringScalar& from, ScalarType* to) {
return Status::OK();
}
-// binary to string
-Status CastImpl(const BinaryScalar& from, StringScalar* to) {
+// binary/large binary/large string to string
+template <typename ScalarType>
+enable_if_t<std::is_base_of_v<BaseBinaryScalar, ScalarType> &&
+ !std::is_same<ScalarType, StringScalar>::value,
+ Status>
+CastImpl(const ScalarType& from, StringScalar* to) {
to->value = from.value;
return Status::OK();
}