This is an automated email from the ASF dual-hosted git repository.

kevingurney pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new f5e40dc45b GH-37978: [C++] Add support for specifying custom Array 
element delimiter to `arrow::PrettyPrintOptions` (#37981)
f5e40dc45b is described below

commit f5e40dc45ba3465dfaf1c32af020b63a294b6982
Author: Kevin Gurney <[email protected]>
AuthorDate: Thu Oct 5 12:25:56 2023 -0400

    GH-37978: [C++] Add support for specifying custom Array element delimiter 
to `arrow::PrettyPrintOptions` (#37981)
    
    ### Rationale for this change
    
    In order to make the 
[`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101)
 functionality for `arrow::Array`/`arrow::ChunkedArray` more flexible, it would 
be useful to be able to specify a custom element delimiter other than `","`.
    
    For example, the MATLAB interface wraps the Arrow C++ libraries, and being 
able to specify a custom `Array` element delimiter would make it possible to 
make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.
    
    For the MATLAB interface, we would like to enable display that looks 
something like the following (note the ` | ` between individual `Array` 
elements):
    
    ```matlab
    % Make a MATLAB array.
    >> A = 1:5
    
    A =
    
         1     2     3     4     5
    
    % Make an Arrow array from the MATLAB array.
    >> B = arrow.array(A)
    
    B =
    
        [ 1 | 2 | 3 | 4 | 5 ]
    ```
    
    In order to support custom `Array` element delimiters, this pull request 
adds a new `struct` type `PrettyPrintDelimiters`. The `PrettyPrintDelimiters` 
type has one property `element` (of type `std::string`), which allows client 
code to control the delimiter used to distinguish between individual elements 
of an `arrow::Array` / `arrow::ChunkedArray`.
    
    In a future pull request, we plan to add more properties like `open` and 
`close` to allow client code to specify the opening and closing delimiters to 
use when printing an `arrow::Array` / `arrow::ChunkedArray` (e.g. `"<"` rather 
than `"["` and `">"` rather than `"]"`).
    
    ### What changes are included in this PR?
    
    1. Added a new `struct` type `PrettyPrintDelimiters` with one property 
`element` (of type `std::string`). The `element` property allows client code to 
specify any string value as the delimiter to distinguish between individual 
elements of an `arrow::Array` or `arrow::ChunkedArray` when printing using 
`arrow::PrettyPrint`.
    2. Added two new properties to `arrow::PrettyPrintOptions`: (1) 
`array_delimiters` (of type `arrow::PrettyPrintDelimiters`) and (2) 
`chunked_array_delimiters` (of type `arrow::PrettyPrintDelimiters`). These 
properties can be modified to customize how 
`arrow::Array`/`arrow::ChunkedArray` are printed when using 
`arrow::PrettyPrint`.
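
    The delimiter rule described in this list can be illustrated without the
    Arrow headers. The block below is a minimal, self-contained sketch and not
    the Arrow implementation: `SketchDelimiters`, `SketchOptions`, and
    `SketchPrintArray` are hypothetical stand-ins that mirror only the
    `element` property and the rule that the delimiter is written after every
    element except the last.

    ```cpp
    #include <cstddef>
    #include <sstream>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for arrow::PrettyPrintDelimiters (not the Arrow type).
    struct SketchDelimiters {
      // Separator written between elements; defaults to "," as in the PR.
      std::string element = ",";
    };

    // Hypothetical stand-in for the two new PrettyPrintOptions properties.
    struct SketchOptions {
      SketchDelimiters array_delimiters;
      SketchDelimiters chunked_array_delimiters;
    };

    // Print one value per line, indented by two spaces, and write the
    // array element delimiter after every element except the last one.
    std::string SketchPrintArray(const std::vector<int>& values,
                                 const SketchOptions& options) {
      std::ostringstream sink;
      sink << "[\n";
      for (std::size_t i = 0; i < values.size(); ++i) {
        const bool is_last = (i + 1 == values.size());
        sink << "  " << values[i];
        if (!is_last) {
          sink << options.array_delimiters.element;
        }
        sink << "\n";
      }
      sink << "]";
      return sink.str();
    }
    ```

    Setting `array_delimiters.element` to `" | "` on a `SketchOptions` value
    before calling `SketchPrintArray` yields the same one-element-per-line
    layout, with ` | ` in place of the comma.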
    
    ### Are these changes tested?
    
    Yes.
    
    1. Added new tests `ArrayCustomElementDelimiter` and 
`ChunkedArrayCustomElementDelimiter` to `pretty_print_test.cc`.
    2. All existing `PrettyPrint`-related C++ tests pass.
    
    ### Are there any user-facing changes?
    
    Yes.
    
    1. Users can now specify a custom element delimiter to use when printing 
`arrow::Array`s / `arrow::ChunkedArray`s using 
[`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101)
 by modifying the `array_delimiters` or `chunked_array_delimiters` properties 
of `arrow::PrettyPrintOptions`.
    
    **Example**:
    
    ```cpp
    auto array = ...;
    auto stream = ...;
    arrow::PrettyPrintOptions options = arrow::PrettyPrintOptions::Defaults();
    // Use " | " as the element-wise (element = scalar value) delimiter
    // for arrow::Array.
    options.array_delimiters.element = " | ";
    // Use ";" as the element-wise (element = chunk) delimiter
    // for arrow::ChunkedArray.
    options.chunked_array_delimiters.element = ";";
    arrow::PrettyPrint(array, options, stream);
    ```
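
    When a `ChunkedArray` is printed, the two properties act at different
    levels: `chunked_array_delimiters.element` separates chunks, while
    `array_delimiters.element` separates the values inside each chunk. The
    block below is a rough, self-contained sketch of that nesting. It does not
    use Arrow, and `DemoDelimiters`, `DemoOptions`, `DemoChunk`, and
    `DemoPrintChunked` are hypothetical names introduced only for
    illustration. Truncation with `...` is deliberately left out.

    ```cpp
    #include <cstddef>
    #include <sstream>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins; these are not the Arrow types.
    struct DemoDelimiters {
      std::string element = ",";
    };

    struct DemoOptions {
      DemoDelimiters array_delimiters;          // between values in a chunk
      DemoDelimiters chunked_array_delimiters;  // between chunks
    };

    using DemoChunk = std::vector<int>;

    // Print every chunk as a nested, indented list. The chunk delimiter is
    // skipped before the first chunk and written before each later chunk.
    std::string DemoPrintChunked(const std::vector<DemoChunk>& chunks,
                                 const DemoOptions& options) {
      std::ostringstream sink;
      sink << "[\n";
      bool skip_element_delimiter = true;
      for (const DemoChunk& chunk : chunks) {
        if (skip_element_delimiter) {
          skip_element_delimiter = false;
        } else {
          sink << options.chunked_array_delimiters.element << "\n";
        }
        sink << "  [\n";
        for (std::size_t i = 0; i < chunk.size(); ++i) {
          sink << "    " << chunk[i];
          if (i + 1 != chunk.size()) {
            sink << options.array_delimiters.element;
          }
          sink << "\n";
        }
        sink << "  ]";
      }
      sink << "\n]";
      return sink.str();
    }
    ```

    With `" | "` as the array delimiter and `";"` as the chunk delimiter, two
    chunks print as two nested lists with `];` closing the first one.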
    
    ### Future Directions
    
    1. To keep this pull request small and focused, I intentionally chose not 
to include changes related to specifying custom opening and closing `Array` 
delimiters (e.g. use `<` and `>` instead of `[` and `]`). I've captured the 
idea of supporting custom opening and closing `Array` delimiters in #37979. I 
will follow up with a future PR to address this.
    
    ### Notes
    
    1. This pull request was motivated by our desire to improve the display of 
Arrow-related classes in the MATLAB interface, but it is hopefully a generic 
enough change that it may benefit other use cases too.
    2. @ rok helpfully pointed out in 
https://github.com/apache/arrow/issues/37978#issuecomment-1743715458 that a 
similar attempt to modify the default `Array` element delimiter to be `", "` 
(note the space after the comma) was taken in #30951. However, this issue 
appears to have gone stale and the PR 
(https://github.com/apache/arrow/pull/12420) that was opened also seems to have 
gone stale. If these changes get merged, it may make sense to close out this 
issue since this one seems to at  [...]
    * Closes: #37978
    
    Authored-by: Kevin Gurney <[email protected]>
    Signed-off-by: Kevin Gurney <[email protected]>
---
 cpp/src/arrow/pretty_print.cc      |  19 ++++---
 cpp/src/arrow/pretty_print.h       |  23 +++++++-
 cpp/src/arrow/pretty_print_test.cc | 111 +++++++++++++++++++++++++++++++++++++
 3 files changed, 143 insertions(+), 10 deletions(-)

diff --git a/cpp/src/arrow/pretty_print.cc b/cpp/src/arrow/pretty_print.cc
index 03e2051c2f..a4a1fa90c2 100644
--- a/cpp/src/arrow/pretty_print.cc
+++ b/cpp/src/arrow/pretty_print.cc
@@ -151,14 +151,14 @@ class ArrayPrinter : public PrettyPrinter {
         IndentAfterNewline();
         (*sink_) << "...";
         if (!is_last && options_.skip_new_lines) {
-          (*sink_) << ",";
+          (*sink_) << options_.array_delimiters.element;
         }
         i = array.length() - window - 1;
       } else if (array.IsNull(i)) {
         IndentAfterNewline();
         (*sink_) << options_.null_rep;
         if (!is_last) {
-          (*sink_) << ",";
+          (*sink_) << options_.array_delimiters.element;
         }
       } else {
         if (indent_non_null_values) {
@@ -166,7 +166,7 @@ class ArrayPrinter : public PrettyPrinter {
         }
         RETURN_NOT_OK(func(i));
         if (!is_last) {
-          (*sink_) << ",";
+          (*sink_) << options_.array_delimiters.element;
         }
       }
       Newline();
@@ -453,12 +453,12 @@ Status PrettyPrint(const ChunkedArray& chunked_arr, const PrettyPrintOptions& op
   if (!skip_new_lines) {
     *sink << "\n";
   }
-  bool skip_comma = true;
+  bool skip_element_delimiter = true;
   for (int i = 0; i < num_chunks; ++i) {
-    if (skip_comma) {
-      skip_comma = false;
+    if (skip_element_delimiter) {
+      skip_element_delimiter = false;
     } else {
-      (*sink) << ",";
+      (*sink) << options.chunked_array_delimiters.element;
       if (!skip_new_lines) {
         *sink << "\n";
       }
@@ -467,12 +467,13 @@ Status PrettyPrint(const ChunkedArray& chunked_arr, const PrettyPrintOptions& op
       for (int i = 0; i < indent; ++i) {
         (*sink) << " ";
       }
-      (*sink) << "...,";
+      (*sink) << "...";
+      (*sink) << options.chunked_array_delimiters.element;
       if (!skip_new_lines) {
         *sink << "\n";
       }
       i = num_chunks - window - 1;
-      skip_comma = true;
+      skip_element_delimiter = true;
     } else {
       PrettyPrintOptions chunk_options = options;
       chunk_options.indent += options.indent_size;
diff --git a/cpp/src/arrow/pretty_print.h b/cpp/src/arrow/pretty_print.h
index 5d22fd5c51..96a214c68b 100644
--- a/cpp/src/arrow/pretty_print.h
+++ b/cpp/src/arrow/pretty_print.h
@@ -32,7 +32,21 @@ class Schema;
 class Status;
 class Table;
 
-struct PrettyPrintOptions {
+/// \class PrettyPrintDelimiters
+/// \brief Options for controlling which delimiters to use when printing
+/// an Array or ChunkedArray.
+struct ARROW_EXPORT PrettyPrintDelimiters {
+  /// Delimiter for separating individual elements of an Array (e.g. ","),
+  /// or individual chunks of a ChunkedArray
+  std::string element = ",";
+
+  /// Create a PrettyPrintDelimiters instance with default values
+  static PrettyPrintDelimiters Defaults() { return PrettyPrintDelimiters(); }
+};
+
+/// \class PrettyPrintOptions
+/// \brief Options for controlling how various Arrow types should be printed.
+struct ARROW_EXPORT PrettyPrintOptions {
   PrettyPrintOptions() = default;
 
   PrettyPrintOptions(int indent,  // NOLINT runtime/explicit
@@ -47,6 +61,7 @@ struct PrettyPrintOptions {
         skip_new_lines(skip_new_lines),
         truncate_metadata(truncate_metadata) {}
 
+  /// Create a PrettyPrintOptions instance with default values
   static PrettyPrintOptions Defaults() { return PrettyPrintOptions(); }
 
   /// Number of spaces to shift entire formatted object to the right
@@ -77,6 +92,12 @@ struct PrettyPrintOptions {
 
   /// If true, display schema metadata when pretty-printing a Schema
   bool show_schema_metadata = true;
+
+  /// Delimiters to use when printing an Array
+  PrettyPrintDelimiters array_delimiters = PrettyPrintDelimiters::Defaults();
+
+  /// Delimiters to use when printing a ChunkedArray
+  PrettyPrintDelimiters chunked_array_delimiters = PrettyPrintDelimiters::Defaults();
 };
 
 /// \brief Print human-readable representation of RecordBatch
diff --git a/cpp/src/arrow/pretty_print_test.cc b/cpp/src/arrow/pretty_print_test.cc
index 9a6e347c0b..45bb4ecffe 100644
--- a/cpp/src/arrow/pretty_print_test.cc
+++ b/cpp/src/arrow/pretty_print_test.cc
@@ -200,6 +200,65 @@ TEST_F(TestPrettyPrint, PrimitiveTypeNoNewlines) {
   CheckPrimitive<Int32Type, int32_t>(options, is_valid, values, expected, false);
 }
 
+TEST_F(TestPrettyPrint, ArrayCustomElementDelimiter) {
+  PrettyPrintOptions options{};
+  // Use a custom array element delimiter of " | ",
+  // rather than the default delimiter (i.e. ",").
+  options.array_delimiters.element = " | ";
+
+  // Short array without ellipsis
+  {
+    std::vector<bool> is_valid = {true, true, false, true, false};
+    std::vector<int32_t> values = {1, 2, 3, 4, 5};
+    static const char* expected = R"expected([
+  1 | 
+  2 | 
+  null | 
+  4 | 
+  null
+])expected";
+    CheckPrimitive<Int32Type, int32_t>(options, is_valid, values, expected, false);
+  }
+
+  // Longer array with ellipsis
+  {
+    std::vector<bool> is_valid = {true, false, true};
+    std::vector<int32_t> values = {1, 2, 3};
+    // Append 20 copies of the value "10" to the end of the values vector.
+    values.insert(values.end(), 20, 10);
+    // Append 20 copies of the value "true" to the end of the validity bitmap vector.
+    is_valid.insert(is_valid.end(), 20, true);
+    // Append the values 4, 5, and 6 to the end of the values vector.
+    values.insert(values.end(), {4, 5, 6});
+    // Append the values true, false, and true to the end of the validity bitmap vector.
+    is_valid.insert(is_valid.end(), {true, false, true});
+    static const char* expected = R"expected([
+  1 | 
+  null | 
+  3 | 
+  10 | 
+  10 | 
+  10 | 
+  10 | 
+  10 | 
+  10 | 
+  10 | 
+  ...
+  10 | 
+  10 | 
+  10 | 
+  10 | 
+  10 | 
+  10 | 
+  10 | 
+  4 | 
+  null | 
+  6
+])expected";
+    CheckPrimitive<Int32Type, int32_t>(options, is_valid, values, expected, false);
+  }
+}
+
 TEST_F(TestPrettyPrint, Int8) {
   static const char* expected = R"expected([
   0,
@@ -1020,6 +1079,58 @@ TEST_F(TestPrettyPrint, ChunkedArrayPrimitiveType) {
   CheckStream(chunked_array_2, {0}, expected_2);
 }
 
+TEST_F(TestPrettyPrint, ChunkedArrayCustomElementDelimiter) {
+  PrettyPrintOptions options{};
+  // Use a custom ChunkedArray element delimiter of ";",
+  // rather than the default delimiter (i.e. ",").
+  options.chunked_array_delimiters.element = ";";
+  // Use a custom Array element delimiter of " | ",
+  // rather than the default delimiter (i.e. ",").
+  options.array_delimiters.element = " | ";
+
+  const auto chunk = ArrayFromJSON(int32(), "[1, 2, null, 4, null]");
+
+  // ChunkedArray with 1 chunk
+  {
+    const ChunkedArray chunked_array(chunk);
+
+    static const char* expected = R"expected([
+  [
+    1 | 
+    2 | 
+    null | 
+    4 | 
+    null
+  ]
+])expected";
+    CheckStream(chunked_array, options, expected);
+  }
+
+  // ChunkedArray with 2 chunks
+  {
+    const ChunkedArray chunked_array({chunk, chunk});
+
+    static const char* expected = R"expected([
+  [
+    1 | 
+    2 | 
+    null | 
+    4 | 
+    null
+  ];
+  [
+    1 | 
+    2 | 
+    null | 
+    4 | 
+    null
+  ]
+])expected";
+
+    CheckStream(chunked_array, options, expected);
+  }
+}
+
 TEST_F(TestPrettyPrint, TablePrimitive) {
   std::shared_ptr<Field> int_field = field("column", int32());
   auto array = ArrayFromJSON(int_field->type(), "[0, 1, null, 3, null]");
