pitrou commented on code in PR #45685:
URL: https://github.com/apache/arrow/pull/45685#discussion_r1982958362


##########
cpp/src/parquet/arrow/reader_internal.cc:
##########
@@ -256,6 +256,23 @@ Status ByteArrayStatisticsAsScalars(const Statistics& 
statistics,
   return Status::OK();
 }
 
+Result<std::shared_ptr<ChunkedArray>> ViewOrCastChunkedArray(
+    const std::shared_ptr<ChunkedArray>& array,
+    const std::shared_ptr<DataType>& logical_value_type) {
+  auto view_result = array->View(logical_value_type);
+  if (view_result.ok()) {
+    return view_result.ValueOrDie();
+  } else {
+    ::arrow::ArrayVector casted_chunks;
+    for (const auto& chunk : array->chunks()) {
+      ARROW_ASSIGN_OR_RAISE(auto cast_chunk,
+                            ::arrow::compute::Cast(chunk, logical_value_type));
+      casted_chunks.push_back(cast_chunk.make_array());
+    }

Review Comment:
   Two comments here:
   1. There is a `Cast` function overload that accepts a `Datum`, why not use 
that instead of casting each chunk individually?
   2. We should pass the right `MemoryPool` when casting.
   



##########
cpp/src/parquet/arrow/reader_internal.cc:
##########
@@ -256,6 +256,23 @@ Status ByteArrayStatisticsAsScalars(const Statistics& 
statistics,
   return Status::OK();
 }
 
+Result<std::shared_ptr<ChunkedArray>> ViewOrCastChunkedArray(
+    const std::shared_ptr<ChunkedArray>& array,
+    const std::shared_ptr<DataType>& logical_value_type) {
+  auto view_result = array->View(logical_value_type);

Review Comment:
   I'm curious, in which cases does the View succeed? If the bitwidth changes, 
then presumably the View will fail. 



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3336,6 +3336,38 @@ TEST(TestArrowReadWrite, NonUniqueDictionaryValues) {
   }
 }
 
+TEST(TestArrowReadWrite, DictionaryIndexBidWidthsArePreservedOnRoundTrip) {

Review Comment:
   Suggestion: 1) fix typo 2) make test name slightly shorter :)
   ```suggestion
   TEST(TestArrowReadWrite, DictionaryIndexBitwidthRoundtrip) {
   ```



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3336,6 +3336,38 @@ TEST(TestArrowReadWrite, NonUniqueDictionaryValues) {
   }
 }
 
+TEST(TestArrowReadWrite, DictionaryIndexBidWidthsArePreservedOnRoundTrip) {
+  using ::arrow::ArrayFromVector;
+
+  std::vector<std::string> values = {"first", "second", "third"};
+  std::shared_ptr<Array> dict_values;
+  ArrayFromVector<::arrow::StringType, std::string>(values, &dict_values);
+
+  auto value_type = ::arrow::utf8();
+  auto dict_type = ::arrow::dictionary(::arrow::int8(), value_type);
+
+  auto f0 = field("dictionary", dict_type);
+  std::vector<std::shared_ptr<::arrow::Field>> fields;
+  fields.emplace_back(f0);
+  auto schema = ::arrow::schema(fields);
+
+  std::shared_ptr<Array> f0_values, f1_values;
+  ArrayFromVector<::arrow::Int8Type, int8_t>({0, 1, 0, 2, 1}, &f0_values);
+  ArrayFromVector<::arrow::Int8Type, int8_t>({2, 0, 1, 0, 2}, &f1_values);
+  ::arrow::ArrayVector dict_arrays = {
+      std::make_shared<::arrow::DictionaryArray>(dict_type, f0_values, 
dict_values),
+      std::make_shared<::arrow::DictionaryArray>(dict_type, f1_values, 
dict_values)};

Review Comment:
   Ok, we have a facility named `DictArrayFromJSON` (you can grep for it) that 
will make constructing the dictionary arrays much easier.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to