wgtmac commented on code in PR #44739:
URL: https://github.com/apache/arrow/pull/44739#discussion_r1845738022
##########
cpp/src/parquet/stream_writer.cc:
##########
@@ -198,10 +212,15 @@ void StreamWriter::CheckColumn(Type::type physical_type,
"' not '" + TypeToString(physical_type) + "'");
}
if (converted_type != node->converted_type()) {
- throw ParquetException("Column converted type mismatch. Column '" + node->name() +
- "' has converted type[" +
- ConvertedTypeToString(node->converted_type()) + "] not '" +
- ConvertedTypeToString(converted_type) + "'");
+ // The converted type does not always match with the value
Review Comment:
The root cause should be at this line:
https://github.com/apache/arrow/blob/4c2aef7b231f129db7b3bdb232c6da00542ba7b3/cpp/src/parquet/stream_writer.cc#L145
I think a clean fix would be to create a thin wrapper around `const char*`,
`std::string`, and `std::string_view` for binary data, just like
`FixedStringView` for the fixed-length type:
https://github.com/apache/arrow/blob/4c2aef7b231f129db7b3bdb232c6da00542ba7b3/cpp/src/parquet/stream_writer.h#L147
With this approach, we can safely call `CheckColumn(Type::BYTE_ARRAY,
ConvertedType::NONE);` in the overload that takes the wrapper.
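
To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It does not use the real Parquet classes: `RawBinaryView`, `SketchWriter`, `ColumnSpec`, and the two enums are hypothetical stand-ins invented for illustration, and the sketch assumes the existing text overloads check for `ConvertedType::UTF8`. The point is only that a distinct wrapper type lets overload resolution pick the binary check, so `CheckColumn` itself can stay strict.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Stand-ins for parquet::Type::type and parquet::ConvertedType::type.
enum class PhysicalType { BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY };
enum class ConvertedType { NONE, UTF8 };

// Hypothetical thin wrapper that marks a value as raw binary, not text.
// Constructors are explicit so plain strings keep selecting the text overload.
struct RawBinaryView {
  std::string_view data;
  explicit RawBinaryView(const char* s) : data(s) {}  // NUL-terminated input
  RawBinaryView(const char* s, std::size_t n) : data(s, n) {}
  explicit RawBinaryView(const std::string& s) : data(s) {}
  explicit RawBinaryView(std::string_view s) : data(s) {}
};

struct ColumnSpec {
  PhysicalType physical;
  ConvertedType converted;
};

// Toy writer that mimics only the column type check, not the real StreamWriter.
class SketchWriter {
 public:
  explicit SketchWriter(std::vector<ColumnSpec> schema)
      : schema_(std::move(schema)) {}

  // Text overload: assumed to require a UTF8-annotated BYTE_ARRAY column.
  SketchWriter& operator<<(std::string_view v) {
    CheckColumn(PhysicalType::BYTE_ARRAY, ConvertedType::UTF8);
    return Append(v);
  }

  // Binary overload: the wrapper type selects the un-annotated check.
  SketchWriter& operator<<(RawBinaryView v) {
    CheckColumn(PhysicalType::BYTE_ARRAY, ConvertedType::NONE);
    return Append(v.data);
  }

  const std::vector<std::string>& values() const { return values_; }

 private:
  // Strict check: both the physical and the converted type must match.
  void CheckColumn(PhysicalType physical, ConvertedType converted) const {
    if (column_ >= schema_.size()) {
      throw std::out_of_range("no more columns in schema");
    }
    const ColumnSpec& spec = schema_[column_];
    if (spec.physical != physical || spec.converted != converted) {
      throw std::invalid_argument("column type mismatch");
    }
  }

  SketchWriter& Append(std::string_view v) {
    values_.emplace_back(v);  // copy the bytes into owned storage
    ++column_;
    return *this;
  }

  std::vector<ColumnSpec> schema_;
  std::vector<std::string> values_;
  std::size_t column_ = 0;
};
```

In this sketch, `writer << std::string_view("text")` targets a UTF8 column, while `writer << RawBinaryView(bytes)` targets a column with no converted type, and neither path requires relaxing the mismatch exception.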