sahil1105 commented on code in PR #43661:
URL: https://github.com/apache/arrow/pull/43661#discussion_r1715751879
##########
cpp/src/arrow/dataset/file_parquet.cc:
##########
@@ -555,6 +562,57 @@ Future<std::shared_ptr<parquet::arrow::FileReader>> ParquetFileFormat::GetReader
});
}
+struct CastingGenerator {
Review Comment:
> Parquet logical type doesn't have an arrow schema, isn't it?
As far as I understand, the Parquet metadata may or may not contain the Arrow
schema; I believe it depends on the writer. `SchemaManifest::Make` appears to
try to retrieve it via `GetOriginSchema`. However, even when it is present, the
schema at write time might not match the schema the reader expects.
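To make the mismatch concrete, here is a minimal, library-free sketch of the decision I have in mind. It is not Arrow code: `StrType`, `ResolveReadType`, and `NeedsCast` are hypothetical names, and the only behaviour assumed is that a Parquet string column is read as plain `string` when the writer stored no Arrow schema.

```cpp
#include <optional>

// Hypothetical stand-in for the Arrow string-like types under discussion.
enum class StrType { kString, kLargeString, kStringView };

// Type the Parquet reader would produce for a string column: the type the
// writer stored as a hint if there is one, otherwise the default mapping
// (assumed here to be plain `string`).
StrType ResolveReadType(const std::optional<StrType>& stored_hint) {
  return stored_hint.value_or(StrType::kString);
}

// A cast is still required whenever the resolved read type differs from the
// type the reader (for example, the dataset schema) expects.
bool NeedsCast(const std::optional<StrType>& stored_hint, StrType expected) {
  return ResolveReadType(stored_hint) != expected;
}
```

For example, `NeedsCast(std::nullopt, StrType::kLargeString)` is true (no stored schema, so the column comes back as `string`), and `NeedsCast(StrType::kLargeString, StrType::kStringView)` is also true even though the writer did store a schema.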
> Binary reader reads from ::arrow::BinaryBuilder, and casting it to
> user-specified binary type.
Sorry, I didn't quite follow. Are you saying that we should use this to do
the cast at read time somehow?
> I think a native cast is better here but this doesn't solve your problem,
> perhaps I can trying to add a naive SchemaManifest with hint solving here, but
> it would spend some time.
> Maybe we should rethink the GetTypeForNode handling for
> string/large_string/stringView, or using some handle written type hint here.
That makes sense to me.
> Maybe I can add separate issue for that
That would be great, thanks!