joosthooz commented on PR #13820: URL: https://github.com/apache/arrow/pull/13820#issuecomment-1216675451
After having a better look, here's what seems to be happening:

- The first part of the test checks whether parsing the file as binary still works. That doesn't work for UTF-16 because the column names are not UTF-8, so parsing the column names into the schema fails (silently!).
- The second part tries to read the file without specifying an encoding, and expects an exception. However, the dataset reader apparently has no problem with the null bytes at every other character; it just interprets the data as a strange UTF-8 string.

I've removed those two additional checks, and now just check whether the data is transcoded properly. The second check is still present in the new `test_column_names_encoding` test (which only tests latin-1).
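The second point can be illustrated outside Arrow with a minimal plain-Python sketch. It assumes ASCII-only CSV content encoded as UTF-16-LE without a byte-order mark; the sample text and variable names are made up for illustration, and the Arrow CSV and dataset readers are not called. Under those assumptions the bytes are also valid UTF-8, so a strict UTF-8 decode raises nothing and simply yields a string with a NUL after every character.

```python
# Hypothetical sample data; not taken from the PR's test file.
text = "a,b\n1,2\n"

# Encode without a byte-order mark so every byte stays in the ASCII range.
utf16_bytes = text.encode("utf-16-le")

# In UTF-16-LE each ASCII character is followed by a NUL byte.
assert utf16_bytes[:4] == b"a\x00,\x00"

# NUL (U+0000) is a valid UTF-8 code point, so this strict decode
# succeeds instead of raising UnicodeDecodeError.
misread = utf16_bytes.decode("utf-8")

# Decoding with the correct encoding recovers the original text.
transcoded = utf16_bytes.decode("utf-16-le")
```

If the file did start with a byte-order mark (0xFF 0xFE), a strict UTF-8 decode would reject it, since 0xFF never appears in valid UTF-8. Whether a given reader validates UTF-8 at all is a separate question that this sketch does not answer.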