progval opened a new issue, #14055: URL: https://github.com/apache/datafusion/issues/14055
### Describe the bug

`encode(..., "hex")` can be used to get the hexadecimal representation of a string or a binary. Since datafusion v43 (specifically, since 1b3608da7ca59d8d987804834d004e8b3e349d18), only strings and binaries that happen to be valid UTF-8 are supported.

### To Reproduce

```
vlorentz@maxxi:~/datafusion/datafusion-cli$ git checkout 1b3608da7ca59d8d987804834d004e8b3e349d18
HEAD is now at 1b3608da7 fix: coalesce schema issues (#12308)
vlorentz@maxxi:~/datafusion/datafusion-cli$ TMPDIR=/srv/softwareheritage/tmp/vlorentz/ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/datafusion-cli`
DataFusion CLI v42.0.0
> create table test ( foo bytea );
0 row(s) fetched.
Elapsed 0.007 seconds.

> insert into test (foo) values (X'8f50d3f60eae370ddbf85c86219c55108a350165');
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched.
Elapsed 0.006 seconds.

> EXPLAIN SELECT encode(foo, 'hex') FROM test;
+---------------+-----------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                    |
+---------------+-----------------------------------------------------------------------------------------+
| logical_plan  | Projection: encode(CAST(test.foo AS Utf8), Utf8("hex"))                                 |
|               |   TableScan: test projection=[foo]                                                      |
| physical_plan | ProjectionExec: expr=[encode(CAST(foo@0 AS Utf8), hex) as encode(test.foo,Utf8("hex"))] |
|               |   MemoryExec: partitions=1, partition_sizes=[1]                                         |
|               |                                                                                         |
+---------------+-----------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.007 seconds.

> SELECT encode(foo, 'hex') FROM test;
Arrow error: Invalid argument error: Encountered non UTF-8 data: invalid utf-8 sequence of 1 bytes from index 0
> \q
```

### Expected behavior

```
vlorentz@maxxi:~/datafusion/datafusion-cli$ git checkout 1b3608da7ca59d8d987804834d004e8b3e349d18^
Previous HEAD position was 1b3608da7 fix: coalesce schema issues (#12308)
HEAD is now at 9a3f8d115 Minor: Encapsulate type check in GroupValuesColumn, avoid panic (#12620)
vlorentz@maxxi:~/datafusion/datafusion-cli$ TMPDIR=/srv/softwareheritage/tmp/vlorentz/ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 53.01s
     Running `target/debug/datafusion-cli`
DataFusion CLI v42.0.0
> create table test ( foo bytea );
0 row(s) fetched.
Elapsed 0.005 seconds.

> insert into test (foo) values (X'8f50d3f60eae370ddbf85c86219c55108a350165');
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched.
Elapsed 0.005 seconds.

> EXPLAIN SELECT encode(foo, 'hex') FROM test;
+---------------+---------------------------------------------------------------------------+
| plan_type     | plan                                                                      |
+---------------+---------------------------------------------------------------------------+
| logical_plan  | Projection: encode(test.foo, Utf8("hex"))                                 |
|               |   TableScan: test projection=[foo]                                        |
| physical_plan | ProjectionExec: expr=[encode(foo@0, hex) as encode(test.foo,Utf8("hex"))] |
|               |   MemoryExec: partitions=1, partition_sizes=[1]                           |
|               |                                                                           |
+---------------+---------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.005 seconds.

> SELECT encode(foo, 'hex') FROM test;
+------------------------------------------+
| encode(test.foo,Utf8("hex"))             |
+------------------------------------------+
| 8f50d3f60eae370ddbf85c86219c55108a350165 |
+------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.

> \q
```

### Additional context

note `CAST(test.foo AS Utf8)` as part of the first query plan, which does not happen in the second one.
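
To illustrate why the inserted cast makes the query fail, here is a minimal Python sketch. It is plain Python, not DataFusion or Arrow code, and the helper name `is_valid_utf8` is made up for this example. It only shows that the inserted value is not valid UTF-8 (so a binary-to-string cast would be expected to reject it), while hex-encoding the raw bytes needs no text decoding at all.

```python
# Illustration only: plain Python, not DataFusion or Arrow code.
# The value is the same one inserted into the `test` table above.
raw = bytes.fromhex("8f50d3f60eae370ddbf85c86219c55108a350165")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0x8f is a UTF-8 continuation byte, so it cannot start a sequence;
# decoding fails at the very first byte.
utf8_ok = is_valid_utf8(raw)

# Hex-encoding works on the raw bytes directly, with no string step.
hex_repr = raw.hex()

print(utf8_ok)
print(hex_repr)
```

This matches the shape of the two plans: the plan that passes the binary column to `encode` directly only needs the second step, whereas the plan with `CAST(test.foo AS Utf8)` has to get through the first step before `encode` ever runs.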

cc @mesejo