progval opened a new issue, #14055:
URL: https://github.com/apache/datafusion/issues/14055
### Describe the bug
`encode(..., "hex")` returns the hexadecimal representation of a string or
binary value. Since DataFusion v43 (specifically, since commit
1b3608da7ca59d8d987804834d004e8b3e349d18), it only works on strings and on
binary values that happen to be valid UTF-8.
### To Reproduce
```
vlorentz@maxxi:~/datafusion/datafusion-cli$ git checkout 1b3608da7ca59d8d987804834d004e8b3e349d18
HEAD is now at 1b3608da7 fix: coalesce schema issues (#12308)
vlorentz@maxxi:~/datafusion/datafusion-cli$ TMPDIR=/srv/softwareheritage/tmp/vlorentz/ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.27s
Running `target/debug/datafusion-cli`
DataFusion CLI v42.0.0
> create table test ( foo bytea );
0 row(s) fetched.
Elapsed 0.007 seconds.
> insert into test (foo) values (X'8f50d3f60eae370ddbf85c86219c55108a350165');
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched.
Elapsed 0.006 seconds.
> EXPLAIN SELECT encode(foo, 'hex') FROM test;
+---------------+-----------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                    |
+---------------+-----------------------------------------------------------------------------------------+
| logical_plan  | Projection: encode(CAST(test.foo AS Utf8), Utf8("hex"))                                 |
|               |   TableScan: test projection=[foo]                                                      |
| physical_plan | ProjectionExec: expr=[encode(CAST(foo@0 AS Utf8), hex) as encode(test.foo,Utf8("hex"))] |
|               |   MemoryExec: partitions=1, partition_sizes=[1]                                         |
|               |                                                                                         |
+---------------+-----------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.007 seconds.
> SELECT encode(foo, 'hex') FROM test;
Arrow error: Invalid argument error: Encountered non UTF-8 data: invalid utf-8 sequence of 1 bytes from index 0
> \q
```
### Expected behavior
```
vlorentz@maxxi:~/datafusion/datafusion-cli$ git checkout 1b3608da7ca59d8d987804834d004e8b3e349d18^
Previous HEAD position was 1b3608da7 fix: coalesce schema issues (#12308)
HEAD is now at 9a3f8d115 Minor: Encapsulate type check in GroupValuesColumn, avoid panic (#12620)
vlorentz@maxxi:~/datafusion/datafusion-cli$ TMPDIR=/srv/softwareheritage/tmp/vlorentz/ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 53.01s
Running `target/debug/datafusion-cli`
DataFusion CLI v42.0.0
> create table test ( foo bytea );
0 row(s) fetched.
Elapsed 0.005 seconds.
> insert into test (foo) values (X'8f50d3f60eae370ddbf85c86219c55108a350165');
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched.
Elapsed 0.005 seconds.
> EXPLAIN SELECT encode(foo, 'hex') FROM test;
+---------------+---------------------------------------------------------------------------+
| plan_type     | plan                                                                      |
+---------------+---------------------------------------------------------------------------+
| logical_plan  | Projection: encode(test.foo, Utf8("hex"))                                 |
|               |   TableScan: test projection=[foo]                                        |
| physical_plan | ProjectionExec: expr=[encode(foo@0, hex) as encode(test.foo,Utf8("hex"))] |
|               |   MemoryExec: partitions=1, partition_sizes=[1]                           |
|               |                                                                           |
+---------------+---------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.005 seconds.
> SELECT encode(foo, 'hex') FROM test;
+------------------------------------------+
| encode(test.foo,Utf8("hex"))             |
+------------------------------------------+
| 8f50d3f60eae370ddbf85c86219c55108a350165 |
+------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.
> \q
```
### Additional context
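Note the `CAST(test.foo AS Utf8)` in the first query plan, which is absent
from the second one. The column is cast to a string before `encode` sees it,
and that cast is what fails on bytes that are not valid UTF-8.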
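
To illustrate why the extra cast matters, here is a minimal Python sketch, independent of DataFusion. The helper names `hex_of_bytes` and `hex_via_utf8_cast` are made up for this illustration; the second one only approximates the regressed plan by decoding the bytes as UTF-8 before hex-encoding them.

```python
# The same 20-byte value inserted into the `test` table above.
raw = bytes.fromhex("8f50d3f60eae370ddbf85c86219c55108a350165")


def hex_of_bytes(data: bytes) -> str:
    """Hex-encode the bytes directly; any byte value is acceptable."""
    return data.hex()


def hex_via_utf8_cast(data: bytes) -> str:
    """Hypothetical stand-in for the plan with the cast:
    interpret the bytes as UTF-8 text first, then hex-encode that text."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    return text.encode("utf-8").hex()


print(hex_of_bytes(raw))  # 8f50d3f60eae370ddbf85c86219c55108a350165

try:
    hex_via_utf8_cast(raw)
except UnicodeDecodeError as err:
    # 0x8f cannot start a UTF-8 sequence, so decoding fails at index 0.
    print(f"cast failed at byte index {err.start}")
```

Both paths agree on input that is valid UTF-8 (for example `b"abc"`), which is presumably why the regression only shows up on arbitrary binary data such as hashes.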
cc @mesejo