progval opened a new issue, #14055:
URL: https://github.com/apache/datafusion/issues/14055

   ### Describe the bug
   
   `encode(..., 'hex')` returns the hexadecimal representation of a string or binary value. Since DataFusion v43 (specifically, since commit 1b3608da7ca59d8d987804834d004e8b3e349d18), it only works on strings and on binary values that happen to be valid UTF-8.
   
   ### To Reproduce
   
   ```
   vlorentz@maxxi:~/datafusion/datafusion-cli$ git checkout 1b3608da7ca59d8d987804834d004e8b3e349d18
   HEAD is now at 1b3608da7 fix: coalesce schema issues (#12308)
   vlorentz@maxxi:~/datafusion/datafusion-cli$ TMPDIR=/srv/softwareheritage/tmp/vlorentz/ cargo run
       Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.27s
        Running `target/debug/datafusion-cli`
   DataFusion CLI v42.0.0
   > create table test ( foo bytea );
   0 row(s) fetched. 
   Elapsed 0.007 seconds.
   
   > insert into test (foo) values (X'8f50d3f60eae370ddbf85c86219c55108a350165');
   +-------+
   | count |
   +-------+
   | 1     |
   +-------+
   1 row(s) fetched. 
   Elapsed 0.006 seconds.
   
   > EXPLAIN SELECT encode(foo, 'hex') FROM test;
   +---------------+-----------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                    |
   +---------------+-----------------------------------------------------------------------------------------+
   | logical_plan  | Projection: encode(CAST(test.foo AS Utf8), Utf8("hex"))                                 |
   |               |   TableScan: test projection=[foo]                                                      |
   | physical_plan | ProjectionExec: expr=[encode(CAST(foo@0 AS Utf8), hex) as encode(test.foo,Utf8("hex"))] |
   |               |   MemoryExec: partitions=1, partition_sizes=[1]                                         |
   |               |                                                                                         |
   +---------------+-----------------------------------------------------------------------------------------+
   2 row(s) fetched. 
   Elapsed 0.007 seconds.
   
   > SELECT encode(foo, 'hex') FROM test;
   Arrow error: Invalid argument error: Encountered non UTF-8 data: invalid utf-8 sequence of 1 bytes from index 0
   > 
   \q
   ```
   
   ### Expected behavior
   
   ```
   vlorentz@maxxi:~/datafusion/datafusion-cli$ git checkout 1b3608da7ca59d8d987804834d004e8b3e349d18^
   Previous HEAD position was 1b3608da7 fix: coalesce schema issues (#12308)
   HEAD is now at 9a3f8d115 Minor: Encapsulate type check in GroupValuesColumn, avoid panic (#12620)
   vlorentz@maxxi:~/datafusion/datafusion-cli$ TMPDIR=/srv/softwareheritage/tmp/vlorentz/ cargo run
       Finished `dev` profile [unoptimized + debuginfo] target(s) in 53.01s
        Running `target/debug/datafusion-cli`
   DataFusion CLI v42.0.0
   > create table test ( foo bytea );
   0 row(s) fetched. 
   Elapsed 0.005 seconds.
   
   > insert into test (foo) values (X'8f50d3f60eae370ddbf85c86219c55108a350165');
   +-------+
   | count |
   +-------+
   | 1     |
   +-------+
   1 row(s) fetched. 
   Elapsed 0.005 seconds.
   
   > EXPLAIN SELECT encode(foo, 'hex') FROM test;
   +---------------+---------------------------------------------------------------------------+
   | plan_type     | plan                                                                      |
   +---------------+---------------------------------------------------------------------------+
   | logical_plan  | Projection: encode(test.foo, Utf8("hex"))                                 |
   |               |   TableScan: test projection=[foo]                                        |
   | physical_plan | ProjectionExec: expr=[encode(foo@0, hex) as encode(test.foo,Utf8("hex"))] |
   |               |   MemoryExec: partitions=1, partition_sizes=[1]                           |
   |               |                                                                           |
   +---------------+---------------------------------------------------------------------------+
   2 row(s) fetched. 
   Elapsed 0.005 seconds.
   
   > SELECT encode(foo, 'hex') FROM test;
   +------------------------------------------+
   | encode(test.foo,Utf8("hex"))             |
   +------------------------------------------+
   | 8f50d3f60eae370ddbf85c86219c55108a350165 |
   +------------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.004 seconds.
   
   > 
   \q
   ```
   
   ### Additional context
   
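   Note the `CAST(test.foo AS Utf8)` in the first query plan, which does not appear in the second one. This cast looks like the source of the error: the inserted bytes are not valid UTF-8, so converting them to `Utf8` fails before `encode` runs.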
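
To illustrate why a cast to `Utf8` would reject this value while hex encoding itself has no such requirement, here is a minimal standalone sketch using only the Rust standard library. It is not DataFusion or Arrow code; `to_hex` and `SAMPLE` are hypothetical names introduced for this example, and `SAMPLE` holds the bytes from the `insert` statement above.

```rust
use std::fmt::Write;

/// Hex-encode arbitrary bytes. Hypothetical helper for illustration only;
/// it never needs the input to be valid UTF-8.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail, so unwrap is safe here.
        write!(out, "{:02x}", b).unwrap();
    }
    out
}

/// The 20 bytes of X'8f50d3f60eae370ddbf85c86219c55108a350165'.
const SAMPLE: [u8; 20] = [
    0x8f, 0x50, 0xd3, 0xf6, 0x0e, 0xae, 0x37, 0x0d, 0xdb, 0xf8,
    0x5c, 0x86, 0x21, 0x9c, 0x55, 0x10, 0x8a, 0x35, 0x01, 0x65,
];

fn main() {
    // 0x8f is a UTF-8 continuation byte, so it cannot start a sequence:
    // validation fails on the very first byte.
    match std::str::from_utf8(&SAMPLE) {
        Ok(s) => println!("valid UTF-8: {s}"),
        Err(e) => println!("not valid UTF-8: {e}"),
    }

    // Hex encoding operates on the raw bytes and succeeds regardless.
    println!("{}", to_hex(&SAMPLE));
}
```

Under this reading, the regression comes from the string coercion inserted during planning, not from the hex encoder, which would be consistent with the plan difference noted above.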
   
   cc @mesejo 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

