maxburke opened a new issue, #6972:
URL: https://github.com/apache/arrow-datafusion/issues/6972

   ### Describe the bug
   
   The `array_contains` implementation appears to flatten its input lists too aggressively, and as a result it returns incorrect results when one of the arguments is a column of List type.
   
   ### To Reproduce
   
   I've attached a parquet table containing a column with type List(String).
   
   When I use `array_contains` on this data, I get this result set:
   
   ```
   ❯ create external table t0 stored as parquet location '/Users/max/tmp/array_contains.parquet';
   0 rows in set. Query took 0.017 seconds.
   ❯ select bid_node_ids from t0 where array_contains(bid_node_ids, ['z+CPVybgUuCXlAE3A3jqyg==']);
   +----------------------------+
   | bid_node_ids               |
   +----------------------------+
   | [okwzcOFM3yjUzNFbc/BYBQ==] |
   | [DbNysJTF560NzR/HLbAa/Q==] |
   | [ivO3+Z+WMRqwhivy85d6KA==] |
   +----------------------------+
   3 rows in set. Query took 0.076 seconds.
   ❯
   ```
   Note that none of the returned `bid_node_ids` values contains the queried value `z+CPVybgUuCXlAE3A3jqyg==`.
   
   
[array_contains.parquet.zip](https://github.com/apache/arrow-datafusion/files/12061936/array_contains.parquet.zip)
   
   
   ### Expected behavior
   
   I expected 861 rows in the result set, each of which contains the value `z+CPVybgUuCXlAE3A3jqyg==`:
   
   ```
   ❯ select bid_node_ids from t0 where array_contains(bid_node_ids, ['z+CPVybgUuCXlAE3A3jqyg==']);
   +--------------------------------------------------------------------------------+
   | bid_node_ids                                                                   |
   +--------------------------------------------------------------------------------+
   | [wFEkOS2AFYxekv7SzPrkiQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   [....snip...]
   | [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   | [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   | [z+CPVybgUuCXlAE3A3jqyg==]                                                     |
   | [iTd7HyShRr0PqSKyqKT0+A==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   | [edSh3ZpG53UB+JMV875ipg==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   | [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   | [edSh3ZpG53UB+JMV875ipg==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   | [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   | [O3GAOhhCbfxgXcZEwLI7aQ==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   | [edSh3ZpG53UB+JMV875ipg==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   | [edSh3ZpG53UB+JMV875ipg==, z+CPVybgUuCXlAE3A3jqyg==]                           |
   +--------------------------------------------------------------------------------+
   861 rows in set. Query took 1.069 seconds.
   ```
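
To make the expected semantics concrete, here is a minimal Python sketch. It is an illustration only, not DataFusion's implementation, and it does not try to reproduce the exact wrong rows shown earlier. It assumes that `array_contains(a, b)` should be true for a row when every element of `b` occurs in that row's list `a`, and it models a List(String) column the way Arrow stores list arrays: a flat values buffer plus row offsets. The helper names `rows_from_offsets` and `contains_all` are hypothetical, and the three sample rows are made up from values that appear in the output above.

```python
from typing import List, Sequence


def rows_from_offsets(values: Sequence[str], offsets: Sequence[int]) -> List[List[str]]:
    """Rebuild per-row lists from a flat values buffer plus row offsets."""
    return [list(values[offsets[i]:offsets[i + 1]]) for i in range(len(offsets) - 1)]


def contains_all(haystack: Sequence[str], needles: Sequence[str]) -> bool:
    """True when every needle occurs in the given sequence."""
    return all(n in haystack for n in needles)


TARGET = "z+CPVybgUuCXlAE3A3jqyg=="

# Made-up sample: three rows stored flat.
# row 0 = [wFEk..., TARGET], row 1 = [okwz...], row 2 = [TARGET]
values = ["wFEkOS2AFYxekv7SzPrkiQ==", TARGET, "okwzcOFM3yjUzNFbc/BYBQ==", TARGET]
offsets = [0, 2, 3, 4]

rows = rows_from_offsets(values, offsets)

# Expected behaviour: evaluate the predicate once per row, inside row boundaries.
per_row = [contains_all(row, [TARGET]) for row in rows]

# What is lost when row boundaries are ignored: a single answer for the
# whole flattened buffer, which says nothing about individual rows.
flattened = contains_all(values, [TARGET])

matching_rows = [row for row, keep in zip(rows, per_row) if keep]
```

With this sample, only rows 0 and 2 pass the per-row check, while the flattened check collapses to one boolean for the whole column. Any fix would presumably need to keep the offsets in play so that the filter mask has one entry per input row.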
   
   ### Additional context
   
   I've hacked together a change on our branch that gives us the results we expect: 
https://github.com/urbanlogiq/arrow-datafusion/commit/a381f10257243b22375b42973d7701a3130c05e1
 but I'm not sure whether this fix matches what the original author intended.


