rowillia opened a new issue, #36187:
URL: https://github.com/apache/arrow/issues/36187

   ### Describe the enhancement requested
   
   Joining two tables where one has a column of type `list` (even if it's not the join column) raises an exception. For example:
   
   ```python
   import pyarrow as pa

   NUM_ITEMS = 30
   t1 = pa.Table.from_pydict({
       'id': [x.to_bytes(4, 'big') for x in range(NUM_ITEMS)],
       # Non-key list column; its presence alone triggers the error below.
       'array_column': [[z for z in range(3)] for x in range(NUM_ITEMS)],
   })
   t2 = pa.Table.from_pydict({
       'id': [x.to_bytes(4, 'big') for x in range(NUM_ITEMS)],
       'value': [x for x in range(NUM_ITEMS)],
   })
   t1.join(t2, 'id', join_type='inner')
   ```
   This results in the following exception:
   `ArrowInvalid: Data type list<item: int64> is not supported in join non-key field`
   
   This [exception](https://github.com/apache/arrow/blob/f959a2e05c79351255227a91cb36d6ca39d01a3d/cpp/src/arrow/acero/hash_join_node.cc#L235-L248)
   is fairly unintuitive: I spent a few hours today trying to work out which column was causing it.
   It could be made much clearer by including the name of the offending field when one is available.
   I'm new to Arrow, but I believe the name should be available here:
   https://github.com/apache/arrow/blob/f959a2e05c79351255227a91cb36d6ca39d01a3d/cpp/src/arrow/type.h#L1829-L1831
   
   ### Component(s)
   
   C++

