[GitHub] [arrow] jorisvandenbossche commented on pull request #36846: GH-36845: [C++][Python] Allow type promotion on `pa.concat_tables`

via GitHub Tue, 26 Sep 2023 03:06:04 -0700


jorisvandenbossche commented on PR #36846:
URL: https://github.com/apache/arrow/pull/36846#issuecomment-1735229268


   Comparing the current branch with NumPy's promotion rules:
   
   ```python
   import itertools
   import pyarrow as pa
   import numpy as np
   
   types = [
       "uint8", "uint16", "uint32", "uint64",
       "int8", "int16", "int32", "int64",
       "float16", "float32", "float64",
   ]
   
   for t1, t2 in itertools.combinations(types, 2):
       np_result = np.result_type(t1, t2)
       pa_result = pa.unify_schemas(
           [pa.schema([('a', t1)]), pa.schema([('a', t2)])], options="permissive"
       ).field('a').type
       pa_result_np = np.dtype(pa_result.to_pandas_dtype())
       if np_result != pa_result_np:
           print(f"{t1} + {t2}\t:\t{np_result} (numpy), {pa_result_np} (arrow)")
   ```
   ```
   uint8 + int8         :       int16 (numpy), int8 (arrow)
   uint16 + int8        :       int32 (numpy), int16 (arrow)
   uint16 + int16       :       int32 (numpy), int16 (arrow)
   uint32 + int8        :       int64 (numpy), int32 (arrow)
   uint32 + int16       :       int64 (numpy), int32 (arrow)
   uint32 + int32       :       int64 (numpy), int32 (arrow)
   uint64 + int8        :       float64 (numpy), int64 (arrow)
   uint64 + int16       :       float64 (numpy), int64 (arrow)
   uint64 + int32       :       float64 (numpy), int64 (arrow)
   uint64 + int64       :       float64 (numpy), int64 (arrow)
   ```
   
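   That shows the issue with int+uint commented on above: when a signed 
   integer is combined with an unsigned integer of the same or larger 
   bit width, the current branch keeps the larger of the two bit widths and 
   makes the result signed, whereas NumPy widens to the next larger signed 
   type so that every unsigned value still fits.
   NumPy also goes to float64 when combining uint64 with any signed integer 
   (since uint64 values cannot all be held in int64). Not sure if we should 
   also do that.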
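To make the difference concrete, here is a minimal pure-Python sketch of the two integer rules as they appear in the table above. The helper names (`parse`, `common_int_same_width`, `common_int_widening`) are hypothetical and only reproduce the observed results; this is not the Arrow C++ implementation nor NumPy's.

```python
def parse(name):
    """Split a type name such as 'uint16' into (is_signed, bit_width)."""
    signed = not name.startswith("u")
    bits = int(name.rsplit("int", 1)[1])
    return signed, bits


def common_int_same_width(t1, t2):
    """Rule observed on the branch: signed if either input is signed,
    bit width is the larger of the two (no extra widening)."""
    (s1, b1), (s2, b2) = parse(t1), parse(t2)
    return ("int" if (s1 or s2) else "uint") + str(max(b1, b2))


def common_int_widening(t1, t2):
    """NumPy-like rule: with mixed signedness, the signed result must be
    strictly wider than the unsigned input so all its values fit."""
    (s1, b1), (s2, b2) = parse(t1), parse(t2)
    if s1 == s2:
        return ("int" if s1 else "uint") + str(max(b1, b2))
    signed_bits, unsigned_bits = (b1, b2) if s1 else (b2, b1)
    bits = max(signed_bits, 2 * unsigned_bits)
    # No 128-bit integer is available, so fall back to float64 as NumPy does.
    return "float64" if bits > 64 else "int" + str(bits)


for pair in [("uint8", "int8"), ("uint32", "int16"), ("uint64", "int64")]:
    print(pair, common_int_same_width(*pair), common_int_widening(*pair))
```

For pairs such as uint8 + int16 both rules give int16, which matches those combinations not showing up in the table.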
   
   And comparing with the `DispatchBest` logic of our numeric kernels 
   (implemented in `CommonNumeric`, added in 
   https://github.com/apache/arrow/pull/9294):
   
   ```python
   import pyarrow.compute as pc

   # Reuses `types`, `itertools`, `np` and `pa` from the snippet above.
   for t1, t2 in itertools.combinations(types, 2):
       result_merge = pa.unify_schemas(
           [pa.schema([('a', t1)]), pa.schema([('a', t2)])], options="permissive"
       ).field('a').type
       try:
           result_dispatch = pc.add(
               pa.array([1, 2, 3], t1), pa.array(np.array([1, 2, 3], t2), t2)
           ).type
       except Exception:
           # Skip combinations that cannot be built or added.
           continue
       if result_merge != result_dispatch:
           print(f"{t1} + {t2}\t:\t{result_merge} (MergeOptions), {result_dispatch} (DispatchBest)")
   ```
   ```
   uint8 + int8         :       int8 (MergeOptions), int16 (DispatchBest)
   uint16 + int8        :       int16 (MergeOptions), int32 (DispatchBest)
   uint16 + int16       :       int16 (MergeOptions), int32 (DispatchBest)
   uint32 + int8        :       int32 (MergeOptions), int64 (DispatchBest)
   uint32 + int16       :       int32 (MergeOptions), int64 (DispatchBest)
   uint32 + int32       :       int32 (MergeOptions), int64 (DispatchBest)
   uint32 + float32     :       double (MergeOptions), float (DispatchBest)
   uint64 + float32     :       double (MergeOptions), float (DispatchBest)
   int32 + float32      :       double (MergeOptions), float (DispatchBest)
   int64 + float32      :       double (MergeOptions), float (DispatchBest)
   ```
   
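   So the numeric kernels already widen the uint+int case. And they do not 
   widen the float bit width when a 32- or 64-bit integer (signed or 
   unsigned) is combined with float32: DispatchBest stays at float where 
   MergeOptions goes to double (that's maybe something to change in 
   DispatchBest in a follow-up).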
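One likely reason to prefer double for 32-bit integers with float32 is precision: float32 has a 24-bit significand, so not every int32 value survives the conversion, while float64 holds all of them exactly. A small standard-library sketch of that effect (the `to_float32` helper is hypothetical and only used for illustration):

```python
import struct


def to_float32(x):
    """Round-trip a number through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


value = 2**24 + 1  # 16777217, well within the int32 range

print(to_float32(value))  # rounded to the nearest representable float32
print(float(value))       # float64 keeps the value exactly
print(to_float32(value) == value, float(value) == value)
```

Note that int64 and uint64 values do not all fit in float64 either, so for those widths double only reduces the loss rather than removing it.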


