chairmank commented on issue #19157:
URL: https://github.com/apache/arrow/issues/19157#issuecomment-1528037394

   I would also like `pyarrow.array` to automatically convert Python values 
when a sparse union or dense union type is explicitly specified. I frequently 
use dense union types to represent data that originated in protocol buffers 
with `oneof` fields. It is inconvenient to have to implement special handling 
of this case when the target Arrow schema is known.
   
   Also, I would like to politely observe that the example code snippets in previous comments are misleading: they pick the child field from the Python value's type alone, which cannot distinguish between child fields that happen to have the same data type.
   
   > [Antoine Pitrou](https://issues.apache.org/jira/browse/ARROW-2774?focusedCommentId=17392943) / @pitrou: I'm still not convinced this is a good idea. Consider `pa.array([1, 2.3])`. Should it return a `union<int64, float64>`?
   > 
   > cc @amol- for advice.
   
   > [Joris Van den Bossche](https://issues.apache.org/jira/browse/ARROW-2774?focusedCommentId=17393143) / @jorisvandenbossche: Agreed that we shouldn't do that by default, but we can keep this issue about actually supporting it? Because now construction of a union array from a python sequence is not even supported when explicitly mentioning the type.
   > 
   > ```python
   > In [52]: typ = pa.union([pa.field("int", "int64"), pa.field("float", "float64")], mode="sparse")
   > 
   > In [53]: pa.array([1, 2.3], type=typ)
   > ...
   > ArrowNotImplementedError: sparse_union
   > ../src/arrow/util/converter.h:265  VisitTypeInline(*visitor.type, &visitor)
   > ../src/arrow/python/python_to_arrow.cc:1015  (MakeConverter<PyConverter, PyConverterTrait>( options.type, options, pool))
   > ```
   
   As an example, consider the following union type:
   ```
   >>> import pyarrow as pa
   >>> string_predicate_type = pa.dense_union([
   ...     pa.field("equals", pa.string(), False),
   ...     pa.field("regexp", pa.string(), False),
   ...     pa.field("is_null", pa.null()),
   ... ])
   >>> string_predicate_type
   DenseUnionType(dense_union<equals: string not null=0, regexp: string not null=1, is_null: null=2>)
   ```
   
   Both `equals` and `regexp` are strings, but they are semantically distinct. For `pyarrow.array` to convert Python values to the correct child field, the values ought to be tagged with the name of the target field:
   ```
   pa.array([{"equals": "foo"}, {"regexp": "[0-9a-f]{16}"}, {"is_null": None}], type=string_predicate_type)
   ```