jayceslesar opened a new issue, #1037:
URL: https://github.com/apache/iceberg-python/issues/1037
### Question
How would I go about using a field with mixed datatypes? Is that
recommended, or even possible? I am a fan of tall-tidy data and am wondering
how to properly go about the following:
```py
from datetime import datetime

import pyarrow as pa
from pydantic import BaseModel
from pyiceberg.catalog.sql import SqlCatalog


class Message(BaseModel):
    system: str
    node: str
    message_name: str
    signal: str
    bus: str
    timestamp: datetime
    value: int | float | bool | str

    @staticmethod
    def to_pyarrow_schema():
        return pa.schema([
            pa.field('system', pa.string()),
            pa.field('node', pa.string()),
            pa.field('message_name', pa.string()),
            pa.field('signal', pa.string()),
            pa.field('bus', pa.string()),
            pa.field('timestamp', pa.timestamp('s', tz='UTC')),
            # Sparse union of the four value types; this is the column
            # that create_table rejects below.
            pa.field('value', pa.union([
                pa.field("value", pa.int32()),
                pa.field("value", pa.float64()),
                pa.field("value", pa.bool_()),
                pa.field("value", pa.string()),
            ], mode=pa.lib.UnionMode_SPARSE)),
        ])


catalog = SqlCatalog(
    "default",
    **{
        "uri": "my_uri/catalog",
    },
)
catalog.create_table(
    identifier="default.messages",
    schema=Message.to_pyarrow_schema(),
)
```
Right now it throws an error `TypeError: Expected primitive type, got:
<class 'pyarrow.lib.SparseUnionType'>` which makes sense as what I am
attempting isn't supported.
Should I be using a string type and casting in my queries?
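For reference, here is a rough sketch of the string-plus-cast approach I have in mind. It assumes `value` is stored as a plain string column next to a hypothetical `value_type` tag column; the helper names `encode_value` and `decode_value` are made up for illustration and are not part of pyiceberg or pyarrow.

```py
from typing import Union

Value = Union[int, float, bool, str]


def encode_value(value: Value) -> tuple[str, str]:
    """Return (text, type_tag) for storage in two string columns."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return ("true" if value else "false", "bool")
    if isinstance(value, int):
        return (str(value), "int")
    if isinstance(value, float):
        return (repr(value), "float")
    if isinstance(value, str):
        return (value, "str")
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def decode_value(text: str, type_tag: str) -> Value:
    """Cast the stored text back to its original Python type."""
    if type_tag == "bool":
        return text == "true"
    if type_tag == "int":
        return int(text)
    if type_tag == "float":
        return float(text)
    if type_tag == "str":
        return text
    raise ValueError(f"unknown type tag: {type_tag}")
```

With this, every row would carry `value` and `value_type` as strings, and the cast would happen on read, either in Python as above or in the query engine.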