valkum commented on issue #8404:
URL: https://github.com/apache/arrow-rs/issues/8404#issuecomment-3355736883
The schema looks correct as far as I can tell. Over the past couple of days
I tried to write some debugging tools for this case whenever I had a minute,
but I haven't been able to find the root cause yet.
Here is the schema for reference:
<details><summary># parquet-tools schema --format raw ~/Downloads/arrow-bug-dremel-encoding.parquet | jq</summary>
<p>
```
{
"name": "root",
"num_children": 9,
"children": [
{
"type": "BYTE_ARRAY",
"repetition_type": "OPTIONAL",
"name": "registrable",
"converted_type": "UTF8",
"logicalType": {
"STRING": {}
}
},
{
"type": "BYTE_ARRAY",
"repetition_type": "OPTIONAL",
"name": "etld",
"converted_type": "UTF8",
"logicalType": {
"STRING": {}
}
},
{
"type": "BOOLEAN",
"repetition_type": "OPTIONAL",
"name": "is_market"
},
{
"type": "BOOLEAN",
"repetition_type": "OPTIONAL",
"name": "is_expiring"
},
{
"type": "BOOLEAN",
"repetition_type": "OPTIONAL",
"name": "zone"
},
{
"type": "BOOLEAN",
"repetition_type": "OPTIONAL",
"name": "security_trails"
},
{
"repetition_type": "OPTIONAL",
"name": "markets",
"num_children": 1,
"converted_type": "LIST",
"logicalType": {
"LIST": {}
},
"children": [
{
"repetition_type": "REPEATED",
"name": "list",
"num_children": 1,
"children": [
{
"repetition_type": "OPTIONAL",
"name": "element",
"num_children": 6,
"children": [
{
"type": "BYTE_ARRAY",
"repetition_type": "OPTIONAL",
"name": "market",
"converted_type": "UTF8",
"logicalType": {
"STRING": {}
}
},
{
"type": "INT64",
"repetition_type": "OPTIONAL",
"name": "expiring",
"logicalType": {
"TIMESTAMP": {
"isAdjustedToUTC": false,
"unit": {
"MILLIS": {}
}
}
}
},
{
"type": "INT64",
"repetition_type": "OPTIONAL",
"name": "price",
"converted_type": "UINT_64",
"logicalType": {
"INTEGER": {
"bitWidth": 64,
"isSigned": false
}
}
},
{
"type": "INT64",
"repetition_type": "OPTIONAL",
"name": "min_price",
"converted_type": "UINT_64",
"logicalType": {
"INTEGER": {
"bitWidth": 64,
"isSigned": false
}
}
},
{
"type": "INT64",
"repetition_type": "OPTIONAL",
"name": "valuation",
"converted_type": "UINT_64",
"logicalType": {
"INTEGER": {
"bitWidth": 64,
"isSigned": false
}
}
},
{
"type": "BYTE_ARRAY",
"repetition_type": "OPTIONAL",
"name": "type",
"converted_type": "UTF8",
"logicalType": {
"STRING": {}
}
}
]
}
]
}
]
},
{
"type": "BYTE_ARRAY",
"repetition_type": "OPTIONAL",
"name": "domain",
"converted_type": "UTF8",
"logicalType": {
"STRING": {}
}
},
{
"repetition_type": "OPTIONAL",
"name": "embedding",
"num_children": 1,
"converted_type": "LIST",
"logicalType": {
"LIST": {}
},
"children": [
{
"repetition_type": "REPEATED",
"name": "list",
"num_children": 1,
"children": [
{
"type": "FLOAT",
"repetition_type": "OPTIONAL",
"name": "element"
}
]
}
]
}
]
}
```
</p>
</details>
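
For anyone cross-checking the Dremel levels against this schema, here is a minimal sketch of how the maximum definition and repetition levels per leaf column follow from the repetition types. It assumes the usual Parquet rule (each OPTIONAL or REPEATED ancestor adds one definition level, each REPEATED ancestor adds one repetition level). The `Node` type and the helper functions are hypothetical and written only for illustration; they are not part of the arrow-rs or parquet API, and only three of the columns are modelled.

```rust
// Hypothetical, simplified model of a Parquet schema node (not the parquet crate API).
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

struct Node {
    name: &'static str,
    repetition: Repetition,
    children: Vec<Node>,
}

fn node(name: &'static str, repetition: Repetition, children: Vec<Node>) -> Node {
    Node { name, repetition, children }
}

/// Walks the tree and records (dotted path, max definition level, max repetition level)
/// for every leaf column.
fn leaf_levels(n: &Node, path: &str, def: u16, rep: u16, out: &mut Vec<(String, u16, u16)>) {
    // OPTIONAL adds a definition level; REPEATED adds a definition and a repetition level.
    let (def, rep) = match n.repetition {
        Repetition::Required => (def, rep),
        Repetition::Optional => (def + 1, rep),
        Repetition::Repeated => (def + 1, rep + 1),
    };
    let path = if path.is_empty() {
        n.name.to_string()
    } else {
        format!("{}.{}", path, n.name)
    };
    if n.children.is_empty() {
        out.push((path, def, rep));
    } else {
        for child in &n.children {
            leaf_levels(child, &path, def, rep, out);
        }
    }
}

/// Builds a subset of the schema above (children of the root) and returns the leaf levels.
fn schema_levels() -> Vec<(String, u16, u16)> {
    use Repetition::*;
    let fields = vec![
        node("domain", Optional, vec![]),
        node(
            "markets",
            Optional,
            vec![node(
                "list",
                Repeated,
                vec![node("element", Optional, vec![node("price", Optional, vec![])])],
            )],
        ),
        node(
            "embedding",
            Optional,
            vec![node("list", Repeated, vec![node("element", Optional, vec![])])],
        ),
    ];
    let mut out = Vec::new();
    for field in &fields {
        leaf_levels(field, "", 0, 0, &mut out);
    }
    out
}
```

Under that rule, `schema_levels()` gives `domain` a max definition level of 1 and repetition level of 0, `markets.list.element.price` 4 and 1, and `embedding.list.element` 3 and 1. Comparing these against the levels reported in the file's column metadata might help narrow down where the encoding goes wrong.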