Copilot commented on code in PR #50097: URL: https://github.com/apache/arrow/pull/50097#discussion_r3355060442
########## docs/source/format/Columnar.rst: ########## @@ -864,6 +864,12 @@ A union is defined by an ordered sequence of types; each slot in the union can have a value chosen from these types. The types are named like a struct's fields, and the names are part of the type metadata. +Each child type in a union has a type id (an 8-bit signed integer) +that identifies it. These type ids are not necessarily the same as the +index of the corresponding child array. For example, a union of two types +might assign type ids 5 and 7 rather than 0 and 1. The mapping from type +ids to child arrays is part of the union type definition. Review Comment: The docs call union type ids “8-bit signed integers”, but the allowed range is actually restricted to non-negative values 0..127 (which is why the text below mentions “more than 128 possible types”). Without stating the allowed range, “signed” may imply negative type ids are valid. ########## docs/source/format/Columnar.rst: ########## @@ -878,10 +884,10 @@ Dense union represents a mixed-type array with 5 bytes of overhead for each value. Its physical layout is as follows: * One child array for each type -* Types buffer: A buffer of 8-bit signed integers. Each type in the - union has a corresponding type id whose values are found in this - buffer. A union with more than 128 possible types can be modeled as - a union of unions. +* Types buffer: A buffer of 8-bit signed integers, indicating the type + id of each slot. Note that these type ids are not necessarily the + same as the child array index (see above). A union with more than 128 + possible types can be modeled as a union of unions. Review Comment: Similarly, the “Types buffer” description should make it explicit that type ids are constrained to 0..127; otherwise the “more than 128 possible types” sentence is confusing given that an int8 has 256 distinct bit patterns. ########## docs/source/format/Columnar.rst: ########## @@ -864,6 +864,12 @@ A union is defined by an ordered sequence of types; each slot in the union can have a value chosen from these types. The types are named like a struct's fields, and the names are part of the type metadata. +Each child type in a union has a type id (an 8-bit signed integer) Review Comment: The PR title/description mentions updating “Layout.rst”, but in this repo `Layout.rst` is just a stub redirecting to `Columnar.rst` (the actual content lives here). Consider updating the PR title/description to match the actual file being edited so it’s easier to find the change later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
