Julien Le Dem created ARROW-255:
-----------------------------------

             Summary: Finalize Dictionary representation
                 Key: ARROW-255
                 URL: https://issues.apache.org/jira/browse/ARROW-255
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Format
            Reporter: Julien Le Dem

format/Message.fbs mentions DictionaryBatches with an id but does not specify where they are referenced. We should add a {{dictionary: long}} in Field that references the dictionary id:

Field: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L86
Dictionary id: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L165

We need a spec in format/Layout.md that describes the dictionary layout. When a field is dictionary encoded, the value vector is an array of unsigned int32. The dictionary vector is a Vector of the type of the value, whose entries are indexed by their id in the dictionary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
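
To make the proposed layout concrete, here is a minimal Python sketch of the encoding described in this issue: a value vector of unsigned integer ids plus a dictionary vector holding the distinct values. The function names, the sample column, and the dictionary id 7 are all hypothetical, chosen only for illustration; this is not Arrow's implementation and the final spec may differ.

```python
from array import array


def dictionary_encode(values):
    """Split a column into (dictionary vector, value vector of ids)."""
    dictionary = []       # distinct values, in first-seen order
    ids = {}              # value -> its id (position in the dictionary)
    indices = array("I")  # unsigned ints (4 bytes on common platforms)
    for v in values:
        if v not in ids:
            ids[v] = len(dictionary)
            dictionary.append(v)
        indices.append(ids[v])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Look each id up in the dictionary to recover the original values."""
    return [dictionary[i] for i in indices]


# Hypothetical wiring: the field carries a dictionary id that points at
# the separately stored dictionary, as the issue suggests for Field.
column = ["a", "b", "a", "c", "b"]
dictionary, indices = dictionary_encode(column)
dictionaries = {7: dictionary}               # assumed id, illustration only
field = {"name": "letters", "dictionary": 7}
decoded = dictionary_decode(dictionaries[field["dictionary"]], indices)
```

The sketch keeps the dictionary separate from the field and links them only through the id, which is the reference the issue says is currently missing.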