[ 
https://issues.apache.org/jira/browse/ARROW-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated ARROW-255:
--------------------------------
    Description: 
format/Message.fbs mentions DictionaryBatches with an id but does not specify 
where those ids are referenced.

We should add a {{dictionary: long}} in Field that references the dictionary id:

Field: 
https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L86

Dictionary id: 
https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L165
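
A minimal sketch of how the proposed reference could be resolved. The class 
and function names here are hypothetical illustrations in Python, not the 
Flatbuffers schema or any Arrow API; the only idea taken from the proposal is 
that Field carries a dictionary id that matches the id of a DictionaryBatch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DictionaryBatch:
    # Hypothetical stand-in for a DictionaryBatch message: an id plus its values.
    id: int
    values: list


@dataclass
class Field:
    # Hypothetical stand-in for Field; `dictionary` mirrors the proposed
    # {{dictionary: long}} member and is None when the field is not encoded.
    name: str
    type_name: str
    dictionary: Optional[int] = None


def resolve_dictionary(field: Field, batches: List[DictionaryBatch]):
    """Return the DictionaryBatch a field refers to, or None if not encoded."""
    if field.dictionary is None:
        return None
    by_id = {batch.id: batch for batch in batches}
    # Raises KeyError if the referenced dictionary was never sent.
    return by_id[field.dictionary]
```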

We need a spec in format/Layout.md that describes the dictionary layout.
When a field is dictionary encoded, its value vector is an array of signed 
int32 indices (for consistency with variable length collection offsets).
The dictionary vector is a Vector of the value type, in which each value is 
indexed by its id in the dictionary.
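
The layout described above can be sketched as follows. This is an 
illustration under assumptions, not the spec: the function names are 
hypothetical, null handling is omitted, and the standard library array 
type code "i" (a signed C int, 32 bits on common platforms) stands in for 
the signed int32 index vector.

```python
from array import array


def dictionary_encode(values):
    """Split values into (int32 indices, dictionary vector)."""
    dictionary = []       # distinct values, in order of first appearance
    positions = {}        # value -> its id (position) in the dictionary
    indices = array("i")  # signed C int; assumed to be 32 bits here
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return indices, dictionary


def dictionary_decode(indices, dictionary):
    """Rebuild the original values by looking each index up in the dictionary."""
    return [dictionary[i] for i in indices]
```

For example, encoding ["a", "b", "a", "c", "b"] gives the indices 
[0, 1, 0, 2, 1] and the dictionary ["a", "b", "c"], and decoding returns the 
original list.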

  was:
format/Messages.fbs mentions DictionaryBatches with an id but does not specify 
where they are referenced.

We should add a {{dictionary: long}} in Field that references the dictionary id:

Field: 
https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L86

Dictionary id: 
https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L165

We need a spec in format/Layout.md that describes the dictionary layout.
When dictionary encoded the value vector is an array of signed int32 (for 
consistency with ).
The dictionary vector is a Vector of the type of the value. indexed by their id 
in the dictionary.


> Finalize Dictionary representation
> ----------------------------------
>
>                 Key: ARROW-255
>                 URL: https://issues.apache.org/jira/browse/ARROW-255
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Julien Le Dem
>
> format/Message.fbs mentions DictionaryBatches with an id but does not 
> specify where those ids are referenced.
> We should add a {{dictionary: long}} in Field that references the dictionary 
> id:
> Field: 
> https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L86
> Dictionary id: 
> https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L165
> We need a spec in format/Layout.md that describes the dictionary layout.
> When a field is dictionary encoded, its value vector is an array of signed 
> int32 indices (for consistency with variable length collection offsets).
> The dictionary vector is a Vector of the value type, in which each value is 
> indexed by its id in the dictionary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
