[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17657182#comment-17657182 ]
Rok Mihevc commented on ARROW-81:
---------------------------------
This issue has been migrated to [issue
#15500|https://github.com/apache/arrow/issues/15500] on GitHub. Please see the
[migration documentation|https://github.com/apache/arrow/issues/14542] for
further details.
> [Format] Add a Category logical type (distinct from dictionary-encoding)
> ------------------------------------------------------------------------
>
> Key: ARROW-81
> URL: https://issues.apache.org/jira/browse/ARROW-81
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Fix For: 0.2.0
>
>
> A Category (or "factor") is a dictionary-encoded array whose dictionary has
> semantic meaning. The data consists of:
> - An array of integer "codes"
> - A child array of some other type, known as the "categories" or "levels" of
> the array
> Typically there is an "ordered" boolean flag indicating whether
> the order of the categories is meaningful.
> Category/factor types are used in a number of common statistical analyses.
> See, for example,
> http://www.voteview.com/R_Ordered_Logistic_or_Probit_Regression.htm. It is a
> basic requirement for Python and R, at least, as Arrow C++ consumers, to have
> this type. Separately, we should consider what is necessary to be able to
> transmit category data in IPCs -- possibly an expansion of the Arrow format.
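
For readers unfamiliar with the layout described in the quoted issue, here is a minimal sketch in plain Python of the three pieces it names: integer codes, a child array of categories, and an "ordered" flag. The class name CategoryArray, its methods, and the use of None for null entries are assumptions made for illustration only; this is not the Arrow C++ API or the eventual Arrow format.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CategoryArray:
    """Hypothetical illustration of a category/factor array (not an Arrow class)."""

    codes: List[Optional[int]]  # integer index into `categories`; None marks a null (assumption)
    categories: List[str]       # the "categories" or "levels" child array
    ordered: bool = False       # whether the order of the categories is meaningful

    def __post_init__(self) -> None:
        # Every non-null code must point at an existing category.
        for code in self.codes:
            if code is not None and not 0 <= code < len(self.categories):
                raise ValueError(f"code {code} is out of range")

    @classmethod
    def from_values(
        cls,
        values: Sequence[Optional[str]],
        categories: Sequence[str],
        ordered: bool = False,
    ) -> "CategoryArray":
        # Map each category to its position, then encode the values as codes.
        index = {category: i for i, category in enumerate(categories)}
        codes = [None if value is None else index[value] for value in values]
        return cls(codes, list(categories), ordered)

    def decode(self) -> List[Optional[str]]:
        # Look each code up in the categories to recover the original values.
        return [None if code is None else self.categories[code] for code in self.codes]


arr = CategoryArray.from_values(
    ["low", "high", None, "medium", "low"],
    categories=["low", "medium", "high"],
    ordered=True,
)
```

Unlike plain dictionary encoding, where the dictionary is only a storage detail, the categories and the ordered flag here are part of the data's meaning, so they travel with the array.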