[
https://issues.apache.org/jira/browse/ARROW-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725448#comment-16725448
]
Antoine Pitrou commented on ARROW-4083:
---------------------------------------
Doesn't it add constraints to every downstream consumer of chunked arrays?
Intuitively, that sounds like a rather bad idea. Perhaps we need some kind of
"logical chunked array" where individual chunks are allowed to have different
but compatible types (how compatible remains to be defined, e.g. do we allow
int8 then int32?).
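
To make the "compatible types" question concrete, here is a minimal sketch in plain Python (not the Arrow API). It assumes one possible rule: chunks of different signed integer widths are compatible, and the logical type is the widest one present. The name `common_type` and the width table are hypothetical, and the actual compatibility rule remains undefined in this discussion.

```python
# Hypothetical widening table: only signed integers are considered here.
# Which types should count as "compatible" is an open question in this issue.
_INT_WIDTHS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def common_type(chunk_types):
    """Return the widest signed integer type among the chunk types.

    Raises TypeError for any type this sketch has no rule for.
    Returns None if there are no chunks.
    """
    widest = None
    for t in chunk_types:
        if t not in _INT_WIDTHS:
            raise TypeError(f"no compatibility rule for {t!r}")
        if widest is None or _INT_WIDTHS[t] > _INT_WIDTHS[widest]:
            widest = t
    return widest
```

Under this assumed rule, an int8 chunk followed by an int32 chunk would be presented to consumers as a logical int32 array.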
> [C++] Allowing ChunkedArrays to contain a mix of DictionaryArray and dense
> Array (of the dictionary type)
> ---------------------------------------------------------------------------------------------------------
>
> Key: ARROW-4083
> URL: https://issues.apache.org/jira/browse/ARROW-4083
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.13.0
>
>
> In some applications we may receive a stream of dictionary-encoded data
> followed by non-dictionary-encoded data. For example, this happens in
> Parquet files when the dictionary reaches a certain configurable size
> threshold.
> We should think about how we can model this in our in-memory data structures,
> and how it can flow through to relevant computational components (e.g.
> certain data flow observers, such as an Aggregation, might need to be able
> to process either a dense or dictionary-encoded version of a particular array
> in the same stream).
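
As an illustration of the scenario described in the issue, the following plain-Python sketch (not the Arrow C++ API) models a stream whose chunks are either dictionary-encoded or dense, plus an aggregation that accepts both. The class names `DenseChunk` and `DictionaryChunk` and the function `count_values` are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DenseChunk:
    # Values stored directly; None marks a null.
    values: Sequence[Optional[str]]


@dataclass
class DictionaryChunk:
    # Distinct values, plus per-row indices into them; None marks a null.
    dictionary: Sequence[str]
    indices: Sequence[Optional[int]]


def iter_values(chunk):
    """Yield logical values from either chunk representation."""
    if isinstance(chunk, DictionaryChunk):
        for i in chunk.indices:
            yield None if i is None else chunk.dictionary[i]
    else:
        yield from chunk.values


def count_values(chunks):
    """Count non-null values across a mixed stream of chunks."""
    counts = {}
    for chunk in chunks:
        for v in iter_values(chunk):
            if v is not None:
                counts[v] = counts.get(v, 0) + 1
    return counts
```

In this sketch the aggregation decodes dictionary chunks row by row. A real implementation could instead aggregate over the indices and map back through the dictionary once per chunk, which is part of what makes the modeling question non-trivial.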