This is what I want to extend to multiple tables:
https://issues.apache.org/jira/browse/ARROW-10045?focusedCommentId=17207790&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17207790
I would need to come up with a custom binary wrapper for multiple serialized
pyarrow tables, and since Arrow supports hierarchical data to some extent, I
was looking for built-in support for nested tables.
I understand this might not be available at the API level.
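
For illustration, such a wrapper could be a simple length-prefixed container.
The sketch below is only an assumption about what it might look like, not an
Arrow API: the layout and the names pack_frames and unpack_frames are
hypothetical, and each payload is a placeholder for the bytes of one
serialized table.

```python
import struct

# Hypothetical container layout (not part of Arrow), little-endian integers:
#   [uint32 entry_count]
#   per entry: [uint32 name_len][name as UTF-8][uint64 payload_len][payload]


def pack_frames(frames):
    """Concatenate named byte payloads into one bytes object."""
    parts = [struct.pack("<I", len(frames))]
    for name, payload in frames.items():
        key = name.encode("utf-8")
        parts.append(struct.pack("<I", len(key)))
        parts.append(key)
        parts.append(struct.pack("<Q", len(payload)))
        parts.append(payload)
    return b"".join(parts)


def unpack_frames(blob):
    """Split a blob produced by pack_frames back into a name -> bytes dict."""
    view = memoryview(blob)
    (count,) = struct.unpack_from("<I", view, 0)
    offset = 4
    frames = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("<I", view, offset)
        offset += 4
        name = bytes(view[offset:offset + key_len]).decode("utf-8")
        offset += key_len
        (payload_len,) = struct.unpack_from("<Q", view, offset)
        offset += 8
        frames[name] = bytes(view[offset:offset + payload_len])
        offset += payload_len
    return frames


# Placeholder payloads; in practice each value would be the serialized bytes
# of the table for one Category group.
groups = {"A": b"table-bytes-for-A", "B": b"table-bytes-for-B"}
blob = pack_frames(groups)
restored = unpack_frames(blob)
```

The client would need a matching reader that slices the blob at the same
offsets and hands each payload to its Arrow reader, which is exactly the
extra code a built-in nested table type would avoid.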

Best regards,
Adam Lippai

On Thu, Oct 29, 2020 at 10:14 PM Adam Lippai <a...@rigo.sk> wrote:

> If I have a DataFrame with columns Date, Category, Value and group by
> Category, I'll have multiple DataFrames with Date, Value columns.
> The result of the groupby is a DataFrameGroupBy, which can't be serialized.
> This is why I tried to assemble a nested DataFrame instead (like the one in
> the SO link previously), but that doesn't work either.
>
> As Apache Arrow JS doesn't support groupby (processing the original DF on
> the client-side), I was thinking of pushing the groupby operation to the
> server side (pyarrow), doing the groupby in pandas before serializing and
> sending it to the client.
> I was wondering whether this (nested Arrow tables) is a supported feature
> or not (e.g. by chaining table.toArray() calls or a similar solution).
> Currently I process it in pure JS; it's not that ugly, but not really
> idiomatic either. The lack of a Categorical data type and processing it
> row by row certainly has its performance cost.
>
> Best regards,
> Adam Lippai
>
> On Thu, Oct 29, 2020 at 9:39 PM Joris Van den Bossche <
> jorisvandenboss...@gmail.com> wrote:
>
>> Can you give a more specific example of what kind of hierarchical data
>> you want to serialize? (eg the output of a groupby operation in pandas
>> typically is still a dataframe that can be converted to pyarrow and
>> serialized).
>>
>> In general, for hierarchical data we have the nested data types (eg
>> struct type when you nest "multiple columns in a single column").
>>
>> Joris
>>
>>
>> On Thu, 29 Oct 2020 at 15:29, Adam Lippai <a...@rigo.sk> wrote:
>> >
>> > Hi,
>> >
>> > is there a way to serialize (IPC) hierarchical tabular data (e.g. the
>> > output of pandas groupby) in Python?
>> > I've tried to call pa.ipc.serialize_pandas() on this example, but it
>> > throws an error:
>> > https://stackoverflow.com/questions/51505504/pandas-nesting-dataframes
>> >
>> > Best regards,
>> > Adam Lippai
>>
>
