If I have a DataFrame with columns Date, Category, Value and group by Category, I get multiple DataFrames, each with Date and Value columns. The result of the groupby is a DataFrameGroupBy object, which can't be serialized. This is why I tried to assemble a nested DataFrame instead (like the one in the SO link I sent previously), but that doesn't work either.
As Apache Arrow JS doesn't support groupby (which would let me process the original DF on the client side), I was thinking of pushing the groupby operation to the server side (pyarrow): doing the groupby in pandas before serializing and sending the result to the client. I was wondering whether this (nested Arrow tables) is a supported feature or not (by calling chained table.toArray() or a similar solution).

Currently I process it in pure JS. It's not that ugly, but not really idiomatic either. The lack of a Categorical data type and processing the data row by row certainly has its performance price.

Best regards,
Adam Lippai

On Thu, Oct 29, 2020 at 9:39 PM Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:

> Can you give a more specific example of what kind of hierarchical data
> you want to serialize? (eg the output of a groupby operation in pandas
> typically is still a dataframe that can be converted to pyarrow and
> serialized).
>
> In general, for hierarchical data we have the nested data types (eg
> struct type when you nest "multiple columns in a single column").
>
> Joris
>
> On Thu, 29 Oct 2020 at 15:29, Adam Lippai <a...@rigo.sk> wrote:
> >
> > Hi,
> >
> > is there a way to serialize (IPC) hierarchical tabular data (eg. output of
> > pandas groupby) in python?
> > I've tried to call pa.ipc.serialize_pandas() on this example, but it throws
> > error:
> > https://stackoverflow.com/questions/51505504/pandas-nesting-dataframes
> >
> > Best regards,
> > Adam Lippai