How Pandas/Perspective represent table pivots in arrow

Michael Lavina Wed, 21 Jul 2021 10:09:36 -0700

Hello Apache Arrow Team,

I am looking at ways my company can create an SDK that can share Apache Arrow 
data while preserving table pivots. I looked at how Pandas and Perspective 
do it, and this is what it seems like:


For row_pivots

Pandas just sorts the data into a flat Arrow structure.

Perspective actually generates a rowPath for each row.

Does Pandas generate a row_path per row that I can reference?

For column_pivots

Pandas and Perspective both seem to create new arrays whose names encode the 
column_path, i.e.

Share price per month

Ticker | January     | February    |
       | start | end | start | end |
Goog   | 200   | 244 | 246   | 260 |
F      | 35    | 35  | 35    | 50  |


Would be represented like

| ticker | (January, start) | (January, end) | (February, start) | (February, end) |

Why don't Pandas and Perspective use Structs to denote that the start-of-month 
share price is a child of the January column, instead of hard-coding that 
information in the name of the column?

Is that practice documented anywhere, so that if we were to create an SDK for 
internal use, its output could be easily fed into Pandas?
