Hello Apache Arrow Team,
I am looking at ways my company can create an SDK that can share Apache Arrow
data while preserving table pivots. I looked at how Pandas and Perspective
do it, and it seems that:
For row_pivots:
- Pandas just sorts the data into a flat Arrow structure.
- Perspective actually generates a rowPath for each row.
Does Pandas generate a row_path per row that I can reference?
For column_pivots:
Pandas and Perspective both seem to create new arrays whose names denote the
column_path, e.g. a table like this:
Share price per month
Ticker | January     | February    |
       | start | end | start | end |
Goog   | 200   | 244 | 246   | 260 |
F      | 35.   | 35  | 35.   | 50. |
would be represented like:
| ticker | (January, start) | (January, end) | (February, start) | (February, end) |
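
To make the flattening concrete, here is a minimal plain-Python sketch of what
I mean. It is only an illustration: flatten_columns and the nested input dict
are hypothetical names of my own, not Pandas, Perspective, or Arrow API, and I
have not verified that either library builds its columns exactly this way.

```python
# Hypothetical illustration only; not Pandas, Perspective, or Arrow API.
# The pivoted table from above, written as ticker -> month -> field -> price.
pivoted = {
    "Goog": {"January": {"start": 200, "end": 244},
             "February": {"start": 246, "end": 260}},
    "F": {"January": {"start": 35, "end": 35},
          "February": {"start": 35, "end": 50}},
}


def flatten_columns(pivoted):
    """Build flat columns whose names are (month, field) path tuples."""
    columns = {"ticker": []}
    for ticker, months in pivoted.items():
        columns["ticker"].append(ticker)
        for month, fields in months.items():
            for field, value in fields.items():
                # The hierarchy survives only inside the column name.
                columns.setdefault((month, field), []).append(value)
    return columns


flat = flatten_columns(pivoted)
for name, values in flat.items():
    print(name, values)
```

In this layout a consumer has to parse the column names to recover which
columns belong under January and which under February.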
Why don't Pandas, and Perspective for that matter, use Structs to denote that
the start-of-month share price is a child of the column January, instead of
hard-coding that information in the name of the column?
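
As a rough picture of the Struct-style alternative I have in mind, here is a
plain-Python sketch that regroups the flat columns under their parent column.
Again, nest_columns is a hypothetical helper of my own and the nested dict only
stands in for a Struct column with child fields; it is not actual Arrow or
Pandas API.

```python
# Hypothetical illustration only; nested dicts stand in for Struct columns.
flat = {
    "ticker": ["Goog", "F"],
    ("January", "start"): [200, 35],
    ("January", "end"): [244, 35],
    ("February", "start"): [246, 35],
    ("February", "end"): [260, 50],
}


def nest_columns(flat):
    """Group (parent, child) columns under one parent entry per month."""
    nested = {}
    for name, values in flat.items():
        if isinstance(name, tuple):
            parent, child = name
            # The child column lives under its parent, not in a composite name.
            nested.setdefault(parent, {})[child] = values
        else:
            nested[name] = values
    return nested


nested = nest_columns(flat)
for name, value in nested.items():
    print(name, value)
```

Here the parent/child relationship is carried by the structure itself, so no
column name needs to be parsed.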
Is that practice documented anywhere, so that if we were to create an SDK for
internal use, its output could easily be fed into Pandas?