Wes and co.,

First off, great project! I was able to read the docs and get going in under a day, and the APIs are super easy to use. That being said, I'm a tad stuck and, having exhausted my google-fu, am here to ask for assistance.

I want to use pyarrow to write a nested dataset to Parquet. The schema is quite complex, and I'm having difficulty building arrays for the nested data structures. For example, a column in my schema looks like this:

    In [7]: schema
    Out[7]:
    cstruct: struct<field1: double, field2: struct<field1: string>, field3: list<item: int32>, field4: list<struct: struct<field1: int32>>>
      child 0, field1: double
      child 1, field2: struct<field1: string>
          child 0, field1: string
      child 2, field3: list<item: int32>
          child 0, item: int32
      child 3, field4: list<struct: struct<field1: int32>>
          child 0, struct: struct<field1: int32>
              child 0, field1: int32

How would I go about constructing a row with this type? I've been looking at StructArray and ListArray, and I found the following links during my research:

* https://github.com/apache/arrow/issues/1217
* https://stackoverflow.com/questions/45341182/nested-data-in-parquet-with-python
* https://github.com/apache/arrow/commit/5c704bce42e3fa71ea4586368962d41173b3e17b

I've managed to wrangle everything but ListArrays, e.g.:

    field1_data = pa.array([1.1], type=pa.float64())
    field2_data = pa.StructArray.from_arrays(['field1'], [pa.array(['foo'], type=pa.string())])
    field3_data = pa.array([[1], [2]], type=pa.list_(pa.int32()))

I'm having trouble with field4:

    field4_struct = pa.StructArray.from_arrays(['field1'], [pa.array([1], type=pa.int32())])
    field4_data = pa.ListArray.from_arrays(??, field4_struct)

In particular, what does the offsets argument mean, and how do I populate it?

Thanks in advance for all the help.

-- Ishaan
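
On the offsets question, here is a minimal sketch of how the argument is generally understood to work in Arrow's list layout: the child values are stored flattened, and row i of the list column is the slice values[offsets[i]:offsets[i + 1]], so there is one more offset than there are rows. The helper name flatten_with_offsets and the sample rows are hypothetical, made up for illustration. The pyarrow part assumes a recent release in which StructArray.from_arrays takes the arrays first and the names as a keyword, which differs from the argument order used in the snippets above.

```python
from itertools import accumulate


def flatten_with_offsets(rows):
    """Flatten a list of lists into (offsets, values).

    Row i is values[offsets[i]:offsets[i + 1]], so len(offsets) == len(rows) + 1.
    (Hypothetical helper, not part of pyarrow.)
    """
    offsets = [0] + list(accumulate(len(row) for row in rows))
    values = [item for row in rows for item in row]
    return offsets, values


# Two rows for field4: the first holds two structs, the second holds one.
rows = [[1, 2], [3]]
offsets, values = flatten_with_offsets(rows)
# offsets == [0, 2, 3], values == [1, 2, 3]

try:
    import pyarrow as pa
except ImportError:  # the offset arithmetic above does not need pyarrow
    pa = None

if pa is not None:
    # One struct per list element, flattened across all rows.
    field4_struct = pa.StructArray.from_arrays(
        [pa.array(values, type=pa.int32())], names=["field1"]
    )
    # The offsets mark where each row's elements begin and end.
    field4_data = pa.ListArray.from_arrays(
        pa.array(offsets, type=pa.int32()), field4_struct
    )
    print(field4_data.to_pylist())
```

With a single row containing a single struct, as in the field4 snippet above, the offsets under this reading would be [0, 1].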