David Lee created ARROW-4032:
--------------------------------

             Summary: [Python] New pyarrow.Table.from_pydict() function
                 Key: ARROW-4032
                 URL: https://issues.apache.org/jira/browse/ARROW-4032
             Project: Apache Arrow
          Issue Type: Task
          Components: Python
            Reporter: David Lee

Here's a proposal to create a pyarrow.Table.from_pydict() function. Right now only pyarrow.Table.from_pandas() exists, and there are inherent problems using Pandas with NULL support for Int(s) and Boolean(s):

[http://pandas.pydata.org/pandas-docs/version/0.23.4/gotchas.html]

{{NaN}}, Integer {{NA}} values and {{NA}} type promotions

Sample Python code showing how this would work:

{code:java}
import pyarrow as pa
from datetime import datetime

# Rows are plain dicts; the set of keys may differ from row to row.
pylist = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "city": "San Francisco"},
    {"name": "Pam", "age": 7, "birthday": datetime.now()}
]

def from_pydict(pylist, columns):
    arrow_columns = list()
    for column in columns:
        # A key missing from a row becomes None, which Arrow stores as a null.
        arrow_columns.append(pa.array([v[column] if column in v else None for v in pylist]))
    arrow_table = pa.Table.from_arrays(arrow_columns, columns)
    return arrow_table

# 'dummy' appears in no row, so that column is entirely null.
test = from_pydict(pylist, ['name', 'age', 'city', 'birthday', 'dummy'])
{code}
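
The sample above expects the caller to pass the column list explicitly. As a minimal sketch of one possible refinement, assuming it would be useful to infer the column names from the records when none are given, the pivot step could look like the following. `rows_to_columns` is a hypothetical helper used only for illustration and is not part of pyarrow; it uses the standard library only and returns the per-column lists that would then be handed to pa.array().

```python
def rows_to_columns(pylist, columns=None):
    """Pivot a list of row dicts into a dict of column lists.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    if columns is None:
        # Infer column names in first-seen order across all rows.
        columns = []
        seen = set()
        for row in pylist:
            for key in row:
                if key not in seen:
                    seen.add(key)
                    columns.append(key)
    # A key missing from a row becomes None, which pa.array() would store as a null.
    return {column: [row.get(column) for row in pylist] for column in columns}


rows = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "city": "San Francisco"},
]
pivoted = rows_to_columns(rows)
```

Inferring the names keeps the call short for ad hoc data, while passing `columns` explicitly still allows fixing the order or requesting a column that no row contains.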