amol- commented on a change in pull request #1:
URL: https://github.com/apache/arrow-cookbook/pull/1#discussion_r676499072



##########
File path: python/source/io.rst
##########
@@ -0,0 +1,423 @@
+========================
+Reading and Writing Data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Write a Parquet file
+====================
+
+.. testsetup::
+
+    import numpy as np
+    import pyarrow as pa
+
+    arr = pa.array(np.arange(100))
+
+Given an array with all numbers from 0 to 100
+
+.. testcode::
+
+    print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+    0 .. 99
+
+To write it to a Parquet file, as Parquet is a columnar format,
+we must create a :class:`pyarrow.Table` out of it,
+so that we get a table of a single column which can then be
+written to a Parquet file. 
+
+.. testcode::
+
+    table = pa.Table.from_arrays([arr], names=["col1"])
+
+Once we have a table, it can be written to a Parquet File 
+using the functions provided by the ``pyarrow.parquet`` module
+
+.. testcode::
+
+    import pyarrow.parquet as pq
+
+    pq.write_table(table, "example.parquet", compression=None)
+
+Reading a Parquet file
+======================
+
+Given a Parquet file, it can be read back to a :class:`pyarrow.Table`
+by using :func:`pyarrow.parquet.read_table` function
+
+.. testcode::
+
+    import pyarrow.parquet as pq
+
+    table = pq.read_table("example.parquet")
+
+The resulting table will contain the same columns that existed in
+the parquet file as :class:`ChunkedArray`
+
+.. testcode::
+
+    print(table)
+
+    col1 = table["col1"]
+    print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int64
+    ChunkedArray = 0 .. 99

Review comment:
       It was actually one of the original design goals of the cookbook, while 
it's true that it might require more maintenance cost when the repr output 
changes, it's a minor effort (you just need to copy the new output into the old 
`.. testoutput`) but that will guarantee us that if there is _any_ change in 
arrow in terms of behaviours the recipes will always reflect exactly what arrow 
is going to do/answer.
   We plan to setup a nightly job to verify the cookbook against the latest 
arrow master so that we catch any change and can update it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to