westonpace commented on a change in pull request #87:
URL: https://github.com/apache/arrow-cookbook/pull/87#discussion_r737850103
########## File path: python/source/io.rst ##########

@@ -577,4 +577,122 @@ The content of the file can be read back to a :class:`pyarrow.Table` using

 .. testoutput::

-    {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
\ No newline at end of file
+    {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}

Writing Compressed Data
=======================

Arrow supports writing files in compressed formats,
both for formats that provide compression natively, like Parquet or Feather,
and for formats that don't support compression out of the box, like CSV.

Given a table:

.. testcode::

    table = pa.table([
        pa.array([1, 2, 3, 4, 5])
    ], names=["numbers"])

Writing compressed Parquet or Feather data is driven by the
``compression`` argument to the :func:`pyarrow.feather.write_feather` and
:func:`pyarrow.parquet.write_table` functions:

.. testcode::

    pa.feather.write_feather(table, "compressed.feather",
                             compression="lz4")
    pa.parquet.write_table(table, "compressed.parquet",
                           compression="lz4")

You can refer to each of those functions' documentation for the complete
list of supported compression formats.

.. note::

    Arrow applies compression by default when writing
    Parquet or Feather files: Feather files are compressed with ``lz4``
    and Parquet files with ``snappy``.

For formats that don't support compression natively, like CSV,
it's possible to save compressed data using
:class:`pyarrow.CompressedOutputStream`:

.. testcode::

    with pa.CompressedOutputStream("compressed.csv.gz", "gzip") as out:
        pa.csv.write_csv(table, out)

This requires decompressing the file when reading it back,
which can be done using :class:`pyarrow.CompressedInputStream`
as explained in the next recipe.
Reading Compressed Data
=======================

Arrow supports reading compressed files,
both for formats that provide compression natively, like Parquet or Feather,
and for files in formats that don't support compression natively,
like CSV, but have been compressed by an application.

Reading compressed formats that have native support for compression
doesn't require any special handling. For example, we can read back
the Parquet and Feather files we wrote in the previous recipe
simply invoking :meth:`pyarrow.feather.read_table` and

Review comment:

```suggestion
by simply invoking :meth:`pyarrow.feather.read_table` and
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org