westonpace commented on a change in pull request #87:
URL: https://github.com/apache/arrow-cookbook/pull/87#discussion_r737853466
##########
File path: python/source/io.rst
##########

@@ -577,4 +577,121 @@ The content of the file can be read back to a :class:`pyarrow.Table` using

 .. testoutput::

-    {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
\ No newline at end of file
+    {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
+
+Writing Compressed Data
+=======================
+
+Arrow provides support for writing files in compressed formats,
+both for formats that provide compression natively, like Parquet or Feather,
+and for formats that don't support it out of the box, like CSV.
+
+Given a table:
+
+.. testcode::
+
+    table = pa.table([
+        pa.array([1, 2, 3, 4, 5])
+    ], names=["numbers"])
+
+Writing it compressed to Parquet or Feather requires passing the
+``compression`` argument to the :func:`pyarrow.feather.write_feather` and
+:func:`pyarrow.parquet.write_table` functions:
+
+.. testcode::
+
+    pa.feather.write_feather(table, "compressed.feather",
+                             compression="lz4")
+    pa.parquet.write_table(table, "compressed.parquet",
+                           compression="lz4")
+
+You can refer to the two functions' documentation for the complete
+list of supported compression formats.
+
+.. note::
+
+    Arrow actually uses compression by default when writing
+    Parquet or Feather files. Feather is compressed using ``lz4``

Review comment:
   I was surprised to hear we compress by default, actually. But it seems that is because I always use `pyarrow.ipc.RecordBatchFileWriter` and `pyarrow.ipc.RecordBatchStreamWriter`, which do not.
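   To illustrate that last point (not part of the diff): with the IPC writers you have to opt into compression explicitly through `pyarrow.ipc.IpcWriteOptions`, whose `compression` parameter accepts `"lz4"` or `"zstd"`. A minimal sketch, where the output file name and the choice of `zstd` are arbitrary:

   ```python
   import pyarrow as pa

   table = pa.table({"numbers": [1, 2, 3, 4, 5]})

   # RecordBatchFileWriter / RecordBatchStreamWriter leave buffers
   # uncompressed by default; pass IpcWriteOptions to opt in.
   options = pa.ipc.IpcWriteOptions(compression="zstd")
   with pa.OSFile("compressed.arrow", "wb") as sink:
       with pa.ipc.new_file(sink, table.schema, options=options) as writer:
           writer.write_table(table)
   ```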
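   Relatedly, since the new section mentions CSV but the visible hunk is truncated before any CSV example: one way to layer compression onto a format without native support is to wrap the sink in `pa.CompressedOutputStream`. A sketch, assuming gzip and an arbitrary file name (the PR may already cover this further down):

   ```python
   import pyarrow as pa
   from pyarrow import csv

   table = pa.table({"numbers": [1, 2, 3, 4, 5]})

   # CSV has no built-in compression, so compress the byte stream instead:
   # CompressedOutputStream gzips everything written through it.
   with pa.CompressedOutputStream("compressed.csv.gz", "gzip") as out:
       csv.write_csv(table, out)
   ```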