westonpace commented on a change in pull request #63:
URL: https://github.com/apache/arrow-cookbook/pull/63#discussion_r701570334
##########
File path: python/source/create.rst
##########
@@ -7,6 +7,68 @@ Tensors and all other Arrow entities.
 
 .. contents::
 
+Creating Arrays
+===============
+
+Arrow keeps data in continuous arrays optimised for memory footprint
+and SIMD analyses. In Python it's possible to build :class:`pyarrow.Array`
+starting from Python ``lists`` (or sequence types in general),
+``numpy`` arrays and ``pandas`` Series.
+
+.. testcode::
+
+    import pyarrow as pa
+
+    array = pa.array([1, 2, 3, 4, 5])
+
+.. testcode::
+
+    print(array)
+
+.. testoutput::
+
+    [
+      1,
+      2,
+      3,
+      4,
+      5
+    ]
+
+Arrays can also provide a ``mask`` to specify which values should

Review comment:
I don't know if it's worth mentioning but the `mask` must be a numpy array (e.g. typical python list won't work)

##########
File path: python/source/create.rst
##########
@@ -7,6 +7,68 @@ Tensors and all other Arrow entities.
 
 .. contents::
 
+Creating Arrays
+===============
+
+Arrow keeps data in continuous arrays optimised for memory footprint
+and SIMD analyses. In Python it's possible to build :class:`pyarrow.Array`
+starting from Python ``lists`` (or sequence types in general),
+``numpy`` arrays and ``pandas`` Series.
+
+.. testcode::
+
+    import pyarrow as pa
+
+    array = pa.array([1, 2, 3, 4, 5])
+
+.. testcode::
+
+    print(array)
+
+.. testoutput::
+
+    [
+      1,
+      2,
+      3,
+      4,
+      5
+    ]
+
+Arrays can also provide a ``mask`` to specify which values should
+be considered nulls
+
+.. testcode::
+
+    import numpy as np
+
+    array = pa.array([1, 2, 3, 4, 5],
+                     mask=np.array([True, False, True, False, True]))
+
+    print(array)
+
+.. testoutput::
+
+    [
+      null,
+      2,
+      null,
+      4,
+      null
+    ]
+
+When building arrays from ``numpy`` or ``pandas``, Arrow will leverage
+optimized code paths that rely on the internal in-memory representation
+of the data by ``numpy`` and ``pandas``
+
+.. testcode::
+
+    import numpy as np
+    import pandas

Review comment:
If you are going to do `import numpy as np` and `import pyarrow as pa` you should probably do `import pandas as pd` for consistency.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org