[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12153: ARROW-15338: [Python] Add `pyarrow.orc.read_table` API

GitBox Tue, 18 Jan 2022 23:56:44 -0800


jorisvandenbossche commented on a change in pull request #12153:
URL: https://github.com/apache/arrow/pull/12153#discussion_r787433850




##########
File path: python/pyarrow/orc.py
##########
@@ -175,3 +176,33 @@ def write_table(table, where):
     writer = ORCWriter(where)
     writer.write(table)
     writer.close()
+
+
+def read_table(source, columns=None, filesystem=None):
+    """
+    Read a table from ORC format
+
+    Parameters
+    ----------
+    source : str, pyarrow.NativeFile, or file-like object
+        If a string passed, can be a single file name or directory name. For
+        file-like objects, only read a single file. Use pyarrow.BufferReader to
+        read a file contained in a bytes or buffer-like object.
+    columns : list
+        If not None, only these columns will be read from the file. A column
+        name may be a prefix of a nested field, e.g. 'a' will select 'a.b',
+        'a.c', and 'a.d.e'. If empty, no columns will be read. Note
+        that the table will still have the correct num_rows set despite having
+        no columns.

Review comment:
       Hmm, maybe it's better to leave this for a separate JIRA? Reading in
all columns only to select none of them in a second step might be a bit
surprising for users.
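
To illustrate the concern, here is a minimal sketch in plain Python of the two-step approach (read everything, then select). `FakeOrcFile` and `read_then_select` are hypothetical stand-ins invented for this example; they are not part of pyarrow and do not reflect how the PR is implemented. The counter shows that every column is still decoded even when the caller asks for none.

```python
class FakeOrcFile:
    """Hypothetical in-memory stand-in for an ORC file (not a pyarrow class)."""

    def __init__(self, data):
        self._data = data          # column name -> list of values
        self.columns_decoded = 0   # how many columns have been materialized

    def read(self, columns=None):
        # None means "all columns"; otherwise decode only the requested ones.
        names = list(self._data) if columns is None else list(columns)
        self.columns_decoded += len(names)
        return {name: list(self._data[name]) for name in names}


def read_then_select(orc_file, columns=None):
    """Two-step approach: decode every column, then keep the requested ones."""
    full = orc_file.read()
    if columns is None:
        return full
    return {name: full[name] for name in columns}


orc_file = FakeOrcFile({"a": [1, 2, 3], "b": ["x", "y", "z"]})
result = read_then_select(orc_file, columns=[])
print(result, orc_file.columns_decoded)
```

The result has no columns, yet both columns were decoded along the way, which is the cost a user passing an empty list would probably not expect.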



