[GitHub] [arrow] dongjoon-hyun commented on a change in pull request #12153: ARROW-15338: [Python] Add `pyarrow.orc.read_table` API

GitBox Wed, 19 Jan 2022 10:37:29 -0800


dongjoon-hyun commented on a change in pull request #12153:
URL: https://github.com/apache/arrow/pull/12153#discussion_r788035820




##########
File path: python/pyarrow/orc.py
##########
@@ -175,3 +176,33 @@ def write_table(table, where):
     writer = ORCWriter(where)
     writer.write(table)
     writer.close()
+
+
+def read_table(source, columns=None, filesystem=None):
+    """
+    Read a table from ORC format
+
+    Parameters
+    ----------
+    source : str, pyarrow.NativeFile, or file-like object
+        If a string passed, can be a single file name or directory name. For
+        file-like objects, only read a single file. Use pyarrow.BufferReader to
+        read a file contained in a bytes or buffer-like object.
+    columns : list
+        If not None, only these columns will be read from the file. A column
+        name may be a prefix of a nested field, e.g. 'a' will select 'a.b',
+        'a.c', and 'a.d.e'. If empty, no columns will be read. Note
+        that the table will still have the correct num_rows set despite having
+        no columns.

Review comment:
       Thanks, @jorisvandenbossche. Yes, I agree with you that we need to 
optimize it in both the `ORCFile` API and this one together for that case, in a 
separate JIRA.
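
For reference, the column-selection behaviour described in the docstring above (a name acts as a prefix of nested fields, an empty list selects nothing, `None` selects everything) can be sketched in plain Python. This is only an illustration of the documented semantics, not the code in this PR or in `ORCFile`; the helper name `select_columns` and the flat list of dotted leaf paths are assumptions made for the example.

```python
def select_columns(leaf_paths, columns=None):
    """Return the leaf paths that `columns` would select.

    Hypothetical helper: `leaf_paths` is a flat list of dotted field
    paths such as "a.d.e"; it is not part of pyarrow.
    """
    if columns is None:
        # No filter given: every column is read.
        return list(leaf_paths)
    selected = []
    for path in leaf_paths:
        for name in columns:
            # Match the field itself or any field nested under it.
            # The trailing "." keeps "a" from matching "ab".
            if path == name or path.startswith(name + "."):
                selected.append(path)
                break
    return selected


leaves = ["a.b", "a.c", "a.d.e", "ab", "f"]
print(select_columns(leaves, ["a"]))  # nested fields under 'a' only
print(select_columns(leaves, []))     # empty list: no columns
print(select_columns(leaves))         # None: all columns
```

With the signature shown in the diff, the corresponding call would look like `read_table(source, columns=['a'])`.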



