[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

GitBox Mon, 06 Jul 2020 11:00:06 -0700


jorisvandenbossche commented on a change in pull request #7631:
URL: https://github.com/apache/arrow/pull/7631#discussion_r450390113




##########
File path: python/pyarrow/tests/test_dataset.py
##########
@@ -612,6 +613,83 @@ def test_make_fragment(multisourcefs):
         assert row_group_fragment.row_groups == [ds.RowGroupInfo(0)]
 
 
+def test_make_csv_fragment_from_buffer():
+    content = textwrap.dedent("""
+        alpha,num,animal
+        a,12,dog
+        b,11,cat
+        c,10,rabbit
+    """)
+    buffer = pa.py_buffer(content.encode('utf-8'))
+
+    csv_format = ds.CsvFileFormat()
+    fragment = csv_format.make_fragment(buffer)
+
+    expected = pa.table([['a', 'b', 'c'],
+                         [12, 11, 10],
+                         ['dog', 'cat', 'rabbit']],
+                        names=['alpha', 'num', 'animal'])
+    assert fragment.to_table().equals(expected)
+
+    pickled = pickle.loads(pickle.dumps(fragment))
+    assert pickled.to_table().equals(fragment.to_table())
+
+
+@pytest.mark.parquet
+def test_make_parquet_fragment_from_buffer():
+    import pyarrow.parquet as pq
+
+    cases = [
+        (
+            pa.table(
+                [
+                    ['a', 'b', 'c'],
+                    [12, 11, 10],
+                    ['dog', 'cat', 'rabbit']

Review comment:
You could reduce the vertical space used here by defining only the list of
arrays in each case, and then building the table inside the loop with
`table = pa.table(arrays, names=['alpha', 'num', 'animal'])`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
