R-JunmingChen commented on code in PR #34586:
URL: https://github.com/apache/arrow/pull/34586#discussion_r1164217988


##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -3172,6 +3202,58 @@ def test_csv_fragment_options(tempdir, dataset_reader):
     assert result.equals(
         pa.table({'col0': pa.array(['foo', 'spam', 'MYNULL'])}))
 
+@pytest.mark.pandas
+def test_json_format(tempdir, dataset_reader):
+    table = pa.table({'a': pa.array([1, 2, 3], type="int64"),
+                      'b': pa.array([.1, .2, .3], type="float64")})
+
+    path = str(tempdir / 'test.json')
+    out = table.to_pandas().to_json(orient='records')[1:-1].replace('},{', '}\n{')
+    with open(path, 'w') as f:
+        f.write(out)
+
+    dataset = ds.dataset(path, format=ds.JsonFileFormat())
+    result = dataset_reader.to_table(dataset)
+    assert result.equals(table)
+
+    assert_dataset_fragment_convenience_methods(dataset)
+
+    dataset = ds.dataset(path, format='json')
+    result = dataset_reader.to_table(dataset)
+    assert result.equals(table)
+
+def test_json_format_options(tempdir, dataset_reader):
+    table = pa.table({'a': pa.array([1, 2, 3], type="int64"),
+                      'b': pa.array([.1, .2, .3], type="float64")})
+
+    path = str(tempdir / 'test.json')
+    out = table.to_pandas().to_json(orient='records')[1:-1].replace('},{', '}\n{')
+    with open(path, 'w') as f:
+        f.write(out)
+    
+    dataset = ds.dataset(path, format=ds.JsonFileFormat(
+        read_options=pa.json.ReadOptions(block_size=64)))

Review Comment:
   > Is there any way to know if this option actually got applied? If the bindings were completely ignoring `block_size` would this test fail?
   > 
   > Is there another option that is easier to verify? Maybe `newlines_in_values` (and put some newlines in the test data) because then it will fail if this option is not getting set correctly?
   
   Since `newlines_in_values` doesn't work through the pyarrow.json interface 
either, I updated my `block_size` test instead, following the code in 
test_json.py. This way, we can check whether `block_size` is actually applied.
   
   
https://github.com/apache/arrow/blob/5e8db3156c733a31e196683011db113e76ce6a32/python/pyarrow/tests/test_json.py#L138-L142
   
   We could file an issue for `newlines_in_values` if it indeed fails to work. 
BTW, I checked test_json.py, and there is no test for `newlines_in_values` 
there either.



