westonpace commented on code in PR #12701: URL: https://github.com/apache/arrow/pull/12701#discussion_r850042491
########## python/pyarrow/includes/libarrow_dataset.pxd: ########## @@ -18,6 +18,7 @@ # distutils: language = c++ from libcpp.unordered_map cimport unordered_map +from libcpp cimport bool Review Comment: So this works ([it creates an alias for `bint` named `bool`](https://github.com/cython/cython/blob/d2d2ea33a21e1915a52790ef64fdd1f28867854c/Cython/Includes/libcpp/__init__.pxd#L2)) but elsewhere we either use `bint` directly or call it `c_bool` with: `from libcpp cimport bool as c_bool`. Lets stick with one of the two existing approaches instead of adding a third. ########## python/pyarrow/tests/conftest.py: ########## @@ -311,3 +313,23 @@ def s3_server(s3_connection): finally: if proc is not None: proc.kill() + + [email protected](scope='session') +def limited_s3_user(request, s3_server): Review Comment: Why is this method here instead of `util.py`? ########## python/pyarrow/tests/test_dataset.py: ########## @@ -4286,6 +4287,55 @@ def test_write_dataset_s3(s3_example_simple): assert result.equals(table) +_minio_put_only_policy = """{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "s3:PutObject", + "s3:ListBucket", + "s3:GetObjectVersion" + ], + "Resource": [ + "arn:aws:s3:::*" + ] + } + ] +}""" + + [email protected] [email protected] +def test_write_dataset_s3_put_only(s3_server, limited_s3_user): + from pyarrow.fs import S3FileSystem + + # write dataset with s3 filesystem + host, port, _, _ = s3_server['connection'] + fs = S3FileSystem( + access_key='limited', + secret_key='limited123', + endpoint_override='{}:{}'.format(host, port), + scheme='http' + ) + limited_s3_user(_minio_put_only_policy) + table = pa.table([ + pa.array(range(20)), pa.array(np.random.randn(20)), + pa.array(np.repeat(['a', 'b'], 10))], + names=["f1", "f2", "part"] + ) + # writing with filesystem object with create_dir flag set to false Review Comment: Can you add a check to make sure that writing the dataset with `create_dir=True` fails? It doesn't really give us any coverage of our code but it allows us to make sure we've configured the limited server correctly so the test is more robust. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
