[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #13275: ARROW-16546: [Parquet][C++][Python] Make Thrift limits configurable

GitBox Thu, 02 Jun 2022 07:27:48 -0700


jorisvandenbossche commented on code in PR #13275:
URL: https://github.com/apache/arrow/pull/13275#discussion_r888009120



##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -654,17 +668,49 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions):
     def pre_buffer(self, bint pre_buffer):
         self.arrow_reader_properties().set_pre_buffer(pre_buffer)
 
+    @property
+    def thrift_string_size_limit(self):
+        return self.reader_properties().thrift_string_size_limit()
+
+    @thrift_string_size_limit.setter
+    def thrift_string_size_limit(self, size):
+        if size <= 0:
+            raise ValueError("size must be larger than zero")
+        self.reader_properties().set_thrift_string_size_limit(size)
+
+    @property
+    def thrift_container_size_limit(self):
+        return self.reader_properties().thrift_container_size_limit()
+
+    @thrift_container_size_limit.setter
+    def thrift_container_size_limit(self, size):
+        if size <= 0:
+            raise ValueError("size must be larger than zero")

Review Comment:
   Should such a check be included in `_parquet.pyx` as well?
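
   To illustrate what sharing the check could look like, here is a minimal sketch in plain Python. The helper `check_positive_limit` and the class `ReaderOptionsSketch` are hypothetical names used only for illustration; they are not part of pyarrow, and the real setters in the `.pyx` files call into the C++ reader properties instead of storing a Python attribute.

```python
def check_positive_limit(name, size):
    """Hypothetical helper: reject non-positive Thrift size limits."""
    if size <= 0:
        raise ValueError(f"{name} must be larger than zero")
    return size


class ReaderOptionsSketch:
    """Stand-in for an options class; not the pyarrow implementation."""

    def __init__(self):
        # None here stands for "use the library default".
        self._thrift_string_size_limit = None
        self._thrift_container_size_limit = None

    @property
    def thrift_string_size_limit(self):
        return self._thrift_string_size_limit

    @thrift_string_size_limit.setter
    def thrift_string_size_limit(self, size):
        # Same validation as in the diff above, routed through the helper.
        self._thrift_string_size_limit = check_positive_limit(
            "thrift_string_size_limit", size)

    @property
    def thrift_container_size_limit(self):
        return self._thrift_container_size_limit

    @thrift_container_size_limit.setter
    def thrift_container_size_limit(self, size):
        self._thrift_container_size_limit = check_positive_limit(
            "thrift_container_size_limit", size)
```

   With a single helper like this, both modules would reject zero and negative values with the same error message.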



##########
python/pyarrow/parquet/__init__.py:
##########
@@ -2258,11 +2269,13 @@ class _ParquetDatasetV2:
     1       4  Horse  2022
     """
 
-    def __init__(self, path_or_paths, filesystem=None, filters=None,
+    def __init__(self, path_or_paths, filesystem=None, *, filters=None,
                  partitioning="hive", read_dictionary=None, buffer_size=None,
                  memory_map=False, ignore_prefixes=None, pre_buffer=True,
                  coerce_int96_timestamp_unit=None, schema=None,
-                 decryption_properties=None, **kwargs):
+                 decryption_properties=None, thrift_string_size_limit=None,
+                 thrift_container_size_limit=None,

Review Comment:
   Although if the tests pass ..



##########
python/pyarrow/parquet/__init__.py:
##########
@@ -2258,11 +2269,13 @@ class _ParquetDatasetV2:
     1       4  Horse  2022
     """
 
-    def __init__(self, path_or_paths, filesystem=None, filters=None,
+    def __init__(self, path_or_paths, filesystem=None, *, filters=None,
                  partitioning="hive", read_dictionary=None, buffer_size=None,
                  memory_map=False, ignore_prefixes=None, pre_buffer=True,
                  coerce_int96_timestamp_unit=None, schema=None,
-                 decryption_properties=None, **kwargs):
+                 decryption_properties=None, thrift_string_size_limit=None,
+                 thrift_container_size_limit=None,

Review Comment:
   I think you need to add it to the signature of `ParquetDataset.__new__` as well, so that it gets passed through.


