jorisvandenbossche commented on code in PR #36627:
URL: https://github.com/apache/arrow/pull/36627#discussion_r1412403609


##########
python/pyarrow/_dataset.pyx:
##########
@@ -1963,6 +1963,100 @@ cdef class FragmentScanOptions(_Weakrefable):
         except TypeError:
             return False
 
+cdef class CacheOptions(_Weakrefable):
+    """
+    Cache options for a pre-buffered fragment scan.
+    Parameters

Review Comment:
   ```suggestion
   
       Parameters
   ```



##########
python/pyarrow/_dataset.pxd:
##########
@@ -83,6 +83,16 @@ cdef class FragmentScanOptions(_Weakrefable):
     @staticmethod
     cdef wrap(const shared_ptr[CFragmentScanOptions]& sp)
 

Review Comment:
   ```suggestion
   
   
   ```
   
   Small styling nitpick: two blank lines between class definitions



##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -16,31 +16,29 @@
 # under the License.
 
 import contextlib
-import os
-import posixpath

Review Comment:
   This probably went wrong in the rebase, you accidentally removed 
`posixpath`, which now causes some of the CI failures



##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -769,6 +777,14 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions):
     def pre_buffer(self, bint pre_buffer):
         self.arrow_reader_properties().set_pre_buffer(pre_buffer)
 
+    @property
+    def cache_options(self):
+        return 
CacheOptions.wrap(self.arrow_reader_properties().cache_options())
+
+    @cache_options.setter
+    def cache_options(self, CacheOptions options):
+        self.arrow_reader_properties().set_cache_options(options.unwrap())

Review Comment:
   When adding this new parameter/property here, you will probably also have to 
update `equals` and `__reduce__` a bit below.



##########
python/pyarrow/_dataset.pxd:
##########
@@ -83,6 +83,16 @@ cdef class FragmentScanOptions(_Weakrefable):
     @staticmethod
     cdef wrap(const shared_ptr[CFragmentScanOptions]& sp)
 
+cdef class CacheOptions(_Weakrefable):
+    cdef:
+        CCacheOptions wrapped
+
+    cdef void init(self, CCacheOptions options)
+
+    cdef inline CCacheOptions unwrap(self)
+
+    @staticmethod
+    cdef wrap(const CCacheOptions options)
 

Review Comment:
   ```suggestion
   
   
   ```



##########
python/pyarrow/_dataset.pyx:
##########
@@ -1963,6 +1963,100 @@ cdef class FragmentScanOptions(_Weakrefable):
         except TypeError:
             return False
 
+cdef class CacheOptions(_Weakrefable):
+    """
+    Cache options for a pre-buffered fragment scan.
+    Parameters
+    ----------
+    hole_size_limit : int, default 8Ki
+        The maximum distance in bytes between two consecutive ranges; beyond 
+        this value, ranges are not combined.
+    range_size_limit : int, default 32Mi
+        The maximum size in bytes of a combined range; if combining two 
+        consecutive ranges would produce a range of a size greater than this, 
+        they are not combined
+    lazy : bool, default False
+        A lazy cache does not perform any I/O until requested.
+    prefetch_limit : int, default 0
+        The maximum number of ranges to be prefetched. This is only used for 
+        lazy cache to asynchronously read some ranges after reading the target 
+        range.
+    """
+
+    def __init__(self, *, hole_size_limit=None, range_size_limit=None, 
lazy=None, prefetch_limit=None):
+        self.wrapped = CCacheOptions.Defaults()
+        if hole_size_limit is not None:
+            self.hole_size_limit = hole_size_limit
+        if range_size_limit is not None:
+            self.range_size_limit = range_size_limit
+        if lazy is not None:
+            self.lazy = lazy
+        if prefetch_limit is not None:
+            self.prefetch_limit = prefetch_limit
+
+    cdef void init(self, CCacheOptions options):
+        self.wrapped = options
+
+    cdef inline CCacheOptions unwrap(self):
+        return self.wrapped
+
+    @staticmethod
+    cdef wrap(CCacheOptions options):
+        self = CacheOptions()
+        self.init(options)
+        return self
+
+    @property
+    def hole_size_limit(self):
+        return self.wrapped.hole_size_limit
+
+    @hole_size_limit.setter
+    def hole_size_limit(self, hole_size_limit):
+        self.wrapped.hole_size_limit = hole_size_limit
+
+    @property
+    def range_size_limit(self):
+        return self.wrapped.range_size_limit
+
+    @range_size_limit.setter
+    def range_size_limit(self, range_size_limit):
+        self.wrapped.range_size_limit = range_size_limit
+
+    @property
+    def lazy(self):
+        return self.wrapped.lazy
+
+    @lazy.setter
+    def lazy(self, lazy):
+        self.wrapped.lazy = lazy
+
+    @property
+    def prefetch_limit(self):
+        return self.wrapped.prefetch_limit
+
+    @prefetch_limit.setter
+    def prefetch_limit(self, prefetch_limit):
+        self.wrapped.prefetch_limit = prefetch_limit
+
+    def __eq__(self, CacheOptions other):
+        try:
+            return self.unwrap().Equals(other.unwrap())
+        except TypeError:
+            return False
+
+    @classmethod
+    def _reconstruct(cls, kwargs):
+        return cls(**kwargs)
+
+    def __reduce__(self):
+        kwargs = dict(
+            hole_size_limit=self.hole_size_limit,
+            range_size_limit=self.range_size_limit,
+            lazy=self.lazy,
+            prefetch_limit=self.prefetch_limit,
+        )
+        return type(self)._reconstruct, (kwargs,)

Review Comment:
   We changed a bit how this is typically done (staticmethod instead of 
classmethod and with @bindings(true)), see 
https://github.com/apache/arrow/commit/175b2a2fe2d6e35237c36666478800f62e3f6947
   
   (this should fix one of the CI failures, I think)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to