TomTJarosz commented on code in PR #39314:
URL: https://github.com/apache/arrow/pull/39314#discussion_r1445530513
##########
python/pyarrow/pandas-shim.pxi:
##########
@@ -96,13 +98,17 @@ cdef class _PandasAPIShim(object):
self.has_sparse = False
cdef inline _check_import(self, bint raise_=True):
- if self._tried_importing_pandas:
- if not self._have_pandas and raise_:
- self._import_pandas(raise_)
- return
-
- self._tried_importing_pandas = True
- self._import_pandas(raise_)
+ if not self._tried_importing_pandas:
+ with self._lock:
+ if not self._tried_importing_pandas:
+ try:
+ self._import_pandas(raise_)
+ finally:
+ self._tried_importing_pandas = True
+ return
+
+ if not self._have_pandas and raise_:
+ self._import_pandas(raise_)
Review Comment:
This implementation of `_check_import` differs from your suggested change as
(I believe) the suggested change would introduce another race condition,
resulting in us not raising when we do not have pandas and `raise_=True`.
Consider 2 threads concurrently call `_check_import` and thread 1 acquires
lock while thread 2 waits to acquire the lock. Also assume the current env does
not have pandas. If this method had the following implementation:
```py
cdef inline _check_import(self, bint raise_=True):
if self._tried_importing_pandas:
if not self._have_pandas and raise_:
self._import_pandas(raise_)
return
with self._lock:
if not self._tried_importing_pandas:
try:
self._import_pandas(raise_)
finally:
self._tried_importing_pandas = True
```
thread 2 would _not_ raise, but it should. I believe the committed
implementation of `_check_import` addresses this race condition, along with the
original race condition this PR addresses.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]