raulcd commented on code in PR #49453:
URL: https://github.com/apache/arrow/pull/49453#discussion_r2925234679


##########
python/scripts/update_stub_docstrings.py:
##########
@@ -198,31 +203,77 @@ def add_docstrings_to_stubs(stubs_dir):
         stub_file.write_text(modified.code)
 
 
-def add_docstrings_from_build(stubs_dir, build_lib):
-    """
-    Entry point for setup.py: update docstrings using pyarrow from build 
directory.
+def _link_or_copy(source, destination):
+    if sys.platform != "win32":
+        try:
+            os.symlink(source, destination)
+            return
+        except OSError:
+            pass
+
+    if source.is_dir():
+        shutil.copytree(source, destination, symlinks=(sys.platform != 
"win32"))
+    else:
+        shutil.copy2(source, destination)
+
 
-    During the build process, pyarrow is not installed in the system Python.
-    We need to temporarily add the build directory to sys.path so we can
-    import pyarrow and extract docstrings from it.
+def _create_importable_pyarrow(pyarrow_pkg, source_dir, install_pyarrow_dir):
     """
-    stubs_dir, build_lib = Path(stubs_dir), Path(build_lib)
+    Populate pyarrow_pkg with source Python modules and installed binary 
artifacts

Review Comment:
   What does `Populate pyarrow_pkg` mean? I am not sure I understand what is 
`pyarrow_pkg` here. Is it the source tree? Are we copying the Arrow shared 
objects around to populate the stubs? Otherwise I am unsure what are we 
copying/linking. Isn't this what `PYARROW_BUNDLE_ARROW_CPP` does? Maybe we have 
to build after CMake runs the copy for `PYARROW_BUNDLE_ARROW_CPP`?



##########
python/scripts/update_stub_docstrings.py:
##########
@@ -198,31 +203,77 @@ def add_docstrings_to_stubs(stubs_dir):
         stub_file.write_text(modified.code)
 
 
-def add_docstrings_from_build(stubs_dir, build_lib):
-    """
-    Entry point for setup.py: update docstrings using pyarrow from build 
directory.
+def _link_or_copy(source, destination):
+    if sys.platform != "win32":
+        try:
+            os.symlink(source, destination)
+            return
+        except OSError:
+            pass
+
+    if source.is_dir():
+        shutil.copytree(source, destination, symlinks=(sys.platform != 
"win32"))
+    else:
+        shutil.copy2(source, destination)
+
 
-    During the build process, pyarrow is not installed in the system Python.
-    We need to temporarily add the build directory to sys.path so we can
-    import pyarrow and extract docstrings from it.
+def _create_importable_pyarrow(pyarrow_pkg, source_dir, install_pyarrow_dir):
     """
-    stubs_dir, build_lib = Path(stubs_dir), Path(build_lib)
+    Populate pyarrow_pkg with source Python modules and installed binary 
artifacts
+    so that pyarrow can be imported from the parent directory of pyarrow_pkg.
+    """
+    ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"

Review Comment:
   Is this `EXT_SUFFIX` env var necessary? doesn't seem to be used anywhere 
apart from this line



##########
python/CMakeLists.txt:
##########
@@ -1025,3 +1025,54 @@ if(PYARROW_BUILD_PARQUET)
     target_link_libraries(_parquet_encryption PRIVATE 
arrow_python_parquet_encryption)
   endif()
 endif()
+
+#
+# Type stubs with docstring injection
+#
+# Stubs live in pyarrow-stubs/pyarrow/ during development but are installed
+# alongside the package so type checkers can find them (PEP 561).
+set(PYARROW_REQUIRE_STUB_DOCSTRINGS OFF)
+if(DEFINED SKBUILD_STATE
+   AND SKBUILD_STATE STREQUAL "wheel"
+   AND NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten"
+   AND DEFINED ENV{CI}

Review Comment:
   If I understand this correctly this will default to `ON` on wheels and only 
on CI, otherwise is off and we have to turn it on manually, i.e. testing 
locally?
   
   Is this logic necessary? Should we just remove all that, add the option and 
enable it as part of the wheel build scripts by setting an env var? We can add 
`PYARROW_REQUIRE_STUB_DOCSTRINGS` as a possible option on our pyproject.toml:
   `PYARROW_REQUIRE_STUB_DOCSTRINGS = {env = "PYARROW_REQUIRE_STUB_DOCSTRINGS", 
default = "OFF"}` similar to `PYARROW_BUNDLE_ARROW_CPP`



##########
python/CMakeLists.txt:
##########
@@ -1025,3 +1025,54 @@ if(PYARROW_BUILD_PARQUET)
     target_link_libraries(_parquet_encryption PRIVATE 
arrow_python_parquet_encryption)
   endif()
 endif()
+
+#
+# Type stubs with docstring injection
+#
+# Stubs live in pyarrow-stubs/pyarrow/ during development but are installed
+# alongside the package so type checkers can find them (PEP 561).
+set(PYARROW_REQUIRE_STUB_DOCSTRINGS OFF)
+if(DEFINED SKBUILD_STATE
+   AND SKBUILD_STATE STREQUAL "wheel"
+   AND NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten"
+   AND DEFINED ENV{CI}
+   AND NOT "$ENV{CI}" STREQUAL "")
+  set(PYARROW_REQUIRE_STUB_DOCSTRINGS ON)
+endif()
+
+set(PYARROW_STUBS_SOURCE_DIR 
"${CMAKE_CURRENT_SOURCE_DIR}/pyarrow-stubs/pyarrow")
+if(EXISTS "${PYARROW_STUBS_SOURCE_DIR}")
+  install(DIRECTORY "${PYARROW_STUBS_SOURCE_DIR}/"
+          DESTINATION "."
+          FILES_MATCHING
+          PATTERN "*.pyi")
+
+  if(DEFINED SKBUILD_STATE
+     AND SKBUILD_STATE STREQUAL "wheel"
+     AND NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten")

Review Comment:
   isn't this?
   ```suggestion
     if(PYARROW_REQUIRE_STUB_DOCSTRINGS)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to