robomics commented on issue #44227:
URL: https://github.com/apache/arrow/issues/44227#issuecomment-2424065785

   Hi @amoeba, apologies for not coming back to you earlier.
   
   I understand your concerns regarding deleting things on behalf of users 
without their explicit consent.
   That being said, I still believe it would be useful to provide an automated 
way for users to fix their python environment when the environment is broken 
due to updating pyarrow.
   
   These are my reasons:
   - While it is true that the docs state that `create_library_symlinks()` 
should be run only once, they do not make it clear that installing pyarrow -> 
`create_library_symlinks()` -> upgrading pyarrow will lead to broken setups on 
*NIX systems.
   - Manually fixing broken environments cannot be done easily (with e.g. `rm 
/tmp/venv/lib/python3.*/site-packages/pyarrow/libarrow*.so`), because that 
would delete some shared libs that are not symlinks (e.g. 
`libarrow_python.so`). The simplest way I can think of to safely remove 
dangling symlinks with shell commands is something like `find "$(python -c 
'import os, pyarrow; print(os.path.dirname(pyarrow.__file__), end="")')" -type 
l -delete`. This works fine (note that it removes every symlink under the 
package folder, dangling or not), but I am not sure it is a good 
recommendation to give to every user.
   - Manually uninstalling and re-installing pyarrow with pip will not fix the 
broken links (because pip refuses to delete files that it has not created).
   - When linking against e.g. `libarrow` shipped with the wheels, dangling 
links can lead to misleading errors. When I stumbled on this issue for the 
first time I was relying on CMake's `find_library()` to get the path to the 
wheels' `libarrow`  (see 
[here](https://github.com/paulsengroup/hictkpy/blob/af153ef7b67010779e7e98d6c02c4b9da4376be1/cmake/modules/FindPyarrow.cmake)
 for more context). `find_library()` was failing, claiming that `libarrow.so` 
didn't exist, even though a file with that name did exist on my machine. 
My first reaction was to delete and re-create my dev venv, which solved the 
problem for a few hours. Then something upgraded the pyarrow version installed 
in my venv and I was back to square one. Figuring out that the problem was a 
few dangling symlinks took more time than I'd like to admit :D.
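
   For illustration, here is a minimal Python sketch of the manual cleanup 
described above. Unlike the `find` one-liner, it only touches links whose 
target is missing. The names `find_dangling_symlinks()` and 
`remove_dangling_symlinks()` are hypothetical helpers, not part of pyarrow, 
and the sketch only looks at the top level of the given folder.

```python
from pathlib import Path


def find_dangling_symlinks(directory):
    """Return the symlinks directly inside `directory` whose target is missing."""
    dangling = []
    for entry in Path(directory).iterdir():
        # is_symlink() inspects the link itself, while exists() follows it,
        # so a symlink for which exists() is False is dangling.
        if entry.is_symlink() and not entry.exists():
            dangling.append(entry)
    return sorted(dangling)


def remove_dangling_symlinks(directory, dry_run=True):
    """Unlink dangling symlinks in `directory` and return the affected paths.

    With dry_run=True (the default) nothing is deleted.
    """
    dangling = find_dangling_symlinks(directory)
    if not dry_run:
        for link in dangling:
            link.unlink()
    return dangling


# Possible usage against an installed pyarrow (assumes pyarrow is importable):
#   import pyarrow
#   remove_dangling_symlinks(Path(pyarrow.__file__).parent, dry_run=False)
```

   Regular files such as `libarrow_python.so` and symlinks that still resolve 
are left untouched, and the default `dry_run=True` lets users inspect what 
would be deleted first.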
   
   How about we change the signature of `create_library_symlinks()` to something like:
   
   ```python3
   def create_library_symlinks(repair_broken_symlinks: bool = False):
       # impl ...
   ```
   
   and then we update the implementation to unlink dangling symlinks only when 
`repair_broken_symlinks=True`.
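
   To make the idea concrete, here is a rough standalone sketch of the 
behaviour I have in mind. It is not pyarrow's actual implementation: the 
function name `create_library_symlinks_sketch()` and the `package_dir` 
parameter are made up for the example, and it assumes the Linux naming scheme 
(`libfoo.so.N` -> `libfoo.so`) only.

```python
import glob
import os


def create_library_symlinks_sketch(package_dir, repair_broken_symlinks=False):
    """Create unversioned links (libfoo.so) next to versioned libs (libfoo.so.N).

    Returns the list of link paths that were created.
    """
    if repair_broken_symlinks:
        for name in os.listdir(package_dir):
            path = os.path.join(package_dir, name)
            # islink() looks at the link itself, exists() follows it, so this
            # matches only symlinks whose target no longer exists.
            if os.path.islink(path) and not os.path.exists(path):
                os.unlink(path)

    created = []
    for hard_path in sorted(glob.glob(os.path.join(package_dir, "*.so.*"))):
        link_path = hard_path.rsplit(".", 1)[0]
        # lexists() is also True for dangling links, so without the repair
        # pass above a stale link is left in place rather than overwritten.
        if os.path.lexists(link_path):
            continue
        os.symlink(hard_path, link_path)
        created.append(link_path)
    return created
```

   With the default `repair_broken_symlinks=False` nothing is ever deleted, 
so existing callers would see no change in behaviour.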
   
   It may also be worth updating the docs to explain that up/downgrading 
pyarrow after calling `create_library_symlinks()` leads to broken symlinks that 
can cause all sorts of problems, and that this can be fixed by calling 
`create_library_symlinks(repair_broken_symlinks=True)` (or by manually 
uninstalling pyarrow, deleting its package folder, reinstalling pyarrow, and 
finally calling `create_library_symlinks()`).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
