I apologize for the duplicate mail. Please respond on this thread. On Sat, 25 Mar 2017 at 21:16 Pradyun Gedam <pradyu...@gmail.com> wrote:
> Hello Everyone! > > I had previously sent a mail on this list, stating that I would like to > work on pip's dependency resolution for my GSoC 2017. I now have drafted a > proposal for the same; with help from my mentors - Donald Stufft and Justin > Cappos. I'd also take this opportunity to thank them for agreeing to be my > mentors for this GSoC. > > I would like to request for comments on my proposal for GSoC - it is > hosted at > https://gist.github.com/pradyunsg/5cf4a35b81f08b6432f280aba6f511eb. > > Please find trailing a MarkDown version of the proposal. > > Thanks, > Pradyun Gedam > > ----- > > # Adding Proper Dependency Resolution to pip > > - **Name:** Pradyun S. Gedam > - **Email:** [pradyu...@gmail.com][mailto-email] > - **Github:** [pradyunsg][github-profile] > - **University:** [VIT University, Vellore, India][vit-homepage] > - **Course:** Bachelor of Technology in Computer Science and Engineering > - **Course Term:** 2016/17 - 2019/20 (4 Year) > - **Timezone:** IST (GMT +5:30) > - **GSoC Blog RSS Feed URL:** < > https://pradyunsg.github.io/gsoc-2017/feed.xml> > > [github-profile]: http://github.com/pradyunsg/ > [vit-homepage]: http://vit.ac.in/ > [mailto-email]: mailto:pradyu...@gmail.com > > ## About Me > > I was introduced to Python about five years ago, through "Core Python > Programming" by Weasley J Chun. After the initial struggle with Python > 2/3, the > ball was set rolling and hasn't stopped since. I have fiddled around with > Game Programming (PyGame), Computer Vision (OpenCV), Data Analytics > (Pandas, > SciPy, NumPy), transcompilers (Py2C) and more. > > As a high school student, I did internship at Enthought in 2013 and 2014. > The two summers that I spent at Enthought were a great learning experience > and > I thoroughly enjoyed the environment there. > > Other than Python, I have also used C, C++, Web Technologies (JavaScript, > HTML, > CSS) and preprocessors (Pug, TypeScript, LESS/SASS/SCSS), Java and > Bash/Zsh for > some other projects. > > Curious to understand how pip works, I began looking into pip's source > code. > I started contributing to pip in May 2016. I am now fairly familiar with > the > design of pip and have a fair understanding of how it works, due to the > various > contributions I have made to pip in the past year. > > ## Mentors > > - Donald Stufft (@dstufft on GitHub) > - Justin Cappos (@JustinCappos on GitHub) > > Communication with the mentors will be done mostly on issues and pull > requests > on pip's GitHub repository. If at any point in time, a real time > discussion is > to be done with the mentors, it would be done over IRC or Skype. Email can > also > be used if needed. > > ## Proposal > > This project is regarding improving dependency resolution performed within > pip > by implementing a dependency resolver within it. > > ### Abstract > > Currently, pip does not resolve dependencies correctly when there are > conflicting requirements. The lack of dependency resolution has caused > hard-to-debug bugs/failures due to the installation of incompatible > packages. > The lack of a dependency resolver is also a blocker for various other > features - > adding an upgrade-all functionality to pip and properly determining > build-time > dependencies for packages are two such features. > > ### Deliverables > > At the end of this project, pip will have the ability to: > > - detect requirement conflicts > - resolve conflicting dependency requirements where possible > > ### Implementation > > The implementation language will be Python. The code will maintain the > compatibility requirements of pip - the same source code will support the > multiple Python implementations and version, including but not limited to, > CPython 2.7, CPython 3.3+, PyPy 2, PyPy3. > > New Tests for the functionality introduced will be added to pip's current > test > suite. > > User documentation would not be a major part of this project. The only > changes > would be to mention pip can now resolve dependencies properly. There are > certain > sections that might need updating. > > #### Overview > > The project will be composed of the following stages: > > 1. Refactor the dependency resolution logic of pip into a separate module. > 1. Implement dependency information caching in pip. > 1. Implement a backtracking dependency resolver to resolve the > dependencies. > > Every stage depends on the previous ones being completed. This step-wise > approach would make incremental improvements so that there is a constant > feedback on the work being done as well as making it easy to change course > without losing existing work; if needed for some unforeseen reason. > > #### Discussion > > There is a class in pip - `RequirementSet`, which is currently a God > class. It > is responsible for the following: > > 1. Holding a list of things to be installed > 1. Downloading Files > 1. Dependency Resolution > 1. Unpacking downloaded files (preparation for installation) > 1. Uninstalling packages > 1. Installing packages > > This is clearly a bad situation since this is most of the heavy lifting > involved in pip. These responsibilities need to be separated and moved out > into > their independent modules/classes, to allow for simplification of > `RequirementSet` while providing a clean platform for the remaining work. > This is the most tricky portion of this project, given the complexity of > `RequirementSet` as it stands today. > > There are two kinds of distributions that may be used to install a package > - > wheels (binary) and sdists (source). When installing a package, pip builds > a > wheel from an sdist and then proceeds to install the wheel. The difference > between the two formats of distribution relevant to this project is - > wheels > store the information about dependencies within them statically; sdists do > not. > > Determining the dependencies of a wheel distribution is merely the matter > of > fetching the information from a METADATA file within the `.whl` file. The > dependency information for an sdist, on the other hand, can only be > determined > after running its `setup.py` file on the target system. This means that > dependencies of an sdist depend on how its `setup.py` behaves which can > change > due to variations on target systems or could even contain through random > choices. > > Computation of an sdist's dependencies on the target system is a > time-consuming > task since it potentially involves fetching a package from PyPI and > executing > it's setup.py to get the dependency information. In order to improve > performance, once an sdist's dependencies are computed, they would be > stored in > a cache so that during dependency resolution, the dependencies of an sdist > are > not computed every time they are needed. Further, pip caches wheels it > downloads or builds meaning that any installed package or downloaded > wheel's > dependency information would available statically, without the need to go > through the sdist dependency cache. > > Like the wheel cache, sdist-dependency-cache will be a file system based > cache. > The sdist-dependency-cache would only be used if the corresponding sdist is > being used. > > Since sdist dependencies are non-deterministic, the cached dependency > information is potentially incorrect - in certain corner cases such as > using > random choices in setup.py files. Such uses are not seen as important > enough to > cater to, compared the benefits of having a cache. Further, this is > already the > case with the information in the wheel cache. > > SAT solver based resolution is not feasible for pip since a SAT solver > needs > the entire set of packages and their dependencies to compute a solution, > which > cannot be generated due to the aforementioned non-deterministic behaviour > of > setup.py file. Computing dependency information for all of PyPI on a target > system for installing a package is simply not feasible. > > The most reasonable approach is using a backtracking solver. Such a solver > can > be incrementally provided information about the dependencies of a package > and > would only need dependency information about packages in the dependency > graph > of the current system. > > There is a need to keep a cache of visited packages during dependency > resolution. A certain package-version combination may be reached via > multiple > paths and it is an inefficient use of computation time to re-compute that > whether it is indeed going to satisfy the requirements or not. By storing > information about which package-version combinations have been visited and > do > (or do not) satisfy the constraints, it is possible to speedup the > resolution. > > Consider the following example: > > ``` > A-1.0 (depends: B) > A-2.0 (depends: B and E-1.0) > B-1.0 (depends: C and D) > C-1.0 (depends: D) > D-1.0 > D-1.1 (depends: E-2.0) > E-1.0 > E-2.0 > ``` > > If an installation of A is required, either A-2.0 or D-1.1 should not be > installed because they have a conflicting requirement - E. While there are > multiple possible solutions to this situation, the "most obvious" one us > to use > the D-1.0, instead of not installing A-2.0. Further, as multiple packages > depend on D, the algorithm would "reach it" multiple times. By maintaining > a > cache for the visited packages, it is possible to achieve a speedup in > such a > scenario. > > #### Details > > Pull requests would be made on a regular basis during the project to ensure > that the feedback loop is quick. This also reduces the possibilities of > conflicts due to unrelated changes in pip. > > All the code will be tested within pip's existing testing infrastructure. > It > has everything needed to write tests related to all the changes to be made. > Every PR made to pip as a part of this project will contain new tests or > modifications to existing ones as needed. > > ##### Stage 1 > > Initially, some abstractions will be introduced to the pip codebase to > improve > the reuse of certain common patterns within the codebase. This includes > cleaner > temporary directory management through a `TempDirectory` class. > > `RequirementSet.prepare_files` and `RequirementSet._prepare_file` are > downloading, unpacking packages as well as doing what pip does as > dependency > resolution. Taking these functions apart neatly is going to be a tricky > task. > > The following is a listing of the final modules that will be responsible > for > the various tasks that are currently being performed by `RequirementSet`: > > - `pip.resolve` - Dependency Resolution > - `pip.download` - Downloading Files & Unpacking downloaded files > - `pip.req.req_set` - Holding a list of things to be installed / > uninstalled > - `pip.operations.install` - Installing Packages (preparation for > installation) > - `pip.operations.uninstall` - Uninstalling Packages > > To be able to proceed to the next step, only the dependency resolution > related > code needs to be refactored into a separate module. Other portions of > `RequirementSet` are not required to be refactored but the same will be > tackled > as optional deliverables. In other words, only `pip.resolve` needs to be > completed to be able to proceed to the next stage in this project. This is > needed since in Stage 3, the resolver would be written in `pip.resolve`, > independent of the rest of the codebase. > > ##### Stage 2 > > A new module `pip.cache` will be created. Within this module, all the > caching > will be handled. Thus, the code for the current wheel cache would be moved. > The new code for a dependency cache for sdists would also be written here. > > The new cache would hold all of an sdist's egg-info. The information will > be > stored on the file-system in a sub directory structure much like that of > the > wheel cache, in a directory structure based on hash of the sdist file > holding > the egg-info at the end. This will be done in a class named `EggInfoCache`. > > `EggInfoCache` cache will be used only if a corresponding wheel for an > sdist is > not available. Installing an sdist results in the creation of a wheel which > contains the dependency information, which would be used over the > information > available in the `EggInfoCache`. > > To be able to proceed to the next step, it is required that `EggInfoCache` > is > implemented. > > ##### Stage 3 > > The module `pip.resolve` will be modified and a class named > `BacktrackingResolver` will be added to it. The class does what you expect > it > to do - it would resolve dependencies with recursive backtracking. As > described > above, there will be some global state stored about the packages that have > been > explored. Other than the maintenance of global state, in the form of the > cache, > the rest of the algorithm will essentially follow the same structure as any > backtracking algorithm. > > The project would be complete when the aforementioned resolver is > implemented. > > #### Existing Work > > There is existing work directly related to dependency resolution in pip, > done by > multiple individuals. > > - Robert Collins (un-merged closed pull request on pypa/pip) > > This has an incomplete backtracking dependency resolver and dependency > caching. > > - Sebastien Awwad (branch on a separate fork) > > This was used for the depresolve project, to investigate the state of > Dependency Resolution in PyPI/pip ecosystem. > > - `pip-compile` (separate project) > > This has a backtracking dependency resolver implemented to overcome pip's > inablity to resolve dependencies. > > Their work would be used for reference, where appropriate, during the > course of > the project. Further, there are many package managers which implement > dependency > resolution, which would also be looked into. > > ### Tentative Timeline > > - Community Bonding Period: **5 May - 29 May** > > - Clean up and finalize my existing pull requests. > - Read existing literature regarding dependency resolution. > - Inspect existing implementations of dependency resolvers. > > GOAL: Be ready for the work coming up. > > - Week 1: **30 May - 5 June** > > - Introduce abstractions across pip's codebase to make refactoring > `RequirementSet` easier. > - Begin breaking down `RequirementSet.prepare_file`. > > - Week 2: **6 June - 12 June** > > - Continue working on the refactor of `RequirementSet`. > > - Week 3: **13 June - 19 June** > > - Continue working on the refactor of `RequirementSet`. > - Finish and polish `pip.resolve`. > > GOAL: `pip.resolve` module will be ready, using the current resolution > strategy. > > - Week 4: **20 June - 26 June** > > - Finish and polish all work on the refactor of `RequirementSet`. > > - Week 5: **27 June - 3 July** > > - Move code for `WheelCache` into a new `pip.cache` module. > - Write tests for `pip.cache.EggInfoCache`, based on `WheelCache`. > - Begin implementation of `pip.cache.EggInfoCache`. > > - Week 6: **4 July - 10 July** > > - Finish and polish `pip.cache.EggInfoCache`. > > GOAL: A cache for storing dependency information of sdists would be > ready to > add to pip. > > - Week 7: **10 July - 16 July** > > - Create a comprehensive set of tests for the dependency resolver. > (in `tests/unit/test_resolve.py`) > - Begin implementation on the backtracking algorithm. > > GOAL: A comprehensive test bed is ready for testing the dependency > resolver. > > - Week 8: **17 July - 24 July** > > - Complete a rough implementation of the backtracking algorithm > > GOAL: An implementation of a dependency resolver to begin running tests > on > and work on improving. > > - Week 9: **25 July - 31 July** > > - Fixing bugs in dependency resolver > > - Week 10: **1 August - 6 August** > > - Finish and polish work on dependency resolver > > GOAL: A ready-to-merge PR adding a backtracking dependency resolver for > pip. > > - Week 11: **6 August - 13 August** > > Buffer Week. > > - Week 12: **13 August - 21 August** > > Buffer Week. Finalization of project for evaluation. > > If the deliverable is achieved ahead of schedule, the remaining time will > be > utilized to resolve open issues on pip's repository in the order of > priority as > determined under the guidance of the mentors. > > #### Other Commitments > > I expect to not be able to put in 40 hour/week for at most 3 weeks > throughout > the working period, due to the schedule of my university. > > I will have semester-end examinations - from 10th May 2017 to 24th May > 2017 - > during the Community Bonding Period. My university will re-open for my > third > semester on 12 July 2017. I expect mid-semester examinations to be held in > my > University around 20th August 2017. During these times, I would not be > able to > put in full 40 hour weeks due to the academic workload. > > I might take a 3-4 day break during this period, regarding which I would be > informing my mentor around a week in advance. > > I will be completely free from 30th May 2017 to 11 July 2017. > > ### Future Work > > There seems to be some interest in being able to reuse the above dependency > resolution algorithm in other packaging related tools, specifically from > the > buildout project. I intend to eventually move the dependency resolution > code > that would come out of this project into a separate library to allow for > reuse > by installer projects - pip, buildout and other tools. > > ## Previous Contributions to pip > > (As on 12th March 2017) > > ### Issues > > Authored: > > - #3785 - Prefering wheel-based installation over source-based > installation (Open) > - #3786 - Make install command upgrade packages by default (Closed) > - #3787 - Check if pip broke the dependency graph and warn the user (Open) > - #3807 - Tests fail since change on PyPI (Closed) > - #3809 - Switch to TOML for configuration files (Open) > - #3871 - Provide a way to perform non-eager upgrades (Closed) > - #4198 - Travis CI - pypy broken dues to dependency change in pycrypto > (Closed) > - #4282 - What's the release schedule? (Closed) > > Participated: > > - #59 - Add "upgrade" and "upgrade-all" commands (Open) > - #988 - Pip needs a dependency resolver (Open) > - #1056 - pip install -U does not remember whether a package was installed > with --user (Open) > - #1511 - ssl certificate hostname mismatch errors presented badly (Open) > - #1668 - Default to --user (Open) > - #1736 - Create a command to make it easy to access the configuration > file (Open) > - #1737 - Don't tell the user what they meant, just do what they meant > (Open) > - #2313 - Automated the Creation and Upload of Release Artifacts (Open) > - #2732 - pip install hangs with interactive setup.py setups (Open) > - #3549 - pip -U pip fails (Open) > - #3580 - Update requests/urllib3 (Closed) > - #3610 - pip install from package from github, with github dependencies > (Open) > - #3788 - `pip` version suggested is older than the version which is > installed (Closed) > - #3789 - Error installing Mayavi in Mac OS X (Closed) > - #3798 - On using python -m pip install -upgrade pip.. its throwing an > error like the one below (Closed) > - #3811 - no matching distribution found for install (Closed) > - #3814 - pip could not find a version that satisfies the requirement > oslo.context (Closed) > - #3876 - support git refs in @ syntax (Open) > - #4021 - Will you accept PRs with pep484 type hints? (Open) > - #4087 - pip list produces error (Closed) > - #4149 - Exception thrown when binary is already linked to /usr/local/bin > (Open) > - #4160 - Pip does not seem to be handling deep requirements correctly > (Open) > - #4162 - Let --find-links be context aware to support github, gitlab, > etc. links (Open) > - #4170 - pip list |head raises BrokenPipeError (Open) > - #4182 - pip install should install packages in order to avoid ABI > incompatibilities in compiled (Open) > - #4186 - IOError: [Errno 13] Permission denied: '/usr/local/bin/pip' > (Open) > - #4206 - Where on Windows 10 is pip.conf or pip.ini located? (Closed) > - #4221 - Feature request: Check if user has permissions before > downloading files (Closed) > - #4229 - "pip uninstall" is too noisy (Open) > > #### Pull Requests > > Authored: > > - #3806 - Change install command's default behaviour to upgrade packages > by default (Closed, superseded by #3972) > - #3808 - Fix Tests (Merged) > - #3818 - Improve UX and tests of check command (Merged) > - #3972 - Add an upgrade-strategy option (Merged) > - #3974 - [minor] An aesthetic change to wheel command source (Merged) > - #4192 - Move out all the config code to a separate module (Merged) > - #4193 - Add the ability to autocorrect a user's command (Open) > - #4199 - Fix Tests for Travis CI (Merged) > - #4200 - Reduce the API exposed by the configuration class (Merged) > - #4232 - Update documentation to mention upgrade-strategy (Merged) > - #4233 - Nicer permissions error message (Open) > - #4240 - [WIP] Add a configuration command (Open) > > Participated: > > - #2716 - Issue #988: new resolver for pip (Closed) > - #2975 - Different mock dependencies based on Python version (Merged) > - #3744 - Add a "Upgrade all local installed packages" (Open) > - #3750 - Add a `pip check` command. (Merged) > - #3751 - tox.ini: Add "cover" target (Open) > - #3794 - Use the canonicalize_name function for finding .dist-info > (Merged) > - #4142 - Optionally load C dependencies based on platform (Open) > - #4144 - Install build dependencies as specified in PEP 518 (Open) > - #4150 - Clarify default for --upgrade-strategy is eager (Merged) > - #4194 - Allow passing a --no-progress-bar to the install script to > surpress progress bar (Merged) > - #4201 - Register req_to_install for cleanup sooner (Merged) > - #4202 - Switch to 3.6.0 final as our latest 3.x release (Merged) > - #4211 - improve message when installing requirements file (#4127) > (Merged) > - #4241 - Python 3.6 is supported (Merged) > > ## References > > 1. [pypa/pip#988](https://github.com/pypa/pip/issues/988) > > Tracking issue for adding a proper dependency resolver in pip. Contains > links to various useful resources. > > 1. [pypa/pip#2716](https://github.com/pypa/pip/issues/2716) > > Prior work by Robert Collins for adding a proper dependency resolver in > pip. > > 1. [Python Dependency Resolution]( > https://docs.google.com/document/d/1x_VrNtXCup75qA3glDd2fQOB2TakldwjKZ6pXaAjAfg/edit?usp=sharing > ) > > A writeup by Sebastian Awwad on the current state of dependency > resolution > in pip and PyPI in general. > > 1. [PSF Application Template]( > https://wiki.python.org/moin/SummerOfCode/ApplicationTemplate2017) > > For guidance on how to write the application and what information is > needed. > > 1. [Stork: Secure Package Management for VM Environments]( > http://isis.poly.edu/~jcappos/papers/cappos_stork_dissertation_08.pdf) > > A Paper by Justin Cappos about Stork, used for reference regarding > Backtracking Resolution > >
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig