Hi Spencer,

Here is the documentation for the git commit-graph cache file. The authors also
wrote blog posts about it with a bit more explanation.

=> https://git-scm.com/docs/commit-graph
=> https://devblogs.microsoft.com/devops/updates-to-the-git-commit-graph-feature/

Maybe it won't turn out to be needed... I just thought it might help get you
thinking. Please read all my suggestions from that perspective by default.

I will have to defer to others for gauging the size of the projects. As a rule,
I have found that there are always many more details to consider than I could
have anticipated at the start of a project. That said, I liked your earlier
stated plan of starting simple. Handling the latest releases seems a reasonable
minimum viable product.

Cheers,
Kyle

On April 3, 2023 8:41:53 PM EDT, Spencer Skylar Chan <scha...@terpmail.umd.edu> wrote:
>Hi Kyle,
>
>On 3/31/23 11:15, Kyle wrote:
>> I would expect most software versions to not be in Guix. Simon had mentioned
>> that this is mostly what the guix-past repository is for. However, some
>> packages might be buried on some branch or some commit in some Guix related
>> git repository. It may be helpful to facilitate their discovery and
>> extraction for conda import.
>>
>> Git has a newish binary file format for caching searches across commits.
>> Maybe it would be helpful to figure out how to parse this format (it's
>> documented) and index the data further using Xapian or a graph data
>> structure (or tree sitter?) with the relevant metadata needed to find and
>> efficiently extract scheme code and its dependencies?
>
>If the format is documented then this is possible, although I'm not super
>familiar with these kinds of data structures.
>
>> You make an interesting point about compilation errors. It may be more
>> productive to help researchers test for working satisfiable configurations
>> as a more relaxed approach to having to specify the exact software version.
>> Maybe some "nearby" or newer version is packaged and that is enough to
>> successfully run a test suite? I'm imagining something between git bisect
>> and Guix's own package solver.
>
>Yes, we could have a variant of the solver that's more relaxed. It could
>output multiple solutions so the user can inspect them and pick the best one.
>
>> It might also be productive to add infrastructure to help scientists more
>> conveniently track and study their recent packaging experiments. Guix will
>> only become more useful the more packages are already available. Work
>> which makes packaging more approachable by more people benefits everyone.
>> Perhaps you can think of other ideas in this direction?
>
>I'm not sure how "packaging experiments" are different from packaging software
>the usual way. I think making the importers easier to use and debug would
>help, although that sounds outside the scope of the projects.
>
>Finally, would these projects be considered large or medium for the purposes
>of GSOC?
>
>Thanks,
>Skylar
>
>> On March 30, 2023 7:22:14 PM EDT, Spencer Skylar Chan
>> <scha...@terpmail.umd.edu> wrote:
>>> Hi Kyle,
>>>
>>> On 3/24/23 14:59, Kyle wrote:
>>>> I am a bit worried that your proposed project is too focused on replacing
>>>> python with guile. I think the project would benefit more from making
>>>> python users more comfortable productively using Guix tools in concert
>>>> with the tools they are already comfortable with.
>>>
>>> Yes, I agree with you. Replacing Python with Guile is a much more ambitious
>>> task and is not the highest priority here.
>>>
>>>> I'm wondering if you might consider modifying your project goals toward
>>>> exploring how GWL might be enhanced so that it could better complement
>>>> more expressive language specific workflow tools like snakemake. I am also
>>>> personally interested in exploring such facilities from the targets
>>>> workflow system in R as well. Alternatively, perhaps you could focus on
>>>> extending the GWL with more features?
>>>
>>> I would also be interested in extending GWL with more features, I will
>>> follow up with this on the GWL mailing list.
>>>
>>>> I agree that establishing an achievable scope within a short timeline is
>>>> crucial. The conda env importer idea would be quite an ambitious
>>>> undertaking by itself and would lead you towards thinking about some
>>>> pretty interesting and impactful problems.
>>>
>>> While it's a challenging project, it could be broken into smaller steps:
>>>
>>> 1. import packages by exact matching names only, without versioning.
>>> 2. extend `guix import` to have `guix import conda` to help with package
>>>    names that do not match exactly, and to accelerate adoption of Conda
>>>    packages not in Guix
>>> 3. match software version numbers when translating Conda packages to Guix
>>>
>>> What's currently undefined is the error handling:
>>> - if a Conda package does not exist in Guix
>>> - if the dependency graph is not solvable
>>> - if compiling the environment fails (due to mismatching dependency
>>>   versions)
>>>
>>> I believe there are many satisfactory stopping points for successful
>>> completion within the timeline of the summer, which I hope to present with
>>> my proposal soon.
>>>
>>> Thanks,
>>> Skylar
>>>
>>>>
>>>> On March 22, 2023 5:44:52 PM EDT, Spencer Skylar Chan
>>>> <scha...@terpmail.umd.edu> wrote:
>>>>
>>>> Hi Ricardo,
>>>>
>>>> On 3/22/23 14:19, Ricardo Wurmus wrote:
>>>>
>>>>
>>>> - Translating Snakemake to Guix Workflow Language (GWL)
>>>>
>>>>
>>>> Ricardo, maybe you would have some suggestions. :-)
>>>>
>>>>
>>>> Oh, this looks interesting. Could you please elaborate on the
>>>> idea?
>>>>
>>>> My idea is to take as input a Snakemake workflow file and eventually
>>>> output an equivalent GWL workflow file.
>>>>
>>>> Currently, Snakemake workflows can be exported to CWL (Common
>>>> Workflow Language):
>>>>
>>>> https://snakemake.readthedocs.io/en/stable/executing/interoperability.html
>>>>
>>>> One approach could be to add CWL import/export capabilities to GWL.
>>>> Then Snakemake/GWL conversion would be a 2 step process, using CWL as an
>>>> intermediate step:
>>>>
>>>> 1. Snakemake -> CWL
>>>> 2. CWL -> GWL
>>>>
>>>> However, CWL is not as expressive as Snakemake. There may be some
>>>> details that are lost from Snakemake workflows.
>>>>
>>>> So a 1-step Snakemake/GWL transpiler could be interesting, as both
>>>> Snakemake/GWL use a domain-specific language inside a general purpose
>>>> language (Python/Guile respectively). There may be a possibility to
>>>> achieve more "accurate" translations between workflows.
>>>>
>>>> Is this topic something that could fit into a summer project?
>>>>
>>>
>>
>
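
On the commit-graph parsing idea raised at the top of this thread: the sketch
below is one possible first step, not a finished design. It reads only the
file header and the chunk lookup table, following the layout described in the
git documentation as I understand it (a 4-byte `CGPH` signature, four one-byte
fields, then 12-byte table entries ending with a zero chunk ID). The names
`CommitGraphHeader` and `parse_commit_graph_header` are hypothetical, chosen
for illustration only, and nothing here reflects existing Guix or git code.

```python
import struct
from typing import Dict, NamedTuple


class CommitGraphHeader(NamedTuple):
    """Hypothetical container for the fixed header and chunk table."""
    version: int
    hash_version: int        # 1 = SHA-1, 2 = SHA-256 per the format docs
    num_chunks: int
    num_base_graphs: int
    chunks: Dict[str, int]   # chunk ID (e.g. "OIDF") -> byte offset in file


def parse_commit_graph_header(data: bytes) -> CommitGraphHeader:
    """Parse the 8-byte header and the chunk lookup table that follows it."""
    if len(data) < 8 or data[:4] != b"CGPH":
        raise ValueError("missing CGPH signature")

    # Four unsigned bytes follow the signature.
    version, hash_version, num_chunks, num_base_graphs = struct.unpack_from(
        ">4B", data, 4
    )

    # The table holds num_chunks + 1 entries of 12 bytes each:
    # a 4-byte chunk ID and an 8-byte big-endian offset.
    table_end = 8 + 12 * (num_chunks + 1)
    if len(data) < table_end:
        raise ValueError("truncated chunk lookup table")

    chunks: Dict[str, int] = {}
    for pos in range(8, table_end, 12):
        chunk_id, offset = struct.unpack_from(">4sQ", data, pos)
        if chunk_id == b"\x00\x00\x00\x00":
            break  # terminating entry; its offset marks the end of chunk data
        chunks[chunk_id.decode("ascii")] = offset

    return CommitGraphHeader(
        version, hash_version, num_chunks, num_base_graphs, chunks
    )
```

In a repository where `git commit-graph write` has been run, the file is
usually found at `.git/objects/info/commit-graph`; passing its bytes to the
function should list chunk IDs such as `OIDF`, `OIDL` and `CDAT`. Decoding the
commit data itself, and indexing it with Xapian or similar, would build on
these offsets and is left open here.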