Re: Google Summer of Code 2023 Inquiry

Kyle Tue, 04 Apr 2023 03:05:23 -0700

Hi Spencer,

Here is the documentation for the Git commit-graph cache file. The authors also 
wrote blog posts about it with a bit more explanation.


=> https://git-scm.com/docs/commit-graph
=> https://devblogs.microsoft.com/devops/updates-to-the-git-commit-graph-feature/

Maybe it won't turn out to be needed... just thought it might help get you 
thinking. Please read all my suggestions from that perspective as a reasonable 
default.
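
In case it helps, here is a minimal sketch in Python (my own illustration, not 
taken from the Git sources) of reading the header and chunk lookup table 
described in the documentation above. The function name and the returned 
dictionary layout are made up for this example, and it only reads the table of 
contents, not the commit data itself.

```python
import struct

# Header: 4-byte signature, then one byte each for the file version,
# the hash version, the number of chunks and the number of base graphs.
HEADER = struct.Struct(">4sBBBB")
# Chunk lookup entry: 4-byte chunk ID followed by an 8-byte file offset.
CHUNK_ENTRY = struct.Struct(">4sQ")

HASH_NAMES = {1: "sha1", 2: "sha256"}


def parse_commit_graph_header(data: bytes) -> dict:
    """Return the hash type and chunk table of a commit-graph file.

    Hypothetical helper: the name and return shape are assumptions
    made for this sketch.
    """
    signature, version, hash_version, num_chunks, num_base = HEADER.unpack_from(data, 0)
    if signature != b"CGPH":
        raise ValueError("not a commit-graph file")
    if version != 1:
        raise ValueError(f"unsupported commit-graph version {version}")

    # The lookup table has num_chunks + 1 entries; the last one has an
    # all-zero ID and only marks where the final chunk ends.
    entries = [
        CHUNK_ENTRY.unpack_from(data, HEADER.size + i * CHUNK_ENTRY.size)
        for i in range(num_chunks + 1)
    ]

    # Chunks are stored contiguously, so each length is the distance
    # to the next entry's offset.
    chunks = {}
    for (chunk_id, start), (_, end) in zip(entries, entries[1:]):
        chunks[chunk_id.decode("ascii")] = (start, end - start)

    return {
        "hash": HASH_NAMES.get(hash_version, "unknown"),
        "base_graphs": num_base,
        "chunks": chunks,
    }
```

With a repository where `git commit-graph write` has been run, something like 
parse_commit_graph_header(open(".git/objects/info/commit-graph", "rb").read()) 
should list chunks such as OIDF, OIDL and CDAT with their offsets and sizes. 
Indexing the commit data itself would build on top of that.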

I will have to defer to others for gauging the size of projects. I have found 
as a rule there are always many more details to be considered than I could have 
anticipated at the start of a project. That said, I liked your earlier stated 
plan of starting simple. Handling latest releases seems a reasonable minimum 
viable product.

Cheers,
Kyle

On April 3, 2023 8:41:53 PM EDT, Spencer Skylar Chan <scha...@terpmail.umd.edu> 
wrote:
>Hi Kyle,
>
>On 3/31/23 11:15, Kyle wrote:
>> I would expect most software versions to not be in Guix. Simon had mentioned 
>> that this is mostly what the guix-past repository is for. However, some 
>> packages might be buried on some branch or some commit in some Guix related 
>> git repository. It may be helpful to facilitate their discovery and 
>> extraction for conda import.
>> 
>> Git has a newish binary file format for caching searches across commits. 
>> Maybe it would be helpful to figure out how to parse this format (it's 
>> documented) and index the data further using Xapian or a graph data 
>> structure (or tree sitter?) with the relevant metadata needed to find and 
>> efficiently extract scheme code and its dependencies?
>
>If the format is documented then this is possible, although I'm not super 
>familiar with these kinds of data structures.
>
>> You make an interesting point about compilation errors. It may be more 
>> productive to help researchers test for working satisfiable configurations 
>> as a more relaxed approach to having to specify the exact software version. 
>> Maybe some "nearby" or newer version is packaged and that is enough to 
>> successfully run a test suite? I'm imagining something between git bisect 
>> and Guix's own package solver.
>
>Yes, we could have a variant of the solver that's more relaxed. It could 
>output multiple solutions so the user can inspect them and pick the best one.
>
>> It might also be productive to add infrastructure to help scientists more 
>> conveniently track and study their recent packaging experiments. Guix will 
>> only become more useful as more packages become available. Work 
>> which makes packaging more approachable by more people benefits everyone. 
>> Perhaps you can think of other ideas in this direction?
>
>I'm not sure how "packaging experiments" are different from packaging software 
>the usual way. I think making the importers easier to use and debug would 
>help, although that sounds outside the scope of the projects.
>
>Finally, would these projects be considered large or medium for the purposes 
>of GSoC?
>
>Thanks,
>Skylar
>
>> On March 30, 2023 7:22:14 PM EDT, Spencer Skylar Chan 
>> <scha...@terpmail.umd.edu> wrote:
>>> Hi Kyle,
>>> 
>>> On 3/24/23 14:59, Kyle wrote:
>>>> I am a bit worried that your proposed project is too focused on replacing 
>>>> Python with Guile. I think the project would benefit more from making 
>>>> Python users more comfortable productively using Guix tools in concert 
>>>> with the tools they are already comfortable with.
>>> 
>>> Yes, I agree with you. Replacing Python with Guile is a much more ambitious 
>>> task and is not the highest priority here.
>>> 
>>>> I'm wondering if you might consider modifying your project goals toward 
>>>> exploring how GWL might be enhanced so that it could better complement 
>>>> more expressive language-specific workflow tools like Snakemake. I am also 
>>>> personally interested in exploring such facilities from the targets 
>>>> workflow system in R. Alternatively, perhaps you could focus on 
>>>> extending the GWL with more features?
>>> 
>>> I would also be interested in extending GWL with more features; I will 
>>> follow up on this on the GWL mailing list.
>>> 
>>>> I agree that establishing an achievable scope within a short timeline is 
>>>> crucial. The conda env importer idea would be quite an ambitious 
>>>> undertaking by itself and would lead you towards thinking about some 
>>>> pretty interesting and impactful problems.
>>> 
>>> While it's a challenging project, it could be broken into smaller steps:
>>> 
>>> 1. import packages by exact matching names only, without versioning.
>>> 2. extend `guix import` to have `guix import conda` to help with package 
>>> names that do not match exactly, and to accelerate adoption of Conda 
>>> packages not in Guix
>>> 3. match software version numbers when translating Conda packages to Guix
>>> 
>>> What's currently undefined is the error handling:
>>> - if a Conda package does not exist in Guix
>>> - if the dependency graph is not solvable
>>> - if compiling the environment fails (due to mismatching dependency 
>>> versions)
>>> 
>>> I believe there are many satisfactory stopping points for successful 
>>> completion within the timeline of the summer, which I hope to present with 
>>> my proposal soon.
>>> 
>>> Thanks,
>>> Skylar
>>> 
>>>> 
>>>> On March 22, 2023 5:44:52 PM EDT, Spencer Skylar Chan 
>>>> <scha...@terpmail.umd.edu> wrote:
>>>> 
>>>>      Hi Ricardo,
>>>> 
>>>>      On 3/22/23 14:19, Ricardo Wurmus wrote:
>>>> 
>>>> 
>>>>                  - Translating Snakemake to Guix Workflow Language (GWL)
>>>> 
>>>> 
>>>>              Ricardo, maybe you would have some suggestions. :-)
>>>> 
>>>> 
>>>>          Oh, this looks interesting. Could you please elaborate on the 
>>>> idea?
>>>> 
>>>>      My idea is to take as input a Snakemake workflow file and eventually 
>>>> output an equivalent GWL workflow file.
>>>> 
>>>>      Currently, Snakemake workflows can be exported to CWL (Common 
>>>> Workflow Language):
>>>> 
>>>>      https://snakemake.readthedocs.io/en/stable/executing/interoperability.html
>>>> 
>>>>      One approach could be to add CWL import/export capabilities to GWL. 
>>>> Then Snakemake/GWL conversion would be a 2 step process, using CWL as an 
>>>> intermediate step:
>>>> 
>>>>      1. Snakemake -> CWL
>>>>      2. CWL -> GWL
>>>> 
>>>>      However, CWL is not as expressive as Snakemake. There may be some 
>>>> details that are lost from Snakemake workflows.
>>>> 
>>>>      So a 1-step Snakemake-to-GWL transpiler could be interesting, as both 
>>>> Snakemake and GWL embed a domain-specific language in a general-purpose 
>>>> language (Python and Guile respectively). This might make it possible to 
>>>> achieve more "accurate" translations between workflows.
>>>> 
>>>>      Is this topic something that could fit into a summer project?
>>>> 
>>> 
>> 
>
