Hi,

I am wondering whether it would be possible to optionally stream the inputs/outputs between processes when a workflow runs, without writing the intermediate files to disk.

A workflow is basically:

 - some process units (tasks or rules) that take inputs (files) and produce outputs (other files)
 - a graph that describes the relationships between these units.

The simplest workflow is:

   x --A--> y --B--> z

 - process A: input file x, output file y
 - process B: input file y, output file z

Currently, the file y is written to disk by A and then read by B. This leads to I/O inefficiency, especially when the file is large and/or when several units of the same kind run in parallel.

Would it be a good idea to have something like the shell pipe `|` to compose the process units? If yes, how? I have no clue where to look...

I agree that storing the intermediate files avoids recomputing the unmodified parts of the workflow again and again, which saves time while developing the workflow. However, storing temporary files appears unnecessary once the workflow is finished and when it does not need to run on a cluster.

Thank you for all the work on the Guix ecosystem.

All the best,
simon
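
P.S. To make the pipe idea concrete, here is a minimal sketch in plain Python of what I have in mind for x --A--> y --B--> z. It is not GWL code and says nothing about how GWL works internally. The two commands and the name `run_streamed` are made up for the illustration: A upper-cases each line, B numbers each line, and y only ever exists as an OS pipe between them.

```python
import subprocess
import sys

# Hypothetical process units: each one reads stdin and writes stdout, line by line.
PROCESS_A = [sys.executable, "-c",
             "import sys\n"
             "for line in sys.stdin:\n"
             "    sys.stdout.write(line.upper())\n"]
PROCESS_B = [sys.executable, "-c",
             "import sys\n"
             "for i, line in enumerate(sys.stdin, 1):\n"
             "    sys.stdout.write(str(i) + ':' + line)\n"]


def run_streamed(x_path, z_path):
    """Run x --A--> y --B--> z where y is a pipe, never a file on disk."""
    with open(x_path, "rb") as x, open(z_path, "wb") as z:
        a = subprocess.Popen(PROCESS_A, stdin=x, stdout=subprocess.PIPE)
        b = subprocess.Popen(PROCESS_B, stdin=a.stdout, stdout=z)
        # Close our copy of the pipe so A is notified if B exits early.
        a.stdout.close()
        b.wait()
        a.wait()
    return a.returncode, b.returncode
```

Calling `run_streamed("x", "z")` is roughly what `A < x | B > z` does in a shell. The trade-off is the one described above: nothing is cached, so a change in B means running A again.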