Hi,

I am wondering whether it would be possible to optionally stream the
inputs/outputs while the workflow is processed, without writing the
intermediate files to disk.

Basically, a workflow is:
 - some process units (or tasks, or rules) that take inputs (files) and
   produce outputs (other files);
 - a graph that describes the relationships between these units.

The simplest workflow is:
    x --A--> y --B--> z
 - process A: input file x, output file y
 - process B: input file y, output file z
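To make the "graph" part concrete, here is a minimal sketch in Python,
for illustration only (`UNITS`, `dependency_graph` and
`execution_order` are hypothetical names, not an existing workflow
API). It encodes the two-step example and derives the order in which
A and B must run:

```python
from graphlib import TopologicalSorter

# Hypothetical encoding: each process unit lists its input and output files.
UNITS = {
    "A": {"inputs": ["x"], "outputs": ["y"]},
    "B": {"inputs": ["y"], "outputs": ["z"]},
}


def dependency_graph(units):
    """Map each unit to the set of units producing one of its inputs."""
    producer = {out: name for name, u in units.items() for out in u["outputs"]}
    return {
        name: {producer[i] for i in u["inputs"] if i in producer}
        for name, u in units.items()
    }


def execution_order(units):
    """Return the units in an order that respects their data dependencies."""
    return list(TopologicalSorter(dependency_graph(units)).static_order())


if __name__ == "__main__":
    print(execution_order(UNITS))  # ['A', 'B']
```

`graphlib.TopologicalSorter` is in the Python standard library since
3.9; the edge B -> A exists only because B consumes the file y that A
produces.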

Currently, the file y is written to disk by A and then read back by B,
which leads to IO inefficiency, especially when the file is large
and/or when several units of the same kind run in parallel.
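A minimal Python sketch of this file-based behavior, assuming two toy
units (hypothetical: A upper-cases each line, B keeps only the lines
containing "G"). The point is that y must be completely written to
disk before B reads a single byte of it:

```python
import os
import tempfile


def process_a(src, dst):
    """Hypothetical unit A: upper-case every line of src into dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(line.upper())


def process_b(src, dst):
    """Hypothetical unit B: copy to dst only the lines of src containing 'G'."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if "G" in line:
                fout.write(line)


def run_with_files(x, workdir):
    """Run x --A--> y --B--> z, materializing y on disk in workdir."""
    y = os.path.join(workdir, "y")
    z = os.path.join(workdir, "z")
    process_a(x, y)  # y is fully written to disk here...
    process_b(y, z)  # ...and only then read back by B
    return z


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        x = os.path.join(workdir, "x")
        with open(x, "w") as f:
            f.write("acgt\nttt\n")
        with open(run_with_files(x, workdir)) as f:
            print(f.read(), end="")  # ACGT
```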


Would it be a good idea to have something like the shell pipe `|` to
compose the process units?
If yes, how? I have no clue where to look...
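For comparison, a minimal sketch of pipe-like composition, using
Python generators as a stand-in for real processes connected by OS
pipes (an assumption for illustration; `unit_a`, `unit_b` and `pipe`
are hypothetical names). Each record flows from A to B as soon as it
is produced, and y never exists as a file:

```python
from typing import Callable, Iterable, Iterator

Unit = Callable[[Iterable[str]], Iterator[str]]


def unit_a(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical unit A: upper-case each record as it arrives."""
    for line in lines:
        yield line.upper()


def unit_b(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical unit B: keep only the records containing 'G'."""
    for line in lines:
        if "G" in line:
            yield line


def pipe(source: Iterable[str], *units: Unit) -> Iterator[str]:
    """Compose units like `A | B`: each one lazily consumes the previous stream."""
    stream: Iterator[str] = iter(source)
    for unit in units:
        stream = unit(stream)
    return stream


if __name__ == "__main__":
    x = ["acgt\n", "ttt\n"]
    print(list(pipe(x, unit_a, unit_b)))  # ['ACGT\n']
```

Between separate programs, the same idea would rely on anonymous pipes,
as the shell does, or on named pipes created with `mkfifo`. The
trade-off is that a streamed y is not kept, so it cannot be reused to
skip recomputation later.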


I agree that storing intermediate files avoids recomputing, again and
again, the unmodified parts of the workflow. This saves time when
developing the workflow.
However, storing temporary files seems unnecessary once the workflow
is finished and when it does not need to run on a cluster.


Thank you for all the work on the Guix ecosystem.

All the best,
simon
