Re: GWL pipelined process composition ?
Hi Roel,

Thank you for all your comments.

> Maybe we can come up with a convenient way to combine two processes
> using a shell pipe. But this needs more thought!

Yes. From my point of view, the classic shell pipe `|` has two strong
limitations for workflows:

 1. it composes at the 'procedure' level, not at the 'process' level;
 2. it cannot deal with two inputs.

As an illustration of point 1, it seems more in the functional spirit
to write one process/task/unit corresponding to "samtools view" and
another one for the compression "gzip -c". Then, if you have a process
that filters some FASTQ files, you can easily reuse the compression
process and compose with it. For more complicated workflows, such as
RNA-seq and others, this composition seems an advantage.

As an illustration of point 2, here is something I cannot do with a
shell pipe:

  dd if=/dev/urandom of=file1 bs=1024 count=1k
  dd if=/dev/urandom of=file2 bs=1024 count=2k
  tar -cvf file.tar file1 file2

or whatever process instead of `dd`, which is perhaps not the right
example here. To be clear, I mean:

  process that outputs fileA
  process that outputs fileB
  process that inputs fileA *and* fileB

without writing fileA and fileB to disk.

> If you have an idea to improve on this, please do share. :-)

I do not know where to look. :-) Any ideas?

All the best,
simon
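[Editor's note: the two-input case described above can already be
approximated in plain shell with bash's process substitution: each
`<(...)` is replaced by a /dev/fd path backed by a pipe, so a consumer
sees "two input files" that never exist on disk. A minimal sketch; the
producer functions and the `comm` consumer are hypothetical stand-ins,
not from the thread:]

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-ins for two producing processes (hypothetical examples):
produce_a() { printf 'x\ny\nz\n'; }
produce_b() { printf 'w\ny\nz\n'; }

# A consumer that needs *two* inputs at once. Bash replaces each
# <(...) with a /dev/fd/N path backed by a pipe, so neither
# intermediate "file" is ever written to disk.
comm -12 <(produce_a) <(produce_b)   # prints the lines common to both
```

One caveat: tools that seek in their inputs (indexers, some archivers
such as `tar` reading file contents) cannot consume pipes, so this does
not generalize to every workflow step.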
Re: linux 4.17.7 not suitable for i386
Leo Famulari writes:

> On Tue, Jul 17, 2018 at 11:36:08AM -0400, Leo Famulari wrote:
>> Quoting Greg Kroah-Hartman:
>>
>> "I did this release anyway with this known problem as there is a fix in
>> here for x86-64 systems that was nasty to track down and was affecting
>> people. Given that the huge majority of systems are NOT i386, I felt
>> this was a safe release to do at this point in time."
>
> Apparently the missing patch is referred to here:
>
> https://lkml.org/lkml/2018/7/17/443
> https://lkml.org/lkml/2018/7/17/505

I just pushed the 4.17.7 update with the cherry-picked fix for i686.

Thanks,
Mark
Re: GWL pipelined process composition ?
Hello Simon,

zimoun writes:

> Hi,
>
> I am asking if it should be possible to optionally stream the
> inputs/outputs when the workflow is processed, without writing the
> intermediate files on disk.
>
> Well, a workflow is basically:
> - some process units (or tasks, or rules) that take inputs (a file)
>   and produce outputs (another file)
> - a graph that describes the relationships between these units.
>
> The simplest workflow is:
>
>   x --A--> y --B--> z
>
> - process A: input file x, output file y
> - process B: input file y, output file z
>
> Currently, the file y is written to disk by A and then read by B,
> which leads to I/O inefficiency, especially when the file is large,
> and/or when several units of the same kind run in parallel.
>
> Would it be a good idea to have something like the shell pipe `|` to
> compose the process units? If yes, how? I have no clue where to
> look...

That's an interesting idea. Of course, you could literally use the
shell pipe within a single process. And I think this makes sense,
because if a shell pipe is beneficial in your situation, then it is
likely to be beneficial to run the two programs connected by the pipe
on a single computer / in a single job. Here's an example:

  (define-public A
    (process
      (name "A")
      (package-inputs (list samtools gzip))
      (data-inputs "/tmp/sample.sam")
      (outputs "/tmp/sample.sam.gz")
      (procedure
       #~(system (string-append "samtools view " #$data-inputs
                                " | gzip -c > " #$outputs)))))

> I agree that storing intermediate files avoids recomputing unmodified
> parts of the workflow again and again, which saves time when
> developing the workflow. However, storing temporary files appears
> unnecessary once the workflow is done and when it does not need to
> run on a cluster.

I think it's either an efficient data transfer (using a pipe), or
writing to disk in between for better restore points. We cannot have
both.
The former can already be achieved with the shell pipe, and the latter
can be achieved by writing two processes.

Maybe we can come up with a convenient way to combine two processes
using a shell pipe. But this needs more thought! If you have an idea
to improve on this, please do share. :-)

> Thank you for all the work on the Guix ecosystem.
>
> All the best,
> simon

Thanks!

Kind regards,
Roel Janssen
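[Editor's note: one rough shell-level illustration of "combining two
processes with a pipe" while keeping their file-based interface is a
named pipe (FIFO): process A still "writes" its output file y and
process B still "reads" y, but the data streams through the kernel and
never lands on disk. The producer and consumer commands below are
hypothetical stand-ins:]

```shell
#!/usr/bin/env bash
set -euo pipefail

dir="$(mktemp -d)"
mkfifo "$dir/y"                        # y looks like a file, but is a pipe

printf 'hello\nworld\n' > "$dir/y" &   # process A: produces "file" y
tr 'a-z' 'A-Z' < "$dir/y"              # process B: consumes "file" y

wait                                   # reap the background producer
rm -r "$dir"
```

The trade-off mentioned above still holds: with a FIFO there is no
restore point, since y is gone once it has been read.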
Re: [PATCH 0/1] Go importer
On Wed, Jul 18, 2018 at 03:11:36PM +0200, Pierre-Antoine Rouby wrote:
> I think the modifications don't work on my computer.
>
> ---
> Backtrace:
>           13 (apply-smob/1 #)
> In ice-9/boot-9.scm:
>    705:2  12 (call-with-prompt _ _ #)
> In ice-9/eval.scm:
>    619:8  11 (_ #(#(#)))
> In guix/ui.scm:
>  1579:12  10 (run-guix-command _ . _)
> In guix/scripts/import.scm:
>   115:11   9 (guix-import . _)
> In guix/scripts/import/gopkg.scm:
>    85:19   8 (guix-import-gopkg . _)
> In guix/utils.scm:
>    633:8   7 (call-with-temporary-directory #)
> In unknown file:
>            6 (_ # # …)
>            5 (_ # # …)
>            4 (_ # # …)
> In ice-9/eval.scm:
>   298:34   3 (_ #(#(#(#) # …) …))
>    619:8   2 (_ #(#(# #) …))
> In guix/serialization.scm:
>   270:25   1 (write-file # …)
> In unknown file:
>            0 (lstat #)
>
> ERROR: In procedure lstat:
> Wrong type (expecting string): # 15>
> ---

What package are you trying to import here?

signature.asc
Description: PGP signature
Re: [PATCH 0/1] Go importer
Hi Leo,

- Original Message -
> From: "Leo Famulari"
>> * guix/import/gopkg.scm: New file.
>> * guix/scripts/import/gopkg.scm: New file.
>> * guix/scripts/import.scm: Add 'gopkg'.
>> * Makefile.am: Add 'gopkg' importer in modules list.
>
> I wonder which of the new files needs to be added to Makefile.am? My
> Autotools knowledge is not very strong...

Oops, yes, guix/scripts/import/gopkg.scm needs to be added to
Makefile.am, my bad.

> I noticed a couple issues with this code. First, the names of the
> temporary directories are predictable (they use an incrementing
> integer). Second, the temporary files are not deleted after the
> importer runs. I've attached a modified patch that addresses this by
> using ((guix utils) call-with-temporary-directory), which should
> address these problems. [0]
> What do you think of my patch? Does it still work for you?

I think the modifications don't work on my computer.

---
Backtrace:
          13 (apply-smob/1 #)
In ice-9/boot-9.scm:
   705:2  12 (call-with-prompt _ _ #)
In ice-9/eval.scm:
   619:8  11 (_ #(#(#)))
In guix/ui.scm:
 1579:12  10 (run-guix-command _ . _)
In guix/scripts/import.scm:
  115:11   9 (guix-import . _)
In guix/scripts/import/gopkg.scm:
   85:19   8 (guix-import-gopkg . _)
In guix/utils.scm:
   633:8   7 (call-with-temporary-directory #)
In unknown file:
           6 (_ # # …)
           5 (_ # # …)
           4 (_ # # …)
In ice-9/eval.scm:
  298:34   3 (_ #(#(#(#) # …) …))
   619:8   2 (_ #(#(# #) …))
In guix/serialization.scm:
  270:25   1 (write-file # …)
In unknown file:
           0 (lstat #)

ERROR: In procedure lstat:
Wrong type (expecting string): #
---

--
Pierre-Antoine Rouby
GWL pipelined process composition ?
Hi,

I am asking whether it would be possible to optionally stream the
inputs/outputs when the workflow is processed, without writing the
intermediate files to disk.

Well, a workflow is basically:
 - some process units (or tasks, or rules) that take inputs (a file)
   and produce outputs (another file)
 - a graph that describes the relationships between these units.

The simplest workflow is:

  x --A--> y --B--> z

 - process A: input file x, output file y
 - process B: input file y, output file z

Currently, the file y is written to disk by A and then read by B,
which leads to I/O inefficiency, especially when the file is large,
and/or when several units of the same kind run in parallel.

Would it be a good idea to have something like the shell pipe `|` to
compose the process units? If yes, how? I have no clue where to
look...

I agree that storing intermediate files avoids recomputing unmodified
parts of the workflow again and again, which saves time when
developing the workflow. However, storing temporary files appears
unnecessary once the workflow is done and when it does not need to run
on a cluster.

Thank you for all the work on the Guix ecosystem.

All the best,
simon
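[Editor's note: the x --A--> y --B--> z example can be made concrete
in shell. This sketch uses `sort` and `gzip -c` as hypothetical
stand-ins for processes A and B, and shows that the piped form produces
the same z without the intermediate y ever touching the disk:]

```shell
#!/bin/sh
set -eu

printf '3\n1\n2\n' > x        # the input file x

# File-based workflow: the intermediate y is written to disk by A,
# then read back by B.
sort x > y                    # process A: x -> y
gzip -c < y > z               # process B: y -> z

# Piped workflow: same final content, but y never exists on disk.
sort x | gzip -c > z-piped

# Decompressing either archive yields the same sorted data.
gzip -dc z
gzip -dc z-piped
```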
Re: GSoC: Adding a web interface similar to the Hydra web interface
Hi Tatiana,

Tatiana Sholokhova writes:

> Could you please review the last 3 commits and maybe find some more
> issues besides that?

I've integrated your work onto my Cuirass instance[1], and I really
like it! I had to fix a few things and adapt it[2] so that it works
with multiple inputs. I will do a review as soon as possible, and then
we can merge it.

I'm a bit late: going through the whole conversation history took more
time than I expected.

Clément

[1]: https://cuirass.lassieur.org:8081/
[2]: https://git.lassieur.org/cgit/cuirass.git/
Re: GSoC: Adding a web interface similar to the Hydra web interface
Hi Tatiana,

Tatiana Sholokhova writes:

> Am I right that in terms of the Cuirass database derivations
> correspond to jobs?

Yes, but to be more precise, a job is a structure containing:

 - derivation
 - job-name
 - system
 - nix-name
 - eval-id

The database table called "Derivations" should be called "Jobs", so the
name is confusing indeed.

A derivation, as Ricardo explained, is a file (.drv) representing
low-level build actions and the environment in which they are
performed.

At each evaluation, there is a new set of jobs returned by the
evaluator, each job having its 'eval-id' incremented. That means that
two different jobs for the same job-name (i.e. linux-libre-4.17.6-job)
could embed the same derivation. In that case, it's useless to build
that job in my opinion; see that bug[1].

I hope it's clearer,

Clément

[1]: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=32190
Re: GSoC: Adding a web interface similar to the Hydra web interface
Dear all,

Ludovic Courtès writes:

> Hello Tatiana & all,
>
> Ricardo Wurmus skribis:
>
>>> I am a bit confused about the database structure. As far as I
>>> understand, there are project_name (project) and branch_name
>>> (jobset) properties, but project_name is a primary key, so a
>>> project can't have several branches?
>>
>> I share your confusion. Maybe Ludovic or Mathieu can shed some more
>> light on this.
>
> It’s confusing indeed, I think it’s a mistake that has yet to be
> fixed. Basically what we do now is that we use a different
> ‘repo_name’ when we just want to add a branch…

The notion of "project" has been removed[1]. It was previously the
specification name, which is a primary key indeed, so it didn't make
sense: one project couldn't have several branches.

Now, Hydra's jobsets are the exact same thing as Cuirass'
specifications. So if you want to build the "master" and "core-updates"
branches of Guix, you need two specifications.

However, it wasn't even possible, before, to build several branches,
because specification names were used by the evaluator: they had to be
"guix" or "guix-modular". Since the name was a primary key, we could
only have two specifications. It is now[2] possible, because the
evaluator uses the input name instead of the specification name.

If you think there is a need for the notion of "project" in Cuirass,
we could add it, but it would need to be a new SQL table, and each
specification would be associated with one project.

Clément

[1]: https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=be713f8a30788861806a74865b07403aa6774117
[2]: https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=7b2f9e0de1ad2d320973b7aea132a8afcad8bece