Re: GWL pipelined process composition?

2018-07-18 Thread zimoun
Hi Roel,

Thank you for all your comments.


> Maybe we can come up with a convenient way to combine two processes
> using a shell pipe.  But this needs more thought!

Yes, from my point of view, the classic shell pipe `|` has two strong
limitations for workflows:
 1. it composes at the 'procedure' level, not at the 'process' level;
 2. it cannot deal with two inputs.

As an illustration of point 1, it seems more in the "functional spirit"
to write one process/task/unit corresponding to "samtools view" and
another one for the compression with "gzip -c". Then, if you have a
process that filters some FASTQ, you can easily reuse the compress
process and compose it with the filter. For more complicated workflows,
such as RNA-seq, this composition seems an advantage.
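
For instance, a reusable compress unit could look roughly like this (a
sketch only, borrowing the process fields from your example; the paths
are made up):

(define-public compress
  (process
    (name "compress")
    (package-inputs (list gzip))
    (data-inputs "/tmp/sample.filtered.fastq")
    (outputs "/tmp/sample.filtered.fastq.gz")
    (procedure
     #~(system (string-append "gzip -c " #$data-inputs
                              " > " #$outputs)))))

Any process that produces a FASTQ (a filter, a trimmer, ...) could then
be composed with this `compress` process, instead of every process
carrying its own "| gzip -c" inside its procedure.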

As an illustration of point 2, here is something I cannot do with a shell pipe:

  dd if=/dev/urandom of=file1 bs=1024 count=1k
  dd if=/dev/urandom of=file2 bs=1024 count=2k
  tar -cvf file.tar file1 file2

(or whatever processes instead of `dd`, which is perhaps not the right
example here). To be clear, I would like:
  a process that outputs fileA
  a process that outputs fileB
  a process that takes fileA *and* fileB as inputs
without writing fileA and fileB to disk.



> If you have an idea to improve on this, please do share. :-)

I do not know where to look. :-)
Any ideas?


All the best,
simon



Re: linux 4.17.7 not suitable for i386

2018-07-18 Thread Mark H Weaver
Leo Famulari  writes:

> On Tue, Jul 17, 2018 at 11:36:08AM -0400, Leo Famulari wrote:
>> Quoting Greg Kroah-Hartman:
>> 
>> "I did this release anyway with this known problem as there is a fix in
>> here for x86-64 systems that was nasty to track down and was affecting
>> people.  Given that the huge majority of systems are NOT i386, I felt
>> this was a safe release to do at this point in time."
>
> Apparently the missing patch is referred to here:
>
> https://lkml.org/lkml/2018/7/17/443
> https://lkml.org/lkml/2018/7/17/505

I just pushed the 4.17.7 update with the cherry-picked fix for i686.

 Thanks,
   Mark



Re: GWL pipelined process composition?

2018-07-18 Thread Roel Janssen
Hello Simon,

zimoun  writes:

> Hi,
>
> I am asking whether it would be possible to optionally stream the
> inputs/outputs when the workflow is processed, without writing the
> intermediate files to disk.
>
> Well, a workflow is basically:
>  - some process units (or tasks or rules) that take inputs (files) and
> produce outputs (other files)
>  - a graph that describes the relationships between these units.
>
> The simplest workflow is:
> x --A--> y --B--> z
>  - process A: input file x, output file y
>  - process B: input file y, output file z
>
> Currently, the file y is written to disk by A and then read by B, which
> leads to I/O inefficiency, especially when the file is large and/or
> when several units of the same kind run in parallel.
>
>
> Would it be a good idea to have something like the shell pipe `|` to
> compose the process units?
> If yes, how? I have no clue where to look...

That's an interesting idea.  Of course, you could literally use the
shell pipe within a single process.  And I think this makes sense, because
if a shell pipe is beneficial in your situation, then it is likely to be
beneficial to run the two programs connected by the pipe on a single
computer / in a single job.

Here's an example:
(define-public A
  (process
    (name "A")
    (package-inputs (list samtools gzip))
    (data-inputs "/tmp/sample.sam")
    (outputs "/tmp/sample.sam.gz")
    (procedure
     #~(system (string-append "samtools view " #$data-inputs
                              " | gzip -c > " #$outputs)))))

> I agree that storing the intermediate files avoids recomputing the
> unmodified parts of the workflow again and again, which saves time
> when developing the workflow.
> However, storing the temporary files appears unnecessary once the
> workflow is done and when it does not need to run on a cluster.

I think it's either an efficient data transfer (using a pipe), or
writing to disk in between for better restore points.  We cannot have
both.  The former can already be achieved with the shell pipe, and the
latter can be achieved by writing two processes.
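
For completeness, the two-process variant of the example above would
look roughly like this (again only a sketch, with illustrative paths),
with the intermediate file on disk acting as the restore point:

(define-public samtools-view
  (process
    (name "samtools-view")
    (package-inputs (list samtools))
    (data-inputs "/tmp/sample.sam")
    (outputs "/tmp/sample.view.sam")
    (procedure
     #~(system (string-append "samtools view " #$data-inputs
                              " > " #$outputs)))))

(define-public gzip-compress
  (process
    (name "gzip-compress")
    (package-inputs (list gzip))
    (data-inputs "/tmp/sample.view.sam")
    (outputs "/tmp/sample.sam.gz")
    (procedure
     #~(system (string-append "gzip -c " #$data-inputs
                              " > " #$outputs)))))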

Maybe we can come up with a convenient way to combine two processes
using a shell pipe.  But this needs more thought!

If you have an idea to improve on this, please do share. :-)

> Thank you for all the work on the Guix ecosystem.
>
> All the best,
> simon

Thanks!

Kind regards,
Roel Janssen



Re: [PATCH 0/1] Go importer

2018-07-18 Thread Leo Famulari
On Wed, Jul 18, 2018 at 03:11:36PM +0200, Pierre-Antoine Rouby wrote:
> I think the modifications don't work on my computer.
> 
> ---
> Backtrace:
>   13 (apply-smob/1 #)
> In ice-9/boot-9.scm:
> 705:2 12 (call-with-prompt _ _ #)
> In ice-9/eval.scm:
> 619:8 11 (_ #(#(#)))
> In guix/ui.scm:
>   1579:12 10 (run-guix-command _ . _)
> In guix/scripts/import.scm:
>115:11  9 (guix-import . _)
> In guix/scripts/import/gopkg.scm:
> 85:19  8 (guix-import-gopkg . _)
> In guix/utils.scm:
> 633:8  7 (call-with-temporary-directory #)
> In unknown file:
>6 (_ # # …)
>5 (_ # # …)
>4 (_ # # …)
> In ice-9/eval.scm:
>298:34  3 (_ #(#(#(#) # …) …))
> 619:8  2 (_ #(#(# #) …))
> In guix/serialization.scm:
>270:25  1 (write-file # …)
> In unknown file:
>0 (lstat #)
> 
> ERROR: In procedure lstat:
> Wrong type (expecting string): # 15>
> ---

What package are you trying to import here?




Re: [PATCH 0/1] Go importer

2018-07-18 Thread Pierre-Antoine Rouby
Hi Leo,

- Original Message -
> From: "Leo Famulari" 

>> * guix/import/gopkg.scm: New file.
>> * guix/scripts/import/gopkg.scm: New file.
>> * guix/scripts/import.scm: Add 'gopkg'.
>> * Makefile.am: Add 'gopkg' importer in modules list.
> 
> I wonder which of the new files needs to be added to Makefile.am? My
> Autotools knowledge is not very strong...

Oops, yes, guix/scripts/import/gopkg.scm needs to be added to Makefile.am, my bad.

> I noticed a couple issues with this code. First, the names of the
> temporary directories are predictable (they use an incrementing
> integer). Second, the temporary files are not deleted after the importer
> runs. I've attached a modified patch that addresses this by using ((guix
> utils) call-with-temporary-directory), which should address these
> problems. [0]

> What do you think of my patch? Does it still work for you?

I think the modifications don't work on my computer.

---
Backtrace:
  13 (apply-smob/1 #)
In ice-9/boot-9.scm:
705:2 12 (call-with-prompt _ _ #)
In ice-9/eval.scm:
619:8 11 (_ #(#(#)))
In guix/ui.scm:
  1579:12 10 (run-guix-command _ . _)
In guix/scripts/import.scm:
   115:11  9 (guix-import . _)
In guix/scripts/import/gopkg.scm:
85:19  8 (guix-import-gopkg . _)
In guix/utils.scm:
633:8  7 (call-with-temporary-directory #)
In unknown file:
   6 (_ # # …)
   5 (_ # # …)
   4 (_ # # …)
In ice-9/eval.scm:
   298:34  3 (_ #(#(#(#) # …) …))
619:8  2 (_ #(#(# #) …))
In guix/serialization.scm:
   270:25  1 (write-file # …)
In unknown file:
   0 (lstat #)

ERROR: In procedure lstat:
Wrong type (expecting string): #
---
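
For reference, here is how I understand call-with-temporary-directory
from (guix utils) is meant to be used (a minimal sketch, not taken from
the patch): the procedure receives the directory name as a string, and
the directory is removed afterwards.  The error above suggests that
something other than a file-name string ended up where write-file and
lstat expect one.

(use-modules (guix utils))

;; Minimal usage sketch; the body only prints the directory name.
(call-with-temporary-directory
 (lambda (directory)
   ;; DIRECTORY is a string; anything handed to lstat or write-file
   ;; should be a file name string like this one.
   (format #t "working in ~a~%" directory)))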

--
Pierre-Antoine Rouby



GWL pipelined process composition?

2018-07-18 Thread zimoun
Hi,

I am asking whether it would be possible to optionally stream the
inputs/outputs when the workflow is processed, without writing the
intermediate files to disk.

Well, a workflow is basically:
 - some process units (or tasks or rules) that take inputs (files) and
produce outputs (other files)
 - a graph that describes the relationships between these units.

The simplest workflow is:
x --A--> y --B--> z
 - process A: input file x, output file y
 - process B: input file y, output file z

Currently, the file y is written to disk by A and then read by B, which
leads to I/O inefficiency, especially when the file is large and/or
when several units of the same kind run in parallel.


Would it be a good idea to have something like the shell pipe `|` to
compose the process units?
If yes, how? I have no clue where to look...


I agree that storing the intermediate files avoids recomputing the
unmodified parts of the workflow again and again, which saves time
when developing the workflow.
However, storing the temporary files appears unnecessary once the
workflow is done and when it does not need to run on a cluster.


Thank you for all the work on the Guix ecosystem.

All the best,
simon



Re: GSoC: Adding a web interface similar to the Hydra web interface

2018-07-18 Thread Clément Lassieur
Hi Tatiana,

Tatiana Sholokhova  writes:

> Could you please review the last 3 commits and maybe find some more issues
> besides that?

I've integrated your work into my Cuirass instance[1], and I really like
it!  I had to fix a few things and adapt it[2] so that it works with
multiple inputs.

I will do a review as soon as possible, and then we can merge it.  I'm a
bit late: going through the whole conversation history took more time
than I expected.

Clément

[1]: https://cuirass.lassieur.org:8081/
[2]: https://git.lassieur.org/cgit/cuirass.git/



Re: GSoC: Adding a web interface similar to the Hydra web interface

2018-07-18 Thread Clément Lassieur
Hi Tatiana,

Tatiana Sholokhova  writes:

> Am I right that, in terms of the Cuirass database, derivations
> correspond to jobs?

Yes, but to be more precise, a job is a structure containing:
  - derivation
  - job-name
  - system
  - nix-name
  - eval-id

The database table called "Derivations" should be called "Jobs", so the
name is confusing indeed.

A derivation, as Ricardo explained, is a file (.drv) representing
low-level build actions and the environment in which they are performed.

At each evaluation, there is a new set of jobs returned by the
evaluator, each job having its 'eval-id' incremented.  That means that
two different jobs for the same job-name (e.g. linux-libre-4.17.6-job)
could embed the same derivation.  In that case, it's useless to build
that job in my opinion, see that bug[1].
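
As a purely illustrative sketch (not the actual Cuirass data
structures): two jobs from consecutive evaluations can carry different
'eval-id's yet embed the very same derivation, in which case the second
build is redundant.

;; Illustration only; the .drv path is a fake placeholder.
(define job-from-eval-1
  '((#:derivation . "/gnu/store/aaaaaaaa-linux-libre-4.17.6.drv")
    (#:job-name . "linux-libre-4.17.6-job")
    (#:system . "x86_64-linux")
    (#:nix-name . "linux-libre-4.17.6")
    (#:eval-id . 1)))

(define job-from-eval-2
  '((#:derivation . "/gnu/store/aaaaaaaa-linux-libre-4.17.6.drv")
    (#:job-name . "linux-libre-4.17.6-job")
    (#:system . "x86_64-linux")
    (#:nix-name . "linux-libre-4.17.6")
    (#:eval-id . 2)))

;; Same derivation, hence the same build result: rebuilding the second
;; job brings nothing new.
(equal? (assq-ref job-from-eval-1 #:derivation)
        (assq-ref job-from-eval-2 #:derivation))   ; => #t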

I hope it's clearer,
Clément

[1]: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=32190



Re: GSoC: Adding a web interface similar to the Hydra web interface

2018-07-18 Thread Clément Lassieur
Dear all,

Ludovic Courtès  writes:

> Hello Tatiana & all,
>
> Ricardo Wurmus  skribis:
>
>>> I am a bit confused about the database structure. As far as I understand,
>>> there are project_name (project) and branch_name (jobset) properties, but
>>> project_name is a primary key, so a project can't have several branches?
>>
>> I share your confusion.  Maybe Ludovic or Mathieu can shed some more
>> light on this.
>
> It’s confusing indeed, I think it’s a mistake that has yet to be fixed.
> Basically what we do now is that we use a different ‘repo_name’ when we
> just want to add a branch…

The notion of "project" has been removed[1].  It was previously the
specification name, which is a primary key indeed, so it didn't make
sense because one project couldn't have several branches.

Now, Hydra's jobsets are the exact same thing as Cuirass'
specifications.  So if you want to build the "master" and "core-updates"
branches of Guix, you need two specifications.
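
For instance (a hypothetical sketch only: the field names below are
illustrative and not the actual Cuirass specification format), that
would mean two specifications pointing the same repository at different
branches:

;; Hypothetical field names, for illustration only.
(define guix-master
  '((name . "guix-master")
    (url . "https://git.savannah.gnu.org/git/guix.git")
    (branch . "master")))

(define guix-core-updates
  '((name . "guix-core-updates")
    (url . "https://git.savannah.gnu.org/git/guix.git")
    (branch . "core-updates")))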

However, it wasn't even possible before to build several branches,
because specification names were used by the evaluator: they had to be
"guix" or "guix-modular".  Since the name was a primary key, we could
only have two specifications.  It is now[2] possible, because the
evaluator uses the input name instead of the specification name.

If you think there is a need for the notion of "Project" in Cuirass, we
could add it, but it needs to be a new SQL table.  And each
specification would be associated with one project.

Clément

[1]: https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=be713f8a30788861806a74865b07403aa6774117
[2]: https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=7b2f9e0de1ad2d320973b7aea132a8afcad8bece