Progress in preparing the Bazel Build System for Debian (COVID-19 Biohackathon follow-up)

2020-05-05 Thread Olek Wojnar
Fellow Developers, Maintainers, and Contributors,

This is a quick update on recent progress with packaging the Bazel Build
System [1] for Debian. My involvement grew out of an urgent need for
TensorFlow that was identified during the recent COVID-19 Biohackathon
[2]. Upstream has been very supportive of our efforts and we have had
many positive interactions with them.

However, we've now reached a point where we need more help in order to
get these important tools packaged in a timely manner. There are
currently 10 Java package dependencies that are not available in Debian.
These are:
google-api-client
google-auth
google-auto
checker-framework
diffutils
error-prone
google-flogger
grpc-java
opencensus
javax-annotation

We have more information available, including links to RFP bugs, on our
Workplan wiki [3]. If you have Java experience and are willing to assist
in this effort, even packaging one of these would be a great help. If
you also want to help with the main Bazel-packaging effort, please feel
free to join the team!

Stay safe out there!


-Olek

PS: I am not subscribed to debian-science or debian-java.

[1] https://bazel.build/
[2] https://salsa.debian.org/med-team/community/2020-covid19-hackathon
[3] https://salsa.debian.org/bazel-team/meta/-/wikis/Workplan-Part-1





Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)

2020-05-05 Thread Charles Plessy
On Tue, May 05, 2020 at 11:18:51PM +0200, Andreas Tille wrote:
> 
>   apt install environment-modules  

Yes, but we are on CentOS...

That said, I increasingly use Debian Med with Singularity.  I made an
image in which I ran `apt install med-cloud`, and I use it to make
environment modules that expose only the binaries of selected packages,
for instance bedtools.  The tedious part is stitching the CentOS path
to the image: at the moment I need to generate one script per command.
I wonder if there is a way to automate some of these steps...
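One way the one-script-per-command step might be automated is sketched here, purely as an illustration and not as an existing Debian Med tool: a small Python script that writes one shell wrapper per command (each calling `singularity exec` on the image) plus a minimal Environment Modules modulefile that prepends the wrapper directory to PATH. The image path, command list, directories and the function names `write_wrappers` and `write_modulefile` are hypothetical placeholders; the command list itself could perhaps be derived from `dpkg -L <package>` run inside the image, which is not shown.

```python
"""Hypothetical sketch: generate one shell wrapper per command so that
binaries installed inside a Singularity image can be called from the host,
plus a minimal Environment Modules modulefile that puts them on PATH."""
import shlex
import stat
from pathlib import Path

WRAPPER_TEMPLATE = """#!/bin/sh
# Auto-generated wrapper: runs the command inside the Singularity image.
exec singularity exec {image} {cmd} "$@"
"""


def write_wrappers(bin_dir, image, commands):
    """Write an executable wrapper script for each command into bin_dir."""
    bin_dir = Path(bin_dir)
    bin_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for cmd in commands:
        path = bin_dir / cmd
        # shlex.quote protects image paths or names containing shell metacharacters.
        path.write_text(WRAPPER_TEMPLATE.format(image=shlex.quote(image),
                                                cmd=shlex.quote(cmd)))
        # Add execute permission for user, group and others.
        path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        written.append(path)
    return written


def write_modulefile(modulefile, bin_dir):
    """Write a minimal Tcl modulefile that prepends bin_dir to PATH."""
    modulefile = Path(modulefile)
    modulefile.parent.mkdir(parents=True, exist_ok=True)
    # Assumes bin_dir contains no spaces or Tcl special characters.
    modulefile.write_text(f"#%Module1.0\nprepend-path PATH {bin_dir}\n")
    return modulefile
```

For example, calling `write_wrappers("/shared/med/bin", "/shared/images/med-cloud.sif", ["bedtools"])` and then `write_modulefile("/shared/modulefiles/debian-med", "/shared/med/bin")` would, once `/shared/modulefiles` is on MODULEPATH, let users run `module load debian-med` and call `bedtools` directly (all paths hypothetical).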

Cheers,

Charles

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Akano, Uruma, Okinawa, Japan



Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)

2020-05-05 Thread Andreas Tille
On Wed, May 06, 2020 at 04:20:49AM +0900, Charles Plessy wrote:
> > I wonder how users of that software are dealing with this.
> 
> In our case we use the environment modules system (modules.sourceforge.net).

Or

  apt install environment-modules  

Kind regards

  Andreas.

-- 
http://fam-tille.de



Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)

2020-05-05 Thread Andreas Tille
Hi Simon,

On Tue, May 05, 2020 at 03:24:12PM +0200, zimoun wrote:
> > I wonder how users of that software are dealing with this.
> 
> Personally, I use the GNU Guix package manager on top of Debian, with
> custom channels for installing this non-free software.  It helps
> because it is easier to travel through the history of the packages
> and because ``profiles`` allow installing several versions side by
> side.

I agree that installing several versions side by side matters.  But that's
pretty orthogonal to whether some software is free or non-free, right?
 
> The presentation "seeing Debian through a Functional lens" by Joey
> Hess at DebConf14 helped me grasp the point of a ``functional
> package manager``.
> 
> BTW, thank you for all the hard packaging work you are doing.  I still
> use Debian (Med) packages for the ones I care less about; my motto
> is: if it is not planned to be in Debian, then it is not really
> useful. ;-)

We would love hints about enhancements anyway. ;-)
 
> > That's a strong point actually.  However, we will face more and more
> > problems of this nature.  Mo's attempt to write a deep learning policy
> > might help here a bit.
> 
> Note that in the Guppy case -- because it is non-free and the
> structure of the neural network is therefore not known -- the question
> does not even arise. :-)
> 
> However, I think the "problem" of Deep Learning is not new.  This is
> probably not the right place to discuss that.

Not really under this topic on this list, but I think it could be
discussed in Debian anyway.
 
> 1. Trying to determine whether or not the weights are part of a freely
> licensed application does seem relevant to me.  They are part of the
> application just as an icon image can be part of an application.
> Because the application is free, the structure of the network is known,
> so any other weights can be provided (yes, they will probably be
> irrelevant).  The only question could be, IMHO, the format in which
> the weights are stored.
> 
> 2. The weights are simply data resulting from one (big) computation.
> This process may or may not be well described.  The tools used may or
> may not be free.  It does not matter; the only point is the license of
> such data.  For example, an aligner needs a reference genome.  No one
> argues that all the data used to build this reference -- notebooks,
> discussions for the consensus, etc. -- has to be released under free
> licenses.  The same holds for annotations.  Another example is default
> values, e.g., the ones in scikit-learn; they are based on training data
> sets that are not necessarily available.  Software quite often uses
> data resulting from processing other (training) data.  And the only
> concern regarding user freedom is the license of the resulting data.
> 
> 3. Access to the training data set is not about freedom but about
> (reproducible) science.  Are the weights considered "scientific" if
> the training data are not available?
> 
> From my point of view, Mo Zhou's policy mixes free software and
> (real) science, or say reproducibility.  There are bridges between
> the two, and both are part of the same big picture.

Since you mention Mo Zhou's policy: that's the perfect place to
discuss issues like this.

> > I once started packaging deepbinner[1], which is stalled as long as we
> > do not have python3-tensorflow.  But maybe that's on the horizon, since
> > Bazel packaging sounded quite promising.
> 
> That sounds awesome!

I guess Olek Wojnar, who is busy packaging Bazel and making great
progress, would welcome any help. ;-)
 
> > > Altogether, I think that we will best serve our users by making sure
> > > that Free basecallers are easy to install on Debian, providing the
> > > standard tools for downstream analysis (we are quite good at this), and
> > > adding value by supporting bioinformatics workflow systems.
> >
> > That's exactly my opinion here.
> 
> Really cool!  That's why Debian rocks!

... and why we need opinions like yours as well as active
contributions from people like you.
 
> Thank you for all the work; it helps a lot to get things done more easily.

You are welcome, and thanks for your opinion.

 Andreas. 

-- 
http://fam-tille.de



Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)

2020-05-05 Thread Charles Plessy
> On Mon, May 04, 2020 at 10:37:22AM +0900, Charles Plessy wrote:
> > 
> >  - Upgrades are not drop-in replacements for each other and a laboratory
> >    typically needs to install several versions side by side.

On Tue, May 05, 2020 at 06:52:43AM +0200, Andreas Tille wrote:
> I wonder how users of that software are dealing with this.

In our case we use the environment modules system (modules.sourceforge.net).

Have a nice day,

-- 
Charles



RFH: pigx-rnaseq - extra eyes requested to fix tests

2020-05-05 Thread Steffen Möller

Hello,

PIGX is a Python/R-based workflow to analyse RNAseq data.  Admittedly,
a driving force behind my wanting it is that it has a scRNAseq sibling,
which would then be the next target.

There are two remaining problems with the tests that I describe in
https://salsa.debian.org/med-team/pigx-rnaseq/-/blob/master/debian/TODO.

a) snakemake complains about something that I can reproduce on the command line
b) a missing path to "html_dependency"; this comes from
r-cran-rmarkdown or r-cran-dt, I guess.

That is my guess, but I don't actually see it.  Better ideas or even
patches are welcome.

Best,

Steffen



Re: Welcoming GSoC students

2020-05-05 Thread Pranav Ballaney
Thank you so much, and congratulations to you too, Nilesh!
It's always been wonderful working with the Med team and I'm glad to
finally be a part of GSoC as well!

Regards,
Pranav

On Tue, 5 May, 2020, 10:34 AM Andreas Tille,  wrote:

> Hi,
>
> I'm hereby welcoming
>
>   Pranav Ballaney 
> for the topic
>   Quality Assurance and Continuous Integration for Applications in Life
> Sciences and Medicine.
>
> and
>
>   Nilesh Patra 
> for the topic
>   Packaging and Quality Assurance of COVID-19 Relevant Applications.
>
> as Google Summer of Code students.  Both should now be well known in our
> team due to their previous contributions.  So "welcoming" you two to our
> team is probably not the right word, since you are already considered team
> members.  That's why I just say:  I'm very happy that I've got
> official confirmation.  Please keep up the great work you have started!
>
> Kind regards
>
> Andreas.
>
> --
> http://fam-tille.de
>


Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)

2020-05-05 Thread zimoun
Dear,

On Tue, 5 May 2020 at 06:53, Andreas Tille  wrote:

> >  - Guppy is a moving target, and whichever version we would distribute
> >    in Stable is unlikely to satisfy the users a year later.
> >
> >  - Upgrades are not drop-in replacements for each other and a laboratory
> >    typically needs to install several versions side by side.
>
> I wonder how users of that software are dealing with this.

Personally, I use the GNU Guix package manager on top of Debian, with
custom channels for installing this non-free software.  It helps
because it is easier to travel through the history of the packages
and because ``profiles`` allow installing several versions side by
side.

The presentation "seeing Debian through a Functional lens" by Joey
Hess at DebConf14 helped me grasp the point of a ``functional
package manager``.

BTW, thank you for all the hard packaging work you are doing.  I still
use Debian (Med) packages for the ones I care less about; my motto
is: if it is not planned to be in Debian, then it is not really
useful. ;-)


> >  - The conversion from raw to FASTQ is done by neural network algorithms
> >    for which we do not have access to the training data, and therefore
> >    the freedom to modify Guppy would be limited to the sugar around the
> >    core algorithms.
>
> That's a strong point actually.  However, we will face more and more
> problems of this nature.  Mo's attempt to write a deep learning policy
> might help here a bit.

Note that in the Guppy case -- because it is non-free and the
structure of the neural network is therefore not known -- the question
does not even arise. :-)

However, I think the "problem" of Deep Learning is not new.  This is
probably not the right place to discuss that.

1. Trying to determine whether or not the weights are part of a freely
licensed application does seem relevant to me.  They are part of the
application just as an icon image can be part of an application.
Because the application is free, the structure of the network is known,
so any other weights can be provided (yes, they will probably be
irrelevant).  The only question could be, IMHO, the format in which
the weights are stored.

2. The weights are simply data resulting from one (big) computation.
This process may or may not be well described.  The tools used may or
may not be free.  It does not matter; the only point is the license of
such data.  For example, an aligner needs a reference genome.  No one
argues that all the data used to build this reference -- notebooks,
discussions for the consensus, etc. -- has to be released under free
licenses.  The same holds for annotations.  Another example is default
values, e.g., the ones in scikit-learn; they are based on training data
sets that are not necessarily available.  Software quite often uses
data resulting from processing other (training) data.  And the only
concern regarding user freedom is the license of the resulting data.

3. Access to the training data set is not about freedom but about
(reproducible) science.  Are the weights considered "scientific" if
the training data are not available?

From my point of view, Mo Zhou's policy mixes free software and
(real) science, or say reproducibility.  There are bridges between
the two, and both are part of the same big picture.


> > In that sense, I think that if we want to distribute a basecaller in
> > Debian, we had better pick an alternative that is already free.  Some
> > of them are reported to perform as well as Guppy.  But which one to
> > pick, and how about long-term maintenance?
>
> I once started packaging deepbinner[1], which is stalled as long as we
> do not have python3-tensorflow.  But maybe that's on the horizon, since
> Bazel packaging sounded quite promising.

That sounds awesome!


> > Altogether, I think that we will best serve our users by making sure
> > that Free basecallers are easy to install on Debian, providing the
> > standard tools for downstream analysis (we are quite good at this), and
> > adding value by supporting bioinformatics workflow systems.
>
> That's exactly my opinion here.

Really cool!  That's why Debian rocks!


Thank you for all the work; it helps a lot to get things done more easily.


Best regards,
simon



Re: Welcoming GSoC students

2020-05-05 Thread Steffen Möller

Hello,

On 05.05.20 07:04, Andreas Tille wrote:

> Hi,
>
> I'm hereby welcoming
>
>    Pranav Ballaney
>  for the topic
>    Quality Assurance and Continuous Integration for Applications in Life
> Sciences and Medicine.
>
> and
>
>    Nilesh Patra
>  for the topic
>    Packaging and Quality Assurance of COVID-19 Relevant Applications.
>
> as Google Summer of Code students.  Both should now be well known in our
> team due to their previous contributions.  So "welcoming" you two to our
> team is probably not the right word, since you are already considered team
> members.  That's why I just say:  I'm very happy that I've got
> official confirmation.  Please keep up the great work you have started!


A welcome from my side as well.  I have already seen a lot of Nilesh's
work and am deeply impressed.

Best,

Steffen



Re: Welcoming GSoC students

2020-05-05 Thread Nilesh Patra
Hi

On Tue, 5 May 2020, 10:34 Andreas Tille,  wrote:

> Hi,
>
> I'm hereby welcoming
>
>   Pranav Ballaney 
> for the topic
>   Quality Assurance and Continuous Integration for Applications in Life
> Sciences and Medicine.
>
> and
>
>   Nilesh Patra 
> for the topic
>   Packaging and Quality Assurance of COVID-19 Relevant Applications.
>
> as Google Summer of Code students.  Both should now be well known in our
> team due to their previous contributions.  So "welcoming" you two to our
> team is probably not the right word, since you are already considered team
> members.  That's why I just say:  I'm very happy that I've got
> official confirmation.  Please keep up the great work you have started!


Yes, that's the plan: doing good work together.  It was only possible
because of your cooperation; thanks a lot for your support :-)

And congrats to Pranav for making it!

Regards, Nilesh