Progress in preparing the Bazel Build System for Debian (COVID-19 Biohackathon follow-up)
Fellow Developers, Maintainers, and Contributors,

This is a quick update on recent progress with packaging the Bazel Build System [1] for Debian. My involvement grew out of an urgent need for TensorFlow that was identified during the recent COVID-19 Biohackathon [2]. Upstream has been very supportive of our efforts and we have had many positive interactions with them. However, we've now reached a point where we need more help in order to get these important tools packaged in a timely manner.

There are currently 10 Java package dependencies that are not available in Debian. These are:

  google-api-client
  google-auth
  google-auto
  checker-framework
  diffutils
  error-prone
  google-flogger
  grpc-java
  opencensus
  javax-annotation

We have more information available, including links to RFP bugs, on our Workplan wiki [3]. If you have Java experience and are willing to assist in this effort, even packaging one of these would be a great help. If you also want to help with the main Bazel-packaging effort, please feel free to join the team!

Stay safe out there!

-Olek

PS I am not subscribed to -science or -java

[1] https://bazel.build/
[2] https://salsa.debian.org/med-team/community/2020-covid19-hackathon
[3] https://salsa.debian.org/bazel-team/meta/-/wikis/Workplan-Part-1
Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)
On Tue, May 05, 2020 at 11:18:51PM +0200, Andreas Tille wrote:
>
>    apt install environment-modules

Yes, but we are on CentOS...

That said, I increasingly use Debian Med with Singularity. I made an image in which I `apt install med-cloud`, and I use it to make environment modules that export specifically the binaries of some packages, for instance bedtools, etc. The tedious part is to stitch the CentOS path to the image: at the moment I need to generate one script per command. I wonder if there would be a way to automate some steps...

Cheers,

Charles

--
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Akano, Uruma, Okinawa, Japan
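One way the per-command step described above might be automated is to generate the wrapper scripts from a list of command names. The following is only a minimal sketch, not the setup used in the message: the image path, the command list, and the helper name `write_wrappers` are hypothetical, and the only thing assumed about Singularity is that `singularity exec <image> <command>` runs a command inside the image.

```python
import shlex
import stat
import tempfile
from pathlib import Path

# Hypothetical values for illustration only; adjust to the local setup.
IMAGE = "/opt/images/debian-med.sif"
COMMANDS = ["bedtools", "samtools"]

WRAPPER_TEMPLATE = """#!/bin/sh
# Generated wrapper: run {command} inside the Singularity image.
exec singularity exec {image} {command} "$@"
"""


def write_wrappers(image, commands, bindir):
    """Write one executable wrapper script per command into bindir."""
    bindir = Path(bindir)
    bindir.mkdir(parents=True, exist_ok=True)
    written = []
    for command in commands:
        script = bindir / command
        script.write_text(
            WRAPPER_TEMPLATE.format(
                image=shlex.quote(image), command=shlex.quote(command)
            )
        )
        # Add the executable bits so the wrapper can be found via PATH.
        mode = script.stat().st_mode
        script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        written.append(script)
    return written


if __name__ == "__main__":
    # Write into a throwaway directory so the sketch has no side effects.
    target = tempfile.mkdtemp(prefix="med-wrappers-")
    for path in write_wrappers(IMAGE, COMMANDS, target):
        print(path)
```

A modulefile could then add the wrapper directory to the search path (for example with `prepend-path PATH <directory>`), so that a single generated directory replaces the hand-written script per command.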
Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)
On Wed, May 06, 2020 at 04:20:49AM +0900, Charles Plessy wrote:
> > I wonder how users of that software are dealing with this.
>
> In our case we use the environment modules system (modules.sourceforge.net).

Or

   apt install environment-modules

Kind regards

Andreas.

--
http://fam-tille.de
Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)
Hi Simon,

On Tue, May 05, 2020 at 03:24:12PM +0200, zimoun wrote:
> > I wonder how users of that software are dealing with this.
>
> Personally, I am using the package manager GNU Guix on top of Debian,
> with custom channels for installing this non-free software. It
> helps because it is easier to travel through the history tree of the
> packages and because ``profiles`` allow installing several versions
> side by side.

I admit installing several versions side by side. That's pretty orthogonal to the question of whether some software is free or non-free, right?

> The presentation "seeing Debian through a Functional lens" by Joey
> Hess at DebConf14 helped me to catch the point about ``functional
> package manager``.
>
> BTW, thank you for all the hard packaging work you are doing. I am
> still using Debian (med) packages for the ones I care less about; my
> motto is: if it is not planned to be in Debian, then it is not really
> useful. ;-)

We would love hints about enhancements anyway. ;-)

> > That's a strong point actually. However, we will face more and more
> > problems of this nature. Mo's attempt to write a deep learning policy
> > might help here a bit.
>
> Note that considering the Guppy case -- because it is non-free and the
> structure of the neural network is thus not known -- there is no
> point at all. :-)
>
> However, I think the "problem" of Deep Learning is not new. Probably
> not the right place to discuss that.

Not really under this topic on this list - but I think it could be discussed in Debian anyway.

> 1. Trying to state if the weights are part or not of a free-licensed
> application does appear to me relevant. They are part of the
> application as any icon image can be part of some application. Because
> the application is free, the structure of the network is known and so
> any other weights can be provided (yes, they will probably be
> irrelevant). The only question could be, IMHO, in which format the
> weights are stored.
>
> 2. The weights are simply data resulting from one (big) process.
> This process can be well described or not. The tools used can
> be free or not. It does not matter; the only point is the
> license of such data. For example, an aligner needs a genome for
> reference. No one argues that all the data used -- notebook,
> discussion for the consensus, etc. -- to build this reference has to
> be released under free licenses. It is the same for annotations.
> Another example is all the default values, e.g., the ones in
> scikit-learn; they are based on a training data set that is not
> necessarily available. It happens more often than not that software
> uses data resulting from processing other (training) data. And the
> only concern about user freedom is the license of the resulting data.
>
> 3. Access to the training data set is not about freedom but about
> (reproducible) science. Are the weights considered "scientific" if
> they are not available?
>
> From my point of view, Mo Zhou's policy melds free software and
> (real) science, or say reproducibility. There are bridges between
> the two, and they are part of the same big picture.

Since you mention Mo Zhou's policy: That's the perfect place to discuss issues like this.

> > I once started packaging deepbinner[1], which is stalled as long as we
> > do not have python3-tensorflow. But maybe that's on the horizon since
> > bazel packaging sounded quite promising.
>
> That sounds awesome!

I guess Olek Wojnar, who is busy packaging bazel and who is making great progress, would probably welcome any help. ;-)

> > > Altogether, I think that we will best serve our users by making sure
> > > that Free basecallers are easy to install on Debian, providing the
> > > standard tools for downstream analysis (we are quite good at this), and
> > > adding value by supporting bioinformatics workflow systems.
> >
> > That's exactly my opinion here.
>
> Really cool! That's why Debian rocks!

... and why we need opinions like yours as well as active contributions from people like you.

> Thank you for all the work that helps a lot to get things done more easily.

You are welcome, and thanks for your opinion.

Andreas.

--
http://fam-tille.de
Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)
> On Mon, May 04, 2020 at 10:37:22AM +0900, Charles Plessy wrote:
> >
> > - Upgrades are not drop-in replacements for each other and a laboratory
> >   typically needs to install several versions side by side.

On Tue, May 05, 2020 at 06:52:43AM +0200, Andreas Tille wrote:
> I wonder how users of that software are dealing with this.

In our case we use the environment modules system (modules.sourceforge.net).

Have a nice day,

--
Charles
RFH: pigx-rnaseq - extra eyes requested to fix tests
Hello,

PIGX is a Python/R based workflow to analyse RNAseq data, and admittedly a driving force behind my wanting it is that it has a scRNAseq sibling, which would then be the next target.

There are two remaining problems with the tests, which I describe in
https://salsa.debian.org/med-team/pigx-rnaseq/-/blob/master/debian/TODO

a) snakemake complains about what I reproduce on the command line
b) a missing path to "html_dependency" - this comes from r-cran-rmarkdown or r-cran-dt, I guess, but I don't actually see it.

Better ideas or even patches are welcome.

Best,
Steffen
Re: Welcoming GSoC students
Thank you so much, and congratulations to you too, Nilesh! It's always been wonderful working with the Med team and I'm glad to finally be a part of GSoC as well!

Regards,
Pranav

On Tue, 5 May, 2020, 10:34 AM Andreas Tille, wrote:
> Hi,
>
> I'm hereby welcoming
>
>     Pranav Ballaney
> for the topic
>     Quality Assurance and Continuous Integration for Applications in Life
>     Sciences and Medicine.
>
> and
>
>     Nilesh Patra
> for the topic
>     Packaging and Quality Assurance of COVID-19 Relevant Applications.
>
> as Google Summer of Code students. Both should be now well known in our
> team due to their previous contributions. So welcoming you two in our
> team is probably not the right word since you are considered team
> members even now. That's why I just say: I'm very happy that I've got
> official confirmation. Please keep on the great work you have started!
>
> Kind regards
>
> Andreas.
>
> --
> http://fam-tille.de
Re: Help for asking upstreams about free licenses urgently needed (Was: Help: Seeking source code of guppy base caller)
Dear,

On Tue, 5 May 2020 at 06:53, Andreas Tille wrote:
> > - Guppy is a moving target, and whichever version we would distribute
> >   in Stable is unlikely to satisfy the users a year later.
> >
> > - Upgrades are not drop-in replacements for each other and a laboratory
> >   typically needs to install several versions side by side.
>
> I wonder how users of that software are dealing with this.

Personally, I am using the package manager GNU Guix on top of Debian, with custom channels for installing this non-free software. It helps because it is easier to travel through the history tree of the packages and because ``profiles`` allow installing several versions side by side.

The presentation "seeing Debian through a Functional lens" by Joey Hess at DebConf14 helped me to catch the point about ``functional package manager``.

BTW, thank you for all the hard packaging work you are doing. I am still using Debian (med) packages for the ones I care less about; my motto is: if it is not planned to be in Debian, then it is not really useful. ;-)

> > - The conversion from raw to FASTQ is done by neural network algorithms
> >   for which we do not have access to the training data, and therefore
> >   the freedom to modify Guppy would be limited to the sugar around the
> >   core algorithms.
>
> That's a strong point actually. However, we will face more and more
> problems of this nature. Mo's attempt to write a deep learning policy
> might help here a bit.

Note that considering the Guppy case -- because it is non-free and the structure of the neural network is thus not known -- there is no point at all. :-)

However, I think the "problem" of Deep Learning is not new. Probably not the right place to discuss that.

1. Trying to state if the weights are part or not of a free-licensed application does appear to me relevant. They are part of the application as any icon image can be part of some application. Because the application is free, the structure of the network is known and so any other weights can be provided (yes, they will probably be irrelevant). The only question could be, IMHO, in which format the weights are stored.

2. The weights are simply data resulting from one (big) process. This process can be well described or not. The tools used can be free or not. It does not matter; the only point is the license of such data. For example, an aligner needs a genome for reference. No one argues that all the data used -- notebook, discussion for the consensus, etc. -- to build this reference has to be released under free licenses. It is the same for annotations. Another example is all the default values, e.g., the ones in scikit-learn; they are based on a training data set that is not necessarily available. It happens more often than not that software uses data resulting from processing other (training) data. And the only concern about user freedom is the license of the resulting data.

3. Access to the training data set is not about freedom but about (reproducible) science. Are the weights considered "scientific" if they are not available?

From my point of view, Mo Zhou's policy melds free software and (real) science, or say reproducibility. There are bridges between the two, and they are part of the same big picture.

> > In that sense, I think that if we want to distribute a basecaller in
> > Debian, we should better pick an alternative that is already free. Some
> > of them are reported to perform as well as Guppy. But which one to
> > pick, and how about long-term maintenance?
>
> I once started packaging deepbinner[1], which is stalled as long as we
> do not have python3-tensorflow. But maybe that's on the horizon since
> bazel packaging sounded quite promising.

That sounds awesome!

> > Altogether, I think that we will best serve our users by making sure
> > that Free basecallers are easy to install on Debian, providing the
> > standard tools for downstream analysis (we are quite good at this), and
> > adding value by supporting bioinformatics workflow systems.
>
> That's exactly my opinion here.

Really cool! That's why Debian rocks!

Thank you for all the work that helps a lot to get things done more easily.

Best regards,
simon
Re: Welcoming GSoC students
Hello,

On 05.05.20 07:04, Andreas Tille wrote:
> Hi,
>
> I'm hereby welcoming
>
>     Pranav Ballaney
> for the topic
>     Quality Assurance and Continuous Integration for Applications in Life
>     Sciences and Medicine.
>
> and
>
>     Nilesh Patra
> for the topic
>     Packaging and Quality Assurance of COVID-19 Relevant Applications.
>
> as Google Summer of Code students. Both should be now well known in our
> team due to their previous contributions. So welcoming you two in our
> team is probably not the right word since you are considered team
> members even now. That's why I just say: I'm very happy that I've got
> official confirmation. Please keep on the great work you have started!

Welcome also from my side. I have seen a lot from Nilesh already and am deeply impressed.

Best,
Steffen
Re: Welcoming GSoC students
Hi,

On Tue, 5 May 2020, 10:34 Andreas Tille, wrote:
> Hi,
>
> I'm hereby welcoming
>
>     Pranav Ballaney
> for the topic
>     Quality Assurance and Continuous Integration for Applications in Life
>     Sciences and Medicine.
>
> and
>
>     Nilesh Patra
> for the topic
>     Packaging and Quality Assurance of COVID-19 Relevant Applications.
>
> as Google Summer of Code students. Both should be now well known in our
> team due to their previous contributions. So welcoming you two in our
> team is probably not the right word since you are considered team
> members even now. That's why I just say: I'm very happy that I've got
> official confirmation. Please keep on the great work you have started!

Yes, that's the plan - doing good work together. It was only possible because of your cooperation; many thanks for your support :-)

And congrats to Pranav for making it!

Regards,
Nilesh