Re: Please keep enabling salsa CI for any packages you work on

2021-05-23 Thread Andreas Tille
Thanks a lot for this Nilesh!

On Mon, May 24, 2021 at 03:10:02AM +0530, Nilesh Patra wrote:
> Hi,
> 
> Few days earlier I ran a script which changed the default file to
> d/salsa-ci.yml
> 
> So basically every package in the med-team has the default config file set
> to d/salsa-ci.yml and pushing to any repository will trigger the CI
> 
> [ **NOTE: please don't mass push to hundreds of repos at once** ]
> 
> However, for any "NEW" packages you push, this file will not be set.
> Please consider setting it by navigating to Settings > CI/CD and changing
> the config file, or run the command (documented in the `salsa` manpage)
> 
> $ salsa update_repo --ci-config-path debian/salsa-ci.yml med-team/
> 
> You might need elevated permissions for the above command to work, though
> 
> I'll try propagating it in the inject-to-salsa-git when I get a bit more
> time (it'll take a couple of days more at least)
> 
> PS: I have fixed the config for all new packages that Steffen pushed in
> past few days so everything should be nice for now.
> 
> Nilesh

-- 
http://fam-tille.de



Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread Andreas Tille
On Mon, May 24, 2021 at 01:26:47AM +0200, Steffen Möller wrote:
> 
> Should there be a bug report about the missing test data in the pypi
> tarball?

Maybe.  But in my experience it is very common that the PyPI tarball
is lacking something we could use for our packaging.  That's why I
prefer the GitHub tarball - provided it is tagged, which is unfortunately
not always the case.  Teaching upstreams sometimes feels like tilting at
windmills. :-(

Kind regards

   Andreas.

-- 
http://fam-tille.de



Nanoplot - ORCA -> kaleido

2021-05-23 Thread Steffen Möller
Nanoplot was blocked by the tricky-to-package orca extension of plotly.
Orca seems to have been replaced by kaleido, which is also tricky to package.

The idea is to use some magic of chromium that is made available via
docker to craft an executable that can transform SVGs in web outputs.  I
am tempted to think that this is of general interest. They say that pypi
and conda can just install this as a one-liner, well, so can Debian once
it is packaged, but I have no clue how to get there.

Steffen




Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread Steffen Möller


Am 24.05.21 um 00:36 schrieb tony mancill:
> On Sun, May 23, 2021 at 11:36:43PM +0530, Nilesh Patra wrote:
>> On 5/23/21 10:00 PM, tony mancill wrote:
>>> On Sun, May 23, 2021 at 06:16:42PM +0200, Steffen Möller wrote:
>>>> https://salsa.debian.org/med-team/catfishq
>>>> is ready for review+sponsoring.
>>>> Many thanks!
>>> Hello Steffen, hi Debian Med@
>>>
>>> Since I have worked on the fastqc package in the past and am trying to
>>> increase my knowledge about FASTQ in general, I am planning to review
>>> catfishq and sponsor an upload.  
>>>
>>> I am posting to try to avoid duplicated efforts and because many on the
>>> team are much faster than I am at reviewing and uploading... :)  
>> The github (not pypi) repository for this contains test data, please 
>> consider adding that
>> and running autopkgtests on it, before uploading.
>> The data size is barely 8K, so it can directly to d/tests dir
> Hi Nilesh,
>
> I didn't see this until after I uploaded but I like the idea.  We appear
> to be on the same wavelength!  :)
>
> I used the file from github, but since the test data is tiny (21 bytes)
> and not a binary file, I didn't bother with a repack.  It is included in
> debian/tests/ [1] and the source documented in debian/copyright [2].
>
>
> [1] 
> https://salsa.debian.org/med-team/catfishq/-/commit/b74ea1c6a2cd5e7785de92389a354b5e01de6f33
> [2] 
> https://salsa.debian.org/med-team/catfishq/-/commit/fc82fefbad239d5e86c0374896a3af72d2e90e2f

I have seen the upload - thank you both!

Should there be a bug report about the missing test data in the pypi
tarball?

Best,
Steffen





Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread tony mancill
On Sun, May 23, 2021 at 11:36:43PM +0530, Nilesh Patra wrote:
> On 5/23/21 10:00 PM, tony mancill wrote:
> > On Sun, May 23, 2021 at 06:16:42PM +0200, Steffen Möller wrote:
> >> https://salsa.debian.org/med-team/catfishq
> >> is ready for review+sponsoring.
> >> Many thanks!
> > 
> > Hello Steffen, hi Debian Med@
> > 
> > Since I have worked on the fastqc package in the past and am trying to
> > increase my knowledge about FASTQ in general, I am planning to review
> > catfishq and sponsor an upload.  
> > 
> > I am posting to try to avoid duplicated efforts and because many on the
> > team are much faster than I am at reviewing and uploading... :)  
> 
> The github (not pypi) repository for this contains test data, please consider 
> adding that
> and running autopkgtests on it, before uploading.
> The data size is barely 8K, so it can directly to d/tests dir

Hi Nilesh,

I didn't see this until after I uploaded but I like the idea.  We appear
to be on the same wavelength!  :)

I used the file from github, but since the test data is tiny (21 bytes)
and not a binary file, I didn't bother with a repack.  It is included in
debian/tests/ [1] and the source documented in debian/copyright [2].

Cheers,
tony

[1] 
https://salsa.debian.org/med-team/catfishq/-/commit/b74ea1c6a2cd5e7785de92389a354b5e01de6f33
[2] 
https://salsa.debian.org/med-team/catfishq/-/commit/fc82fefbad239d5e86c0374896a3af72d2e90e2f




Please keep enabling salsa CI for any packages you work on

2021-05-23 Thread Nilesh Patra
Hi,

A few days ago I ran a script which changed the default CI config file to
d/salsa-ci.yml.

So basically every package in the med-team group has the default config file
set to d/salsa-ci.yml, and pushing to any repository will trigger the CI.

[ **NOTE: please don't mass push to hundreds of repos at once** ]

However, for any "NEW" packages you push, this file will not be set.
Please consider setting it by navigating to Settings > CI/CD and changing
the config file, or run the command (documented in the `salsa` manpage)

$ salsa update_repo --ci-config-path debian/salsa-ci.yml med-team/

You might need elevated permissions for the above command to work, though
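
For a new package the repository also needs the file itself before the CI
can run. A minimal sketch of creating it, assuming the two include URLs of
the Salsa CI team's standard pipeline (they are not stated in this mail, so
please check them against the Salsa CI documentation); `write_salsa_ci` is a
hypothetical helper name:

```shell
#!/bin/sh
# write_salsa_ci [SRCDIR]: create debian/salsa-ci.yml in a source tree
# unless it already exists. The include URLs are an assumption (the
# Salsa CI team's standard pipeline), not something stated in this thread.
write_salsa_ci() {
    srcdir=${1:-.}
    target=$srcdir/debian/salsa-ci.yml
    if [ -e "$target" ]; then
        echo "$target already exists, leaving it alone" >&2
        return 0
    fi
    mkdir -p "$srcdir/debian"
    cat > "$target" <<'EOF'
---
include:
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/pipeline-jobs.yml
EOF
    echo "created $target"
}
```

Running `write_salsa_ci .` in the top-level directory of the package and
committing the result should be enough for the next push to trigger a
pipeline, provided the config path is set as described above.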

I'll try propagating it in the inject-to-salsa-git when I get a bit more
time (it'll take a couple of days more at least)

PS: I have fixed the config for all new packages that Steffen pushed in
the past few days, so everything should be fine for now.

Nilesh


ont-fast5-api (Was: libvbz-hdf-plugin (Was: CuteSV))

2021-05-23 Thread Andreas Tille
Hi Nilesh,

On Sun, May 23, 2021 at 03:07:09PM +0530, Nilesh Patra wrote:
> >now and I've also uploaded to new.  I admit I forgot for what
> >final target I was working on this lib. :-(
> 
> This was the final target: https://salsa.debian.org/med-team/ont-fast5-api

I've committed some changes to ont-fast5-api.  When I try to run any
of the scripts installed by the package I get:

pkg_resources.DistributionNotFound: The 'progressbar33>=2.3.1' distribution was not found and is required by ont-fast5-api

I'll just leave this for somebody else since I'm tired now and
offline tomorrow.
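
One way to narrow this down is to look at which names the installed Python
metadata actually records, since a Debian package may register a
distribution under a different name than the one upstream requires (whether
that is the cause here is an assumption). A minimal sketch;
`list_dist_metadata` is a hypothetical helper and the default directory is
the usual Debian location for Python 3 modules:

```shell
#!/bin/sh
# list_dist_metadata PREFIX [DIR]: print the *.egg-info / *.dist-info
# entries in DIR whose name starts with PREFIX.
list_dist_metadata() {
    prefix=$1
    dir=${2:-/usr/lib/python3/dist-packages}
    for f in "$dir"/"$prefix"*.egg-info "$dir"/"$prefix"*.dist-info; do
        # An unmatched glob stays a literal string, so test for existence.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}
```

Calling `list_dist_metadata progressbar` shows every recorded name starting
with "progressbar"; if nothing named progressbar33 shows up, the requirement
in the upstream setup.py and the Debian metadata disagree.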
 
> And it unlocks a bunch of other packages.

Cool.

Kind regards

  Andreas. 

-- 
http://fam-tille.de



Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread Nilesh Patra


On 5/23/21 10:00 PM, tony mancill wrote:
> On Sun, May 23, 2021 at 06:16:42PM +0200, Steffen Möller wrote:
>> https://salsa.debian.org/med-team/catfishq
>> is ready for review+sponsoring.
>> Many thanks!
> 
> Hello Steffen, hi Debian Med@
> 
> Since I have worked on the fastqc package in the past and am trying to
> increase my knowledge about FASTQ in general, I am planning to review
> catfishq and sponsor an upload.  
> 
> I am posting to try to avoid duplicated efforts and because many on the
> team are much faster than I am at reviewing and uploading... :)  

The GitHub (not PyPI) repository for this contains test data; please consider
adding that and running autopkgtests on it before uploading.
The data size is barely 8K, so it can go directly into the d/tests dir.
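
A minimal sketch of how an autopkgtest script could stage such data,
assuming it was copied to a hypothetical debian/tests/data directory.
`$AUTOPKGTEST_TMP` is the scratch directory autopkgtest provides; the
fallback to `mktemp` is only there so the helper also works outside of
autopkgtest:

```shell
#!/bin/sh
set -e

# stage_test_data SRC_DIR: copy the test data into a scratch directory
# and print the path of that directory.
stage_test_data() {
    src=$1
    work=${AUTOPKGTEST_TMP:-$(mktemp -d)}
    cp -a "$src"/. "$work"/
    echo "$work"
}

# Typical use inside debian/tests/run-unit-test (the tool invocation is
# a placeholder, not the real command line of any package):
#   cd "$(stage_test_data debian/tests/data)"
#   some-tool input.fastq > output.txt
```

Working on a copy keeps the source tree untouched, which matters because
the test may write output files next to its input.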

Nilesh





Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread Steffen Möller


Am 23.05.21 um 18:30 schrieb tony mancill:
> On Sun, May 23, 2021 at 06:16:42PM +0200, Steffen Möller wrote:
>> https://salsa.debian.org/med-team/catfishq
>> is ready for review+sponsoring.
>> Many thanks!
> Hello Steffen, hi Debian Med@
>
> Since I have worked on the fastqc package in the past and am trying to
> increase my knowledge about FASTQ in general, I am planning to review
> catfishq and sponsor an upload.
>
> I am posting to try to avoid duplicated efforts and because many on the
> team are much faster than I am at reviewing and uploading... :)

Much appreciated, many thanks!

Steffen



Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread tony mancill
On Sun, May 23, 2021 at 06:16:42PM +0200, Steffen Möller wrote:
> https://salsa.debian.org/med-team/catfishq
> is ready for review+sponsoring.
> Many thanks!

Hello Steffen, hi Debian Med@

Since I have worked on the fastqc package in the past and am trying to
increase my knowledge about FASTQ in general, I am planning to review
catfishq and sponsor an upload.  

I am posting to try to avoid duplicated efforts and because many on the
team are much faster than I am at reviewing and uploading... :)  

Cheers,
tony




Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread Steffen Möller
https://salsa.debian.org/med-team/catfishq
is ready for review+sponsoring.
Many thanks!
Steffen

Am 23.05.21 um 16:18 schrieb Steffen Möller:
>
>
> Am 23.05.21 um 14:26 schrieb Steffen Möller:
>> Am 23.05.21 um 00:02 schrieb Nilesh Patra:
>>> On 5/23/21 2:54 AM, Andreas Tille wrote:
>>>> On Sat, May 22, 2021 at 09:10:46AM +0200, Andreas Tille wrote:
>>>>> On Fri, May 21, 2021 at 09:26:48PM +0200, Steffen Möller wrote:
>>>>>> If someone needs a stimulus to package something - cuteSV
>>>>>> (https://github.com/tjiangHIT/cuteSV), please.
>>>>> I gave it a kickstart while sitting in the train (which will be
>>>>> offline soon).  Everybody can feel free to add own ID to Uploaders
>>>>> and finalise.  There is no build time test running now and no
>>>>> autopkgtest.  Data to test / benchmark are included - so this
>>>>> should be feasible.
>>>> I just packaged the precondition python3-cigar and uploaded to new.
>>> I wrote a sample autopkgtest for cigar (basically used the same thingy in 
>>> the readme)
>>> and did a few minor changes.
>>>
>>> I have no idea about autopkgtests for cutesv - I lack the pre-requistites 
>>> here and probably only Steffen can help here.
>>>
>>> PS: Please check and upload vbz-compression whenever you have time (after 
>>> two days as you wrote would be fine anyway)
>>> I'll be inactive/be away for a couple of days (wish to take a break :-))
>> Thank you both, you are amazing!
>>
>> CuteSV is part of the
>> https://github.com/nanoporetech/pipeline-structural-variation that I
>> plan to run when first Nanopore reads surface in my inbox next week. You
>> compare against a reference genome to run this, which we do not have in
>> Debian, so, yes, we should think of some tests, but we should also find
>> a way to perform such tests for other packages.
>>
>> This kind of leads to a follow-up question - we could have a "test
>> package" that offers a fraction of the human genome, like the Y
>> chromosome and a second - chromosome 22 maybe. That would not be too big
>> and we can test with it. It would also be a bit meaningless, though. And
>> for testing we do not need anything to be human (or real) in the first
>> place. We could generate our own mini-genome or instead (which I would
>> prefer) go for something small that is real, like yeast (for
>> eukaryotes), E. coli (for bacteria), we ignore archea, and then .. there
>> is https://www.ncbi.nlm.nih.gov/nuccore/CP014940 , i.e. that data fr C.
>> Venter's
>> https://www.jcvi.org/research/first-minimal-synthetic-bacterial-cell,
>> which may be interesting to be distributed with an Open Source
>> distribution.
>>
>> While there is always something novel found also for these genomes for
>> which the genomic DNA is long known, we do not much harm by distributing
>> such genomes. Professional researchers will update them, anyway. The
>> same holds for the human genome, but it is a bit larger and we should
>> possibly make our experiences with the smaller genomes, first.
>>
>> I'll let this think in for another while and then likely extend getData
>> to deal with these genomes and auto-generate native Debian packages with it.
>>
>> Ok - back to some real work and I'll have a closer look at that pipeline.
>
> I just went through their snakemakefile. To get this running, we need
>
> * catfishq
> https://github.com/philres/catfishq
> 
> * lra (long read aligner)  https://github.com/ChaissonLab/LRA
> 
> * truvari https://github.com/spiralgenetics/truvari/
> 
> * add the scripts to libvcflib1/new package vcflib-scripts
>
> Catfishq looks straight-forward, I'll just go and adress that. LRA is
> a meson build with "subprojects" that wrap other bits. Truvari drags
> in a few python packages that in part we do not have, yet . Have added
> that info to the Nanopore tab on
> https://docs.google.com/spreadsheets/d/1tApLhVqxRZ2VOuMH_aPUgFENQJfbLlB_PFH_Ah_q7hM/edit#gid=1806578173
>
> Best,
> Steffen
>
>


Re: vcflib does not install scripts - missing bgziptabix

2021-05-23 Thread Steffen Möller
After a look at the CMakeLists.txt file, I very much get the impression
that the current vcflib package also misses the BINs, not only the SCRIPTs.

Best,
Steffen

Am 23.05.21 um 17:02 schrieb Pjotr Prins:
> The vcflib installer drops the R scripts
>
>   https://github.com/vcflib/vcflib/blob/master/CMakeLists.txt#L189
>
> not sure you'd want those?
>
> Pj.
>
> On Sun, May 23, 2021 at 03:47:02PM +0200, Michael R. Crusoe wrote:
>>Thanks for catching this, Steffen!
>>A bug report would be appreciated, so we don't lose track of this.
>>--
>>Michael R. Crusoe
>>
>>On Sun, May 23, 2021, 14:51 Steffen Möller <[1]steffen_moel...@gmx.de>
>>wrote:
>>
>>  Hello,
>>  cudaSV needs bgziptabix to be happy, which is how I became aware fo
>>  all
>>  the scripts in
>>  Med/libvcflib$ ls scripts/
>>  bed2region  plotXPEHH.R   vcfindelproximity
>>  vcfnulldotslashdot  vcfregionreduce_and_cut
>>  bgziptabix  vcf2bed.py   vcfindels
>>  vcfplotaltdiscrepancy.r  vcfregionreduce_pipe
>>  plotBfst.R  vcf2sqlite.pyvcfjoincalls
>>  vcfplotaltdiscrepancy.sh   vcfregionreduce_uncompressed
>>  plotHaplotypes.R  vcfbiallelic   vcfmultiallelic
>>  vcfplotsitediscrepancy.r   vcfremovenonATGC
>>  plotHapLrt.R  vcfclearid   vcfmultiway
>>  vcfplottstv.sh  vcfsnps
>>  plotPfst.R  vcfclearinfo   vcfmultiwayscripts
>>  vcfprintaltdiscrepancy.r   vcfsort
>>  plot_roc.r  vcfcomplex   vcfnobiallelicsnps
>>  vcfprintaltdiscrepancy.sh  vcf_strip_extra_headers
>>  plotSmoothed.R  vcffirstheader   vcfnoindels
>>  vcfqualfilter  vcfvarstats
>>  plotWCfst.R  vcfgtcompare.sh  vcfnosnps
>>  vcfregionreduce
>>  Med/libvcflib$ du -sh scripts/
>>  399Kscripts/
>>  being plain missing in libvcflib1 . Should there be a libvcflib-bin
>>  package? These scripts are expected to install together with vcflib,
>>  so
>>  maybe having them recommended is a good idea? The vcftools package
>>  is
>>  something independent.
>>  debian/rules removes *.R files, so someone on this list knows more
>>  about
>>  it all than me, please give some directions.
>>  Best,
>>  Steffen
>>
>> References
>>
>>1. mailto:steffen_moel...@gmx.de



Re: vcflib does not install scripts - missing bgziptabix

2021-05-23 Thread Pjotr Prins
The vcflib installer drops the R scripts

  https://github.com/vcflib/vcflib/blob/master/CMakeLists.txt#L189

not sure you'd want those?

Pj.

On Sun, May 23, 2021 at 03:47:02PM +0200, Michael R. Crusoe wrote:
>Thanks for catching this, Steffen!
>A bug report would be appreciated, so we don't lose track of this.
>--
>Michael R. Crusoe
> 
>On Sun, May 23, 2021, 14:51 Steffen Möller <[1]steffen_moel...@gmx.de>
>wrote:
> 
>  Hello,
>  cudaSV needs bgziptabix to be happy, which is how I became aware fo
>  all
>  the scripts in
>  Med/libvcflib$ ls scripts/
>  bed2region  plotXPEHH.R   vcfindelproximity
>  vcfnulldotslashdot  vcfregionreduce_and_cut
>  bgziptabix  vcf2bed.py   vcfindels
>  vcfplotaltdiscrepancy.r  vcfregionreduce_pipe
>  plotBfst.R  vcf2sqlite.pyvcfjoincalls
>  vcfplotaltdiscrepancy.sh   vcfregionreduce_uncompressed
>  plotHaplotypes.R  vcfbiallelic   vcfmultiallelic
>  vcfplotsitediscrepancy.r   vcfremovenonATGC
>  plotHapLrt.R  vcfclearid   vcfmultiway
>  vcfplottstv.sh  vcfsnps
>  plotPfst.R  vcfclearinfo   vcfmultiwayscripts
>  vcfprintaltdiscrepancy.r   vcfsort
>  plot_roc.r  vcfcomplex   vcfnobiallelicsnps
>  vcfprintaltdiscrepancy.sh  vcf_strip_extra_headers
>  plotSmoothed.R  vcffirstheader   vcfnoindels
>  vcfqualfilter  vcfvarstats
>  plotWCfst.R  vcfgtcompare.sh  vcfnosnps
>  vcfregionreduce
>  Med/libvcflib$ du -sh scripts/
>  399Kscripts/
>  being plain missing in libvcflib1 . Should there be a libvcflib-bin
>  package? These scripts are expected to install together with vcflib,
>  so
>  maybe having them recommended is a good idea? The vcftools package
>  is
>  something independent.
>  debian/rules removes *.R files, so someone on this list knows more
>  about
>  it all than me, please give some directions.
>  Best,
>  Steffen
> 
> References
> 
>1. mailto:steffen_moel...@gmx.de



Re: vcflib does not install scripts - missing bgziptabix

2021-05-23 Thread Steffen Möller
Hi Michael,

I'd happily address this right away, but what do you suggest? A separate
arch-indep package sounds appropriate to me.

On a side note - the Nanopore Structural Variant pipeline also needs vcfsort.

Best,

Steffen

Am 23.05.21 um 15:47 schrieb Michael R. Crusoe:
> Thanks for catching this, Steffen!
>
> A bug report would be appreciated, so we don't lose track of this.
>
> -- 
> Michael R. Crusoe
>
> On Sun, May 23, 2021, 14:51 Steffen Möller wrote:
>
> Hello,
>
> cudaSV needs bgziptabix to be happy, which is how I became aware
> fo all
> the scripts in
>
> Med/libvcflib$ ls scripts/
> bed2region      plotXPEHH.R       vcfindelproximity  
> vcfnulldotslashdot      vcfregionreduce_and_cut
> bgziptabix      vcf2bed.py       vcfindels      
> vcfplotaltdiscrepancy.r      vcfregionreduce_pipe
> plotBfst.R      vcf2sqlite.py    vcfjoincalls   
> vcfplotaltdiscrepancy.sh   vcfregionreduce_uncompressed
> plotHaplotypes.R  vcfbiallelic       vcfmultiallelic
> vcfplotsitediscrepancy.r   vcfremovenonATGC
> plotHapLrt.R      vcfclearid       vcfmultiway      
> vcfplottstv.sh          vcfsnps
> plotPfst.R      vcfclearinfo       vcfmultiwayscripts 
> vcfprintaltdiscrepancy.r   vcfsort
> plot_roc.r      vcfcomplex       vcfnobiallelicsnps 
> vcfprintaltdiscrepancy.sh  vcf_strip_extra_headers
> plotSmoothed.R      vcffirstheader   vcfnoindels      
> vcfqualfilter          vcfvarstats
> plotWCfst.R      vcfgtcompare.sh  vcfnosnps       vcfregionreduce
> Med/libvcflib$ du -sh scripts/
> 399K    scripts/
>
> being plain missing in libvcflib1 . Should there be a libvcflib-bin
> package? These scripts are expected to install together with
> vcflib, so
> maybe having them recommended is a good idea? The vcftools package is
> something independent.
>
> debian/rules removes *.R files, so someone on this list knows more
> about
> it all than me, please give some directions.
>
> Best,
> Steffen
>


Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread Steffen Möller

Am 23.05.21 um 14:26 schrieb Steffen Möller:
> Am 23.05.21 um 00:02 schrieb Nilesh Patra:
>> On 5/23/21 2:54 AM, Andreas Tille wrote:
>>> On Sat, May 22, 2021 at 09:10:46AM +0200, Andreas Tille wrote:
>>>> On Fri, May 21, 2021 at 09:26:48PM +0200, Steffen Möller wrote:
>>>>> If someone needs a stimulus to package something - cuteSV
>>>>> (https://github.com/tjiangHIT/cuteSV), please.
>>>> I gave it a kickstart while sitting in the train (which will be
>>>> offline soon).  Everybody can feel free to add own ID to Uploaders
>>>> and finalise.  There is no build time test running now and no
>>>> autopkgtest.  Data to test / benchmark are included - so this
>>>> should be feasible.
>>> I just packaged the precondition python3-cigar and uploaded to new.
>> I wrote a sample autopkgtest for cigar (basically used the same thingy in 
>> the readme)
>> and did a few minor changes.
>>
>> I have no idea about autopkgtests for cutesv - I lack the pre-requistites 
>> here and probably only Steffen can help here.
>>
>> PS: Please check and upload vbz-compression whenever you have time (after 
>> two days as you wrote would be fine anyway)
>> I'll be inactive/be away for a couple of days (wish to take a break :-))
> Thank you both, you are amazing!
>
> CuteSV is part of the
> https://github.com/nanoporetech/pipeline-structural-variation that I
> plan to run when first Nanopore reads surface in my inbox next week. You
> compare against a reference genome to run this, which we do not have in
> Debian, so, yes, we should think of some tests, but we should also find
> a way to perform such tests for other packages.
>
> This kind of leads to a follow-up question - we could have a "test
> package" that offers a fraction of the human genome, like the Y
> chromosome and a second - chromosome 22 maybe. That would not be too big
> and we can test with it. It would also be a bit meaningless, though. And
> for testing we do not need anything to be human (or real) in the first
> place. We could generate our own mini-genome or instead (which I would
> prefer) go for something small that is real, like yeast (for
> eukaryotes), E. coli (for bacteria), we ignore archea, and then .. there
> is https://www.ncbi.nlm.nih.gov/nuccore/CP014940 , i.e. that data fr C.
> Venter's
> https://www.jcvi.org/research/first-minimal-synthetic-bacterial-cell,
> which may be interesting to be distributed with an Open Source
> distribution.
>
> While there is always something novel found also for these genomes for
> which the genomic DNA is long known, we do not much harm by distributing
> such genomes. Professional researchers will update them, anyway. The
> same holds for the human genome, but it is a bit larger and we should
> possibly make our experiences with the smaller genomes, first.
>
> I'll let this think in for another while and then likely extend getData
> to deal with these genomes and auto-generate native Debian packages with it.
>
> Ok - back to some real work and I'll have a closer look at that pipeline.

I just went through their snakemakefile. To get this running, we need

* catfishq
https://github.com/philres/catfishq

* lra (long read aligner)  https://github.com/ChaissonLab/LRA

* truvari https://github.com/spiralgenetics/truvari/

* add the scripts to libvcflib1/new package vcflib-scripts

Catfishq looks straightforward, I'll just go and address that. LRA is a
meson build with "subprojects" that wrap other bits. Truvari drags in a
few python packages that in part we do not have yet. Have added that
info to the Nanopore tab on
https://docs.google.com/spreadsheets/d/1tApLhVqxRZ2VOuMH_aPUgFENQJfbLlB_PFH_Ah_q7hM/edit#gid=1806578173

Best,
Steffen




Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread Nilesh Patra


On 5/23/21 5:56 PM, Steffen Möller wrote:
> 
> Am 23.05.21 um 00:02 schrieb Nilesh Patra:
>>
>> On 5/23/21 2:54 AM, Andreas Tille wrote:
>>> On Sat, May 22, 2021 at 09:10:46AM +0200, Andreas Tille wrote:
>>>> On Fri, May 21, 2021 at 09:26:48PM +0200, Steffen Möller wrote:
>>>>> If someone needs a stimulus to package something - cuteSV
>>>>> (https://github.com/tjiangHIT/cuteSV), please.
>>>> I gave it a kickstart while sitting in the train (which will be
>>>> offline soon).  Everybody can feel free to add own ID to Uploaders
>>>> and finalise.  There is no build time test running now and no
>>>> autopkgtest.  Data to test / benchmark are included - so this
>>>> should be feasible.
>>> I just packaged the precondition python3-cigar and uploaded to new.
>> I wrote a sample autopkgtest for cigar (basically used the same thingy in 
>> the readme)
>> and did a few minor changes.
>>
>> I have no idea about autopkgtests for cutesv - I lack the pre-requistites 
>> here and probably only Steffen can help here.
>>
>> PS: Please check and upload vbz-compression whenever you have time (after 
>> two days as you wrote would be fine anyway)
>> I'll be inactive/be away for a couple of days (wish to take a break :-))
> 
> Thank you both, you are amazing!
> 
> CuteSV is part of the
> https://github.com/nanoporetech/pipeline-structural-variation that I
> plan to run when first Nanopore reads surface in my inbox next week. You
> compare against a reference genome to run this, which we do not have in
> Debian, so, yes, we should think of some tests, but we should also find
> a way to perform such tests for other packages.
> 
> This kind of leads to a follow-up question - we could have a "test
> package" that offers a fraction of the human genome, like the Y
> chromosome and a second - chromosome 22 maybe. That would not be too big
> and we can test with it. It would also be a bit meaningless, though. And
> for testing we do not need anything to be human (or real) in the first
> place. We could generate our own mini-genome or instead (which I would
> prefer) go for something small that is real, like yeast (for
> eukaryotes), E. coli (for bacteria), we ignore archea, and then .. there
> is https://www.ncbi.nlm.nih.gov/nuccore/CP014940 , i.e. that data fr C.
> Venter's
> https://www.jcvi.org/research/first-minimal-synthetic-bacterial-cell,
> which may be interesting to be distributed with an Open Source
> distribution.


Sounds good, but please take these factors into consideration:

* The debci machines typically have ~40 GiB of space. If the data you
refer to here is even a few GiB, all packages using it for tests will
turn into RC bugs.

* If the size of the data is _not_ large enough to warrant an RC bug, but
still *huge* - and used in a large number of tests - it'll be a pain for us
to maintain it ourselves, and also not the best for end users who might want
to download test data.

I had listed more reasons in a previous mail when a discussion regarding 
"centralised test data" was going on,
please take a look here too:

https://lists.debian.org/debian-med/2020/09/msg00365.html

> While there is always something novel found also for these genomes for
> which the genomic DNA is long known, we do not much harm by distributing
> such genomes. Professional researchers will update them, anyway. The
> same holds for the human genome, but it is a bit larger and we should
> possibly make our experiences with the smaller genomes, first.

With smaller genomes, whose analysis yields output that isn't too large,
it can be done.
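
The size concern can be checked before any data goes into a package. A
minimal sketch, assuming a self-chosen budget per package (the thread names
no concrete limit besides the ~40 GiB of the debci machines);
`data_within_budget` is a hypothetical helper:

```shell
#!/bin/sh
# data_within_budget DIR MAX_KIB: succeed if DIR uses at most MAX_KIB
# kibibytes on disk, fail otherwise. Prints the measured size.
data_within_budget() {
    dir=$1
    max_kib=$2
    used_kib=$(du -sk "$dir" | cut -f1)
    echo "$dir: ${used_kib} KiB used, budget ${max_kib} KiB"
    [ "$used_kib" -le "$max_kib" ]
}
```

For example, `data_within_budget debian/tests/data 1024 || echo "too large"`
would flag anything above 1 MiB; the right threshold is a team decision.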

> I'll let this think in for another while and then likely extend getData
> to deal with these genomes and auto-generate native Debian packages with it.
>
> Ok - back to some real work and I'll have a closer look at that pipeline.

* Thumbs up *

Nilesh
 





Re: vcflib does not install scripts - missing bgziptabix

2021-05-23 Thread Michael R. Crusoe
Thanks for catching this, Steffen!

A bug report would be appreciated, so we don't lose track of this.

--
Michael R. Crusoe

On Sun, May 23, 2021, 14:51 Steffen Möller  wrote:

> Hello,
>
> cudaSV needs bgziptabix to be happy, which is how I became aware fo all
> the scripts in
>
> Med/libvcflib$ ls scripts/
> bed2region  plotXPEHH.R   vcfindelproximity
> vcfnulldotslashdot  vcfregionreduce_and_cut
> bgziptabix  vcf2bed.py   vcfindels
> vcfplotaltdiscrepancy.r  vcfregionreduce_pipe
> plotBfst.R  vcf2sqlite.pyvcfjoincalls
> vcfplotaltdiscrepancy.sh   vcfregionreduce_uncompressed
> plotHaplotypes.R  vcfbiallelic   vcfmultiallelic
> vcfplotsitediscrepancy.r   vcfremovenonATGC
> plotHapLrt.R  vcfclearid   vcfmultiway
> vcfplottstv.sh  vcfsnps
> plotPfst.R  vcfclearinfo   vcfmultiwayscripts
> vcfprintaltdiscrepancy.r   vcfsort
> plot_roc.r  vcfcomplex   vcfnobiallelicsnps
> vcfprintaltdiscrepancy.sh  vcf_strip_extra_headers
> plotSmoothed.R  vcffirstheader   vcfnoindels
> vcfqualfilter  vcfvarstats
> plotWCfst.R  vcfgtcompare.sh  vcfnosnps   vcfregionreduce
> Med/libvcflib$ du -sh scripts/
> 399Kscripts/
>
> being plain missing in libvcflib1 . Should there be a libvcflib-bin
> package? These scripts are expected to install together with vcflib, so
> maybe having them recommended is a good idea? The vcftools package is
> something independent.
>
> debian/rules removes *.R files, so someone on this list knows more about
> it all than me, please give some directions.
>
> Best,
> Steffen
>
>


Re: libvbz-hdf-plugin (Was: CuteSV (Was: PyEnsembl - how does that help us?))

2021-05-23 Thread Steffen Möller


Am 23.05.21 um 11:37 schrieb Nilesh Patra:
>
> On 23 May 2021 2:06:00 pm IST, Andreas Tille  wrote:
>> Hi Nilesh,
>>
>> On Sun, May 23, 2021 at 03:32:55AM +0530, Nilesh Patra wrote:
>>> PS: Please check and upload vbz-compression whenever you have time
>> (after two days as you wrote would be fine anyway)
>>> I'll be inactive/be away for a couple of days (wish to take a break
>> :-))
>>
>> I've renamed the binary and source packages to match the soname.
>> You can find the package here
>>
>>https://salsa.debian.org/med-team/libvbz-hdf-plugin
>>
>> now and I've also uploaded to new.  I admit I forgot for what
>> final target I was working on this lib. :-(
> This was the final target: https://salsa.debian.org/med-team/ont-fast5-api
>
> And it unlocks a bunch of other packages.

Nice!

Steffen



vcflib does not install scripts - missing bgziptabix

2021-05-23 Thread Steffen Möller
Hello,

cuteSV needs bgziptabix to be happy, which is how I became aware of all
the scripts in

Med/libvcflib$ ls scripts/
bed2region      plotXPEHH.R       vcfindelproximity  
vcfnulldotslashdot      vcfregionreduce_and_cut
bgziptabix      vcf2bed.py       vcfindels      
vcfplotaltdiscrepancy.r      vcfregionreduce_pipe
plotBfst.R      vcf2sqlite.py    vcfjoincalls   
vcfplotaltdiscrepancy.sh   vcfregionreduce_uncompressed
plotHaplotypes.R  vcfbiallelic       vcfmultiallelic
vcfplotsitediscrepancy.r   vcfremovenonATGC
plotHapLrt.R      vcfclearid       vcfmultiway      
vcfplottstv.sh          vcfsnps
plotPfst.R      vcfclearinfo       vcfmultiwayscripts 
vcfprintaltdiscrepancy.r   vcfsort
plot_roc.r      vcfcomplex       vcfnobiallelicsnps 
vcfprintaltdiscrepancy.sh  vcf_strip_extra_headers
plotSmoothed.R      vcffirstheader   vcfnoindels      
vcfqualfilter          vcfvarstats
plotWCfst.R      vcfgtcompare.sh  vcfnosnps       vcfregionreduce
Med/libvcflib$ du -sh scripts/
399K    scripts/

being plain missing in libvcflib1. Should there be a libvcflib-bin
package? These scripts are expected to install together with vcflib, so
maybe having them recommended is a good idea? The vcftools package is
something independent.

debian/rules removes *.R files, so someone on this list knows more about
it all than me, please give some directions.
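
To see exactly which of these scripts a binary package lacks, the directory
can be compared against the package's file list. A minimal sketch, assuming
the file list was saved beforehand, for example with
`dpkg -L libvcflib1 > installed.txt`; `missing_scripts` is a hypothetical
helper:

```shell
#!/bin/sh
# missing_scripts SCRIPTS_DIR INSTALLED_LIST: print the name of every
# file in SCRIPTS_DIR that does not occur as the last path component
# of any line in INSTALLED_LIST (one installed path per line).
missing_scripts() {
    scripts_dir=$1
    installed_list=$2
    for f in "$scripts_dir"/*; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        # Exact comparison on the last path component, no regex involved.
        awk -F/ -v n="$name" '$NF == n { found = 1 } END { exit !found }' \
            "$installed_list" || echo "$name"
    done
}
```

The output is a plain list of names, which could go straight into the bug
report or into a debian/*.install file for a separate scripts package.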

Best,
Steffen



Re: CuteSV (Was: PyEnsembl - how does that help us?)

2021-05-23 Thread Steffen Möller


Am 23.05.21 um 00:02 schrieb Nilesh Patra:
>
> On 5/23/21 2:54 AM, Andreas Tille wrote:
>> On Sat, May 22, 2021 at 09:10:46AM +0200, Andreas Tille wrote:
>>> On Fri, May 21, 2021 at 09:26:48PM +0200, Steffen Möller wrote:
>>>> If someone needs a stimulus to package something - cuteSV
>>>> (https://github.com/tjiangHIT/cuteSV), please.
>>> I gave it a kickstart while sitting in the train (which will be
>>> offline soon).  Everybody can feel free to add own ID to Uploaders
>>> and finalise.  There is no build time test running now and no
>>> autopkgtest.  Data to test / benchmark are included - so this
>>> should be feasible.
>> I just packaged the precondition python3-cigar and uploaded to new.
> I wrote a sample autopkgtest for cigar (basically used the same thingy in the 
> readme)
> and did a few minor changes.
>
> I have no idea about autopkgtests for cutesv - I lack the pre-requistites 
> here and probably only Steffen can help here.
>
> PS: Please check and upload vbz-compression whenever you have time (after two 
> days as you wrote would be fine anyway)
> I'll be inactive/be away for a couple of days (wish to take a break :-))

Thank you both, you are amazing!

CuteSV is part of the
https://github.com/nanoporetech/pipeline-structural-variation that I
plan to run when first Nanopore reads surface in my inbox next week. You
compare against a reference genome to run this, which we do not have in
Debian, so, yes, we should think of some tests, but we should also find
a way to perform such tests for other packages.

This kind of leads to a follow-up question - we could have a "test
package" that offers a fraction of the human genome, like the Y
chromosome and a second - chromosome 22 maybe. That would not be too big
and we can test with it. It would also be a bit meaningless, though. And
for testing we do not need anything to be human (or real) in the first
place. We could generate our own mini-genome or instead (which I would
prefer) go for something small that is real, like yeast (for
eukaryotes), E. coli (for bacteria), we ignore archaea, and then ... there
is https://www.ncbi.nlm.nih.gov/nuccore/CP014940 , i.e. the data from C.
Venter's
https://www.jcvi.org/research/first-minimal-synthetic-bacterial-cell,
which may be interesting to be distributed with an Open Source
distribution.
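
For the option of generating our own mini-genome, a minimal sketch: it
writes a random sequence in FASTA format with a fixed seed, so that the
output is reproducible with the same awk implementation. The function name,
the 60-column line width and the default seed are arbitrary choices for
illustration, and a random sequence has no biological meaning:

```shell
#!/bin/sh
# make_mini_genome NAME LENGTH [SEED]: print a FASTA record named NAME
# with LENGTH random bases (A, C, G, T), wrapped at 60 columns.
make_mini_genome() {
    name=$1
    len=$2
    seed=${3:-42}
    awk -v name="$name" -v len="$len" -v seed="$seed" 'BEGIN {
        srand(seed)
        split("A C G T", base, " ")
        print ">" name
        line = ""
        for (i = 1; i <= len; i++) {
            # rand() is in [0,1), so the index is always 1..4
            line = line base[int(rand() * 4) + 1]
            if (length(line) == 60) { print line; line = "" }
        }
        if (line != "") print line
    }'
}
```

Something like `make_mini_genome toy 5000 > toy.fa` gives a reference of a
few KiB, small enough to ship in d/tests, but it only exercises the tools
mechanically.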

While there is always something novel found also for these genomes for
which the genomic DNA is long known, we do not do much harm by distributing
such genomes. Professional researchers will update them, anyway. The
same holds for the human genome, but it is a bit larger and we should
possibly gain some experience with the smaller genomes first.

I'll let this sink in for another while and then likely extend getData
to deal with these genomes and auto-generate native Debian packages with it.

Ok - back to some real work and I'll have a closer look at that pipeline.

Best,
Steffen








Re: libvbz-hdf-plugin (Was: CuteSV (Was: PyEnsembl - how does that help us?))

2021-05-23 Thread Nilesh Patra



On 23 May 2021 2:06:00 pm IST, Andreas Tille  wrote:
>Hi Nilesh,
>
>On Sun, May 23, 2021 at 03:32:55AM +0530, Nilesh Patra wrote:
>> PS: Please check and upload vbz-compression whenever you have time
>(after two days as you wrote would be fine anyway)
>> I'll be inactive/be away for a couple of days (wish to take a break
>:-))
>
>I've renamed the binary and source packages to match the soname.
>You can find the package here
>
>https://salsa.debian.org/med-team/libvbz-hdf-plugin
>
>now and I've also uploaded to new.  I admit I forgot for what
>final target I was working on this lib. :-(

This was the final target: https://salsa.debian.org/med-team/ont-fast5-api

And it unlocks a bunch of other packages.

Nilesh
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



libvbz-hdf-plugin (Was: CuteSV (Was: PyEnsembl - how does that help us?))

2021-05-23 Thread Andreas Tille
Hi Nilesh,

On Sun, May 23, 2021 at 03:32:55AM +0530, Nilesh Patra wrote:
> PS: Please check and upload vbz-compression whenever you have time (after two 
> days as you wrote would be fine anyway)
> I'll be inactive/be away for a couple of days (wish to take a break :-))

I've renamed the binary and source packages to match the soname.
You can find the package here

https://salsa.debian.org/med-team/libvbz-hdf-plugin

now and I've also uploaded to new.  I admit I forgot for what
final target I was working on this lib. :-(

Kind regards

   Andreas.

-- 
http://fam-tille.de