Re: [Bioc-devel] GenomeInfoDB SeqInfo function error

2018-09-27 Thread Hervé Pagès

Hi Dario,

On 09/13/2018 09:18 AM, Dario Righelli wrote:

Hello everyone,

I'm using the GenomeInfoDb::Seqinfo function in the DEScan2 package with
genome="mm10".

Sometimes it fails with this error message:

"cannot open the connection to 
'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_01635.20_GRCm38/GCF_01635.20_GRCm38_assembly_report.txt'"

Even though the file is reachable.


I cannot reproduce this, not too surprisingly...

This kind of intermittent internet access problem is not uncommon
and typically hard to reproduce. GenomeInfoDb::Seqinfo() was trying
to download a file from ftp.ncbi.nlm.nih.gov and failed for some
reason. It could be because NCBI's FTP site was temporarily unavailable
or because of any other network problem between NCBI and the machine
where GenomeInfoDb::Seqinfo() was called. Unfortunately there is not
much we can do about these transient connectivity issues in general.

However we can mitigate them:

- One way would be to use a caching mechanism, e.g. to use
  BiocFileCache to store the data downloaded by
  GenomeInfoDb::Seqinfo(genome="some_genome") locally the 1st time
  it's downloaded for a particular genome.

- Another way would be to have this data already included in
  GenomeInfoDb (or GenomeInfoDbData) for the most frequently used
  genomes. In addition, the caching mechanism could still be used
  for the other genomes.

- Another way would be to have
  GenomeInfoDb::Seqinfo(genome="some_genome") re-try the download
  a couple of times (waiting 1 or 2 sec between attempts)
  before giving up. This could be done in combination with the
  above features. The re-try feature could even be integrated into
  BiocFileCache.
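
To make the re-try and caching ideas a bit more concrete, here is a rough
sketch using base R only, so it stays self-contained. The helper names
with_retry() and cached_fetch() are made up for illustration; they do not
exist in GenomeInfoDb or BiocFileCache, and the plain .rds file cache is
just a stand-in for what BiocFileCache would do more robustly. The
defaults (3 tries, 1 sec wait) are assumptions, not tested values.

```r
# Hypothetical helper: call fetch() up to `tries` times, sleeping `wait`
# seconds between failed attempts. If every attempt fails, the last error
# is re-signaled to the caller.
with_retry <- function(fetch, tries = 3L, wait = 1) {
  stopifnot(tries >= 1L)
  last_error <- NULL
  for (attempt in seq_len(tries)) {
    # Wrap the value in a list so a successful result can never be
    # mistaken for a caught error object.
    result <- tryCatch(list(value = fetch()), error = function(e) e)
    if (!inherits(result, "error")) return(result$value)
    last_error <- result
    if (attempt < tries) Sys.sleep(wait)
  }
  stop(last_error)
}

# Hypothetical helper: return the value stored under `key` in `cache_dir`,
# or obtain it with fetch() (with retries) and store it on first use.
# Note that tempdir() only lasts for the current R session; a persistent
# directory would be needed to cache across sessions.
cached_fetch <- function(key, fetch, cache_dir = tempdir(), ...) {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) return(readRDS(path))
  value <- with_retry(fetch, ...)
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  saveRDS(value, path)
  value
}

# Possible usage (not run here: needs network access and GenomeInfoDb):
# si <- cached_fetch("mm10",
#                    function() GenomeInfoDb::Seqinfo(genome = "mm10"))
```

With something like this, a transient failure only surfaces after all
attempts are exhausted, and later calls for the same genome never hit
the network at all.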

For now though, my feeling is that this issue is not enough of an
annoyance to justify putting these new developments high on the
TODO list.

Just throwing some random thoughts here. Don't know what others
think about this.



I noticed it because I received an ERROR report from the Bioconductor test bot.
I have a unit test for my package that doesn't pass on Linux, but it works on
other machines.

Looking on the Internet, this seems like an old (solved) problem.


Would you mind sharing a link to this information? Thanks!

Cheers,
H.



What do you suggest I do?

thanks,
dario



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



[Bioc-devel] GenomeInfoDB SeqInfo function error

2018-09-13 Thread Dario Righelli

Hello everyone,

I'm using the GenomeInfoDb::Seqinfo function in the DEScan2 package with
genome="mm10".

Sometimes it fails with this error message:

"cannot open the connection to 
'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_01635.20_GRCm38/GCF_01635.20_GRCm38_assembly_report.txt'"

Even though the file is reachable.

I noticed it because I received an ERROR report from the Bioconductor test bot.
I have a unit test for my package that doesn't pass on Linux, but it works on
other machines.

Looking on the Internet, this seems like an old (solved) problem.

What do you suggest I do?

thanks,
dario


