Re: [Bioc-devel] Wrong skipping of tests when building on Bioconductor and R CMD check timeout

2023-12-12 Thread Jacopo Ronchi
Dear Hervé,

Thank you very much for your answer. I had the same suspicion about the
issue my package encounters when building on the SPB. Indeed, when I set
that variable locally in my Renviron file, everything works as expected
(tests that should be skipped on Bioconductor are indeed skipped). So the
slight differences in variables between the two build systems might be the
answer.

On the other hand, I had not considered caching the resources used in
examples. Since I already use BiocFileCache in my package, I will extend it
to the other features used in examples! Thank you very much for this
very useful suggestion.

Kind regards,
Jacopo

On Wed, 13 Dec 2023 at 00:00, Hervé Pagès wrote:

> Hi Jacopo,
>
> testthat::skip_on_bioc() relies on the IS_BIOC_BUILD_MACHINE environment
> variable to know whether it's on a BioC build machine or not.
>
> This environment variable is defined during the daily build via the
> Renviron.bioc file. Note that a link to this file is provided on the
> individual build reports e.g. here
> https://bioconductor.org/checkResults/3.19/bioc-LATEST/Biobase/
> ("Renviron settings" link).
>
> Maybe this environment variable is not defined on the Single Package
> Builder (SPB)? The SPB is the build system used during the package
> submission process. It runs on the same machines as the daily builds but my
> understanding is that it uses a slightly different set of variables. Maybe
> Lori can shed some light?
>
> As for the timeout on merida1 (Intel Mac), have you considered using
> BiocFileCache to cache the data that you download in your examples? You
> might still get a timeout the next time 'R CMD check' will run on our build
> machines, but it should go significantly faster after that.
>
> Best,
>
> H.
> On 12/12/23 07:22, Jacopo Ronchi wrote:
>
> Dear Developers,
>
> I am currently in the process of submitting my package on Bioconductor and
> I am facing some issues during the R CMD check on the Bioconductor Build
> System. Since I was not able to find any answers to my doubts, I decided to
> ask for your help before doing anything wrong.
>
> The build report for my package is available here:
> http://bioconductor.org/spb_reports/MIRit_buildreport_20231211095232.html
>
> In particular, my package includes some functions where it accesses remote
> resources. Therefore, I included some "skip_on_bioc()" chunks at the
> beginning of these tests since I don't want my package to fail during the
> build process because of occasional down times. However, when I look at the
> build report, I notice that the relevant tests are not skipped.
> Furthermore, other tests that should be run are instead skipped on CRAN. I
> am referring to these lines:
>
> Skipped tests (2)
> On CRAN (2): 'test-topological-integration.R:23:5', 'test-utils.R:20:5'
>
> Lastly, I have an error during R CMD check on macOS, and I really don't
> know how to reduce the running time on this operating system. Currently, I
> have reshaped the testing suite to reduce the time spent on unit tests.
> However, on macOS, I guess that most of the time consumed is due to
> examples. Nevertheless, the most time consuming functions retrieve
> gene-sets from external resources and I can't reduce the download size of
> KEGG pathways, for example. What should I do?
>
> Sorry again for bothering you,
> Best regards,
> Jacopo
>
> --
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.git...@gmail.com
>
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Wrong skipping of tests when building on Bioconductor and R CMD check timeout

2023-12-12 Thread Hervé Pagès
Hi Jacopo,

testthat::skip_on_bioc() relies on the IS_BIOC_BUILD_MACHINE environment 
variable to know whether it's on a BioC build machine or not.

This environment variable is defined during the daily build via the 
Renviron.bioc file. Note that a link to this file is provided on the 
individual build reports e.g. here 
https://bioconductor.org/checkResults/3.19/bioc-LATEST/Biobase/ 
("Renviron settings" link).

Maybe this environment variable is not defined on the Single Package 
Builder (SPB)? The SPB is the build system used during the package 
submission process. It runs on the same machines as the daily builds but 
my understanding is that it uses a slightly different set of variables. 
Maybe Lori can shed some light?
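
As a rough illustration of the mechanism described above, the check can be reproduced in plain R. This is a sketch, not testthat's actual source: on_bioc_machine() is a hypothetical helper, and treating only values that as.logical() reads as TRUE as "set" is an assumption.

```r
# Hypothetical helper (not part of testthat): TRUE only when the
# IS_BIOC_BUILD_MACHINE environment variable holds a true-like value.
on_bioc_machine <- function() {
  value <- Sys.getenv("IS_BIOC_BUILD_MACHINE", unset = "false")
  isTRUE(as.logical(value))
}

# Simulate a build machine where the variable is defined ...
Sys.setenv(IS_BIOC_BUILD_MACHINE = "true")
on_bioc_machine()

# ... and a machine where it is missing (the situation suspected on the SPB).
Sys.unsetenv("IS_BIOC_BUILD_MACHINE")
on_bioc_machine()
```

If the second case describes the SPB, tests guarded by skip_on_bioc() would run there, which would match the build report.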

As for the timeout on merida1 (Intel Mac), have you considered using 
BiocFileCache to cache the data that you download in your examples? You 
might still get a timeout the next time 'R CMD check' will run on our 
build machines, but it should go significantly faster after that.
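
The caching idea can be sketched in base R as follows. This is a stand-in for what BiocFileCache does, not its API: cached_download(), its fetch argument, and the "mypkg" cache directory name are all hypothetical, and using the file name from the URL as the cache key is a simplifying assumption.

```r
# Hypothetical helper: download `url` once and reuse the local copy afterwards.
cached_download <- function(url,
                            cache_dir = tools::R_user_dir("mypkg", which = "cache"),
                            fetch = utils::download.file) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    # Cache miss: download to a temporary name first so that a failed
    # download never leaves a truncated file under the final name.
    tmp <- tempfile(tmpdir = cache_dir)
    fetch(url, tmp, mode = "wb")
    file.rename(tmp, dest)
  }
  dest
}
```

Only the first call pays for the download, which is consistent with the first check possibly still timing out and later ones running faster. In BiocFileCache, bfcrpath() appears to be the closest equivalent; its documentation is the authority on the details.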

Best,

H.

On 12/12/23 07:22, Jacopo Ronchi wrote:
> Dear Developers,
>
> I am currently in the process of submitting my package on Bioconductor and
> I am facing some issues during the R CMD check on the Bioconductor Build
> System. Since I was not able to find any answers to my doubts, I decided to
> ask for your help before doing anything wrong.
>
> The build report for my package is available here:
> http://bioconductor.org/spb_reports/MIRit_buildreport_20231211095232.html
>
> In particular, my package includes some functions where it accesses remote
> resources. Therefore, I included some "skip_on_bioc()" chunks at the
> beginning of these tests since I don't want my package to fail during the
> build process because of occasional down times. However, when I look at the
> build report, I notice that the relevant tests are not skipped.
> Furthermore, other tests that should be run are instead skipped on CRAN. I
> am referring to these lines:
>
> Skipped tests (2)
> On CRAN (2): 'test-topological-integration.R:23:5', 'test-utils.R:20:5'
>
> Lastly, I have an error during R CMD check on macOS, and I really don't
> know how to reduce the running time on this operating system. Currently, I
> have reshaped the testing suite to reduce the time spent on unit tests.
> However, on macOS, I guess that most of the time consumed is due to
> examples. Nevertheless, the most time consuming functions retrieve
> gene-sets from external resources and I can't reduce the download size of
> KEGG pathways, for example. What should I do?
>
> Sorry again for bothering you,
> Best regards,
> Jacopo
>

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [R-pkg-devel] Wrong mailing list: Could the 100 byte path length limit be lifted?

2023-12-12 Thread Ben Bolker

  Thanks. Pursuing this a bit further, from ?tar "Known problems":

     The handling of file paths of more than 100 bytes.  These
     were unsupported in early versions of ‘tar’, and supported in
     one way by POSIX ‘tar’ and in another by GNU ‘tar’ and yet
     another by the POSIX ‘pax’ command which recent ‘tar’
     programs often support.  The internal implementation warns on
     paths of more than 100 bytes, uses the ‘ustar’ way from the
     1998 POSIX standard which supports up to 256 bytes (depending
     on the path: in particular the final component is limited to
     100 bytes) if possible, otherwise the GNU way (which is
     widely supported, including by ‘untar’).
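
To find offending files before submitting, the byte lengths can be checked directly. A minimal sketch: long_paths() is a hypothetical helper, the 100-byte limit is the one discussed in this thread, and prefixing each path with the package directory name assumes that this is how paths appear inside the tarball.

```r
# Hypothetical helper: list files whose in-tarball path exceeds `limit` bytes.
long_paths <- function(pkg_dir, limit = 100L) {
  pkg_name <- basename(normalizePath(pkg_dir))
  files <- list.files(pkg_dir, recursive = TRUE, all.files = TRUE)
  in_tarball <- file.path(pkg_name, files)
  out <- data.frame(path = in_tarball,
                    bytes = nchar(in_tarball, type = "bytes"))
  out[out$bytes > limit, , drop = FALSE]
}

# Throwaway package skeleton with one deliberately long path.
pkg <- file.path(tempfile("demo"), "mypkg")
deep <- file.path(pkg, "src", strrep("a", 60), strrep("b", 60))
dir.create(deep, recursive = TRUE)
invisible(file.create(file.path(deep, "x.c"), file.path(pkg, "DESCRIPTION")))
long_paths(pkg)
```

Counting bytes rather than characters matters when file names contain non-ASCII characters.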

  This issue is reminiscent of the "invalid uid value replaced ..." 
warning, which has happened to me a lot but which CRAN has never 
actually flagged when I've submitted a package:


https://stackoverflow.com/questions/30599326/warning-message-during-building-an-r-package-invalid-uid-value-replaced-by-that

 (By the way, this thread now seems firmly on-topic for r-pkg-devel, 
even if the ultimate answers must come from the CRAN maintainers ...)


  cheers
   Ben



__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel




Re: [R-pkg-devel] Wrong mailing list: Could the 100 byte path length limit be lifted?

2023-12-12 Thread Duncan Murdoch
I don't know what the warning looks like, but the ?tar help page 
discusses the issues.


Duncan Murdoch



Re: [R-pkg-devel] Wrong mailing list: Could the 100 byte path length limit be lifted?

2023-12-12 Thread Ben Bolker

 FWIW the R-windows FAQ says:

Yet another complication is a 260 character limit on the length of the 
entire path name imposed by Windows. The limit applies only to some 
system functions, and hence it is possible to create a long path using 
one application yet inaccessible to another. It is sometimes possible to 
reduce the path length by creating a drive mapping using subst and 
accessing files via that drive. As of Windows 10 version 1607 and R 4.3, 
one can remove this limit via Windows registry by setting 
Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled 
to 1. Long paths still may not always work reliably: some applications 
or packages may not be able to work with them and Windows cannot execute 
an application with long path as the current directory.


  I'm having trouble finding the specific check for path lengths > 100 
in the R source tree.  It would be helpful to have the exact wording of 
the NOTE/WARNING (?) that is thrown ...  (I know I could make my own 
mini-package with a long path length in it somewhere but ...)
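
One way to look at the message without building a full package is to tar a throwaway directory with R's internal tar implementation, which ?tar documents as warning on paths of more than 100 bytes. A sketch under that assumption: tar_warnings() and the directory names are made up for illustration, and this shows what utils::tar() reports, not necessarily the wording used by R CMD check.

```r
# Hypothetical helper: tar `top_dir` (relative to `parent_dir`) with the
# internal implementation and return any warning messages it emits.
tar_warnings <- function(parent_dir, top_dir) {
  old_wd <- setwd(parent_dir)
  on.exit(setwd(old_wd), add = TRUE)
  msgs <- character()
  withCallingHandlers(
    utils::tar(tempfile(fileext = ".tar"), files = top_dir, tar = "internal"),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # collect, then silence
      invokeRestart("muffleWarning")
    }
  )
  msgs
}

# Mini "package" containing one path longer than 100 bytes.
parent <- tempfile("tarcheck")
deep <- file.path(parent, "mypkg", "src", strrep("a", 60), strrep("b", 60))
dir.create(deep, recursive = TRUE)
invisible(file.create(file.path(deep, "x.c")))
print(tar_warnings(parent, "mypkg"))
```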


  cheers
   Ben Bolker


Re: [R-pkg-devel] Wrong mailing list: Could the 100 byte path length limit be lifted?

2023-12-12 Thread Simon Urbanek
Justin,

now that you clarified what you are actually talking about, this is a question 
about the CRAN policies, so you should really direct it to the CRAN team as it 
is their decision (R-devel would be appropriate if this was a limitation in R 
itself, and R-package-devel would be appropriate if you wanted help with 
refactoring to adhere to the policy). There are still path limits on various 
platforms (even if they are becoming more rare), so I'd personally question the 
source rather than the policy, but then your email was remarkably devoid of any 
details.

Cheers,
Simon


> On Dec 13, 2023, at 6:03 AM, McGrath, Justin M  wrote:
> 
> When submitting a package to CRAN, it is required that path names be shorter 
> than 100 bytes, with the reason that paths longer than that cannot be made 
> into portable tar files. This error is reported by `R CMD check --as-cran`. 
> Since this pertains only to developing packages, this seemed like the 
> appropriate list, but if you don't think so, I can instead ask on R-devel.
> 
> Best wishes,
> Justin
> 
> 
> From: Martin Maechler 
> Sent: Tuesday, December 12, 2023 10:13 AM
> To: McGrath, Justin M
> Cc: r-package-devel@r-project.org
> Subject: Wrong mailing list: [R-pkg-devel] Could the 100 byte path length 
> limit be lifted?
> 
>> McGrath, Justin M
>>on Tue, 12 Dec 2023 15:03:28 + writes:
> 
>> We include other software in our source code. It has some long paths so a 
>> few of the files end up with paths longer than 100 bytes, and we need to 
>> manually rename them whenever we pull in updates.
>> The 100 byte path limit is from tar v7, and since
>> POSIX1.1988, there has not been a path length limit. That
>> standard is 35 years old now, so given that there is
>> probably no one using an old version of tar that also
>> wants to use the latest version of R, could the 100 byte
>> limit be lifted? Incidentally, I am a big proponent of
>> wide, long-term support, but it's hard to see that this
>> change would negatively impact anyone.
> 
>> Best wishes,
>> Justin
> 
> Wrong mailing list:
> 
> This is a topic for R-devel, not at all for R-package-devel,
> but please be more precise about what you are talking about: only between
> the lines could I read that it is about some variants of using
> 'tar'.
> 
> Best regards,
> Martin
> ---
> 
> Martin Maechler
> ETH Zurich  and  R Core team
> 


Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects

2023-12-12 Thread Hervé Pagès
FWIW I've documented the process of making a TxDb object for 
T2T-CHM13v2.0 here:

https://github.com/Bioconductor/GenomicFeatures/issues/65

Please comment there for any follow-up.

Note that we're considering wrapping this in a TxDb package that we'll 
make available to the community. It's a work in progress.

Thanks!

H.

On 12/12/23 07:29, James W. MacDonald wrote:
> Hi Christian,
>
> This conversation is off-topic, both for this listserv (it’s meant to help 
> people developing Bioconductor packages) and for the support site (which is 
> meant to help people with, again, Bioconductor packages). I’ll answer your 
> questions one more time, but if you have other questions, please move to 
> biostars.org, or just ask the ArchR people directly, since it’s their package.
>
> I believe you are misinterpreting what an OrgDb is intended to provide. There 
> is no positional data in an OrgDb, and what the CHM13 project has done is 
> completely positional (what data are provided in the ‘Gene Annotation’ 
> section of the CHM13 Github are all GFF files, which are meant to provide 
> positional information of genes on a genome).
>
> The OrgDb package provides functional and within-annotation mappings. You can 
> map an NCBI Gene ID to Ensembl, or to the HGNC gene symbol, or a GO term, 
> etc. For example, I can map Gene symbol P53 to NCBI Gene ID 7157, or its 
> UniProt symbol K7PPA8. If the new genome build says P53 has moved to a new 
> genomic position, that has no effect on what UniProt thinks the ID for that 
> gene’s protein should be, or what ID NCBI uses, or what GO terms are appended 
> to that gene. Functionally it’s the same gene. We just might think it is 
> located in a different place in the genome.
>
> The difference between CHM13 and GRCh38 is not materially different from the 
> difference between GRCh37 and GRCh38 (they represent the current knowledge of 
> the genome at a point in time), and while we supply TxDb packages for GRCh38 
> and GRCh37 (and variants based on NCBI’s mappings as well as Ensembl’s 
> mappings), we have never supplied more than one human OrgDb package, because 
> the positional and functional information are orthogonal.
>
> It seems pretty simple to make what you need though.
>
> > library(GenomicAlignments)
> > library(GenomicFeatures)  # makeTxDbFromGFF() lives here as of GenomicFeatures 1.54
> > tx <- makeTxDbFromGFF("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz")
> Import genomic features from the file as a GRanges object ... trying URL 
> 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz'
> Content type 'application/x-gzip' length 79009538 bytes (75.3 MB)
> downloaded 75.3 MB
>
> OK
> Prepare the 'metadata' data frame ... OK
> Make the TxDb object ... OK
> Warning messages:
> 1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
>   some transcripts have no
>   "transcript_id" attribute ==>
>   their name ("tx_name" column in
>   the TxDb object) was set to NA
> 2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
>   the transcript names ("tx_name"
>   column in the TxDb object)
>   imported from the
>   "transcript_id" attribute are
>   not unique
> 3: In .find_exon_cds(exons, cds) : The following transcripts have
>   exons that contain more than one
>   CDS (only the first CDS was kept
>   for each exon):
>   rna-NM_001134939.1,
>   rna-NM_001172437.2,
>   rna-NM_001184961.1,
>   rna-NM_001301020.1,
>   rna-NM_001301302.1,
>   rna-NM_001301371.1,
>   rna-NM_002537.3,
>   rna-NM_004152.3,
>   rna-NM_015068.3, rna-NM_016178.2
>> tx
> TxDb object:
> # Db type: TxDb
> # Supporting package: GenomicFeatures
> # Data source: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz
> # Organism: NA
> # Taxonomy ID: NA
> # miRBase build ID: NA
> # Genome: NA
> # Nb of transcripts: 188205
> # Db created by: GenomicFeatures package from Bioconductor
> # Creation time: 2023-12-12 10:17:34 -0500 (Tue, 12 Dec 2023)
> # GenomicFeatures version at creation time: 1.54.1
> # RSQLite version at creation time: 2.3.1
> # DBSCHEMAVERSION: 1.2
>
> genomeAnnotation <- createGenomeAnnotation(BSgenome.Hsapiens.NCBI.T2T.CHM13v2.0)
> geneAnnotation <- createGeneAnnotation(TxDb = tx, OrgDb = org.Hs.eg.db)
>
>
> Best,
>
> Jim
>
> From: Christian Arnold
> Sent: Tuesday, December 12, 2023 9:35 AM
> To: Vincent Carey; James W. MacDonald
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects
>
> Dear Vincent and others, thanks for the reply! Irrespective of whether a 
> different OrgDb is required, the name itself suggested that there "should be" 
> also corresponding OrgDb and TxDb packages. I can build one on my own, I see, 
> is there anyone

Re: [R-pkg-devel] Wrong mailing list: Could the 100 byte path length limit be lifted?

2023-12-12 Thread McGrath, Justin M
When submitting a package to CRAN, it is required that path names be shorter 
than 100 bytes, with the reason that paths longer than that cannot be made into 
portable tar files. This error is reported by `R CMD check --as-cran`. Since 
this pertains only to developing packages, this seemed like the appropriate 
list, but if you don't think so, I can instead ask on R-devel.

Best wishes,
Justin


From: Martin Maechler 
Sent: Tuesday, December 12, 2023 10:13 AM
To: McGrath, Justin M
Cc: r-package-devel@r-project.org
Subject: Wrong mailing list: [R-pkg-devel] Could the 100 byte path length limit 
be lifted?

> McGrath, Justin M
> on Tue, 12 Dec 2023 15:03:28 + writes:

> We include other software in our source code. It has some long paths so a
> few of the files end up with paths longer than 100 bytes, and we need to
> manually rename them whenever we pull in updates.
> The 100 byte path limit is from tar v7, and since
> POSIX1.1988, there has not been a path length limit. That
> standard is 35 years old now, so given that there is
> probably no one using an old version of tar that also
> wants to use the latest version of R, could the 100 byte
> limit be lifted? Incidentally, I am a big proponent of
> wide, long-term support, but it's hard to see that this
> change would negatively impact anyone.

> Best wishes,
> Justin

Wrong mailing list:

This is a topic for R-devel, not at all for R-package-devel,
but please be more precise about what you are talking about: only between
the lines could I read that it is about some variants of using
'tar'.

Best regards,
Martin
---

Martin Maechler
ETH Zurich  and  R Core team



[R-pkg-devel] Wrong mailing list: Could the 100 byte path length limit be lifted?

2023-12-12 Thread Martin Maechler
> McGrath, Justin M 
> on Tue, 12 Dec 2023 15:03:28 + writes:

> We include other software in our source code. It has some long paths so a
> few of the files end up with paths longer than 100 bytes, and we need to
> manually rename them whenever we pull in updates.
> The 100 byte path limit is from tar v7, and since
> POSIX1.1988, there has not been a path length limit. That
> standard is 35 years old now, so given that there is
> probably no one using an old version of tar that also
> wants to use the latest version of R, could the 100 byte
> limit be lifted? Incidentally, I am a big proponent of
> wide, long-term support, but it's hard to see that this
> change would negatively impact anyone.

> Best wishes,
> Justin

Wrong mailing list:

This is a topic for R-devel, not at all for R-package-devel,
but please be more precise about what you are talking about: only between
the lines could I read that it is about some variants of using
'tar'.

Best regards,
Martin
---

Martin Maechler
ETH Zurich  and  R Core team



Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects

2023-12-12 Thread James W. MacDonald
Hi Christian,

This conversation is off-topic, both for this listserv (it’s meant to help 
people developing Bioconductor packages) and for the support site (which is 
meant to help people with, again, Bioconductor packages). I’ll answer your 
questions one more time, but if you have other questions, please move to 
biostars.org, or just ask the ArchR people directly, since it’s their package.

I believe you are misinterpreting what an OrgDb is intended to provide. There 
is no positional data in an OrgDb, and what the CHM13 project has done is 
completely positional (what data are provided in the ‘Gene Annotation’ section 
of the CHM13 Github are all GFF files, which are meant to provide positional 
information of genes on a genome).

The OrgDb package provides functional and within-annotation mappings. You can 
map an NCBI Gene ID to Ensembl, or to the HGNC gene symbol, or a GO term, etc. 
For example, I can map Gene symbol P53 to NCBI Gene ID 7157, or its UniProt 
symbol K7PPA8. If the new genome build says P53 has moved to a new genomic 
position, that has no effect on what UniProt thinks the ID for that gene’s 
protein should be, or what ID NCBI uses, or what GO terms are appended to that 
gene. Functionally it’s the same gene. We just might think it is located in a 
different place in the genome.

The difference between CHM13 and GRCh38 is not materially different from the 
difference between GRCh37 and GRCh38 (they represent the current knowledge of 
the genome at a point in time), and while we supply TxDb packages for GRCh38 
and GRCh37 (and variants based on NCBI’s mappings as well as Ensembl’s 
mappings), we have never supplied more than one human OrgDb package, because 
the positional and functional information are orthogonal.

It seems pretty simple to make what you need though.

> library(GenomicFeatures)
> tx <- makeTxDbFromGFF("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz")
Import genomic features from the file as a GRanges object ... trying URL 
'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz'
Content type 'application/x-gzip' length 79009538 bytes (75.3 MB)
downloaded 75.3 MB

OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  some transcripts have no
  "transcript_id" attribute ==>
  their name ("tx_name" column in
  the TxDb object) was set to NA
2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  the transcript names ("tx_name"
  column in the TxDb object)
  imported from the
  "transcript_id" attribute are
  not unique
3: In .find_exon_cds(exons, cds) : The following transcripts have
  exons that contain more than one
  CDS (only the first CDS was kept
  for each exon):
  rna-NM_001134939.1,
  rna-NM_001172437.2,
  rna-NM_001184961.1,
  rna-NM_001301020.1,
  rna-NM_001301302.1,
  rna-NM_001301371.1,
  rna-NM_002537.3,
  rna-NM_004152.3,
  rna-NM_015068.3, rna-NM_016178.2
> tx
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 188205
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2023-12-12 10:17:34 -0500 (Tue, 12 Dec 2023)
# GenomicFeatures version at creation time: 1.54.1
# RSQLite version at creation time: 2.3.1
# DBSCHEMAVERSION: 1.2

genomeAnnotation <- createGenomeAnnotation(BSgenome.Hsapiens.NCBI.T2T.CHM13v2.0)
geneAnnotation <- createGeneAnnotation(TxDb = tx, OrgDb = org.Hs.eg.db)


Best,

Jim

From: Christian Arnold 
Sent: Tuesday, December 12, 2023 9:35 AM
To: Vincent Carey ; James W. MacDonald 

Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects


Dear Vincent and others,

thanks for the reply! Irrespective of whether a different OrgDb is required, 
the name itself suggested that there "should be" also corresponding OrgDb and 
TxDb packages. I can build one on my own, I see, is there anyone who works on 
providing the TxDB object for Bioc?

I am also asking this because the T2T 

[R-pkg-devel] Could the 100 byte path length limit be lifted?

2023-12-12 Thread McGrath, Justin M
We include other software in our source code. It has some long paths so a few 
of the files end up with paths longer than 100 bytes, and we need to manually 
rename them whenever we pull in updates.

The 100-byte path limit comes from the v7 tar format, and since POSIX.1-1988 
there has not been a path length limit. That standard is 35 years old now, so given that 
there is probably no one using an old version of tar that also wants to use the 
latest version of R, could the 100 byte limit be lifted? Incidentally, I am a 
big proponent of wide, long-term support, but it's hard to see that this change 
would negatively impact anyone.
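
Until the limit changes, the manual renaming step can at least be preceded by an automated search. The sketch below is a hypothetical base-R helper (`long_paths()` is not part of R or any package) that lists files whose path inside the tarball would exceed a given number of bytes. It assumes the path is counted as "<package directory>/<relative path>"; adjust if the check counts differently.

```r
# Hypothetical helper: report files under `pkg_dir` whose tarball path
# ("<package dir name>/<relative path>") is longer than `limit` bytes.
long_paths <- function(pkg_dir, limit = 100L) {
  files <- list.files(pkg_dir, recursive = TRUE, all.files = TRUE)
  # Assumption: the tarball path starts with the package directory name.
  tar_paths <- file.path(basename(normalizePath(pkg_dir)), files)
  n_bytes <- nchar(tar_paths, type = "bytes")
  over <- n_bytes > limit
  data.frame(path = tar_paths[over], bytes = n_bytes[over])
}
```

Calling `long_paths(".")` from the top of a package source tree returns a data frame with one row per offending file, or zero rows if every path fits.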

Best wishes,
Justin


[Bioc-devel] Wrong skipping of tests when builidng on Bioconductor and R CMD check timeout

2023-12-12 Thread Jacopo Ronchi
Dear Developers,

I am currently in the process of submitting my package on Bioconductor and
I am facing some issues during the R CMD check on the Bioconductor Build
System. Since I was not able to find any answers to my doubts, I decided to
ask for your help before doing anything wrong.

The build report for my package is available here:
http://bioconductor.org/spb_reports/MIRit_buildreport_20231211095232.html

In particular, my package includes some functions that access remote
resources. Therefore, I added "skip_on_bioc()" calls at the beginning of
the corresponding tests, since I don't want my package to fail during the
build process because of occasional downtime. However, when I look at the
build report, I notice that the relevant tests are not skipped.
Furthermore, other tests that should be run are instead reported as
skipped "On CRAN". I am referring to these lines:

Skipped tests (2)
   On CRAN (2): 'test-topological-integration.R:23:5', 'test-utils.R:20:5'
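
For reference, the reply in this thread explains that skip_on_bioc() relies on the IS_BIOC_BUILD_MACHINE environment variable; skip_on_cran() similarly keys off NOT_CRAN. The base-R sketch below only imitates those checks so the behaviour can be reproduced locally. The helper names are hypothetical and this is not testthat's actual implementation (which, among other things, also considers whether the session is interactive).

```r
# Hypothetical helper: TRUE only if the environment variable is set to a
# value that R parses as logical TRUE (for example "true" or "TRUE").
env_var_is_true_sketch <- function(name) {
  isTRUE(as.logical(Sys.getenv(name, unset = "false")))
}

# Would a skip_on_bioc()-style check fire on this machine?
on_bioc_sketch <- function() env_var_is_true_sketch("IS_BIOC_BUILD_MACHINE")

# Would a skip_on_cran()-style check fire? It does unless NOT_CRAN is true.
on_cran_sketch <- function() !env_var_is_true_sketch("NOT_CRAN")

c(on_bioc = on_bioc_sketch(), on_cran = on_cran_sketch())
```

If a build system sets neither variable, checks of this kind would run the tests meant to be skipped on Bioconductor and skip those guarded by skip_on_cran(), which matches the report above.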

Lastly, I get an error during R CMD check on macOS, and I really don't
know how to reduce the running time on this operating system. I have
already reshaped the test suite to reduce the time spent on unit tests.
However, on macOS, I guess that most of the time is consumed by the
examples. The most time-consuming functions retrieve gene sets from
external resources, and I can't reduce the download size of
KEGG pathways, for example. What should I do?
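
The reply in this thread suggests caching downloads with BiocFileCache. To illustrate the idea only, here is a base-R stand-in (not BiocFileCache itself): a hypothetical `cached_download()` that fetches a file once and reuses the local copy afterwards. The package name "myPkg" and the use of the URL's basename as the file name are assumptions made for the sketch.

```r
# Hypothetical helper: download `url` into `cache_dir` only when no cached
# copy exists yet, and return the path to the local file.
cached_download <- function(url,
                            cache_dir = tools::R_user_dir("myPkg", "cache")) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  # Assumption: the last URL component is a usable, unique file name.
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    utils::download.file(url, dest, mode = "wb", quiet = TRUE)
  }
  dest
}
```

With this pattern the first run of an example still pays for the download, but later runs on the same machine read the cached file, which is the effect described in the reply.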

Sorry again for bothering you,
Best regards,
Jacopo


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects

2023-12-12 Thread Christian Arnold via Bioc-devel
Dear Vincent and others,

thanks for the reply! Irrespective of whether a different OrgDb is
required, the name itself suggested that there "should be" also
corresponding OrgDb and TxDb packages. I can build one on my own, I see,
is there anyone who works on providing the TxDB object for Bioc?

I am also asking this because the T2T people specifically provide an
"updated" gene annotation dataset which may differ from what's inside
OrgDb and may be incompatible with it? See here:
https://github.com/marbl/CHM13:

JHU RefSeqv110 + Liftoff v5.1:
This contains curated annotations of the ampliconic genes on the Y
chromosome, correcting annotation errors in GENCODEv35 CAT/Liftoff and
RefSeqv110 annotation. Additional copies found in T2T-Y were annotated
to the closest available gene in RefSeq, allowing multiple genes to have
the same common name. This file has been modified to correct special
character issues from the original file.

For ArchR, I tried to understand how one can create a new genome by
checking here:
https://www.archrproject.com/bookdown/getting-set-up.html. There, they
explicitly mention the TxDb and OrgDb objects that are needed for
building a custom genome. There seems to be another option when either or
both of these two are not available ("Alternatively, if you dont have
a TxDb and OrgDb object, you can create a geneAnnotation object from the
following information"), but I first tried to do it the easy way as I
want to properly embed it in a pipeline with as little "custom" code as
possible.


Thanks,
Christian




On 11/12/2023 15:30, Vincent Carey wrote:
> Thanks Jim, I tend to agree with you.  Christian, I had a look at
> ArchR but could not tell where the
> system contacts the Bioc annotation elements.  Can you give some
> hints?  I'd like to be able to
> verify compatibility.
>
> On Mon, Dec 11, 2023 at 9:19 AM James W. MacDonald  wrote:
>
> I don't believe a different OrgDb is required. The OrgDb package
> is meant to provide annotations for genes such as gene symbol or
> GO term, etc, which are orthogonal to the sequence of the genome,
> so the current version should suffice.
>
> -Original Message-
> From: Bioc-devel  On Behalf Of
> Vincent Carey
> Sent: Sunday, December 10, 2023 1:44 PM
> To: Christian Arnold 
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects
>
> Good question.  I believe these will be forthcoming soon.  In the
> mean time you can create your own.  See, for example
>
> https://github.com/vjcitn/BiocT2T/blob/devel/inst/scripts/makeTxDb.R
>
>
> It's an active area so you can pull a gff file from
> https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=T2T/CHM13/assemblies/annotation/
> and adjust the code noted above for the TxDb.
>
> For the org.db I have to get back to you.
>
> On Sun, Dec 10, 2023 at 12:06 PM Christian Arnold via Bioc-devel <
> bioc-devel@r-project.org> wrote:
>
> > Hello, I am working with the new human T2T-CHM13v2.0 assembly and
> > while a BSgenome package already exists
> > (BSgenome.Hsapiens.NCBI.T2T.CHM13v2.0), I could not find the
> > corresponding TxDb and OrgDb packages. Is there any information when
> > they may also become available so it is easier to work with the new
> > genome for packages like ArchR, which support a custom genome but need
> > these standard annotation packages for their creation?
> >
> >
> > Thanks a lot for any information regarding this!
> >
> > Best, Christian
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> >
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>



Re: [R-pkg-devel] Fortran compilation issues (errors/warnings) on Fedora clang/llvm

2023-12-12 Thread Koen Hufkens
Dear Ivan,

Thanks for these pointers!
With your script, the docs, and a look at how some other packages are
structured, I think I have more or less made sense of things (the dynamic
configure options are new to me).
I hope this will now resolve the system-specific dependencies.

Appreciate the help!
Cheers,
K

On Mon, 11 Dec 2023 at 12:58, Ivan Krylov  wrote:

> On Mon, 11 Dec 2023 10:02:14 +
> Koen Hufkens wrote:
>
> > error:
> > loc("/data/gannet/ripley/R/packages/incoming/rsofun.Rcheck/00_pkg_src/rsofun/src/interface_biosphere_biomee.mod.f90":105:62):
> > /data/gannet/ripley/Sources2/LLVM/17.0/llvm-project-17.0.3.src/flang/lib/Lower/ConvertType.cpp:392:
> > not yet implemented: derived type components with non default lower
> > bounds
>
> Experience shows that reporting bugs in flang-new may get them
> acknowledged but not fixed [1]. Have you tried detecting the flang-new
> compiler from the ./configure script and only adding the workaround
> flag if the compiler matches? Something like the following:
>
> #!/bin/sh
> # taken from Writing R Extensions, 1.2. Configure and cleanup
> : ${R_HOME=`R RHOME`}
> if test -z "${R_HOME}"; then
>   echo "could not determine R_HOME"
>   exit 1
> fi
> # determine the Fortran 9x compiler
> FC="`"${R_HOME}/bin/R" CMD config FC`"
> # Use --version output to determine the compiler
> # A different compiler will either accept --version and print something
> # else or fail due to "unknown argument". In both cases the branch will
> # not be taken
> if "$FC" --version 2>/dev/null | grep -q 'flang-new version 17'; then
>  echo "PKG_FCFLAGS = `"${R_HOME}/bin/R" CMD config FCFLAGS`" \
>   " -fc-prototypes-external" >>src/Makevars
> fi
>
> You will still get an "unsupported flag" warning on machines with
> flang-new, but at least it won't warn with standards-compliant
> compilers or crash with flang-new. Unfortunately, I don't know whether
> this is acceptable for CRAN, sorry.
>
> Good luck!
>
> --
> Best regards,
> Ivan
>
> [1] https://stat.ethz.ch/pipermail/r-package-devel/2023q4/009987.html
>


-- 
Koen Hufkens, Ph.D.

Senior Scientist
Geocomputation and Earth Observation
Institute of Geography
University of Bern

founder BlueGreen Labs
bluegreenlabs.org
@koen_hufkens@mastodon.social




Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-12 Thread Ivan Krylov
On Mon, 11 Dec 2023 21:11:48 +0100
Hilmar Berger via R-devel wrote:

> What was unexpected in this case was that [.data.frame was
> hanging for a long time (I waited about 10 minutes and then restarted
> R). Also, this cannot be interrupted in interactive mode.

That's unfortunate. If an operation takes a long time, it ought to be
interruptible. Here's a patch that passes make check-devel:

--- src/main/unique.c   (revision 85667)
+++ src/main/unique.c   (working copy)
@@ -1631,6 +1631,7 @@
}
 }
 
+unsigned int ic = ;
 if(nexact < n_input) {
/* Second pass, partial matching */
for (R_xlen_t i = 0; i < n_input; i++) {
@@ -1642,6 +1643,10 @@
mtch = 0;
mtch_count = 0;
for (int j = 0; j < n_target; j++) {
+   if (!--ic) {
+   R_CheckUserInterrupt();
+   ic = ;
+   }
if (no_dups && used[j]) continue;
if (strncmp(ss, tar[j], temp) == 0) {
mtch = j + 1;
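
For context, the loop in the patch above appears to be the partial-matching pass behind pmatch(), which `[.data.frame` uses, as far as I can tell, when rows are selected by character row names. The small sketch below only demonstrates that matching behaviour on a toy data frame (the names are made up); it does not reproduce the slow case. In the patched loop, every input string without an exact match is compared against all targets, so the cost grows with the product of the two lengths.

```r
# A unique partial match returns the position of the matching target.
unique_hit <- pmatch("al", c("alpha", "beta"))

# An ambiguous partial match (two candidates) returns NA.
ambiguous <- pmatch("a", c("alpha", "apple"))

# Row names of a data frame are partially matched in the same way.
df <- data.frame(x = 1:2, row.names = c("alpha", "beta"))
row_hit <- df["al", "x"]

list(unique_hit = unique_hit, ambiguous = ambiguous, row_hit = row_hit)
```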

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] vignette with "Run Examples"

2023-12-12 Thread Duncan Murdoch

On 12/12/2023 2:24 a.m., Sigbert Klinke wrote:

> Hi,
>
> is it possible to get a button or link to run an example in a vignette
> like we see for the examples in the R help?

Others have explained why this is hard.  An alternative might be to run 
the examples when you produce your vignette, but hide the results until 
a button is pressed to display them.


Duncan Murdoch



Re: [R-pkg-devel] vignette with "Run Examples"

2023-12-12 Thread Ivan Krylov
On Tue, 12 Dec 2023 08:24:11 +0100
Sigbert Klinke  wrote:

> is it possible to get a button or link to run an example in a vignette

Technically, yes, but very hard to implement in practice.

Vignettes are a form of literate programming, expressed in terms of
files: there's a source file containing code mixed with prose, and
there are two programs, one of which extracts the code into a runnable
.R file and the other renders the code together with prose and any
resulting plots into a human-readable document. A link to run examples
implies that there's R running somewhere, which cannot be guaranteed by
the time the human-readable document is opened by the human.
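
To make the two-program picture concrete, here is a minimal sketch using the Sweave tools that ship with R: Stangle() extracts the code into a runnable .R file and Sweave() renders prose and code into a .tex document. The file name demo.Rnw and its contents are made up for the illustration, and other vignette engines (knitr, for instance) follow the same split with different functions.

```r
# Work in a temporary directory so no files are left in the project.
old_wd <- setwd(tempdir())

# A tiny literate document: prose plus one code chunk.
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Some prose explaining the example.",
  "<<example>>=",
  "x <- 1 + 1",
  "@",
  "\\end{document}"
), "demo.Rnw")

utils::Stangle("demo.Rnw", quiet = TRUE)  # program 1: writes demo.R
utils::Sweave("demo.Rnw", quiet = TRUE)   # program 2: writes demo.tex

extracted_code <- readLines("demo.R")
rendered_exists <- file.exists("demo.tex")

setwd(old_wd)
```

Both outputs are static files, so by the time someone reads the rendered document there is no R process left to attach a "run" link to.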

One way around this problem would be to embed a copy of webR [*] in the
document so that R would run in the browser. This involves a
significant developer effort and would either bloat your vignette to
the size of an R installation or make it depend on external resources
to load webR from (that could go away or spy on the user). webR is
still experimental; last time I tried it, it crashed the browser tab
when I invoked functions from the quadprog package.

Another way would be to add a hack to the vignette engine to start a
server at vignette rendering time, insert the link to this server into
the vignette as it's being rendered and hope that the server is still
running by the time the vignette is opened. This would require the user
to re-render the vignette every time they restart the server.

Technically, one could also invent a completely new kind of vignette
engine that would output self-contained executable files with a
document rendering engine and R built in, so that a click on the "Run
examples" would use that built-in R. This is basically the webR
solution without the web and with a lot of extra pain.

You could also fake some of it by writing extra JavaScript (with the
help of third-party statistics libraries, e.g. [**]) to do the same
thing in the browser as is done in R, but that's still a lot of work
for little benefit.

Yet another way would be to make these links point to an external
service somewhere on the Internet that would run the R code. Since R is
not designed to work with untrusted input (not to mention untrusted
users entering code), that would be an information security nightmare
both on your side (R would have to run in locked-down read-only
disposable virtual machines hardened against sandbox escape and
privilege escalation exploits) and on the GDPR side of things.

There are doubtlessly more approaches, but I think they would all be
this convoluted or worse.

-- 
Best regards,
Ivan

[*] https://docs.r-wasm.org/webr/latest/

[**] https://github.com/svkucheryavski/mdatools-js
