[Bioc-devel] Trying to fix package failing on 3.17

2022-12-16 Thread Jonathon Hill
Hi,

I received a notice that my package, sangerseqR, is failing during the nightly 
build process for the devel branch (3.17) of Bioconductor.

I do not see an error when I use the package with the current release, so the 
problem seems to be specific to the new build.

I have been trying to recreate the error. I downloaded and installed R 4.3 and 
installed the Bioconductor devel packages. However, when I try to build my 
package locally, I get the following error:


==> R CMD INSTALL --preclean --no-multiarch --with-keep.source sangerseqR

* installing to library 
‘/Library/Frameworks/R.framework/Versions/4.3/Resources/library’
* installing *source* package ‘sangerseqR’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading

 *** caught segfault ***
address 0x18, cause 'memory not mapped'

Traceback:
 1: dyn.load(file, DLLpath = DLLpath, ...)
 2: library.dynam(lib, package, package.lib)
 3: loadNamespace(package, lib.loc)
 4: doTryCatch(return(expr), name, parentenv, handler)
 5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 6: tryCatchList(expr, classes, parentenv, handlers)
 7: tryCatch({
        attr(package, "LibPath") <- which.lib.loc
        ns <- loadNamespace(package, lib.loc)
        env <- attachNamespace(ns, pos = pos, deps, exclude, include.only)
    }, error = function(e) {
        P <- if (!is.null(cc <- conditionCall(e)))
            paste(" in", deparse(cc)[1L])
        else ""
        msg <- gettextf("package or namespace load failed for %s%s:\n %s",
            sQuote(package), P, conditionMessage(e))
        if (logical.return && !quietly)
            message(paste("Error:", msg), domain = NA)
        else stop(msg, call. = FALSE, domain = NA)
    })
 8: library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = 
lib.loc, quietly = quietly)
 9: .getRequiredPackages2(pkgInfo, quietly = quietly, lib.loc = c(lib.loc, 
.libPaths()))
10: library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = 
lib.loc, quietly = quietly)
11: .getRequiredPackages2(pkgInfo, quietly, lib.loc, useImports)
12: .getRequiredPackages(quietly = TRUE)
13: withCallingHandlers(expr, packageStartupMessage = function(c) 
tryInvokeRestart("muffleMessage"))
14: suppressPackageStartupMessages(.getRequiredPackages(quietly = TRUE))
An irrecoverable exception occurred. R is aborting now ...
sh: line 1:  6601 Segmentation fault: 11  R_TESTS= 
'/Library/Frameworks/R.framework/Resources/bin/R' --no-save --no-restore 
--no-echo 2>&1 < 
'/var/folders/1w/yqm9vhw952s4dbb2g9vtr_6rgp/T//RtmpRo37lg/file19c36696d396'
ERROR: lazy loading failed for package ‘sangerseqR’
* removing 
‘/Library/Frameworks/R.framework/Versions/4.3/Resources/library/sangerseqR’

Exited with status 1.


Furthermore, when I try to install my package through Bioconductor, I get:

Warning message:
package ‘sangerseqR’ is not available for Bioconductor version '3.17'

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Installing-packages

Any suggestions why I can’t load it?

Thanks,

Jonathon

_

Jonathon T. Hill, PhD
Associate Professor
Cell Biology and Physiology
Brigham Young University
801-422-8970
Lab: LSB 3035
Office: LSB 3018
Email: jh...@byu.edu

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] I don't have access to my package

2021-05-25 Thread Jonathon Hill
Hi,

I am the listed maintainer for the sangerseqR package. However, I am unable to 
access it to push a small bug fix. I am the owner of the GitHub repository and 
the maintainer listed on the bioconductor website/DESCRIPTION file. However, 
when I try to push the change to upstream, I get the following error:

FATAL: W any packages/sangerseqR kjohnsen DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


The username listed is a former student, but they never should have been listed 
as the maintainer for this package (they are on another package we made). How 
can I get access restored?

Thanks,

Jonathon Hill


[Bioc-devel] Problem with ensembl-vep install on merida1

2019-10-07 Thread Jonathon Hill
Hi all,

My package passes all tests on Linux but fails the OS X check (exact error 
below). It appears to be a problem with the VEP Perl modules on the server: 
VEP complains that Bio::DB::HTS::Tabix is not installed. Is there something I 
can do, or does it need to be fixed on the server?

Thanks,

Jonathon Hill


##
##
###
### Running command:
###
###   /Library/Frameworks/R.framework/Versions/Current/Resources/bin/R CMD 
build --keep-empty-dirs --no-resave-data MMAPPR2
###
##
##


* checking for file ‘MMAPPR2/DESCRIPTION’ ... OK
* preparing ‘MMAPPR2’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR


 EXCEPTION 
MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module installed

STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /usr/local/ensembl-vep/vep:224
Date (localtime)= Sat Oct  5 23:32:12 2019
Ensembl API version = 98
---

 EXCEPTION 
MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module installed

STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run 
/usr/local/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /usr/local/ensembl-vep/vep:224
Date (localtime)= Sat Oct  5 23:32:27 2019
Ensembl API version = 98
---
Quitting from lines 128-135 (MMAPPR2.Rmd)
Error: processing vignette 'MMAPPR2.Rmd' failed with diagnostics:
file(s) do not exist:
  '/tmp/Rtmpv2ka4U/file11068648095bc'
--- failed re-building ‘MMAPPR2.Rmd’

SUMMARY: processing the following file failed:
  ‘MMAPPR2.Rmd’

Error: Vignette re-building failed.
Execution halted


Re: [Bioc-devel] Samtools dependency

2019-08-27 Thread Jonathon Hill
Hi,

I just wanted to check in, as I know we got interrupted by the weekend. Any 
thoughts on the best way forward? 

Thanks,

Jonathon


Re: [Bioc-devel] Samtools dependency

2019-08-23 Thread Jonathon Hill
Yes, gladly. Thank you for taking time to help me. Here is the exact line of R 
code where we build the samtools command (the file to be tested is added later):

args <- paste("mpileup -ERI",   # -E: recompute BAQ; -R: ignore read groups; -I: skip indels
 "-f", refFasta(param),          # reference FASTA
 "-C 50",                        # downgrade mapping quality of reads with excessive mismatches
 "--min-MQ", minMapQuality(param),
 "--min-BQ", minBaseQuality(param),
 "--region", as.character(chrRange, ignore.strand=TRUE))

As you can see, we use the BAQ score to filter. We have tried to implement it 
without BAQ (using Rsamtools) and found it negatively affected our results.

> On Aug 23, 2019, at 4:53 PM, Martin Morgan  wrote:
> 
> can you provide an example of the samtools command line that you evaluate?


Re: [Bioc-devel] Samtools dependency

2019-08-23 Thread Jonathon Hill
I had not until today. I spent the afternoon looking at the possibility, and it 
looks like it would be beyond my lab’s skills. We do not have anyone 
comfortable in C, as we do everything in R. The problem is that we need the 
results of the mpileup command with BAQ scores. Although Rsamtools has a pileup 
function, as far as we can tell it does not expose the BAQ score, so we had to 
fall back on making a system call to samtools and reading in the results. Using 
Rhtslib is intriguing, but it looks like we would need several header files 
from Samtools (as opposed to htslib) and then implement our own C function. 
Again, we do not have anyone who could do this. We are scientists, not 
programmers. Am I correct about what it would require? Do you know of any other 
alternatives? 

Jonathon

> On Aug 23, 2019, at 12:22 PM, Pages, Herve  wrote:
> 
> Hi Jonathon,
> 
> Have you considered depending on Rhtslib? See 
> https://bioconductor.org/packages/Rhtslib
> 
> Rsamtools itself is implemented on top of Rhtslib. Note that other 
> Bioconductor packages (e.g. DiffBind, deepSNV, BitSeq, qrqc, QuasR, 
> seqbias, TransView, etc...) use Rhtslib internally to implement features 
> not implemented in Rsamtools.
> 
> H.
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax: (206) 667-1319


[Bioc-devel] Samtools dependency

2019-08-23 Thread Jonathon Hill
Hi,

I am working through the process of submitting a new package (MMAPPR2). We are 
having a problem with the build failing because our package requires Samtools 
to be installed. We cannot use Rsamtools, as we depend on features not 
implemented in that package. How do we resolve the issue? What is the policy 
for system dependencies? We have samtools listed in the DESCRIPTION and 
installation instructions in our README, but I am sure that is not enough to 
get it installed on the Build and Check servers.

Thanks,

Jonathon Hill


[Bioc-devel] error when creating gitsvn bridge

2014-05-09 Thread Jonathon Hill
Hi, I am trying to recreate my git-svn bridge. However, it simply says:

An error occurred creating the git-svn bridge.

Any ideas why? I have checked my settings on github and everything looks okay.

Jonathon Hill


Re: [Bioc-devel] Rsamtools applyPileups function not merging positions from multiple files if not identical

2014-04-29 Thread Jonathon Hill
Hi Martin,

Thanks for looking into this. This is a problem that we have run into before. 
The zebrafish genome has a number of small contigs ( 1000 of them). 
Unfortunately, the novoalign aligner only includes @SQ lines if reads actually 
map to the contig. Since they are so small, each file ends up having its own 
set of small contigs included due to sporadic low-coverage alignments to the 
contigs, even when aligned using the same parameters and the same exact 
reference genome. That is what happened here. I am working on a small function 
to gather all of the contigs from each file and merge them to create a larger 
list that can be used to create a common header. However, it is taking a while 
because it looks like I have to covert the files to sam, replace the header and 
then convert back to bam, sort and index to make sure the factor levels remain 
consistent. I would love any thoughts you have on how to do this. I am not a C 
programmer, so I am resorting to system calls to samtools in my R code. It 
seems like something that would have a fairly broad appeal, as the genomes of 
many non-model organisms have numerous small contigs.
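A minimal base-R sketch of the first step described above (building one merged @SQ list), assuming each file's header has already been read into a named integer vector of sequence lengths, for example the `targets` element that Rsamtools' `scanBamHeader()` returns per file. `merge_targets()` is a hypothetical helper, not part of Rsamtools; it keeps first-seen order, stops if a contig has different lengths in different files, and does not rewrite the BAM files themselves.

```r
# Hypothetical helper: merge per-file @SQ target lists into one common list.
# target_list: list of named integer vectors (names = sequence names,
# values = sequence lengths), one vector per BAM header.
merge_targets <- function(target_list) {
  # All (name, length) pairs in file order; inner names survive unlist().
  all_targets <- unlist(unname(target_list))
  # Keep the first occurrence of each sequence name.
  merged <- all_targets[!duplicated(names(all_targets))]
  # The same contig must have one length everywhere; otherwise the files
  # were probably aligned to different references.
  for (nm in names(merged)) {
    lens <- unique(all_targets[names(all_targets) == nm])
    if (length(lens) != 1L)
      stop("inconsistent lengths for sequence '", nm, "'")
  }
  merged
}

merge_targets(list(c(chr1 = 100L, ctg1 = 10L), c(chr1 = 100L, ctg2 = 20L)))
```

With real files this might be called as `merge_targets(lapply(scanBamHeader(c(fl1, fl2)), "[[", "targets"))`. Ordering the merged list to match the reference FASTA index, rather than first-seen order, would likely keep coordinate sorting consistent across files.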

Thanks again,

Jonathon

On Apr 26, 2014, at 5:28 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:

On 04/23/2014 07:42 AM, Jonathon Hill wrote:
Thanks. I look forward to hearing from you.

Hi Jonathon --

It turns out that your BAM files have different seqlevels

 fls <- PileupFiles(dir(pattern = "_sorted.bam$", full=TRUE))
 lvls = lapply(fls, seqlevels)
 identical(lvls[[1]], lvls[[2]])
[1] FALSE

i.e., the BAM files have different reference sequences. Because of this, 
samtools thinks of 'chr20' in one file as different from 'chr20' in another, 
much as R might confuse values of factors with different level sets.
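To make the factor analogy concrete: a BAM record identifies its reference sequence by an integer index into the header's @SQ list, much as an R factor stores integer codes into its levels. The base-R sketch below uses two factors as stand-ins for two hypothetical BAM headers to show how the same label can map to different codes when the level sets differ; whether applyPileups compares positions in exactly this way internally is an assumption.

```r
# Two hypothetical "headers" that both contain chr20 but list different sequences
file1 <- factor("chr20", levels = c("chr1", "chr20"))
file2 <- factor("chr20", levels = c("chr20", "ctg_extra"))

as.integer(file1)  # 2: chr20 is the second level of file1
as.integer(file2)  # 1: chr20 is the first level of file2

# Same label, different integer codes: a comparison on codes says "different"
as.integer(file1) == as.integer(file2)      # FALSE
as.character(file1) == as.character(file2)  # TRUE
```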

I updated Rsamtools to check that the seqlevels are identical

 applyPileups(fls, function(...) {})
Error in applyPileups(files, FUN, ..., param = plpParam(files)) :
 applyPileups 'seqlevels' must be identical(); failed when comparing
   '10696X1chr20testregion_sorted.bam' with
   '10696X4chr20testregion_sorted.bam'

so at least the problem is more apparent. I don't think you can correct your 
files using Rsamtools; maybe with Picard or, worst case (and maybe it's 
appropriate anyway), by re-aligning all samples to a common set of sequences?

Hope that sheds some light, and thanks for the report,

Martin


On Apr 23, 2014, at 6:10 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:

Thanks for the file snippets; I'm able to reproduce this bug and will let you
know of its resolution. Martin


Re: [Bioc-devel] Rsamtools applyPileups function not merging positions from multiple files if not identical

2014-04-22 Thread Jonathon Hill
Hi Martin,

Thank you for your response. I checked the header and it says that it is 
coordinate sorted, so that shouldn’t be the problem. Here are the results of 
the code you provided:

 r3 = applyPileups(PileupFiles(c(fl1, fl2)), function(x) x, param=testparam)
 any(duplicated(r3[[1]][["pos"]]))
[1] TRUE
 pos = r3[[1]][["pos"]]
 table(table(pos))

   1    2
4834 3115
 udpos = unique(pos[duplicated(pos)])
 head(pos[match(pos, udpos)], 20)
 [1] 135003 135006 135007 135008 135009 135010 135011 135012 135013 135014
[11] 135015 135016 135017 135018 135019 135020 135021 135022 135023 135024
 head(match(pos, udpos), 20)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 table(pos)[1:50]
pos
134762 134763 134764 134765 134766 134767 134768 134769 134770 134771
     1      1      1      1      1      1      1      1      1      1
134772 134773 134774 134775 134776 134777 134778 134779 134780 134781
     1      1      1      1      1      1      1      1      1      1
134782 134783 134784 134785 134786 134787 134788 134991 134992 134993
     1      1      1      1      1      1      1      1      1      1
134994 134995 134999 135001 135002 135003 135004 135005 135006 135007
     1      1      1      1      1      2      1      1      2      2
135008 135009 135010 135011 135012 135013 135014 135015 135016 135017
     2      2      2      2      2      2      2      2      2      2

I think the last one shows it well. There are duplicates throughout the file, 
anywhere that there are reads in both files. I have attached the bam files you 
requested showing the region used here.

Thanks,

Jonathon

On Apr 21, 2014, at 7:17 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

I think your understanding is basically correct.

The function is assuming that the BAM files are sorted by position (with, e.g., 
sortBam, but the files don't have to be sorted by Rsamtools).

Executing a similar command gives me

 str(r3[[1]])
List of 3
 $ seqnames: Named int 211195
  ..- attr(*, "names")= chr "chr20"
 $ pos     : int [1:211195] 60026 60027 60028 60029 60030 60031 60032 60033 60034 60035 ...
 $ seq     : int [1:5, 1:2, 1:211195] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:5] "A" "C" "G" "T" ...
  .. ..$ : chr [1:2] "normal_srx113635_sorted.bam" "tumor_srx036691_sorted.bam"
  .. ..$ : NULL

Do you get something similar, especially the identical seqnames, pos dimension, 
and third dimension of seq? 'pos' should apparently be unique; so

 any(duplicated(r3[[1]][["pos"]]))
[1] FALSE

If there are duplicates, I wonder how many there are and where they occur

 pos = r3[[1]][["pos"]]
 table(table(pos))
 udpos = unique(pos[duplicated(pos)])
 head(pos[match(pos, udpos)], 20)
 head(match(pos, udpos), 20)

If nothing is suggested by the above, can you make a subset of the BAM files 
available to me, e.g., the result of

 param = ScanBamParam(which=GRanges("chr20", IRanges(1, 100)))
 filterBam(fls[1], tempfile(), param=param)

Thanks,

Martin


[Bioc-devel] Rsamtools applyPileups function not merging positions from multiple files if not identical

2014-04-21 Thread Jonathon Hill
Hi,

I have been trying to use Rsamtools’ applyPileups function to compare two files 
position-by-position. In order to test it out, I simply ran:
library(Rsamtools)  # also attaches GenomicRanges and IRanges
minBaseQuality <- 20
minMapQuality <- 30
minDepth <- 10
maxDepth <- 1000
testparam <- PileupParam(what="seq",
    which=GRanges("chr20", IRanges(1, 100)),
    minBaseQuality=minBaseQuality,
    minMapQuality=minMapQuality,
    minDepth=minDepth,
    maxDepth=maxDepth)
fl1 <- "10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam"
fl2 <- "10696X4_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam"
r3 = applyPileups(PileupFiles(c(fl1, fl2)), function(x) x, param=testparam)

My understanding is that this should result in a three-dimensional array with 
“ACTGN Counts” in the first dimension, files in the second and position in the 
third. Positions with overlapping reads in both files should thus be collapsed 
into a single line in the third dimension. However, selecting one of these 
positions shows that they are duplicated:

r3[[1]][["seq"]][ , , r3[[1]][["pos"]] == 135003]
yields:

, , 1

  10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam 10696X4_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam
A                                                   0                                                   0
C                                                   0                                                   0
G                                                  10                                                   0
T                                                   0                                                   0
N                                                   0                                                   0

, , 2

  10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam 10696X4_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam
A                                                   0                                                   0
C                                                   0                                                   0
G                                                   0                                                  13
T                                                   0                                                   0
N                                                   0                                                   0

Even though the position is the same, it is showing up twice. Each time, one of 
the files shows zeroes. This is not consistent with what happens if the files 
are identical (as in the example from the help docs).

For example,

r3 = applyPileups(PileupFiles(c(fl1, fl1)), function(x) x, param=testparam)  # file 1 entered twice
r3[[1]][["seq"]][ , , r3[[1]][["pos"]] == 135003]

yields:

  10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam 10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam
A                                                   0                                                   0
C                                                   0                                                   0
G                                                  10                                                  10
T                                                   0                                                   0
N                                                   0                                                   0

Is this the expected behavior? It seems like each position should only show up 
once in the output. Is there something I am missing?
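The collapsed layout expected above (bases × files × positions, one slice per unique position) can be sketched in base R, assuming per-file base counts have already been tabulated as 5-row matrices whose column names are genomic positions. `merge_pileups()` is a hypothetical helper for illustration only, not how applyPileups is implemented; a file with no reads at a position simply gets zero counts in that slice.

```r
# Hypothetical sketch: merge per-file pileup counts into one array keyed by
# the union of positions. Each element of per_file is a matrix with rows
# A, C, G, T, N and one column per position (column names = positions).
merge_pileups <- function(per_file) {
  bases <- c("A", "C", "G", "T", "N")
  # Union of positions across all files, each appearing exactly once
  positions <- sort(unique(as.integer(unlist(lapply(per_file, colnames)))))
  out <- array(0L,
               dim = c(length(bases), length(per_file), length(positions)),
               dimnames = list(bases, names(per_file), NULL))
  for (i in seq_along(per_file)) {
    m <- per_file[[i]]
    idx <- match(as.integer(colnames(m)), positions)
    # Fill this file's counts into the slices for its positions
    out[, i, idx] <- m[bases, , drop = FALSE]
  }
  list(pos = positions, seq = out)
}

# Position 135003 from the example: G = 10 in file 1, G = 13 in file 2
cnt <- function(g) matrix(c(0L, 0L, g, 0L, 0L), nrow = 5,
                          dimnames = list(c("A", "C", "G", "T", "N"), "135003"))
res <- merge_pileups(list(file1 = cnt(10L), file2 = cnt(13L)))
res$seq[, , 1]
```

Here both files contribute to a single slice for 135003, which is the behaviour the question expects rather than two slices that each contain zeroes for one file.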

Thanks,

Jonathon Hill
Postdoc
Yost Lab, University of Utah


Re: [Bioc-devel] Revising biocViews

2014-02-21 Thread Jonathon Hill
Hi,

It may be a minor note, but all of the sequencing technologies included in the 
list are “High Throughput” technologies. Perhaps it would help to add a 
Sequencing-Sanger category for Sanger (i.e., chain-terminator) sequencing. This 
technology is still widely used to confirm putative variants or to confirm 
constructs. 

Jonathon