On Wed, Mar 9, 2011 at 6:41 AM, Ivan Gregoretti <ivang...@gmail.com> wrote:

> Just to expand a little bit on Vincent's response.
>
> If you happen to be handling very large BED files, you probably keep
> them compressed. The good news is that even in that case, you can load
> them:
>
> lit = import("~/lit.bed.gz", "bed")
>
> There is still the long-standing issue of how slow the import()
> function is but I am still hopeful.
>

This is the first I've heard of this. What sort of files are slow? Do they
have a track line? The parsing gets complicated when there are track lines
and multiple tracks in a file. BED is a complex format with many variants.

> Ivan
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1016 and 1-301-496-1592
> Fax: 1-301-496-9878
>
>
>
> On Tue, Mar 8, 2011 at 9:26 PM, Vincent Carey
> <st...@channing.harvard.edu> wrote:
> > 2011/3/8 Thiago Yukio Kikuchi Oliveira <strat...@gmail.com>:
> >> Hi,
> >>
> >> Is there a BED file parser for R?
> >
> > I suppose it depends on what you mean by "parser". import() from the
> > rtracklayer package imports BED and constructs and populates a
> > RangedData object with the contents. Here we look at a small bed file
> > in text, start R, load rtracklayer, import the data, show the result,
> > and show the resources used.
> >
> > bash-3.2$ head ~/junc716_20.bed
> > chr20 55658 64827 JUNC00000001 14 + 55658 64827 255,0,0 2 27,25 0,9144
> > chr20 55662 64821 JUNC00000002 2 - 55662 64821 255,0,0 2 34,8 0,9151
> > chr20 135774 147029 JUNC00000003 1 - 135774 147029 255,0,0 2 8,29 0,11226
> > chr20 167951 172361 JUNC00000004 1 + 167951 172361 255,0,0 2 29,8 0,4402
> > chr20 189824 192113 JUNC00000005 3 + 189824 192113 255,0,0 2 33,9 0,2280
> > chr20 189829 192113 JUNC00000006 3 + 189829 192113 255,0,0 2 32,9 0,2275
> > chr20 193930 199576 JUNC00000007 4 - 193930 199576 255,0,0 2 28,11 0,5635
> > chr20 207050 207846 JUNC00000008 2 - 207050 207846 255,0,0 2 20,34 0,762
> > chr20 218306 218925 JUNC00000009 1 - 218306 218925 255,0,0 2 11,26 0,593
> > chr20 221160 225070 JUNC00000010 25 - 221160 225070 255,0,0 2 29,9 0,3901
> > bash-3.2$ head ~/junc716_20.bed > ~/lit.bed
> > bash-3.2$ R213 --vanilla --quiet
> >> library(rtracklayer)
> > Loading required package: RCurl
> > Loading required package: bitops
> >> lit = import("~/lit.bed")
> >> lit
> > RangedData with 10 rows and 9 value columns across 1 space
> >          space           ranges |         name     score      strand thickStart
> >    <character>        <IRanges> |  <character> <numeric> <character>  <integer>
> >  1       chr20 [ 55659,  64827] | JUNC00000001        14           +      55658
> >  2       chr20 [ 55663,  64821] | JUNC00000002         2           -      55662
> >  3       chr20 [135775, 147029] | JUNC00000003         1           -     135774
> >  4       chr20 [167952, 172361] | JUNC00000004         1           +     167951
> >  5       chr20 [189825, 192113] | JUNC00000005         3           +     189824
> >  6       chr20 [189830, 192113] | JUNC00000006         3           +     189829
> >  7       chr20 [193931, 199576] | JUNC00000007         4           -     193930
> >  8       chr20 [207051, 207846] | JUNC00000008         2           -     207050
> >  9       chr20 [218307, 218925] | JUNC00000009         1           -     218306
> > 10       chr20 [221161, 225070] | JUNC00000010        25           -     221160
> >     thickEnd     itemRgb blockCount  blockSizes blockStarts
> >    <integer> <character>  <integer> <character> <character>
> >  1     64827     #FF0000          2       27,25      0,9144
> >  2     64821     #FF0000          2        34,8      0,9151
> >  3    147029     #FF0000          2        8,29     0,11226
> >  4    172361     #FF0000          2        29,8      0,4402
> >  5    192113     #FF0000          2        33,9      0,2280
> >  6    192113     #FF0000          2        32,9      0,2275
> >  7    199576     #FF0000          2       28,11      0,5635
> >  8    207846     #FF0000          2       20,34       0,762
> >  9    218925     #FF0000          2       11,26       0,593
> > 10    225070     #FF0000          2        29,9      0,3901
> >
> >> sessionInfo()
> > R version 2.13.0 Under development (unstable) (2011-03-01 r54628)
> > Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)
> >
> > locale:
> > [1] C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] rtracklayer_1.11.11 RCurl_1.5-0         bitops_1.0-4.1
> >
> > loaded via a namespace (and not attached):
> > [1] BSgenome_1.19.4      Biobase_2.11.9       Biostrings_2.19.15
> > [4] GenomicRanges_1.3.23 IRanges_1.9.25       Matrix_0.999375-47
> > [7] XML_3.2-0            grid_2.13.0          lattice_0.19-17
> >
> >
> >>
> >>
> >> Thanks
> >>
> >>   /     Thiago Yukio Kikuchi Oliveira
> >> (=\
> >>   \=)   Faculdade de Medicina de Ribeirão Preto
> >>   /     Laboratório de Genética Molecular e Bioinformática
> >>  /=)  -----------------------------------------------------------------
> >> (=/     Centro de Terapia Celular/CEPID/FAPESP - Hemocentro de Rib. Preto
> >>   /     Rua Tenente Catão Roxo, 2501 CEP 14151-140
> >> (=\     Ribeirão Preto - São Paulo
> >>   \=)   Fone: 55 16 2101-9300 Ramal: 9603
> >>   /     E-mail: stra...@lgmb.fmrp.usp.br
> >>  /=)            strat...@gmail.com
> >> (=/
> >>   /     Bioinformatic Team - BiT: http://lgmb.fmrp.usp.br
> >> (=\     Hemocentro de Ribeirão Preto: http://pegasus.fmrp.usp.br
> >>   \=)
> >>   /   -----------------------------------------------------------------
> >>
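
For reference, two details from the thread above can be sketched in base R: the coordinate shift visible in Vincent's output (BED start 55658 becomes range start 55659, because BED starts are 0-based) and the need to skip track lines before the tabular part can be parsed. This is only a minimal illustration and not how rtracklayer's import() is implemented. read_bed_simple is a hypothetical helper, and it assumes a tab-delimited file with a single track and at most 12 columns.

```r
# Hypothetical helper (not part of rtracklayer): read a BED file, optionally
# gzip-compressed, using base R only.
read_bed_simple <- function(path) {
  con <- gzfile(path, "rt")   # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con)

  # Drop track/browser/comment lines and blank lines.
  # Assumption: the file holds a single track, so nothing is split by track.
  keep <- !grepl("^(track|browser|#)", lines) & nzchar(lines)
  bed <- read.table(text = lines[keep], sep = "\t", header = FALSE,
                    quote = "", comment.char = "", stringsAsFactors = FALSE)

  # Standard BED column names; a file may carry only the first 3 to 12.
  bed_names <- c("chrom", "start", "end", "name", "score", "strand",
                 "thickStart", "thickEnd", "itemRgb", "blockCount",
                 "blockSizes", "blockStarts")
  stopifnot(ncol(bed) >= 3, ncol(bed) <= length(bed_names))
  names(bed) <- bed_names[seq_len(ncol(bed))]

  # BED starts are 0-based, half-open; shift to 1-based, closed coordinates.
  bed$start <- bed$start + 1L
  bed
}

# Usage: write one record (taken from the example above) plus a track line
# to a temporary gzip-compressed file, then read it back.
tmp <- tempfile(fileext = ".bed.gz")
gz <- gzfile(tmp, "wt")
writeLines(c(
  'track name="junctions"',
  paste("chr20", 55658, 64827, "JUNC00000001", 14, "+", 55658, 64827,
        "255,0,0", 2, "27,25", "0,9144", sep = "\t")
), gz)
close(gz)

lit <- read_bed_simple(tmp)
print(lit[, c("chrom", "start", "end", "name")])
```

Timing a plain reader like this against import() on the same file (for example with system.time()) is one way to tell whether the slowness comes from reading the file or from the track handling and object construction.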
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing