2011/6/15 Hervé Pagès <hpa...@fhcrc.org>

> Hi Michael, Janet,
>
> I just added an "as.vector" method for XStringSet objects to
> Biostrings 2.21.6:
>
> > library(Biostrings)
> > x <- DNAStringSet(c("aaatg", "gt"))
> > as.vector(x)
> [1] "AAATG" "GT"
>
> But that doesn't solve Janet's problem:
>
> > df <- DataFrame(id=c("ID1", "ID2"), seqs=x)
> > df
> DataFrame with 2 rows and 2 columns
>            id           seqs
>   <character> <DNAStringSet>
> 1         ID1          AAATG
> 2         ID2             GT
> > as.data.frame(df)
> Error in as.data.frame.default(y, optional = TRUE, ...) :
>   cannot coerce class 'structure("DNAStringSet", package = "Biostrings")'
>   into a data.frame
>
> Michael?

Well, sorry for that. I just added a coercion from Vector to data.frame
through as.vector, so this works now.

But someone might add a coercion from List to data.frame that would
treat the elements as columns. Would this make sense? AtomicList to
data.frame does something even stranger: it creates a two-column data
frame with the unlisted values and the names/indices rep'd out as a
factor. Actually, that's kind of cool, since usually one does not have a
list with equal element lengths, but it's somewhat unintuitive. But why
does it apply only to AtomicList?

Anyway, given the special correspondence between an XStringSet and a
character vector, we could always add an as.data.frame method for
XStringSet, just to make sure stuff behaves as expected.
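For readers following along without Biostrings installed, the coercion chain discussed above (as.vector delegating to as.character, and as.data.frame going through as.vector) can be sketched in base R. This is only an illustration under stated assumptions: the class name ToySeqSet and its seqs slot are hypothetical stand-ins for XStringSet, and the methods below are not the actual Biostrings or IRanges code.

```r
library(methods)

# Hypothetical stand-in for XStringSet: a thin S4 wrapper around a
# character vector. "ToySeqSet" and the "seqs" slot are made up here.
setClass("ToySeqSet", representation(seqs = "character"))

# as.character() returns the underlying strings.
setMethod("as.character", "ToySeqSet", function(x, ...) x@seqs)

# as.vector() simply delegates to as.character().
setMethod("as.vector", "ToySeqSet",
          function(x, mode = "any") as.character(x))

# as.data.frame() goes through as.vector(), which yields a one-column
# data frame of plain character strings.
setMethod("as.data.frame", "ToySeqSet",
          function(x, row.names = NULL, optional = FALSE, ...) {
            data.frame(seqs = as.vector(x), row.names = row.names,
                       stringsAsFactors = FALSE)
          })

x <- new("ToySeqSet", seqs = c("AAATG", "GT"))
df <- as.data.frame(x)
```

With these three methods in place, as.data.frame(x) no longer falls through to as.data.frame.default, which is where the "cannot coerce class" error in the thread comes from.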
> Thanks,
> H.
>
> > sessionInfo()
> R version 2.14.0 Under development (unstable) (2011-05-30 r56024)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Biostrings_2.21.6 IRanges_1.11.10
>
> On 11-06-15 12:49 PM, Janet Young wrote:
>
>> yes - as.character seems a good choice, I think
>>
>> thanks,
>>
>> Janet
>>
>> On Jun 15, 2011, at 12:46 PM, Michael Lawrence wrote:
>>
>>> So you would expect that the DNAStringSet is converted to a character
>>> vector? DNAStringSet (technically XStringSet) then just needs an as.vector
>>> method that delegates to as.character.
>>>
>>> Michael
>>>
>>> On Wed, Jun 15, 2011 at 12:37 PM, Janet Young<jayo...@fhcrc.org> wrote:
>>> Hi there,
>>>
>>> I'm trying to use as.data.frame on a GRanges object. On regular GRanges
>>> objects it works fine, but I have some objects that contain a DNAStringSet
>>> in the values column, which isn't built in to the as.data.frame method.
>>> Is it possible to add the ability to coerce the DNAStringSet too, please?
>>>
>>> Here's some code that demonstrates the issue:
>>>
>>> ################
>>> library(GenomicRanges)
>>> library(Biostrings)
>>>
>>> gr1 <- GRanges(seqnames=rep("chr1",3),
>>>                ranges=IRanges(start=c(1,101,201), width=50),
>>>                strand=c("+","-","+"),
>>>                genenames=c("seq1","seq2","seq3"))
>>>
>>> as.data.frame(gr1)
>>> # works
>>>
>>> gr2 <- gr1
>>> values(gr2)[,"myseqs"] <- DNAStringSet(c("AACGTG", "ACGGTGGTGTT",
>>>                                          "GAGGCTG"))
>>>
>>> as.data.frame(gr2)
>>> # Error in as.data.frame.default(y, optional = TRUE, ...) :
>>> #   cannot coerce class 'structure("DNAStringSet", package =
>>> #   "Biostrings")' into a data.frame
>>> ################
>>>
>>> and here's sessionInfo() output:
>>>
>>> R version 2.13.0 (2011-04-13)
>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] Biostrings_2.20.1   GenomicRanges_1.4.6 IRanges_1.10.4
>>>
>>> ################
>>>
>>> You might wonder why I'm storing sequences in the GRanges values - in my
>>> real data they're sequencing reads that have mapped back to that region,
>>> but I'm still curious to maintain the sequence itself (for the moment)
>>> because it's not always identical to the underlying genomic sequence of
>>> that region (investigating mapping issues).
>>>
>>> (and my desire to use as.data.frame relates to a suggestion from Herve to
>>> let me work around some issues with the identical function)
>>>
>>> thanks,
>>>
>>> Janet
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319