I saw that all coercions from AtomicList to atomic vectors are now deprecated. You had proposed deprecating as.vector() because it should not unlist, and I agreed. Really, as.vector() should return an ordinary R list. However, as.character(), as.numeric(), etc. in base R will unlist. I'd like to stay consistent with base R. Do we really need to deprecate those as well?
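For reference, here is a minimal base R sketch of how these coercions behave on an ordinary list. It uses plain lists only (no IRanges classes), and the names x and y are just for illustration. One caveat it brings out: as.numeric() only flattens a list when every element has length one, and as.character() deparses longer elements rather than unlisting them.

```r
# Base R only; 'x' and 'y' are illustrative names, not taken from the thread.
x <- list(1:4, 0:-2)   # elements of unequal length
y <- list(1, 2, 3)     # every element has length one

# as.vector() on a plain list keeps it a list of the same length.
v <- as.vector(x)
stopifnot(is.list(v), length(v) == length(x))

# unlist() is the explicit way to flatten.
u <- unlist(x)
stopifnot(identical(u, c(1L, 2L, 3L, 4L, 0L, -1L, -2L)))

# as.numeric() flattens only when each element has length one ...
stopifnot(identical(as.numeric(y), c(1, 2, 3)))

# ... and signals an error otherwise.
res <- tryCatch(as.numeric(x), error = function(e) "error")
stopifnot(identical(res, "error"))

# as.character() preserves the list length; longer elements are deparsed.
ch <- as.character(x)
stopifnot(length(ch) == length(x))
```

So "unlisting" in base R applies cleanly only in the length-one case, which may matter when deciding what the AtomicList methods should mirror.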
Michael

2011/6/15 Michael Lawrence <micha...@gene.com>

> 2011/6/15 Hervé Pagès <hpa...@fhcrc.org>
>
>> On 11-06-15 03:38 PM, Michael Lawrence wrote:
>>
>>> 2011/6/15 Hervé Pagès <hpa...@fhcrc.org>
>>>
>>>   Hi Michael, Janet,
>>>
>>>   I just added an "as.vector" method for XStringSet objects to
>>>   Biostrings 2.21.6:
>>>
>>>   > library(Biostrings)
>>>   > x <- DNAStringSet(c("aaatg", "gt"))
>>>   > as.vector(x)
>>>   [1] "AAATG" "GT"
>>>
>>>   But that doesn't solve Janet's problem:
>>>
>>>   > df <- DataFrame(id=c("ID1", "ID2"), seqs=x)
>>>   > df
>>>   DataFrame with 2 rows and 2 columns
>>>              id           seqs
>>>     <character> <DNAStringSet>
>>>   1         ID1          AAATG
>>>   2         ID2             GT
>>>   > as.data.frame(df)
>>>   Error in as.data.frame.default(y, optional = TRUE, ...) :
>>>     cannot coerce class 'structure("DNAStringSet", package =
>>>     "Biostrings")' into a data.frame
>>>
>>>   Michael?
>>>
>>> Well, sorry for that. I just added a coercion from Vector to data.frame
>>> through as.vector, so this works.
>>
>> Thanks!
>>
>>> But someone might add a coercion from
>>> List to data.frame that would treat the elements as columns. Would this
>>> make sense?
>>
>> Hard to tell. Maybe sometimes it would make sense, but sometimes it
>> definitely does not (e.g. DNAStringSet).
>>
>>> AtomicList to data.frame does something even stranger: it
>>> creates a two column data frame with the unlisted values and
>>> names/indices rep'd out as a factor. Actually, that's kind of cool,
>>> since usually one does not have a list with equal element lengths, but
>>> it's somewhat unintuitive. But why does it apply only to AtomicList?
>>
>> Glad you bring this on the table.
>>
>> For the record, "as.vector" also unrolls an AtomicList:
>>
>> > as.vector(IntegerList(1:4, 0:-2))
>> [1]  1  2  3  4  0 -1 -2
>>
>> IMO, we should not do things like that. Because:
>>
>> 1) The same can be achieved with unlist():
>>
>> > unlist(IntegerList(1:4, 0:-2))
>> [1]  1  2  3  4  0 -1 -2
>>
>> 2) It's totally unintuitive to use as.vector for unlisting
>> a list (as.vector on a standard list does not do that).
>>
>> 3) There is a strong expectation that as.vector() will preserve
>> the length of its input.
>>
>> So I propose to deprecate those "as.vector" and "as.data.frame"
>> methods for AtomicList objects.
>
> Sounds good to me. In fact, the stack method on List is almost identical to
> as.data.frame on AtomicList (and the stack method actually makes sense). You
> could make as.vector return an ordinary list, since list is a vector.
>
>> H.
>>
>>> Anyway, given the special correspondence between a XStringSet and a
>>> character vector, we could always add an as.data.frame method for
>>> XStringSet, just to make sure stuff behaves as expected.
>>>
>>>   Thanks,
>>>   H.
>>>
>>>   > sessionInfo()
>>>   R version 2.14.0 Under development (unstable) (2011-05-30 r56024)
>>>   Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>>   locale:
>>>    [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>>    [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>>    [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>>    [7] LC_PAPER=C                 LC_NAME=C
>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>   [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>>
>>>   attached base packages:
>>>   [1] stats graphics grDevices utils datasets methods base
>>>
>>>   other attached packages:
>>>   [1] Biostrings_2.21.6 IRanges_1.11.10
>>>
>>>   On 11-06-15 12:49 PM, Janet Young wrote:
>>>
>>>     yes - as.character seems a good choice, I think
>>>
>>>     thanks,
>>>
>>>     Janet
>>>
>>>     On Jun 15, 2011, at 12:46 PM, Michael Lawrence wrote:
>>>
>>>       So you would expect that the DNAStringSet is converted to a
>>>       character vector? DNAStringSet (technically XStringSet) then
>>>       just needs an as.vector method that delegates to as.character.
>>>
>>>       Michael
>>>
>>>       On Wed, Jun 15, 2011 at 12:37 PM, Janet Young
>>>       <jayo...@fhcrc.org> wrote:
>>>
>>>         Hi there,
>>>
>>>         I'm trying to as as.data.frame on a GRanges object. On
>>>         regular GRanges objects it works fine but I have some
>>>         objects that contain a DNAStringSet in the values column,
>>>         which isn't built in to the as.data.frame method. Is it
>>>         possible to add the ability to coerce the DNAStringSet too,
>>>         please?
>>>
>>>         Here's some code that demonstrates the issue:
>>>
>>>         ################
>>>         library(GenomicRanges)
>>>         library(Biostrings)
>>>
>>>         gr1 <- GRanges(seqnames=rep("chr1",3),
>>>                        ranges=IRanges(start=c(1,101,201),width=50),
>>>                        strand=c("+","-","+"),
>>>                        genenames=c("seq1","seq2","seq3"))
>>>
>>>         as.data.frame(gr1)
>>>         # works
>>>
>>>         gr2 <- gr1
>>>         values(gr2)[,"myseqs"] <- DNAStringSet(c("AACGTG",
>>>             "ACGGTGGTGTT", "GAGGCTG"))
>>>
>>>         as.data.frame(gr2)
>>>         # Error in as.data.frame.default(y, optional = TRUE, ...) :
>>>         # cannot coerce class 'structure("DNAStringSet", package =
>>>         # "Biostrings")' into a data.frame
>>>         ################
>>>
>>>         and here's sessionInfo() output:
>>>
>>>         R version 2.13.0 (2011-04-13)
>>>         Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>
>>>         locale:
>>>         [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>>         attached base packages:
>>>         [1] stats graphics grDevices utils datasets
>>>         methods base
>>>
>>>         other attached packages:
>>>         [1] Biostrings_2.20.1 GenomicRanges_1.4.6 IRanges_1.10.4
>>>
>>>         ################
>>>
>>>         You might wonder why I'm storing sequences in the GRanges
>>>         values - in my real data they're sequencing reads that have
>>>         mapped back to that region, but I'm still curious to
>>>         maintain the sequence itself (for the moment) because it's
>>>         not always identical to the underlying genomic sequence of
>>>         that region (investigating mapping issues).
>>>
>>>         (and my desire to use as.data.frame relates to a suggestion
>>>         from Herve to let me workaround some issues with the
>>>         identical function)
>>>
>>>         thanks,
>>>
>>>         Janet
>>>
>>>   --
>>>   Hervé Pagès
>>>
>>>   Program in Computational Biology
>>>   Division of Public Health Sciences
>>>   Fred Hutchinson Cancer Research Center
>>>   1100 Fairview Ave. N, M1-B514
>>>   P.O. Box 19024
>>>   Seattle, WA 98109-1024
>>>
>>>   E-mail: hpa...@fhcrc.org
>>>   Phone: (206) 667-5791
>>>   Fax: (206) 667-1319
_______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing