2011/6/15 Hervé Pagès <hpa...@fhcrc.org>

> On 11-06-15 03:38 PM, Michael Lawrence wrote:
>
>>
>>
>> 2011/6/15 Hervé Pagès <hpa...@fhcrc.org>
>>
>>
>>    Hi Michael, Janet,
>>
>>    I just added an "as.vector" method for XStringSet objects to
>>    Biostrings 2.21.6:
>>
>>     > library(Biostrings)
>>     > x <- DNAStringSet(c("aaatg", "gt"))
>>     > as.vector(x)
>>      [1] "AAATG" "GT"
>>
>>    But that doesn't solve Janet's problem:
>>
>>     > df <- DataFrame(id=c("ID1", "ID2"), seqs=x)
>>     > df
>>      DataFrame with 2 rows and 2 columns
>>                 id           seqs
>>        <character> <DNAStringSet>
>>      1         ID1          AAATG
>>      2         ID2             GT
>>     > as.data.frame(df)
>>
>>      Error in as.data.frame.default(y, optional = TRUE, ...) :
>>        cannot coerce class 'structure("DNAStringSet", package =
>>    "Biostrings")' into a data.frame
>>
>>    Michael?
>>
>>
>> Well, sorry for that. I just added a coercion from Vector to data.frame
>> through as.vector, so this works.
>>
>
> Thanks!
>
>
>> But someone might add a coercion from
>> List to data.frame that would treat the elements as columns. Would this
>> make sense?
>>
>
> Hard to tell. Maybe sometimes it would make sense, but sometimes it
> definitely does not (e.g. DNAStringSet).
>
>
>> AtomicList to data.frame does something even stranger: it
>> creates a two-column data frame with the unlisted values and
>> names/indices rep'd out as a factor. Actually, that's kind of cool,
>> since usually one does not have a list with equal element lengths, but
>> it's somewhat unintuitive. But why does it apply only to AtomicList?
>>
>
> Glad you brought this up.
>
> For the record, "as.vector" also unrolls an AtomicList:
>
>  > as.vector(IntegerList(1:4, 0:-2))
>  [1]  1  2  3  4  0 -1 -2
>
> IMO, we should not do things like that. Because:
>
>  1) The same can be achieved with unlist():
>
>    > unlist(IntegerList(1:4, 0:-2))
>    [1]  1  2  3  4  0 -1 -2
>
>  2) It's totally unintuitive to use as.vector for unlisting
>     a list (as.vector on a standard list does not do that).
>
>  3) There is a strong expectation that as.vector() will preserve
>     the length of its input.
>
> So I propose to deprecate those "as.vector" and "as.data.frame"
> methods for AtomicList objects.
>
>
Sounds good to me. In fact, the stack method on List is almost identical to
as.data.frame on AtomicList (and the stack method actually makes sense). You
could make as.vector return an ordinary list, since list is a vector.
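
To make the distinction concrete, here is a minimal base-R sketch. It uses an
ordinary named list as a stand-in for an AtomicList (an assumption made for
illustration; it does not reproduce the IRanges methods), and the variable
names are hypothetical:

```r
# A plain named list standing in for IntegerList(1:4, 0:-2).
x <- list(a = 1:4, b = 0:-2)

# as.vector() on a standard list keeps it a list of the same length.
v <- as.vector(x)

# unlist() is the explicit way to flatten the elements into one vector.
flat <- unlist(x)

# stack() gives the "long" form: one column of values and one factor
# column recording which list element each value came from.
long <- stack(x)

print(length(v))
print(unname(flat))
print(long)
```

Here as.vector() preserves the length of its input, unlist() does the
flattening, and stack() produces the two-column layout described above.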


> H.
>
>
>> Anyway, given the special correspondence between an XStringSet and a
>> character vector, we could always add an as.data.frame method for
>> XStringSet, just to make sure stuff behaves as expected.
>>
>>    Thanks,
>>    H.
>>
>>
>>     > sessionInfo()
>>    R version 2.14.0 Under development (unstable) (2011-05-30 r56024)
>>    Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>>    locale:
>>      [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>      [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>      [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>      [7] LC_PAPER=C                 LC_NAME=C
>>      [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>    [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>>
>>    attached base packages:
>>    [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>>    other attached packages:
>>    [1] Biostrings_2.21.6 IRanges_1.11.10
>>
>>
>>
>>    On 11-06-15 12:49 PM, Janet Young wrote:
>>
>>        yes - as.character seems a good choice, I think
>>
>>        thanks,
>>
>>        Janet
>>
>>        On Jun 15, 2011, at 12:46 PM, Michael Lawrence wrote:
>>
>>            So you would expect that the DNAStringSet is converted to a
>>            character vector? DNAStringSet (technically XStringSet) then
>>            just needs an as.vector method that delegates to as.character.
>>
>>            Michael
>>
>>
>>            On Wed, Jun 15, 2011 at 12:37 PM, Janet Young
>>            <jayo...@fhcrc.org> wrote:
>>
>>            Hi there,
>>
>>            I'm trying to use as.data.frame on a GRanges object. On
>>            regular GRanges objects it works fine but I have some
>>            objects that contain a DNAStringSet in the values column,
>>            which isn't built in to the as.data.frame method.  Is it
>>            possible to add the ability to coerce the DNAStringSet too,
>>            please?
>>
>>            Here's some code that demonstrates the issue:
>>
>>            ################
>>            library(GenomicRanges)
>>            library(Biostrings)
>>
>>            gr1 <- GRanges(seqnames=rep("chr1",3),
>>                           ranges=IRanges(start=c(1,101,201),width=50),
>>                           strand=c("+","-","+"),
>>                           genenames=c("seq1","seq2","seq3"))
>>
>>            as.data.frame(gr1)
>>            # works
>>
>>            gr2 <- gr1
>>            values(gr2)[,"myseqs"] <- DNAStringSet(c("AACGTG",
>>                "ACGGTGGTGTT", "GAGGCTG"))
>>
>>            as.data.frame(gr2)
>>            # Error in as.data.frame.default(y, optional = TRUE, ...) :
>>            #   cannot coerce class 'structure("DNAStringSet", package =
>>            #   "Biostrings")' into a data.frame
>>            ################
>>
>>            and here's   sessionInfo() output:
>>
>>            R version 2.13.0 (2011-04-13)
>>            Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>
>>            locale:
>>            [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>>            attached base packages:
>>            [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>>            other attached packages:
>>            [1] Biostrings_2.20.1   GenomicRanges_1.4.6 IRanges_1.10.4
>>
>>            ################
>>
>>
>>            You might wonder why I'm storing sequences in the GRanges
>>            values - in my real data they're sequencing reads that have
>>            mapped back to that region, but I'm still curious to
>>            maintain the sequence itself (for the moment) because it's
>>            not always identical to the underlying genomic sequence of
>>            that region (investigating mapping issues).
>>
>>            (and my desire to use as.data.frame relates to a suggestion
>>            from Hervé to let me work around some issues with the
>>            identical function)
>>
>>            thanks,
>>
>>            Janet
>>
>>            _______________________________________________
>>            Bioc-sig-sequencing mailing list
>>            Bioc-sig-sequencing@r-project.org
>>            https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>>
>>
>>
>>    --
>>    Hervé Pagès
>>
>>    Program in Computational Biology
>>    Division of Public Health Sciences
>>    Fred Hutchinson Cancer Research Center
>>    1100 Fairview Ave. N, M1-B514
>>    P.O. Box 19024
>>    Seattle, WA 98109-1024
>>
>>    E-mail: hpa...@fhcrc.org
>>    Phone:  (206) 667-5791
>>    Fax:    (206) 667-1319
>>
>>
>>
>
