Re: [Bioc-sig-seq] as.data.frame on GRanges object with DNAStringSet in values

Michael Lawrence Wed, 15 Jun 2011 15:40:10 -0700

2011/6/15 Hervé Pagès <hpa...@fhcrc.org>

> Hi Michael, Janet,
>
> I just added an "as.vector" method for XStringSet objects to
> Biostrings 2.21.6:
>
>  > library(Biostrings)
>  > x <- DNAStringSet(c("aaatg", "gt"))
>  > as.vector(x)
>  [1] "AAATG" "GT"
>
> But that doesn't solve Janet's problem:
>
>  > df <- DataFrame(id=c("ID1", "ID2"), seqs=x)
>  > df
>  DataFrame with 2 rows and 2 columns
>             id           seqs
>    <character> <DNAStringSet>
>  1         ID1          AAATG
>  2         ID2             GT
>  > as.data.frame(df)
>
>  Error in as.data.frame.default(y, optional = TRUE, ...) :
>    cannot coerce class 'structure("DNAStringSet", package = "Biostrings")'
>    into a data.frame
>
> Michael?
>
>
Well, sorry about that. I just added a coercion from Vector to data.frame
that goes through as.vector, so this now works. One caveat: someone might
later add a coercion from List to data.frame that treats the list elements
as columns. Would that make sense? The AtomicList to data.frame coercion does
something even stranger: it creates a two-column data frame holding the
unlisted values alongside the names/indices, rep'd out as a factor. Actually,
that's kind of cool, since a list usually does not have elements of equal
length, but it's somewhat unintuitive. And why does it apply only to
AtomicList? Anyway, given the special correspondence between an XStringSet
and a character vector, we could always add an as.data.frame method for
XStringSet, just to make sure things behave as expected.
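
For anyone on an installation that does not yet have this coercion, a
workaround is to convert the sequence column to character by hand before
calling as.data.frame. This is only a sketch built on Hervé's example above,
not the change committed to the packages; the variable name "plain" is
made up for illustration.

```r
library(Biostrings)  # also attaches the package that provides DataFrame

x  <- DNAStringSet(c("aaatg", "gt"))
df <- DataFrame(id = c("ID1", "ID2"), seqs = x)

# Replace the DNAStringSet column with its character representation,
# so that every column is something base R already knows how to coerce.
df$seqs <- as.character(df$seqs)

# "plain" is a hypothetical name for the resulting base data.frame.
plain <- as.data.frame(df)
str(plain)
```

The same idea should carry over to Janet's GRanges case: convert the
DNAStringSet column in values(gr2) with as.character first, then call
as.data.frame on the GRanges object.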



> Thanks,
> H.
>
>
> > sessionInfo()
> R version 2.14.0 Under development (unstable) (2011-05-30 r56024)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Biostrings_2.21.6 IRanges_1.11.10
>
>
>
> On 11-06-15 12:49 PM, Janet Young wrote:
>
>> yes - as.character seems a good choice, I think
>>
>> thanks,
>>
>> Janet
>>
>> On Jun 15, 2011, at 12:46 PM, Michael Lawrence wrote:
>>
>>> So you would expect that the DNAStringSet is converted to a character
>>> vector? DNAStringSet (technically XStringSet) then just needs an as.vector
>>> method that delegates to as.character.
>>>
>>> Michael
>>>
>>>
>>> On Wed, Jun 15, 2011 at 12:37 PM, Janet Young<jayo...@fhcrc.org>  wrote:
>>> Hi there,
>>>
>>> I'm trying to use as.data.frame on a GRanges object. On regular GRanges
>>> objects it works fine but I have some objects that contain a DNAStringSet in
>>> the values column, which isn't built in to the as.data.frame method.  Is it
>>> possible to add the ability to coerce the DNAStringSet too, please?
>>>
>>> Here's some code that demonstrates the issue:
>>>
>>> ################
>>> library(GenomicRanges)
>>> library(Biostrings)
>>>
>>> gr1 <- GRanges(seqnames = rep("chr1", 3),
>>>                ranges = IRanges(start = c(1, 101, 201), width = 50),
>>>                strand = c("+", "-", "+"),
>>>                genenames = c("seq1", "seq2", "seq3"))
>>>
>>> as.data.frame(gr1)
>>> # works
>>>
>>> gr2 <- gr1
>>> values(gr2)[, "myseqs"] <- DNAStringSet(c("AACGTG", "ACGGTGGTGTT",
>>>                                           "GAGGCTG"))
>>>
>>> as.data.frame(gr2)
>>> # Error in as.data.frame.default(y, optional = TRUE, ...) :
>>> #   cannot coerce class 'structure("DNAStringSet", package = "Biostrings")'
>>> #   into a data.frame
>>> ################
>>>
>>> and here's   sessionInfo() output:
>>>
>>> R version 2.13.0 (2011-04-13)
>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] Biostrings_2.20.1   GenomicRanges_1.4.6 IRanges_1.10.4
>>>
>>> ################
>>>
>>>
>>> You might wonder why I'm storing sequences in the GRanges values - in my
>>> real data they're sequencing reads that have mapped back to that region, but
>>> I'm still curious to maintain the sequence itself (for the moment) because
>>> it's not always identical to the underlying genomic sequence of that region
>>> (investigating mapping issues).
>>>
>>> (and my desire to use as.data.frame relates to a suggestion from Herve to
>>> let me work around some issues with the identical function)
>>>
>>> thanks,
>>>
>>> Janet
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>
>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
