[aroma.affymetrix] Re: Using GenomeGraphs with FIRMA (Was: FIRMA score)

Mark Robinson Wed, 04 Mar 2009 20:39:04 -0800

Hi Frank.

Comments below.



On 05/03/2009, at 12:34 PM, Frank Ziegner wrote:

> Hi Mark,
>
> I have some questions regarding you plot and some more general  
> questions regarding probesets/exons.
>
> I assume your plot shows the WNK1 (ENSG00000060237) gene.
>
> When doing:
>   u<-which(getUnitNames(cdf)=="ENSG00000060237") #cdf is the Ensembl- 
> cdf
>   ugcM <- getUnitGroupCellMap(cdf, units=u, retNames=TRUE)
>   nProbes <- table(ugcM$group)
>   length(nProbes)
> I got 58 probesets. But you plot does only shows around 52  
> probesets. Did you leave any probesets out (can't see where...)?


We are using different CDFs.  I am using the "coreR3,A20071112,EP"  
one, not the Ensembl one, which is the source of the difference.


> Your plot shows four transcripts (ENST00000315939, ENST00000252477,  
> ENST00000386624, ENST00000408512). But Ensemble says, that there  
> only three transcripts (ENST00000315939, ENST00000252477,  
> ENST00000340908) belong to this gene. So you missed one (the last)  
> transcript of WNK1 and added to more transcripts:
> ENST00000408512 and ENST00000386624. I think it has something to do  
> with the way you selected the transcript:
>   "tr<-new("Transcript",id=unique(g...@ens[,"ensembl_gene_id"]),..."
> What was the idea behind?

Biomart seems to be down at the moment, so I can't track this down,  
but probably this has to do with the previous step, the "GeneRegion".   
The command was:

gr <- new("GeneRegion", chromosome=ch, start=min(psCsv[m,"start"]),
                        end = max(psCsv[m,"stop"]), strand =  
psCsv[f,"strand"], biomart = mart)

and this just searches Ensembl for "genes" in that region.  Perhaps we  
should be more restrictive in this search.  ENST00000408512 is an  
snRNA gene, ENST00000386624 is an snRNA_pseudogene.  According to:

http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000060237

ENSG00000060237 has 3 transcripts.




> But my more important question is: Ensembl tells me, that WNK1 has  
> 29 Exons (like also shown in your plot, if one counts very carefully  
> in the first two transcripts). And the transcripts ENST00000315939,  
> ENST00000252477, ENST00000340908 each have 28, 26 and 25 Exons,  
> respectively. My memory on the basics (How the HuEx was build and  
> annotated) is a bit weak, but I remember as a rule of thumb that a  
> probeset roughly equals an exon. Obviously this rule does not apply  
> here!

A probeset is for a probe selection region (PSR).  You can have 1 PSR  
per exon or many.  It depends on the Affy rules and the mass of  
annotation projected onto the genome.


> Maybe we have to be a bit more precise. I think one can say: Every  
> probeset matches to only one Exon. But different probesets can match  
> to the same exon.  Is this statement right?

Yes, I agree.


> In your plot one could assume, that the first four probesets match  
> to the first exon. But that arises the question: Shouldn't then the  
> expression values for these probesets be identical? Wait, answer to  
> myself: different probesets have different probe affinities and so  
> they have different intensities. Is this the reason?

Yes, different PROBEs have different affinities, but indeed this  
carries over to PROBESETs.  Gene-level summaries are generally not  
comparable from gene-to-gene (although people sometimes do it, at  
least indirectly), only sample-to-sample, so why would probeset-level  
summaries be?


> Two more questions:
> In  GenomeGraphs one can plot the names of the probesets with  exon  
> = new("ExonArray", ...,DisplayPars=(...,displayProbesets=TRUE)). By  
> default this plots the affymetrix probeset_ids. But I would like to  
> plot the Ensembl-names of the Exons to which the probesets match  
> (e.g. ENSE00000437674). I guess I have to replace the "probeset_id"s  
> with the Ensembl-name of the corresponding exon. Somehow this must  
> be possible since the "plotMap=TRUE"-Options provides a mapping  
> between probeset and exon. But I don't know, where to find this  
> mapping. (Or is this mapping just done by coordinates?) Got anybody  
> ideas? After all I also want to map a possible probeset (obtained  
> the probeset_id with limma) to an exon, not only to a gene...

If you want to plot the Ensembl exon identifiers, I suspect it'll be  
tricky.  For example:

 > g...@ens[,c("ensembl_exon_id","exon_chrom_start","exon_chrom_end")] 
[1:2,]
   ensembl_exon_id exon_chrom_start exon_chrom_end
1 ENSE00001565147           760558         760626
2 ENSE00001501630           760559         760687

You have 2 exons that cover the same area of the genome (i.e.  
overlap).  You don't have 2 PSRs that cover the same region of the  
genome (i.e. no overlap).  Therefore, IMHO, its easier to deal with  
PSRs.

If you did want to do this, you have 2 options, as I see it.  First  
(easier) is take the PSR identifier and replace it with some Ensembl  
gene identifer.  Due to the above problem, you may have some choices  
to make about which Ensembl identifier you give to a certain PSR.   
Second (harder) is to create a CDF file where the "groups" are  
labelled with Ensembl identifiers.


> In your example plot your first row shows the data "ds <-  
> getDataSet(plm)" (I think there is a typo somewhere ds<->cs). What  
> exactly is this data?


Thanks for spotting the typo.  Good find.  Indeed, I should have  
written:

ds <- getDataSet(plm)
[...]
d <- log2(extractMatrix(ds,cells=ind,verbose=verbose))

So, in my plot, I was plotting unnormalized raw intensity data, not  
the BG-adjusted quantile normalized, since my 'cs' was defined as:

cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)


> I though, the idea of a probe level model was to somehow merge all  
> probe values to a single probeset-value. So afterwards you have one  
> summarized intensity for each probeset and each array. Following  
> this thought, there should be only one intensity-value for each  
> probeset in the plot. But your plot shows (1st. row) more than one  
> value per probeset (mostly 4, one value  per probe). So what exactly  
> did you plot there?


I'm plotting all probes.  Top plot is the raw data, lower plot is the  
residuals ... then all the gene annotation at the bottom.

> In my plot above I used the values from "celsetN <-  
> process(QuantileNormalization(celsetBC, typesToUpdate="pm"))" and  
> expected to plot the normalized intensities... Did I get i wrong?
> I have my plot and the corresponding code attached.

Your 'plotdata1' is the normalized data.  Your 'plotdata2' and  
'plotdata3' are the chip effects from probeset-level summaries (PLM  
and FIRMA, respectively).  Therefore, the nProbes element of 'exon2'  
and 'exon3' doesn't actually match the data, does it?

What is the result of:

nrow(plotdata2$intensities) == sum(plotdata2$probesetdata[,4])


In my example:

 > nrow(d) == sum(as.numeric(nProbes))
[1] TRUE

Is that a possible source of the problem?


> Right now I'm pretty confused... I'm hoping for some enlightenment!
> Frank

Hopefully I haven't confused you more.  There are a lot of questions/ 
intricacies here!

Good luck.
Mark



> P.S.: Right now (2009-03-05, around 3 GMT+1) I can't connect to  
> Ensembl through Biomart:
> > library(GenomeGraphs)
> > mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> Opening and ending tag mismatch: meta line 3 and body
> Premature end of data in tag html line 1
> Fehler: 1: Opening and ending tag mismatch: meta line 3 and body
> 2: Premature end of data in tag html line 1
>
> Hopefully this is just a temporary error... Does anybody have the  
> same problem?
>
> >
> <GenomeGraphs_ENSG00000060237.pdf><GenomeGraphs_ENSG00000060237.R>
>

------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------





--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~----------~----~----~----~------~----~------~--~---

[aroma.affymetrix] Re: Using GenomeGraphs with FIRMA (Was: FIRMA score)

Reply via email to