Hi Frank. Comments below.
On 05/03/2009, at 12:34 PM, Frank Ziegner wrote: > Hi Mark, > > I have some questions regarding you plot and some more general > questions regarding probesets/exons. > > I assume your plot shows the WNK1 (ENSG00000060237) gene. > > When doing: > u<-which(getUnitNames(cdf)=="ENSG00000060237") #cdf is the Ensembl- > cdf > ugcM <- getUnitGroupCellMap(cdf, units=u, retNames=TRUE) > nProbes <- table(ugcM$group) > length(nProbes) > I got 58 probesets. But you plot does only shows around 52 > probesets. Did you leave any probesets out (can't see where...)? We are using different CDFs. I am using the "coreR3,A20071112,EP" one, not the Ensembl one, which is the source of the difference. > Your plot shows four transcripts (ENST00000315939, ENST00000252477, > ENST00000386624, ENST00000408512). But Ensemble says, that there > only three transcripts (ENST00000315939, ENST00000252477, > ENST00000340908) belong to this gene. So you missed one (the last) > transcript of WNK1 and added to more transcripts: > ENST00000408512 and ENST00000386624. I think it has something to do > with the way you selected the transcript: > "tr<-new("Transcript",id=unique(g...@ens[,"ensembl_gene_id"]),..." > What was the idea behind? Biomart seems to be down at the moment, so I can't track this down, but probably this has to do with the previous step, the "GeneRegion". The command was: gr <- new("GeneRegion", chromosome=ch, start=min(psCsv[m,"start"]), end = max(psCsv[m,"stop"]), strand = psCsv[f,"strand"], biomart = mart) and this just searches Ensembl for "genes" in that region. Perhaps we should be more restrictive in this search. ENST00000408512 is an snRNA gene, ENST00000386624 is an snRNA_pseudogene. According to: http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000060237 ENSG00000060237 has 3 transcripts. > But my more important question is: Ensembl tells me, that WNK1 has > 29 Exons (like also shown in your plot, if one counts very carefully > in the first two transcripts). And the transcripts ENST00000315939, > ENST00000252477, ENST00000340908 each have 28, 26 and 25 Exons, > respectively. My memory on the basics (How the HuEx was build and > annotated) is a bit weak, but I remember as a rule of thumb that a > probeset roughly equals an exon. Obviously this rule does not apply > here! A probeset is for a probe selection region (PSR). You can have 1 PSR per exon or many. It depends on the Affy rules and the mass of annotation projected onto the genome. > Maybe we have to be a bit more precise. I think one can say: Every > probeset matches to only one Exon. But different probesets can match > to the same exon. Is this statement right? Yes, I agree. > In your plot one could assume, that the first four probesets match > to the first exon. But that arises the question: Shouldn't then the > expression values for these probesets be identical? Wait, answer to > myself: different probesets have different probe affinities and so > they have different intensities. Is this the reason? Yes, different PROBEs have different affinities, but indeed this carries over to PROBESETs. Gene-level summaries are generally not comparable from gene-to-gene (although people sometimes do it, at least indirectly), only sample-to-sample, so why would probeset-level summaries be? > Two more questions: > In GenomeGraphs one can plot the names of the probesets with exon > = new("ExonArray", ...,DisplayPars=(...,displayProbesets=TRUE)). By > default this plots the affymetrix probeset_ids. But I would like to > plot the Ensembl-names of the Exons to which the probesets match > (e.g. ENSE00000437674). I guess I have to replace the "probeset_id"s > with the Ensembl-name of the corresponding exon. Somehow this must > be possible since the "plotMap=TRUE"-Options provides a mapping > between probeset and exon. But I don't know, where to find this > mapping. (Or is this mapping just done by coordinates?) Got anybody > ideas? After all I also want to map a possible probeset (obtained > the probeset_id with limma) to an exon, not only to a gene... If you want to plot the Ensembl exon identifiers, I suspect it'll be tricky. For example: > g...@ens[,c("ensembl_exon_id","exon_chrom_start","exon_chrom_end")] [1:2,] ensembl_exon_id exon_chrom_start exon_chrom_end 1 ENSE00001565147 760558 760626 2 ENSE00001501630 760559 760687 You have 2 exons that cover the same area of the genome (i.e. overlap). You don't have 2 PSRs that cover the same region of the genome (i.e. no overlap). Therefore, IMHO, its easier to deal with PSRs. If you did want to do this, you have 2 options, as I see it. First (easier) is take the PSR identifier and replace it with some Ensembl gene identifer. Due to the above problem, you may have some choices to make about which Ensembl identifier you give to a certain PSR. Second (harder) is to create a CDF file where the "groups" are labelled with Ensembl identifiers. > In your example plot your first row shows the data "ds <- > getDataSet(plm)" (I think there is a typo somewhere ds<->cs). What > exactly is this data? Thanks for spotting the typo. Good find. Indeed, I should have written: ds <- getDataSet(plm) [...] d <- log2(extractMatrix(ds,cells=ind,verbose=verbose)) So, in my plot, I was plotting unnormalized raw intensity data, not the BG-adjusted quantile normalized, since my 'cs' was defined as: cs <- AffymetrixCelSet$byName("tissues", cdf=cdf) > I though, the idea of a probe level model was to somehow merge all > probe values to a single probeset-value. So afterwards you have one > summarized intensity for each probeset and each array. Following > this thought, there should be only one intensity-value for each > probeset in the plot. But your plot shows (1st. row) more than one > value per probeset (mostly 4, one value per probe). So what exactly > did you plot there? I'm plotting all probes. Top plot is the raw data, lower plot is the residuals ... then all the gene annotation at the bottom. > In my plot above I used the values from "celsetN <- > process(QuantileNormalization(celsetBC, typesToUpdate="pm"))" and > expected to plot the normalized intensities... Did I get i wrong? > I have my plot and the corresponding code attached. Your 'plotdata1' is the normalized data. Your 'plotdata2' and 'plotdata3' are the chip effects from probeset-level summaries (PLM and FIRMA, respectively). Therefore, the nProbes element of 'exon2' and 'exon3' doesn't actually match the data, does it? What is the result of: nrow(plotdata2$intensities) == sum(plotdata2$probesetdata[,4]) In my example: > nrow(d) == sum(as.numeric(nProbes)) [1] TRUE Is that a possible source of the problem? > Right now I'm pretty confused... I'm hoping for some enlightenment! > Frank Hopefully I haven't confused you more. There are a lot of questions/ intricacies here! Good luck. Mark > P.S.: Right now (2009-03-05, around 3 GMT+1) I can't connect to > Ensembl through Biomart: > > library(GenomeGraphs) > > mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl") > Opening and ending tag mismatch: meta line 3 and body > Premature end of data in tag html line 1 > Fehler: 1: Opening and ending tag mismatch: meta line 3 and body > 2: Premature end of data in tag html line 1 > > Hopefully this is just a temporary error... Does anybody have the > same problem? > > > > <GenomeGraphs_ENSG00000060237.pdf><GenomeGraphs_ENSG00000060237.R> > ------------------------------ Mark Robinson Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852 ------------------------------ --~--~---------~--~----~------------~-------~--~----~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~----------~----~----~----~------~----~------~--~---