Re: [aroma.affymetrix] mufColumns on genes with more than 500 probes across the gene
Hi Iain,

Can you send me a reproducible example, plus the output of sessionInfo() and traceback() after the error, please? It'll probably be too large for email, so maybe use Dropbox or DropSend.

My guess is that the units with 500 probes actually have NAs. Maybe you can check this beforehand?

Best,
Mark

On 04.06.2015, at 16:07, Iain iaingallag...@gmail.com wrote:

Hi

We're trying to calculate the difference in residuals between two groups and generate mufScores (FIRMAGene) on those residual differences. Calculating the differences goes smoothly:

    rsu_diff <- lapply(unlist(rsu, use.names=FALSE), byrow=FALSE, ncol=length(unique(cls)))

where cls is a grouping variable, e.g.

    cls <- c('A', 'A', 'B', 'B')

Next we apply the mufColumns function to the elements of the rsu_diff object:

    mufScores <- lapply(rsu_diff[w], FUN = function(u) c(mufColumns(u)))

w is an indicator variable that keeps a gene (this is a gene-level analysis) only if the gene has more than 3 probes and fewer than some upper limit. If we set the upper limit of w to 500 we get the following error:

    Error in mufC(x): Nan/NA/Inf in foreign function call

Having looked at the C code for the relevant function (mps.c), I can see only one line that could cause this:

    x[count]=sum/sqrt(j-i+1.);

Can anyone shed any light on why we can't run mufColumns if we select genes with more than 500 probes over the gene?

Thanks,
Iain

--
Prof. Dr. Mark Robinson
Statistical Bioinformatics Group, UZH
http://ow.ly/riRea

--
When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example.

You received this message because you are subscribed to the Google Groups aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/
---
You received this message because you are subscribed to the Google Groups aroma.affymetrix group.
To unsubscribe from this group and stop receiving emails from it, send an email to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
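One way to run the check Mark suggests is to look for non-finite values in each unit before calling mufColumns. The following is only a minimal sketch in base R, assuming rsu_diff is a list of numeric matrices with one row per probe. The toy data and the names all_finite, n_probes and bad_units are made up for illustration, and the mufColumns call itself is left out.

```r
# Toy stand-in for rsu_diff (hypothetical data): one matrix per gene,
# rows are probes, columns are residual differences.
rsu_diff <- list(
  geneA = matrix(c(0.1, -0.2, 0.3, 0.4), ncol = 1),
  geneB = matrix(c(0.5, NA, 0.2, -0.1, 0.0), ncol = 1),
  geneC = matrix(c(1.2, Inf, -0.7, 0.3), ncol = 1)
)

# TRUE for units whose values are all finite (no NA, NaN or Inf).
all_finite <- vapply(rsu_diff, function(u) all(is.finite(u)), logical(1))

# Number of probes (rows) per unit, used for the size filter.
n_probes <- vapply(rsu_diff, nrow, integer(1))

# Keep units with more than 3 probes that contain only finite values.
w <- n_probes > 3 & all_finite

# Units that would trip a "NA/NaN/Inf in foreign function call" style error.
bad_units <- names(rsu_diff)[!all_finite]
print(bad_units)
```

Listing bad_units for the genes above the 500-probe limit would show whether the failure comes from missing residuals rather than from the probe count itself.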
Re: [aroma.affymetrix] which version cdf should used when apply FIRMAGene
Hi Zaiwei,

On Sun, Dec 16, 2012 at 2:22 AM, zhouzaiwei zhouzai...@163.com wrote:

Hi,

I want to apply FIRMAGene to analyse differential splicing events on the hugene_1.0_st array. I have read the article (Robinson & Speed, 2007) and the script (http://bioinf.wehi.edu.au/folders/firmagene/sup3_04feb2010.R). Which version of the CDF file should be used: HuGene-1_0-st-v1,r3.cdf, HuGene-1_0-st-v1,Ensembl,exon.cdf, or something else?

You can use either CDF. The latter was used for the BMC Bfx paper, since we used Ensembl annotation. If you want to use the former, you would want to use the Affymetrix identifiers. Other CDFs are possible. As we say in the paper:

To facilitate alternative splicing analysis, probe collections are organized in a gene-centric fashion, so that probes from all known isoforms for a gene can be analyzed by a single framework (i.e. fit with the RMA model).

So, it just requires the CDF to be organized correctly.

Hope that helps.

Best regards,
Mark
[aroma.affymetrix] Re: How to extract raw probe intensity from .CEL file
Perhaps something like this is what you want (note: different chip to what you are using)?

    df <- readDataFrame(getCdf(cs), verbose=-80)
    [...snip...]
    head(df)
      unit unitName   unitType unitDirection unitNbrOfAtoms group groupName
    1    1  7892501 expression         sense              4     1   7892501
    2    1  7892501 expression         sense              4     1   7892501
    3    1  7892501 expression         sense              4     1   7892501
    4    1  7892501 expression         sense              4     1   7892501
    5    2  7892502 expression         sense              4     1   7892502
    6    2  7892502 expression         sense              4     1   7892502
      groupDirection groupNbrOfAtoms    cell   x   y pbase tbase indexPos atom
    1          sense               4  116371 870 110     C     G        0    0
    2          sense               4  943979  28 899     A     T        1    1
    3          sense               4  493089 638 469     T     A        2    2
    4          sense               4  907039 888 863     A     T        3    3
    5          sense               4 1033309 108 984     T     A        0    0
    6          sense               4  653512 411 622     T     A        1    1

I'm not sure what object you have in mind when it comes to a probe-intensity pair, but this should give you all the info you might want (e.g. cell index, x/y physical location).

HTH,
Mark

On Aug 9, 5:45 pm, Pierre Neuvial pie...@stat.berkeley.edu wrote:

Hi,

Have you tried using extractAffyBatch, which is documented here: http://aroma-project.org/howtos/extractAffyBatch?

As far as I understand, you will need the Bioconductor annotation package corresponding to your chip type to be installed, i.e.

    source("http://www.bioconductor.org/biocLite.R")
    biocLite("hgu133plus2cdf")

This is discussed in this thread: http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/...

Pierre

On Tue, Aug 9, 2011 at 4:34 AM, hsingjun cheung hsingjun.ch...@gmail.com wrote:

Hi Pierre:

Thanks. These functions work now. Do you know how to extract the raw intensity for each probe?

On Aug 8, 5:48 pm, Pierre Neuvial pie...@stat.berkeley.edu wrote:

Hi,

The 'annotationData' directory should be directly in your working directory, as explained on the page "Setup: Location of annotation data files": http://aroma-project.org/node/66

In your case, you need to change the current directory to ~/experiment/ with

    setwd("~/experiment/")

(or by starting your R session from this directory). Then your command

    csR <- AffymetrixCelSet$byName("KN01M013", chipType="HG-U133_Plus_2")

should work.

Best,
Pierre

On Mon, Aug 8, 2011 at 5:29 PM, hsingjun cheung hsingjun.ch...@gmail.com wrote:

Hello:

I searched the group but got no results, so I want to ask: how do I extract the raw probe intensities from a .CEL file? The file structure on my computer is:

    ~/experiment/
      annotationData/
        chipTypes/
          HG-U133_Plus_2/
            HG-U133_Plus_2.cdf
      rawData/
        KN01M013/
          HG-U133_Plus_2/
            KN01M013.CEL

The .cdf file was downloaded from http://www.aroma-project.org/chipTypes/HG-U133_Plus_2

When I run R from the ~ directory:

    library(aroma.affymetrix)
    csR <- AffymetrixCelSet$byName("KN01M013", chipType="HG-U133_Plus_2")

I get this error message:

    Error in list(`AffymetrixCelSet$byName(KN01M013, chipType = HG-U133_Plus_2)` = environment, :
    [2011-08-08 11:24:05] Exception: Could not locate a file for this chip type: HG-U133_Plus_2
      at throw(Exception(...))
      at throw.default(Could not locate a file for this chip type: , paste(c(chipT
      at throw(Could not locate a file for this chip type: , paste(c(chipType, tag
      at method(static, ...)
      at AffymetrixCdfFile$byChipType(chipType)
      at method(static, ...)
      at AffymetrixCelSet$byName(KN01M013, chipType = HG-U133_Plus_2)

Could anyone help me figure out why this error happened, and how to extract the raw probe intensities in the right way?

Thanks
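Pierre's point about the working directory can be illustrated without aroma.affymetrix at all. The sketch below is a base-R illustration only: it recreates the directory layout from the thread under a temporary folder with empty placeholder files, and shows that a relative path starting at annotationData/ resolves only when R is started in (or moved to) the experiment directory. It mimics a relative-path lookup and is not the package's actual search code; the variable names are made up.

```r
chipType <- "HG-U133_Plus_2"
dataSet  <- "KN01M013"

# Throwaway copy of the layout from the thread, with empty placeholder files.
root   <- file.path(tempdir(), "experiment")
cdfDir <- file.path(root, "annotationData", "chipTypes", chipType)
celDir <- file.path(root, "rawData", dataSet, chipType)
dir.create(cdfDir, recursive = TRUE, showWarnings = FALSE)
dir.create(celDir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(cdfDir, paste0(chipType, ".cdf")))
file.create(file.path(celDir, paste0(dataSet, ".CEL")))

# Relative path that has to resolve from the current working directory.
cdfRel <- file.path("annotationData", "chipTypes", chipType,
                    paste0(chipType, ".cdf"))

oldwd <- setwd(tempdir())              # one level too high, like running from ~
foundFromParent <- file.exists(cdfRel)
setwd(root)                            # equivalent of setwd("~/experiment/")
foundFromRoot <- file.exists(cdfRel)
setwd(oldwd)                           # restore the original working directory

print(c(parent = foundFromParent, root = foundFromRoot))
```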
Re: [aroma.affymetrix] Re: How to extract raw probe intensity from .CEL file
How about grabbing the intensities according to their cell index:

    raw <- extractMatrix(cs, cells=df$cell, verbose=verbose)

Then you'll have them matched up to the 'df' data.frame. (Different numbers for your chip, of course.)

    dim(df)
    [1] 844550 16
    dim(raw)
    [1] 844550 33

Mark

On Aug 10, 2011, at 12:31 AM, hsingjun cheung wrote:

Hi Mark:

What I would like to know is the intensity for each probe. Using these commands:

    library(aroma.affymetrix)
    cs <- AffymetrixCelSet$byName("KN01M013", chipType="HG-U133_Plus_2")
    raw <- extractMatrix(cs, verbose=verbose)

I can see that 'raw' is a list of intensities, but I don't know which probe IDs they correspond to. Hope this clarifies.

Thanks
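The idea of indexing intensities by cell so that they line up with the annotation rows can be sketched in base R. This is a toy illustration only: allCells stands in for the full set of cell intensities, the row subsetting stands in for extractMatrix(cs, cells=df$cell), and all values and cell indices are invented.

```r
# Toy "chip": intensities for 10 cells (rows) on 2 arrays (columns).
allCells <- matrix(c(101:110, 201:210), ncol = 2,
                   dimnames = list(NULL, c("array1", "array2")))

# Toy probe annotation with a 'cell' column, as in the readDataFrame() output.
df <- data.frame(unitName = c("7892501", "7892501", "7892502"),
                 cell     = c(7L, 2L, 9L),
                 stringsAsFactors = FALSE)

# Stand-in for extractMatrix(cs, cells = df$cell): rows come back in the
# same order as df$cell, so row i of 'raw' belongs to row i of 'df'.
raw <- allCells[df$cell, , drop = FALSE]

# Probe annotation and intensities side by side.
annotated <- cbind(df, raw)
print(annotated)
```

Because the rows are requested in the order of df$cell, no separate matching step is needed afterwards.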
[aroma.affymetrix] Re: Question about Firma
Hi Florence,

I've copied the aroma.affymetrix mailing list, just in case others have extra comments.

On Jun 25, 2011, at 1:05 AM, Florence Jaffrezic wrote:

Dear Professor Robinson,

I am a French researcher working near Paris, and was trying to run an analysis with FIRMA to detect alternative splicing. I wanted to re-analyze the colon cancer data available on the Affymetrix website for the Human Exon chip, and used the R code provided in the FIRMA vignette. I ran the FirmaModel on the plmEx object as shown below:

    plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
    fit(plmEx, verbose=verbose)
    firma <- FirmaModel(plmEx)
    fit(firma, verbose=verbose)
    fs <- getFirmaScores(firma)
    scores <- extractDataFrame(fs)

I then obtain one score for each exon and each chip. I have a few questions:

1) First, there are 10 biological replicates in each condition. How should I combine the FIRMA scores obtained for each replicate?

You could take the average, or perhaps a more robust median.

2) For the detection of alternative splicing, I saw in the literature that we should take the log2 value of the scores,

    fsScores <- log2(extractDataFrame(fs))

and that large negative values will indicate exon skipping. Is this correct? Is there a cut-off value that can be used for these scores to detect alternative splicing?

One thing: I see it more as detection of 'differential' splicing, i.e. different experimental conditions express transcripts differently. But, yes, this is correct: you are looking for 'extreme values', and negative ones may indicate exon skipping. As far as I know, we never set cutoffs because assigning estimated FDRs to them is non-trivial.

Regards,
Mark

Thank you very much in advance for your help,
Florence

---
Dr Florence Jaffrezic
INRA Bat 211
78352 Jouy-en-Josas Cedex
France
Tel: (+33) 1 34 65 21 94
Fax: (+33) 1 34 65 22 10
---
Firma_Exon_question.r

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobin...@wehi.edu.au
e: m.robin...@garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
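Combining replicates with a median, as suggested in the reply, could look roughly like the following base-R sketch. It assumes the FIRMA scores have already been pulled into a plain numeric matrix with one row per exon and one column per array. The matrix, the group labels and the exon names are invented toy values (three replicates per condition rather than ten), and the difference of medians is only a ranking statistic, not a test.

```r
# Toy FIRMA scores: rows are exons, columns are arrays (hypothetical values).
scores <- rbind(exon1 = c(1, 2, 4,  1,    1,     1),
                exon2 = c(1, 1, 1,  0.25, 0.125, 0.5))
colnames(scores) <- c("N1", "N2", "N3", "T1", "T2", "T3")

# Condition of each array, in column order.
group <- factor(c("normal", "normal", "normal", "tumor", "tumor", "tumor"))

# Work on the log2 scale, where strongly negative values suggest skipping.
logScores <- log2(scores)

# Per-exon median within each condition (a robust alternative to the mean).
medByGroup <- sapply(levels(group), function(g) {
  apply(logScores[, group == g, drop = FALSE], 1, median)
})

# Difference of medians between conditions; extreme values are the candidates.
delta <- medByGroup[, "tumor"] - medByGroup[, "normal"]
print(delta)
```

Sorting exons by delta gives a ranked list of candidates without committing to a fixed cutoff.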
Re: [aroma.affymetrix] Re: gcRMA for Gene ST Arrays
Hi Setsuko,

I haven't looked at this in a long time, and I can't seem to find the CDF file that I used locally on my computer. The information regarding the antigenomic probesets should be readily available in the files you get from Affymetrix though, if you want to custom build a CDF file.

To be honest, when I did this 2 years ago, there was no official GCRMA implementation for Gene 1.0 ST arrays. Maybe Bioconductor has this now for these arrays?

I've cc'd the aroma.affymetrix mailing list in the hope that someone has a better solution for you.

Cheers,
Mark

On Jun 21, 2011, at 8:28 AM, Setsuko Sahara wrote:

Dear Mark,

I found your previous e-mail regarding an application of gcrma to data from Gene ST arrays. Would you be able to make the antigenomic probesets available somewhere? Or do you recommend trying your previous scripts?

Sincerely,
setsu

Mark Robinson, Sun, 29 Mar 2009 23:12:09 -0700:

Hi Mario.

I have made some modifications to the reading of probe_tab files and to the computing of affinities so that this procedure can run now, either as you have done below by choosing lowly expressed probes, or (perhaps preferably) by using the 'antigenomic' probes on the array:

    library(aroma.affymetrix)
    verbose <- Arguments$getVerbose(-30); timestampOn(verbose)
    cdf <- AffymetrixCdfFile$fromChipType("HuGene-1_0-st-v1", verbose=verbose, tags="PD")
    cs <- AffymetrixCelSet$fromName("tissues", cdf=cdf, verbose=verbose)
    bcGc <- GcRmaBackgroundCorrection(cs, type="affinities", indicesNegativeControl=negativeControlIndices)
    csGBC <- process(bcGc, verbose=verbose)
    controlIndices <- which(!isPm(cdf))
    bc <- GcRmaBackgroundCorrection(cs, type="affinities", indicesNegativeControl=controlIndices)
    csBC <- process(bc, verbose=verbose, force=TRUE)

I needed to make a CDF file that contained these antigenomic probesets, as they are not present in the binary-converted CDF files I created previously. I will make these CDFs available once I can test everything.

Unfortunately, I do not have a good way of testing that my modifications are doing exactly the right thing, as I am also not intimately familiar with the gcrma model/code. To be honest, I don't know of anyone who has successfully run gcrma on these chips or Exon ST chips, Bioconductor or otherwise. Do you? If so, please let me know.

These changes can be made available in the next release or possibly earlier with a patch, but I just want to test the changes first.

Cheers,
Mark

Setsuko Sahara
Re: [aroma.affymetrix] Extract raw data for core transcripts from EXON arrays
Hi Anbarasu.

To extract just the raw probe intensities (before BG adjustment or normalization), how about something like:

    cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
    cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)
    u <- 1:nbrOfUnits(cdf)
    u <- 1:10  # could use line above, but use subset to make it quick
    ugcM <- getUnitGroupCellMap(cdf, units=u, retNames=TRUE)
    d <- extractMatrix(cs, cells=ugcM$cell)
    rownames(d) <- paste(ugcM$unit, ugcM$group, sep=".")

There are of course other possibilities for the rownames if you choose, but here is what it would give:

    head(d)
                    huex_wta_spleen_A huex_wta_spleen_B huex_wta_spleen_C
    2315251.2315252                49                46                30
    2315251.2315252                41                48                37
    2315251.2315252                68                46                40
    2315251.2315252                46                39                32
    2315251.2315253                34                26                31
    2315251.2315253                30                30                28
                    huex_wta_testes_A huex_wta_testes_B huex_wta_testes_C
    2315251.2315252               131               111               156
    2315251.2315252                96                81               153
    2315251.2315252               138               102               146
    2315251.2315252                72                49                56
    2315251.2315253                45                33                48
    2315251.2315253                30                39                31
                    huex_wta_thyroid_A huex_wta_thyroid_B huex_wta_thyroid_C
    2315251.2315252                 31                 57                 70
    2315251.2315252                 49                 40                 63
    2315251.2315252                 33                 54                 61
    2315251.2315252                 30                 53                 46
    2315251.2315253                 29                 53                 35
    2315251.2315253                 26                 38                 39

Hope that helps.
Mark

On 2011-02-09, at 3:38 AM, Anbarasu L A wrote:

Hi All,

I have been looking at extracting raw data for core transcripts from the HuEx-1_0-st-v2 chip type. I have downloaded the custom CDF file provided at http://www.aroma-project.org/node/122.

    chipType <- "HuEx-1_0-st-v2"
    cdf <- AffymetrixCdfFile$byChipType(chipType, tags="core")
    print(cdf)
    AffymetrixCdfFile:
    Path: annotationData/chipTypes/HuEx-1_0-st-v2
    Filename: HuEx-1_0-st-v2,core.cdf
    Filesize: 32.00MB
    Chip type: HuEx-1_0-st-v2,core
    RAM: 0.00MB
    File format: v4 (binary; XDA)
    Dimension: 2560x2560
    Number of cells: 6553600
    Number of units: 22010
    Cells per unit: 297.76
    Number of QC units: 0

How can I access these 22010 units (transcripts) and extract unnormalized intensity values? If I use

    getCellIndices(cdf, unlist=TRUE, useNames=FALSE)

I am getting intensity data for 893395 probes.

Thanks in advance.

Best regards,
Anbarasu
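Once the rows are labelled as unit.group, the label can also serve as a grouping key, for example to count probes per probeset or to take a per-probeset median of the raw intensities. The following is a base-R sketch only: the small ugcM data frame and the matrix d are invented stand-ins for the objects returned by getUnitGroupCellMap() and extractMatrix(), and the cell indices are hypothetical.

```r
# Toy unit-group-cell map (hypothetical cell indices).
ugcM <- data.frame(unit  = c("2315251", "2315251", "2315251", "2315373"),
                   group = c("2315252", "2315252", "2315253", "2315374"),
                   cell  = c(11L, 12L, 13L, 14L),
                   stringsAsFactors = FALSE)

# Toy raw intensities, one row per cell in ugcM and one column per array.
d <- matrix(c(49, 41, 34, 60,
              46, 48, 26, 55),
            ncol = 2, dimnames = list(NULL, c("arrayA", "arrayB")))

# Same labelling as in the reply: transcript cluster, dot, probeset.
rownames(d) <- paste(ugcM$unit, ugcM$group, sep = ".")

# Number of probes behind each unit.group label.
probesPerGroup <- table(rownames(d))

# Median raw intensity per label and array.
medianPerGroup <- apply(d, 2, function(x) tapply(x, rownames(d), median))
print(medianPerGroup)
```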
Re: [aroma.affymetrix] exon array analysis
Hi Kripa.

Have a look at the Affymetrix website:

http://www.affymetrix.com/estore/browse/products.jsp?productId=131452&categoryId=35676&productName=GeneChip-Human-Exon-1.0-ST-Array#1_3

Click on "Technical Documentation". The file you are probably after is "HuEx-1_0-st-v2 Transcript Cluster Annotations, CSV, Release 31 (25 MB, 08/27/10)":

http://www.affymetrix.com/Auth/analysis/downloads/na31/wtexon/HuEx-1_0-st-v2.na31.hg19.transcript.csv.zip

The 'unitName' below is a transcript cluster ID. You may have to parse the other columns to extract the gene identifiers/symbols.

There are of course other ways to do this. For example, you may be able to use the 'exonmap' R/Bioconductor package.

Hope that helps.
Mark

On 2011-01-24, at 3:29 PM, kripa raman wrote:

Hi,

I'm very new to microarray analysis and I fear I'm in too deep by starting with the HuEx-1_0-st-v2 chip, especially since no one in my building seems to have conducted this analysis!

Experiment currently: 2 chips have been analyzed and have had the same treatment. I'm looking to confirm that the genes/exons are the same for both chips (ideally by identifying that the top 100 genes are identical).

The issue I'm having is converting the unitName and groupName, currently seen in the trFit table, into a meaningful gene ID. I'm under the impression that I should be connecting this with the CSV file, but I'm not sure how to go about doing this.

Any help would be greatly appreciated!
-Kripa

Code thus far:

    chipType <- "HuEx-1_0-st-v2"
    ## set cdf: core probesets: 18,708 units/transcript clusters,
    ## 284,258 groups/probesets, and 1,082,385 probes
    cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR2,A20070914,EP")
    ## set cel group: 0035 and 0028
    cs <- AffymetrixCelSet$byName("control", cdf=cdf)
    ## background correction
    bc <- RmaBackgroundCorrection(cs, tag="coreR2")
    csBC <- process(bc, verbose=verbose)
    ## normalization
    qn <- QuantileNormalization(csBC, typesToUpdate="pm")
    csN <- process(qn, verbose=verbose)
    MNorm <- extractMatrix(csN)
    ## summarize
    plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
    fit(plmTr, verbose=verbose)
    ## quality assessment
    qamTr <- QualityAssessmentModel(plmTr)
    plotNuse(qamTr)
    plotRle(qamTr)
    cesTr <- getChipEffectSet(plmTr)
    trFit <- extractDataFrame(cesTr, units=1:3, addNames=TRUE)
    MSumm <- extractMatrix(cesTr)

    # result (how do I go about changing this unitName and groupName)
    unitName groupName unit group cellHuEx1_0028 HuEx1_0035
    1 2315373 23153741 11 6.7226334.735021
    2 2315554 23155862 15 9.4221649.943003
    3 2315633 23156383 320 6.1466156.00318
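Linking unitName to the annotation CSV comes down to a lookup on the transcript cluster ID. The sketch below uses base R only and rests on assumptions: it assumes the annotation has a transcript_cluster_id column and a gene_assignment column whose fields are separated by " // ", with the symbol as the second field. Check the header of the real file, which would be loaded with something like read.csv(file, comment.char="#"). The gene names, accessions and chip-effect values here are invented placeholders.

```r
# Toy summary table keyed by unitName (transcript cluster ID).
trFit <- data.frame(unitName   = c("2315373", "2315554", "2315633"),
                    chipEffect = c(6.72, 9.42, 6.15),
                    stringsAsFactors = FALSE)

# Toy annotation table; column names and field layout are assumptions.
annot <- data.frame(
  transcript_cluster_id = c("2315554", "2315373", "9999999"),
  gene_assignment = c(
    "NM_000002 // GENE_B // hypothetical gene B",
    "NM_000001 // GENE_A // hypothetical gene A /// NR_000001 // GENE_A // alt",
    "---"),
  stringsAsFactors = FALSE)

# Take the second " // "-separated field as the symbol; NA if absent.
firstSymbol <- function(x) {
  parts <- strsplit(x, " // ", fixed = TRUE)
  vapply(parts,
         function(p) if (length(p) >= 2) trimws(p[2]) else NA_character_,
         character(1))
}
annot$symbol <- firstSymbol(annot$gene_assignment)

# Look up each unitName in the annotation; unmatched IDs stay NA.
idx <- match(trFit$unitName, annot$transcript_cluster_id)
trFit$symbol <- annot$symbol[idx]
print(trFit)
```

Using match() rather than merge() keeps trFit in its original row order, which matters if the rows must stay aligned with a matrix such as MSumm.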
Re: [aroma.affymetrix] Firma score help
Hi Sabrina.

Sorry for the slow reply. Basically, there isn't a lot of precedent for this, but I would suggest using all 16 arrays to define the FIRMA scores. Then use limma on those to look for your differences of interest. My reasoning is that this will allow the data from more arrays to be used for the estimation of probe effects.

Hope that helps.
Mark

On 2011-01-04, at 7:29 PM, sabrina wrote:

Hello, all:

I have 16 exon arrays from 4 groups: A1, A2, B1 and B2. The A samples are genetically identical but received different treatments (1 and 2). The B samples are also genetically identical to each other (but different from the As), again with treatments 1 and 2. I am interested in finding alternative splicing events that are affected by treatment in A and in B, and also by genetics (under the same treatment). Therefore, the comparisons I am interested in are A1 vs. A2, B1 vs. B2, and A1 vs. B1 (treatment 1 only).

My question is: when I do RMA and calculate the FIRMA scores, do I use all 16 arrays to get the FIRMA scores, and then use limma as suggested before with a design matrix and contrast matrix? Or should I, for each comparison, do RMA and FIRMA scores using only the exon arrays involved in that comparison?

Thanks for your input!
Sabrina
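The three comparisons in the question map onto a group-means design matrix and a contrast matrix. The sketch below builds both in base R and applies them to one exon's toy log2 FIRMA scores with an ordinary least-squares fit. It assumes four arrays per group, which the thread does not state, and all score values are invented. In practice limma's lmFit, contrasts.fit and eBayes would take the place of the lm.fit step and add moderated statistics.

```r
# Group label for each of the 16 arrays (assumed 4 per group).
group <- factor(rep(c("A1", "A2", "B1", "B2"), each = 4))

# One column per group, no intercept, so coefficients are group means.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrasts of interest, one column each, rows in the order of the groups.
contrasts <- cbind(A1vsA2 = c(1, -1,  0,  0),
                   B1vsB2 = c(0,  0,  1, -1),
                   A1vsB1 = c(1,  0, -1,  0))
rownames(contrasts) <- levels(group)

# Toy log2 FIRMA scores for a single exon across the 16 arrays.
y <- c(rep(0, 4), rep(-2, 4), rep(1, 4), rep(1, 4))

# Least-squares group means, then the contrast estimates.
fit <- lm.fit(design, y)
est <- drop(t(contrasts) %*% fit$coefficients)
print(est)
```

Fitting all 16 arrays in one model, as in the reply, lets every contrast share the same residual variance estimate.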
Re: [aroma.affymetrix] Installing FIRMAGene
that you have an appropriate version of Dev Tools: http://developer.apple.com/technologies/tools/

Hope that helps.
Mark

On 2010-10-27, at 1:52 PM, Jon Tang wrote:

Hi,

I'm having problems installing FIRMAGene on my computer, either with the install.packages command in R or with the R Package Installer. I keep getting the warning message that FIRMAGene is not available. Is there another location to get this package?

Thanks.

    R version 2.12.0 (2010-10-15)
    Platform: i386-apple-darwin9.8.0/i386 (32-bit)

When I try to install by using the install.packages command, it says the package is unavailable:

    install.packages("FIRMAGene", repos="http://R-Forge.R-project.org")
    Warning: unable to access index for repository http://R-Forge.R-project.org/bin/macosx/leopard/contrib/2.12
    Warning message:
    In getDependencies(pkgs, dependencies, available, lib) :
      package ‘FIRMAGene’ is not available

If I try to install the package using the R Package Installer, I get the following warnings:

    install.packages("FIRMAGene", repos="http://R-Forge.R-project.org")
    Warning: unable to access index for repository http://R-Forge.R-project.org/bin/macosx/leopard/contrib/2.12
    Warning message:
    In getDependencies(pkgs, dependencies, available, lib) :
      package ‘FIRMAGene’ is not available
    trying URL 'http://R-Forge.R-project.org/src/contrib/FIRMAGene_0.9.5.tar.gz'
    Content type 'application/x-gzip' length 9223 bytes
    opened URL
    ==
    downloaded 9223 bytes

    * installing *source* package ‘FIRMAGene’ ...
    ** libs
    *** arch - i386
    gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -fPIC -g -O2 -c init.c -o init.o
    gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -fPIC -g -O2 -c mps.c -o mps.o
    gcc -arch i386 -std=gnu99 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/lib -o FIRMAGene.so init.o mps.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
    /usr/bin/libtool: for architecture ppc7400 object: /usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libSystem.dylib malformed object (unknown load command 7)
    /usr/bin/libtool: for architecture: (null) file: -lSystem is not an object file (not allowed in a library)
    /usr/bin/libtool: for architecture ppc64 object: /usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libSystem.dylib malformed object (unknown load command 7)
    make: *** [FIRMAGene.so] Error 1
    ERROR: compilation failed for package ‘FIRMAGene’
    * removing ‘/Library/Frameworks/R.framework/Versions/2.12/Resources/library/FIRMAGene’

    The downloaded packages are in
    ‘/private/var/folders/pQ/pQBxScbYEACZOn2BER6xiE+++TI/-Tmp-/RtmpzEsz6S/downloaded_packages’
Re: [aroma.affymetrix] Missing file bpmapCluster2Cdf.R in Create a CDF from a BpMap
Apologies folks. Inadvertent file permissions change. It should work now.

Cheers,
Mark

-- Forwarded message --
From: vegard vegard.nyga...@medisin.uio.no
Date: Thu, Oct 21, 2010 at 2:55 AM
Subject: [aroma.affymetrix] Missing file bpmapCluster2Cdf.R in Create a CDF from a BpMap
To: aroma.affymetrix aroma-affymetrix@googlegroups.com

Hi,

I am trying to make a CDF as described on the page "How to: Create a CDF (and associated) files from a BpMap file (tiling arrays)": http://aroma-project.org/node/42

I am supposed to use the script bpmapCluster2Cdf.R, but the link is dead (forbidden): http://129.94.136.7/file_dump/mark/bpmapCluster2Cdf.R

I was not able to find the script or methods elsewhere. Can you help me?

Best Regards
Vegard Nygaard.
[aroma.affymetrix] Re: run FirmaGene on exon array ?
Hi Qicheng.

I don't think that particular CDF will work with FIRMAGene, since it is laid out as a list of lists: probes (cells) are nested within probe selection regions (groups), which are in turn nested within transcript clusters (units). Cell, group and unit are CDF-speak.

Basically, in order for FIRMAGene to work (and note that I haven't run FIRMAGene on the Exon platform myself), you need a CDF file where all the probes (cells) of a unit are within 1 group ... AND you need to ensure that the order of the probes is the order in which they map to the genome/transcript. This is what FIRMAGene assumes.

I believe the CDF files created by brainarray:

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

... are organized this way, but you'd have to verify that for yourself.

Hope that gets you started.

Cheers,
Mark

On 2010-08-27, at 6:36 AM, Qicheng Ma wrote:

> Hi Mark,
>
> Could you please tell me whether we can run FirmaGene (http://bioinf.wehi.edu.au/folders/firmagene/sup3_04feb2010.R) on the human exon array using the CDF file HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf, since a FirmaGene score would be more useful than Firma scores from individual exons?
>
> Thanks,
> Qicheng
[aroma.affymetrix] Re: sorry, I can not reproduce the table 2 in FIRMAGene paper
Hi Qichengm.

Wrong dataset. That table was created using the Affy tissue panel, not the Affy tissue mixture data.

Cheers,
Mark

On 2010-08-10, at 6:41 AM, qichengm wrote:

> Hi,
>
> I downloaded the recent version of sup3_04feb2010.r and made minor changes to let it run. Here is the difference:
>
> $ diff sup3_04feb2010.r sup3_04feb2010.r.orig
> 11,13c11
> < #cdf<-AffymetrixCdfFile$byChipType("HuGene-1_0-st-v1",verbose=verbose)
> < chipType <- "HuGene-1_0-st-v1"
> < cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")
> ---
> > cdf<-AffymetrixCdfFile$byChipType("HuGene-1_0-st-v1",verbose=verbose)
> 16c14
> < cs<-AffymetrixCelSet$byName("TisMix_WTGene1C",cdf=cdf,verbose=verbose)
> ---
> > cs<-AffymetrixCelSet$byName("tissues",cdf=cdf,verbose=verbose)
> 35,39c33,34
> < #hgnetaffx <- read.csv("HuGene-1_0-st-v1.na25.hg18.transcript.csv",sep=",",skip=19,header=TRUE,comment.char="",stringsAsFactors=FALSE)
> < hgnetaffx <- read.csv("annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1.na30.hg19.transcript.csv",sep=",",skip=19,header=TRUE,comment.char="",stringsAsFactors=FALSE)
> < #probetab <- read.table("HuGene-1_0-st-v1.probe.tab",sep="\t",header=TRUE,comment.char="",stringsAsFactors=FALSE)
> < probetab <- read.table("annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1.hg19.probe.tab",sep="\t",header=TRUE,comment.char="",stringsAsFactors=FALSE)
> ---
> > hgnetaffx <- read.csv("HuGene-1_0-st-v1.na25.hg18.transcript.csv",sep=",",skip=19,header=TRUE,comment.char="",stringsAsFactors=FALSE)
> > probetab <- read.table("HuGene-1_0-st-v1.probe.tab",sep="\t",header=TRUE,comment.char="",stringsAsFactors=FALSE)
>
> The top 20 hits are different from those in the FirmaGene paper:
>
> ID,Sample,Score,Symbol
> 7934979,TisMix_mix9,49.0412085693221,ANKRD1
> 7987315,TisMix_mix9,48.5685310739171,ACTC1
> 8023889,TisMix_mix1,37.4399023091527,MBP
> 8060963,TisMix_mix1,36.0242843230823,SNAP25
> 7947099,TisMix_mix9,35.1698682900308,CSRP3
> 7912520,TisMix_mix9,34.2770659572940,NPPB
> 7929653,TisMix_mix9,31.4139588184231,ANKRD2
> 8096959,TisMix_mix1,30.9351264815591,ANK2
> 7912692,TisMix_mix9,30.7020787308514,HSPB7
> 7957338,TisMix_mix1,30.6749548684879,SYT1
> 8169061,TisMix_mix1,29.9989728237812,PLP1
> 8087925,TisMix_mix9,29.4092542289276,TNNC1
> 7930208,TisMix_mix1,29.002191135071,INA
> 8046062,TisMix_mix9,28.5683689224915,XIRP2
> 8109663,TisMix_mix1,28.3206965639246,GABRA1
> 7982018,TisMix_mix1,28.3138146218297,SNORD115-6
> 7924910,TisMix_mix9,28.2833712932546,ACTA1
> 7982090,TisMix_mix1,27.4880562975114,SNORD115-42
> 8103789,TisMix_mix1,27.399249112706,GPM6A
> 7982008,TisMix_mix1,25.0039468447955,SNORD115-1
>
> Could you please tell me where I am wrong?
>
> Thanks,
> Qichengm
Re: [aroma.affymetrix] Firma Scores in the range of 700-1500
Hi Gaurav.

Note that the FIRMA scores (and expression values, chip effects, etc.) are stored on the exponentiated (base 2) scale. So, take log2() of them:

log2(1568)
[1] 10.6

That's much more feasible.

Cheers,
Mark

On 2010-07-22, at 12:10 AM, gaurav bhatti wrote:

> I wanted to reproduce the results of the FIRMA paper for the tissue sample data set (exon array: HuEx-1_0-st-v2). I used the Ensembl CDF, HuEx-1_0-st-v2,U-Ensembl47,G-Affy, which I think is the one the authors used. Here is the exact code that I used:
>
> library(aroma.affymetrix)
> verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
> chipType <- "HuEx-1_0-st-v2"
>
> # Getting annotation data files
> cdf <- AffymetrixCdfFile$byChipType(chipType, tags="U-Ensembl47,G-Affy")
>
> # Defining CEL set
> cs <- AffymetrixCelSet$byName("coloncancer", cdf=cdf)
>
> # Background adjustment and normalization
> bc <- RmaBackgroundCorrection(cs, tag="ensemblcancer")
> csBC <- process(bc, verbose=verbose)
> qn <- QuantileNormalization(csBC, typesToUpdate="pm")
> csN <- process(qn, verbose=verbose)
>
> # Summarization
> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> fit(plmTr, verbose=verbose)
> CesTr <- getChipEffectSet(plmTr)
> trFit <- extractDataFrame(CesTr, units=NULL, addNames=TRUE)
>
> # Alternative splicing analysis (FIRMA)
> firma <- FirmaModel(plmTr)
> fit(firma, verbose=verbose)
> fs <- getFirmaScores(firma)
> firma <- extractDataFrame(fs, units=NULL, addNames=TRUE)
> rownames(firma) = firma$groupName
>
> I am getting some FIRMA values as high as 1568 (for UNR: 2429323). Is that even feasible?
>
> Gaurav Bhatti
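A minimal sketch of the log2 conversion described in the reply above. The data frame is a made-up stand-in for what extractDataFrame() returns; its column names and values are assumptions for illustration, not real FIRMA output.

```r
# Toy stand-in for the data frame returned by extractDataFrame(fs, ...).
# Column names and values are hypothetical.
firma_scores <- data.frame(
  groupName = c("ps1", "ps2", "ps3"),
  sampleA   = c(1568, 1.0, 0.25),
  sampleB   = c(2.0, 0.5, 4.0),
  stringsAsFactors = FALSE
)

# Convert only the numeric score columns to the log2 scale;
# annotation columns such as groupName are left untouched.
is_score <- vapply(firma_scores, is.numeric, logical(1))
firma_log2 <- firma_scores
firma_log2[is_score] <- lapply(firma_scores[is_score], log2)

print(firma_log2)
```

On the log2 scale a score of 0 means no departure from the fitted model, so scores on the stored scale should be compared against 1 rather than 0.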
Re: [aroma.affymetrix] FIRMA and GenomeGraphs
Hi Lara. Some comments below.

On 2010-05-14, at 12:48 AM, Lara wrote:

> Dear all,
>
> first of all congratulations for aroma, I've been using it for a while and it saved me for the first analysis of exon arrays back in 2008. Since then I have used it several times for several purposes. Unfortunately, sometimes there is a lack of help but a good forum...
>
> So, I have been struggling with alternative splicing, trying to understand, and done several tests. I have computed FIRMA scores and used limma to see differences between two groups. I have also looked for DE exons (probesets) with limma. Then, I used GenomeGraphs to visualize results and I have several questions which I hope to write clearly and not to be obvious:
>
> 1. What I expect is, in those exons that have differences in fs, to have a clear graphical difference in expression in the selected exon between conditions, but that is not what I get. So, for instance, if we take 7922737 -- ENSG0157060 -- C1orf14 -- blue=Testis, which is the first example of supplementary material (1) of "Differential splicing using whole-transcript microarrays", BMC Bioinformatics, 2009, 10, 156. http://www.biomedcentral.com/content/supplementary/1471-2105-10-156-s1.pdf. I know that is FIRMAGene for gene arrays, but it is something similar to what I get with FIRMA and exon arrays. I would say (from the residuals) that the last two exons (10 probesets) are spliced. But if you check the expression of those probesets I wouldn't say that they have differences. On the contrary, the first exons for instance seem to be spliced. Apparently, DE exons look better on graphs (no matter if they belong to a DE gene or not).

Yes, this can happen. And that is the real value of the GenomeGraphs output. Remember that FIRMA and FIRMAGene are really just outlier detection procedures. This is (sort of) alluded to in both the FIRMA and FIRMAGene papers:

Purdom et al., Section 4.1: "In particular, if the proportion of samples showing alternative splicing is high within an exon (say in the majority of samples), the high residuals will be found not in those samples classified by the simulation as spliced out, but rather the complementary set of samples"

Robinson and Speed, Conclusions and Discussion: "Identifying departures through residuals from the RMA model will not always be perfect. Some departures from the RMA linear model may not be alternative splicing at all ... or are induced through, for example, an exon that is not expressed in any of the samples in combination with strong differential expression."

> 2. Do FIRMA scores correspond to residuals of the RMA model? Because this is what you plot in GenomeGraphs, isn't it?

A FIRMA score for each sample and probeset is the median of the (usually 4) residuals from the robustly-fit linear model of RMA. See the paper for the formal definition. FIRMAGene is a little bit different, in that it takes partial sums of adjacent residuals to try and look for a persistence of departure from the model.

> 3. Everything is done on a probeset basis, but shouldn't it be done on a real exon (exon cluster) basis? Sometimes just a probeset appears to be spliced and it is complicated when you try to interpret this biologically...

Debatable. Affymetrix picked probesets (or probe selection regions) based on what they thought could be independent units of expression, based on annotated transcripts, ESTs, etc. But you could certainly build a CDF file that grouped the probes together in a different way.

Hope that helps.

Cheers,
Mark

> I think there is no point in adding my code and sessionInfo() given that those are general questions, but if you need it I can add it.
>
> Thanks for your time and answer,
> Lara
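To make the distinction drawn in this thread concrete, here is a toy sketch. It is not the FIRMA or FIRMAGene implementation: the residuals, the probeset labels and the exhaustive scan over runs are all assumptions for illustration. The only detail taken from this list is the scaling of a partial sum by the square root of the run length, which mirrors the `sum/sqrt(j-i+1.)` line quoted from mps.c.

```r
# Toy log2-scale residuals for one sample across one gene: 12 probes in
# genomic order, grouped into 3 probesets of 4 probes (made-up numbers).
resid <- c( 0.1, -0.2,  0.0,  0.1,
           -0.1,  0.2,  0.1, -0.1,
           -1.9, -2.1, -2.0, -1.8)
probeset <- rep(c("ps1", "ps2", "ps3"), each = 4)

# FIRMA-style summary: the median residual within each probeset.
firma_like <- tapply(resid, probeset, median)

# FIRMAGene-style idea: look at every run of adjacent probes i..j,
# scale its partial sum by sqrt(run length), and keep the most extreme.
n <- length(resid)
csum <- c(0, cumsum(resid))      # csum[j + 1] - csum[i] = sum(resid[i:j])
best <- 0
best_run <- c(NA_integer_, NA_integer_)
for (i in seq_len(n)) {
  for (j in i:n) {
    s <- (csum[j + 1] - csum[i]) / sqrt(j - i + 1)
    if (abs(s) > abs(best)) {
      best <- s
      best_run <- c(i, j)
    }
  }
}

print(firma_like)
print(best_run)
print(best)
```

In this toy case both views point at the same four probes, but a persistent low-level shift across many adjacent probes would stand out in the partial sums while leaving each per-probeset median unremarkable.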
Re: [aroma.affymetrix] Re: what kind of file
Hi Daniela.

That's strange. And no, these files do not need to be in a particular directory (in fact, this error is completely independent of the aroma.affymetrix package). The error you get suggests the file is empty. When you say "it seems to be fine", what does that mean? Alternatively, what does this give:

file.info("MoGene-1_0-st-v1.probe.tab")

This should be a decent-sized file. The ZIP file that you get from Affymetrix is 23MB, so unzipped it will be a lot larger.

Cheers,
Mark

> So far I have managed to run nearly everything in the script. I do though have issues in loading the probe sequences. Do the files need to be stored in a particular folder? For now I have it in ./annotationData/chipTypes/ as I have seen someone else in this forum doing it like this. What else could the reason be?
>
> probetab <- read.table("MoGene-1_0-st-v1.probe.tab", sep="\t", header=TRUE, comment.char="", stringsAsFactors=FALSE)
> Error in read.table("MoGene-1_0-st-v1.probe.tab", sep = "\t", header = TRUE, :
>   no lines available in input
>
> I double checked the file I downloaded from Affy and it seems to be fine!
>
> Thx,
> Daniela
>
> On Mar 16, 4:56 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
>
>> Hi Daniela.
>>
>> Those files are available from Affymetrix. For example, see: http://www.affymetrix.com/estore/browse/products.jsp?productId=131453...
>>
>> HuGene-1_0-st-v1 Transcript Cluster Annotations, CSV, Release 30 (18 MB 11/09/09)
>> HuGene-1_0-st-v1 Probe Sequences, tabular format (22 MB 07/13/07)
>>
>> (there are different versions of the annotation files available)
>>
>> Cheers,
>> Mark
>>
>> On 13-Mar-10, at 8:12 AM, dkny169 wrote:
>>
>>> Hello,
>>>
>>> I am working on the sup3.R script (FIRMAGene). I was wondering what kind of files these are: HuGene-1_0-st-v1.na25.hg18.transcript.csv and HuGene-1_0-st-v1.probe.tab? What is in these files? What am I supposed to load here into FIRMAGene?
>>>
>>> Thanks a lot!
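The check suggested in this thread can be sketched with base R alone. An empty temporary file stands in for the probe file here, and the two-line replacement file with its column names is hypothetical, not the real Affymetrix layout.

```r
# An empty temporary file reproduces the situation in the thread.
probe_file <- tempfile(fileext = ".probe.tab")
file.create(probe_file)

# file.info() reports a size of 0 bytes (or NA if the file is missing).
info <- file.info(probe_file)
if (is.na(info$size) || info$size == 0) {
  message("File is missing or empty: ", probe_file)
}

# read.table() on the empty file fails; capture the message instead of stopping.
empty_msg <- tryCatch(
  read.table(probe_file, sep = "\t", header = TRUE),
  error = function(e) conditionMessage(e)
)
print(empty_msg)

# For contrast, a hypothetical two-line file reads without complaint.
writeLines(c("probe_id\tsequence", "1\tACGT"), probe_file)
probetab <- read.table(probe_file, sep = "\t", header = TRUE,
                       comment.char = "", stringsAsFactors = FALSE)
print(probetab)
```

It is also worth confirming with getwd() and file.exists() that R is looking in the directory where the file was actually unzipped.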
Re: [aroma.affymetrix] Re: FIRMAGene command
Hi Daniela.

> I definitely loaded the package and had a look at the help.start docs. Nevertheless, I wasn't able to work out my problems that I described in my previous post.

OK, so you've read the documentation. But you haven't told us what you didn't understand there. I can try and explain the docs to you ...

---
cls: variable giving the class (aligned with the columns or sample names of the input object).
---

So, this says that you need to specify a vector which gives the experimental group of your samples. So, in my example, the sample names were:

getNames(cs)
[1] "TisMap_Brain_01_v1_WTGene1"  "TisMap_Brain_02_v1_WTGene1"
[3] "TisMap_Brain_03_v1_WTGene1"  "TisMap_Breast_01_v1_WTGene1"
[5] "TisMap_Breast_02_v1_WTGene1" "TisMap_Breast_03_v1_WTGene1"
[snip]

which get converted to:

cls <- gsub("TisMap_","",gsub("_0[1-3]_v1_WTGene1","",getNames(cs)))
cls
[1] "Brain"  "Brain"  "Brain"  "Breast" "Breast" "Breast"
[snip]

And, with the 'cls' variable, I tell FIRMAGene() what group each sample is from. You need to do the same for your 14 samples.

---
idsToUse: indices of the units to calculate FIRMAGene scores for.
---

In my example, all this does:

u <- which(getUnitNames(cdf) %in% hgnetaffx$probeset_id[hgnetaffx$category == "main" & hgnetaffx$total_probes > 7 & hgnetaffx$total_probes < 200])

... (assuming you've read in an appropriate file to hgnetaffx) ... is use only the "main" category probesets (i.e. not the control probesets), and only probesets with more than 7 and fewer than 200 probes within them.

> I am not sure what is supposed to be stored in cls and u.

OK, so hopefully you are ok with what's spelled out above. Ask questions, mentioning what you don't understand, if not.

> I'm a bit confused however, with what the whole "unique cdf set" is for and how plm is working. Can I save the plm data into a txt file?

You don't really need to understand the uniquifying. It's just a step that needs to be done. For general info on probe level models, you might look at the references mentioned in fitPLM() or rma():

library(affyPLM)
?fitPLM
library(oligo)
?rma

In terms of saving the plm data (I assume you mean chip effects?), you should breeze through the vignette for Gene 1.0 ST arrays. At the end, it extracts the summarized data into a data frame:

http://aroma-project.org/node/38

... and you could output this to a text file using write.table().

Hope that helps.

Cheers,
Mark

On Fri, Mar 19, 2010 at 5:25 AM, dkny169 daniela...@yahoo.com wrote:

> I definitely loaded the package and had a look at the help.start docs. Nevertheless, I wasn't able to work out my problems that I described in my previous post.
>
> On Mar 18, 1:58 pm, Henrik Bengtsson henrik.bengts...@gmail.com wrote:
>
>> Hi.
>>
>> On Thu, Mar 18, 2010 at 6:47 PM, dkny169 daniela...@yahoo.com wrote:
>>
>>> Unfortunately I cannot get to the docs, unless the same docs are stored under help.start()
>>
>> Please explain what the problem/error is. Note that you have to load a package in order to use ?/help() on its methods, e.g.
>>
>> library(FIRMAGene); ?FIRMAGene
>>
>> If you don't load it, you get something like:
>>
>> ?FIRMAGene
>> No documentation for 'FIRMAGene' in specified packages and libraries:
>> you could try '??FIRMAGene'
>>
>> The help is the same regardless of whether you access it via ?/help() or help.start(). So, yes, you'll find the same information if you do help.start() -> Packages -> FIRMAGene -> FIRMAGene
>>
>> /Henrik
>>
>>> I used the following parameters:
>>>
>>> plm <- RmaPlm(csNU)
>>> plm
>>> [1] RmaPlm: 0x22388540
>>> cls <- gsub("TisMap_","",gsub("_0(1-3)_v1_WTGene1","",getNames(cs)))
>>> cls
>>> [1] "P.L_10" "P.L_11" "P.L_12" "P.L_14" "P.L_15" "P.L_16" "P.L_2" "P.L_3"
>>> [9] "P.L_4" "P.L_5" "P.L_6" "P.L_7" "P.L_8" "P.L_9"
>>>
>>> I am not sure what is supposed to be stored in cls and u. I'm a bit confused however, with what the whole "unique cdf set" is for and how plm is working. Can I save the plm data into a txt file?
>>>
>>> Many thanks for your help. I really appreciate it.
>>>
>>> Daniela
>>>
>>> On Mar 16, 4:57 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
>>>
>>>> Hi Daniela.
>>>>
>>>> You haven't told us what inputs you've used for 'plm' and 'cls' ... and what is stored in 'u'? Have you read the docs at:
>>>>
>>>> ?FIRMAGene
>>>>
>>>> Cheers,
>>>> Mark
>>>>
>>>> On 14-Mar-10, at 10:21 AM, dkny169 wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have a question regarding FIRMAGene. Executing the FIRMAGene command I get the following error:
>>>>>
>>>>> fg <- FIRMAGene(plm, idsToUse=u, cls=cls)
>>>>> Gathering/calculating residuals.
>>>>> Reading units.
>>>>> Error in if (any(units < 1)) stop("Argument 'units' contains non-positive indices.") :
>>>>>   missing value where TRUE/FALSE needed
>>>>>
>>>>> The commands used right before are:
>>>>>
>>>>> monetaffx <- read.csv("MoEx-1_0-st-v1.na29.mm9.transcript.csv", sep=",", skip=20, header=TRUE, comment.char="", stringsAsFactors=FALSE)
>>>>> probetab <- read.table("MoEx-1_0-st-v1.na29.mm9.probeset.csv", sep="\t", header=TRUE, comment.char="", stringsAsFactors=FALSE)
>>>>> u <- which(getUnitNames(cdf) %in% monetaffx$probeset_id[monetaffx$category == "main"
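As a self-contained illustration of the 'cls' and 'u' arguments discussed in this thread, the sketch below uses plain vectors in place of getNames(cs) and getUnitNames(cdf), and a tiny made-up annotation table in place of the Affymetrix transcript CSV. All IDs and probe counts are hypothetical.

```r
# Stand-in for getNames(cs): sample names that encode the tissue.
sample_names <- c("TisMap_Brain_01_v1_WTGene1", "TisMap_Brain_02_v1_WTGene1",
                  "TisMap_Breast_01_v1_WTGene1", "TisMap_Breast_02_v1_WTGene1")

# Strip the common prefix and the replicate/array suffix to get one
# class label per sample, in the same order as the samples.
cls <- gsub("TisMap_", "", gsub("_0[1-3]_v1_WTGene1", "", sample_names))
print(cls)

# Stand-in for the annotation CSV (hypothetical rows).
annot <- data.frame(
  probeset_id  = c("7892501", "7896736", "7896738", "7896740"),
  category     = c("control", "main", "main", "main"),
  total_probes = c(4, 25, 5, 350),
  stringsAsFactors = FALSE
)

# Stand-in for getUnitNames(cdf): unit names in CDF order.
unit_names <- c("7896740", "7896736", "7892501", "7896738")

# Keep "main" probesets with more than 7 and fewer than 200 probes,
# then look up their positions among the CDF units.
keep <- annot$category == "main" &
  annot$total_probes > 7 & annot$total_probes < 200
u <- which(unit_names %in% annot$probeset_id[keep])
print(u)
```

If the gsub() patterns match nothing in your own sample names, the names come back unchanged and every sample ends up in its own class, so it is worth printing table(cls) before passing it on.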
Re: [aroma.affymetrix] getUniqueCdf inflates dimensions of original cdf
Hi Vince.

Yes, getUniqueCdf() *should* inflate the dimensions of the original CDF. Basically, it is rearranging the probesets so that individual probes do not match to multiple probesets. To do this, it creates a CDF with a higher dimension, copying the original physical location to multiple locations. convertToUnique() takes an AffymetrixCelSet and copies the data to match the new CDF.

I'm not sure what is going wrong in your analysis. But, as Henrik says, please tell us how you created the CDF as a starter. You have a near doubling in the number of cells in your unique CDF relative to the original CDF, which is curious in itself. So, a full explanation of what you've done upstream of this would be useful.

Cheers,
Mark

On 16-Mar-10, at 1:09 AM, Henrik Bengtsson wrote:

> Hi,
>
> I leave this one to Mark Robinson, who designed createUniqueCdf() for AffymetrixCdfFile and is on top of this. Though, in the meanwhile could you please:
>
> 1. Clarify the origin of Mm_PromPR_v02.CDF, because Affymetrix does not provide a CDF.
>
> 2. Make the Mm_PromPR_v02.CDF available to us? If you're happy to share it (and have the rights), I'm happy to have aroma-project.org either link to it or host it.
>
> /Henrik
>
> On Fri, Mar 12, 2010 at 8:04 PM, stvjc carey...@gmail.com wrote:
>
>> cdfU
>> AffymetrixCdfFile:
>> Path: annotationData/chipTypes/Mm_PromPR_v02
>> Filename: Mm_PromPR_v02,unique.CDF
>> Filesize: 126.33MB
>> Chip type: Mm_PromPR_v02,unique
>> RAM: 0.00MB
>> File format: v4 (binary; XDA)
>> Dimension: 3026x3026
>> Number of cells: 9156676
>> Number of units: 25373
>> Cells per unit: 360.88
>> Number of QC units: 0
>>
>> cdf
>> AffymetrixCdfFile:
>> Path: annotationData/chipTypes/Mm_PromPR_v02
>> Filename: Mm_PromPR_v02.cdf
>> Filesize: 126.33MB
>> Chip type: Mm_PromPR_v02
>> RAM: 0.00MB
>> File format: v4 (binary; XDA)
>> Dimension: 2166x2166
>> Number of cells: 4691556
>> Number of units: 25373
>> Cells per unit: 184.90
>> Number of QC units: 0
>>
>> this leads to (I think)
>>
>> csU = convertToUnique(csN, verbose=verbose)
>> 20100312 14:02:59|Converting to unique CDF...
>> 20100312 14:02:59| Getting unique CDF...
>> 20100312 14:02:59| Getting unique CDF...done
>> 20100312 14:02:59| Input tags:MN,lm
>> 20100312 14:02:59| Input Path: probeData/Dawn,MN,lm/Mm_PromPR_v02
>> 20100312 14:02:59| Output Path:probeData/Dawn,MN,lm,UNQ/Mm_PromPR_v02
>> 20100312 14:02:59| allTags:MN,lm,UNQ
>> 20100312 14:02:59| Test whether dataset exists
>> 20100312 14:02:59| Reading cell indices from standard CDF...
>> 20100312 14:03:08| Reading cell indices from standard CDF...done
>> 20100312 14:03:08| Reading cell indices list from unique CDF...
>> 20100312 14:03:17| Reading cell indices list from unique CDF...done
>> 20100312 14:03:17| Converting CEL data from standard to unique CDF for sample 1 ( 10_BL6_IP_Mmp ) of 8...
>> 20100312 14:03:17| Reading intensity values according to standard CDF...
>> Error in readCel(filename, indices = indices, readHeader = FALSE, readOutliers = FALSE, :
>>   Argument 'indices' contains an element out of range.
>> 20100312 14:03:23| Reading intensity values according to standard CDF...done
>> 20100312 14:03:23| Converting CEL data from standard to unique CDF for sample 1 ( 10_BL6_IP_Mmp ) of 8...done
>> 20100312 14:03:23|Converting to unique CDF...done
>>
>> sessionInfo()
>> R version 2.11.0 Under development (unstable) (2010-03-02 r51194)
>> x86_64-apple-darwin9.8.0
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices datasets tools utils methods
>> [8] base
>>
>> other attached packages:
>> [1] gsmoothr_0.1.4 limma_3.3.4 aroma.affymetrix_1.5.0
>> [4] aroma.apd_0.1.7 affxparser_1.19.6 R.huge_0.2.0
>> [7] aroma.core_1.5.0 aroma.light_1.15.1 matrixStats_0.1.9
>> [10] R.rsp_0.3.6 R.cache_0.2.0 R.filesets_0.8.0
>> [13] R.utils_1.3.3 R.oo_1.6.7 R.methodsS3_1.1.0
>> [16] weaver_1.13.0 codetools_0.2-2 digest_0.4.2
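Why a "unique" CDF has more cells than the original can be shown with a toy layout. This is only an illustration of the idea described in the reply above (shared cells are copied so that each unit owns its own); it is not how getUniqueCdf() is implemented, and the unit names and cell indices are made up.

```r
# Toy CDF: unit name -> cell indices. Cell 3 is shared by units A and B,
# and cell 5 by units B and C (hypothetical layout).
cdf_cells <- list(A = c(1, 2, 3), B = c(3, 4, 5), C = c(5, 6))

orig_cells <- unlist(cdf_cells, use.names = FALSE)
n_orig   <- length(unique(orig_cells))  # distinct physical cells
n_unique <- length(orig_cells)          # one cell per (unit, probe) pair

# In the "unique" layout every unit gets its own run of consecutive cells.
unique_cells <- relist(seq_along(orig_cells), cdf_cells)

# Each new cell remembers which original cell its intensity is copied from;
# this is the kind of mapping a conversion step has to apply to the CEL data.
cell_map <- data.frame(unique_cell = seq_along(orig_cells),
                       source_cell = orig_cells)

print(c(original = n_orig, unique = n_unique))
print(unique_cells)
print(cell_map)
```

In the thread, the cell count goes from 4691556 to 9156676 with the same 25373 units, which in this picture would mean that most cells belong to about two units on average.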
Re: [aroma.affymetrix] FIRMAGene command
Hi Daniela.

You haven't told us what inputs you've used for 'plm' and 'cls' ... and what is stored in 'u'? Have you read the docs at:

?FIRMAGene

Cheers,
Mark

On 14-Mar-10, at 10:21 AM, dkny169 wrote:

> Hello,
>
> I have a question regarding FIRMAGene. Executing the FIRMAGene command I get the following error:
>
> fg <- FIRMAGene(plm, idsToUse=u, cls=cls)
> Gathering/calculating residuals.
> Reading units.
> Error in if (any(units < 1)) stop("Argument 'units' contains non-positive indices.") :
>   missing value where TRUE/FALSE needed
>
> The commands used right before are:
>
> monetaffx <- read.csv("MoEx-1_0-st-v1.na29.mm9.transcript.csv", sep=",", skip=20, header=TRUE, comment.char="", stringsAsFactors=FALSE)
> probetab <- read.table("MoEx-1_0-st-v1.na29.mm9.probeset.csv", sep="\t", header=TRUE, comment.char="", stringsAsFactors=FALSE)
> u <- which(getUnitNames(cdf) %in% monetaffx$probeset_id[monetaffx$category == "main" & monetaffx$total_probes > 7 & monetaffx$total_probes < 200])
>
> I'm not sure what these commands do and how they need to be changed to accommodate my own data:
>
> cls <- gsub("TisMap_","",gsub("_0[1-3]_v1_WTGene1","",getNames(cs)))
>
> Many thanks,
> Daniela
Re: [aroma.affymetrix] Is the PdInfo2Cdf.R script working
Hi Peter.

I haven't looked at this since early 2009. Our motive in making this available as a script (instead of within the aroma.affymetrix package) was simply as an FYI, so that the code is readily available for others to modify for their needs. You will of course understand that underlying packages change all the time.

On a cursory glance at what you have mentioned below, it seems like the script actually works without a problem. Each unit appears to be a probeset, not a transcript cluster. I know there was a point (looks like early 2009) where Affymetrix made annotation available at the probeset level as well as the transcript cluster level, whereas when I built this script there was only transcript cluster. Also, you'll notice my post on BioC:

https://stat.ethz.ch/pipermail/bioconductor/2009-July/028893.html

... so, 250k probesets seems about right.

You'll notice that BioC now makes annotation available at both levels:

http://www.bioconductor.org/packages/release/data/annotation/
(hugene10sttranscriptcluster.db, hugene10stprobeset.db)

I'm pretty sure that the 'pd.hugene.1.0.st.v1' package includes information about both probesets and transcript clusters, since oligo::rma() can summarize these chips at both levels. So, it may be an easy modification to the script to extract this.

Also, you haven't told us what you mean by "aroma.affymetrix will not work with it", so I can't offer much there.

Cheers,
Mark

> I attempted to use the script posted on the site to convert the pd.hugene.1.0.st.v1 package to a CDF file, but it appears not to be working. The resulting CDF file has too many units and aroma.affymetrix will not work with it beyond naming the CDF. Are you aware of any issues?
>
>   source("http://bioinf.wehi.edu.au/folders/mrobinson/aroma/PdInfo2Cdf.R")
>   PdInfo2Cdf("pd.hugene.1.0.st.v1", "A1_Affy.CEL", overwrite=TRUE)
>
> I renamed the resulting binary CDF file and moved it to the appropriate aroma.affymetrix directory.
>
>   setwd("P:\\ANNOTATION\\aromaAffymetrix")
>   library(aroma.affymetrix)
>   chipType <- "HuGene-1_0-st-v1"
>   cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r4")
>   print(cdf)
>
>   AffymetrixCdfFile:
>   Path: annotationData/chipTypes/HuGene-1_0-st-v1
>   Filename: HuGene-1_0-st-v1,r4.cdf
>   Filesize: 51.94MB
>   Chip type: HuGene-1_0-st-v1,r4
>   RAM: 0.00MB
>   File format: v4 (binary; XDA)
>   Dimension: 1050x1050
>   Number of cells: 1102500
>   Number of units: 253002
>   Cells per unit: 4.36
>   Number of QC units: 0
Re: [aroma.affymetrix] can't load CDF file
Hi Daniela.

Is your CDF in the:

  annotationData/chipTypes/MoEx-1_0-st-v1/

directory? (http://aroma-project.org/node/66)

Cheers,
Mark

> Hi,
>
> I stored my CDF file in annotationData/chipTypes; nevertheless I cannot load the file. Can anyone please tell me what I am doing wrong?
>
>   cdf <- AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1.cdf")
>   Error in list(`AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1.cdf")` = <environment>, :
>   [2010-03-09 14:50:58] Exception: Could not locate a file for this chip type: MoEx-1_0-st-v1.cdf
>     at throw(Exception(...))
>     at throw.default("Could not locate a file for this chip type: ", paste(c(chipT
>     at throw("Could not locate a file for this chip type: ", paste(c(chipType, tag
>     at method(static, ...)
>     at AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1.cdf")
>
> Many thanks for your help!
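The directory Mark asks about can be checked from R before calling byChipType(). The sketch below uses base R only and rests on two assumptions that are not spelled out in this thread: that the CDF file is named after the chip type (as in the HuGene listing elsewhere in this archive, where the path is annotationData/chipTypes/HuGene-1_0-st-v1), and that the chip type is given as a name rather than a filename ending in ".cdf".

```r
# Chip type as a name, not a filename (assumption, see lead-in)
chipType <- "MoEx-1_0-st-v1"

# Expected layout: annotationData/chipTypes/<chipType>/<chipType>.cdf
cdf_dir  <- file.path("annotationData", "chipTypes", chipType)
cdf_path <- file.path(cdf_dir, paste0(chipType, ".cdf"))
cdf_path

# TRUE only if the file is there, relative to the current working directory
found <- file.exists(cdf_path)
found

# If a filename was used by mistake, strip the extension to get the name
chip_from_file <- sub("\\.cdf$", "", "MoEx-1_0-st-v1.cdf", ignore.case = TRUE)
chip_from_file
```

If `found` is FALSE, comparing `getwd()` with the folder that holds annotationData/ is probably the quickest next step.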
Re: [aroma.affymetrix] custom CDF and flat file problems
Hi Zaid.

It's a bit hard to tell from the information you've given us. So, you built a custom CDF file. Did you check its contents? For example, you could do:

  cdf <- AffymetrixCdfFile()
  readCdf(getPathname(cdf), units=1:2)

... and you could check that the indices/X-Y values match your inputs.

Assuming that is all OK, what commands did you use to fit the probe-level model to a dataset using that CDF? In my experience, when you get all zeros, that generally means you haven't actually fit the model, via something like:

  plm <- ExonRmaPlm(...)
  fit(plm, verbose=verbose)
  ...
  # pull out chip effects using extractMatrix()

So, we need a few more details.

Cheers,
Mark

On 13-Jan-10, at 12:17 PM, zaid wrote:

> I was trying to use a custom CDF in the aroma package in R. Basically, I have a custom flat file that I have filtered to contain only the columns in the example provided in this group, and then ran it through the flat2Cdf function in R. Then I used the binary CDF file to run the analysis on the CEL files in the aroma R package. The results I obtained were all zeros.
>
> Here's a snippet of the original flat file and the filtered flat file:
>
> Original flat file:
>
> pr_text  pr_set_text  chip_x  chip_y  interog_pos  probe_sequence  temp  chr  dna_from  dna_to  strand  junction  type  SNP  entrezgene_id  gene  mrna in  spliced est in  unspliced est in  mrna out  spliced est out
> 5827196  2315304  635   2276  0  ggtatgctgttcgaattcataagaa  52.76  1  554527  554551  +  exon  0  100131754  LOC100131754  0  0  0  0  0
> 5942976  2315304  1215  2321  0  tgtatgagttggtcgtagcggaatc  57.68  1  554555  554579  +  exon  0  100131754  LOC100131754  0  0  0  0  0
> 502148   2315304  387   196   0  catataagtaatgctagggtgagtg  54.4   1  554603  554627  +  exon  0  100131754  LOC100131754  0  0  0  0  0
> 5836909  2315304  108   2280  0  tgtaatgggtatggagacatatcat  52.76  1  554625  554649  +  exon  0  100131754  LOC100131754  0  0  0  0  0
> 4237863  2315305  1062  1655  0  aaactcctattatttactctatcaa  47.84  1  554703  554727  +  exon  0  100131754  LOC100131754  0  0  0  0  0
> 2980983  2315305  1142  1164  0  ttaaactcctattatttactctatc  47.84  1  554705  554729  +  exon  0  100131754  LOC100131754  0  0  0  0  0
> 3217941  2315307  20    1257  0  agcgctgtgatgagtgtgcctgcaa  60.96  1  554923  554947  +  exon  0  100131754  LOC100131754  0  0  0  0  0
> 143826   2315307  465   56    0  taatcagtgcgagcttagcgc      56.04  1  554943  554967  +  exon  0  100131754  LOC100131754  0  0  0  0  0
>
> end of snippet
>
> Filtered flat file:
>
> Probe_ID  X     Y     Probe_Sequence             Group_ID   Unit_ID
> 5827196   635   2276  ggtatgctgttcgaattcataagaa  100131754  LOC100131754
> 5942976   1215  2321  tgtatgagttggtcgtagcggaatc  100131754  LOC100131754
> 502148    387   196   catataagtaatgctagggtgagtg  100131754  LOC100131754
> 5836909   108   2280  tgtaatgggtatggagacatatcat  100131754  LOC100131754
> 4237863   1062  1655  aaactcctattatttactctatcaa  100131754  LOC100131754
> 2980983   1142  1164  ttaaactcctattatttactctatc  100131754  LOC100131754
> 3217941   20    1257  agcgctgtgatgagtgtgcctgcaa  100131754  LOC100131754
> 143826    465   56    taatcagtgcgagcttagcgc      100131754  LOC100131754
> 3751638   1237  1465  gcagcttctgtggaacgagggttta  100131754  LOC100131754
> 5640155   474   2203  cttgcgtgaggaaatacttgatggc  100131754  LOC100131754
> 3909899   778   1527  aatggcccatttgggcagccg      100131754  LOC100131754
> 4012422   901   1567  gtgaattcttcgataatggcc      100131754  LOC100131754
>
> end of snippet
>
> Any ideas? Thanks
[aroma.affymetrix] Re: Gene-Level Summarization of Expression Data
Hi Randy.

From that error message, it looks like there was a mix of CDF files being used (my guess is that 54675 corresponds to the number of Affymetrix probesets, whereas 30625 corresponds to the RefSeq reorganization of probesets). Can you post the code you ran?

Cheers,
Mark

On 16-Jan-10, at 11:41 AM, Randy Gobbel wrote:

> I'm also trying to get gene-level expression values, using HG-U133_Plus_2 data. I downloaded the custom CDF that combines probes into probesets that correspond to RefSeq genes, linked from the aroma.affymetrix group page for this chip type (Hs133P_Hs_REFSEQ.cdf), and ran the same set of commands. It works up to the point of trying to extract expression values, then dies with:
>
>   Exception: Range of argument 'indices' is out of range [1,30625]: [1,54675]
>
> At this point, I'm not sure what to do next. Suggestions? It looks like you were the creator of the CDF. Is it the right one for this?
>
> -Randy
>
> On Jun 19 2009, 10:08 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
>
>> Hi Steve.
>>
>> I don't know how common this is. Basically, a colleague found a gene that was very differentially expressed when analyzing using the Affymetrix probeset definition, and found virtually nothing when using the custom CDF that bundles all the probes for a gene together.
>>
>> The reason was simple. There were several probesets designed for this gene and presumably they measure different isoforms. The probes for the DE probeset showed the difference, but all the other probesets didn't. When you use a robust linear model like RMA, outliers get downweighted. Because the DE probes accounted for a small proportion of the probes (I think there were 3 or 4 other probesets at this locus), their effect got washed out.
>>
>> So, it's a tradeoff. Sometimes (perhaps most of the time) you gain by lumping them all together ... more information, more power to detect changes. But sometimes (perhaps rarely) it can mislead.
>>
>> I'm sure I'm not the only one to observe such things. The probe-level data (usually?) doesn't lie. But, since you are comparing across platforms, you will undoubtedly find this as you go along. Different microarray designs often measure slightly different things.
>>
>> One other thing. Be sure to convert your CDF to binary, if it is not already, using affxparser's convertCdf(). Having this info stored in binary format will make the processing much faster. I think the MBNI custom CDFs are text.
>>
>> Cheers,
>> Mark
>>
>> On 20/06/2009, at 6:55 AM, Steve P wrote:
>>
>>> Mark,
>>>
>>> Thanks for the information. That is very helpful. I want to do the latter, which is to combine probesets such that all probes for a given gene (by some definition: RefSeq, Ensembl, etc.) are used to arrive at the summarized value. I was able to obtain a custom CDF for the U133-A array, so I will try that approach. But part of the reason I want to do this is to be able to compare values across platforms, so I may need to find/build a custom CDF for the other platform.
>>>
>>> I would appreciate any cautionary advice you have about summarizing at the gene level.
>>>
>>> Regards,
>>> -Steve
>>>
>>> On Jun 17, 9:56 am, Steve Piccolo steve.picc...@gmail.com wrote:
>>>
>>>> Yesterday I posted this question to the list, but the spam blocker didn't let it through. Below my question is a response from Mark Robinson.
>>>>
>>>> ---
>>>>
>>>> Following the example provided at http://groups.google.com/group/aroma-affymetrix/web/gene-1-0-st-array ... , I am running the following code:
>>>>
>>>>   chipType <- "HT_HG-U133A"
>>>>   dataSet = "myData"
>>>>   library(aroma.affymetrix)
>>>>   verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
>>>>   cdf <- AffymetrixCdfFile$byChipType(chipType)
>>>>   cs <- AffymetrixCelSet$byName(dataSet, cdf=cdf)
>>>>   bc <- RmaBackgroundCorrection(cs)
>>>>   csBC <- process(bc, verbose=verbose)
>>>>   qn <- QuantileNormalization(csBC)
>>>>   csN <- process(qn, verbose=verbose)
>>>>   plm <- RmaPlm(csN)
>>>>   fit(plm, verbose=verbose)
>>>>   ces <- getChipEffectSet(plm)
>>>>   gExprs <- extractDataFrame(ces, units=NULL, addNames=TRUE)
>>>>
>>>> This seems to be working beautifully. However, I'm doing an analysis that requires my expression values to be summarized at the gene level rather than the probeset level. In the gExprs object that results from the above analysis, I get a data.frame in which each row contains expression values for a given probeset across all samples. What I would love to see in each row is an expression value for a given gene. I believe RMA has the ability to do this, but I'm not sure how to do it via aroma.affymetrix.
>>>>
>>>> Any suggestions? I'm happy to provide any more details that would be helpful.
>>>>
>>>> Regards,
>>>> -Steve
>>>>
>>>> ---
>>>>
>>>> Hi Steve.
>>>>
>>>> As to your question, it depends on what you need. When you say you want every row to be a gene, do you just want to know the gene name that goes with the probeset identifier
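The "washing out" effect Mark describes in his Jun 19 message can be reproduced with a toy example. The sketch below uses base R's medpolish() as a stand-in for a robust summarization; it is not the model that RmaPlm fits in aroma.affymetrix, and the probe counts, affinities and effect size are invented for illustration. Four of sixteen probes respond in group B; summarizing all sixteen together hides the change, while summarizing the four on their own recovers it.

```r
# Toy log2 intensities: 16 probes (rows) x 4 arrays (columns);
# arrays 1-2 are group A, arrays 3-4 are group B (invented data)
affinity <- seq(6, 13.5, by = 0.5)              # one affinity per probe
y <- matrix(affinity, nrow = 16, ncol = 4)      # same value across arrays
y[1:4, 3:4] <- y[1:4, 3:4] + 2                  # only probes 1-4 change in B

# Summarize all 16 probes as one gene-level unit
gene <- medpolish(y, trace.iter = FALSE)
gene_diff <- mean(gene$col[3:4]) - mean(gene$col[1:2])

# Summarize probes 1-4 alone, as their own probeset
pset <- medpolish(y[1:4, ], trace.iter = FALSE)
pset_diff <- mean(pset$col[3:4]) - mean(pset$col[1:2])

gene_diff  # 0: the change ends up in the residuals of probes 1-4
pset_diff  # 2: the change shows up in the array (column) effects
```

In the gene-level fit, the information is not lost, only moved: `gene$residuals[1:4, ]` still shows the group difference, which is in line with the remark that the probe-level data usually doesn't lie.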
Re: [aroma.affymetrix] Re: a question about FIRMA score
Hi Jiang.

You need to take the log2 of the residuals. CEL files store only positive numbers. This question has been answered many times. For starters, have a look at:

http://www.mail-archive.com/aroma-affymetrix@googlegroups.com/msg01015.html

Cheers,
Mark

On 8-Jan-10, at 2:53 AM, camelbbs wrote:

> Hi, can anyone help?
>
> Jiang
>
> On Jan 5, 9:47 am, camelbbs camel...@gmail.com wrote:
>
>> Hi,
>>
>> I found the FIRMA scores are all 0, so what do you mean by positive or negative? Also, I don't really understand this sentence: "it seems logical to find FIRMA scores that are different b/w your 2+ groups". If there are several samples in each group, how do I deal with that?
>>
>> Thanks,
>> Jiang
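To see why the log2 step matters for reading signs, here is a minimal base-R sketch. The numbers are invented and are not real FIRMA output; the only point is that values stored on a positive scale become symmetric around zero once logged, so "positive" and "negative" scores only make sense after the transform.

```r
# Invented values on the stored (strictly positive) scale
stored <- c(0.25, 0.5, 1, 2, 4)

# On the log2 scale, 1 maps to 0 and ratios map to signed values
scores <- log2(stored)
scores

# The direction of each score after the transform
sign(scores)
```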
Re: [aroma.affymetrix] FIRMA SCORE from different test
Hi Sabrina.

Some comments below.

On 24-Dec-09, at 2:39 AM, sabrina wrote:

> Hi, everyone:
>
> I am trying to use aroma to detect splicing events. My dataset consists of 3 groups (genetically different), with one as control and the other two as mutants. Each group also has control and treatment subgroups. My interest is to compare mutant with control under the normal condition, to compare control under the two conditions (normal and treatment), and to look at interactions between mutant/control and condition. I did two different runs for the second comparison (control group under two conditions):
>
> 1. Using all groups and all conditions to do normalization, plmFit and firmaScore, then lmFit with a specific contrast matrix to find the splicing events.
> 2. Using only the control group under the two conditions to do normalization, plmFit, firmaScore and lmFit to find the splicing events.

I'm not sure whether such a comparison will lead you to anything meaningful. It seems like the best approach should be to use all data together, since that should give the best estimates of the probe effects. Your #1 and #2 don't seem comparable to me.

> I compared the results from these two runs; they were quite different, and the B values from topTable were very different. I wonder what the right choice is for my objective. Thanks

If you did this with gene expression data in limma, you'd probably find the same thing -- process the data in different ways and you'll get different B values.

Cheers,
Mark

> Sabrina
Re: [aroma.affymetrix] Question for custom CDF of ST-Array
Hi Ranger.

See comments below.

On 8-Jan-10, at 8:03 AM, rangerq wrote:

> Hi,
>
> I use aroma.affymetrix to process the Human Gene 1.0 ST Array. In the step that uses a custom CDF, could I use the regular HuGene ST CDF from Affymetrix instead of the one provided here? What is the difference between the unsupported CDF and the regular CDF? Could you explain what 'unsupported' means? Which one can I trust to annotate my data?

The only difference between the CDF that Affy provides and the one that you can download from the aroma.affymetrix site is that ours has been converted to binary, allowing it to be read a LOT faster. The content is identical.

As far as the "unsupported" business goes, Affymetrix doesn't support it. They make their annotation available in a different format nowadays. However, that annotation is the same (or at least it was when I last checked it):

http://www.mail-archive.com/aroma-affymetrix@googlegroups.com/msg00281.html

> If I want to make my own CDF, are there some instructions?

Does this help?

http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch

Cheers,
Mark

> Thanks,
> ranger QI
Re: [aroma.affymetrix] questions about annotations for exon arrays: no gene symbol or refseq id for majority of core probe sets?
Hi Yupu.

I haven't explored this in any detail, but on a cursory inspection (below), it appears that biomaRt has about 244,000 probesets represented in its database (which seems about right).

  bm <- getBM(attributes=c("affy_huex_1_0_st_v2", "hgnc_symbol", "chromosome_name", "band"), mart=mart)
  dim(bm)
  [1] 324334      4
  head(bm)
    affy_huex_1_0_st_v2 hgnc_symbol chromosome_name   band
  1             3581777       IGHA2              14 q32.33
  2             3581646       IGHA2              14 q32.33
  3             3581642       IGHA2              14 q32.33
  4             3581781       IGHA2              14 q32.33
  5             3581783       IGHA2              14 q32.33
  6             3581788       IGHA2              14 q32.33
  length(unique(bm$affy_huex_1_0_st_v2))
  [1] 244801

Strictly speaking, this isn't an aroma.affymetrix question. What I suggest you try is exploring which identifiers are not represented in the database and whether something is missing from biomaRt (or the Ensembl web page). Of course, you can also get annotation from other sources (e.g. from Affymetrix).

Hope that helps,
Mark

On 7-Jan-10, at 7:30 AM, yupu wrote:

> Hi,
>
> I am new to exon array analysis. I managed to follow the instructions from
> http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis
> to get the transcript-level estimates:
>
>   trFit <- extractDataFrame(cesTr, units=1:3, addNames=TRUE)
>
> Then I followed the idea in the following thread of using biomaRt to get the annotation information through the group IDs of the trFit object:
> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/1f4af7fca4352022/a3fe6980ffa7b925?lnk=gst&q=questions+about+annotations#a3fe6980ffa7b925
>
>   groupnames = trFit[,2]
>   ann <- getBM(attributes=c("affy_huex_1_0_st_v2", "hgnc_symbol"), filters="affy_huex_1_0_st_v2", values=groupnames, mart=ensembl)
>
> What surprised me is that a majority of these group IDs don't have any gene symbol or RefSeq ID associated with them (even though I was using the core probesets upstream):
>
>   length(groupnames)
>   [1] 18708
>   dim(ann)
>   [1] 7835    2
>
> I am not sure if this is expected or if I am doing something wrong here.
>
> Thanks,
> Yupu
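Mark's suggestion to look at which identifiers are not represented can be done with plain set operations once the query result is in hand. The sketch below is base R on invented stand-in data, not a biomaRt call. It assumes the result has the two columns requested in Yupu's query and that a missing symbol comes back as an empty string; if it comes back as NA instead, the nzchar() test would need to be swapped for !is.na().

```r
# Invented stand-ins: IDs sent to the query and the table that came back
groupnames <- c("3581777", "3581646", "2315252", "2315253")
ann <- data.frame(
  affy_huex_1_0_st_v2 = c("3581777", "3581646"),
  hgnc_symbol         = c("IGHA2", ""),
  stringsAsFactors    = FALSE
)

# IDs that returned no row at all
returned    <- unique(ann$affy_huex_1_0_st_v2)
missing_ids <- setdiff(groupnames, returned)

# IDs that returned rows, but never with a non-empty symbol
has_symbol <- unique(ann$affy_huex_1_0_st_v2[nzchar(ann$hgnc_symbol)])
no_symbol  <- setdiff(returned, has_symbol)

# Fraction of submitted IDs with at least one symbol
frac_annotated <- mean(groupnames %in% has_symbol)

missing_ids
no_symbol
frac_annotated
```

Looking up a handful of `missing_ids` by hand (on the Ensembl site or in the Affymetrix annotation files) would then show whether they are genuinely unannotated or just absent from that source.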
Re: [aroma.affymetrix] Re: FIRMA score for each transcript
Hi Libing.

I'm afraid that aroma.affymetrix will not work on TXT files. I suggest you check out the following functions in the 'affxparser' package:

  ?readCel (just so you know what is stored)
  ?createCel (to create the file)
  ?updateCel (to store intensities in the file)

Once you figure out the inputs for those functions, it should be pretty straightforward to take your simulated data and push it into a CEL file.

Hope that helps.

Cheers,
Mark

On 16-Dec-09, at 5:14 AM, Libing Wang wrote:

> Hi Mark,
>
> Thanks a lot for your help! Now I want to work with some simulated data in aroma to calculate summarized intensities of probesets. The problem is that I only have a TXT file with the original probe signal intensities, but aroma can only be fed CEL files. Is it possible to let aroma work with TXT files? If not, are there any ways to construct CEL files from TXT files?
>
> Thanks!
> /Libing
>
> On Fri, Nov 6, 2009 at 6:35 AM, Mark Robinson mrobin...@wehi.edu.au wrote:
>
>> Hi Libing.
>>
>> Are you after the probe IDs from the probe.tab file? For example:
>>
>>   Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start   stop    strand  probe sequence             target strandedness  category
>>   4485910   2315252       789      1752     build-34/hg16  chr1     407616  407640  +       GTAATGCTTGCCACATAGAGCACAG  Sense                main
>>   2412400   2315252       879      942      build-34/hg16  chr1     408027  408051  +       AAGCTGTCCAACACATTAGGGCCAC  Sense                main
>>   4260180   2315252       339      1664     build-34/hg16  chr1     408088  408112  +       GAACTGCAATCTGTAGGTGTCGGTA  Sense                main
>>   5750312   2315252       551      2246     build-34/hg16  chr1     408300  408324  +       TCCATCTGTGAATTAGGGTGTGGCC  Sense                main
>>   2959753   2315253       392      1156     build-34/hg16  chr1     408431  408455  +       AGATCCTCTTGTAAATCACTAGCTG  Sense                main
>>   294823    2315253       422      115      build-34/hg16  chr1     408433  408457  +       TGAGATCCTCTTGTAAATCACTAGC  Sense                main
>>   5504333   2315253       332      2150     build-34/hg16  chr1     408434  408458  +       ATGAGATCCTCTTGTAAATCACTAG  Sense                main
>>   1224013   2315253       332      478      build-34/hg16  chr1     408436  408460  +       TTATGAGATCCTCTTGTAAATCACT  Sense                main
>>
>> If so, you could make a lookup table from that and match them to the info in your CDF file. For example:
>>
>>   cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2", tag="coreR3,A20071112,EP")
>>   u <- readUnits(cdf, units=1, readBases=FALSE, readExpos=FALSE, readType=FALSE, readDirection=FALSE)
>>   u
>>   $`2315251`
>>   $`2315251`$groups
>>   $`2315251`$groups$`2315252`
>>   $`2315251`$groups$`2315252`$x
>>   [1] 789 339 879 551
>>
>>   $`2315251`$groups$`2315252`$y
>>   [1] 1752 1664  942 2246
>>
>>   $`2315251`$groups$`2315253`
>>   $`2315251`$groups$`2315253`$x
>>   [1] 332 422 392 332
>>
>>   $`2315251`$groups$`2315253`$y
>>   [1] 2150  115 1156  478
>>
>> ... so if you read your BG-adjusted intensities into a matrix, you could annotate each row with the probe ID.
>>
>> Is that what you had in mind? If so, hope that gets you started.
>>
>> Cheers,
>> Mark
>>
>> On 3-Nov-09, at 6:50 AM, Libing Wang wrote:
>>
>>> Hi Mark,
>>>
>>> Thanks for your help so far! Now I have a quick question for you. Is there any way to get the probe IDs for the background-corrected probe intensities, once I have finished the following steps?
>>>
>>>   bc <- RmaBackgroundCorrection(cs, tag="core,A20071112,EP")
>>>   csBC <- process(bc, verbose=verbose)
>>>
>>> Thanks!
>>> Libing
>>>
>>> On Wed, Jun 17, 2009 at 6:18 PM, Mark Robinson mrobin...@wehi.edu.au wrote:
>>>
>>>> Hi Libing.
>>>>
>>>> Doesn't 'addNames=TRUE' already do this for you?
>>>>
>>>>   fs1 <- extractDataFrame(fs, units=1:2, addNames=TRUE)
>>>>   head(fs1[,1:6])
>>>>     unitName groupName unit group cell huex_wta_breast_A
>>>>   1  2315251   2315252    1     1    1         1.1150999
>>>>   2  2315251   2315253    1     2    2         0.9551846
>>>>   3  2315373   2315374    2     1    3         1.5354252
>>>>   4  2315373   2315375    2     2    4         0.6288152
>>>>   5  2315373   2315376    2     3    5         1.5658265
>>>>   6  2315373   2315377    2     4    6         1.2131032
>>>>
>>>>   fs2 <- extractDataFrame(fs, units=1:2, addNames=FALSE)
>>>>   head(fs2[,1:6])
>>>>     unit group cell huex_wta_breast_A huex_wta_breast_B huex_wta_breast_C
>>>>   1    1     1    1         1.1150999         0.8552212         0.9177643
>>>>   2    1     2    2         0.9551846         1.1747438         0.8580346
>>>>   3    2     1    3         1.5354252         1.0427089         1.6461661
>>>>   4    2     2    4         0.6288152         0.7053325         0.6999596
>>>>   5    2     3    5         1.5658265         1.0576524         1.1404822
>>>>   6    2     4    6         1.2131032         1.0494679         0.7729633
>>>>
>>>> If not, please send your entire script and the output of sessionInfo().
>>>>
>>>> Cheers,
>>>> Mark
>>>>
>>>> On 18/06/2009, at 1:02 AM, Libing Wang wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> I am wondering if it is possible to get the actual unit ID (transcript cluster ID) and group ID (probeset ID) for each FIRMA score, instead of the artificial numbers from 1 upwards in the FIRMA score data frame.
>>>>>
>>>>> Thanks
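The lookup-table idea from Mark's Nov 6 reply can be sketched in base R. The numbers are copied from the probe.tab excerpt and the readUnits() output for group 2315252 quoted in this thread; the variable names and the choice to match on a pasted "x,y" key are mine, offered as one simple way to do it rather than the way the package does it.

```r
# Rows copied from the probe.tab excerpt (probeset 2315252)
probe_tab <- data.frame(
  probe_id = c(4485910, 2412400, 4260180, 5750312),
  x        = c(789, 879, 339, 551),
  y        = c(1752, 942, 1664, 2246)
)

# x/y for group 2315252, in the order readUnits() printed them
cdf_x <- c(789, 339, 879, 551)
cdf_y <- c(1752, 1664, 942, 2246)

# Match on an "x,y" key so the IDs come back in CDF order
key_tab <- paste(probe_tab$x, probe_tab$y, sep = ",")
key_cdf <- paste(cdf_x, cdf_y, sep = ",")
ids <- probe_tab$probe_id[match(key_cdf, key_tab)]
ids
```

With a full probe.tab file, the same match() call would annotate every row of an intensity matrix that is ordered like the CDF, and any NA in `ids` would flag a cell with no entry in the table.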
Re: [aroma.affymetrix] probe_id in cdf file
Hi Renyi.

No, the 'Probe_ID' column from that example is not used anywhere. Really, it's just the 'X' and 'Y' columns and the two columns for group and unit that are used from the input TXT file. And yes, X/Y are analogous to pmx/pmy from the BPMAP file.

Cheers,
Mark

On 16-Dec-09, at 5:27 AM, Renyi Liu wrote:

Hi, Mark and aroma.affymetrix fans,

I have a quick question: when we create a custom CDF file according to the following page, does it matter what number we use for the probe_id field? I have seen somebody use (y*DIM+x+1). If it does not matter too much, I would like to just use a sequential number. Also, I assume that x and y refer to pmx and pmy in the bpmap file. Is this assumption correct?

http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch

Many thanks for your help.
Renyi

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852

The information in this email is confidential and intended solely for the addressee. You must not disclose, forward, print or use it without the permission of the sender.
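For illustration only, the two numbering schemes mentioned in this thread (a plain sequential number, or y*DIM+x+1) could be generated as below. This is a minimal base-R sketch: the data frame `probes`, its column names and values, and the value of `DIM` are all made up here, and since Probe_ID is reportedly not used by the CDF-building script, either choice should be harmless.

```r
# Hypothetical flat-file rows; all names and values are placeholders.
probes <- data.frame(
  X = c(0L, 5L, 2L),
  Y = c(0L, 0L, 3L),
  Group_ID = c("g1", "g1", "g2"),
  Unit_ID = c("u1", "u1", "u2"),
  stringsAsFactors = FALSE
)
DIM <- 2560L  # assumed number of columns on the array

# Option 1: plain sequential numbering
probes$Probe_ID_seq <- seq_len(nrow(probes))

# Option 2: position-derived numbering, y*DIM + x + 1 (x and y are 0-based)
probes$Probe_ID_xy <- probes$Y * DIM + probes$X + 1L

print(probes)
```

The position-derived variant has the side benefit of being unique per (x, y) cell, but nothing in the replies above suggests the script depends on that.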
[aroma.affymetrix] Re: annotation of ST gene Arrays
Hi Wade.

I think the problem lies with the 'ragene10st*probeset*.db' library. How about trying the symbols from the 'ragene10sttranscriptcluster.db' package:

http://www.bioconductor.org/packages/release/data/annotation/html/ragene10sttranscriptcluster.db.html

I can't remember when this change was made, but my 'hugene10st.db' example is now outdated. You should use hugene10stprobeset.db for probesets or hugene10sttranscriptcluster.db for transcript clusters.

Hope that helps.

Cheers,
Mark

On 17-Dec-09, at 10:09 AM, Wade D wrote:

Hi Mark and others,

I am in a somewhat similar situation to the original person who started this discussion, so I am tacking my question on to your response from February. This is my first ST analysis, and I am using the Rat Gene 1.0 ST array. I followed the example given at http://groups.google.com/group/aroma-affymetrix/web/gene-1-0-st-array-analysis and everything has worked fine so far.

Now, I would like to annotate my gene-level summaries. I tried using the methods I typically use (from the annotate package) with ragene10stprobeset.db, but things didn't seem right. So I figured it was me, came back to the group help pages and found your post. Mimicking it below, it seems that I've either done something wrong, or there is a problem with ragene10stprobeset.db.

    library(ragene10stprobeset.db)
    symbols <- unlist(as.list(ragene10stprobesetSYMBOL))
    myids <- gExprs[,1]
    head(myids)
    [1] 1071 1073 1074 1075 10700013 10700014
    temp <- data.frame(affyid = myids, symbol = symbols[myids])
    #temp[!is.na(temp$symbol),]
    sum(!is.na(temp$symbol))
    [1] 237

This is a disturbingly low number, so I figure something is amiss. Following your lead, I compared the CDF with what is on Affy's website in the transcript and probeset files...

    tr <- read.csv("RaGene-1_0-st-v1.na30.1.rn4.transcript.csv", header=TRUE, comment.char="#")
    ps <- read.csv("RaGene-1_0-st-v1.na30.rn4.probeset.csv", header=TRUE, comment.char="#")
    #chipType <- "RaGene-1_0-st-v1"
    #cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")
    un <- getUnitNames(cdf)
    sum(un %in% ps$transcript_cluster_id)
    [1] 27342
    sum(un %in% tr$transcript_cluster_id)
    [1] 29169

Everything looks reasonable here.

    sum(names(symbols) %in% ps$transcript_cluster_id)
    [1] 0
    sum(names(symbols) %in% tr$transcript_cluster_id)
    [1] 1872

This seems to be the problem. I wanted to ask others before I build my own annotation.db for ragene10st. I've done it for Illumina arrays before, but it has been a while, and it is a bit of a pain for Windows users to do. I just wanted to get a second opinion before I go down that road, especially since this is my first time dealing with ST arrays.

Thanks,
Wade

On Feb 10, 3:13 am, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Simon. See comments below.

> I am using the mouse gene ST arrays and am having problems with annotation. When I write a CSV file, the annotation is only the probeset_id, no gene names or accession numbers etc.

That's what it should be. Actually, it's the 'transcript_cluster_id'. Previously, Affy did not provide annotation at the probeset level.

The CDF file just contains the identifiers. Linking results (e.g. expression summaries) to the annotation can be done with other R packages.

For example, here is some code I gave Sebastien a few weeks ago that will get you started (just replace hugene10st.db with mogene10st.db):

Say you have some Affy identifiers:

    myids
    [1] 7950136 7955845 7955852 7955855 7955858 7955865 7955869
    [8] 7955873 7955887 8016433

Load the package and read off the gene symbols:

    library(hugene10st.db)
    symbols <- unlist(as.list(hugene10stSYMBOL))
    data.frame(affyid = myids, symbol = symbols[myids])
             affyid symbol
    7950136 7950136 PHOX2A
    7955845 7955845 HOXC13
    7955852 7955852 HOXC12
    7955855 7955855 HOXC11
    7955858 7955858 HOXC10
    7955865 7955865  HOXC9
    7955869 7955869  HOXC8
    7955873 7955873  HOXC6
    7955887 7955887  HOXC5
    8016433 8016433  HOXB1

Here are some other fields in hugene10st.db (tab completion on 'hugene10st'):

    hugene10st               hugene10st.db::          hugene10stACCNUM
    hugene10stALIAS2PROBE    hugene10stCHR            hugene10stCHRLENGTHS
    hugene10stCHRLOC         hugene10stCHRLOCEND      hugene10stENSEMBL
    hugene10stENSEMBL2PROBE  hugene10stENTREZID       hugene10stENZYME
    hugene10stENZYME2PROBE   hugene10stGENENAME       hugene10stGO
    hugene10stGO2ALLPROBES   hugene10stGO2PROBE       hugene10stMAP
    hugene10stMAPCOUNTS      hugene10stOMIM           hugene10stORGANISM
    hugene10stPATH           hugene10stPATH2PROBE     hugene10stPFAM
    hugene10stPMID           hugene10stPMID2PROBE     hugene10stPROSITE
    hugene10stREFSEQ         hugene10stSYMBOL         hugene10stUNIGENE
    hugene10stUNIPROT        hugene10st_dbInfo        hugene10st_dbconn
    hugene10st_dbfile        hugene10st_dbschema

...

> These probesets also do not match the probeset_ids from MoGene-1_0-st-v1.na27.mm9
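The lookup pattern used in this thread, symbols[myids], only matches on names when the identifiers are character strings. The following base-R sketch uses a made-up `symbols` vector (three ID/symbol pairs copied from the example output above, plus one invented unknown ID) to show the difference. It is not a diagnosis of the low match count reported above, just one thing that may be worth ruling out alongside the choice of annotation package.

```r
# Stand-in for the named vector an annotation package would provide
symbols <- c("7950136" = "PHOX2A", "7955845" = "HOXC13", "8016433" = "HOXB1")

# Numeric identifiers; the last one is invented and has no annotation
myids <- c(7955845, 8016433, 1234567)

# Numeric indexing is positional, so every lookup here falls off the end
by_position <- symbols[myids]

# Converting to character makes R match on the vector's names instead
by_name <- symbols[as.character(myids)]

annot <- data.frame(affyid = myids, symbol = unname(by_name),
                    stringsAsFactors = FALSE)
print(annot)
print(sum(!is.na(annot$symbol)))  # number of identifiers with a symbol
```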
Re: [aroma.affymetrix] Re: custom CDF
Hi Zaid.

There is an example flat file (well, 10 lines of it) at the page that Pierre suggested. You'll want to filter out the lines with missing values and make sure all the data in a column is of the same type.

Cheers,
Mark

> No. My flat file has some lines with missing values and non-integer values. Is there an example flat file that I could try out? Or some database that I could download them from? Thanks for your help

On Dec 9, 10:08 pm, Pierre Neuvial pie...@stat.berkeley.edu wrote:

Hi Zaid,

Does your file satisfy the requirements detailed on the corresponding help page?

http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file...

Pierre

On Wed, Dec 9, 2009 at 4:18 PM, zaid z...@genomedx.com wrote:

Hello,

I'm trying to create a custom CDF file from a flat file using the R script provided in this group (flat2Cdf()). I'm running into errors such as incorrect number of columns, integer not found, etc. Is there a standard flat file structure required? Or are there any flat files available for download? I just want to try the script and have a standard structure to work with.

Thanks for the help.
Z
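The clean-up suggested in this thread (drop lines with missing values, make each column a single type) might look roughly like this in base R. The column names and the inline example data are assumptions for illustration; the required layout is the one described on the help page linked above.

```r
# Hypothetical flat-file content; column names are placeholders.
txt <- "Probe_ID\tX\tY\tGroup_ID\tUnit_ID
1\t10\t20\tg1\tu1
2\t\t21\tg1\tu1
3\t12.5\t22\tg2\tu2
4\t13\t23\tg2\tu2"

flat <- read.table(text = txt, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE, na.strings = c("", "NA"))

# Drop rows with any missing value
flat <- flat[complete.cases(flat), ]

# Keep only rows whose coordinates are whole numbers, then store as integer
is_whole <- flat$X == round(flat$X) & flat$Y == round(flat$Y)
flat <- flat[is_whole, ]
flat$X <- as.integer(flat$X)
flat$Y <- as.integer(flat$Y)

str(flat)
```

Whether non-integer coordinates should be dropped or corrected depends on where they came from; dropping them is simply the most conservative choice for a first test run.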
Re: [aroma.affymetrix] Problem generating CDF file (Arabidopsis)
Hi Renyi.

The reason we require startpos > 0 is that many of the BPMAP files we've looked at have control antigenomic probes in them and these all have startpos=0, so this was an easy way to filter them out. I think you are right, though, that startpos could start at 0, so your workaround of setting it to 1 should be fine.

Of course, the bpmapCluster2Cdf() script is not meant to be a cure-all for everyone's needs. The source code is available to you should you wish to do something different.

Cheers,
Mark

On 6-Dec-09, at 10:32 AM, Renyi Liu wrote:

Hi, Mark,

Thanks for your quick reply and your suggestion. You guessed right: the first startpos is 0 (the probe is obviously mapped right at the beginning of the chromosome). I will change that number to 1 for now, but in my understanding, startpos in a bpmap file does start at 0 (not 1), so why does your script not allow it?

Thanks,
Renyi

On Sat, Dec 5, 2009 at 2:43 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Renyi.

One thing to do is check that the genome positions in your BPMAP file for chr5 are all > 0. To do this, try:

    library(affxparser)
    bp <- readBpmap("At35b_MR_v04-2_TAIR9_unique.bpmap")
    z <- lapply(bp, FUN=function(u) {
      print(u$seqInfo[c("groupname","fullname")])
      cat(sum(u$startpos<=0), "\n---\n")
    })

If you see a non-zero number next to chr5, then that is the problem and you'll have to remove those when you create the custom BPMAP. Otherwise, post the output of that command and we'll have to investigate further.

Cheers,
Mark

On 6-Dec-09, at 8:55 AM, Renyi Liu wrote:

Hi, All,

I am trying to generate a CDF file for the Arabidopsis Tiling 1.0R array from a custom bpmap file that I created (it contains only probes that map to a single location in the TAIR9 genome, no control probesets).

I used the bpmapCluster2Cdf script with the following command:

    bpmapCluster2Cdf("At35b_MR_v04-2_TAIR9_unique.bpmap", "At35b_MR_v04", rows=2560, cols=2560, groupName="At", verbose=-20)

It works well for all chromosomes except chr5, where there is a message saying "Skipping all 657459 probes for At:TAIR9;chr5". I certainly do not want to skip a whole chromosome. Can you please tell me what is going on and how I can correct it?

Thanks,
Renyi

--
Renyi Liu, PhD
Assistant Professor
Department of Botany and Plant Sciences
3109 Batchelor Hall
University of California, Riverside
Riverside, CA 92521
Email: renyi@ucr.edu
Phone: (951)827-3987
Fax: (951)827-4437
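The check and the workaround discussed in this thread can be tried without a real BPMAP file. The sketch below mocks the list structure used in the reply above (one element per sequence, each with 'seqInfo' and 'startpos'); the sequence names and positions are invented, and the real object would come from readBpmap().

```r
# Mock of a parsed BPMAP: values are made up for illustration.
bp <- list(
  list(seqInfo = list(groupname = "At", fullname = "At:TAIR9;chr4"),
       startpos = c(15L, 120L, 300L)),
  list(seqInfo = list(groupname = "At", fullname = "At:TAIR9;chr5"),
       startpos = c(0L, 42L, 77L))
)

# Count non-positive start positions per sequence
n_bad <- vapply(bp, function(u) sum(u$startpos <= 0), integer(1))
names(n_bad) <- vapply(bp, function(u) u$seqInfo$fullname, character(1))
print(n_bad)

# The workaround from the thread: move a start position of 0 to 1
bp_fixed <- lapply(bp, function(u) {
  u$startpos[u$startpos == 0L] <- 1L
  u
})
```

Shifting a position by one base is a pragmatic fix for a single probe at the chromosome start; if control probes with startpos=0 were present they would need to be removed instead.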
[aroma.affymetrix] Re: more question on tiling array
Hi Renyi.

Even though your questions aren't aroma.affymetrix-specific, I'll send my responses to the aroma.affymetrix list in case others have the same questions. Comments below.

On 6-Dec-09, at 6:46 PM, Renyi Liu wrote:

> Hi, Mark,
>
> Thanks again for helping me with my previous question on the CDF file. I have read your article on Promoter 1.0R tiling array analysis and would like to apply MAT normalization to my dataset. Again, my dataset is from Arabidopsis and I need to create my own CDF file. For MAT analysis, we need to put the copy number of probes into the bpmap file (and the CDF file). If you get a chance, I'd appreciate your help on a couple more questions:

You may be able to use the BPMAP file for Arabidopsis from the MAT website:

http://liulab.dfci.harvard.edu/MAT/Download.htm

... that would already have copy number in there.

> (1) Should I filter out probes with copy number > 10? (Their xMAN paper states that they remove probes with copy number > 10.)

Yes, that's what they do for the BPMAP files at the MAT website.

> (2) For probes that map to multiple locations, how many entries need to be included (i.e. should I just randomly choose one of the mapping locations and put it in startpos)? If multiple locations are included, the total number of probes will be much larger than in the original Affymetrix bpmap file.

You would want to keep all of them I guess, except for the > 10 hits that you've filtered. So, yes, you would end up with more probes than what you started with, but I'd expect that it wouldn't be *much* larger, since (at least for human) only a small percentage of probes map to multiple locations.

> (3) For MeDIP-chip data, which normalization method is the best choice in your opinion?

We typically use MAT for normalization, but you might be interested in this comparison:

http://www.biomedcentral.com/1471-2105/10/204

Hope that helps.

Cheers,
Mark

> Many thanks,
> Renyi
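The filtering rule discussed in this thread (drop probes with more than 10 genomic hits, keep every location of the remaining probes) can be sketched in base R as follows. The table of hits is invented for illustration; in practice it would come from whatever alignment step produced the probe locations.

```r
# Made-up probe-to-genome hits: one row per mapping location
hits <- data.frame(
  probe = c("p1", "p2", "p2", "p3"),
  chr = c("chr1", "chr1", "chr3", "chr2"),
  startpos = c(100L, 250L, 900L, 40L),
  stringsAsFactors = FALSE
)

# A highly repetitive probe with 12 locations
p4 <- data.frame(probe = "p4", chr = "chr5",
                 startpos = seq(1000L, by = 50L, length.out = 12),
                 stringsAsFactors = FALSE)
hits <- rbind(hits, p4)

# Copy number = number of mapping locations per probe
copy_number <- table(hits$probe)
hits$copies <- as.integer(copy_number[hits$probe])

# Drop probes with more than 10 hits; keep all locations of the rest
kept <- hits[hits$copies <= 10, ]
print(kept)
```

Here p2 stays in with both of its locations, so the output has more rows than there are distinct probes, which is the modest growth in probe count described above.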
Re: [aroma.affymetrix] Discussion on gene-1-0-st-array-analysis
Hi Germán.

Try this link:

http://www.biomedcentral.com/1471-2105/10/156

Cheers,
Mark

On 5-Dec-09, at 12:33 AM, Germán González wrote:

> There has been some recent work that suggests you can use the Gene arrays to do splicing analysis.

Can you give me sites or any references about that?

thanks,
german
Re: [aroma.affymetrix] Re: exon array analysis errors
Hi Yu Chuan.

As I mentioned before, I was unable to reproduce your error from a test dataset on my system. It's hard to know the history of your environment, so what I suggest you do is start from a brand new session and run the commands start to finish.

There are a couple of ways to do this. By brute force, you could delete the relevant plmData/ and probeData/ directories; when you run the commands, aroma.affymetrix will just recreate them. Alternatively (and more elegantly?), you can add a 'force=TRUE' argument to all the process() and fit() commands that you use.

Hope that helps.

Cheers,
Mark

On 2-Dec-09, at 11:59 AM, Yu Chuan wrote:

Mark,

I pulled out some chip effects. The NUSE values are still all 0s.

    cesTr <- getChipEffectSet(plmTr)
    trFit <- extractDataFrame(cesTr, units=1:3, addNames=TRUE)
    dim(trFit)
    [1]  3 13
    trFit
      unitName groupName unit group cell 20091119_Colon4_Exon2
    1  2315251   2315252    1     1    1              21.13534
    2  2315373   2315374    2     1    3              21.74671
    3  2315554   2315586    3     1    7              22.94160
      20091119_Colon4_Exon3 20091119_UBR_Exon1 20091119_UBR_Exon2
    1              22.23353           21.05928           21.57542
    2              21.51784           21.39991           22.58863
    3              22.78790           22.42114           23.21064
      20091119_UBR_Exon3 20091119_UHR_Exon1 20091119_UHR_Exon2 20091119_UHR_Exon3
    1           21.29856           22.37293           22.17496           21.97057
    2           21.72278           22.74162           22.76515           22.23168
    3           22.39577           23.52349           23.74061           22.97983

    qamTr <- QualityAssessmentModel(plmTr)
    z <- plotNuse(qamTr)
    z
    $`20091119_Colon4_Exon2`
    $`20091119_Colon4_Exon2`$stats
    [1] 0 0 0 0 0
    $`20091119_Colon4_Exon2`$n
    [1] 18705
    $`20091119_Colon4_Exon2`$conf
    [1] 0 0
    $`20091119_Colon4_Exon2`$out
    numeric(0)

    $`20091119_Colon4_Exon3`
    $`20091119_Colon4_Exon3`$stats
    [1] 0 0 0 0 0
    $`20091119_Colon4_Exon3`$n
    [1] 18705
    $`20091119_Colon4_Exon3`$conf
    [1] 0 0
    $`20091119_Colon4_Exon3`$out
    numeric(0)

    $`20091119_UBR_Exon1`
    $`20091119_UBR_Exon1`$stats
    [1] 0 0 0 0 0
    $`20091119_UBR_Exon1`$n
    [1] 18705
    $`20091119_UBR_Exon1`$conf
    [1] 0 0
    $`20091119_UBR_Exon1`$out
    numeric(0)

    $`20091119_UBR_Exon2`
    $`20091119_UBR_Exon2`$stats
    [1] 0 0 0 0 0
    $`20091119_UBR_Exon2`$n
    [1] 18705
    $`20091119_UBR_Exon2`$conf
    [1] 0 0
    $`20091119_UBR_Exon2`$out
    numeric(0)

    $`20091119_UBR_Exon3`
    $`20091119_UBR_Exon3`$stats
    [1] 0 0 0 0 0
    $`20091119_UBR_Exon3`$n
    [1] 18705
    $`20091119_UBR_Exon3`$conf
    [1] 0 0
    $`20091119_UBR_Exon3`$out
    numeric(0)

    $`20091119_UHR_Exon1`
    $`20091119_UHR_Exon1`$stats
    [1] 0 0 0 0 0
    $`20091119_UHR_Exon1`$n
    [1] 18705
    $`20091119_UHR_Exon1`$conf
    [1] 0 0
    $`20091119_UHR_Exon1`$out
    numeric(0)

    $`20091119_UHR_Exon2`
    $`20091119_UHR_Exon2`$stats
    [1] 0 0 0 0 0
    $`20091119_UHR_Exon2`$n
    [1] 18705
    $`20091119_UHR_Exon2`$conf
    [1] 0 0
    $`20091119_UHR_Exon2`$out
    numeric(0)

    $`20091119_UHR_Exon3`
    $`20091119_UHR_Exon3`$stats
    [1] 0 0 0 0 0
    $`20091119_UHR_Exon3`$n
    [1] 18705
    $`20091119_UHR_Exon3`$conf
    [1] 0 0
    $`20091119_UHR_Exon3`$out
    numeric(0)

    attr(,"type")
    [1] "NUSE"

On Nov 30, 1:28 am, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Yu Chuan.

I'm still mystified by this. Can you check that the PLM was successfully fit? Pull out some chip effects, maybe.

Cheers,
Mark

On 25-Nov-09, at 4:51 AM, Yu Chuan wrote:

Mark,

I think the info below may help too. It looks like all the gene-level NUSE values are 0. How could this happen?

    z <- plotNuse(qamTr)
    z
    $`20091119_Colon4_Exon2`
    $`20091119_Colon4_Exon2`$stats
    [1] 0 0 0 0 0
    $`20091119_Colon4_Exon2`$n
    [1] 18705
    $`20091119_Colon4_Exon2`$conf
    [1] 0 0
    $`20091119_Colon4_Exon2`$out
    numeric(0)

    $`20091119_Colon4_Exon3`
    $`20091119_Colon4_Exon3`$stats
    [1] 0 0 0 0 0
    $`20091119_Colon4_Exon3`$n
    [1] 18705
    $`20091119_Colon4_Exon3`$conf
    [1] 0 0
    $`20091119_Colon4_Exon3`$out
    numeric(0)

    $`20091119_UBR_Exon1`
    $`20091119_UBR_Exon1`$stats
    [1] 0 0 0 0 0
    $`20091119_UBR_Exon1`$n
    [1] 18705
    $`20091119_UBR_Exon1`$conf
    [1] 0 0
    $`20091119_UBR_Exon1`$out
    numeric(0)

    $`20091119_UBR_Exon2`
    $`20091119_UBR_Exon2`$stats
    [1] 0 0 0 0 0
    $`20091119_UBR_Exon2`$n
    [1] 18705
    $`20091119_UBR_Exon2`$conf
    [1] 0 0
    $`20091119_UBR_Exon2`$out
    numeric(0)

    $`20091119_UBR_Exon3`
    $`20091119_UBR_Exon3`$stats
    [1] 0 0 0 0 0
    $`20091119_UBR_Exon3`$n
    [1] 18705
    $`20091119_UBR_Exon3`$conf
    [1] 0 0
    $`20091119_UBR_Exon3`$out
    numeric(0)

    $`20091119_UHR_Exon1`
    $`20091119_UHR_Exon1`$stats
    [1] 0 0 0 0 0
    $`20091119_UHR_Exon1`$n
    [1] 18705
    $`20091119_UHR_Exon1`$conf
    [1] 0 0
    $`20091119_UHR_Exon1`$out
    numeric(0)

    $`20091119_UHR_Exon2`
    $`20091119_UHR_Exon2`$stats
    [1] 0 0 0 0 0
    $`20091119_UHR_Exon2`$n
    [1] 18705
    $`20091119_UHR_Exon2`$conf
    [1] 0 0
    $`20091119_UHR_Exon2`$out
    numeric(0)

    $`20091119_UHR_Exon3`
    $`20091119_UHR_Exon3`$stats
Re: [aroma.affymetrix] Re: apt- affymetrix power tool
Hi Elai/Zaid.

You'll want to be careful with all this (i.e. linear models on unnormalized data ... or maybe you are standardizing some other way), but yes, you can run the probe-level model fits on any AffymetrixCelSet object. The standard RMA procedure would background-adjust, then quantile-normalize, then fit the PLMs:

    cs <- AffymetrixCelSet$byName("BCGC_2006", cdf=cdf)
    bc <- RmaBackgroundCorrection(cs, tag="coreR2")
    csBC <- process(bc, verbose=verbose)
    qn <- QuantileNormalization(csBC, typesToUpdate="pm")
    csN <- process(qn, verbose=verbose)
    plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
    fit(plmTr, verbose=verbose)

You could replace this simply with:

    csX <- AffymetrixCelSet$byName("BCGC_2006", cdf=cdf)
    [...maybe do something else to csX...]
    plmTr <- ExonRmaPlm(csX, mergeGroups=TRUE)  # put in the AffymetrixCelSet you want to fit the PLM on
    fit(plmTr, verbose=verbose)

Cheers,
Mark

On 30-Nov-09, at 3:32 PM, davic...@gmail.com wrote:

Henrik,

Is it possible to use aroma to run an RMA implementation without quantile normalization? Zaid, have you tried this?

Best,
Elai
CSO
GenomeDx Biosciences

On Nov 28, 5:51 am, Henrik Bengtsson h...@stat.berkeley.edu wrote:

Hi Zaid,

I think you have mistaken the aroma.affymetrix mailing list for a mailing list for Affymetrix software. This forum is only for aroma.affymetrix-related topics. Please use the appropriate official Affymetrix forum for their APT software:

https://www.affymetrix.com/community/forums/index.jspa

That way you also know you will get the correct answer from the correct source. Your question might even have been answered there before (I don't know).

/Henrik

On Thu, Nov 26, 2009 at 9:22 PM, zaid z...@genomedx.com wrote:

I tried running the 64-bit version of the command-line tool APT. I was not able to find any information on running the command with no normalization. I tried many different commands, such as:

    apt-probeset-summarize -a rma-bg,pm-only,sea

Any ideas on how I can run that tool with no normalization?

Thanks
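To make the step being skipped in this thread more concrete, here is a small base-R sketch of what quantile normalization does to a matrix of probe intensities, followed by a probe-level summary computed with and without it. The intensities are simulated, `quantile_normalize()` is a hypothetical helper written for this example, and median polish is used only as a simple stand-in for the probe-level model; none of this is the aroma.affymetrix implementation.

```r
set.seed(42)
# Simulated probe intensities: 6 probes x 3 arrays, third array brighter
x <- matrix(2^rnorm(18, mean = 8), nrow = 6)
x[, 3] <- x[, 3] * 2

# Hypothetical helper: force every column onto the same set of values
quantile_normalize <- function(m) {
  ranks <- apply(m, 2, rank, ties.method = "first")
  target <- rowMeans(apply(m, 2, sort))  # mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(m)
  out
}

xn <- quantile_normalize(x)

# After normalization every array holds the same sorted values
print(apply(xn, 2, sort))

# Array-level effects from median polish on log2 data, raw vs normalized
fit_raw <- medpolish(log2(x), trace.iter = FALSE)
fit_qn <- medpolish(log2(xn), trace.iter = FALSE)
print(fit_raw$col)
print(fit_qn$col)
```

Fitting on the raw matrix leaves the brightness difference of the third array in the array-level effects, which is the kind of thing the caution about unnormalized data refers to.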
Re: [aroma.affymetrix] Re: exon array analysis errors
    qamTr <- QualityAssessmentModel(plmTr)
    plotNuse(qamTr)
    plotRle(qamTr)
    Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
      need finite 'ylim' values
    In addition: There were 16 warnings (use warnings() to see them)
    qamEx <- QualityAssessmentModel(plmEx)
    plotNuse(qamEx)
    plotRle(qamEx)
    z <- plotNuse(qamTr)
    plotBoxplotStats(z, ylim=c(-0.01,0.01))

    sessionInfo()
    R version 2.9.2 (2009-08-24)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] aroma.affymetrix_1.2.0 aroma.apd_0.1.6        affxparser_1.16.0
     [4] R.huge_0.1.9           aroma.core_1.2.0       aroma.light_1.12.2
     [7] matrixStats_0.1.6      R.rsp_0.3.6            R.filesets_0.5.3
    [10] digest_0.4.1           R.cache_0.1.9          R.utils_1.2.0
    [13] R.oo_1.5.0             R.methodsS3_1.0.3

    traceback()
    8: plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs)
    7: bxp(bxpStats, ylim = ylim, outline = outline, las = las, ...)
    6: plotBoxplotStats.list(stats, main = main, ylab = ylab, ...)
    5: plotBoxplotStats(stats, main = main, ylab = ylab, ...)
    4: plotBoxplot.ChipEffectSet(ces, type = "RLE", ...)
    3: plotBoxplot(ces, type = "RLE", ...)
    2: plotRle.QualityAssessmentModel(qamTr)
    1: plotRle(qamTr)

On Nov 24, 1:50 am, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Yu Chuan. Comments below.

On 24-Nov-09, at 1:00 PM, Yu Chuan wrote:

> Hi,
>
> I am pre-processing 8 exon arrays (Hu-Ex-1_0-st-v2) and doing quality assessment. When I plotted the NUSE using plotNUSE, I found that the y-axis limit is too wide, such that the boxplots were all squeezed tightly around 0 and it's hard to see what's going on there. Is there any way I can change the y-axis limit? I tried

I assume you mean tightly around 1? That's where they should be.

>     plotNuse(qamTr,ylim=c(-0.2,0.2))
>     Error in boxplot.stats(stdvs/medianSE, ...) :
>       unused argument(s) (ylim = c(-0.2, 0.2))

An easy work-around for this is:

    z <- plotNuse(qamTr)
    plotBoxplotStats(z, ylim=c(.5,2))

I'm unable to recreate these errors below on a local dataset. They all work fine for me. Here is my complete set of commands from a fresh R session, as described in the exon array vignette page:

http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-...

    library(aroma.affymetrix)
    cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
    cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)
    setCdf(cs, cdf)
    bc <- RmaBackgroundCorrection(cs, tag="coreR2")
    csBC <- process(bc, verbose=verbose)
    qn <- QuantileNormalization(csBC, typesToUpdate="pm")
    csN <- process(qn, verbose=verbose)
    plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
    fit(plmTr, verbose=verbose)
    rs <- calculateResiduals(plmTr, verbose=verbose)
    qamTr <- QualityAssessmentModel(plmTr)
    plotNuse(qamTr)
    z <- plotNuse(qamTr)
    plotBoxplotStats(z, ylim=c(.5,2))
    plotRle(z)

Have you fit the probe-level model in advance of these commands? Given that your NUSE values are tightly around 0, I suspect maybe not. Otherwise, can you give a complete code example, and maybe run it from a fresh R session and check whether that solves your problem. And, as usual, if you get an error, the output of traceback() is much appreciated ... and of course, the output of your sessionInfo().

Hope that helps.

Cheers,
Mark

ps. my sessionInfo():

    sessionInfo()
    R version 2.10.0 (2009-10-26)
    i386-apple-darwin9.8.0

    locale:
    [1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] preprocessCore_1.7.9   Biobase_2.5.8          aroma.affymetrix_1.2.0
     [4] aroma.apd_0.1.7        affxparser_1.17.5      R.huge_0.2.0
     [7] aroma.core_1.2.0       aroma.light_1.13.5     matrixStats_0.1.6
    [10] R.rsp_0.3.6            R.filesets_0.5.3       digest_0.4.1
    [13] R.cache_0.2.0          R.utils_1.2.2          R.oo_1.6.2
    [16] affy_1.23.12           R.methodsS3_1.0.3

    loaded via a namespace (and not attached):
    [1] affyio_1.13.5

> But it didn't work. In addition, I got the following error when I used plotRLE
>
>     plotRle(qamTr)
>     Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
>       need finite 'ylim' values
>     In addition: There were 16 warnings (use warnings() to see them)
>     warnings()
>     Warning messages:
>     1: In min(x) : no non-missing arguments to min; returning Inf
>     2: In max(x) : no non-missing arguments to max; returning -Inf
>     3: In min(x) : no non-missing arguments to min; returning Inf
>     4: In max(x) : no non-missing arguments to max; returning -Inf
>     5: In min(x) : no non-missing arguments to min; returning Inf
>     6: In max(x) : no non-missing arguments to max; returning -Inf
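The remark that NUSE values should sit around 1, together with the stdvs/medianSE expression visible in the error message above, suggests the following toy calculation. It is a base-R sketch on simulated standard errors, intended only to show why a value near 1 is the expected baseline; it is not the code aroma.affymetrix runs, and how that package treats zero standard errors is not shown here.

```r
set.seed(1)
# Simulated standard errors of unit-level estimates: 100 units x 4 arrays
se <- matrix(runif(400, min = 0.05, max = 0.15), nrow = 100,
             dimnames = list(NULL, paste0("array", 1:4)))
se[, "array4"] <- se[, "array4"] * 1.5  # one noisier array

# Each unit's standard error divided by that unit's median across arrays
nuse <- se / apply(se, 1, median)

# Per-array medians: near 1 for typical arrays, higher for the noisy one
print(round(apply(nuse, 2, median), 2))

# With all-zero standard errors this toy version yields 0/0 = NaN
se0 <- matrix(0, nrow = 100, ncol = 4)
nuse0 <- se0 / apply(se0, 1, median)
print(all(is.nan(nuse0)))
```

Values that are exactly 0 for every unit on every array are therefore a hint that the standard errors themselves were never estimated, which is consistent with the question about whether the model had been fit.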
Re: [aroma.affymetrix] exon array analysis errors
Hi Yu Chuan. Comments below. On 24-Nov-09, at 1:00 PM, Yu Chuan wrote: Hi, I am pre-processing 8 exon arrays (Hu-Ex-1_0-st-v2) and doing quality assessment. When I plotted the NUSE using plotNUSE, I found that the y- axis limit is too wide, such that the boxplots were all squeezed tightly around 0 and it's hard to see what's going on there. Is there any way I can change the y-axis limit? I tried I assume you mean tightly around 1? That's where they should be. plotNuse(qamTr,ylim=c(-0.2,0.2)) Error in boxplot.stats(stdvs/medianSE, ...) : unused argument(s) (ylim = c(-0.2, 0.2)) An easy work-around for this is: z - plotNuse(qamTr) plotBoxplotStats(z, ylim=c(.5,2)) I'm unable to recreate these errors below on a local dataset. They all work fine for me. Here is my complete set of commands from a fresh R session, as described in the exon array vignette page: http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis library(aroma.affymetrix) cdf - AffymetrixCdfFile$byChipType(chipType, tags=coreR3,A20071112,EP) cs - AffymetrixCelSet$byName(tissues, cdf=cdf) setCdf(cs,cdf) bc - RmaBackgroundCorrection(cs, tag=coreR2) csBC - process(bc,verbose=verbose) qn - QuantileNormalization(csBC, typesToUpdate=pm) csN - process(qn, verbose=verbose) plmTr - ExonRmaPlm(csN, mergeGroups=TRUE) fit(plmTr, verbose=verbose) rs - calculateResiduals(plmTr, verbose=verbose) qamTr - QualityAssessmentModel(plmTr) plotNuse(qamTr) z - plotNuse(qamTr) plotBoxplotStats(z, ylim=c(.5,2)) plotRle(z) Have you fit the probe level model in advance of these commands? Given that your NUSE values are tightly around 0, I suspect maybe not. Otherwise, can you give a complete code example, and maybe run it from a fresh R session and check whether that solves your problem. And, as usual, if you get an error, the output of traceback() is much appreciated ... and of course, the output of your sessionInfo(). Hope that helps. Cheers, Mark ps. 
my sessionInfo(): sessionInfo() R version 2.10.0 (2009-10-26) i386-apple-darwin9.8.0 locale: [1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] preprocessCore_1.7.9 Biobase_2.5.8 aroma.affymetrix_1.2.0 [4] aroma.apd_0.1.7affxparser_1.17.5 R.huge_0.2.0 [7] aroma.core_1.2.0 aroma.light_1.13.5 matrixStats_0.1.6 [10] R.rsp_0.3.6R.filesets_0.5.3 digest_0.4.1 [13] R.cache_0.2.0 R.utils_1.2.2 R.oo_1.6.2 [16] affy_1.23.12 R.methodsS3_1.0.3 loaded via a namespace (and not attached): [1] affyio_1.13.5 But it didn't work. In addition, I got the following error when I used plotRLE plotRle(qamTr) Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars $yaxs) : need finite 'ylim' values In addition: There were 16 warnings (use warnings() to see them) warnings() Warning messages: 1: In min(x) : no non-missing arguments to min; returning Inf 2: In max(x) : no non-missing arguments to max; returning -Inf 3: In min(x) : no non-missing arguments to min; returning Inf 4: In max(x) : no non-missing arguments to max; returning -Inf 5: In min(x) : no non-missing arguments to min; returning Inf 6: In max(x) : no non-missing arguments to max; returning -Inf 7: In min(x) : no non-missing arguments to min; returning Inf 8: In max(x) : no non-missing arguments to max; returning -Inf 9: In min(x) : no non-missing arguments to min; returning Inf 10: In max(x) : no non-missing arguments to max; returning -Inf 11: In min(x) : no non-missing arguments to min; returning Inf 12: In max(x) : no non-missing arguments to max; returning -Inf 13: In min(x) : no non-missing arguments to min; returning Inf 14: In max(x) : no non-missing arguments to max; returning -Inf 15: In min(x) : no non-missing arguments to min; returning Inf 16: In max(x) : no non-missing arguments to max; returning -Inf Any idea about how to fix this? Thanks! 
Yu Chuan

--
When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example.

You received this message because you are subscribed to the Google Groups aroma.affymetrix group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
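The work-around in this thread (compute the boxplot statistics once, then redraw them with explicit limits) has a direct analogue in base R graphics, which may help when experimenting outside aroma.affymetrix. The sketch below is only an illustration under stated assumptions: the data are simulated, the object names are made up, and it uses base R's boxplot() and bxp() rather than plotNuse() or plotBoxplotStats().

```r
# Simulated NUSE-like values for 8 arrays, centred near 1 (made-up data,
# not taken from the thread).
set.seed(1)
nuse <- matrix(rnorm(8 * 200, mean = 1, sd = 0.05), ncol = 8,
               dimnames = list(NULL, paste0("array", 1:8)))

# Step 1: compute the boxplot statistics without drawing anything.
z <- boxplot(nuse, plot = FALSE)

# Step 2: draw from the stored statistics with explicit y-axis limits.
pdf(NULL)  # null device, so the sketch also runs non-interactively
bxp(z, ylim = c(0.5, 2), ylab = "NUSE (simulated)")
invisible(dev.off())
```

Separating the computation from the drawing is what makes the y-axis range adjustable after the fact. The "need finite 'ylim' values" error quoted in this thread is what base graphics reports when the values it has to plot contain nothing finite, which is consistent with the min()/max() warnings listed alongside it.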
Re: [aroma.affymetrix] how to analysis the FIRMA score
Hi Jiang.

A couple quick comments below.

On 20-Nov-09, at 10:07 AM, camelbbs wrote:

> Hi,
>
> After I got the FIRMA scores, how can I analyze them? I see a boxplot of FIRMA scores in the paper. How can I get the same result?

You can use the boxplot() command on your matrix of FIRMA scores. That's how the boxplot in the paper was made.

> I want to check the alternative splicing between our several samples. Now I have the score for each one, but how do I compare them?

You've read the paper, so you'll know that extreme FIRMA scores (i.e. large positive/negative) represent putative differential splicing events. So, in general, you are looking for large (in magnitude) values. If you have replicates, maybe you want to look for significant changes in FIRMA scores between groups.

Hope that helps.

Cheers,
Mark

> Thanks very much.
> Jiang
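As a rough illustration of the two suggestions in this reply (boxplot the score matrix, then look for values that are large in magnitude), here is a minimal base R sketch. Everything in it is an assumption made for illustration: the matrix is simulated, it is treated as already being on the log2 scale, and the cutoff of 4 is arbitrary rather than a recommended threshold.

```r
# Simulated matrix standing in for log2 FIRMA scores:
# 1000 exons (rows) by 4 samples (columns). Made-up data.
set.seed(42)
fs <- matrix(rnorm(1000 * 4), ncol = 4,
             dimnames = list(paste0("exon", 1:1000), paste0("sample", 1:4)))
fs["exon10", "sample3"] <- 6  # one planted extreme value for the demo

# Per-sample boxplot of the scores.
pdf(NULL)  # null device, so the sketch also runs non-interactively
boxplot(fs, ylab = "log2 FIRMA score (simulated)")
invisible(dev.off())

# Flag exon/sample pairs whose score is large in magnitude.
cutoff <- 4  # arbitrary illustrative threshold
hits <- which(abs(fs) > cutoff, arr.ind = TRUE)
print(hits)
```

With replicates, the same matrix could instead be passed to a between-group test, as the reply suggests.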
extracting FIRMA scores (Was: [aroma.affymetrix] a question)
Hi Jiang.

It's helpful to use meaningful subject headers, so that others can search the mailing lists. So, I've moved your messages to a new thread. Comments below.

> Hi,
>
> I have a question; can you help me? It's about using FIRMA. I cannot get the result after I run FIRMA. I only found the score.CEL files, but they cannot be opened. Could you give me some suggestions?
>
> Thanks very much.
>
> Best,
> Jiang

You shouldn't need to deal directly with the CEL files. You can access the FIRMA scores by using extractDataFrame(). For example:

    [...]
    firma <- FirmaModel(plm)
    fit(firma, verbose=verbose)
    fs <- getFirmaScores(firma)
    fsDf <- extractDataFrame(fs)

The 'fsDf' should be a data.frame object containing the FIRMA scores, assuming you've run all the previous steps mentioned in the vignette:

http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis

> Hi,
>
> When I run FIRMA, I get the FIRMAscores.CEL files, but I cannot open them as text.

They are binary files, but as mentioned above, you shouldn't need to deal with them directly.

> So how can I check my FIRMA result? The result folder is empty.
>
> Also, if I want to add the sample group info to the exon array, what is the format? I wrote it like this:
>
> sample1 group1
> sample2 group1
> sample3 group2
> sample4 group3
>
> and put it in ..\annotationData\samples\. Is that right?
>
> Thanks,
> Jiang

FIRMA is a single sample method, in the sense that it doesn't need replicates. Therefore, it does not need this information.

Cheers,
Mark
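Since the question was about opening the scores as text, the following sketch shows one way a data.frame like 'fsDf' could be written to a tab-delimited file with base R. It is a minimal sketch under assumptions: the data.frame here is a small hand-made stand-in (the column layout of unit, group, cell and one column per sample mirrors output posted elsewhere on this list, and the values and sample names are invented), and the output path is a temporary file.

```r
# Hand-made stand-in for the data.frame returned by extractDataFrame();
# values and sample names are invented for illustration.
fsDf <- data.frame(
  unit    = c(1L, 1L, 2L),
  group   = c(1L, 2L, 1L),
  cell    = 1:3,
  sampleA = c(0.46, 0.61, 1.66),
  sampleB = c(0.28, 0.92, 0.19)
)

# Write the scores as plain tab-delimited text, which any editor or
# spreadsheet program can open.
out <- tempfile(fileext = ".txt")
write.table(fsDf, file = out, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the round trip.
back <- read.delim(out)
print(back)
```

Sample group labels, if needed for downstream comparisons, can simply be kept in an ordinary R vector or data.frame alongside the scores, since FIRMA itself does not use them.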
Re: [aroma.affymetrix] FIRMAGene with masked CEL files
[aroma.affymetrix] Re: Discussion on using-the-genomegraphs-package-with-firma
Hi Lasse.

Yes, the 'plm' mentioned on that page should be the 'plmTr' object (with mergeGroups=TRUE as you have) mentioned in the HuEx vignette:

http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis

Hope that helps.

Cheers,
Mark

On 26-Oct-09, at 9:20 PM, Lasse wrote:

> Thanks for sharing your code. I don't really understand - when getting the PLM for this instruction are you supposed to set mergeGroups to true like this:
>
>     plm <- ExonRmaPlm(celSet, mergeGroups=TRUE)
>
> -L
[aroma.affymetrix] Re: error in exon array analysis: fit(plmTr, verbose=verbose)
Hi Elizabeth.

Indeed, this is certainly an R 2.10 problem, related to changes in 'preprocessCore' ...

preprocessCore v 1.7.9 (BioC 2.5, R 2.10.x):

    SEXP R_rlm_rma_default_model(SEXP Y, SEXP PsiCode, SEXP PsiK, SEXP Scales){

preprocessCore v 1.6.0 (BioC 2.4, R 2.9.x):

    SEXP R_rlm_rma_default_model(SEXP Y, SEXP PsiCode, SEXP PsiK){

The fix is probably quite easy, but we'll need to update this (I haven't begun my migration to 2.10 yet ...). And, surely other routines will be affected.

Henrik: are you planning a release in the near future that works with R 2.10?

Cheers,
Mark

On 27-Oct-09, at 11:35 AM, Elizabeth Purdom wrote:

> Hi Mark,
>
> I am running into the same problem with 2.10.0 that I just installed (see session info below). It appears to be a problem with the preprocessCore function that does the RMA fit having changed its format. In which case, I think this is a more general problem (i.e. not just for the exon array).
>
> Best,
> Elizabeth
>
> 20091026 17:32:55| Identifying non-fitted units in chip-effect file...done
> 20091026 17:32:55| Identifying non-estimated units...done
> 20091026 17:32:55| Getting model fit for 23885 units.
> Loading required package: preprocessCore
> simpleError in .Call("R_rlm_rma_default_model", y, psiCode, psiK, PACKAGE = rlmPkg): Incorrect number of arguments (3), expecting 4 for R_rlm_rma_default_model
> Error in list(`fit(plmColonCCL2run, verbose = verbose)` = <environment>, :
> [2009-10-26 17:32:55] Exception: The fit function for requested exon RMA PLM failed
>   at throw(Exception(...))
>   at throw.default("The fit function for requested exon RMA PLM failed")
>   at throw("The fit function for requested exon RMA PLM failed")
>   at getFitUnitGroupFunction.ExonRmaPlm(this, ...)
>   at getFitUnitGroupFunction(this, ...)
at getFitUnitFunction.MultiArrayUnitModel(this) at getFitUnitFunction(this) at fit.ProbeLevelModel(plmColonCCL2run, verbose = verbose) at fit(plmColonCCL2run, verbose = verbose) 20091026 17:32:55|Fitting model of class ExonRmaPlm:...done sessionInfo() R version 2.10.0 (2009-10-26) x86_64-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] preprocessCore_1.7.9 aroma.affymetrix_1.2.0 aroma.apd_0.1.7 affxparser_1.17.5 R.huge_0.2.0 aroma.core_1.2.0 aroma.light_1.13.6 [8] matrixStats_0.1.6 R.rsp_0.3.6R.filesets_0.5.3 digest_0.4.1 R.cache_0.2.0 R.utils_1.2.2 R.oo_1.6.2 [15] R.methodsS3_1.0.3 projectManager_1.0 XML_2.6-0 Mark Robinson wrote: Hi Hailei. For starters, can you give the *full* output of your sessionInfo()? The error you are getting has something to do with the 'preprocessCore' package and I first want to check whether it is a package version mismatch error. I haven't used aroma.affymetrix on R 2.10 and I don't know if anyone else has either. You could try all this on R 2.9.2 ... Cheers, Mark Dear All, I am analyzing human affy exon arrays for first time. I followed the steps listed in website: http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis In summarization step, I met an error when I began to fit the PLM to all of the data. 
Thanks Hailei my R session: sessionInfo() R version 2.10.0 Under development (unstable) (2009-08-10 r49148) x86_64-unknown-linux-gnu My script: library(aroma.affymetrix) verbose - Arguments$getVerbose(-8,timestamp=TRUE) chipType - HuEx-1_0-st-v2 cdf - AffymetrixCdfFile$byChipType(chipType, tags=coreR3,A20071112,EP) print(cdf) cs - AffymetrixCelSet$byName(tissues,cdf=cdf) print(cs) bc - RmaBackgroundCorrection(cs, tag=coreR3) csBC - process(bc,verbose=verbose) qn - QuantileNormalization(csBC, typesToUpdate=pm) print(qn) csN - process(qn, verbose=verbose) plmTr - ExonRmaPlm(csN, mergeGroups=TRUE) print(plmTr) fit(plmTr,verbose=verbose) The error is: fit(plmTr, verbose=verbose) 20091015 15:49:54|Fitting model of class ExonRmaPlm:... ExonRmaPlm: Data set: tissues Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP Input tags: coreR3,QN Output tags: coreR3,QN,RMA,merged Parameters: (probeModel: chr pm; shift: num 0; flavor: chr affyPLM; treatNAsAs: chr weights; mergeGroups: logi TRUE). Path: plmData/tissues,coreR3,QN,RMA,merged/HuEx-1_0-st-v2 RAM: 0.01MB 20091015 15:49:54| Identifying non-estimated units... 20091015 15:49:54| Identifying non-fitted units in chip-effect file... 20091015 15:49:54| Pathname: plmData/tissues,coreR3,QN,RMA,merged/ HuEx-1_0-st-v2/RD2009092837,chipEffects.CEL 20091015 15:49:54| Found indices cached on file 20091015 15:49:54| Reading data for these 18708 cells... 20091015 15:49:54| Reading data for these 18708 cells...done 20091015 15:49:54| Looking for stdvs = 0 indicating non-estimated units: int [1:18708] 1 2 3 4 5 6 7
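The failure discussed in this thread comes down to a native routine whose argument count differs between two preprocessCore releases (three arguments in v1.6.0, four in v1.7.9, per the signatures quoted in the reply). A version check of the following kind is one way to see which case an installation falls into before fitting. This is a hypothetical helper, not part of aroma.affymetrix or preprocessCore; in particular, the assumption that every release up to 1.6.0 takes three arguments and every release from 1.7.9 onward takes four goes beyond what the thread states, and versions in between are reported as unknown.

```r
# Hypothetical helper: map a preprocessCore version string to the number
# of arguments R_rlm_rma_default_model is known (from this thread) to take.
# Assumption: <= 1.6.0 behaves like 1.6.0 and >= 1.7.9 behaves like 1.7.9.
known_rlm_args <- function(version) {
  v <- as.character(version)
  if (utils::compareVersion(v, "1.6.0") <= 0) return(3L)
  if (utils::compareVersion(v, "1.7.9") >= 0) return(4L)
  NA_integer_  # not covered by the two signatures quoted in the thread
}

print(known_rlm_args("1.6.0"))
print(known_rlm_args("1.7.9"))
print(known_rlm_args("1.7.0"))
```

On a real installation the version string could be obtained with utils::packageVersion("preprocessCore"), assuming that package is installed.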
[aroma.affymetrix] Re: selection of CDF and PA call for exon array
Hi Yong.

Comments below.

On 21-Oct-09, at 9:11 AM, Yong wrote:

> Dear All,
>
> I am a newbie to the aroma.affymetrix package. After reading the user guide and the Human exon array analysis case study, I was able to process my mouse exon array data and generate gene-level expression intensities. But I still have two questions:
>
> 1. There is one Ensembl gene based mouse CDF file available, http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st . Also, http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/11.0.1/ensg.asp provides another one based on Ensembl genes. I am wondering which one is better for generation of gene-level summarization.

That depends on what you mean by, and how you define, "better". These different CDFs are just rearrangements of what probes go in what probesets, based on the annotation sources. From what I recall, you won't be able to use the brainarray CDFs for FIRMA analysis, since that requires the CDF to be stored in a hierarchical way (exon probesets within gene probesets). I would also guess that if those 2 CDFs are based on the same Ensembl gene build, then the gene-level summarization would essentially be the same.

> 2. Is it possible to generate gene-level PA (presence/absence) calls for exon arrays?

As far as I know, there is no implementation in aroma.affymetrix for this.

Cheers,
Mark

> Many thanks ahead.
>
> Yong Zhang
> Ph.D, Research Scholar
> Manyuan Long's Lab
> University of Chicago
[aroma.affymetrix] Re: FIRMA score and limma analysis
Hi Hailei.

This has been discussed before as well. You might start with:

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/36d8c59d742fc503/
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/f4d015cae1848f51/

You should definitely take log2 before proceeding with FIRMA scores. Since they are a summary of residuals for a probeset (see the paper for details), you will have positive and negative numbers.

Cheers,
Mark

On 17-Oct-09, at 3:46 AM, hailei@gmail.com wrote:

> Dear All,
>
> I have 2 tumor samples and 2 control samples and want to find alternative splicing. After I got the FIRMA score, could I use limma to find the alternative splicing? Before using limma, I did log2 scale to FIRMA score. But I found there are a lot of negative values in the data set. Is it common?
>
> Original FIRMA score:
>
> head(exFirma)
>   unitName groupName unit group cell RD2009092839 RD2009092840 RD2009092841
> 1  6838637   4304927    1     1    1    0.4586444    0.2794902    1.8639828
> 2  6838637   4330595    1     2    2    0.6102609    0.9152724    1.3151890
> 3  6838637   4356771    1     3    3    1.6570473    0.1941053    1.3088931
> 4  6838637   4366326    1     4    4    0.7417230    1.8255262    0.7095107
> 5  6838637   4367951    1     5    5    1.0001453    1.3591065    0.6875097
> 6  6838637   4396376    1     6    6    1.4089704    0.4405679    1.1440083
>   RD2009092842
> 1    1.3710542
> 2    1.3612753
> 3    0.8766123
> 4    0.6292946
> 5    1.2785655
> 6    0.6761603
>
> log2 score:
>
> head(exFirma)
>   unitName groupName unit group cell  RD2009092839 RD2009092840 RD2009092841
> 1  6838637   4304927    1     1    1 -1.1245521013   -1.8391304    0.8983885
> 2  6838637   4330595    1     2    2 -0.7125019273   -0.1277269    0.3952701
> 3  6838637   4356771    1     3    3  0.7286147599   -2.3650883    0.3883473
> 4  6838637   4366326    1     4    4 -0.4310475869    0.8683124   -0.4951036
> 5  6838637   4367951    1     5    5  0.0002096316    0.4426586   -0.5405480
> 6  6838637   4396376    1     6    6  0.4946412584   -1.1825638    0.1940975
>   RD2009092842
> 1    0.4552856
> 2    0.4449589
> 3   -0.1899892
> 4   -0.6681926
> 5    0.3545261
> 6   -0.5645628
>
> Thanks
> Hailei
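To make the point about negative values concrete, the sketch below takes the six rows of scores posted in this thread, applies log2, and computes a simple per-row difference in group means. It is only an illustration: the row labels are made up, and the assignment of the first two arrays to "tumor" and the last two to "control" is an assumption, because the post does not say which sample is which. No limma call is made; the log2 matrix is simply the kind of input a linear-model tool would be given.

```r
# The six rows of FIRMA scores posted in the thread (original scale).
samples <- c("RD2009092839", "RD2009092840", "RD2009092841", "RD2009092842")
scores <- matrix(c(
  0.4586444, 0.2794902, 1.8639828, 1.3710542,
  0.6102609, 0.9152724, 1.3151890, 1.3612753,
  1.6570473, 0.1941053, 1.3088931, 0.8766123,
  0.7417230, 1.8255262, 0.7095107, 0.6292946,
  1.0001453, 1.3591065, 0.6875097, 1.2785655,
  1.4089704, 0.4405679, 1.1440083, 0.6761603
), ncol = 4, byrow = TRUE,
  dimnames = list(paste0("row", 1:6), samples))  # row labels are made up

# Scores below 1 become negative on the log2 scale, so negative values
# after the transformation are expected.
logScores <- log2(scores)
print(round(logScores, 4))

# ASSUMPTION: first two arrays are tumor, last two are control.
isTumor <- c(TRUE, TRUE, FALSE, FALSE)
delta <- rowMeans(logScores[, isTumor]) - rowMeans(logScores[, !isTumor])
print(round(delta, 4))
```

With only two arrays per group, any such difference is descriptive at best.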
[aroma.affymetrix] Re: error in exon array analysis: fit(plmTr, verbose=verbose)
Hi Hailei.

For starters, can you give the *full* output of your sessionInfo()? The error you are getting has something to do with the 'preprocessCore' package and I first want to check whether it is a package version mismatch error.

I haven't used aroma.affymetrix on R 2.10 and I don't know if anyone else has either. You could try all this on R 2.9.2 ...

Cheers,
Mark

> Dear All,
>
> I am analyzing human Affy exon arrays for the first time. I followed the steps listed on the website:
> http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis
>
> In the summarization step, I met an error when I began to fit the PLM to all of the data.
>
> Thanks
> Hailei
>
> my R session:
>
> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-10 r49148)
> x86_64-unknown-linux-gnu
>
> My script:
>
>     library(aroma.affymetrix)
>     verbose <- Arguments$getVerbose(-8,timestamp=TRUE)
>     chipType <- "HuEx-1_0-st-v2"
>     cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
>     print(cdf)
>     cs <- AffymetrixCelSet$byName("tissues",cdf=cdf)
>     print(cs)
>     bc <- RmaBackgroundCorrection(cs, tag="coreR3")
>     csBC <- process(bc,verbose=verbose)
>     qn <- QuantileNormalization(csBC, typesToUpdate="pm")
>     print(qn)
>     csN <- process(qn, verbose=verbose)
>     plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
>     print(plmTr)
>     fit(plmTr,verbose=verbose)
>
> The error is:
>
> fit(plmTr, verbose=verbose)
> 20091015 15:49:54|Fitting model of class ExonRmaPlm:...
> ExonRmaPlm:
> Data set: tissues
> Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP
> Input tags: coreR3,QN
> Output tags: coreR3,QN,RMA,merged
> Parameters: (probeModel: chr "pm"; shift: num 0; flavor: chr "affyPLM"; treatNAsAs: chr "weights"; mergeGroups: logi TRUE).
> Path: plmData/tissues,coreR3,QN,RMA,merged/HuEx-1_0-st-v2
> RAM: 0.01MB
> 20091015 15:49:54| Identifying non-estimated units...
> 20091015 15:49:54| Identifying non-fitted units in chip-effect file...
> 20091015 15:49:54| Pathname: plmData/tissues,coreR3,QN,RMA,merged/HuEx-1_0-st-v2/RD2009092837,chipEffects.CEL
> 20091015 15:49:54| Found indices cached on file
> 20091015 15:49:54| Reading data for these 18708 cells...
> 20091015 15:49:54| Reading data for these 18708 cells...done
> 20091015 15:49:54| Looking for stdvs = 0 indicating non-estimated units:
> int [1:18708] 1 2 3 4 5 6 7 8 9 10 ...
> 20091015 15:49:54| Identifying non-fitted units in chip-effect file...done
> 20091015 15:49:54| Identifying non-estimated units...done
> 20091015 15:49:54| Getting model fit for 18708 units.
> simpleError in .Call("R_rlm_rma_default_model", y, psiCode, psiK, PACKAGE = rlmPkg): Incorrect number of arguments (3), expecting 4 for R_rlm_rma_default_model
> Error in list(`fit(plmTr, verbose = verbose)` = <environment>, `fit.ProbeLevelModel(plmTr, verbose = verbose)` = <environment>, :
> [2009-10-15 15:49:54] Exception: The fit function for requested exon RMA PLM failed
>   at throw(Exception(...))
>   at throw.default("The fit function for requested exon RMA PLM failed")
>   at throw("The fit function for requested exon RMA PLM failed")
>   at getFitUnitGroupFunction.ExonRmaPlm(this, ...)
>   at getFitUnitGroupFunction(this, ...)
>   at getFitUnitFunction.MultiArrayUnitModel(this)
>   at getFitUnitFunction(this)
>   at fit.ProbeLevelModel(plmTr, verbose = verbose)
>   at fit(plmTr, verbose = verbose)
> 20091015 15:49:54|Fitting model of class ExonRmaPlm:...done
[aroma.affymetrix] Re: error in exon analysis
Hi Hailei.

This has come up recently:

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/94a7343c92e54946

... and what happened there is that the:

    fit(plmTr, verbose=verbose)

didn't get run. Can you check this, perhaps from a fresh R session?

Cheers,
Mark

ps. I think we are pushing the limits for RMA/FIRMA by running this on just 3 samples.

On 16-Oct-09, at 8:55 AM, hailei@gmail.com wrote:

> my fs shows:
>
> head(fs$files)
> [[1]]
> FirmaFile:
> Name: RD2009092835
> Tags: FIRMAscores
> Full name: RD2009092835,FIRMAscores
> Pathname: firmaData/tissues,coreR3,QN,RMA,merged,FIRMA,medres/HuEx-1_0-st-v2/RD2009092835,FIRMAscores.CEL
> File size: 2.72 MB (2846934 bytes)
> RAM: 104.76 MB
> File format: v4 (binary; XDA)
> Platform: Affymetrix
> Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP,monocell
> Timestamp: 2009-10-15 14:55:04
>
> [[2]]
> FirmaFile:
> Name: RD2009092836
> Tags: FIRMAscores
> Full name: RD2009092836,FIRMAscores
> Pathname: firmaData/tissues,coreR3,QN,RMA,merged,FIRMA,medres/HuEx-1_0-st-v2/RD2009092836,FIRMAscores.CEL
> File size: 2.72 MB (2846934 bytes)
> RAM: 0.00 MB
> File format: v4 (binary; XDA)
> Platform: Affymetrix
> Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP,monocell
> Timestamp: 2009-10-15 14:55:04
>
> [[3]]
> FirmaFile:
> Name: RD2009092837
> Tags: FIRMAscores
> Full name: RD2009092837,FIRMAscores
> Pathname: firmaData/tissues,coreR3,QN,RMA,merged,FIRMA,medres/HuEx-1_0-st-v2/RD2009092837,FIRMAscores.CEL
> File size: 2.72 MB (2846934 bytes)
> RAM: 0.00 MB
> File format: v4 (binary; XDA)
> Platform: Affymetrix
> Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP,monocell
> Timestamp: 2009-10-15 14:55:04
>
> On Oct 15, 5:44 pm, hailei zhang hailei@gmail.com wrote:
>
> > Dear All,
> >
> > I want to get FIRMA scores for my exon array. But after I type this command:
> >
> >     exFirma <- extractDataFrame(fs,addNames=TRUE,units=NULL)
> >
> > My FIRMA scores are all NaN:
> >
> > head(exFirma)
> >   unitName groupName unit group cell RD2009092835 RD2009092836 RD2009092837
> > 1  2315251   2315252    1     1    1          NaN          NaN          NaN
> > 2  2315251   2315253    1     2    2          NaN          NaN          NaN
> > 3  2315373   2315374    2     1    3          NaN          NaN          NaN
> > 4  2315373   2315375    2     2    4          NaN          NaN          NaN
> > 5  2315373   2315376    2     3    5          NaN          NaN          NaN
> > 6  2315373   2315377    2     4    6          NaN          NaN          NaN
> >
> > The following is my script:
> >
> >     library(aroma.affymetrix)
> >     verbose <- Arguments$getVerbose(-8,timestamp=TRUE)
> >     chipType <- "HuEx-1_0-st-v2"
> >     cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
> >     print(cdf)
> >     cs <- AffymetrixCelSet$byName("tissues",cdf=cdf)
> >     print(cs)
> >     bc <- RmaBackgroundCorrection(cs, tag="coreR3")
> >     csBC <- process(bc,verbose=verbose)
> >     qn <- QuantileNormalization(csBC, typesToUpdate="pm")
> >     print(qn)
> >     csN <- process(qn, verbose=verbose)
> >     plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> >     fit(plmTr,verbose=verbose)
> >     rs <- calculateResiduals(plmTr, verbose=verbose)
> >     firma <- FirmaModel(plmTr)
> >     fit(firma, verbose=verbose)
> >     fs <- getFirmaScores(firma)
> >     exFirma <- extractDataFrame(fs,addNames=TRUE,units=NULL)
> >     savehistory(file="Firma_score.commond")
> >
> > My session information:
> >
> > sessionInfo()
> > R version 2.9.2 (2009-08-24)
> > x86_64-unknown-linux-gnu
> >
> > locale:
> > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > other attached packages:
> > [1] preprocessCore_1.6.0 aroma.affymetrix_1.2.0 aroma.apd_0.1.6
> > [4] affxparser_1.16.0 R.huge_0.1.9 aroma.core_1.2.0
> > [7] aroma.light_1.12.2 matrixStats_0.1.6 R.rsp_0.3.6
> > [10] R.filesets_0.5.3 digest_0.4.1 R.cache_0.1.9
> > [13] R.utils_1.2.0 R.oo_1.5.0 R.methodsS3_1.0.3
> >
> > loaded via a namespace (and not attached):
> > [1] tools_2.9.2
> >
> > class(fs)
> > [1] "FirmaSet" "ParameterCelSet" "AffymetrixCelSet"
> > [4] "AffymetrixFileSet" "AromaPlatformInterface" "AromaMicroarrayDataSet"
> > [7] "GenericDataFileSet" "Object"
> >
> > Thanks.
> > Hailei
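A quick sanity check on the extracted data.frame can make the all-NaN situation in this thread obvious before any downstream analysis. The sketch below rebuilds the six rows posted above by hand (it does not call aroma.affymetrix) and tests whether every score column is entirely NaN. The choice of which columns count as annotation is an assumption based on the column names in the posted output.

```r
# Hand-built copy of the head(exFirma) output posted in this thread.
exFirma <- data.frame(
  unitName  = c(2315251, 2315251, 2315373, 2315373, 2315373, 2315373),
  groupName = c(2315252, 2315253, 2315374, 2315375, 2315376, 2315377),
  unit      = c(1, 1, 2, 2, 2, 2),
  group     = c(1, 2, 1, 2, 3, 4),
  cell      = 1:6,
  RD2009092835 = NaN,
  RD2009092836 = NaN,
  RD2009092837 = NaN
)

# ASSUMPTION: these five columns are annotation; all others hold scores.
annotationCols <- c("unitName", "groupName", "unit", "group", "cell")
scoreCols <- setdiff(names(exFirma), annotationCols)

# TRUE for each sample whose scores are entirely NaN.
allNaN <- vapply(exFirma[scoreCols], function(x) all(is.nan(x)), logical(1))
print(allNaN)

if (all(allNaN)) {
  message("Every score is NaN; check that the upstream model fit was run.")
}
```

When every column is flagged, the problem is more likely upstream (for example, the probe-level model fit not having been run, as suggested in the reply) than in the extraction step itself.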
[aroma.affymetrix] Re: Exception: Unknown arguments: cdf, checkChipType
arguments: , argsStr) at throw(Unknown arguments: , argsStr) at GenericDataFileSet(files = files, ...) at extend(GenericDataFileSet(files = files, ...), AromaMicroarrayDataSet) at AromaMicroarrayDataSet(files = files, ...) at extend(AromaMicroarrayDataSet(files = files, ...), c (AffymetrixFileSet, uses(AromaPlatformI at AffymetrixFileSet(files = files, ...) at extend(AffymetrixFileSet(files = files, ...), AffymetrixCelSet, `cached:.intensities` = NULL at this(...) at newInstance.Class(clazz, ...) at newInstance(clazz, ...) at newInstance.Object(static, files, ...) at newInstance(static, files, ...) at method(static, ...) at staticMethod(path = probeData/MGH09,RBC/MoGene-1_0-st-v1, pattern = ^[^.].*[.](CEL|cel)$, at do.call(staticMethod, args = args) at getOutputDataSet0.AromaTransform(this, ..., verbose Background correcting data set...done 3. QuantileNormalization(cs) Error: qn -QuantileNormalization(cs) qn Error in list(`print(NA)` = environment, `print.Object(NA)` = environment, : [2009-10-12 11:33:42] Exception: Unknown arguments: cdf, checkChipType at throw(Exception(...)) at throw.default(Unknown arguments: , argsStr) at throw(Unknown arguments: , argsStr) at GenericDataFileSet(files = files, ...) at extend(GenericDataFileSet(files = files, ...), AromaMicroarrayDataSet) at AromaMicroarrayDataSet(files = files, ...) at extend(AromaMicroarrayDataSet(files = files, ...), c (AffymetrixFileSet, u at AffymetrixFileSet(files = files, ...) at extend(AffymetrixFileSet(files = files, ...), AffymetrixCelSet, `cached:. at this(...) at newInstance.Class(clazz, ...) at newInstance(clazz, ...) at newInstance.Object(static, files, ...) at newInstance(static, files, ...) at method(static, ...) at staticMethod(path = probeData/MGH09,QN/MoGene-1_0-st-v1, pattern = ^[^.] 
at do.call(staticMethod, args = args)
[aroma.affymetrix] Re: AffymetrixCelSet, Could not locate a file for this chip type
Hi Carol.

This throws an error because your CDF has a tag (r3), and there is no way to pass a CDF tag through the AffymetrixCelSet$byName() function. I recommend that you just do this in two commands, as you've done successfully. Alternatively, you can remove the ",r3" from your CDF file name to drop the tag, and then that single command should work.

Cheers,
Mark

Hi all,

I'm just starting to analyze expression data from the MoGene-1_0-st-v1 chip using aroma.affymetrix, but I'm running into an error when creating a cs object that I don't really understand.

DATA SET UP: I started by reading the following to learn about setting up the data files: http://groups.google.com/group/aroma-affymetrix/web/users-guide

Here is my basic set up:

  \annotationData\chipTypes\MoGene-1_0-st-v1\MoGene-1_0-st-v1,r3.cdf
  \rawData\MGH09\MoGene-1_0-st-v1\lots of CEL files

WHAT WORKS:

  cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")
  print(cdf)

  AffymetrixCdfFile:
  Path: annotationData/chipTypes/MoGene-1_0-st-v1
  Filename: MoGene-1_0-st-v1,r3.cdf
  Filesize: 67.42MB
  Chip type: MoGene-1_0-st-v1,r3
  RAM: 0.00MB
  File format: v3 (text; ASCII)
  Dimension: 1050x1050
  Number of cells: 1102500
  Number of units: 35512
  Cells per unit: 31.05
  Number of QC units: 1

  cs <- AffymetrixCelSet$byName("MGH09", cdf=cdf)
  print(cs)

  AffymetrixCelSet:
  Name: MGH09
  Tags:
  Path: rawData/MGH09/MoGene-1_0-st-v1
  Platform: Affymetrix
  Chip type: MoGene-1_0-st-v1,r3
  Number of arrays: 26
  Names: J001_3_11.5, J001_4_11.5, ..., J010_4_16.5
  Time period: 2009-09-23 17:51:49 -- 2009-09-23 21:45:31
  Total file size: 275.06MB
  RAM: 0.02MB

WHAT DOESN'T WORK: What I tried first, and what I've found in other tutorials, is the following:

  cs <- AffymetrixCelSet$byName(data name, tags, chipType=chipType)

which I translated to:

  cs <- AffymetrixCelSet$byName("MGH09", chipType="MoGene-1_0-st-v1")

But when I run this I get the error listed below. I'm wondering why one approach works just fine but this one doesn't.

  Error in list(`AffymetrixCelSet$byName("MGH09", chipType = "MoGene-1_0-st-v1")` = environment, :
  [2009-10-10 06:55:28] Exception: Could not locate a file for this chip type: MoGene-1_0-st-v1
    at throw(Exception(...))
    at throw.default("Could not locate a file for this chip type: ", paste(c(chipType, tags), collapse = ","))
    at throw("Could not locate a file for this chip type: ", paste(c(chipType, tags), collapse = ","))
    at method(static, ...)
    at AffymetrixCdfFile$byChipType(chipType, nbrOfCells = nbrOfCells)
    at fromFiles.AffymetrixCelSet(static, path = path, cdf = cdf, ...)
    at fromFiles(static, path = path, cdf = cdf, ...)
    at withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
    at suppressWarnings({
    at method(static, ...)
    at AffymetrixCelSet$byName("MGH09", chipType = "MoGene-1_0-st-v1")

--
When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups aroma.affymetrix group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
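The traceback above shows the lookup name being built with paste(c(chipType, tags), collapse = ","). As a rough base-R sketch of why the tag matters (this only mimics the naming convention visible in the error message; it does not call aroma.affymetrix, and the variable names are made up for illustration):

```r
# Sketch only: mimics the comma-separated "name,tags" convention seen in the
# traceback; it does not touch the file system or aroma.affymetrix itself.
chipType <- "MoGene-1_0-st-v1"
tags     <- "r3"

# With the tag, the expected CDF file name matches the file on disk.
fullname <- paste(c(chipType, tags), collapse = ",")
cdfFile  <- paste0(fullname, ".cdf")
print(cdfFile)

# Without the tag, the name points at a file that is not in the directory.
cdfFileNoTag <- paste0(paste(c(chipType, NULL), collapse = ","), ".cdf")
print(cdfFileNoTag)

# Splitting a full name back into its name and tag parts.
parts <- strsplit(fullname, ",", fixed = TRUE)[[1]]
```

Under this reading, byName(..., chipType=...) fails simply because no tag is supplied, so the untagged file name is searched for.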
[aroma.affymetrix] Re: exon array analysis
Hi Enid.

I was unable to reproduce your problem. I ran the script below from a fresh session on the Affy tissues dataset using aroma.affymetrix 1.2.0, and I get results. I don't think I've run FIRMA on as few as 4 samples, as your example suggests, but in theory that shouldn't be the problem. Can you post your full script and give your sessionInfo()?

Cheers,
Mark

---
library(aroma.affymetrix)

# setup
verbose <- Arguments$getVerbose(-20, timestamp=TRUE)
chipType <- "HuEx-1_0-st-v2"
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)

# BG adjust + QN
bc <- RmaBackgroundCorrection(cs, tag="coreR2")
csBC <- process(bc, verbose=verbose)
qn <- QuantileNormalization(csBC, typesToUpdate="pm")
csN <- process(qn, verbose=verbose)

# fit PLM, FIRMA
plm <- ExonRmaPlm(csN, mergeGroups=TRUE)
rs <- calculateResiduals(plm, verbose=verbose)
firma <- FirmaModel(plm)
fit(firma, verbose=verbose)
fs <- getFirmaScores(firma)
firmascore <- extractDataFrame(fs)
---

On 5-Oct-09, at 7:50 AM, Enid wrote:

Dear all,

I am analysing a set of exon array data and have been following the human exon array analysis vignette, but I am having trouble getting the FIRMA scores. After the FIRMA analysis, I get a table full of NaNs.

firma <- FirmaModel(plmTr)
fit(firma, verbose=verbose)
fs <- getFirmaScores(firma)
firmascore <- extractDataFrame(fs)
firmascore[1:10,]

   unit group cell P2008 P2009 P2010 P2011
1     1     1    1   NaN   NaN   NaN   NaN
2     1     2    2   NaN   NaN   NaN   NaN
3     2     1    3   NaN   NaN   NaN   NaN
4     2     2    4   NaN   NaN   NaN   NaN
5     2     3    5   NaN   NaN   NaN   NaN
6     2     4    6   NaN   NaN   NaN   NaN
7     3     1    7   NaN   NaN   NaN   NaN
8     3     2    8   NaN   NaN   NaN   NaN
9     3     3    9   NaN   NaN   NaN   NaN
10    3     4   10   NaN   NaN   NaN   NaN

but the transcript summarisation seems to be ok.

readUnits(plmTr, unit=1)

$`2315251`
$`2315251`[[1]]
$`2315251`[[1]]$intensities
          [,1]      [,2]      [,3]      [,4]
[1,]  6.418280 51.159870  5.914537  8.396150
[2,] 11.297100  7.589603 11.975643  8.396150
[3,]  9.184888 12.899874 83.409866 44.409882
[4,]  8.778770  5.476091 82.909866  5.136143
[5,]  4.446828  6.080955 11.454535  4.062980
[6,]  5.914537  4.550271  7.589603  3.849267
[7,]  8.038540  3.542727  3.808023  7.923751
[8,] 13.260725  6.238900  4.660217  4.200828

I am not sure what I'm doing wrong and would really appreciate any help. Thank you very much in advance,

Enid

--
Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
[aroma.affymetrix] Re: QC of probe level model.
Hi Cathy. Comments below.

On 29-Sep-09, at 10:11 PM, CathyMitchell wrote:

Hi all, I am using the gene st array and would like to know a couple of things about the probe level model. After doing the RmaPlm one can do two types of QC, the NUSE and RLE plots. These however compare results for each array. I would like to be able to have a look at the individual genes/probes (be able to flag up problem genes/probes). Is there a way to plot these?

For plotting individual genes, you may be interested in this page (and discussion): http://groups.google.com/group/aroma-affymetrix/web/using-the-genomegraphs-package-with-firma

As another approach, you could simply plot the data as line plots (1 line for each sample, 1 point for every probe). For example, you could use extractMatrix() to read your normalized data into a matrix and then use matplot().

Also is there an error reading for the probe level models? like a splice index or something?

I'm not sure exactly what you are after here. Are you looking for standard errors of the probe/chip effects? That information is not stored, although it is calculated by the underlying 'preprocessCore' package. If you are looking at doing something along the lines of differential splicing with Gene 1.0ST arrays, you might be interested in this (apologies for the shameless self-promotion): http://www.biomedcentral.com/1471-2105/10/156

Cheers,
Mark

Thanks, Cathy

--
Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
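The line-plot idea in the reply above (one line per sample, one point per probe) can be sketched in base R as follows. The matrix here is simulated and only stands in for what extractMatrix() would return for the probes of one gene; the probe and array names and the "bad probe" are made up for illustration.

```r
# Sketch with simulated data: rows are probes of one gene, columns are arrays.
set.seed(3)
nProbes <- 12
nArrays <- 4
y <- matrix(rnorm(nProbes * nArrays, mean = 8, sd = 0.3),
            nrow = nProbes,
            dimnames = list(paste0("probe", seq_len(nProbes)),
                            paste0("array", seq_len(nArrays))))

# Hypothetical poorly behaving probe, so there is something to spot.
y[5, ] <- y[5, ] - 3

# matplot() draws one line per column, i.e. one line per array.
matplot(y, type = "b", pch = 1, lty = 1,
        xlab = "Probe (position within gene)", ylab = "log2 intensity")
legend("bottomright", legend = colnames(y), col = 1:nArrays, lty = 1)
```

A probe that sits far from its neighbours on every array, like probe 5 here, stands out immediately in such a plot.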
[aroma.affymetrix] Re: Mat implementation - comparing MAT (pure) vs MAT aroma.affymetrix
Hi Lavinia.

I'm hoping that most of this is explained in the MAT Smoothing section at: http://groups.google.com/group/aroma-affymetrix/web/promoter-tiling-array

In your case, the IPs would be Treatment (you would make + numbers in the design matrix) and Inputs would be Control (- numbers in the design matrix). As an example, if your MAT tag file had samples/type:

  ABCDEF
  000111

you would specify your design matrix something like:

  design <- matrix( rep(c(-1,1),each=3), nc=1,
                    dimnames=list( toupper(letters[1:6]), "A+B+C-D-E-F") )
  design
    A+B+C-D-E-F
  A          -1
  B          -1
  C          -1
  D           1
  E           1
  F           1

Of course, the rows of your design matrix must match the order of the files from your AffymetrixCelSet. Hope that helps.

Cheers,
Mark

On 1-Sep-09, at 2:47 PM, Lavinia wrote:

Thanks Mark, very helpful. Sorry, one other question. With MAT (pure), you group your controls + inputs, e.g. Treatment (1) or Control (0) groups = 111000 for 3 ChIP and 3 Input. How is this best done with MAT aroma.affymetrix (in the contrast matrix)? It isn't immediately clear to me from the example how input and IP are treated.

many thanks
Lavinia.

On Sep 1, 1:24 pm, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Lavinia. Yes. Bandwidth(MAT)=probeWindow(aroma.affymetrix MAT) Also, MinProbe(MAT)=nProbes(aroma.affymetrix MAT)

Cheers,
Mark

On 1-Sep-09, at 11:31 AM, Lavinia Gordon wrote:

Hi, I have some older MAT pure results that I'd like to compare to newer MAT aroma.affymetrix results. Can I just check, from the MAT .tag (parameters file), does probeWindow correspond directly to Bandwidth?

many thanks
Lavinia.

--
Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
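The mapping from a MAT-style 0/1 group string to a +1/-1 design column, as described in the reply above, could be sketched in base R like this. It is only an illustration of the convention (1 = treatment/IP gets +1, 0 = control/input gets -1); the column name "IP-Input" and the variable names are made up.

```r
# Sketch: turn a MAT-style group string into a one-column design matrix.
groups  <- "000111"                 # 1 = Treatment (IP), 0 = Control (Input)
samples <- toupper(letters[1:6])    # A..F, in the same order as the CEL files

flags  <- as.integer(strsplit(groups, "")[[1]])
design <- matrix(ifelse(flags == 1, 1, -1), ncol = 1,
                 dimnames = list(samples, "IP-Input"))
print(design)
```

For Lavinia's own layout (groups = 111000), the same code would give +1 to the first three samples and -1 to the last three; either way the rows have to follow the order of the files in the AffymetrixCelSet.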
[aroma.affymetrix] Re: Mat implementation - comparing MAT (pure) vs MAT aroma.affymetrix
Hi Lavinia. Yes. Bandwidth(MAT)=probeWindow(aroma.affymetrix MAT) Also, MinProbe(MAT)=nProbes(aroma.affymetrix MAT)

Cheers,
Mark

On 1-Sep-09, at 11:31 AM, Lavinia Gordon wrote:

Hi, I have some older MAT pure results that I'd like to compare to newer MAT aroma.affymetrix results. Can I just check, from the MAT .tag (parameters file), does probeWindow correspond directly to Bandwidth?

many thanks
Lavinia.

--
Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
[aroma.affymetrix] Re: Discussion on gene-1-0-st-array-analysis
Hi Diya.

Just to add my 2 cents to this, although Mathieu has pretty much covered it. There are no online tutorials (that I know of) for exactly what you want to do, but your proposed analysis is very standard. lmFit() of limma can work directly on your matrix of logged data. Consult the limma documentation (e.g. the limma user's guide) on how to do this.

Cheers,
Mark

On 29-Aug-09, at 12:13 AM, Diya v wrote:

Hi, I have 2 control and 2 treatment groups of MoGene-1_0-st. I have the data normalized and a data matrix after fit(plm) is performed. I want to do statistical analysis for differentially expressed genes. Can I take the data matrix generated from aroma.affymetrix and do the analysis with limma? Is there any online tutorial for this?

Thanks,
Diya

--- On Fri, 28/8/09, Mathieu Parent parent.math...@gmail.com wrote:

From: Mathieu Parent parent.math...@gmail.com
Subject: [aroma.affymetrix] Re: Discussion on gene-1-0-st-array-analysis
To: aroma-affymetrix@googlegroups.com
Date: Friday, 28 August, 2009, 5:58 PM

Hi, The way it has been proposed to me is to extract the matrix from the normalised and summarised data, log it, and pass it into the LIMMA package for differential expression analysis. What is your experimental design?

Math
McGill University

On Thu, Aug 27, 2009 at 3:46 PM, Diya Vaka biotechd...@gmail.com wrote:

Hello All, I want to know about the up- and down-regulated genes. So how am I supposed to proceed after this step?

Diya

--
Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
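A minimal sketch of the workflow suggested in this thread (extract the matrix, log it, hand it to limma) might look like the following. The expression matrix is simulated and only stands in for the matrix extracted from the fitted PLM; the group labels and gene/array names are made up. The limma step runs only if the limma package is installed.

```r
# Hypothetical 2 control vs 2 treatment design, in the order of the arrays.
group  <- factor(c("control", "control", "treated", "treated"))
design <- model.matrix(~ group)   # columns: (Intercept), grouptreated

# Stand-in for the summarised (non-log) matrix from the PLM: genes x arrays.
set.seed(1)
expr <- matrix(2^rnorm(200 * 4, mean = 8), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("array", 1:4)))
logExpr <- log2(expr)

# limma part: linear model per gene, then moderated t-statistics.
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(logExpr, design)
  fit <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = "grouptreated", number = 5))
}
```

With only two arrays per group, the moderated statistics from eBayes() help, but power will still be limited; the limma user's guide covers design matrices for more complex layouts.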
[aroma.affymetrix] Re: installation fails
Hey Henrik.

If I run those commands, they work for me, but only because I already have digest, etc. installed. To get an error on my R 2.9.1, I removed a few packages (e.g. digest) from my default install directory (/Users/mrobinson/Library/R/2.9/library/), so that those dependencies have to be installed. The error for me appears to be related to not finding the digest package. See below. Does that help?

Mark

---
source("http://www.braju.com/R/hbLite.R");
hbLite("R.filesets");

Using repository: http://www.braju.com/R/repos
Identified packages to be processed: utils, R.methodsS3, methods, R.oo, R.utils, digest, R.filesets
Installing external packages...
Packages: utils, methods, digest
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
Updating packages: utils, methods, digest
01/03. utils: not available.
02/03. methods: not available.
03/03. digest: missing. Installing: simpleError in .find_bundles(available): subscript out of bounds
Installing external packages...done
Installing braju.com packages...
Packages: R.methodsS3, R.oo, R.utils, R.filesets
Detected R option pkgType=mac.binary, which is not available. Enforcing installation from source instead for packages: R.methodsS3, R.oo, R.utils, R.filesets
Updating packages: R.methodsS3, R.oo, R.utils, R.filesets
01/04. R.methodsS3: v1.0.3, i.e. up to date.
02/04. R.oo: v1.4.8, i.e. up to date.
03/04. R.utils: v1.1.7, i.e. up to date.
04/04. R.filesets: missing. Installing:
Warning: dependency ‘digest’ is not available
trying URL 'http://www.braju.com/R/repos/R.filesets_0.5.3.tar.gz'
Content type 'application/x-tar' length 35408 bytes (34 Kb)
opened URL
==
downloaded 34 Kb
* Installing *source* package ‘R.filesets’ ...
** R
** inst
** preparing package for lazy loading
R.methodsS3 v1.0.3 (2008-07-02) successfully loaded. See ?R.methodsS3 for help.
R.oo v1.4.8 (2009-05-18) successfully loaded. See ?R.oo for help.
R.utils v1.1.7 (2009-05-30) successfully loaded. See ?R.utils for help.
Error : package 'digest' required by 'R.filesets' could not be found
ERROR: lazy loading failed for package ‘R.filesets’
* Removing ‘/Users/mrobinson/Library/R/2.9/library/R.filesets’
The downloaded packages are in ‘/private/var/folders/8T/8TpdlpGXGzyAaNQm0Wp-zk++mHo/-Tmp-/Rtmp7MMjod/downloaded_packages’
Installing braju.com packages...done
Warning messages:
1: In hbLite("R.filesets") : Detected R option pkgType=mac.binary, which is not available. Enforcing installation from source instead for packages: R.methodsS3, R.oo, R.utils, R.filesets
2: In install.packages(pkg, lib = lib, ..., available = available) : installation of package 'R.filesets' had non-zero exit status

sessionInfo()
R version 2.9.1 (2009-06-26)
i386-apple-darwin8.11.1
locale: en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tcltk_2.9.1 tools_2.9.1
---

Hi, this seems to be an OSX issue and I cannot reproduce it myself. From the error message:

  simpleError: invalid version specification 1.1.3NA

it looks like there are some parsing errors when parsing version numbers in the PACKAGES file on the repository server, but I'm not sure. See if this is only a problem with the aroma.affymetrix package or with all packages. Can you install individual packages from the braju.com server by:

  source("http://www.braju.com/R/hbLite.R");
  hbLite("R.filesets"); library(R.filesets);

Q1) Does this work?

  hbLite("aroma.core"); library(aroma.core);

Q2) Does this work?

  hbLite("aroma.affymetrix"); library(aroma.affymetrix);

Q3) Does this work?

Mark (Robinson), you mentioned this problem a few weeks ago. Can you help me troubleshoot this one?

/Henrik

2009/7/31 mbaudis mbau...@gmail.com:

Dear Henrik, currently, I have an installation problem after upgrading to R 2.9.1 (Mac OS X 10.5.7):

... everything fine up to here ...
Installing/updating: CRAN:aroma.core (= 1.1.0) Repositories: CRAN Package: aroma.core (= 1.1.0) Tags: Updating packages: aroma.core from repository 'DEFAULT' 01/01. aroma.core: not available. Installing/updating: CRAN:matrixStats (= 0.1.4) Repositories: CRAN Package: matrixStats (= 0.1.4) Tags: Updating packages: matrixStats from repository 'DEFAULT' 01/01. matrixStats: v0.1.6, i.e. up to date. Installing/updating: CRAN:RColorBrewer Repositories: CRAN Package: RColorBrewer Tags: Updating packages: RColorBrewer from repository 'DEFAULT' 01/01. RColorBrewer: v1.0-2, i.e. up to date. Installing/updating: BIOC:aroma.light (= 1.12.2) Repositories: BIOC Package: aroma.light (= 1.12.2) Tags: Package up to date: aroma.light (= 1.12.2) Installing/updating: BIOC:affxparser (= 1.13.8) Repositories: BIOC Package
[aroma.affymetrix] Re: Re-run aroma.affymetrix
Hi Anbarasu. Comments below.

Hi Mark, Thanks for your suggestions. What I have tried so far is: I removed all outlier CEL files from rawData and re-ran the analysis. I was expecting slightly different intensity distributions of chips (due to quantile normalization) but it seems I have the same distributions that I got with all chips, including outliers.

Amongst many chips, I would guess that removing a handful would have very little effect on the overall distribution that each sample is quantile-normalized to. So, this doesn't surprise me. Also, be sure that you run fit() and process() with force=TRUE, otherwise the code *may* be going directly to cached results, regardless of your removal of files.

I will try what you have suggested. Do I need to use extract() for subsetting before or after normalization? Are we ignoring the effect of these outlier chips in the normalization step (if I have to use extract() after normalization)?

I would do it after. And yes, this ignores the effects of outlier chips, which I suspect are minimal over a big dataset.

Cheers,
Mark

Thanks again. Kind regards, Anbarasu

On Thu, Aug 6, 2009 at 10:46 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Anbarasu. No, you don't have to remove all the files. What you can do is use extract() to extract the files that you are interested in, create a new AffymetrixCelSet, and fit the probe level models only on those samples. You do need to be careful, though, and I suggest you use *tags* so that the output results are sent to a different location on disk. Here is an example:

  [...] # preprocessing as before
  csN1 <- extract(csN, 1:12) # take a subset
  plmTr <- ExonRmaPlm(csN1, mergeGroups=TRUE, tag="*,subsetmerged") # add a tag
  fit(plmTr, verbose=verbose) # fit as normal

Hope that helps.

Mark

On 04/08/2009, at 8:22 PM, anbarasu wrote:

Dear All, I was able to run the human exon array analysis with 120 chips. I have identified a few outlier chips and would like to re-run the analysis without these outliers. Do I need to remove all files (in plmData, probeData, and reports) that are created by aroma.affymetrix? Thanks in advance.

Kind regards, Anbarasu

--
Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
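The guess that dropping a handful of arrays out of 120 barely changes the quantile-normalization target can be checked with a small simulation. This is a base-R sketch with made-up numbers (1000 probes, 120 arrays, 5 arrays shifted upward by one log2 unit); it uses the usual "mean of sorted columns" target and does not call aroma.affymetrix.

```r
# Simulated log2 intensities: probes x arrays, with five shifted "outlier" arrays.
set.seed(7)
nProbes  <- 1000
nArrays  <- 120
x        <- matrix(rnorm(nProbes * nArrays, mean = 8), nrow = nProbes)
outliers <- 1:5
x[, outliers] <- x[, outliers] + 1

# Quantile-normalization target: average of the sorted columns.
targetOf <- function(m) rowMeans(apply(m, 2, sort))

targetAll  <- targetOf(x)                 # all 120 arrays
targetKept <- targetOf(x[, -outliers])    # 115 arrays, outliers removed

# Largest difference between the two target distributions (log2 units).
maxShift <- max(abs(targetAll - targetKept))
print(maxShift)
```

In this setup the expected shift is about 5/120 of a log2 unit, which is small compared with the spread of the data, so near-identical distributions after removing the outliers are what one would expect.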
[aroma.affymetrix] Re: RMA of aroma.affymetrix, affyPLM and affy
Hi Yiwen.

I think this thread will answer your question in detail: http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/1b0ab11fad9b4df3

In brief, the main difference is how the probe-level linear model is fit: median polish OR iteratively reweighted least squares with a specified influence function. Note that in aroma.affymetrix, you have the 'flavor' argument in the RmaPlm and ExonRmaPlm objects.

Cheers,
Mark

Hi, Following the "Reproducibility of other implementations / Replication test: RMA (background, normalization, summarization)" section in the aroma.affymetrix online documentation, I tried to compare the RMA summaries of the gene expression index generated by aroma.affymetrix, affyPLM and affy for a public dataset I am studying. I found that the RMA summaries generated by aroma.affymetrix and affyPLM (fitPLM) are highly consistent, while the values from aroma.affymetrix/affyPLM and affy (rma function) are quite different (Pearson correlation is only about 0.97) and there is a significant deviation from a straight line in the scatter plot. I was wondering what the main cause of the discrepancy is between RMA calculated by aroma.affymetrix/affyPLM and by affy (rma function). Thanks a lot.

Yiwen Chen
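To give a feel for the median-polish side of that comparison, here is a small base-R sketch using stats::medpolish() on a made-up probes-by-arrays matrix for one probeset. It contrasts the median-polish chip summaries with plain column means; it is not the affy, affyPLM or aroma.affymetrix code path, and the IRLS fit is not shown. All numbers and names are invented for illustration.

```r
# Made-up additive data for one probeset: log2 value = probe effect + chip effect.
probeEffects <- c(-1, -0.5, 0, 0.5, 1)
chipEffects  <- c(8, 9, 10, 11)
y <- outer(probeEffects, chipEffects, "+")
dimnames(y) <- list(paste0("probe", 1:5), paste0("array", 1:4))

# One hypothetical outlying cell (probe 2 on array 3).
y[2, 3] <- y[2, 3] + 5

# Median polish: chip summary = overall term + column (array) effect.
mp <- medpolish(y, trace.iter = FALSE)
chipMedpolish <- mp$overall + mp$col

# Non-robust alternative for contrast: plain column means.
chipMean <- colMeans(y)

print(rbind(medpolish = chipMedpolish, mean = chipMean))
```

The single outlier leaves the median-polish summaries at the simulated chip effects but pulls the column mean of array 3 upward. Different robust fitting schemes down-weight such cells differently, which is one way two RMA implementations can disagree on the same data.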
[aroma.affymetrix] Re: Re-run aroma.affymetrix
Hi Anbarasu.

No, you don't have to remove all the files. What you can do is use extract() to extract the files that you are interested in, create a new AffymetrixCelSet, and fit the probe level models only on those samples. You do need to be careful, though, and I suggest you use *tags* so that the output results are sent to a different location on disk. Here is an example:

  [...] # preprocessing as before
  csN1 <- extract(csN, 1:12) # take a subset
  plmTr <- ExonRmaPlm(csN1, mergeGroups=TRUE, tag="*,subsetmerged") # add a tag
  fit(plmTr, verbose=verbose) # fit as normal

Hope that helps.

Mark

On 04/08/2009, at 8:22 PM, anbarasu wrote:

Dear All, I was able to run the human exon array analysis with 120 chips. I have identified a few outlier chips and would like to re-run the analysis without these outliers. Do I need to remove all files (in plmData, probeData, and reports) that are created by aroma.affymetrix? Thanks in advance.

Kind regards, Anbarasu

--
Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
[aroma.affymetrix] Re: MedianNormalization
Hi Cathrine. A few comments below.

On 30/07/2009, at 2:33 AM, Cathy Mitchell wrote:

To whom it may concern, Is there a way of median normalising across your arrays in aroma.affymetrix or can you only quantile normalise? Is there a way of finding out all the other methods that are available in aroma.affymetrix, as the only information I've been able to find is through the google groups.

There is a ScaleNormalization, but it appears to be specific to SNP chips. Median normalization should be quite easy to implement, if it is not there. Maybe Henrik can comment on that one. As for getting help, you can call:

  help.start()

... and then find the aroma.affymetrix package and a bunch of documents.

Is there a way to quantile normalise between replicate arrays only instead of all of the arrays? How does one pull out a single array from the cel set?

Yes, you can do this with extract(). For example, say 'cs' is an AffymetrixCelSet object. You can extract the first 2 samples with:

  csSubset <- extract(cs, 1:2)

Hope that helps.

Mark

Thank you very much.

-- Cathrine Mitchell

--
Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
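As a rough idea of what a median normalization across arrays could look like outside aroma.affymetrix, here is a base-R sketch on a plain matrix of log2 intensities. This is an assumption about what "median normalising" means here (shifting every array to a common median on the log scale); it is not an existing aroma.affymetrix class, and the data are simulated.

```r
# Simulated log2 intensities (probes x arrays); array 3 is globally brighter.
set.seed(42)
x <- matrix(rnorm(1000 * 4, mean = 8), ncol = 4,
            dimnames = list(NULL, paste0("array", 1:4)))
x[, 3] <- x[, 3] + 1.5

# Per-array medians and a common target (the median of those medians).
arrayMedians <- apply(x, 2, median)
target       <- median(arrayMedians)

# Shift every array so that its median equals the target.
xNorm <- sweep(x, 2, arrayMedians - target)

print(round(apply(x, 2, median), 3))      # before
print(round(apply(xNorm, 2, median), 3))  # after: all equal to the target
```

Unlike quantile normalization, this only aligns the centres of the distributions and leaves their shapes untouched.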
[aroma.affymetrix] Re: Differential expression analysis
Bonjour Mathieu. A few comments below.

On 30/07/2009, at 4:35 AM, Mathieu wrote:

Salut !! I am analysing rat exon arrays from Affymetrix with aroma.affymetrix (sessionInfo() below). What I want is a differential expression analysis between my two groups. Is extracting the matrix of plmTr and passing a log2() of that to LIMMA the right thing to do? I am asking because it seems to be what people here are doing for such analyses.

Yep. That seems like a reasonably 'standard' thing to do.

But by doing so, aren't we losing some analysis power by losing the different statistics involved in the fact that all those probes in a transcript may have different behaviors? And all those of an exon? It was proposed to me to run LIMMA on all the probes of the array, get the t statistics out of it, and then use a Wilcoxon rank test to compare each unit to the distribution of all the t statistics... That's feasible for me but will involve a lot of head scratching :)

You *could* do any number of things. But you would have to justify all these steps and demonstrate that it does better than the standard methods. That is generally difficult to do.

Honestly, I am not a very good statistician (yet!) and a beginner programmer, and I want the safe way to get my differential expression between my two groups, done using good statistics.

Seems like a good starting point would be RMA (or GCRMA) at the gene-level and a limma analysis afterwards.

Second question. I don't get anything out of my analysis if I use an FDR correction. I thought that would be ok if I used the core cdf only, but it seems not to be the case. Nothing is significant between my two groups.. :/ Is the following the right thing to do?

  method <- "fdr"
  pval <- 0.05
  lfc <- 1 # log2(2)
  results <- decideTests(fit.eb, adjust.method=method, p.val=pval, lfc=lfc)

This all seems pretty standard. With respect to the fact that nothing is significant, that can be due to a number of things: data quality, small sample size, or true differences that are very small. Hope that helps.

Cheers,
Mark

Thanks a lot in advance! If needed, I could forward the whole code, but it's basically the use case example of the Human exon array.

Merci,
Mathieu
McGill University

sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
 [1] plotrix_2.5-2 limma_2.16.4 aroma.affymetrix_1.1.1
 [4] aroma.apd_0.1.6 affxparser_1.14.2 R.huge_0.1.8
 [7] aroma.core_1.1.2 aroma.light_1.12.2 matrixStats_0.1.6
[10] R.rsp_0.3.4 R.filesets_0.5.2 digest_0.3.1
[13] R.cache_0.1.7 R.utils_1.1.7 R.oo_1.4.8
[16] EBImage_2.6.0 R.methodsS3_1.0.3

--
Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
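To make the FDR-plus-fold-change filter in this thread a little more concrete, here is a base-R sketch of the same kind of call on made-up numbers. It uses p.adjust() directly rather than limma's decideTests(), so it only illustrates the thresholds (adjusted p < 0.05 and |log2 fold change| >= 1); the p-values and fold changes are invented.

```r
# Made-up raw p-values and log2 fold changes for five genes.
p     <- c(0.001, 0.01, 0.02, 0.2, 0.5)
logFC <- c(2.1, 0.4, -1.5, 1.2, 0.1)

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH" in p.adjust).
padj <- p.adjust(p, method = "fdr")

# A gene is called only if it passes both thresholds.
called <- padj < 0.05 & abs(logFC) >= 1

print(data.frame(p, padj = round(padj, 4), logFC, called))
```

On a real array the adjustment runs over thousands of tests, so with a small sample size the smallest raw p-values may simply not be small enough to survive it, which is one of the reasons Mark lists.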
[aroma.affymetrix] Re: CDF creation
Hi Naresh.

That particular script (flat2Cdf.R) creates a PM-only CDF. But if you understand the format of a CDF file (you might read in your original CDF with readCdf() from the 'affxparser' package to understand it), then you should be able to modify the script to include MM probes. Keep in mind that we make this script available to explain what we have done in the past, not as a cure-all for CDF creation.

Cheers,
Mark

Hello Group, I created a custom CDF for the HG-U133_Plus_2 array by redefining probesets (mapping the original probes to exons and treating the probes that map to the same exon as a new probeset). I downloaded the original CDF for this array from the Affy website. When I access a particular probe with the original CDF, I am able to get both PM and MM values, but with my customized CDF I get only PM values for a probe, not MM values. So could you please confirm whether this customized CDF does not contain MM values, or whether I made a mistake in creating the CDF? If there is a mistake, please suggest some way to rectify it. I created the CDF using this source: http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch

Thanks and Regards
Naresh P
[aroma.affymetrix] Re: Discussion on using-the-genomegraphs-package-with-firma
Hi John. See below. On 16/07/2009, at 9:30 AM, JFP wrote: Hi Mark, Thanks for making our code available, its a great help. I have an error reproducing part of it which is bothering me. cdf AffymetrixCdfFile: Path: annotationData/chipTypes/HuEx-1_0-st-v2 Filename: HuEx-1_0-st-v2,mainR3,A20071112,EP.cdf Filesize: 207.11MB Chip type: HuEx-1_0-st-v2,mainR3,A20071112,EP RAM: 10.76MB File format: v4 (binary; XDA) Dimension: 2560x2560 Number of cells: 6553600 Number of units: 312355 Cells per unit: 20.98 Number of QC units: 1 u - indexOf(cdf,3400034) ugcM - getUnitGroupCellMap(getCdf(ds), units=u, retNames=TRUE) ind - getCellIndices(cdf,units=ugcM $cell,verbose=verbose,useNames=FALSE,unlist=TRUE) Thanks for pointing this out. You'll want to change this line above to: ind - getCellIndices(cdf,units=u,verbose=verbose,useNames=FALSE,unlist=TRUE) Or, perhaps more elegantly: ind - ugcM$cell Not sure how that got in there. I've fixed the docs online. BTW (for fellow windows sufferers) The way I generated the small probeset file was with a perl script (one free implementation is with Activestate perl and UnxUtils). Here is the script: #!C:/Perl/bin/perl while(){ next if(/^#/); chomp; s///g; @sl = split(/,/,$_); for $i ( 0..7) {print $sl[$i],,;} print $sl[8],\n; } save it to foo.pl then (say using the unix.sh from UnxUtils) go perl foo.pl HuEx-1_0-st-v2.na28.hg18.probeset.csv For those that suffer by using windows, you could alternatively install the cygwin tools and then have access to the standard battery of command line tools, such as grep/awk from the example. 
Cheers, Mark regards, John -- Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
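For anyone who would rather stay inside R than install Perl or Cygwin, the trimming step that the Perl script performs can be sketched in base R. This is only a sketch: the function name trim_probeset_csv and the two-line example input are made up for illustration, and like the Perl version it splits naively on commas, so a quoted field that itself contains a comma would be cut in the wrong place.

```r
# Keep the first n_fields comma-separated fields of every non-comment line,
# dropping double quotes (same idea as the Perl script in this thread).
# Lines with fewer than n_fields fields are kept as they are.
trim_probeset_csv <- function(lines, n_fields = 9L) {
  lines <- lines[!grepl("^#", lines)]             # skip '#' comment lines
  lines <- gsub('"', "", lines, fixed = TRUE)     # strip the quoting
  fields <- strsplit(lines, ",", fixed = TRUE)    # naive split on commas
  vapply(fields, function(f) {
    paste(f[seq_len(min(n_fields, length(f)))], collapse = ",")
  }, character(1))
}

# Hypothetical input standing in for the real probeset CSV
example <- c(
  "#%this is a comment line",
  '"a","b","c","d","e","f","g","h","i","j","k"'
)
out <- trim_probeset_csv(example)
print(out)
```

On the real annotation file this would be wrapped as writeLines(trim_probeset_csv(readLines(infile)), outfile), with infile and outfile chosen by the user.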
[aroma.affymetrix] Re: Calling enriched regions in tiling array experiments
Hi Lars. Apologies for the slow response. I do have some scripts for calling enriched regions, but they are not really ready for public consumption. And, they are geared more towards ChIP-chip of modified histones / DNA methylation than either TF binding or transcript expression. We have implemented something similar to MAT's region calling procedure that is hopefully a bit more flexible. Secondly, do you plan to implement further functions for tiling arrays, in particular for transcript discovery (which is similar to identifying chip-chip regions but involves discrete steps between signal and no-signal regions) and for detection of differential splicing? My work with Affy tiling arrays is ongoing so we are planning to implement more things within aroma.affymetrix. However, the development is slow since at this stage, it is just me working on it. I encourage you to contribute a previously proposed method or even a new one. This makes me think of segmentation, for which aroma.affymetrix does have implementations (e.g. CbsModel), though more in the context of copy number data. Maybe you can use those routines. Cheers, Mark Dear Aroma team, I enjoyed using the MAT implementation in aroma, but now I wonder how to best proceed for calling of enriched regions? Are there any functions / scripts available? Secondly, do you plan to implement further functions for tiling arrays, in particular for transcript discovery (which is similar to identifying chip-chip regions but involves discrete steps between signal and no-signal regions) and for detection of differential splicing? Thank you very much for your help. Best wishes, Lars
[aroma.affymetrix] Re: Gene-Level Summarization of Expression Data
Hi Steve. I don't know how common this is. Basically, a colleague found a gene that was very differentially expressed when analyzing using the Affymetrix probeset definitions and found virtually nothing when using the custom CDF that bundles all the probes for a gene together. The reason was simple. There were several probesets designed for this gene and presumably they measure different isoforms. The probes for the DE probeset showed the difference, but all the other probesets didn't. When you use a robust linear model like RMA, outliers get downweighted. Because the DE probes accounted for a small proportion of the probes (I think there were 3 or 4 other probesets at this locus), their effect got washed out. So, it's a tradeoff. Sometimes (perhaps most of the time) you gain by lumping them all together ... more information, more power to detect changes. But, sometimes (perhaps rarely) it can mislead. I'm sure I'm not the only one to observe such things. The probe-level data (usually?) doesn't lie. But, since you are comparing across platforms, you will undoubtedly find this as you go along. Different microarray designs often measure slightly different things. One other thing. Be sure to convert your CDF to binary, if it is not already, using affxparser's convertCdf(). Having this info stored in binary format will make the processing much faster. I think the MBNI custom CDFs are text. Cheers, Mark On 20/06/2009, at 6:55 AM, Steve P wrote: Mark, Thanks for the information. That is very helpful. I want to do the latter, which is to combine probesets such that all probes for a given gene (by some definition -- RefSeq, Ensembl, etc) are used to arrive at the summarized value. I was able to obtain a custom CDF for the U133-A array. So I will try that approach. But part of the reason I want to do this is to be able to compare values across platforms, so I may need to find/build a custom CDF for the other platform.
I would appreciate any cautionary advice you have about summarizing at the gene level. Regards, -Steve On Jun 17, 9:56 am, Steve Piccolo steve.picc...@gmail.com wrote: Yesterday I posted this question to the list, but the spam blocker didn't let it through. Below my question is a response from Mark Robinson. --- Following the example provided at http://groups.google.com/group/aroma-affymetrix/web/gene-1-0-st-array ... , I am running the following code:
chipType <- "HT_HG-U133A"
dataSet <- "myData"
library(aroma.affymetrix)
verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
cdf <- AffymetrixCdfFile$byChipType(chipType)
cs <- AffymetrixCelSet$byName(dataSet, cdf=cdf)
bc <- RmaBackgroundCorrection(cs)
csBC <- process(bc, verbose=verbose)
qn <- QuantileNormalization(csBC)
csN <- process(qn, verbose=verbose)
plm <- RmaPlm(csN)
fit(plm, verbose=verbose)
ces <- getChipEffectSet(plm)
gExprs <- extractDataFrame(ces, units=NULL, addNames=TRUE)
This seems to be working beautifully. However, I'm doing an analysis that requires my expression values to be summarized at the gene level rather than the probeset level. In the gExprs object that results from the above analysis, I get a data.frame object in which each row contains expression values for a given probeset across all samples. What I would love to see in each row is an expression value for a given gene. I believe RMA has the ability to do this, but I'm not sure how to do it via aroma.affymetrix. Any suggestions? I'm happy to provide any more details that would be helpful. Regards, -Steve --- Hi Steve. As to your question, it depends on what you need. When you say you want every row to be a gene, do you just want to know the gene name that goes with the probeset identifier, or do you want to combine probesets such that all probes for a given gene (by some definition -- RefSeq, Ensembl, etc) are used to arrive at the summarized value (a la the MBNI CustomCDF)? If the former, then there are annotation packages within R.
If the latter, I have a few cautionary tales of doing this, since the different probesets for a given locus can be measuring different variants. But if you still want to do this, we need to make a CDF file specific to the annotation you want. For the standard HG-U133 arrays, I know the MBNI guys made the CDFs and we could use those within aroma.affymetrix. I don't know if they build custom CDFs for the HT- arrays. Hope that gets you started. Cheers, Mark -- Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628
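The "washing out" described in this thread can be reproduced on a toy example. The sketch below uses base R's medpolish() as a stand-in for a robust probe-level fit; it is not what aroma.affymetrix runs internally, and the gene layout (5 probesets of 11 probes, 6 arrays, a log2 shift of 2 in one probeset) is entirely made up for illustration.

```r
# Toy gene: 5 probesets x 11 probes, 6 arrays (3 in group A, 3 in group B).
# Only probeset 1 responds to the group difference (log2 shift of 2).
n_sets   <- 5
n_probes <- 11
group    <- c("A", "A", "A", "B", "B", "B")
probeset <- rep(seq_len(n_sets), each = n_probes)
probe_effect <- 8 + (seq_along(probeset) %% 4)   # arbitrary probe affinities

# Noise-free log2 intensities: rows are probes, columns are arrays
y <- matrix(probe_effect, nrow = length(probeset), ncol = length(group))
y[probeset == 1, group == "B"] <- y[probeset == 1, group == "B"] + 2

# Robust summary per array via median polish: overall + column effect
chip_effects <- function(m) {
  fit <- medpolish(m, trace.iter = FALSE)
  fit$overall + fit$col
}
group_diff <- function(m) {
  ce <- chip_effects(m)
  mean(ce[group == "B"]) - mean(ce[group == "A"])
}

diff_gene     <- group_diff(y)                   # all 55 probes pooled
diff_probeset <- group_diff(y[probeset == 1, ])  # responding probeset alone
print(c(gene = diff_gene, probeset = diff_probeset))
```

In this noise-free setup the pooled, gene-level difference comes out as 0 while the responding probeset on its own shows the full shift of 2, because the 11 changing probes are a minority among 55 and the column medians ignore them. Real data will be less clear-cut, but the direction of the effect is the same.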
[aroma.affymetrix] Re: FIRMA score for each transcript
Hi Libing. Doesn't 'addNames=TRUE' already do this for you?
fs1 <- extractDataFrame(fs, units=1:2, addNames=TRUE)
head(fs1[,1:6])
  unitName groupName unit group cell huex_wta_breast_A
1  2315251   2315252    1     1    1         1.1150999
2  2315251   2315253    1     2    2         0.9551846
3  2315373   2315374    2     1    3         1.5354252
4  2315373   2315375    2     2    4         0.6288152
5  2315373   2315376    2     3    5         1.5658265
6  2315373   2315377    2     4    6         1.2131032
fs2 <- extractDataFrame(fs, units=1:2, addNames=FALSE)
head(fs2[,1:6])
  unit group cell huex_wta_breast_A huex_wta_breast_B huex_wta_breast_C
1    1     1    1         1.1150999         0.8552212         0.9177643
2    1     2    2         0.9551846         1.1747438         0.8580346
3    2     1    3         1.5354252         1.0427089         1.6461661
4    2     2    4         0.6288152         0.7053325         0.6999596
5    2     3    5         1.5658265         1.0576524         1.1404822
6    2     4    6         1.2131032         1.0494679         0.7729633
If not, please send your entire script and the output of sessionInfo(). Cheers, Mark On 18/06/2009, at 1:02 AM, Libing Wang wrote: Hi Mark, I am wondering if it is possible to get the actual unit id (transcript cluster id) and group id (probeset id) for each firma score instead of artificial number from 1 to whatever in the firma score data frame. Thanks, Libing On Sat, Apr 11, 2009 at 5:48 PM, Mark Robinson mrobin...@wehi.edu.au wrote: Hi Libing. As the error message suggests, there are no degrees of freedom for the fit, meaning you have no replicates. It appears you only have 2 total samples, one for each group. You wouldn't be able to use limma to do differential expression on any experiment with only 2 1-channel chips.
If that is all the data you have, perhaps you are best off looking for large (positive or negative) values of the difference:
fsdf <- extractDataFrame(fs, addNames=TRUE)
fsdf[,6:ncol(fsdf)] <- log2(fsdf[,6:ncol(fsdf)])
fsdf[,7] - fsdf[,6] # B-A, assuming you've already taken logs
Cheers, mark Hi Mark, I am trying to find differences of FIRMA scores between two chips and don't know what's wrong:
cls <- c("A","B")
mm <- model.matrix(~cls)
Warning message: In model.matrix.default(~cls) : variable 'cls' converted to a factor
fit <- lmFit(fsdf[,6:7], mm)
Warning message: In lmFit(fsdf[, 6:7], mm) : Some coefficients not estimable: coefficient interpretation may vary.
fit <- eBayes(fit)
Error in ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim) : No residual degrees of freedom in linear model fits
Thanks, Libing On Tue, Apr 7, 2009 at 5:54 PM, Mark Robinson mrobin...@wehi.edu.au wrote: Hi Libing. limma has quite an extensive user manual. See link to it: http://www.bioconductor.org/packages/release/bioc/html/limma.html Your response still puzzles me. You say your wording should've been 'splicing' not 'expression', but then you go on to say that you want to do differential *expression* with limma. However, note that you can use limma on FIRMA scores as well, as discussed previously. If that is what you are interested in, you might check the following thread: http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/36d8c59d742fc503/ If you give a more detailed description of what it is you are doing or want to do, I might be better able to help. Cheers, Mark On 08/04/2009, at 8:10 AM, Libing Wang wrote: Hi Mark, Thank you for your reply! Sorry for my wrong wording! It should be splicing not expression. ... then you can use log2 of the chip effects here for an analysis of differential expression with an appropriate design matrix with limma. Is that what you are after? Yes, this is what I want.
I think I need to process the Affymetrix probeset file to correlate probesets and transcripts, then use limma to do the analysis. I am pretty new to limma, do you have any suggestions? Thanks, Libing On Tue, Apr 7, 2009 at 4:35 PM, Mark Robinson mrobin...@wehi.edu.au wrote: Hi Libing. On 08/04/2009, at 1:42 AM, Libing Wang wrote: Hi, I am wondering if there is a way to compute a FIRMA score for each transcript. Currently I only have FIRMA score for each probeset or group. I did as follows:
1. plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
2. fit(plmTr)
3. firma <- FirmaModel(plmTr)
4. fit(firma)
5. fs <- getFirmaScores(firma)
The short answer is that FIRMA scores are really a probeset-level statistic, not a gene/transcript-level statistic. This is the recommended use of FIRMA. Or with the FIRMA score of each probeset, find out which transcripts are differentially expressed
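With one array per group there is nothing for limma to estimate a variance from, so the suggestion in this thread is simply to rank probesets by the size of the log2 difference. A minimal sketch of that ranking follows, using the six FIRMA scores for arrays A and B from the extractDataFrame() output quoted in this thread; the short column names A, B and logDiff are chosen here for illustration and are not what extractDataFrame() returns.

```r
# FIRMA scores for two unreplicated arrays (values quoted in this thread)
fsdf <- data.frame(
  unitName  = c(2315251, 2315251, 2315373, 2315373, 2315373, 2315373),
  groupName = c(2315252, 2315253, 2315374, 2315375, 2315376, 2315377),
  A = c(1.1150999, 0.9551846, 1.5354252, 0.6288152, 1.5658265, 1.2131032),
  B = c(0.8552212, 1.1747438, 1.0427089, 0.7053325, 1.0576524, 1.0494679)
)

# B - A on the log2 scale, then sort so the largest changes
# (in either direction) come first
fsdf$logDiff <- log2(fsdf$B) - log2(fsdf$A)
ranked <- fsdf[order(abs(fsdf$logDiff), decreasing = TRUE), ]
print(ranked[, c("unitName", "groupName", "logDiff")])
```

This gives a ranking only, not a p-value; any cut-off on logDiff would be an arbitrary choice without replicates.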
[aroma.affymetrix] Re: FIRMAGene
Hi Nick. At present, FIRMAGene is not actually part of the aroma.affymetrix project, although it makes use of it. So, I will reply to this off the aroma.affymetrix mailing list, except to say that FIRMAGene is now hosted by R-forge. See the following link for details: http://bioinf.wehi.edu.au/folders/firmagene/ When (and if) time permits, I plan to add FIRMAGene to aroma.affymetrix, so that it can share the same memory efficiency and mailing list support. Cheers, Mark On 11/06/2009, at 10:31 PM, nmcgli...@googlemail.com wrote: Hello, I have two questions regarding FIRMAGene: 1. The same as the first in this thread: using the code from sup3.r, when I try to load the FIRMAGene library or execute the FIRMAGene command I get the following errors:
library(FIRMAGene)
Error in base::library(...) : there is no package called 'FIRMAGene'
fg <- FIRMAGene(plm, idsToUse=u)
Error: could not find function "FIRMAGene"
I'm using aroma.affymetrix v1.1.0 with R 2.9.0 on Mac OS X 10.5.7. 2. I'm unsure of what this command is doing and how it needs to be changed to accommodate my own data:
cls <- gsub("TisMap_","",gsub("_0[1-3]_v1_WTGene1","",getNames(cs)))
Many thanks, Nick -- Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
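On the second question, the nested gsub() call only turns array names into group labels by deleting a fixed prefix and a replicate suffix. The sketch below shows that on three sample names; the names are hypothetical, written in the style the two patterns expect, and would be replaced by whatever getNames(cs) returns for your own data.

```r
# Hypothetical array names in the style the patterns expect
sample_names <- c("TisMap_Brain_01_v1_WTGene1",
                  "TisMap_Brain_02_v1_WTGene1",
                  "TisMap_Heart_03_v1_WTGene1")

# Inner gsub(): delete the replicate number and the trailing suffix.
# Outer gsub(): delete the "TisMap_" prefix.
cls <- gsub("TisMap_", "", gsub("_0[1-3]_v1_WTGene1", "", sample_names))
print(cls)
```

For other data sets the two patterns would need to match that data set's own naming scheme; all that matters downstream is that arrays from the same group end up with the same label.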
[aroma.affymetrix] Re: SNPs affecting EXon splicing detection
Hi Sabrina. The Unit_ID can be any transcript cluster identifier of your choice. The easiest may be to use the Affymetrix transcript cluster identifier itself ... available from: http://www.affymetrix.com/analysis/downloads/current_exon/MoEx-1_0-st-v1.mm9.probeset.csv.zip See the 'transcript_cluster_id' column. Perhaps only take the core probes, as defined in the 'level' column? Note: we used Ensembl in that flat2Cdf() example since we were using a custom organization (i.e. non-Affy) of the probesets. Cheers, Mark On 11/06/2009, at 10:58 PM, sabrina wrote: Hi, Mark: for the Unit_id, does it have to be an Ensembl gene ID like ENSMUSG? Lots of genes do not have an Ensembl assignment in the Affy annotation file. There are lots of missing annotations, and I still have not found any good way to deal with it. Do you have any suggestions? Thanks Sabrina On Jun 10, 12:32 am, Mark Robinson mrobin...@wehi.edu.au wrote: Hi Sabrina. How about you try and create a 'flat' file like the one described at: http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file ... Presumably, you will be comfortable with the Exon Array's 'probetab' file by now and possibly the Affymetrix annotation CSV file and so you should have access to all this information.
For example, from the following table:
mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACACCAGAAGATGAACAATGG      Sense                main
4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
[snip]
... you see the second column is the probeset_id, which would be used as the Group_ID column for your flat file. Depending on whether you are using the Ensembl CDF or the Affymetrix annotation, you would need to create a mapping to get the transcript cluster id column (here, the Unit_ID). Everything else you need (Probe_Sequence, X, Y, Probe_ID) is within the table above. Then, it would be just a matter of filtering OUT those probes that overlap a SNP, which based on your mapping exercise, you must have a list of. Then, make a call to the flat2Cdf() script and hopefully you'll be off and running. Let me know how you go. Cheers, Mark On 10/06/2009, at 1:00 PM, sabrina wrote: Thanks , Mark! Can you show me /walk me through how to get a new snp-free CDF ? I finally got the right version of snp and probe mapping so I am ready to try it out! Sabrina On Jun 6, 3:14 am, Mark Robinson mrobin...@wehi.edu.au wrote: Hi Sabrina. Comments below.
On 06/06/2009, at 1:57 AM, sabrina wrote: Hi, Mark: I finally found the SNP data set that is suitable for my case. As I understand, aroma used RMA to estimate gene level and exon level intensities. After I estimate gene level (transcript level), I can use FIRMA to estimate residual for each exon and compose a score as described in the paper . My question is: if there is a SNP difference between two strains within one exon, should I exclude that exon from estimating transcript level value? My guess is probably no. If the SNP affects only 1 probe in an entire transcript, I would expect it to have very little impact on the gene-level summary. And, especially so if there are a large number of total probes for that gene. It may have a noticeable effect on the probe effect. So will it be a good idea if I exclude that exon after I calculate all FIRMA scores or should I exclude these exons after I estimate residuals , but only used these residuals not affected by SNPs for firma score estimation? Thanks Keep in mind the residuals are calculated at the probe-level, not the probeset-level. The FIRMA score is then a summary of the all the residuals for a probeset. I think you have (at least) 3 choices: 1. (preferred, i would think) you could
[aroma.affymetrix] Re: CDF files for Affymetrix whole transcript arrays (Gene 1.0, Exon 1.0)
Hi Dick. (I've copied the aroma.affymetrix list in case others have the same question). I remember doing an update *late* last year (don't remember exactly when) and recreating the CDFs --- the contents were identical to previous ones. That is, the Affymetrix annotation (at least how probesets are put into transcript clusters) doesn't change that much. I haven't checked recently -- do you know if Affy has made some major changes? If so, I'm happy to recreate them. But, just because they are dated July 2008, doesn't mean they have old information. However, if you use the Ensembl builds (for Exon 1.0ST), these change more regularly. The last one I built was Ensembl 50 and I see Ensembl is now at v54. In general, building CDFs for Exon arrays is described at: http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch Cheers, Mark On 12/06/2009, at 9:23 AM, dbe...@u.washington.edu wrote: Hi Mark, I was wondering if there was any planned updates to the Mouse Exon 1.0 ST cdf files? I noticed following your link: http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st that the last date was about a year ago. Or, if you could direct me to some document that would tell me how to build custom cdfs, that would be good too. Thanks very much, Dick On Aug 4 2008, 10:05 pm, Mark Robinson mrobin...@wehi.edu.au wrote: Hi folks. I've been updating/creating CDF files for some of the recent Affymetrix whole transcript expression arrays (i.e. Human/Mouse/Rat Gene/Exon1.0 ST). You should be able to just download the relevant CDF, put it in the correct location (annotationData/chipTypes/ as well as putting the data in the correct location) and run your standard sorts of analyses (e.g. 
BG correction, normalization, summarization, quality assessment) as described in the user's guide: http://groups.google.com/group/aroma-affymetrix/web/users-guide So far, I have created default versions, based on the Affymetrix (probeset CSV/unsupported CDF) annotation, for:
Human Gene 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/hugene-1-0-st
Mouse Gene 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/mogene-1-0-st-v1
Rat Gene 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/ragene-1-0-st-v1
Mouse Exon 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st
Rat Exon 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/raex-1-0-st-v1
... Human Exon 1.0 ST has seen a little more development as Ken, Elizabeth and I all have worked directly with data from these, so we have some custom CDFs (and more on the way), based on different annotation sources. More information can be found at (and a big thank you to Elizabeth for the code that creates these): http://groups.google.com/group/aroma-affymetrix/web/huex-1-0-st-v2 If there are others on this list who want custom CDFs designed for any of these platforms, let me know and I can at least point you in the direction of how to create them. This is typically not a hard thing to do. Cheers, Mark -- Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
[aroma.affymetrix] Re: SNPs affecting EXon splicing detection
Hi Sabrina. How about you try and create a 'flat' file like the one described at: http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch Presumably, you will be comfortable with the Exon Array's 'probetab' file by now and possibly the Affymetrix annotation CSV file and so you should have access to all this information. For example, from the following table:
mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACACCAGAAGATGAACAATGG      Sense                main
4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
[snip]
... you see the second column is the probeset_id, which would be used as the Group_ID column for your flat file. Depending on whether you are using the Ensembl CDF or the Affymetrix annotation, you would need to create a mapping to get the transcript cluster id column (here, the Unit_ID). Everything else you need (Probe_Sequence, X, Y, Probe_ID) is within the table above. Then, it would be just a matter of filtering OUT those probes that overlap a SNP, which based on your mapping exercise, you must have a list of. Then, make a call to the flat2Cdf() script and hopefully you'll be off and running.
Let me know how you go. Cheers, Mark On 10/06/2009, at 1:00 PM, sabrina wrote: Thanks , Mark! Can you show me /walk me through how to get a new snp-free CDF ? I finally got the right version of snp and probe mapping so I am ready to try it out! Sabrina On Jun 6, 3:14 am, Mark Robinson mrobin...@wehi.edu.au wrote: Hi Sabrina. Comments below. On 06/06/2009, at 1:57 AM, sabrina wrote: Hi, Mark: I finally found the SNP data set that is suitable for my case. As I understand, aroma used RMA to estimate gene level and exon level intensities. After I estimate gene level (transcript level), I can use FIRMA to estimate residual for each exon and compose a score as described in the paper . My question is: if there is a SNP difference between two strains within one exon, should I exclude that exon from estimating transcript level value? My guess is probably no. If the SNP affects only 1 probe in an entire transcript, I would expect it to have very little impact on the gene-level summary. And, especially so if there are a large number of total probes for that gene. It may have a noticeable effect on the probe effect. So will it be a good idea if I exclude that exon after I calculate all FIRMA scores or should I exclude these exons after I estimate residuals , but only used these residuals not affected by SNPs for firma score estimation? Thanks Keep in mind the residuals are calculated at the probe-level, not the probeset-level. The FIRMA score is then a summary of the all the residuals for a probeset. I think you have (at least) 3 choices: 1. (preferred, i would think) you could remove all affected *probes* (via the creation of a SNP-affected-probe-free CDF) in advance, then run FIRMA as normal. I can help with this if you tell me which probes are affected. 2. remove the affected *probesets* afterwards, since you may not believe the FIRMA scores for which these are based on. 3. as you suggested, only calculate FIRMA scores from unaffected residuals. 
But, the information you require to do this is the same information required to do #1 and it would seem that #1 is preferred. The good thing about option #1 is you would still have some ability to detect differential splicing for the probeset (instead of tossing it away), albeit with the smaller number of remaining unaffected probes. Cheers, Mark Sabrina On Apr 30, 3:46 am, Mark Robinson mrobin...@wehi.edu.au wrote: Hi Sabrina. I have not had to deal with this myself, but I do know that it exists and I can at least
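The flat-file step for option #1 could be sketched roughly as below. Treat it as a sketch under assumptions: the probe rows are a subset of the probe.tab excerpt shown in this thread, but the list of SNP-affected probe IDs, the unit identifier "TC_example" and the output file location are invented, and the column order and tab separator should be checked against what the flat2Cdf() script on the aroma.affymetrix pages actually expects.

```r
# A few rows of the probe.tab excerpt shown in this thread
probe_tab <- data.frame(
  probe_id    = c(494998, 4767517, 4286427, 5760145, 671410),
  probeset_id = c(2315101, 2315101, 2315101, 2315102, 2315102),
  x           = c(917, 796, 986, 144, 689),
  y           = c(193, 1862, 1674, 2250, 262),
  sequence    = c("CACGGGAAGTCTGGGCTAAGAGACA", "ATTAAGTTACATGCAGACAACAGGG",
                  "TGCCTGGTTGTGGTATTAAGTTACA", "TCGGCCGTCGTCTTCTGCAGCTCTG",
                  "AAGTCGGCCGTCGTCTTCTGCAGCT"),
  stringsAsFactors = FALSE
)

# Hypothetical inputs: probes found to overlap a SNP, and a
# probeset -> transcript cluster mapping taken from the annotation
snp_probes <- c(4767517, 4286427)
unit_map   <- c("2315101" = "TC_example", "2315102" = "TC_example")

# Drop affected probes and rename columns to the flat-file names
keep <- !(probe_tab$probe_id %in% snp_probes)
flat <- data.frame(
  Probe_ID       = probe_tab$probe_id[keep],
  X              = probe_tab$x[keep],
  Y              = probe_tab$y[keep],
  Probe_Sequence = probe_tab$sequence[keep],
  Group_ID       = probe_tab$probeset_id[keep],
  Unit_ID        = unname(unit_map[as.character(probe_tab$probeset_id[keep])]),
  stringsAsFactors = FALSE
)

# Write the table; a temporary file is used here only to keep the sketch tidy
flat_file <- tempfile(fileext = ".flat")
write.table(flat, file = flat_file, sep = "\t", quote = FALSE, row.names = FALSE)
print(flat)
```

Probeset 2315101 keeps one probe here rather than disappearing, which is the advantage of option #1 mentioned in the message.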
[aroma.affymetrix] Re: SNPs affecting EXon splicing detection
Hi Sabrina. Comments below. On 06/06/2009, at 1:57 AM, sabrina wrote: Hi, Mark: I finally found the SNP data set that is suitable for my case. As I understand, aroma used RMA to estimate gene level and exon level intensities. After I estimate gene level (transcript level), I can use FIRMA to estimate residual for each exon and compose a score as described in the paper . My question is: if there is a SNP difference between two strains within one exon, should I exclude that exon from estimating transcript level value? My guess is probably no. If the SNP affects only 1 probe in an entire transcript, I would expect it to have very little impact on the gene-level summary. And, especially so if there are a large number of total probes for that gene. It may have a noticeable effect on the probe effect. So will it be a good idea if I exclude that exon after I calculate all FIRMA scores or should I exclude these exons after I estimate residuals , but only used these residuals not affected by SNPs for firma score estimation? Thanks Keep in mind the residuals are calculated at the probe-level, not the probeset-level. The FIRMA score is then a summary of the all the residuals for a probeset. I think you have (at least) 3 choices: 1. (preferred, i would think) you could remove all affected *probes* (via the creation of a SNP-affected-probe-free CDF) in advance, then run FIRMA as normal. I can help with this if you tell me which probes are affected. 2. remove the affected *probesets* afterwards, since you may not believe the FIRMA scores for which these are based on. 3. as you suggested, only calculate FIRMA scores from unaffected residuals. But, the information you require to do this is the same information required to do #1 and it would seems like #1 is preferred. The good thing about option #1 is you would still have some ability to detect differential splicing for the probeset (instead of tossing it away), albeit with the smaller number of remaining unaffected probes. 
Cheers, Mark Sabrina On Apr 30, 3:46 am, Mark Robinson mrobin...@wehi.edu.au wrote: Hi Sabrina. I have not had to deal with this myself, but I do know that it exists and I can at least suggest a possible route to exclude affected exons. Presumably, there is a database (dbSNP?) that tells you the genome locations of each SNP for your strains. There is also a probe.tab file from Affymetrix that gives you the mapped genome locations of each probe (or you could take the sequences from the same file and map them yourself with a tool like BLAT). It is then just a matter of checking whether each probe maps to a location on the genome that overlaps a SNP. There is probably a Bioconductor tool for this or you could create a hash, etc. There are a couple of levels at which you might introduce this to your analysis. You could remove individual probes that are affected. On the aroma.affymetrix side, this would require creating a new CDF with those affected probes not included (a bit tricky but doable). Or, you could simply post-process your existing results and remove probesets that have an affected probe (easier but not as elegant). You might've also seen: Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A database for filtering out probes in the Affymetrix GeneChip(R) Human Exon 1.0 ST array potentially affected by SNPs. Bioinformation 2008, 2(10):469-470. Hope that gets you started. Cheers, Mark On 30/04/2009, at 6:07 AM, sabrina wrote: Hi, all: I am using Aroma for detecting exon skipping events between two groups (two different strains). I found out that several of my top hits indeed include at least one SNP between the two strains. I wonder if anyone has some suggestion about how to deal with this situation. If I need to remove all affected exons from analysis, how can I do it? I never worked with SNP data before, can anyone give me a hint? Thanks a lot!
Sabrina -- Mark Robinson Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852 -- Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852
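The overlap check suggested in this thread (does a probe's mapped position contain a SNP?) can be sketched in base R as follows. The probe coordinates come from the probe.tab excerpt quoted elsewhere in this archive; the SNP table is hypothetical. Both tables must use the same genome assembly (the excerpt is on build-34/hg16), and for a full array a dedicated interval-overlap tool would scale better than this simple loop.

```r
# Probe positions (subset of the probe.tab excerpt quoted in this archive)
probes <- data.frame(
  probe_id    = c(494998, 4767517, 4286427, 5760145),
  probeset_id = c(2315101, 2315101, 2315101, 2315102),
  chr         = "chr1",
  start       = c(1788, 1992, 2006, 2520),
  stop        = c(1812, 2016, 2030, 2544),
  stringsAsFactors = FALSE
)

# Hypothetical SNP positions on the same assembly
snps <- data.frame(chr = c("chr1", "chr1", "chr2"),
                   pos = c(2010, 3000, 1800),
                   stringsAsFactors = FALSE)

# A probe is affected if any SNP on its chromosome falls within [start, stop]
overlaps_snp <- vapply(seq_len(nrow(probes)), function(i) {
  any(snps$chr == probes$chr[i] &
      snps$pos >= probes$start[i] &
      snps$pos <= probes$stop[i])
}, logical(1))

affected_probes    <- probes$probe_id[overlaps_snp]             # for a new CDF
affected_probesets <- unique(probes$probeset_id[overlaps_snp])  # for post-filtering
print(affected_probes)
print(affected_probesets)
```

The first vector feeds the "remove individual probes" route, the second the simpler "drop whole probesets afterwards" route.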
[aroma.affymetrix] Re: FIRMAGene
Hi Ettore. Comments below. I have the following question. In the sup3.R file the probe level model fitting is realised using the instructions: plm - RmaPlm(csNU) fit(plm, verbose=verbose) where csNU is an object obtained after background correction, quantile normalisation and conversion of the cdf to a unique version. The conversion to 'unique' is actually done both on the CDF and the data. This is simply to dance around the fact that a handful of probes are used in multiple probesets. I suppose that this approach should enable the exon-level analysis of the Gene 1.0 data, as required by FIRMAGene. However I don't understand where is the difference since the methods are the same as in the gene-level analysis of such data. I'm actually not sure what it is you are asking here. Indeed, the methodology of FIRMAGene operates on the results (specifically, the residuals) of your standard RMA probe level model. This is, however, quite different to the standard DE analysis, if that is what you mean by gene-level analysis. Hope that helps. Cheers, Mark Thanks, Ettore M. On May 29, 3:16 pm, rhizomorph cognitiontechnic...@yahoo.com wrote: I have the same question as Ettore. I installed the aroma.affymetrix package (and all supporting packages), but nowhere can I find a source to download and install the FIRMAGene package that the SUP3.R script clearly calls for. Rhizomorph. On May 29, 3:15 am, ettore mosca ettore.mos...@gmail.com wrote: Dear aroma.affymetrix developers, I'm very interested in using Gene 1.0 ST platform for alternative splicing. I read in your paper Differential splicing using whole-transcript microarrays that FIRMAGene is freely available as R package but I can not load the library following the instruction in the third additional file sup3.r (I installed and loaded aroma.affymetrix successfully) How do I install and load FIRMAGene library? Thanks, Ettore M. -- Ettore M. 
http://www.ettoremosca.it
[aroma.affymetrix] Re: Discussion on affymetrix-defined-transcript-clusters
Hi.

Check the page for the MoEx-1_0-st chip; it has CDF files you can download:

http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st

Cheers, Mark

On 08/05/2009, at 11:35 PM, telos wrote:

> Hi, I'd like to run FIRMA on mouse Exon array data. For this, it seems that I need special mouse Exon array CDF files... any advice on how I can get hold of them? Thanks
[aroma.affymetrix] Re: Compare the splicing pattern of two samples
> Hi Mark: After obtaining the statistically significant differentially expressed (DE) exons between the two groups using limma, I want to further explore the splicing pattern of the DE exons in each group. As I understand it, for each array, FIRMA scores each exon as to whether its probes systematically deviate from the expected gene expression level. Is there any way to get a summary of the FIRMA score for each exon across all the arrays belonging to the same group? Thanks!
>
> Xinjun

How about the average of the FIRMA scores as a summary?

Mark

On Mon, May 4, 2009 at 7:40 AM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Xinjun.

Here, 'unitName' is the transcript cluster id and 'groupName' is the probeset id, as defined by Affymetrix. The 'unit', 'group' and 'cell' columns are indices and you may not need these.

To find out what the unitName and groupName correspond to, I would consult the Affymetrix annotation files. (Assuming Human Exon 1.0 ST), if you go to:

http://www.affymetrix.com/products_services/arrays/specific/exon.affx#1_4

and find the section "Current NetAffx Annotation Files", you'll find 2 CSV files that you can download, one for transcript clusters and one for probesets. These should give you genome coordinates, GenBank/RefSeq/Ensembl identifiers, gene symbols, etc.

Hope that helps. Mark

On 04/05/2009, at 12:02 AM, Xinjun Zhang wrote:

Hi: Thanks very much for your great help. But I still have difficulty understanding the first 5 columns of fsDF, which you used as an example:

  head(fsDF[,1:5])
    unitName groupName unit group cell
  1  2315251   2315252    1     1    1
  2  2315251   2315253    1     2    2
  3  2315373   2315374    2     1    3
  4  2315373   2315375    2     2    4
  5  2315373   2315376    2     3    5
  6  2315373   2315377    2     4    6

What does each column, especially unitName and groupName, mean? And how can I relate unitName and groupName to gene name and exon number? Thanks in advance! Xinjun

On Thu, Apr 30, 2009 at 3:31 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Xinjun. Comments below.

On 30/04/2009, at 12:33 AM, Xinjun Zhang wrote:

> Hi: Sorry for the second ambiguous question. Now I will put it another way. After limma analysis:
>
>   # two groups: CEU and YRI
>   design
>             CEU YRIvsCEU
>   GSM188687   1        0
>   GSM188688   1        0
>   GSM188861   1        1
>   GSM188862   1        1
>
>   # Limma
>   fsDF <- extractDataFrame(fs, addNames=TRUE)
>   fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)])
>   fit <- lmFit(fsDF[,-c(1:5)], design)
>   fit <- eBayes(fit)
>   fit$genes <- fsDF[,1]
>   topTable(fit, coef=NULL, number=10, adjust="BH")
>
> Then I got output in RGui in this form:
>
>   topTable(fit, coef="YRIvsCEU", adjust="BH")
>               ID      logFC         t      P.Value adj.P.Val         B
>   248067 3851537  10.781228  18.01769 5.069965e-07 0.1441163 -4.159301
>   219150 3721400 -12.364204 -14.91257 1.798048e-06 0.2146088 -4.164325
>   90041  2903401  -8.915503 -13.79270 3.021449e-06 0.2146088 -4.166979
>   80808  2836738   7.811150  13.45085 3.568320e-06 0.2146088 -4.167917
>   250529 3862018  -7.935552 -13.33698 3.774934e-06 0.2146088 -4.168245
>   176674 3462843  10.559478  12.92835 4.637400e-06 0.2158173 -4.169490
>   224640 3744039   7.930627  12.66410 5.314668e-06 0.2158173 -4.170356
>   134937 3224650  -9.385466 -12.18252 6.860763e-06 0.2437758 -4.172073
>   155392 3352948  -6.802731 -11.71350 8.878365e-06 0.2804133 -4.173938
>   104503 3003193  -8.947865 -11.50151 1.000676e-05 0.2844473 -4.174852
>
> In the first row, is 248067 an NCBI gene id and 3851537 a probeset id? And if the first column is a gene ID, what is strange to me is that some of the IDs are not human gene ids.

Be careful here. The first number you see here (248067) is a row number that limma puts in, and is not a gene/probeset identifier of any kind. The ID column is what you have put in fit$genes (see above, you have the command fit$genes <- fsDF[,1]). I would actually recommend that you put more in fit$genes, because what you have now is only the first column of the fsDF data frame, which gives the transcript_cluster_id. So, you have the transcript_cluster_id, but you don't know what probeset_id this corresponds to.

For example:

  head(fsDF[,1:5])
    unitName groupName unit group cell
  1  2315251   2315252    1     1    1
  2  2315251   2315253    1     2    2
  3  2315373   2315374    2     1    3
  4  2315373   2315375    2     2    4
  5  2315373   2315376    2     3    5
  6  2315373   2315377    2     4    6

Maybe it would be better to do something like:

  ...
  fit$genes <- fsDF[,1:2]
  ...

that way your output table will look something like:

  topTable(fit, coef=1, n=2)
       unitName groupName     logFC         t      P.Value adj.P.Val B
  4166  3581637   3582005 -2.581813 -12.65587 3.786135e-13
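
Editor's sketch: the suggestion at the top of this message, averaging FIRMA scores within each group, might look roughly like the following in base R. The data frame here is a made-up stand-in for the output of extractDataFrame(fs, addNames=TRUE); the sample column names, the group labels and the choice to average on the log2 scale (as the thread does before limma) are all assumptions for illustration only.

```r
# Toy stand-in for extractDataFrame(fs, addNames=TRUE): five annotation
# columns followed by one FIRMA-score column per array (values invented).
fsDF <- data.frame(
  unitName  = c(2315251, 2315251, 2315373),
  groupName = c(2315252, 2315253, 2315374),
  unit = c(1, 1, 2), group = c(1, 2, 1), cell = c(1, 2, 3),
  ctrl1 = c(1.0, 2.0, 0.5), ctrl2 = c(1.0, 4.0, 0.5),
  trt1  = c(4.0, 2.0, 0.5), trt2  = c(4.0, 1.0, 2.0)
)

# Hypothetical group label for each data column, in column order.
grp <- factor(c("control", "control", "treatment", "treatment"))

# Work on the log2 scale, dropping the five annotation columns.
logFs <- log2(as.matrix(fsDF[, -c(1:5)]))

# One mean per probeset (row) and group (column).
groupMeans <- sapply(levels(grp), function(g) {
  rowMeans(logFs[, grp == g, drop = FALSE])
})

# Keep both identifiers next to the summaries, in the spirit of fsDF[,1:2].
summaryDF <- cbind(fsDF[, 1:2], groupMeans)
print(summaryDF)
```

A median per group would be a drop-in alternative if a few arrays have extreme scores; that is a design choice rather than anything the thread prescribes.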
[aroma.affymetrix] Re: Compare the splicing pattern of two samples
>   2315373   0.118  -0.214   0.25  -0.32  0.80819  0.75689  0.05  0.94922
>   2315373  -0.018   0.128  -0.04   0.21  0.96823  0.83876  0.03  0.9667
>   2315554  -0.027   0.191  -0.06   0.31  0.95283  0.76783  0.07  0.9317
>   2315554   0.276  -0.629   0.65  -1.05  0.53693  0.33006  0.56  0.59639
>   2315554   0.405  -0.552   0.88  -0.85  0.40726  0.4232   0.44  0.66023
>   2315554   0.109  -0.199   0.23  -0.3   0.82444  0.77509  0.04  0.95666
>   2315554  -0.318   0.771  -0.63   1.08  0.54789  0.31597  0.6   0.57392
>   2315554 .
>
> It has only a column called Genes (it is the probe id, I guess). So how can I use this output to find the differentially expressed exons (and the corresponding genes) in the two groups? Is it clearer to you? Thanks in advance! Xinjun

As I mentioned above, you should change what you put in fit$genes so that you know what probeset_id each row corresponds to. I'm guessing here, but you are probably interested in the YRIvsCEU parameter, so you'd probably sort on the p.value.YRIvsCEU column ...

Cheers, Mark

On Tue, Apr 28, 2009 at 4:29 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Xinjun. Comments below.

On 28/04/2009, at 12:25 PM, Xinjun Zhang wrote:

> Hi Mark: Thanks very much for your clarification! I have now got as far as the limma analysis of FIRMA scores to get differentially spliced genes (and also the splicing pattern of each). But I still have some difficulty understanding the code (in red) below in the limma analysis:
>
>   # fs is the 'standard' FirmaSet object
>   fsDF <- extractDataFrame(fs, addNames=TRUE)
>   fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)])   # I know why log2 is here but am confused by fsDF[,c(1:5)]; what does this expression mean?

Note that it is -c(1:5), meaning operate on (here, take logs of) all of the columns except 1:5 ... that is, because extractDataFrame gives some extra columns at the beginning that are NOT data, we only want to log the columns that have actual data.
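
Editor's sketch: the effect of a negative column index, as explained just above, can be seen on a tiny made-up data frame. The column names and values are invented; only the indexing idiom is taken from the thread.

```r
# Two annotation columns followed by two data columns (all values invented).
df <- data.frame(id  = c("a", "b"),
                 idx = c(1L, 2L),
                 x1  = c(2, 8),
                 x2  = c(4, 16))

# -c(1:2) selects every column EXCEPT the first two, so only x1 and x2
# are log-transformed; id and idx are left untouched.
dataCols <- -c(1:2)
df[, dataCols] <- log2(df[, dataCols])
print(df)
```

With the five annotation columns returned by extractDataFrame(), the same idiom becomes -c(1:5).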
>   design <- cbind(Grp1=1, Grp2=c(rep(0,n_1), rep(1,n_2)))
>   fit <- lmFit(fsDF[,-c(1:5)], design)
>   fit <- eBayes(fit)
>   fit$genes <- fsDF[,1]
>
> Can I also get separate splicing patterns for the two differentially spliced genes from the two groups (control and treatment)?

I'm not sure what you are asking here. The probesets where the Grp2 coefficient is significantly different from 0 may highlight differentially spliced exons. Does that help?

Mark

> Thanks in advance! Xinjun

On Mon, Apr 27, 2009 at 6:19 AM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Xinjun. Quick comments below.

> Hi Mark: Thanks very much for your help. I have got a quick start on a small dataset in which each group (control and treatment) contains 4 arrays. I have set up a file structure like this:
>
>   rawDate/
>     controlGroup/
>       HuEx-1_0-st-v1/
>         GSMXX.CEL
>         GSMXX.CEL
>     treatmentGroup/
>       HuEx-1_0-st-v1/
>         GSMXX.CEL
>         GSMXX.CEL

This setup will need to be changed. You will want to put ALL samples together to do the PLM fitting, normalization, FIRMA scoring, etc. Something like:

  rawData/
    thisExperiment/
      HuEx-1_0-st-v1/
        sample1.CEL
        sample2.CEL
        ...
        sampleN.CEL

> This is my code (my questions are in red):
>
>   library(aroma.affymetrix)
>   # Getting annotation data files
>   chipType <- "HuEx-1_0-st-v1"
>   cdf <- AffymetrixCdfFile$byChipType(chipType)
>   print(cdf)
>   # Defining CEL set
>   cs <- AffymetrixCelSet$byName("controlGroup", cdf=cdf)
>   print(cs)
>   # Background adjustment and normalization
>   bc <- RmaBackgroundCorrection(cs)
>   csBC <- process(bc, verbose=verbose)
>   # Quantile normalization
>   qn <- QuantileNormalization(csBC, typesToUpdate="pm")   ### I set the second parameter to "pm" as the chip type is an Affymetrix exon array; is that right?
>   print(qn)
>   csN <- process(qn, verbose=verbose)

This is fine.

>   # Summarization
>   getCdf(csN)
>   ## Fit exon-by-exon: change the value of mergeGroups to FALSE in the ExonRmaPlm() call above.
>   plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
>   print(plmEx)
>   # To fit the PLM to all of the data, do:
>   fit(plmEx, verbose=verbose)
>
> And here is my problem:
>
>   firma <- FirmaModel(plmTr)   # I have noticed that FIRMA analysis ONLY works from the PLM based on transcripts. So when
[aroma.affymetrix] Re: SNPs affecting EXon splicing detection
Hi Sabrina.

I have not had to deal with this myself, but I do know that the problem exists and I can at least suggest a possible route to exclude affected exons.

Presumably, there is a database (dbSNP?) that tells you the genome locations of each SNP for your strains. There is also a probe.tab file from Affymetrix that gives you the mapped genome locations of each probe (or you could take the sequences from the same file and map them yourself with a tool like BLAT). It is then just a matter of checking whether each probe maps to a location on the genome that overlaps a SNP. There is probably a Bioconductor tool for this, or you could create a hash, etc.

There are a couple of levels at which you might introduce this into your analysis. You could remove individual probes that are affected. On the aroma.affymetrix side, this would require creating a new CDF with those affected probes not included (a bit tricky but doable). Or, you could simply post-process your existing results and remove probesets that have an affected probe (easier but not as elegant).

You might've also seen:

Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A database for filtering out probes in the Affymetrix GeneChip(R) Human Exon 1.0 ST array potentially affected by SNPs. Bioinformation 2008, 2(10):469-470.

Hope that gets you started.

Cheers, Mark

On 30/04/2009, at 6:07 AM, sabrina wrote:

> Hi, all: I am using Aroma for detecting exon skipping events between two groups (two different strains). I found out that several of my top hits indeed include at least one SNP between the two strains. I wonder if anyone has some suggestions about how to deal with this situation. If I need to remove all affected exons from the analysis, how can I do it? I have never worked with SNP data before; can anyone give me a hint? Thanks a lot!
>
> Sabrina
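
Editor's sketch: the overlap check described in this reply (does a probe's mapped location contain a SNP position?) could be prototyped in base R as below. Both tables are invented; the column names (probeset, chr, start, end, pos) are assumptions and do not reflect the actual layout of the Affymetrix probe.tab file or of any SNP database.

```r
# Hypothetical probe locations (one row per probe) and SNP positions.
probes <- data.frame(
  probeset = c("ps1", "ps1", "ps2", "ps2"),
  chr      = c("chr1", "chr1", "chr1", "chr2"),
  start    = c(100, 200, 300, 100),
  end      = c(124, 224, 324, 124),
  stringsAsFactors = FALSE
)
snps <- data.frame(chr = c("chr1", "chr2"),
                   pos = c(210, 500),
                   stringsAsFactors = FALSE)

# TRUE if any SNP lies on the same chromosome within [start, end].
hitsSnp <- function(chr, start, end, snps) {
  any(snps$chr == chr & snps$pos >= start & snps$pos <= end)
}

probes$affected <- mapply(hitsSnp, probes$chr, probes$start, probes$end,
                          MoreArgs = list(snps = snps), USE.NAMES = FALSE)

# Option 1: drop individual affected probes (input for a custom CDF).
cleanProbes <- probes[!probes$affected, ]

# Option 2: flag whole probesets that contain an affected probe, so they
# can be removed from existing results afterwards.
affectedSets <- unique(probes$probeset[probes$affected])
print(affectedSets)
```

This loop is fine for a toy table; for a full exon array, an interval-overlap tool such as findOverlaps() in the Bioconductor GenomicRanges package would likely scale better.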
[aroma.affymetrix] Re: Compare the splicing pattern of two samples
Hi Xinjun. Comments below.

On 28/04/2009, at 12:25 PM, Xinjun Zhang wrote:

> Hi Mark: Thanks very much for your clarification! I have now got as far as the limma analysis of FIRMA scores to get differentially spliced genes (and also the splicing pattern of each). But I still have some difficulty understanding the code (in red) below in the limma analysis:
>
>   # fs is the 'standard' FirmaSet object
>   fsDF <- extractDataFrame(fs, addNames=TRUE)
>   fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)])   # I know why log2 is here but am confused by fsDF[,c(1:5)]; what does this expression mean?

Note that it is -c(1:5), meaning operate on (here, take logs of) all of the columns except 1:5 ... that is, because extractDataFrame gives some extra columns at the beginning that are NOT data, we only want to log the columns that have actual data.

>   design <- cbind(Grp1=1, Grp2=c(rep(0,n_1), rep(1,n_2)))
>   fit <- lmFit(fsDF[,-c(1:5)], design)
>   fit <- eBayes(fit)
>   fit$genes <- fsDF[,1]
>
> Can I also get separate splicing patterns for the two differentially spliced genes from the two groups (control and treatment)?

I'm not sure what you are asking here. The probesets where the Grp2 coefficient is significantly different from 0 may highlight differentially spliced exons. Does that help?

Mark

> Thanks in advance! Xinjun

On Mon, Apr 27, 2009 at 6:19 AM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Xinjun. Quick comments below.

> Hi Mark: Thanks very much for your help. I have got a quick start on a small dataset in which each group (control and treatment) contains 4 arrays. I have set up a file structure like this:
>
>   rawDate/
>     controlGroup/
>       HuEx-1_0-st-v1/
>         GSMXX.CEL
>         GSMXX.CEL
>     treatmentGroup/
>       HuEx-1_0-st-v1/
>         GSMXX.CEL
>         GSMXX.CEL

This setup will need to be changed. You will want to put ALL samples together to do the PLM fitting, normalization, FIRMA scoring, etc. Something like:

  rawData/
    thisExperiment/
      HuEx-1_0-st-v1/
        sample1.CEL
        sample2.CEL
        ...
        sampleN.CEL

> This is my code (my questions are in red):
>
>   library(aroma.affymetrix)
>   # Getting annotation data files
>   chipType <- "HuEx-1_0-st-v1"
>   cdf <- AffymetrixCdfFile$byChipType(chipType)
>   print(cdf)
>   # Defining CEL set
>   cs <- AffymetrixCelSet$byName("controlGroup", cdf=cdf)
>   print(cs)
>   # Background adjustment and normalization
>   bc <- RmaBackgroundCorrection(cs)
>   csBC <- process(bc, verbose=verbose)
>   # Quantile normalization
>   qn <- QuantileNormalization(csBC, typesToUpdate="pm")   ### I set the second parameter to "pm" as the chip type is an Affymetrix exon array; is that right?
>   print(qn)
>   csN <- process(qn, verbose=verbose)

This is fine.

>   # Summarization
>   getCdf(csN)
>   ## Fit exon-by-exon: change the value of mergeGroups to FALSE in the ExonRmaPlm() call above.
>   plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
>   print(plmEx)
>   # To fit the PLM to all of the data, do:
>   fit(plmEx, verbose=verbose)
>
> And here is my problem:
>
>   firma <- FirmaModel(plmTr)   # I have noticed that FIRMA analysis ONLY works from the PLM based on transcripts. So when the parameter is plmTr, I wonder how it can detect the splicing events of genes? Should the parameter not be plmEx?
>   fit(firma, verbose=verbose)
>   fs <- getFirmaScores(firma)

Like it says on the group web page for Exon arrays: "The FIRMA analysis ONLY works from the PLM based on transcripts." This is NOT an error. That's the way it works. The manuscript gives more details on why this is the case:

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/24/15/1707

Hope that helps.

Cheers, Mark

On Fri, Apr 24, 2009 at 5:24 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Xinjun. Here is a quick sketch of what I might do.

1. Run everything to get FIRMA scores. See the group page for running details and the Purdom Bioinformatics 2008 paper for methodological details.

2a. If Nn or Nc > 1, use 'limma' to look for a difference in FIRMA scores between your two groups. See threads:

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/36d8c59d742fc503/
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/7d2645bd76cc2023/

2b. If you have, say, patient samples (and a good number of them), you might expect only a subset of your C or N patients to have a splicing aberration. In this case, maybe you just want to look for large
[aroma.affymetrix] Re: Quality assessment of Gene ST Array
Hi Cathy.

On 27/04/2009, at 11:34 PM, Cathy Mitchell wrote:

> To whom it may concern
>
> I would like to know if there is a way of separating out the foreground and background spatial plots? I mean the spatial plots that are given as an example in the google.groups quality assessment of raw data section. I am using Gene ST 1.0 Human arrays.

I'm afraid I don't know the answer to this. Be sure to consider RLE/NUSE plots as part of your quality assessment.

> Is there a way of creating a box plot representing the log2 probe intensities of each array on one box plot? As opposed to doing plotDensities for each array?

See my post from the other day (Extracting raw data before normalization):

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/5af426f8f5e1625b

The extracting data part, using getCellIndices() and extractMatrix() (... and boxplot), will be analogous for Gene 1.0 ST as for Exon 1.0 ST.

> Also is there a way to analyse the individual probes as opposed to the probesets?

Not sure what you have in mind here, but using the above you will have probe-level data in hand.

Hope that helps. Mark

> Thank you
> -- Cathrine Mitchell
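
Editor's sketch: once a probes-by-arrays intensity matrix is in hand (in the thread this would come from extractMatrix()), a single box plot of log2 intensities with one box per array is a one-liner in base R. The matrix below is simulated and the array names are made up; it only illustrates the plotting step, not the extraction.

```r
set.seed(1)

# Simulated stand-in for a probes x arrays intensity matrix.
intensities <- matrix(2^rnorm(300, mean = 8, sd = 1.5), ncol = 3,
                      dimnames = list(NULL, c("array1", "array2", "array3")))

# boxplot() on a matrix draws one box per column, i.e. one per array.
bp <- boxplot(log2(intensities),
              ylab = "log2 probe intensity",
              main = "Raw probe intensities per array")
```

The returned object (bp) holds the five-number summaries per array, which can be handy for spotting an outlying array without looking at the plot.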
[aroma.affymetrix] Re: Error Occured When Convertting CDF File
Hi Xinjun.

Maybe a more-informed Windows user than myself can chime in here (I use OSX/Linux almost exclusively), but perhaps it's as simple as adjusting your memory limit. As you can see below, you do get a "Reached total allocation of 1535Mb" warning message. One possible link to look at:

http://projetos.inpa.gov.br/i3geo/pacotes/r/win/library/base/html/Memory-limits.html

Cheers, Mark

> Hi Mark: Thanks for the reminder. The ASCII CDF file was downloaded from Affymetrix (HuEx-1_0-st-v2.cdf.zip). The following is the error message:
>
>   Reading CDF header...
>   Reading CDF header...done
>   Reading CDF QC units...
>   Reading CDF QC units...done
>   Reading CDF units...
>   Reading CDF units...done
>   Writing CDF structure...
>   Error: Unable to allocate vectors of 24.4 MB
>
> (I translated it from Chinese; this was run on a PC with 4G memory running only R.)
>
>   In addition: Warning messages:
>   1: In which(raw == as.raw(255)) : Reached total allocation of 1535Mb: see help(memory.size)
>   2: In which(raw == as.raw(255)) : Reached total allocation of 1535Mb: see help(memory.size)
>   3: In which(raw == as.raw(255)) : Reached total allocation of 1535Mb: see help(memory.size)
>   4: In which(raw == as.raw(255)) : Reached total allocation of 1535Mb: see help(memory.size)
>   5: In which(raw == as.raw(255)) : Reached total allocation of 1535Mb: see help(memory.size)
>   6: In which(raw == as.raw(255)) : Reached total allocation of 1535Mb: see help(memory.size)
>   Timing stopped at: 21.87 0.13 22.04
>
> And this is my code:
>
>   library(affxparser)
>   files <- list.files(patt="[Cc][Dd][Ff]$")
>   dir.create("converted")
>   outPath <- "converted"
>   outFiles <- paste(outPath, files, sep="/")
>   for (kk in seq(files)) {
>     convertCdf(files[kk], outFiles[kk], version = 4, force = TRUE, verbose=TRUE)
>   }
>
> Information returned by traceback():
>
>   traceback()
>   6: which(raw == as.raw(255))
>   5: .initializeCdf(con = con, nRows = cdfHeader$nrows, nCols = cdfHeader$ncols, nUnits = cdfHeader$nunits, nQcUnits = cdfHeader$nqcunits, refSeq = cdfHeader$refseq, unitnames = unitNames, qcUnitLengths = qcUnitLengths, unitLengths = unitLengths)
>   4: writeCdfHeader(con = con, cdfheader, unitNames = names(cdf), qcUnitLengths = qcUnitLengths, unitLengths = unitLengths, verbose = verbose)
>   3: writeCdf(outFilename, cdfheader = cdfHeader, cdf = cdfUnits, cdfqc = cdfQcUnits, overwrite = TRUE, verbose = verbose2)
>   2: system.time({ writeCdf(outFilename, cdfheader = cdfHeader, cdf = cdfUnits, cdfqc = cdfQcUnits, overwrite = TRUE, verbose = verbose2) })
>   1: convertCdf(files[kk], outFiles[kk], version = 4, force = TRUE, verbose = TRUE)

On Sat, Apr 25, 2009 at 4:37 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Xinjun.

First off, you might be able to make use of the binary CDF files already created (unless you are doing something non-standard). See:

http://groups.google.com/group/aroma-affymetrix/web/huex-1-0-st-v2

As to your problem below, I'm not sure of the memory footprint for doing the conversion, but I'd be surprised if it was 2GB. Perhaps you have objects in memory in your current workspace? Do you still get the error from a fresh R session?

As always, when you get an error, it is good practice to give the output of both sessionInfo() and traceback() ... and even a code example wouldn't hurt.

Hope that helps. Mark

> Hi: I am running convertCdf() with R 2.9.0 on my PC / Windows XP. I am trying to convert HuEx-1_0-st-v2.cdf from ASCII format to binary format. But an error has occurred, reported as "unable to allocate vectors of 10.9 MB" (I have translated the error message from Chinese). The memory is about 2G. Is that the problem, or is there some other reason? Thanks in advance. Xinjun
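
Editor's sketch: the question raised in this thread, whether objects left in the workspace are eating into the available memory, can be checked with base R before retrying the conversion. The helper name workspaceSizes() and the demo environment are made up for illustration; in a real session one would call it on the global environment.

```r
# Hypothetical helper: size in bytes of every object in an environment,
# largest first.
workspaceSizes <- function(env = globalenv()) {
  objs <- ls(envir = env)
  sizes <- vapply(objs,
                  function(nm) as.numeric(object.size(get(nm, envir = env))),
                  numeric(1))
  sort(sizes, decreasing = TRUE)
}

# Demo on a scratch environment so the example does not touch the workspace.
demoEnv <- new.env()
assign("big", numeric(1e6), envir = demoEnv)   # roughly 8 MB
assign("small", 1:10, envir = demoEnv)

sizesBefore <- workspaceSizes(demoEnv)
print(sizesBefore)

# Remove what is no longer needed, then let R release the memory.
rm("big", envir = demoEnv)
invisible(gc())
sizesAfter <- workspaceSizes(demoEnv)
```

Starting a fresh R session, as suggested in the thread, achieves the same thing more bluntly.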
[aroma.affymetrix] Re: question for using custom Affy Exon array CDF
Hi Jing.

To be honest, I haven't explored this in any great detail. I know that Elizabeth sometimes used the bigger CDFs for the BG correction/normalization steps and then switched to the 'core' CDF for fitting the PLMs. I'd expect only subtle changes in the 'normexp' BG adjustment since it would be fitted on 1M probes with either CDF, but as I mentioned, I have not studied it.

If all of your downstream analysis is focussed on the 'core' CDF, then it is probably sufficient to use the 'core' CDF for BG adjustment. This is what I do for a lot of my work, at least.

Cheers, Mark

On 01/04/2009, at 6:25 AM, jing ma wrote:

> Dear Mark, When doing background subtraction for the Affy exon array, do you think it is sufficient to use the CORE CDF (I usually use HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf)? Thanks, Jing

On Mon, Mar 16, 2009 at 5:35 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Jing. See below.

On 17/03/2009, at 2:54 AM, jing wrote:

> To whom it may concern, I'm analyzing some Affy human exon array data and hope to generate plots similar to the supplementary figures in the Purdom 2008 Bioinformatics paper. To do so, I need to get the normalized probe intensities and residuals. I've already followed the steps described in the Human Exon Array Analysis vignette and got the following:
>
>   (1) ... csN <- process(qn, verbose=verbose)
>   (2) ... res <- getResidualSet(plmTr)
>
> I tried the function extractDataFrame(..., addNames=TRUE) hoping to get the data plus column labels for my samples, but it didn't work. Is there any easy way to extract these two sets of data in matrix format, similar to the FIRMA score matrix with probe ID and column labels?

That's right. extractDataFrame() is typically used for some kind of summarized data. For example, you can use extractDataFrame() for pulling out FIRMA scores (summarized at the probeset level), or RMA summarized data (summarized at probeset or gene level).

To pull out the raw/normalized data and the residuals, you can use extractMatrix() or readUnits(). I prefer the former. It's probably best to look at the thread:

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/46d609076d9580fb

Look for the commands after the "# starting from PLM ..." line.

I've been meaning to put up a page giving a summary of these commands, including how to use exon array data with GenomeGraphs. Hopefully I can find some time shortly to do that.

Hope that helps. Mark

> Thanks, Jing
>
> Jing Ma
> Hartwell Center for Bioinformatics Biotechnology
> St. Jude Children's Research Hospital
[aroma.affymetrix] Re: gcRMA for Gene ST Arrays
Hi Mario.

I will look into this. I know this feature doesn't get used too often for these new arrays. As you may know, it will need to be called differently than when using it for 3' IVT arrays (e.g. HG-U133).

Can you give me your sequence of commands? I assume you have specified the set of negative control indices to use, right?

Cheers, Mark

On 24/03/2009, at 8:57 PM, Mario Fasold wrote:

> Hi all! I'd like to use gcRMA correction on Human Gene 1.0 ST data. However, the method GcRmaBackgroundCorrection fails, probably since the probe_tab file has a slightly different layout for these chips (see error message below). Is there a way of telling the gcRMA function to use different columns of the probe_tab file? Best, Mario.
>
>   Reading tab-delimited sequence file...
>   Error in if (any(units < 1)) stop("Argument 'units' contains non-positive indices.") : missing value where TRUE/FALSE needed
[aroma.affymetrix] Re: Write normalized intensities as a CEL file
Hi Libing.

Perhaps the combination of createCel() and updateCel() in the 'affxparser' package is what you are after. Although, if you tell us more about what it is you are trying to do, maybe we can be of more help.

Cheers, Mark

On 24/03/2009, at 2:08 AM, Libing wrote:

> Hi, I am wondering if writeCel() is the one I should use. There is no documentation for its usage. Can anyone provide more details? Or some links. Thanks!
[aroma.affymetrix] Re: Custom CDF Creation
Hi Jake.

As a starting point, you might have a look at:

http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch

In there is a script called 'flat2Cdf.R' that takes a flat file of probe information (X/Y location on the chip, probe sequence, identifiers) and creates a CDF file. I assume you will have this information or can get it. Maybe create 2 such flat files.

Alternatively, you should also search the Bioconductor archives. I do recall some discussion on this a while back, and there were some scripts generated that removed probes from CDF environments.

In any case, you could also familiarize yourself with the CDF format by taking an existing CDF file and reading it in with readCdf() from the 'affxparser' package. That is always good information to know.

As far as I can tell, you won't need to modify anything in the CEL files; you'll just have to read from the CEL file twice, once with your SNP-probes-removed CDF file and once with your SNP-probes-only CDF file.

Hope that helps. Mark

> Hi everyone, I'm trying to perform probe-level analyses on the HG-U133Plus2 chip data. Basically I have a set of SNPs that overlap with probes from the chip. I wanted to analyze two different aspects of these allele-specific probes:
>
> 1) Analyze the probesets without the allele-specific probe
> 2) Analyze the allele-specific probe individually
>
> How do I generate custom CDFs for this purpose? Are the only changes that need to be made in the CDF, or does one have to modify, say, CEL files as well? Any help would be appreciated. Thanks, Jake
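
Editor's sketch: the "create 2 such flat files" idea from this reply can be illustrated by splitting one probe table into a SNP-probes-removed table and a SNP-probes-only table. The column names (probesetId, x, y, alleleSpecific) and the output file names are invented; the actual layout expected by flat2Cdf.R is not given in this thread, so treat this purely as a sketch of the splitting step.

```r
# Invented flat probe table: one row per probe, with a logical flag for
# probes that overlap a SNP.
flat <- data.frame(
  probesetId     = c("A", "A", "A", "B", "B"),
  x              = c(10, 11, 12, 20, 21),
  y              = c(5, 5, 5, 7, 7),
  alleleSpecific = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# 1) Probesets with the allele-specific probes removed.
withoutSnp <- flat[!flat$alleleSpecific, ]

# 2) Allele-specific probes on their own, each as a single-probe unit
#    with a unique identifier built from probeset and chip location.
snpOnly <- flat[flat$alleleSpecific, ]
snpOnly$probesetId <- paste(snpOnly$probesetId, snpOnly$x, snpOnly$y, sep = "_")

# Write both tables to temporary tab-delimited files.
outDir <- tempdir()
write.table(withoutSnp, file.path(outDir, "probes_noSNP.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(snpOnly, file.path(outDir, "probes_SNPonly.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
```

The same CEL files would then be read once against each resulting CDF, as described above.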
[aroma.affymetrix] Re: Problem understanding FIRMA
Hi Christian.

> However, in this respect I have also the following question:
> How does using median polish compare to using R_rlm_rma_default_model?
> Are the final scores still of some use if you use medpol?

The short answer is that I haven't investigated this too thoroughly. But my guess is that it wouldn't be too different. That prediction is based on the fact that the chip effects are in the same ballpark, as you can see from the Aroma_vs_Affy (Aroma=R_rlm_rma, Affy=medpol) plot in the following thread:

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/1b0ab11fad9b4df3/f745ed0860546313

But I'd be interested to hear more details if you do look into it more.

Cheers,
Mark

> Best regards
> Christian
>
> On Mar 16, 9:13 am, Mark Robinson mrobin...@wehi.edu.au wrote:
>> Hi Christian.
>>
>> From what I can tell looking at your code (rather quickly, I must admit), there will be 2 differences between aroma.affymetrix and what you have:
>>
>> 1. We use the 'preprocessCore' codebase for the robust fitting of the linear model (... but maybe you are just using median polish as an illustration). For example, you might try:
>>
>>    library(preprocessCore)
>>    f <- .Call("R_rlm_rma_default_model", log2(yTr), 0, 1.345, PACKAGE="preprocessCore")
>>    [... and piece together the alpha, beta, etc ...]
>>
>> 2. The estimate of standard error is calculated genewise, over residuals from all probes/samples (i.e. u.mad should be a scalar, not a vector).
>>
>> Hope that helps.
>> Mark
>>
>> On 16/03/2009, at 6:32 PM, cstratowa wrote:
>>> Dear all,
>>>
>>> After reading the FIRMA paper I would like to understand the implementation, but this is not easy since the source code is hard to read. Nevertheless, I tried and would like to know if this is correct.
>>>
>>> According to the page on exon array analysis you do the following:
>>>
>>> I, fit a summary of the entire transcript
>>>    plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
>>>    fit(plmTr, verbose=verbose)
>>>
>>> II, fit the FIRMA model for each exon
>>>    firma <- FirmaModel(plmTr)
>>>    fit(firma, verbose=verbose)
>>>
>>> However, I would like to understand the underlying source code. For this example let us assume that we have quantile-normalized intensities yTr for a transcript containing two exons:
>>>
>>> yTr
>>>      HeartA   HeartB    HeartC  MuscleA  MuscleB  MuscleC
>>> 1   5.74954  18.0296   2.50436  15.5857  26.1744  31.0075
>>> 2   9.59819  23.0093  22.01120  70.1742  32.8408 102.0080
>>> 3 114.50800  87.1742  70.34080 312.3410 266.1740 601.3410
>>> 4  66.34080  52.0075  67.34080 184.1740 266.1740 147.0080
>>> 5 210.17400 142.0080 173.34100 514.5080 659.1740 509.6740
>>> 6 104.00800  84.3408  70.34080 333.5080 324.1740 231.0080
>>> 7 194.00800 124.5080 234.00800 443.6740 767.5080 716.8410
>>> 8 319.34100 282.6740 283.50800 656.0080 807.6740 954.6740
>>>
>>> Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2.
>>>
>>> I, fit a summary of the entire transcript
>>>
>>> To simplify issues I will fit the data using median polish:
>>>
>>>    # 1. fit median polish
>>>    mp <- medpolish(log2(yTr))
>>>    # 2. data set specific estimates (probe affinities)
>>>    beta <- mp$overall+mp$col
>>>    thetaTr <- 2^beta
>>>    # 3. array-specific estimates
>>>    alpha <- mp$row
>>>    alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)])
>>>    phiTr <- 2^alpha
>>>
>>> II, fit FIRMA model for each exon
>>>
>>>    # 1. calculate residuals
>>>    phi <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr))
>>>    theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr), byrow=TRUE)
>>>    yhat <- phi * theta
>>>    eps <- yTr/yhat  # rma uses y/yhat
>>>    # 2. estimate of standard error
>>>    u.mad <- apply(log2(eps), 2, mad, center=0)
>>>    # 3. compute final score statistic
>>>    # for 1. exon
>>>    y1 <- log2(eps[1:4,])
>>>    F1 <- apply(y1/u.mad, 2, median)
>>>    F1
>>>         HeartA      HeartB      HeartC     MuscleA     MuscleB     MuscleC
>>>    -0.89938777 -0.03792624 -0.69409936  0.11536565 -0.61385296  1.08709568
>>>    # for 2. exon
>>>    y2 <- log2(eps[5:8,])
>>>    F2 <- apply(y2/u.mad, 2, median)
>>>    F2
>>>         HeartA      HeartB      HeartC     MuscleA     MuscleB     MuscleC
>>>    -0.02899616 -1.64645153 -0.70048533 -0.39996057  0.02666064 -1.46657055
>>>
>>> Now my question is: Is this calculation of the final score statistic F1 for exon 1 and F2 for exon 2 correct? Did I miss something?
>>>
>>> Best regards
>>> Christian

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--
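Mark's second point (a single genewise scale estimate rather than one per array) can be illustrated with a small base-R sketch. This is not the aroma.affymetrix implementation: it keeps median polish as a stand-in for the robust fit, takes the residuals straight from medpolish() rather than rebuilding them from phi and theta, and the names 'res', 'exon' and 'scores' are made up for the illustration.

```r
# Toy input: the 8 x 6 intensity matrix from the message above
# (rows = probes, columns = arrays; probes 1-4 = exon 1, probes 5-8 = exon 2).
yTr <- matrix(c(
    5.74954,  18.0296,   2.50436,  15.5857,  26.1744,  31.0075,
    9.59819,  23.0093,  22.01120,  70.1742,  32.8408, 102.0080,
  114.50800,  87.1742,  70.34080, 312.3410, 266.1740, 601.3410,
   66.34080,  52.0075,  67.34080, 184.1740, 266.1740, 147.0080,
  210.17400, 142.0080, 173.34100, 514.5080, 659.1740, 509.6740,
  104.00800,  84.3408,  70.34080, 333.5080, 324.1740, 231.0080,
  194.00800, 124.5080, 234.00800, 443.6740, 767.5080, 716.8410,
  319.34100, 282.6740, 283.50800, 656.0080, 807.6740, 954.6740),
  nrow = 8, byrow = TRUE,
  dimnames = list(as.character(1:8),
                  c("HeartA", "HeartB", "HeartC", "MuscleA", "MuscleB", "MuscleC")))

# Stand-in for the robust fit (assumption): median polish on the log2 scale.
mp <- medpolish(log2(yTr), trace.iter = FALSE)
res <- mp$residuals  # log2-scale residuals, probes x arrays

# Genewise scale estimate: ONE scalar over all probes and all samples.
u.mad <- mad(as.vector(res), center = 0)

# Score per exon and array: median of the standardised residuals of that exon.
exon <- rep(c("exon1", "exon2"), each = 4)
scores <- sapply(split(seq_len(nrow(res)), exon), function(idx) {
  apply(res[idx, , drop = FALSE] / u.mad, 2, median)
})
print(round(scores, 3))
```

The only change relative to the per-array version in the quoted code is that mad() is applied once to all residuals of the gene, so every exon and array is divided by the same number.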
[aroma.affymetrix] Re: Using GenomeGraphs with FIRMA (Was: FIRMA score)
a CDF file where the groups are labelled with Ensembl identifiers.

> In your example plot your first row shows the data
> ds <- getDataSet(plm)
> (I think there is a typo somewhere ds-cs). What exactly is this data?

Thanks for spotting the typo. Good find. Indeed, I should have written:

ds <- getDataSet(plm)
[...]
d <- log2(extractMatrix(ds, cells=ind, verbose=verbose))

So, in my plot, I was plotting unnormalized raw intensity data, not the BG-adjusted, quantile-normalized data, since my 'cs' was defined as:

cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)

> I thought the idea of a probe level model was to somehow merge all probe values to a single probeset value. So afterwards you have one summarized intensity for each probeset and each array. Following this thought, there should be only one intensity value for each probeset in the plot. But your plot shows (1st row) more than one value per probeset (mostly 4, one value per probe). So what exactly did you plot there?

I'm plotting all probes. The top plot is the raw data, the lower plot is the residuals ... then all the gene annotation at the bottom.

> In my plot above I used the values from
> celsetN <- process(QuantileNormalization(celsetBC, typesToUpdate="pm"))
> and expected to plot the normalized intensities... Did I get it wrong? I have my plot and the corresponding code attached.

Your 'plotdata1' is the normalized data. Your 'plotdata2' and 'plotdata3' are the chip effects from probeset-level summaries (PLM and FIRMA, respectively). Therefore, the nProbes element of 'exon2' and 'exon3' doesn't actually match the data, does it? What is the result of:

nrow(plotdata2$intensities) == sum(plotdata2$probesetdata[,4])

In my example:

nrow(d) == sum(as.numeric(nProbes))
[1] TRUE

Is that a possible source of the problem?

> Right now I'm pretty confused... I'm hoping for some enlightenment!
>
> Frank

Hopefully I haven't confused you more. There are a lot of questions/intricacies here! Good luck.

Mark

> P.S.: Right now (2009-03-05, around 3 GMT+1) I can't connect to Ensembl through Biomart:
>
> library(GenomeGraphs)
> mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> Opening and ending tag mismatch: meta line 3 and body
> Premature end of data in tag html line 1
> Fehler: 1: Opening and ending tag mismatch: meta line 3 and body
> 2: Premature end of data in tag html line 1
>
> Hopefully this is just a temporary error... Does anybody have the same problem?
>
> [Attachments: GenomeGraphs_ENSG0060237.pdf, GenomeGraphs_ENSG0060237.R]

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--
[aroma.affymetrix] Re: to use array list file
Hi Sun.

One of the handy features of aroma.affymetrix is that it pulls a lot of information from the directories in which files reside (e.g. platform, dataset name, etc). If you are on Unix/Mac OS X, you can use symbolic links for this. For example, everything in your ./rawData/MYPLATFORM/MYDATASET directory would be a link to the location on disk where the file resides. You could probably even write a script to create all the symbolic links from the list of full pathnames in your text file ...

I don't know how to do this on Windows, but presumably there is something similar.

HTH,
Mark

On 13/02/2009, at 11:24 AM, Wukong Sun wrote:

> Dear Dr. Bengtsson:
>
> I am wondering whether aroma-affymetrix can read in CEL files that are scattered in several directories. For example, the user may want to provide a text file specifying the full paths of CEL files. The reason is that I want to process these CEL files together; however, I don't want to copy them to one folder.
>
> Thanks.

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--
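The "script to create all the symbolic links" mentioned above could look roughly like the following base-R sketch. The function name linkCelFiles(), the list-file format (one full path per line) and the target directory names are assumptions made for the illustration; the demo uses empty throwaway files in a temporary directory instead of real CEL files, and the directory layout should be checked against the aroma.affymetrix documentation for your own setup.

```r
# Hypothetical helper: read a text file with one full CEL path per line and
# create a symbolic link to each file inside 'targetDir'.
linkCelFiles <- function(listFile, targetDir) {
  paths <- readLines(listFile)
  paths <- paths[nzchar(paths)]                      # drop empty lines
  dir.create(targetDir, recursive = TRUE, showWarnings = FALSE)
  links <- file.path(targetDir, basename(paths))
  ok <- file.symlink(normalizePath(paths, mustWork = TRUE), links)
  if (!all(ok)) warning("some links could not be created")
  invisible(links)
}

# Demo with throwaway files standing in for CEL files in two separate folders.
tmp <- tempfile("linkdemo")
srcDirs <- file.path(tmp, c("batchA", "batchB"))
for (d in srcDirs) dir.create(d, recursive = TRUE)
cel <- file.path(srcDirs, c("sample1.CEL", "sample2.CEL"))
file.create(cel)

listFile <- file.path(tmp, "cel_files.txt")
writeLines(cel, listFile)

# Target directory names are placeholders, not a statement about the required layout.
links <- linkCelFiles(listFile, file.path(tmp, "rawData", "MyDataSet", "MyChipType"))
print(basename(links))
```

Because links are named after basename(), two source files with the same file name in different folders would collide; a real script would need to check for that. file.symlink() may need special privileges on Windows.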
[aroma.affymetrix] Re: probes missing from mogene10st.db
Hi Sebastien.

Have a look at:

http://thread.gmane.org/gmane.science.biology.informatics.conductor/19591/

If you want these probesets, you might consider creating the CDF directly from a pdInfoBuilder package (which works directly from the PGF/CLF files from Affy), as described at:

http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-r-packages-environments

Hope that helps.
Mark

On 11/02/2009, at 10:32 AM, Sebastien Gerega wrote:

> Hi,
> I have come across some missing probes in the mogene10st.db package. I am analysing my data using the aroma package and have included some of my code:
>
> ces = getChipEffectSet(plm)
> gExprs = extractDataFrame(ces, units=NULL, addNames=TRUE)
> affyIDs = gExprs$unitName
> affyIDs[which(affyIDs %in% names(unlist(as.list(mogene10stENTREZID))) == FALSE)]
>  [1] 10344715 10346139 10348945 10351859 10361914 10363331 10364435 10372094 10375121
> [10] 10388263 10388269 10393404 10394238 10396419 10398315 10401420 10406248 10408136
> [19] 10408144 10408900 10409545 10412699 10418129 10422509 10424416 10427993 10428439
> [28] 10433478 10436198 10439287 10439409 10439974 10442153 10442256 10445378 10449547
> [37] 10450920 10453715 10457583 10457778 10458958 10459467 10461790 10462311 10467838
> [46] 10482137 10484695 10485716 10490984 10495594 10496888 10498497 10502199 10514325
> [55] 10514327 10514329 10514331 10521698 10527099 10528157 10528159 10528172 10529815
> [64] 10530140 10531631 10535467 10539979 10540931 10541349 10544538 10546227 10548697
> [73] 10550167 10553786 10553811 10559498 10571382 10579607 10579763 10583199 10584393
> [82] 10591182 10591184 10595787 10608198 10608200
>
> Are people aware of these missing probes? I have not been able to find any documentation about the issue. Should I send this to the BioC mailing list instead?
>
> thanks for your help,
> Sebastien

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--
[aroma.affymetrix] Re: annotation of ST gene Arrays
Hi Simon. See comments below.

> I am using the mouse gene ST arrays and am having problems with annotation. When I write a csv file, the annotation is only the probeset_id, no gene names or accession numbers etc.

That's what it should be. Actually, it's the 'transcript_cluster_id'. Previously, Affy did not provide annotation at the probeset level. The CDF file just contains the identifiers. Linking results (e.g. expression summaries) to the annotation can be done with other R packages. For example, here is some code I gave Sebastien a few weeks ago that will get you started (just replace hugene10st.db with mogene10st.db):

---
Say you have some Affy identifiers:

myids
 [1] 7950136 7955845 7955852 7955855 7955858 7955865 7955869
 [8] 7955873 7955887 8016433

Load the package and read off the gene symbols:

library(hugene10st.db)
symbols <- unlist(as.list(hugene10stSYMBOL))
data.frame(affyid = myids, symbol = symbols[myids])
         affyid symbol
7950136 7950136 PHOX2A
7955845 7955845 HOXC13
7955852 7955852 HOXC12
7955855 7955855 HOXC11
7955858 7955858 HOXC10
7955865 7955865  HOXC9
7955869 7955869  HOXC8
7955873 7955873  HOXC6
7955887 7955887  HOXC5
8016433 8016433  HOXB1

Here are some other fields in hugene10st.db:

hugene10st
hugene10st             hugene10stCHRLENGTHS    hugene10stENTREZID      hugene10stGO2ALLPROBES
hugene10stORGANISM     hugene10stPMID2PROBE    hugene10stUNIPROT       hugene10st.db::
hugene10stCHRLOC       hugene10stENZYME        hugene10stGO2PROBE      hugene10stPATH
hugene10stPROSITE      hugene10st_dbInfo       hugene10stACCNUM        hugene10stCHRLOCEND
hugene10stENZYME2PROBE hugene10stMAP           hugene10stPATH2PROBE    hugene10stREFSEQ
hugene10st_dbconn      hugene10stALIAS2PROBE   hugene10stENSEMBL       hugene10stGENENAME
hugene10stMAPCOUNTS    hugene10stPFAM          hugene10stSYMBOL        hugene10st_dbfile
hugene10stCHR          hugene10stENSEMBL2PROBE hugene10stGO            hugene10stOMIM
hugene10stPMID         hugene10stUNIGENE       hugene10st_dbschema
...
---

> These probesets also do not match the probeset_ids from MoGene-1_0-st-v1.na27.mm9 off the affymetrix website.

Perhaps you want 'transcript_cluster_id's? (CSV files from http://www.affymetrix.com/products_services/arrays/specific/mousegene_1_st.affx)

tr <- read.csv("MoGene-1_0-st-v1.na27.mm9.transcript.csv", header=TRUE, comment.char="#")
ps <- read.csv("MoGene-1_0-st-v1.na27.mm9.probeset.csv", header=TRUE, comment.char="#")
cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1", verbose=verbose)
un <- getUnitNames(cdf)
sum( un %in% ps$transcript_cluster_id )
[1] 28815
sum( un %in% tr$transcript_cluster_id )
[1] 35474

You may also be interested in the following thread, which explains the difference in the number of probesets:

http://thread.gmane.org/gmane.science.biology.informatics.conductor/19591/

> here is my session:
>
> library('aroma.affymetrix')
> cdf <- AffymetrixCdfFile$byChipType("MoGene-1_0-st-v1", tags='r3')
> cs <- AffymetrixCelSet$byName("Files", cdf=cdf)
> bc <- RmaBackgroundCorrection(cs)
> csBC <- process(bc, verbose=verbose)
> qn <- QuantileNormalization(csBC, typesToUpdate="pm")
> csN <- process(qn, verbose=verbose)
> plm <- RmaPlm(csN)
> fit(plm, verbose=verbose)
> qam <- QualityAssessmentModel(plm)
> ces <- getChipEffectSet(plm)
> mat <- extractMatrix(ces)
> mat <- log2(mat)
> rownames(mat) <- getUnitNames(cdf)
> write.csv(mat, file="data.csv")
>
> I am sure there is a simple solution to this and I apologize as I am new to R. Any help would be much appreciated.
>
> Also, what are people's opinions on the positive and negative control probesets? Should these be included as part of a final gene list?
>
> Thank you in advance for any help.

Good question. Some people use the controls for QC and some use them for adjusting for background (for example, the pool of GC content probes). But, definitely, if you were to follow this up with some kind of differential expression analysis (e.g. limma), I would discard the non-main probes. For example:

table(tr$category)

            control->affx control->bgp->antigenomic                      main
                       22                        45                     28815
           normgene->exon          normgene->intron  rescue->FLmRNA->unmapped
                     1324                      5222                        91

Hope that helps.
Mark

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--
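The two suggestions in this thread (attach gene symbols by transcript_cluster_id, and drop the non-main units before a differential expression analysis) can be combined as in the following base-R sketch. Everything here is toy data: the matrix values, the third identifier and the small 'tr' and 'symbols' objects are invented stand-ins for the real expression matrix, the Affymetrix transcript CSV and the annotation package lookup.

```r
# Toy stand-ins (all values hypothetical): a log2 expression matrix keyed by
# transcript_cluster_id, a transcript annotation table, and a symbol lookup.
mat <- matrix(c(7.1, 7.3,
                5.0, 5.2,
                9.8, 9.9),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("7950136", "7955845", "9999999"),
                              c("arrayA", "arrayB")))
tr <- data.frame(transcript_cluster_id = c("7950136", "7955845", "9999999"),
                 category = c("main", "main", "control->affx"),
                 stringsAsFactors = FALSE)
symbols <- c("7950136" = "PHOX2A", "7955845" = "HOXC13")

# Keep only units annotated as 'main'.
mainIds <- tr$transcript_cluster_id[tr$category == "main"]
keep <- rownames(mat) %in% mainIds

# Build the annotated table: identifier, symbol, then the expression columns.
ids <- rownames(mat)[keep]
out <- data.frame(affyid = ids,
                  symbol = unname(symbols[ids]),
                  mat[keep, , drop = FALSE],
                  row.names = NULL,
                  stringsAsFactors = FALSE)
print(out)
```

Indexing the lookup by character identifiers (rather than numbers) matters: a numeric index would be interpreted as a position in the vector, not as a name.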
Ensembl-centric CDFs from scratch (Was: Re: [aroma.affymetrix] Error: cannot allocate vector of size 692.8 Mb)
Hi Sabrina.

I created a new thread as this is a new question. And a big one!

On 05/02/2009, at 6:06 AM, sabrina wrote:

> Hi, Mark:
> Quick question. I was looking at the example of creating cdfs from scratch. I downloaded biomaRt, but I am not sure where to get or how to create the exonBoundary file. Can you give me more information on that? Thanks!
>
> Sabrina

First of all, you may want to start with the Ensembl-centric CDFs that Elizabeth has already made for the Mouse Exon 1.0 array. You can find these from the link at:

http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st

Creating custom CDFs for exon arrays is not an easy procedure and, while we've (mostly Elizabeth) made some effort to document and share the code of how we have done it, we (or at least I) don't expect it to be bulletproof and it may require additional effort on your part.

That said, to answer your question specifically, you'll need to make sure you get the exon coordinates (i.e. boundaries) when you download from Biomart. Here is one example of what I downloaded from Biomart way back when for the Human array:

---
Ensembl Gene ID,Chromosome,Biotype,Exon Start (bp),Exon End (bp),Ensembl Exon ID,Constitutive Exon,Strand,Coding Start (bp),Coding End (bp)
ENSG00000184895,Y,protein_coding,2714896,2715740,ENSE00001494622,0,-1,2715030,2715644
ENSG00000184895,Y,protein_coding,2715030,2715644,ENSE00001299380,0,-1,2715030,2715644
ENSG00000129824,Y,protein_coding,2769527,2769668,ENSE00001494579,0,1,2769666,2769668
ENSG00000129824,Y,protein_coding,2770206,2770283,ENSE00001159432,1,1,2770206,2770283
ENSG00000129824,Y,protein_coding,2772118,2772298,ENSE00000891584,1,1,2772118,2772298
---

I did this from a query at http://www.ensembl.org/biomart/martview and downloaded it to a text file, but note that you should be able to download this directly to an R data.frame using the biomaRt package. These coordinates get matched up to the coordinates in the probeset.csv file ... and so on and so on.

HTH,
Mark

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--
[aroma.affymetrix] Re: changing colours in plotRle
Hi Sebastien.

Note that plotRle() eventually makes a call to 'bxp' (in the graphics package that is loaded by default) and any/all arguments are passed on. Have a look at ?bxp for what you can specify. For example:

[...]
qamTr <- QualityAssessmentModel(plmTr)
plotRle(qamTr, boxwex=[something], boxfill=[something])

Cheers,
Mark

> Hi,
> is there a way to change the colour of each individual box when calling plotRle? I would like to make the colours correspond to test groups.
> thanks,
> Sebastien
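To see what a per-group 'boxfill' does, here is a standalone base-R sketch that calls bxp() directly on simulated values. The data, the group labels and the colour choices are invented for the illustration; whether a colour vector passed to plotRle() reaches bxp() unchanged is taken from the reply above and not verified here.

```r
set.seed(1)

# Simulated stand-in for relative log expression values: 100 units x 6 arrays.
rle <- matrix(rnorm(600, sd = 0.2), ncol = 6,
              dimnames = list(NULL, paste0("array", 1:6)))

# One group label per array, mapped to one fill colour per box.
group <- c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt")
cols <- unname(c(ctrl = "skyblue", trt = "salmon")[group])

# boxplot(..., plot = FALSE) only computes the statistics; bxp() draws them.
stats <- boxplot(as.data.frame(rle), plot = FALSE)
bxp(stats, boxfill = cols, boxwex = 0.6, outline = FALSE, ylab = "RLE")
legend("topright", legend = c("ctrl", "trt"), fill = c("skyblue", "salmon"))
```

The colour vector has one entry per box, in the same order as the arrays are drawn.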
[aroma.affymetrix] Re: Error: cannot allocate vector of size 692.8 Mb
Hi Sabrina.

First of all, you are somewhat in uncharted waters here. I personally don't recommend using the 'full' CDF for FIRMA analysis. Others can disagree (and I would encourage some discussion about this ...), but my reasoning is that the majority of the probes in the full CDF are querying poorly annotated or predicted exonic regions of the genome. From what I've seen (on Human Exon 1.0 data), the probe intensities for these are mostly at background and could have an unsavoury effect on the PLM modelling ... and pass that on to FIRMA. By restricting to core/Ensembl probes, you lessen the effect of non-responding probes.

From thinking about it (although I haven't actually done this on my datasets), I would suggest a two-stage analysis:

1. PLM/FIRMA analysis on a set of probes in well-annotated regions.
2. Differential expression analysis on the remaining probes -- strongly differentially expressed ones may be indicative of transcripts/variants that are not covered in the well-annotated set.

To answer your question below, the table of FIRMA values shouldn't actually be all that large, so this should be possible. One thing to check first of all is that you have a relatively clean R session. Do you have tables/data frames/objects in memory that are consuming a bunch of space? aroma.affymetrix is memory efficient, but it will need some room to work.

I just ran an example here on the FULL CDF and it appears to tick along just fine, though I can see if the memory spikes.

As always when reporting errors, it doesn't hurt to give the results of 'sessionInfo()', 'traceback()' ... and you could even set 'verbose=-40' (say) in your call to 'extractDataFrame' to see where it all goes wrong.

Hope that helps.

Cheers,
Mark

On 28/01/2009, at 3:45 AM, sabrina wrote:

> Hi, all:
> I am using the FIRMA model for my Mouse Exon array analysis. I am testing the FULL annotation. After I used FirmaModel and fit it, I used the following code to extract the FIRMA scores:
>
> exFirma <- extractDataFrame(firmaScore, addNames=TRUE);
>
> However, I got the following error:
>
> Error: cannot allocate vector of size 692.8 Mb
> In addition: Warning messages:
> 1: In getUnitGroupCellMap.FirmaFile(ce, ...) :
>   Reached total allocation of 1535Mb: see help(memory.size)
>
> I used memory.size(), it gave me: 903.4713
> and memory.limit() is: 1535.875
>
> Does anyone have any suggestion to solve this problem? Thanks!!!
>
> Sabrina

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--
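One way to check for a "relatively clean R session", as suggested above, is to list the objects in the workspace by size. The helper below is a hypothetical base-R sketch (objSizes() is not part of aroma.affymetrix); the demo allocates one large vector so that there is something to find.

```r
# Hypothetical helper: sizes (in MiB) of all objects in an environment,
# largest first.
objSizes <- function(env) {
  nms <- ls(envir = env)
  if (length(nms) == 0) return(numeric(0))
  sizes <- vapply(nms,
                  function(n) as.numeric(object.size(get(n, envir = env))),
                  numeric(1))
  sort(sizes, decreasing = TRUE) / 2^20
}

# Demo: a numeric vector of one million doubles takes roughly 7.6 MiB.
big <- numeric(1e6)
here <- environment()
sizes <- objSizes(here)
print(round(head(sizes), 2))

# Objects that are no longer needed can then be removed, for example:
# rm(big); gc()
```

This only accounts for objects held in the given environment; memory used by attached packages or by the operating system is not shown.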
[aroma.affymetrix] Re: exon array with technical replicates
Hi Sabrina.

Thanks for digging into this. I had a quick look at that probeset (6824548) and everything seems alright to me. For example, from the unix prompt, I get:

unix88 516 % grep 6824548 MoEx-1_0-st-v1.na26.mm9.probeset.csv
4499490,chr14,-,51311295,51311393,4,6824548,268799,115885605,---,---,1,3,0,3,full,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,main
4897698,chr14,-,51300520,51300544,1,6824548,268797,115885601,---,---,1,1,0,1,core,0,0,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,main
5218980,chr14,-,51312156,51312189,4,6824548,268800,115885607,---,---,2,1,0,1,full,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,main
5466951,chr14,-,51313326,51313357,4,6824548,268801,115885611,---,---,3,1,0,1,full,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,main
5564205,chr14,-,51301330,51301394,4,6824548,268798,115885603,---,---,1,2,0,2,full,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,main

So, just 1 'core' probe and 4 'full' probes. So, you'd find only the single probe in the extended (core+extended) CDF as well. If you went to the full CDF (core+extended+full), you'd have the 5.

Hope that helps.

... getting back to your original query, it is probably worth throwing out the 'groups' (transcript clusters) that have just a single probeset before calling lmFit().

Cheers,
Mark

> Hi, Mark:
> I did options, it gave me the exact display as what you showed to me. :)
> Here is some of my confusion: I checked the exFirma scores of these NAs to see what transcript_cluster_id they were assigned to, so that I could check whether they do have >5000 probes assigned to it. One of them is 6824548. I went back to the file I saved for # of probes per exon to check how many probes it has. To my surprise, it only has one probeset and one probe: 6824548.4897698, which matches the probeset_id (4897698). Then I went ahead to download the affymetrix annotation files, transcript.csv and probeset.csv. In the transcript.csv file, 6824548 has 17 probes in total, and in the probeset.csv, it shows that transcript_cluster_id 6824548 has 5 probesets; 4897698 is one of them with only one probe assigned to it. I tried the extenedR1 cdf version, it gave me the same thing (in terms of # of probes per exon, using nProbesPerExon <- readCdfNbrOfCellsPerUnitGroup(getPathname(cdf))).
>
> I am a bit confused. It seemed to me that in the cdf file, some of the probesets for the gene were missing for this transcript_cluster_id. I wonder if you can help me out on this. I noticed that you created the cdf files so you are the right person to ask :)
>
> The code I used to generate the # of probes per exon is:
>
> nProbesPerExon <- readCdfNbrOfCellsPerUnitGroup(getPathname(cdf))
> nProbesPerExonVector <- unlist(nProbesPerExon)
>
> Thanks and have a great weekend
>
> Sabrina
>
> On Jan 22, 5:18 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
>> Hi Sabrina. See below.
>>
>> On 23/01/2009, at 3:17 AM, sabrina wrote:
>>> Hi, Mark:
>>> Thanks for the suggestions. I think I will go with one of the replicates for now, just to make things simple; later on I will deal with the replicates. Now, here is another problem I have. I used the following code to generate the FIRMA score:
>>>
>>> firma <- FirmaModel(plm);
>>> fit(firma, verbose=verbose)
>>> firmaScore <- getFirmaScores(firma);
>>> exFirma <- extractDataFrame(firmaScore, addNames=TRUE);
>>> exFirma[,6:ncol(exFirma)] <- log2(exFirma[,6:ncol(exFirma)])
>>>
>>> Then, when I use the limma fit, it gave me warnings:
>>>
>>> Warning message:
>>> In lmFit(exFirma[, 6:ncol(exFirma)], mm) :
>>>   Some coefficients not estimable: coefficient interpretation may vary.
>>>
>>> It turned out that several rows of exFirma (even before the log2) were NAs. But when I check the overall exon expressions (using ExonRmaPlm with mergeGroups=FALSE), for that specific exon (I take the group name as the exon id here), they have real values. I wonder where it went wrong, perhaps in the fit(firma) step?
>>
>> I'll make a wager that this is due to a small number of probesets that have a large number (>500, say) of probes assigned to them. You are using the Mouse Exon array and I know there are some of these probesets in there. In the interest of time, aroma.affymetrix changes over to median polish (faster), or skips the probeset altogether, depending on how large. You can see the default settings in options(). For me (and probably you as well), it is:
>>
>> options("aroma.affymetrix.settings")
>> [...]
>> $aroma.affymetrix.settings$models
>> $aroma.affymetrix.settings$models$RmaPlm
>> $aroma.affymetrix.settings$models$RmaPlm$medianPolishThreshold
>> [1] 500   6
>>
>> $aroma.affymetrix.settings$models$RmaPlm$skipThreshold
>> [1] 5000    1
>>
>> So, I would just remove these rows before you use lmFit().
>>
>> Cheers,
>> Mark
>>
>>> Thanks!
>>> Sabrina
>>>
>>> On Jan 20, 7:48 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
>>>> Hi Sabrina.
>>>> Do you have biological replicates of some samples and technical replicates of others? Or, just technical replicates of everything?
>>>
>>> My experiment has two groups, one has 5
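The suggestion to throw out transcript clusters that contain only a single probeset before calling lmFit() can be sketched in base R as follows. The data frame is a toy: 'unitName' is the column used earlier in this archive for the unit identifier, 'groupName' as the probeset column is an assumption, and all identifiers except 6824548 and 4897698 are invented.

```r
# Toy stand-in for an extractDataFrame()-style table: one row per probeset
# (group), labelled with its transcript cluster (unit). Score values are made up.
ex <- data.frame(
  unitName  = c("6824548", "6800001", "6800001", "6800002", "6800002", "6800002"),
  groupName = c("4897698", "4100001", "4100002", "4200001", "4200002", "4200003"),
  score     = c(0.12, -0.40, 0.33, 1.10, -0.05, 0.27),
  stringsAsFactors = FALSE
)

# Count probesets per transcript cluster and keep clusters with more than one.
nPerUnit <- table(ex$unitName)
multi <- names(nPerUnit)[nPerUnit > 1]
exMulti <- ex[ex$unitName %in% multi, , drop = FALSE]

print(nPerUnit)
print(exMulti)
```

A FIRMA-type score compares each probeset against the rest of its transcript cluster, so a cluster with a single probeset has nothing to be compared against.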
[aroma.affymetrix] Re: exon array with technical replicates
Hi Sabrina.

See below.

On 23/01/2009, at 3:17 AM, sabrina wrote:

> Hi, Mark:
>
> Thanks for the suggestions. I think I will go with one of the replicates for now, just to keep things simple; later on I will deal with the replicates.
>
> Now, here is another problem I have. I used the following code to generate the FIRMA scores:
>
>   firma <- FirmaModel(plm)
>   fit(firma, verbose=verbose)
>   firmaScore <- getFirmaScores(firma)
>   exFirma <- extractDataFrame(firmaScore, addNames=TRUE)
>   exFirma[,6:ncol(exFirma)] <- log2(exFirma[,6:ncol(exFirma)])
>
> Then, when I use the limma fit, it gives me a warning:
>
>   Warning message:
>   In lmFit(exFirma[, 6:ncol(exFirma)], mm) :
>     Some coefficients not estimable: coefficient interpretation may vary.
>
> It turned out that several rows of exFirma were NAs (even before the log2). But when I check the overall exon expressions (using ExonRmaPlm with mergeGroups=FALSE), that specific exon (I take the group name as the exon id here) has real values. I wonder what went wrong; perhaps the fit(firma) step?

I'll make a wager that this is due to a small number of probesets that have a large number (say, more than 500) of probes assigned to them. You are using the Mouse Exon array and I know there are some of these probesets in there. In the interest of time, aroma.affymetrix switches to median polish (faster), or skips the probeset altogether, depending on how large it is. You can see the default settings in options(). For me (and probably you as well), it is:

  options("aroma.affymetrix.settings")
  [...]
  $aroma.affymetrix.settings$models
  $aroma.affymetrix.settings$models$RmaPlm
  $aroma.affymetrix.settings$models$RmaPlm$medianPolishThreshold
  [1] 500   6

  $aroma.affymetrix.settings$models$RmaPlm$skipThreshold
  [1] 5000    1

So, I would just remove these rows before you use lmFit().

Cheers,
Mark

> Thanks!
> Sabrina
>
> On Jan 20, 7:48 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
>> Hi Sabrina.
>>
>>>> Do you have biological replicates of some samples and technical replicates of others? Or, just technical replicates of everything?
>>>
>>> My experiment has two groups: one has 5 samples (biological replicates, that is, 5 mice from one strain), and the other has 4 samples. Among the 5 samples of the first group, there are two samples hybridized twice to two arrays, so I have 4 arrays for those two samples (that is what I meant by technical replicates). Does that make sense?
>>
>> OK, now I get it. Thanks.
>>
>>>> I suspect it would be difficult to justify adding a bunch of extra (ad hoc) steps into the FIRMA pipeline. I don't have a full understanding of your experiment, but what about just dealing with it when you operate on the FIRMA scores?
>>>>
>>>> When you say average on plm, I assume this means an average of the chip effects for those two samples?
>>>
>>> Yes, that is what I meant. By averaging chipEffects, I actually average the gene signals from the two arrays.
>>>
>>>> You could fit a PLM that estimates a single chip effect for those two samples and use that for calculating FIRMA scores.
>>>
>>> Do you mean to fit a PLM for these two arrays for one sample separately from the other samples? If I just fit a PLM for these two arrays (say sample 1), will it estimate different probe affinities? If it does, then these chipEffects (the gene signals used to estimate the FIRMA score) won't be compatible, am I correct? Thanks!
>>
>> I'm not suggesting to fit multiple PLMs for the same gene and somehow combine them. What I'm suggesting is a single PLM, but where there is only 1 chip effect parameter for those 2 samples. From a design matrix point of view, this is conceptually straightforward. Off the top of my head though, I don't know how to get the 'preprocessCore' code (this is used under the hood in aroma.affymetrix to do the fitting) to fit such a model. It may have to be a one-off.
>>
>> Another alternative is to average the FIRMA scores (subsequent to the standard PLM fitting) for these 2 samples and do a 4 versus 4 comparison to look for changes in FIRMA scores.
>>
>> And yet another alternative is to ignore this altogether. It's unlikely (maybe? feel free to disagree) that 1 technical replicate amongst 8 biological replicates would cause you to underestimate the variability so much as to significantly overstate the changes you see ... but that's just a hunch. Presumably, you'd also be validating the major discoveries that you make.
>>
>> Hope that helps.
>> Mark

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
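The advice in this message to drop the all-NA rows before calling lmFit() can be sketched in base R. This is only an illustration under stated assumptions: the toy data frame stands in for the real extractDataFrame() output, the five annotation column names and all values are made up, and the names scoreCols, keep, exFirmaClean and droppedGroups are hypothetical.

```r
# Toy stand-in for the data frame returned by extractDataFrame():
# five annotation columns followed by one column per array.
# All names and values here are invented for illustration.
exFirma <- data.frame(
  unitName  = c("u1", "u1", "u2", "u3"),
  groupName = c("g1", "g2", "g3", "g4"),
  unit      = c(1L, 1L, 2L, 3L),
  group     = c(1L, 2L, 1L, 1L),
  cell      = 1:4,
  arrayA    = c(1.2, 0.8, NA, 2.1),
  arrayB    = c(1.0, 0.9, NA, 1.7),
  stringsAsFactors = FALSE
)

# Score columns start at column 6, as in the thread.
scoreCols <- 6:ncol(exFirma)

# TRUE for rows in which every score is present.
keep <- complete.cases(exFirma[, scoreCols])

# Rows that are safe to pass on to the model fit.
exFirmaClean <- exFirma[keep, , drop = FALSE]

# Keep a record of which probesets were dropped, so they can be reported.
droppedGroups <- exFirma$groupName[!keep]
```

Recording droppedGroups makes it easy to check afterwards whether the removed rows really are the very large probesets that the skip threshold applies to.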
[aroma.affymetrix] Re: Can't read the CDF file
Anbarasu.

In the 'Setting up annotation files' page, it says:

  "Aroma.affymetrix searches for CDF files in the annotationData/ directory of the current working directory"

... so, create the annotationData directory within your current working directory, not within the directory where the package is installed on your machine. For example, in my setup:

  getwd()
  [1] "/Users/mrobinson/projects/microarray/exon"
  dir("/Users/mrobinson/projects/microarray/exon/annotationData/chipTypes/HuEx-1_0-st-v2/")
  [1] "HuEx-1_0-st-v2,coreR3,A20071112,EP,monocell.CDF"
  [2] "HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf"

(the monocell file is created later on)

The following command then works:

  library(aroma.affymetrix)
  cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2", tags="coreR3,A20071112,EP")

Cheers,
Mark

On 22/01/2009, at 4:34 AM, anbarasu wrote:

> Hi,
>
> It's me again. I have just read 'Setting up annotation files'. I have placed the annotation file under:
>
>   /Library/Frameworks/R.framework/Resources/library/aroma.affymetrix/annotationData/chipTypes/HuEx-1_0-st-v2
>
> and I am still getting the same error message. Any suggestions?
>
>   sessionInfo()
>   R version 2.8.1 (2008-12-22)
>   i386-apple-darwin8.11.1
>
>   locale:
>   en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>
>   attached base packages:
>   [1] stats graphics grDevices utils datasets methods base
>
>   other attached packages:
>   [1] aroma.affymetrix_1.0.0 aroma.apd_0.1.3 R.huge_0.1.6 affxparser_1.14.2 aroma.core_1.0.0 aroma.light_1.9.2
>   [7] digest_0.3.1 matrixStats_0.1.3 R.rsp_0.3.4 R.cache_0.1.7 R.utils_1.1.3 R.oo_1.4.6
>   [13] R.methodsS3_1.0.3
>
> On Jan 21, 4:42 pm, anbarasu anbarasu...@gmail.com wrote:
>> Hi All,
>>
>> I have downloaded the CDF file 'HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf' and am trying to load it using
>>
>>   cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
>>
>> I am getting an error message:
>>
>>   Error in list(`AffymetrixCdfFile$byChipType(chipType, tags = "coreR3,A20071112,EP")` = environment, :
>>   [2009-01-21 16:32:46] Exception: Could not create AffymetrixCdfFile object. No annotation chip type file with that chip type found: HuEx-1_0-st-v2
>>     at throw(Exception(...))
>>     at throw.default("Could not create ", class(static)[1], " object. No annotation chip type file with that chip type found: ", chipType)
>>     at throw("Could not create ", class(static)[1], " object. No annotation chip type file with that chip
>>
>> When I tried dir(), the file is listed. So, what's going wrong?
>>
>>   dir()
>>   [1] "A0017022.CEL" "A0017023.CEL" "A0017024.CEL"
>>   [4] "A0017025.CEL" "A0017026.CEL" "A0017027.CEL"
>>   [7] "HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf"
>>
>> Any suggestion would be greatly appreciated.
>>
>> Thanks.
>> Anbarasu

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
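The directory layout described in this thread can be sketched with base R alone. This is a minimal illustration, not the package's own setup procedure: it builds the layout in a throw-away directory under tempdir(), the project root name "exon-project" is hypothetical, and an empty file stands in for the real CDF.

```r
# Build the expected layout under a temporary directory so the sketch
# does not touch a real project. "exon-project" is a made-up name.
projectDir <- file.path(tempdir(), "exon-project")
chipType   <- "HuEx-1_0-st-v2"
cdfName    <- "HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf"

# annotationData/chipTypes/<chipType>/ relative to the project root.
cdfDir <- file.path(projectDir, "annotationData", "chipTypes", chipType)
dir.create(cdfDir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for copying the real CDF file; an empty file is enough
# to show where it has to live.
file.create(file.path(cdfDir, cdfName))

# The thread says the lookup is relative to the current working directory,
# so list what would be visible when getwd() is the project root.
found <- dir(file.path(projectDir, "annotationData", "chipTypes", chipType))
print(found)
```

In a real analysis the working directory would be set to the project root (for example with setwd()) before calling AffymetrixCdfFile$byChipType(), and the CEL files would not be the place to put the CDF.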
[aroma.affymetrix] Re: exon array with technical replicates
Hi Sabrina.

>> Do you have biological replicates of some samples and technical replicates of others? Or, just technical replicates of everything?
>
> My experiment has two groups: one has 5 samples (biological replicates, that is, 5 mice from one strain), and the other has 4 samples. Among the 5 samples of the first group, there are two samples hybridized twice to two arrays, so I have 4 arrays for those two samples (that is what I meant by technical replicates). Does that make sense?

OK, now I get it. Thanks.

>> I suspect it would be difficult to justify adding a bunch of extra (ad hoc) steps into the FIRMA pipeline. I don't have a full understanding of your experiment, but what about just dealing with it when you operate on the FIRMA scores?
>>
>> When you say average on plm, I assume this means an average of the chip effects for those two samples?
>
> Yes, that is what I meant. By averaging chipEffects, I actually average the gene signals from the two arrays.
>
>> You could fit a PLM that estimates a single chip effect for those two samples and use that for calculating FIRMA scores.
>
> Do you mean to fit a PLM for these two arrays for one sample separately from the other samples? If I just fit a PLM for these two arrays (say sample 1), will it estimate different probe affinities? If it does, then these chipEffects (the gene signals used to estimate the FIRMA score) won't be compatible, am I correct? Thanks!

I'm not suggesting to fit multiple PLMs for the same gene and somehow combine them. What I'm suggesting is a single PLM, but where there is only 1 chip effect parameter for those 2 samples. From a design matrix point of view, this is conceptually straightforward. Off the top of my head though, I don't know how to get the 'preprocessCore' code (this is used under the hood in aroma.affymetrix to do the fitting) to fit such a model. It may have to be a one-off.

Another alternative is to average the FIRMA scores (subsequent to the standard PLM fitting) for these 2 samples and do a 4 versus 4 comparison to look for changes in FIRMA scores.

And yet another alternative is to ignore this altogether. It's unlikely (maybe? feel free to disagree) that 1 technical replicate amongst 8 biological replicates would cause you to underestimate the variability so much as to significantly overstate the changes you see ... but that's just a hunch. Presumably, you'd also be validating the major discoveries that you make.

Hope that helps.
Mark
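The second alternative in this message, averaging the FIRMA scores of the technical replicates after the standard fit, could look roughly like the base R sketch below. Everything in it is assumed for illustration: the score matrix is made up, and the names scores, sampleOf and avgScores are hypothetical rather than part of aroma.affymetrix.

```r
# Made-up log2 FIRMA scores: rows are probesets, columns are arrays.
# Arrays m1_a and m1_b are two hybridizations of the same mouse.
scores <- matrix(
  c(0.10, 0.30, -0.20, 0.50,
    1.00, 1.40,  0.90, 0.20),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ps1", "ps2"), c("m1_a", "m1_b", "m2", "m3"))
)

# Which biological sample each array came from (hypothetical labels).
sampleOf <- c(m1_a = "m1", m1_b = "m1", m2 = "m2", m3 = "m3")

# One column per biological sample: technical replicates are averaged,
# samples with a single array are passed through unchanged.
samples <- unique(sampleOf)
avgScores <- sapply(samples, function(s) {
  rowMeans(scores[, sampleOf == s, drop = FALSE], na.rm = TRUE)
})
```

The resulting matrix has one column per mouse, so a downstream group comparison sees each biological sample exactly once.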
[aroma.affymetrix] Re: exon array with technical replicates
Hi Sabrina.

Tricky question.

Do you have biological replicates of some samples and technical replicates of others? Or, just technical replicates of everything?

I suspect it would be difficult to justify adding a bunch of extra (ad hoc) steps into the FIRMA pipeline. I don't have a full understanding of your experiment, but what about just dealing with it when you operate on the FIRMA scores?

When you say average on plm, I assume this means an average of the chip effects for those two samples? You could fit a PLM that estimates a single chip effect for those two samples and use that for calculating FIRMA scores.

Hope that helps.

Cheers,
Mark

On 20/01/2009, at 9:09 AM, sabrina wrote:

> Hi, all:
>
> I am working on the Affy Mouse Exon Array. Because of the experiment design and the quality of the hybridization, we have two arrays hybridized from one mouse (same biological sample). I assume that I should treat these two arrays as technical duplicates. If that is the case, I could do background correction, normalization and summarization separately for these two arrays (RMA and ExonRmaPlm); then, before I use FIRMA to get FIRMA scores, can I just average the PLM results of these two arrays? But then what do I feed into FirmaModel? (The default one is just the PLM results directly from ExonRmaPlm.) Or any suggestions about how to deal with this situation? My goal is to find novel splicing events, but right now I am just using the core annotation to try it out.
>
> Thanks!
> Sabrina

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi Andy.

I don't think you've gotten a response on this. Sorry for the delay -- holidays. Some comments below.

On 31/12/2008, at 1:18 AM, Andy_Paparountas wrote:

> Hi all,
>
> I really find this conversation very interesting. I am trying to analyze a set of 3 treatment and 3 control samples of the MoGeneSt10 array. Thus far, with the code pwhite shared, I was able to do RMA background correction and quantile normalization, and I got QC, RLE, NUSE and density plots.
>
> Q1. Is there any code to get similar results to affyQCreport? Or even, how can we use affyQCreport to get QC from these arrays?

As far as I know, affyQCreport has not been ported to aroma.affymetrix. I usually make do with RLE, NUSE and density plots for my QC. If there is something specific in affyQCreport that you like, it may be easy to port over. Maybe you'd consider doing the implementation.

> Q2. I tried to export my data to an AffyBatch object in order to play around with older methods
>
>   ab <- extractAffyBatch(cs)
>
> but I got a
>
>   Warning message: CDF environment package 'mogene10stv1cdf' not installed. The 'affy' package will later try to download from Bioconductor and install it.
>
> Of course 'mogene10stv1cdf' does not exist as far as I know; instead we should use mogene10st.db. But what should the exact code be to connect the normalized data to the annotation contained inside mogene10st.db?

A couple of points here. First, it looks like Bioconductor is not currently supporting the 'affy' way of doing things for these new (1.0 ST) chips. If you skim the BioC mailing list archives, the suggestion is to use the 'oligo' package or 'xps'. But then you are outside the world of AffyBatch objects. So, it doesn't make sense to use aroma.affymetrix's 'extractAffyBatch' for these chips.

Second, I believe 'mogene10st.db' only really maps the Gene 1.0 ST identifiers to GO attributes, UNIGENE ids, chromosome locations and a whole host of other things. I don't think the physical probe locations are present within 'mogene10st.db', so it is not a replacement for the CDF file/environment.

Hope that helps.
Mark

> I would really appreciate some help here :)
>
> Thanks all.
>
> On 5 Dec, 17:43, pwhite...@gmail.com wrote:
>> Hi Mark,
>>
>> Thanks for adding flavor="oligo" to RmaPlm. I verified it with the new release and the HGU133Plus2 data I have and it all looks good. Pairs plots are attached.
>>
>> Thanks,
>> Peter
>>
>> On Thu, Dec 4, 2008 at 5:41 PM, Mark Robinson mrobin...@wehi.edu.au wrote:
>>> Thanks Peter.
>>>
>>> Perhaps you can repeat this comparison after the next release (this will be very soon!) and split the aroma.affymetrix comparison into:
>>>
>>> - aroma.affy.oligo -- with RmaPlm(csN, flavor="oligo")
>>> - aroma.affy.affyPLM -- with flavor="affyPLM" (as you've done already)
>>>
>>> Perhaps the best way to look at all of this at once is with a single pairs() plot.
>>>
>>> Cheers,
>>> Mark
>>>
>>> On 05/12/2008, at 9:01 AM, pwhite...@gmail.com wrote:
>>>> Dear Mark and Henrik,
>>>>
>>>> I wanted to confirm that your summary was correct regarding the different flavors for probeset summarization. I downloaded the MAQC HG_U133_Plus_2 array data from the MAQC website:
>>>>
>>>> http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip
>>>>
>>>> I then ran the analysis of the arrays from site 1, using just the A and B samples, with aroma.affymetrix, affy, affyPLM and oligo (see below for the complete code I used to do this).
>>>>
>>>> Basically the aroma.affymetrix and affyPLM data were essentially identical. The affy and oligo data were also essentially identical. As observed with the Gene ST array data, there were significant differences between aroma.affymetrix and affy or oligo. Plots are attached. The Gene ST arrays do not have any MM probes; as we are using RMA rather than GCRMA, this should not have affected anything.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>>   #OLIGO ANALYSIS
>>>>   library(pd.hg.u133.plus.2)
>>>>   library(pdInfoBuilder)
>>>>   fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2", "CEL", full=T)[1:10]
>>>>   raw.oligo <- read.celfiles(filenames=fn, pkgname="pd.hg.u133.plus.2")
>>>>   eset.oligo <- rma(raw.oligo)
>>>>   data.oligo <- exprs(eset.oligo)
>>>>
>>>>   #AFFY ANALYSIS
>>>>   library(affy)
>>>>   fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2", "CEL", full=T)[1:10]
>>>>   raw.affy <- ReadAffy(filenames=fn)
>>>>   eset.affy <- rma(raw.affy)
>>>>   data.affy <- exprs(eset.affy)
>>>>
>>>>   #AFFY PLM ANALYSIS
>>>>   library(affyPLM)
>>>>   fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2", "CEL", full=T)[1:10]
>>>>   raw.affyPLM <- ReadAffy(filenames=fn)
>>>>   fit.affyPLM <- fitPLM(raw.affyPLM, verbos=9)
>>>>   data.affyPLM <- coefs(fit.affyPLM)
>>>>
>>>>   #Analysis of MAQC on Human U133 Plus 2
>>>>   setwd("G:\\BGC_EXPERIMENTS\\MAQC_Analysis")
>>>>   library(aroma.affymetrix)
>>>>   prefixName <- "MAQC_Data"
>>>>   chip1 <- "HG-U133_Plus_2"
>>>>   cdf <- AffymetrixCdfFile$fromChipType("HG-U133_Plus_2")
>>>>   cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf, chipType=chip1)
>>>>   pattern <- "AFX_1_[AB]"
>>>>   idxs <- grep(pattern, getNames(cs))
>>>>   cs <- extract(cs, idxs)
>>>>   bc <- RmaBackgroundCorrection(cs)
>>>>   csBC <- process(bc)
>>>>   qn
[aroma.affymetrix] Re: SNP 6.0 processing
Hi Anguraj.

You'll need to give more information for us to be able to help you. First of all, give us the output of 'sessionInfo()' (are you using the latest version?). Then try to repeat your sequence of commands from a *new* R session: call 'library(aroma.affymetrix)' and then all of the commands leading up to this error. Start with that.

Cheers,
Mark

On 17/12/2008, at 4:50 AM, angu wrote:

> Hi,
>
> I am trying to process SNP6.0 CEL files to get CN data. I am getting the following error. Could anybody help me to figure it out?
>
>   plm <- AvgCnPlm(csC, mergeStrands=TRUE, combineAlleles=TRUE, shift=+300)
>   fit(plm, verbose=verbose)
>   Error in UseMethod("getCdf") : no applicable method for "getCdf"
>   Calls: fit -> fit.ProbeLevelModel -> getCdf
>   Execution halted
>
> I haven't pasted all of my code. I performed this as per the instructions provided in the pages.
>
> Thanks,
> Anguraj

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
[aroma.affymetrix] Re: Batch Effects
Hi Sarah.

There is currently nothing readily available for batch effect adjustment in aroma.affymetrix (that I know of). But there are ways of doing it that wouldn't be too difficult as a one-off. What did you have in mind?

... however, even with a batch adjustment, it may still be difficult to get reliable results on such an experiment.

Cheers,
Mark

On 15/12/2008, at 10:57 PM, srgrey...@gmail.com wrote:

> Hi All,
>
> I have six Rat Exon arrays that were done about two years apart. The arrays done at each timepoint normalize very well to each other, but I cannot get decent normalization between batches. Is there a way to adjust for batch effects in aroma.affy that can be incorporated into an exon array analysis?
>
> Thank you,
> Sarah

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
[aroma.affymetrix] Re: FIRMA score
> I assume that you referred me to the probeset annotation csv file, not the transcript annotation csv file. :) I did get the exon coordinates! It seems to me that the probeset_id in the probeset annotation file is equivalent to the groupName of exFirma.

Yes! The transcript CSV file may be useful for other things, but indeed not for exon coordinates. In aroma.affymetrix terminology for exon arrays, unit=transcript_cluster_id, group=probeset_id and cell=probe.

> One question I have: in the exon array plot from GenomeGraphs, the unrData is the probe-level data. In other words, in my case, I should use the normalized data, csN. Is that correct?

If you look closely at the Purdom paper, there are a few things you may want to plot in the context of Ensembl annotation. The normalized data is the obvious one, which you have linked to an object 'csN'. In addition, you may want to plot the residuals:

  rs <- calculateResiduals(plmTr)  # plmTr is the fitted ExonRmaPlm with mergeGroups=TRUE
  d <- extractMatrix(rs, cells=...)

... or you may want to plot the raw data, adjusted by the probe effects. For example, RMA fits the model

  Y_{ij} = a_i + b_j

(Y_{ij} = normalized data, a_i = chip effects, b_j = probe effects), so you may be interested in plotting:

  Y_{ij} - \hat{b}_j

... i.e. raw data, adjusted by the estimated probe effects.

> Another question is graph related though. Since I have two groups among my 9 samples, when I plot the exon array, is there any way to use different colour coding for these samples on the same graph? Thanks!

Short answer:

  library(GenomeGraphs)
  ?"ExonArray-class"

Long answer: create your ExonArray object and use the DisplayPars slot to specify colour and line width:

  ea1 <- new("ExonArray", intensity = d, probeStart = ..., probeEnd = ..., probeId = ..., nProbes = ...,
             dp = DisplayPars(color = col, lwd = lwd, mapColor = "dodgerblue2", plotMap = TRUE))

where 'col' and 'lwd' are either of length 1 (in which case all lines get the same width and colour) or vectors whose length is the number of samples ...

Thanks for all these questions, Sabrina. This will help make a nice vignette ... when I get time :) Unless you'd be interested in summarizing your approach! :)

Cheers,
Mark

> Sabrina
>
> On Dec 9, 5:02 pm, Mark Robinson [EMAIL PROTECTED] wrote:
>> Hi Sabrina.
>>
>> Great work! See below.
>>
>> On 10/12/2008, at 5:27 AM, sabrina wrote:
>>> Hi, Mark!
>>>
>>> Thank you so much! I think I pretty much figured out how to get the gene-level and exon-level expressions and the comparison done. I checked GenomeGraphs as you suggested. I tried the code in the exon array section, and it worked fine. So here is the question related to my data set. In order to plot exon data, I used the following code to get the exon summaries (intensities), which is pbsetSummLog2:
>>>
>>>   plmNoMerge <- ExonRmaPlm(csN, mergeGroups=FALSE)
>>>   fit(plmNoMerge)
>>>   readUnits(plmNoMerge, units=1)
>>>   chpNoMerge <- getChipEffectSet(plmNoMerge)
>>>   pbsetSumm <- extractMatrix(chpNoMerge, returnUgcMap=TRUE)
>>>   pbsetSummLog2 <- log2(pbsetSumm)
>>>   pbsetNames <- readCdfGroupNames(getPathname(getCdf(chpNoMerge)), unit=unique(attr(pbsetSumm, "unitGroupCellMap")[,"unit"]))
>>>   rownames(pbsetSumm) <- unlist(pbsetNames)
>>>   rownames(pbsetSummLog2) <- unlist(pbsetNames)
>>>
>>> Because each significant exon that was detected by FIRMA is associated with one gene, and that gene has several exons, I used the following to find all exons associated with that gene (x is the result from topTable):
>>>
>>>   temp <- grep(exFirma[x$ID,1][1], exFirma[,1])
>>>   gp_temp <- exFirma[temp,2]
>>>   gene1 <- pbsetSummLog2[temp,]
>>>
>>> which gives me:
>>>
>>>            array1    array 2   ...
>>>   4308385  6.338784  6.896304
>>>   4376965  1.973171  2.272406
>>>
>>> I know that 4308385 is the groupName (aka exon id), but I am not sure where to find the start and end positions of these individual exons. Can you show me how to do it? The reason I ask is that in makeExonArray, I need the probeStart and probeEnd positions. Thanks!!!
>>
>> Well, you can get the probeset start and end positions from the 'NetAffx Annotation Files' from Affymetrix. For human, this would be at:
>>
>> http://www.affymetrix.com/products_services/arrays/specific/exon.affx...
>>
>> I think you said you were using mouse, so you can find this at:
>>
>> http://www.affymetrix.com/products_services/arrays/specific/mouse_exo...
>>
>> Find the 'Current NetAffx Annotation Files' section and download the csv.zip file. Just to have a quick peek at a few columns of this file, if I run a unix tool called awk on the CSV file, I get:
>>
>>   awk '{FS=","; print $7,$1,$2,$3,$4,$5,$16}' HuEx-1_0-st-v2.na25.hg18.probeset.csv | grep -v ^ | more
>>
>>   transcript_cluster_id probeset_id seqname strand start stop level
>>   ...
>>   2315373 2315374 chr1 + 742655 742719 core
>>   2315373 2315375 chr1 + 742869 743231 core
>>   2315373 2315376 chr1 + 743293 743434 core
>>   2315373 2315377 chr1 + 744094 744979 core
>>   2315380
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
the mailing list?

> Easy! Now to get the same data using the Affy packages:
>
> BIOCONDUCTOR AFFY
>
> You first need to create or download your mogene10stv1cdf library from the Affy unsupported CDF file (https://stat.ethz.ch/pipermail/bioc-devel/2007-October/001403.html has some detail on how to do this). However, as Mark Robinson pointed out, there are potential issues with using the Affy unsupported CDF files. See the following for some details:
>
> https://stat.ethz.ch/pipermail/bioconductor/2007-November/020188.html
>
>   library(affy)
>   AffyRaw <- ReadAffy()
>   AffyEset <- rma(AffyRaw)
>   data.affy <- exprs(AffyEset)
>
> BIOCONDUCTOR OLIGO
>
> Download all the required Affy annotation files to your Mouse Gene v1 ST array directory:
>
> http://www.affymetrix.com/support/technical/byproduct.affx?product=mogene-1_0-st-v1
>
>   setwd("P:\\ANNOTATION\\AffyAnnotation\\Mouse\\MoGene-1_0-st-v1")
>   library(pdInfoBuilder)
>   pgfFile <- "MoGene-1_0-st-v1.r3.pgf"
>   clfFile <- "MoGene-1_0-st-v1.r3.clf"
>   transFile <- "MoGene-1_0-st-v1.na26.mm9.transcript.csv"
>   probeFile <- "MoGene-1_0-st-v1.probe.tab"
>   pkg <- new("AffyGenePDInfoPkgSeed", author="Peter White", email="[EMAIL PROTECTED]",
>              version="0.1.3", genomebuild="UCSC mm9, July 2007", chipName="MoGene10stv1",
>              manufacturer="affymetrix", biocViews="AnnotationData",
>              pgfFile=pgfFile, clfFile=clfFile, transFile=transFile, probeFile=probeFile)
>   makePdInfoPackage(pkg, destDir=".")
>
> This takes a little while to make the package. Once created, you will need to install the package from the Windows DOS prompt (navigate to the annotation directory with the newly created pd package to be installed):
>
>   R CMD INSTALL pd.mogene.1.0.st.v1\
>
> (Note: for this to work you need RTools and your Path variable set up correctly, as described at: http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset)
>
> Now return to R and set the working directory to your CEL file directory:
>
>   library(pd.mogene.1.0.st.v1)
>   library(oligo)
>   OligoRaw <- read.celfiles(filenames=list.celfiles())
>   OligoEset <- rma(OligoRaw)
>   data.oligo <- exprs(OligoEset)
>
> COMPARING THE TWO DATASETS
>
> Here is what I did to compare the data generated by affy, oligo and aroma.affymetrix:
>
>   dim(data.aroma)
>   [1] 35512    16
>   dim(data.affy)
>   [1] 35512    16
>   length(grep(TRUE, rownames(data.affy)==rownames(data.aroma)))
>   [1] 35512

FYI, sum(rownames(data.affy)==rownames(data.aroma)) gives you the same. Replacing sum() with summary() will also work.

> The output from both the affy rma and aroma.affymetrix methods retains the same order of probes and CEL files, so the two files can be compared directly.

That is probably because they work off the same CDF, but you should never rely on this or assume that this is always the case. If you do, you should at least verify that the unit names (and group names) match.

> However,
>
>   dim(data.oligo)
>   [1] 35557    16
>
> The normalized data file from the oligo package includes an additional 45 transcript IDs (there's no annotation on what these are, but they contain anywhere from 9 to 489 probes per probeset).

For the record, would you mind posting the names of these additional 45 units here? (I'm sure someone else will search the web later and find this thread very helpful.)

> Fixed this problem as follows:
>
>   o <- match(rownames(data.aroma), rownames(data.oligo))
>   data.oligo <- data.oligo[o,]
>   length(grep(TRUE, rownames(data.affy)==rownames(data.oligo)))
>   [1] 35512
>   length(grep(TRUE, rownames(data.aroma)==rownames(data.oligo)))
>   [1] 35512
>
> Finally, there was one more issue with the aroma data. All elements in the 18th row of the dataset were flagged NA. The transcript ID for this probeset was 10338063. Looking at the Affy annotation, this appears to be a control probeset with 6,515 probes. Could it have been flagged NA by aroma.affymetrix because of this (it was OK with the oligo and affy rma analyses)?

Nicely spotted. Voila'. From aroma.affymetrix's NEWS file:

  Version: 0.9.0 [2008-02-29]
  o TIME OPTIMIZATION: Now RmaPlm and ExonRmaPlm turn to median polish if there are more than 500 cells *and* 6 arrays in the unit group. Option: aroma.affymetrix.settings$models$RmaPlm$medianPolishThreshold. Moreover, if the unit group is ridiculously large (5000 cells), the unit group is skipped and all returned estimates are NAs. Option: aroma.affymetrix.settings$models$RmaPlm$skipThreshold.

>   e <- (data.aroma - data.affy)
>   mean(as.vector(e^2), na.rm=T)
>   [1] 0.1253547
>   sd(as.vector(e^2), na.rm=T)
>   [1] 0.2717275
>
>   e <- (data.aroma - data.oligo)
>   mean(as.vector(e^2), na.rm=T)
>   [1] 0.1239203
>   sd(as.vector(e^2), na.rm=T)
>   [1] 0.2653593
>
> As you can see, the data does not pass your mean and sd cutoffs of 0.0001.
>
>   e <- (data.affy - data.oligo)
>   mean(as.vector(e^2), na.rm=T)
>   [1] 0.001484371
>   sd(as.vector(e^2), na.rm=T)
>   [1] 0.002523521
>
> The difference between the affy and oligo analysis is much less striking. To visualize these differences I did the following plot, as an example I am just showing the data from