Re: [aroma.affymetrix] mufColumns on genes with more than 500 probes across the gene

2015-06-05 Thread Mark Robinson
Hi Iain,

Can you send me a reproducible example, plus the output of sessionInfo() and 
traceback() after the error, please?  It'll probably be too large for email, so 
maybe use Dropbox or dropsend ..

My guess is that the units with 500 probes actually have NAs.  Maybe you can 
check this also beforehand?
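
One way to run that check, as a minimal base R sketch. It assumes rsu_diff is a named list of numeric matrices (one per gene, probes in rows), which is only a guess from the code quoted below; the gene names and values here are invented. As far as I know, R's .C() interface rejects arguments containing NA/NaN/Inf unless NAOK=TRUE, so non-finite inputs would raise the "foreign function call" error before the C code runs.

```r
# Toy stand-in for rsu_diff (assumed structure: one numeric matrix per gene,
# probes in rows, groups in columns). All values are invented.
rsu_diff <- list(
  geneA = matrix(c(0.1, -0.2, 0.3, 0.4), ncol = 2),
  geneB = matrix(c(0.5, NA, -0.1, 0.2), ncol = 2),
  geneC = matrix(c(1.2, 0.3, Inf, -0.7), ncol = 2)
)

# Flag genes that contain any NA, NaN or +/-Inf
has_bad <- vapply(rsu_diff, function(u) any(!is.finite(u)), logical(1))
bad_units <- names(rsu_diff)[has_bad]

# Number of probes per gene, to see whether the flagged genes are the large ones
n_probes <- vapply(rsu_diff, nrow, integer(1))
print(data.frame(n_probes, has_bad))
```

If bad_units is non-empty only once the upper limit on w is raised, that would point to the input values rather than to the C code.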

Best, Mark


On 04.06.2015, at 16:07, Iain iaingallag...@gmail.com wrote:

 Hi
 
 We're trying to calculate the difference in residuals between two groups and 
 generate mufScores (FIRMAGene) on those residual differences. Calculating the 
 differences goes smoothly:
 
 rsu_diff <- lapply(unlist(rsu, use.names=FALSE), byrow=FALSE, 
 ncol=length(unique(cls)))
 
 where cls is a grouping variable (e.g. cls <- c('A', 'A', 'B', 'B'))
 
 Next we apply the mufColumns function to the elements of the rsu_diff object.
 
 mufScores <- lapply(rsu_diff[w], FUN = function(u) c(mufColumns(u)))
 
 w is an indicator variable that only keeps genes (this is gene level 
 analysis) if the gene has more than 3 probes and less than some other number. 
 If we set the upper limit of w to 500 we get the following error.
 
 Error in mufC(x): Nan/NA/Inf in foreign function call
 
 Having looked at the C code for the relevant function (mps.c) I can only see 
 one line where that could cause this:
 
 x[count]=sum/sqrt(j-i+1.);
 
 Can anyone shed any light on why we can't run mufColumns if we select genes 
 with more than 500 probes over the gene?
 
 Thanks,
 
 Iain
 
 
 
 
 -- 
 -- 
 When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
 version of the package, 2) to report the output of sessionInfo() and 
 traceback(), and 3) to post a complete code example.
  
  
 You received this message because you are subscribed to the Google Groups 
 aroma.affymetrix group with website http://www.aroma-project.org/.
 To post to this group, send email to aroma-affymetrix@googlegroups.com
 To unsubscribe and other options, go to http://www.aroma-project.org/forum/
 
 --- 
 You received this message because you are subscribed to the Google Groups 
 aroma.affymetrix group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to aroma-affymetrix+unsubscr...@googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.

--
Prof. Dr. Mark Robinson
Statistical Bioinformatics Group, UZH
http://ow.ly/riRea








Re: [aroma.affymetrix] which version cdf should used when apply FIRMAGene

2013-01-06 Thread Mark Robinson
Hi Zaiwei,


On Sun, Dec 16, 2012 at 2:22 AM, zhouzaiwei zhouzai...@163.com wrote:

 Hi, I want to apply FIRMAGene to analyse differential splicing events on the
 hugene_1.0_st array. I have read the article (Robinson & Speed, 2007) and the
 script (http://bioinf.wehi.edu.au/folders/firmagene/sup3_04feb2010.R), and I
 want to know which version of the CDF file should be used:
 HuGene-1_0-st-v1,r3.cdf, HuGene-1_0-st-v1,Ensembl,exon.cdf, or something else?



You can use either CDF.  The latter was used for the BMC Bfx paper, since
we used Ensembl annotation.  If you want to use the former, you would want
to use the Affymetrix identifiers.

Other CDFs are possible … as we say in the paper: "To facilitate
alternative splicing analysis, probe collections are organized in a
gene-centric fashion, so that probes from all known isoforms for a gene can
be analyzed by a single framework (i.e. fit with the RMA model)."  So, it
just requires the CDF to be organized correctly.

Hope that helps.

Best regards, Mark








[aroma.affymetrix] Re: How to extract raw probe intensity from .CEL file

2011-08-09 Thread Mark Robinson

Perhaps something like this is what you want (note: different chip to
what you are using)?

df <- readDataFrame(getCdf(cs), verbose=-80)
[...snip...]
head(df)
  unit unitName   unitType unitDirection unitNbrOfAtoms group groupName
1    1  7892501 expression         sense              4     1   7892501
2    1  7892501 expression         sense              4     1   7892501
3    1  7892501 expression         sense              4     1   7892501
4    1  7892501 expression         sense              4     1   7892501
5    2  7892502 expression         sense              4     1   7892502
6    2  7892502 expression         sense              4     1   7892502
  groupDirection groupNbrOfAtoms    cell   x   y pbase tbase indexPos atom
1          sense               4  116371 870 110     C     G        0    0
2          sense               4  943979  28 899     A     T        1    1
3          sense               4  493089 638 469     T     A        2    2
4          sense               4  907039 888 863     A     T        3    3
5          sense               4 1033309 108 984     T     A        0    0
6          sense               4  653512 411 622     T     A        1    1

I'm not sure what object you have in mind when it comes to a probe-
intensity pair, but this should give you all the info you might want
(e.g. cell index, x/y physical location).

HTH,
Mark


On Aug 9, 5:45 pm, Pierre Neuvial pie...@stat.berkeley.edu wrote:
 Hi,

 Have you tried using extractAffyBatch, which is documented here:
 http://aroma-project.org/howtos/extractAffyBatch ?
 As far as I understand you will need the Bioconductor annotation
 package corresponding to your chip type to be installed, i.e.

  source("http://www.bioconductor.org/biocLite.R")
  biocLite("hgu133plus2cdf")

 This is discussed in this thread:
 http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/...

 Pierre

 On Tue, Aug 9, 2011 at 4:34 AM, hsingjun cheung hsingjun.ch...@gmail.com wrote:
  Hi Pierre:

  Thanks. These functions work now. Do you know how to extract the raw
  intensity for each probe ?

  On Aug 8, 5:48 pm, Pierre Neuvial pie...@stat.berkeley.edu wrote:
  Hi,

  The 'annotationData' directory should be directly in your working
  directory, as explained in the page "Setup: Location of annotation
  data files": http://aroma-project.org/node/66

  In your case, you need to change the current directory to ~/experiment/ by

  setwd("~/experiment/")

  (or by starting your R session from this directory).  Then your command

  csR <- AffymetrixCelSet$byName("KN01M013", chipType="HG-U133_Plus_2")

  should work.

  Best,

  Pierre

  On Mon, Aug 8, 2011 at 5:29 PM, hsingjun cheung

  hsingjun.ch...@gmail.com wrote:
   Hello:

   I searched the group but got no results ... So I want to know, how to
   extract the raw probe intensity from .CEL file?

   The file structure on my computer is like:

   ~/experiment/
               annotationData/
                           chipTypes/
                                  HG-U133_Plus_2/
                                             HG-U133_Plus_2.cdf
   ~/experiment/
                 rawData/
                         KN01M013/
                                 HG-U133_Plus_2/
                                                      KN01M013.CEL

   The .cdf file is downloaded from
   http://www.aroma-project.org/chipTypes/HG-U133_Plus_2

   When I run R under ~ directory:
   library(aroma.affymetrix)
   csR <- AffymetrixCelSet$byName("KN01M013", chipType="HG-U133_Plus_2")

   I got error msg:

   Error in list(`AffymetrixCelSet$byName("KN01M013", chipType = "HG-U133_Plus_2")` = <environment>,  :

   [2011-08-08 11:24:05] Exception: Could not locate a file for this chip
   type: HG-U133_Plus_2
    at throw(Exception(...))
    at throw.default("Could not locate a file for this chip type: ",
   paste(c(chipT
    at throw("Could not locate a file for this chip type: ",
   paste(c(chipType, tag
    at method(static, ...)
    at AffymetrixCdfFile$byChipType(chipType)
    at method(static, ...)
    at AffymetrixCelSet$byName("KN01M013", chipType = "HG-U133_Plus_2")

   Could anyone help me figure how this error happened ? And how to do
   it  ( extract raw probe intensity ) in a right way ? Thanks


Re: [aroma.affymetrix] Re: How to extract raw probe intensity from .CEL file

2011-08-09 Thread Mark Robinson

How about grabbing the intensities according to their index:

raw=extractMatrix(cs,cells=df$cell,verbose=verbose)

Then you'll have them matched up to the 'df' data.frame.

(Different numbers for your chip, of course)
 dim(df)
[1] 844550 16
 dim(raw)
[1] 844550 33
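
To make the matching explicit, here is a small base R sketch. The objects are toy stand-ins, not real aroma.affymetrix output: 'df' mimics a few columns of the readDataFrame() result and 'raw' mimics what extractMatrix() returns (one row per cell, one column per array). The intensities and array names are invented.

```r
# Toy stand-in for the CDF data frame (a few of its columns only)
df <- data.frame(
  unitName = c("7892501", "7892501", "7892502"),
  cell     = c(116371L, 943979L, 1033309L),
  x        = c(870L, 28L, 108L),
  y        = c(110L, 899L, 984L),
  stringsAsFactors = FALSE
)

# Toy stand-in for extractMatrix(cs, cells=df$cell): rows follow df$cell order
raw <- matrix(c(120, 85, 300,
                110, 90, 280),
              nrow = 3,
              dimnames = list(NULL, c("arrayA", "arrayB")))

# Rows are in the same order, so the two can simply be bound side by side
stopifnot(nrow(df) == nrow(raw))
probe_intensities <- cbind(df, raw)
print(probe_intensities)
```

Each row of probe_intensities then carries the probe annotation (unit name, cell index, x/y) next to its intensity on every array.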


Mark

On Aug 10, 2011, at 12:31 AM, hsingjun cheung wrote:

 Hi Mark:
 
 My idea is how we could know the intensity for each probe ? Using
 these command:
 library(aroma.affymetrix)
 cs <- AffymetrixCelSet$byName("KN01M013", chipType="HG-U133_Plus_2")
 raw=extractMatrix(cs,verbose=verbose)
 
 I can see 'raw' is a list of intensities, but I don't know which probe
 ids they correspond to.  Hope this clarifies. Thanks
 

[aroma.affymetrix] Re: Question about Firma

2011-06-26 Thread Mark Robinson
Hi Florence,

I've copied the aroma.affymetrix mailing list, just in case others have extra 
comments.

On Jun 25, 2011, at 1:05 AM, Florence Jaffrezic wrote:

 
 Dear Professor Robinson,
 
 I am a French researcher working near Paris, and was trying to run an 
 analysis with Firma to detect alternative splicing.
 I wanted to re-analyze the colon cancer data available on the affymetrix 
 website for the Human exon chip, and used the
 R code provided in the Firma vignette. I ran the FirmaModel on the plmEx 
 object as shown below:
 
 plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
 fit(plmEx, verbose=verbose)
 firma <- FirmaModel(plmEx)
 fit(firma, verbose=verbose)
 fs <- getFirmaScores(firma)
 scores <- extractDataFrame(fs)
 
 I then obtain one score for each exon and each chip.
 I have a few questions:
 1) First, there are 10 biological replicates in each condition. How should I 
 combine the Firma scores obtained for each replicate ?

You could take the average, or perhaps a more robust median.
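
A minimal base R sketch of the median option, under stated assumptions: 'scores' is a toy matrix of log2 FIRMA scores (exons in rows, arrays in columns) with invented values, and 'group' is a hypothetical condition label per array. Only three replicates per condition are used here to keep it short.

```r
# Toy log2 FIRMA scores: 3 exons x 6 arrays (values invented)
scores <- matrix(
  c( 0.1, 0.0, -0.1, -2.0, -1.8,  0.2,
     0.2, 0.3,  0.1,  0.2,  0.1,  0.4,
    -0.3, 5.0, -0.2, -0.1, -0.2, -0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("exon1", "exon2", "exon3"), paste0("array", 1:6))
)
group <- c("normal", "normal", "normal", "tumor", "tumor", "tumor")

# Per-exon median within each condition (robust to a single outlying array)
group_median <- sapply(split(seq_along(group), group), function(idx) {
  apply(scores[, idx, drop = FALSE], 1, median)
})

# Difference between conditions; strongly negative values are candidates
# for exon skipping in the second condition
delta <- group_median[, "tumor"] - group_median[, "normal"]
print(delta)
```

Note how the single outlier (5.0) in exon3 does not move the median, whereas it would pull the mean up considerably.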


 2) For the detection of alternative splicing, I saw in the literature that we 
 should take the log2 value of the scores
 fsScores <- log2(extractDataFrame(fs)), and that large negative values will 
 indicate exon skipping.
 Is this correct ? Is there a cut-off value that can be used for these scores 
 to detect alternative splicing ?

One thing: I see it more as detection of 'differential' splicing, i.e. 
different experimental conditions express transcripts differently.  But, yes, 
this is correct, you are looking for 'extreme values' and negative ones may 
indicate exon skipping.  As far as I know, we never set cutoffs because 
assigning estimated FDRs to them is non-trivial.
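
In the absence of a cutoff, one simple option is to rank exons and inspect the top of the list. The sketch below is only an illustration in base R; 'delta' is a hypothetical named vector of per-exon score differences (values invented), not output from aroma.affymetrix.

```r
# Hypothetical per-exon differences in log2 FIRMA score between conditions
delta <- c(exon1 = -1.8, exon2 = 0.05, exon3 = 0.9, exon4 = -0.4, exon5 = 2.3)

# Most negative first: candidates for exon skipping
most_negative <- head(sort(delta), 2)

# Largest absolute values: candidates for differential splicing in either direction
most_extreme <- head(delta[order(abs(delta), decreasing = TRUE)], 2)

print(most_negative)
print(most_extreme)
```

How many top-ranked exons to follow up on remains a judgment call, since the ranking alone carries no error-rate guarantee.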

Regards,
Mark


 Thank you very much in advance for your help,
 
 Florence
 
 ---
 Dr Florence Jaffrezic
 INRA
 Bat 211
 78352 Jouy-en-Josas Cedex
 France
 Tel: (+33) 1 34 65 21 94
 Fax: (+33) 1 34 65 22 10
 ---
 Firma_Exon_question.r

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobin...@wehi.edu.au
e: m.robin...@garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--


__
The information in this email is confidential and intended solely for the 
addressee.
You must not disclose, forward, print or use it without the permission of the 
sender.
__



Re: [aroma.affymetrix] Re: gcRMA for Gene ST Arrays

2011-06-22 Thread Mark Robinson
Hi Setsuko,

I haven't looked at this in a long time, and I can't seem to find the CDF file 
that I used locally on my computer.  The information regarding the antigenomic 
probesets should be readily available in the files you get from Affymetrix 
though, if you want to custom build a CDF file.

To be honest, when I did this 2 years ago, there was no official GCRMA 
implementation for Gene 1.0 ST arrays.  Maybe Bioconductor has this now for 
these arrays?

I've cc'd the aroma.affymetrix mailing list in the hope that someone has a 
better solution for you.

Cheers,
Mark


On Jun 21, 2011, at 8:28 AM, Setsuko Sahara wrote:

 Dear Mark
 
 I found your previous e-mail regarding an application of gcrma to data for 
 Gene ST arrays. Would you be able to make the antigenomic probesets available 
 somewhere? Or do you recommend trying your previous scripts?
 
 Sincerely,
 
 setsu
 
 
 
 Mark Robinson
 Sun, 29 Mar 2009 23:12:09 -0700
 
 Hi Mario.
 
 I have made some modifications to the reading of probe_tab files and  
 to the computing of affinities so that this procedure can run now,  
 either as you have done below by choosing lowly expressed probes, or  
 (perhaps preferably) by using the 'antigenomic' probes on the array:
 
 library(aroma.affymetrix)
 verbose <- Arguments$getVerbose(-30); timestampOn(verbose)
 
 cdf <- AffymetrixCdfFile$fromChipType("HuGene-1_0-st-v1", verbose=verbose, tags="PD")
 cs <- AffymetrixCelSet$fromName("tissues", cdf=cdf, verbose=verbose)
 
 # Option 1: negative controls chosen from lowly expressed probes
 # (negativeControlIndices as computed in your own code)
 bcGc <- GcRmaBackgroundCorrection(cs, type="affinities",
                                   indicesNegativeControl=negativeControlIndices)
 csGBC <- process(bcGc, verbose=verbose)
 
 # Option 2 (perhaps preferable): use the 'antigenomic' probes on the array
 controlIndices <- which(!isPm(cdf))
 bc <- GcRmaBackgroundCorrection(cs, type="affinities",
                                 indicesNegativeControl=controlIndices)
 csBC <- process(bc, verbose=verbose, force=TRUE)
 
 I needed to make a CDF file that contained these antigenomic probesets  
 as they are not present in the binary-converted CDF files I created  
 previously.  I will make available these CDFs once I can test  
 everything.
 
 Unfortunately, I do not have a good way of testing that my  
 modifications are doing exactly the right thing, as I am also not  
 intimately familiar with the gcrma model/code.  To be honest, I don't  
 know of anyone that has successfully run gcrma on these chips or Exon  
 ST chips, Bioconductor or otherwise.  Do you?  If so, please let me  
 know.
 
 These changes can be made available in the next release or possibly  
 earlier with a patch, but I just want to test the changes first.
 
 Cheers,
 Mark
 
 Setsuko Sahara
 
 
 
 



Re: [aroma.affymetrix] Extract raw data for core transcripts from EXON arrays

2011-02-10 Thread Mark Robinson
Hi Anbarasu.

To extract just the raw probe intensities (before BG adjustment or 
normalization), how about something like:


cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)

u <- 1:nbrOfUnits(cdf)
u <- 1:10  # could use line above, but use subset to make it quick

ugcM <- getUnitGroupCellMap(cdf, units=u, retNames=TRUE)
d <- extractMatrix(cs, cells=ugcM$cell)
rownames(d) <- paste(ugcM$unit, ugcM$group, sep=".")

There are of course other possibilities for the rownames if you chose, but here 
is what it would give:

 rownames(d) <- paste(ugcM$unit, ugcM$group, sep=".")
 head(d)
                huex_wta_spleen_A huex_wta_spleen_B huex_wta_spleen_C
2315251.2315252                49                46                30
2315251.2315252                41                48                37
2315251.2315252                68                46                40
2315251.2315252                46                39                32
2315251.2315253                34                26                31
2315251.2315253                30                30                28
                huex_wta_testes_A huex_wta_testes_B huex_wta_testes_C
2315251.2315252               131               111               156
2315251.2315252                96                81               153
2315251.2315252               138               102               146
2315251.2315252                72                49                56
2315251.2315253                45                33                48
2315251.2315253                30                39                31
                huex_wta_thyroid_A huex_wta_thyroid_B huex_wta_thyroid_C
2315251.2315252                 31                 57                 70
2315251.2315252                 49                 40                 63
2315251.2315252                 33                 54                 61
2315251.2315252                 30                 53                 46
2315251.2315253                 29                 53                 35
2315251.2315253                 26                 38                 39
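
As the output shows, unit.group row names repeat because several probes share a probeset. If unique row names are wanted, one possibility is to append the cell index. A base R sketch follows; 'ugcM' here is a toy stand-in with the same column names as the unit-group-cell map, and the cell indices are invented.

```r
# Toy stand-in for the unit-group-cell map (cell indices are invented)
ugcM <- data.frame(
  unit  = c("2315251", "2315251", "2315251"),
  group = c("2315252", "2315252", "2315253"),
  cell  = c(101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# unit.group alone is not unique when a probeset has several probes
rn_short <- paste(ugcM$unit, ugcM$group, sep = ".")

# Appending the cell index gives one distinct label per probe
rn <- paste(ugcM$unit, ugcM$group, ugcM$cell, sep = ".")
print(rn)
```

These labels could then be assigned with rownames(d) <- rn in place of the shorter version.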


Hope that helps.
Mark

On 2011-02-09, at 3:38 AM, Anbarasu L A wrote:

 Hi All,
 
 I have been looking at extracting raw data for core transcripts from 
 HuEx-1_0-st-v2 chip type. I have downloaded the custom CDF file provided in 
 http://www.aroma-project.org/node/122. 
 
 chipType <- "HuEx-1_0-st-v2"
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="core")
 print(cdf)
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/HuEx-1_0-st-v2
 Filename: HuEx-1_0-st-v2,core.cdf
 Filesize: 32.00MB
 Chip type: HuEx-1_0-st-v2,core
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 2560x2560
 Number of cells: 6553600
 Number of units: 22010
 Cells per unit: 297.76
 Number of QC units: 0
 
 How can I access these 22010 units (transcripts) and extract unnormalized 
 intensity values? If I use getCellIndices(cdf, unlist=TRUE, useNames=FALSE), 
 I am getting intensity data for 893395 probes. 
 
 Thanks in advance.
 
 Best regards,
 Anbarasu 
 
 
 
 
 


Re: [aroma.affymetrix] exon array analysis

2011-01-23 Thread Mark Robinson
Hi Kripa.

Have a look at the Affymetrix website:

http://www.affymetrix.com/estore/browse/products.jsp?productId=131452&categoryId=35676&productName=GeneChip-Human-Exon-1.0-ST-Array#1_3

Click on "Technical Documentation"; the file you are probably after is 
"HuEx-1_0-st-v2 Transcript Cluster Annotations, CSV, Release 31" (25 MB, 
08/27/10):

http://www.affymetrix.com/Auth/analysis/downloads/na31/wtexon/HuEx-1_0-st-v2.na31.hg19.transcript.csv.zip

The 'unitName' below is a transcript cluster id.  You may have to parse the 
other columns to extract the gene identifiers/symbols.
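
A rough base R sketch of that lookup. Everything here is a stand-in: 'annot' mimics two rows of the annotation CSV, and the column names transcript_cluster_id and gene_assignment, as well as the " // "-separated layout of gene_assignment, are assumptions about the file format that should be checked against the real file (which would need to be read with read.csv(), probably with comment.char="#"). The gene symbols and values are invented.

```r
# Toy stand-in for the annotation CSV (column names and layout are assumptions)
annot <- data.frame(
  transcript_cluster_id = c("2315373", "2315554"),
  gene_assignment = c(
    "NM_0001 // GENE1 // hypothetical gene 1 // 1p36 // 111",
    "NM_0002 // GENE2 // hypothetical gene 2 // 1p36 // 222 /// NM_0003 // GENE2 // hypothetical gene 2 // 1p36 // 222"
  ),
  stringsAsFactors = FALSE
)

# Toy stand-in for the summarized data (unitName = transcript cluster id)
trFit <- data.frame(
  unitName   = c("2315373", "2315554", "2315633"),
  HuEx1_0028 = c(6.72, 9.42, 6.15),
  stringsAsFactors = FALSE
)

# Take the second " // "-separated field of the first assignment as the symbol
first_symbol <- function(x) {
  parts <- strsplit(x, " // ", fixed = TRUE)[[1]]
  if (length(parts) >= 2) trimws(parts[2]) else NA_character_
}
annot$symbol <- vapply(annot$gene_assignment, first_symbol, character(1),
                       USE.NAMES = FALSE)

# Left join: keep every unit, with NA where no annotation is found
merged <- merge(trFit, annot[, c("transcript_cluster_id", "symbol")],
                by.x = "unitName", by.y = "transcript_cluster_id",
                all.x = TRUE)
print(merged)
```

Transcript clusters without a gene assignment simply end up with NA in the symbol column.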

There are of course other ways to do this.  For example, you may be able to use 
the 'exonmap' R/Bioconductor package.

Hope that helps.
Mark


On 2011-01-24, at 3:29 PM, kripa raman wrote:

 Hi, 
 I'm very new to microarray analysis and I fear I'm in too deep by starting 
 with the HuEx1_0-st-v2 chip, especially since no one in my building seems to 
 have conducted this analysis!
 
 Experiment currently: 2 chips have been analyzed and have had the same 
 treatment, I'm looking to confirm that the genes/ exons are the same for both 
 chips (ideally by identifying that the top 100 genes are identical)
 
 The issue I'm having is converting the unitName and groupName, currently seen 
 in the trFit table, into meaningful gene ID. I'm under the impression that I 
 should be connecting this with the csv file but I'm not sure how to go about 
 doing this.
 Any help would be greatly appreciated!
 
 -Kripa 
 
 
 
 Code thus far:
 chipType <- "HuEx-1_0-st-v2"
 
 ## set cdf: Core probesets: 18,708 units/transcript clusters,
 ## 284,258 groups/probesets, and 1,082,385 probes
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR2,A20070914,EP")
 
 cs <- AffymetrixCelSet$byName("control", cdf=cdf) ## set cel group: 0035 and 0028
 
 
 bc <- RmaBackgroundCorrection(cs, tag="coreR2") ## background correction
 csBC <- process(bc, verbose=verbose)
 
 
 qn <- QuantileNormalization(csBC, typesToUpdate="pm") ## normalization
 csN <- process(qn, verbose=verbose)
 MNorm <- extractMatrix(csN)
 
 
 plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE) ## summarize
 fit(plmTr, verbose=verbose)
 
 qamTr <- QualityAssessmentModel(plmTr) ## quality assessment
 plotNuse(qamTr)
 plotRle(qamTr) 
 
 cesTr <- getChipEffectSet(plmTr)
 trFit <- extractDataFrame(cesTr, units=1:3, addNames=TRUE)
 MSumm <- extractMatrix(cesTr) 
 
 #result (how do i go about changing this unitName and groupName)
   unitName groupName unit group cell HuEx1_0028 HuEx1_0035
 1  2315373   2315374    1     1    1   6.722633   4.735021
 2  2315554   2315586    2     1    5   9.422164   9.943003
 3  2315633   2315638    3     3   20   6.146615    6.00318
 


Re: [aroma.affymetrix] Firma score help

2011-01-14 Thread Mark Robinson
Hi Sabrina.

Sorry for the slow reply.

Basically, there isn't a lot of precedent for this, but I would suggest using 
all 16 arrays to define the FIRMA scores.  Then use limma on those to look for 
your differences of interest.  My reasoning is that this will allow the data 
from more arrays to be used for the estimation of probe effects.
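
To illustrate the second step, here is a base R sketch of a design and contrast matrix for the three comparisons in the question below. It assumes four arrays per group (the message only says 16 arrays in 4 groups) and an arbitrary array order; the object names are made up. With limma installed, these could be passed to lmFit(), contrasts.fit() and eBayes(), which is not run here.

```r
# Assumed layout: 4 arrays in each of the groups A1, A2, B1, B2
group <- factor(rep(c("A1", "A2", "B1", "B2"), each = 4))

# One column per group (no intercept), so contrasts are simple differences
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Comparisons of interest: A1 vs A2, B1 vs B2, A1 vs B1
contrast_matrix <- cbind(
  A1vsA2 = c(1, -1,  0,  0),
  B1vsB2 = c(0,  0,  1, -1),
  A1vsB1 = c(1,  0, -1,  0)
)
rownames(contrast_matrix) <- levels(group)

print(design)
print(contrast_matrix)

# With limma (not run here), roughly:
#   fit  <- lmFit(firma_scores, design)          # firma_scores: exons x 16 arrays
#   fit2 <- eBayes(contrasts.fit(fit, contrast_matrix))
```

The FIRMA scores themselves would come from a single fit on all 16 arrays, as suggested above; only the contrasts differ between comparisons.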

Hope that helps.

Mark

On 2011-01-04, at 7:29 PM, sabrina wrote:

 Hello, all:
 I have 16 exon arrays from 4 groups: A1, A2, B1 and B2. The A samples are
 genetically identical but received different treatments (1 and 2); the B
 samples are genetically identical to each other (but different from the As),
 again with treatments 1 and 2. I am interested in finding alternative
 splicing events that are affected by treatment in A and B, and also by
 genetics (under the same treatment). Therefore, the comparisons I am
 interested in are A1 vs. A2, B1 vs. B2, and A1 vs. B1 (treatment 1 only).
 My question is: when I do RMA and calculate the FIRMA scores, do I use all
 16 arrays to get the FIRMA scores and then use limma, as suggested before,
 with a design matrix and contrast matrix? Or should I, for each comparison,
 do RMA and FIRMA scores only for the exon arrays involved in that
 comparison? Thanks for your input!
 
 Sabrina
 


Re: [aroma.affymetrix] Installing FIRMAGene

2010-10-27 Thread Mark Robinson
 that you have an appropriate version of Dev Tools:
http://developer.apple.com/technologies/tools/

Hope that helps.

Mark


On 2010-10-27, at 1:52 PM, Jon Tang wrote:

 Hi,
 
 I'm having problems installing FIRMAGene on my computer either with the 
 install.packages command in R or with the R Package Installer. I keep getting 
 the warning message that FIRMAGene is not available.  Is there another 
 location to get this package?  Thanks.  
 
 R version 2.12.0 (2010-10-15)
 Platform: i386-apple-darwin9.8.0/i386 (32-bit)
 
 When I try to install by using the install.packages command, it says the 
 package is unavailable:
  install.packages("FIRMAGene", repos="http://R-Forge.R-project.org")
 Warning: unable to access index for repository 
 http://R-Forge.R-project.org/bin/macosx/leopard/contrib/2.12
 Warning message:
 In getDependencies(pkgs, dependencies, available, lib) :
   package ‘FIRMAGene’ is not available
 
 
 If I try to install the package using the R Package Installer, I get the 
 following warnings:
  install.packages("FIRMAGene", repos="http://R-Forge.R-project.org")
 Warning: unable to access index for repository 
 http://R-Forge.R-project.org/bin/macosx/leopard/contrib/2.12
 Warning message:
 In getDependencies(pkgs, dependencies, available, lib) :
   package ‘FIRMAGene’ is not available
 trying URL 'http://R-Forge.R-project.org/src/contrib/FIRMAGene_0.9.5.tar.gz'
 Content type 'application/x-gzip' length 9223 bytes
 opened URL
 ==
 downloaded 9223 bytes
 
 * installing *source* package ‘FIRMAGene’ ...
 ** libs
 *** arch - i386
 gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include 
 -I/Library/Frameworks/R.framework/Resources/include/i386  
 -I/usr/local/include -fPIC  -g -O2 -c init.c -o init.o
 gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include 
 -I/Library/Frameworks/R.framework/Resources/include/i386  
 -I/usr/local/include -fPIC  -g -O2 -c mps.c -o mps.o
 gcc -arch i386 -std=gnu99 -dynamiclib -Wl,-headerpad_max_install_names 
 -undefined dynamic_lookup -single_module -multiply_defined suppress 
 -L/usr/local/lib -o FIRMAGene.so init.o mps.o 
 -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework 
 -Wl,CoreFoundation
 /usr/bin/libtool: for architecture ppc7400 object: 
 /usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libSystem.dylib malformed 
 object (unknown load command 7)
 /usr/bin/libtool: for architecture: (null) file: -lSystem is not an object 
 file (not allowed in a library)
 /usr/bin/libtool: for architecture ppc64 object: 
 /usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libSystem.dylib malformed 
 object (unknown load command 7)
 make: *** [FIRMAGene.so] Error 1
 ERROR: compilation failed for package ‘FIRMAGene’
 * removing 
 ‘/Library/Frameworks/R.framework/Versions/2.12/Resources/library/FIRMAGene’
 
 The downloaded packages are in
 
 ‘/private/var/folders/pQ/pQBxScbYEACZOn2BER6xiE+++TI/-Tmp-/RtmpzEsz6S/downloaded_packages’
 
 


Re: [aroma.affymetrix] Missing file bpmapCluster2Cdf.R in Create a CDF from a BpMap

2010-10-20 Thread Mark Robinson
Apologies folks.  Inadvertent file permissions change.  It should work now.

Cheers,
Mark


-- Forwarded message --
From: vegard vegard.nyga...@medisin.uio.no
Date: Thu, Oct 21, 2010 at 2:55 AM
Subject: [aroma.affymetrix] Missing file bpmapCluster2Cdf.R in
Create a CDF from a BpMap
To: aroma.affymetrix aroma-affymetrix@googlegroups.com


Hi, I am trying to make a CDF as described in the page
How to: Create a CDF (and associated) files from a BpMap file (tiling
arrays)
http://aroma-project.org/node/42

I am supposed to use the script bpmapCluster2Cdf.R, but the link is
dead (forbidden)
http://129.94.136.7/file_dump/mark/bpmapCluster2Cdf.R .
I was not able to find the script or methods elsewhere.

Can you help me?

Best Regards
Vegard Nygaard.



[aroma.affymetrix] Re: run FirmaGene on exon array ?

2010-08-29 Thread Mark Robinson
Hi Qicheng.

I don't think that particular CDF will work with FIRMAGene, since it is laid 
out as a list of lists: probes ("cells") are grouped into probe selection 
regions ("groups"), which in turn sit within transcript clusters ("units").  
Cell/group/unit are CDF-speak.

Basically, in order for FIRMAGene to work (and note that I haven't run 
FIRMAGene on the Exon platform myself), you need a CDF file where all the 
probes (cells) are within 1 group ... AND, you need to ensure that the order of 
the probes is the order in which they map to the genome/transcript.  This is 
what FIRMAGene assumes.
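
The ordering requirement can be checked before running anything. Here is a minimal sketch in plain R, assuming a hypothetical probe table with columns `unit` and `start` (these names are illustrative and are not taken from any CDF or annotation file); it only tests whether probe positions run monotonically within each unit:

```r
# Hypothetical probe table: one row per probe, listed in the order the
# probes appear within each unit. Column names are illustrative only.
probes <- data.frame(
  unit  = c("geneA", "geneA", "geneA", "geneB", "geneB", "geneB"),
  start = c(100, 250, 400, 900, 700, 800),
  stringsAsFactors = FALSE
)

# A unit passes if its start positions are non-decreasing, or
# non-increasing (as could happen for a gene on the reverse strand).
isOrdered <- function(pos) !is.unsorted(pos) || !is.unsorted(rev(pos))

ordered <- sapply(split(probes$start, probes$unit), isOrdered)
print(ordered)
```

Units that come back FALSE would need their probes re-sorted in the CDF before FIRMAGene's adjacency assumption holds.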

I believe the CDF files created by brainarray:

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

... are organized this way, but you'd have to verify that for yourself.

Hope that gets you started.

Cheers,
Mark

On 2010-08-27, at 6:36 AM, Qicheng Ma wrote:

 Hi Mark,
 
   Could you please tell me whether we can run FirmaGene 
 (http://bioinf.wehi.edu.au/folders/firmagene/sup3_04feb2010.R) on human exon 
 array using CDF file HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf , since FirmaGene 
 score would be more useful than Firma score from individual exons ?
 
 Thanks,
 
 Qicheng



[aroma.affymetrix] Re: sorry, I can not reproduce the table 2 in FIRMAGene paper

2010-08-09 Thread Mark Robinson
Hi Qichengm.

Wrong dataset.  That table was created using the Affy tissue panel, not the 
Affy tissue mixture data.

Cheers,
Mark


On 2010-08-10, at 6:41 AM, qichengm wrote:

 Hi,
 
 I downloaded the recent version of sup3_04feb2010.r and made minor
 changes to let it run. Here is the difference:
 
 
 $ diff sup3_04feb2010.r sup3_04feb2010.r.orig
 11,13c11
 < #cdf<-AffymetrixCdfFile$byChipType("HuGene-1_0-st-v1",verbose=verbose)
 < chipType <- "HuGene-1_0-st-v1"
 < cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")
 ---
 > cdf<-AffymetrixCdfFile$byChipType("HuGene-1_0-st-v1",verbose=verbose)
 16c14
 < cs<-AffymetrixCelSet$byName("TisMix_WTGene1C",cdf=cdf,verbose=verbose)
 ---
 > cs<-AffymetrixCelSet$byName("tissues",cdf=cdf,verbose=verbose)
 35,39c33,34
 < #hgnetaffx <- read.csv("HuGene-1_0-st-v1.na25.hg18.transcript.csv",sep=",",skip=19,header=TRUE,comment.char="",stringsAsFactors=FALSE)
 < hgnetaffx <- read.csv("annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1.na30.hg19.transcript.csv",sep=",",skip=19,header=TRUE,comment.char="",stringsAsFactors=FALSE)
 <
 < #probetab <- read.table("HuGene-1_0-st-v1.probe.tab",sep="\t",header=TRUE,comment.char="",stringsAsFactors=FALSE)
 < probetab <- read.table("annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1.hg19.probe.tab",sep="\t",header=TRUE,comment.char="",stringsAsFactors=FALSE)
 ---
 > hgnetaffx <- read.csv("HuGene-1_0-st-v1.na25.hg18.transcript.csv",sep=",",skip=19,header=TRUE,comment.char="",stringsAsFactors=FALSE)
 > probetab <- read.table("HuGene-1_0-st-v1.probe.tab",sep="\t",header=TRUE,comment.char="",stringsAsFactors=FALSE)
 
 Top 20 hits are different from those in the FirmaGene paper
 ID,Sample,Score,Symbol
 7934979,TisMix_mix9,49.0412085693221,ANKRD1
 7987315,TisMix_mix9,48.5685310739171,ACTC1
 8023889,TisMix_mix1,37.4399023091527,MBP
 8060963,TisMix_mix1,36.0242843230823,SNAP25
 7947099,TisMix_mix9,35.1698682900308,CSRP3
 7912520,TisMix_mix9,34.2770659572940,NPPB
 7929653,TisMix_mix9,31.4139588184231,ANKRD2
 8096959,TisMix_mix1,30.9351264815591,ANK2
 7912692,TisMix_mix9,30.7020787308514,HSPB7
 7957338,TisMix_mix1,30.6749548684879,SYT1
 8169061,TisMix_mix1,29.9989728237812,PLP1
 8087925,TisMix_mix9,29.4092542289276,TNNC1
 7930208,TisMix_mix1,29.002191135071,INA
 8046062,TisMix_mix9,28.5683689224915,XIRP2
 8109663,TisMix_mix1,28.3206965639246,GABRA1
 7982018,TisMix_mix1,28.3138146218297,SNORD115-6
 7924910,TisMix_mix9,28.2833712932546,ACTA1
 7982090,TisMix_mix1,27.4880562975114,SNORD115-42
 8103789,TisMix_mix1,27.399249112706,GPM6A
 7982008,TisMix_mix1,25.0039468447955,SNORD115-1
 
 Could you please tell me where I am wrong ?
 
 Thanks,
 
 Qichengm



Re: [aroma.affymetrix] Firma Scores in the range of 700-1500

2010-07-22 Thread Mark Robinson
Hi Gaurav.

Note that the FIRMA scores (and expression values, chip effects, etc.) are 
stored in the exponentiated (base 2) scale.  So, take log2() of them:

> log2(1568)
[1] 10.6

That's much more feasible.
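
For a whole set of scores the same transform is a one-liner. A small sketch with made-up values (the vector below is illustrative; in practice it would be the sample columns of whatever data frame the scores were extracted into):

```r
# Scores as stored, on the non-log scale; values are illustrative.
firmaScores <- c(1568, 1, 0.25)

# Convert to the log2 scale before interpreting or plotting them.
log2Scores <- log2(firmaScores)
print(round(log2Scores, 2))
```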

Cheers,
Mark


On 2010-07-22, at 12:10 AM, gaurav bhatti wrote:

 I wanted to reproduce the results of the FIRMA paper for the tissue sample
 data set (exon array: HuEx-1_0-st-v2). I used the Ensembl CDF,
 HuEx-1_0-st-v2,U-Ensembl47,G-Affy, which I think is the one the authors
 used. Here is the exact code that I used:
 
 library(aroma.affymetrix)
 verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 chipType <- "HuEx-1_0-st-v2"
 # Getting annotation data files
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="U-Ensembl47,G-Affy")
 # Defining CEL set
 cs <- AffymetrixCelSet$byName("coloncancer", cdf=cdf)
 # Background adjustment and normalization
 bc <- RmaBackgroundCorrection(cs, tag="ensemblcancer")
 csBC <- process(bc, verbose=verbose)
 qn <- QuantileNormalization(csBC, typesToUpdate="pm")
 csN <- process(qn, verbose=verbose)
 # Summarization
 plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
 fit(plmTr, verbose=verbose)
 CesTr <- getChipEffectSet(plmTr)
 trFit <- extractDataFrame(CesTr, units=NULL, addNames=TRUE)
 # Alternative splicing analysis (FIRMA)
 firma <- FirmaModel(plmTr)
 fit(firma, verbose=verbose)
 fs <- getFirmaScores(firma)
 firma <- extractDataFrame(fs, units=NULL, addNames=TRUE)
 rownames(firma) = firma$groupName
 
 I am getting some FIRMA values as high as 1568 (for UNR: 2429323). Is
 that even feasible?
 
 Gaurav Bhatti
 


Re: [aroma.affymetrix] FIRMA and GenomeGraphs

2010-05-13 Thread Mark Robinson
Hi Lara.

Some comments below.

On 2010-05-14, at 12:48 AM, Lara wrote:

 Dear all,
 
 first of all, congratulations on aroma. I've been using it for a while,
 and it saved me for my first analysis of exon arrays back in 2008.
 Since then I have used it several times for several purposes.
 Unfortunately, sometimes there is a lack of help, but there is a good forum...
 
 So, I have been struggling with alternative splicing, trying to
 understand it, and have done several tests.
 I have computed FIRMA scores and used limma to see differences between two
 groups.
 I have also tested for DE exons (probesets) with limma.
 Then, I used GenomeGraphs to visualize the results, and I have several
 questions, which I hope are written clearly and are not too obvious:
 1. What I expect, for those exons that have differences in FIRMA scores, is
 a clear graphical difference in expression in the selected exon
 between conditions, but that is not what I get.
 So, for instance, take 7922737 -- ENSG0157060 -- C1orf14 --
 blue=Testis, which is the first example in supplementary material (1)
 of "Differential splicing using whole-transcript microarrays", BMC
 Bioinformatics, 2009, 10, 156: 
 http://www.biomedcentral.com/content/supplementary/1471-2105-10-156-s1.pdf.
 I know that is FIRMAGene for gene arrays, but it is similar to
 what I get with FIRMA and exon arrays. I would say (from the
 residuals) that the last two exons (10 probesets) are spliced. But if you
 check the expression of those probesets, I wouldn't say that they show
 differences. On the contrary, the first exons, for instance, seem to be
 spliced.
 Apparently, DE exons look better on graphs (no matter if they belong
 to a DE gene or not).

Yes, this can happen. And that is the real value of the GenomeGraphs output.  
Remember that FIRMA and FIRMAGene are really just outlier detection procedures.

This is (sort of) alluded to in both the FIRMA and FIRMAGene papers:

Purdom et al., Section 4.1: "In particular, if the proportion of samples 
showing alternative splicing is high within an exon (say in the majority of 
samples), the high residuals will be found not in those samples classified by 
the simulation as spliced out, but rather the complementary set of samples"

Robinson and Speed, Conclusions and Discussion: "Identifying departures through 
residuals from the RMA model will not always be perfect. Some departures from 
the RMA linear model may not be alternative splicing at all ... or are induced 
through, for example, an exon that is not expressed in any of the samples in 
combination with strong differential expression."


 2. Do FIRMA scores correspond to residuals of the RMA model? Because this
 is what you plot in GenomeGraphs, isn't it?

A FIRMA score for each sample and probeset is the median of the (usually 4) 
residuals from the robustly-fit linear model of RMA.  See the paper for the 
formal definition.  FIRMAGene is a little bit different, in that it takes 
partial sums of adjacent residuals to try and look for a persistence of 
departure from the model.
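
To make the difference concrete, here is a toy sketch in plain R. It is not the code from either package: the residuals are made up, the probeset boundaries are hypothetical, and standardizing each partial sum by the square root of the run length is an assumption of this sketch.

```r
# Illustrative residuals for one sample across 8 probes ordered along a gene.
res <- c(0.1, -0.2, 0.0, 2.1, 1.9, 2.3, -0.1, 0.2)

# FIRMA-style summary: the median residual within one probeset
# (probes 4 to 7 are taken to be a hypothetical 4-probe probeset).
probesetScore <- median(res[4:7])

# FIRMAGene-style idea: scan every run of adjacent probes i..j and keep
# the largest standardized partial sum, |sum(r[i:j])| / sqrt(j - i + 1).
maxPartialSum <- function(r) {
  n <- length(r)
  best <- -Inf
  for (i in seq_len(n)) {
    s <- 0
    for (j in i:n) {
      s <- s + r[j]
      best <- max(best, abs(s) / sqrt(j - i + 1))
    }
  }
  best
}

geneScore <- maxPartialSum(res)
print(c(probesetScore = probesetScore, geneScore = geneScore))
```

The run of three large residuals (probes 4 to 6) dominates the gene-level score, which is the "persistence" a single-probeset median cannot see.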


 3. Everything is done on a probeset basis, but shouldn't it be done on
 a real exon (exon cluster) basis? Sometimes just one probeset appears
 to be spliced, which is complicated to interpret
 biologically...

Debatable.  Affymetrix picked probesets (or probe selection regions) based on 
what they thought could be independent units of expression, based on annotated 
transcripts, ESTs, etc.  But, you could certainly build a CDF file that grouped 
the probes together in a different way.

Hope that helps.

Cheers,
Mark


 I think there is no point on adding my code and sessionInfo() given
 that those are general questions, but if you need it I can add it.
 
 Thanks for your time and answer,
 
 Lara
 

Re: [aroma.affymetrix] Re: what kind of file

2010-04-01 Thread Mark Robinson
Hi Daniela.

That's strange.  And no, these files do not need to be in a particular
directory (in fact, this error is completely independent of the
aroma.affymetrix package).  The error you get suggests the file is empty. 
When you say "it seems to be fine", what does that mean?  Alternatively,
what does this give:

file.info("MoGene-1_0-st-v1.probe.tab")

This should be a decent-sized file.  The ZIP file that you get from
Affymetrix is 23MB, so unzipped it will be a lot larger.
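
A quick way to tell a missing file from an empty one before calling read.table() is sketched below. It uses an empty temporary file as a stand-in for the probe table, so the path is hypothetical; substitute the real file name.

```r
# Create an empty stand-in file to reproduce the "no lines available" case.
path <- tempfile(fileext = ".tab")
file.create(path)

info <- file.info(path)
if (!file.exists(path)) {
  message("File not found (check the working directory): ", path)
} else if (info$size == 0) {
  message("File exists but is empty: ", path)
} else {
  probetab <- read.table(path, sep = "\t", header = TRUE,
                         comment.char = "", stringsAsFactors = FALSE)
}
```

Running getwd() alongside this shows which directory a relative file name is being resolved against.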

Cheers,
Mark


 So far I have managed to nearly run everything of the script.
 I do though have issues in loading the probe sequences.
 Do the files need to be stored in a particular folder? For now I have
 it in ./annotationData/chipTypes/ as I have seen someone else in this
 forum doing it like this.
 What else could the reason be?

 probetab <- read.table("MoGene-1_0-st-v1.probe.tab", sep="\t", header=TRUE,
 comment.char="", stringsAsFactors=FALSE)
 Error in read.table("MoGene-1_0-st-v1.probe.tab", sep = "\t", header =
 TRUE,  :
   no lines available in input

 I double checked the file I downloaded from Affy and it seems to be
 fine!
 Thx,
 Daniela

 On Mar 16, 4:56 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Daniela.

 Those files are available from Affymetrix.

 For example, see:
 http://www.affymetrix.com/estore/browse/products.jsp?productId=131453...

 HuGene-1_0-st-v1 Transcript Cluster Annotations, CSV, Release 30 (18  
 MB 11/09/09)
 HuGene-1_0-st-v1 Probe Sequences, tabular format (22 MB 07/13/07)

 (there are different versions of the annotation files available)

 Cheers,
 Mark

 On 13-Mar-10, at 8:12 AM, dkny169 wrote:



  Hello,
  I am working on the sup3.R script (FIRMAGene).

  I was wondering what kind of files these are:
  HuGene-1_0-st-v1.na25.hg18.transcript.csv and HuGene-1_0-st-v1.probe.tab?
  What is in these files?
  What am I supposed to load here into FIRMAGene?

  Thanks a lot!



Re: [aroma.affymetrix] Re: FIRMAGene command

2010-03-18 Thread Mark Robinson
Hi Daniela.

 I definitely loaded the package and had a look at the help.start docs.
 Nevertheless, I wasn't able to work out my problems that I described in
 my previous post.

OK, so you've read the documentation.  But, you haven't told us what
you didn't understand there.  I can try and explain the docs to you
...

-
 cls: variable giving the class (aligned with the columns or sample
  names of the input object).
-

So, this says that you need to specify a vector which gives the
experimental group of your samples.

So, in my example, the sample names were:

> getNames(cs)
 [1] "TisMap_Brain_01_v1_WTGene1"    "TisMap_Brain_02_v1_WTGene1"
 [3] "TisMap_Brain_03_v1_WTGene1"    "TisMap_Breast_01_v1_WTGene1"
 [5] "TisMap_Breast_02_v1_WTGene1"   "TisMap_Breast_03_v1_WTGene1"
[snip]

which get converted to:

> cls <- gsub("TisMap_", "", gsub("_0[1-3]_v1_WTGene1", "", getNames(cs)))
> cls
 [1] "Brain"    "Brain"    "Brain"    "Breast"   "Breast"   "Breast"
[snip]

And, with the 'cls' variable, I tell FIRMAGene() what group each
sample is from.  You need to do the same for your 14 samples.
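
If it helps, the same idea can be tried outside of aroma on a plain character vector. In this sketch the sample names are typed in by hand (in practice they would come from getNames(cs)), and the group labels in the second half are hypothetical:

```r
# Illustrative sample names; real ones would come from getNames(cs).
sampleNames <- c("TisMap_Brain_01_v1_WTGene1", "TisMap_Brain_02_v1_WTGene1",
                 "TisMap_Breast_01_v1_WTGene1")

# Strip the prefix and the replicate/array suffix to leave the tissue name.
cls <- gsub("TisMap_", "", gsub("_0[1-3]_v1_WTGene1", "", sampleNames))
print(cls)

# If the sample names carry no group information, spell the groups out by
# hand, in the same order as the samples (labels here are hypothetical).
clsManual <- c("control", "control", "treated")
stopifnot(length(clsManual) == length(sampleNames))
```

Either way, the result should have one entry per sample, with repeated labels for replicates of the same group.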

---
idsToUse: indices of the units to calculate FIRMAGene scores for.
---

In my example, this:

u <- which(getUnitNames(cdf) %in%
  hgnetaffx$probeset_id[hgnetaffx$category == "main" &
  hgnetaffx$total_probes > 7 & hgnetaffx$total_probes < 200])

... (assuming you've read in an appropriate file to hgnetaffx) ...
keeps only the "main" category probesets (i.e. not the control
probesets), and only those with between 7 and 200 probes.
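
The filter can be tried on toy data first. In this sketch the unit names, the annotation rows and the name `annot` are all made up for illustration; only the three column names mirror the ones used above.

```r
# Toy stand-ins: unit names as getUnitNames(cdf) might return them, and an
# annotation table with the three columns used in the filter.
unitNames <- c("7892501", "7896736", "7896738", "7896740")
annot <- data.frame(
  probeset_id  = c("7892501", "7896736", "7896738", "7896740"),
  category     = c("control->affx", "main", "main", "main"),
  total_probes = c(4, 25, 5, 310),
  stringsAsFactors = FALSE
)

# Keep "main" probesets with more than 7 and fewer than 200 probes.
keep <- annot$category == "main" &
  annot$total_probes > 7 & annot$total_probes < 200
u <- which(unitNames %in% annot$probeset_id[keep])
print(u)

# Sanity checks before handing the indices on: non-empty, no NAs, all >= 1.
stopifnot(length(u) > 0, !anyNA(u), all(u >= 1))
```

If `u` comes back empty, the usual suspects are a mismatch between the annotation file and the chip type, or column names that differ from the ones assumed here.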


  I am not sure what is supposed to be stored in cls and u.

OK, so hopefully you are OK with what's spelled out above.  If not, ask
questions, mentioning what you don't understand.


  I’m a bit confused however, with what the whole “unique cdf set” is
  for and how plm is working. Can I save the plm data into a txt file?

You don't really need to understand the uniquifying.  It's just a step
that needs to be done.

For general info on probe level models, you might look at the
references mentioned in fitPLM() or rma():

library(affyPLM)
?fitPLM
library(oligo)
?rma

In terms of saving the plm data (I assume you mean chip effects?),
you should breeze through the vignette for Gene 1.0 ST arrays.  At the
end, it extracts the summarized data into a data frame:
http://aroma-project.org/node/38

... and you could output this to a text file using write.table().
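
A minimal sketch of that last step, using a small made-up data frame in place of the real extracted one (the column names and values are illustrative):

```r
# Stand-in for the extracted data frame; values are made up.
chipEffects <- data.frame(
  unitName = c("7896736", "7896738"),
  sample1  = c(8.12, 5.40),
  sample2  = c(8.35, 5.22),
  stringsAsFactors = FALSE
)

# Write a tab-delimited text file without quotes or row names.
outFile <- file.path(tempdir(), "chipEffects.txt")
write.table(chipEffects, file = outFile, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to confirm the round trip.
check <- read.table(outFile, sep = "\t", header = TRUE,
                    stringsAsFactors = FALSE)
print(check)
```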

Hope that helps.

Cheers,
Mark






On Fri, Mar 19, 2010 at 5:25 AM, dkny169 daniela...@yahoo.com wrote:
 I definitely loaded the package and had a look at the help.start docs.
 Nevertheless, I wasn't able to work out my problems that I described in
 my previous post.

 On Mar 18, 1:58 pm, Henrik Bengtsson henrik.bengts...@gmail.com
 wrote:
 Hi.

 On Thu, Mar 18, 2010 at 6:47 PM, dkny169 daniela...@yahoo.com wrote:
  Unfortunately I cannot get to the docs, unless the same docs are
  stored under help.start()

 Please explain what the problem/error is.  Note that you have to load
 a package in order to use ?/help() on its methods, e.g.

 library(FIRMAGene);
 ?FIRMAGene

 If you don't load it, you get something like:

  ?FIRMAGene

 No documentation for 'FIRMAGene' in specified packages and libraries:
 you could try '??FIRMAGene'

 The help is the same regardless if you access it via ?/help() or
 help.start().  So, yes, you'll find the same information if you do
 help.start() - Packages - FIRMAGene - FIRMAGene

 /Henrik



  I used the following parameters:
  plm <- RmaPlm(csNU)
  plm
  [1] RmaPlm: 0x22388540
  cls <- gsub("TisMap_", "", gsub("_0(1-3)_v1_WTGene1", "", getNames(cs)))
  cls
   [1] P.L_10 P.L_11 P.L_12 P.L_14 P.L_15 P.L_16 P.L_2  P.L_3
   [9] P.L_4  P.L_5  P.L_6  P.L_7  P.L_8  P.L_9

  I am not sure what is supposed to be stored in cls and u.
  I’m a bit confused however, with what the whole “unique cdf set” is
  for and how plm is working. Can I save the plm data into a txt file?
  Many thanks for your help.
  I really appreciate it.
  Daniela

  On Mar 16, 4:57 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
  Hi Daniela.

  You haven't told us what inputs you've used for 'plm' and 'cls' ...
  and what is stored in 'u'?

  Have you read the docs at:

  ?FIRMAGene

  Cheers,
  Mark

  On 14-Mar-10, at 10:21 AM, dkny169 wrote:

   Hello,
   I have a question regarding FIRMAGene. Executing the FIRMAGene
   command I get the following error:
   fg <- FIRMAGene(plm, idsToUse=u, cls=cls)
   Gathering/calculating residuals.
   Reading units.
   Error in if (any(units < 1)) stop("Argument 'units' contains non-positive indices.") :
   missing value where TRUE/FALSE needed

   The commands used right before are:

   monetaffx <- read.csv("MoEx-1_0-st-v1.na29.mm9.transcript.csv",
   sep=",", skip=20, header=TRUE, comment.char="", stringsAsFactors=FALSE)
   probetab <- read.table("MoEx-1_0-st-v1.na29.mm9.probeset.csv",
   sep="\t", header=TRUE, comment.char="", stringsAsFactors=FALSE)
   u <- which(getUnitNames(cdf) %in% monetaffx$probeset_id[monetaffx$category == "main"

Re: [aroma.affymetrix] getUniqueCdf inflates dimensions of original cdf

2010-03-16 Thread Mark Robinson

Hi Vince.

Yes, getUniqueCdf() *should* inflate the dimensions of the original  
CDF.  Basically, it is rearranging the probesets so that individual  
probes do not match to multiple probesets.  To do this, it creates a  
CDF with a higher dimension, copying the original physical location to  
multiple locations.  convertToUnique() takes an AffymetrixCelSet and  
copies the data to match the new CDF.
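
As a toy illustration of why the dimension grows (this is plain R on made-up indices, not how aroma.affymetrix implements it): every probe that belongs to more than one probeset gets one copy per membership, so the number of cells after "uniquifying" equals the total number of (probeset, probe) memberships.

```r
# Toy CDF: two units (probesets) that share physical cell 3.
units <- list(unitA = c(1, 2, 3), unitB = c(3, 4))

# "Uniquify": give every (unit, probe) pair its own new cell index and
# remember which original cell each new cell is copied from.
origCell <- unlist(units, use.names = FALSE)
newCell  <- seq_along(origCell)
uniqueUnits <- split(newCell, rep(names(units), lengths(units)))

# Copy intensities from the original layout into the enlarged one.
intensity <- c(10, 20, 30, 40)          # indexed by original cell
uniqueIntensity <- intensity[origCell]

print(uniqueUnits)
print(uniqueIntensity)
```

Here 4 original cells become 5, because cell 3 is used twice. Note that `origCell` must only contain indices that exist in the original layout; an index beyond the original number of cells is the kind of thing that would produce an "out of range" error when reading.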


I'm not sure what is going wrong in your analysis.  But, as Henrik  
says, please tell us how you created the CDF as a starter.  You have a  
near doubling in the number of probes in your unique CDF to the  
original CDF, which obviously is curious in itself.  So, a full  
explanation of what you've done upstream of this would be useful.


Cheers,
Mark

On 16-Mar-10, at 1:09 AM, Henrik Bengtsson wrote:


Hi,

I leave this one to Mark Robinson, who designed createUniqueCdf()
for AffymetrixCdfFile and is on top of this. Though, in the meanwhile,
could you please:

1. Clarify the origin of Mm_PromPR_v02.CDF, because Affymetrix does
not provide a CDF.

2. Make the Mm_PromPR_v02.CDF available to us?   If you're happy to
share it (and have the rights), I'm happy to have aroma-project.org
either link to it or host it.

/Henrik


On Fri, Mar 12, 2010 at 8:04 PM, stvjc carey...@gmail.com wrote:

cdfU

AffymetrixCdfFile:
Path: annotationData/chipTypes/Mm_PromPR_v02
Filename: Mm_PromPR_v02,unique.CDF
Filesize: 126.33MB
Chip type: Mm_PromPR_v02,unique
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 3026x3026
Number of cells: 9156676
Number of units: 25373
Cells per unit: 360.88
Number of QC units: 0

cdf

AffymetrixCdfFile:
Path: annotationData/chipTypes/Mm_PromPR_v02
Filename: Mm_PromPR_v02.cdf
Filesize: 126.33MB
Chip type: Mm_PromPR_v02
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 2166x2166
Number of cells: 4691556
Number of units: 25373
Cells per unit: 184.90
Number of QC units: 0

this leads to (i think)


csU = convertToUnique(csN, verbose=verbose)

20100312 14:02:59|Converting to unique CDF...
20100312 14:02:59| Getting unique CDF...
20100312 14:02:59| Getting unique CDF...done
20100312 14:02:59| Input tags:MN,lm
20100312 14:02:59| Input Path: probeData/Dawn,MN,lm/Mm_PromPR_v02
20100312 14:02:59| Output Path:probeData/Dawn,MN,lm,UNQ/Mm_PromPR_v02
20100312 14:02:59| allTags:MN,lm,UNQ
20100312 14:02:59| Test whether dataset exists
20100312 14:02:59| Reading cell indices from standard CDF...
20100312 14:03:08| Reading cell indices from standard CDF...done
20100312 14:03:08| Reading cell indices list from unique CDF...
20100312 14:03:17| Reading cell indices list from unique CDF...done
20100312 14:03:17| Converting CEL data from standard to unique CDF for sample 1 ( 10_BL6_IP_Mmp ) of 8...
20100312 14:03:17|  Reading intensity values according to standard CDF...
Error in readCel(filename, indices = indices, readHeader = FALSE, readOutliers = FALSE,  :
  Argument 'indices' contains an element out of range.
20100312 14:03:23|  Reading intensity values according to standard CDF...done
20100312 14:03:23| Converting CEL data from standard to unique CDF for sample 1 ( 10_BL6_IP_Mmp ) of 8...done
20100312 14:03:23|Converting to unique CDF...done


sessionInfo()

R version 2.11.0 Under development (unstable) (2010-03-02 r51194)
x86_64-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
 [1] gsmoothr_0.1.4         limma_3.3.4            aroma.affymetrix_1.5.0
 [4] aroma.apd_0.1.7        affxparser_1.19.6      R.huge_0.2.0
 [7] aroma.core_1.5.0       aroma.light_1.15.1     matrixStats_0.1.9
[10] R.rsp_0.3.6            R.cache_0.2.0          R.filesets_0.8.0
[13] R.utils_1.3.3          R.oo_1.6.7             R.methodsS3_1.1.0
[16] weaver_1.13.0          codetools_0.2-2        digest_0.4.2

--
When reporting problems on aroma.affymetrix, make sure 1) to run  
the latest version of the package, 2) to report the output of  
sessionInfo() and traceback(), and 3) to post a complete code  
example.



You received this message because you are subscribed to the Google  
Groups aroma.affymetrix group.
To post to this group, send email to aroma- 
affymet...@googlegroups.com

To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en




--
Mark

Re: [aroma.affymetrix] FIRMAGene command

2010-03-16 Thread Mark Robinson

Hi Daniela.

You haven't told us what inputs you've used for 'plm' and 'cls' ...  
and what is stored in 'u'?


Have you read the docs at:

?FIRMAGene

Cheers,
Mark


On 14-Mar-10, at 10:21 AM, dkny169 wrote:


Hello,
I have a question regarding FIRMAGene. Executing the FIRMAGene
command I get the following error:

fg <- FIRMAGene(plm, idsToUse=u, cls=cls)

Gathering/calculating residuals.
Reading units.
Error in if (any(units < 1)) stop("Argument 'units' contains non-positive indices.") :
missing value where TRUE/FALSE needed

The commands used right before are:

monetaffx <- read.csv("MoEx-1_0-st-v1.na29.mm9.transcript.csv",
  sep=",", skip=20, header=TRUE, comment.char="", stringsAsFactors=FALSE)
probetab <- read.table("MoEx-1_0-st-v1.na29.mm9.probeset.csv",
  sep="\t", header=TRUE, comment.char="", stringsAsFactors=FALSE)
u <- which(getUnitNames(cdf) %in% monetaffx$probeset_id[monetaffx$category == "main" &
  monetaffx$total_probes > 7 & monetaffx$total_probes < 200])


I'm not sure what these commands do and how they need to be
changed to accommodate my own data:

cls <- gsub("TisMap_", "", gsub("_0[1-3]_v1_WTGene1", "", getNames(cs)))

Many thanks,
Daniela
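
A minimal sketch of what the two pieces above do, using made-up inputs. The sample names and unit names below are hypothetical stand-ins for getNames(cs) and getUnitNames(cdf). The nested gsub() strips a prefix and a replicate suffix so that only the group label remains, and 'idsToUse' should end up as positive unit indices with no NAs (an NA there would be consistent with the error shown, though that is only an assumption about this particular case).

```r
# Hypothetical sample names; real ones would come from getNames(cs)
sampleNames <- c("TisMap_Brain_01_v1_WTGene1", "TisMap_Brain_02_v1_WTGene1",
                 "TisMap_Heart_01_v1_WTGene1", "TisMap_Heart_02_v1_WTGene1")

# Inner gsub() drops the replicate suffix, outer gsub() drops the prefix
cls <- gsub("TisMap_", "", gsub("_0[1-3]_v1_WTGene1", "", sampleNames))
print(cls)  # "Brain" "Brain" "Heart" "Heart"

# Hypothetical unit names; real ones would come from getUnitNames(cdf)
unitNames <- c("7892501", "7892502", "7892503")
wanted <- c("7892502", "9999999")     # second ID is not on the chip

u_bad <- match(wanted, unitNames)     # keeps an NA for the unmatched ID
u <- which(unitNames %in% wanted)     # never contains NA

# Checks worth running before passing 'u' on as idsToUse
stopifnot(!anyNA(u), all(u >= 1))
print(u)
```

For your own data, the two patterns in gsub() would need to match whatever surrounds the group label in your own CEL file names.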




--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--






__
The information in this email is confidential and intended solely for the 
addressee.
You must not disclose, forward, print or use it without the permission of the 
sender.
__



Re: [aroma.affymetrix] Is the PdInfo2Cdf.R script working

2010-03-16 Thread Mark Robinson
Hi Peter.

I haven't looked at this since early 2009.  Our motive in making this
available as a script (instead of within the aroma.affymetrix package) was
simply as an FYI, so that the code is readily available for others to
modify for their needs.  You will of course understand that underlying
packages change all the time.

On a cursory glance at what you have mentioned below, it seems like the
script actually works with no problem.  Each unit appears to be a probeset,
not a transcript cluster.  I know there was a point (looks like early
2009) where Affymetrix made annotation available at the probeset level as
well as the transcript cluster level, whereas when I built this script
there was only the transcript cluster level.

Also, you'll notice my post on BioC:
https://stat.ethz.ch/pipermail/bioconductor/2009-July/028893.html
... so, 250k probesets seems about right.

You'll notice that BioC now makes available annotation at both levels:
http://www.bioconductor.org/packages/release/data/annotation/
(hugene10sttranscriptcluster.db, hugene10stprobeset.db)

I'm pretty sure that the 'pd.hugene.1.0.st.v1' package includes
information about both probesets and transcript clusters, since
oligo::rma() can summarize these chips at both levels.  So, it may be an
easy modification to make to the script to extract this.

Also, you haven't told us what you mean by "aroma.affymetrix will not work
with it", so I can't offer much there.

Cheers,
Mark


 I attempted to use the script posted on the site to convert the
 pd.hugene.1.0.st.v1 package to a CDF file but it appears not to be
 working. The resulting cdf file has too many units and
 aroma.affymetrix will not work with it beyond naming the CDF. Are you
 aware of any issues?

   source("http://bioinf.wehi.edu.au/folders/mrobinson/aroma/PdInfo2Cdf.R")
   PdInfo2Cdf("pd.hugene.1.0.st.v1", "A1_Affy.CEL", overwrite=TRUE)

 I renamed the resulting binary cdf file and moved it to the appropriate
 aroma.affymetrix directory.

 setwd("P:\\ANNOTATION\\aromaAffymetrix")
 library(aroma.affymetrix)
 chipType <- "HuGene-1_0-st-v1"
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r4")
 print(cdf)

 print(cdf)
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/HuGene-1_0-st-v1
 Filename: HuGene-1_0-st-v1,r4.cdf
 Filesize: 51.94MB
 Chip type: HuGene-1_0-st-v1,r4
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 1050x1050
 Number of cells: 1102500
 Number of units: 253002
 Cells per unit: 4.36
 Number of QC units: 0







Re: [aroma.affymetrix] can't load CDF file

2010-03-09 Thread Mark Robinson
Hi Daniela.

Is your CDF in the:

annotationData/chipTypes/MoEx-1_0-st-v1/

directory?

(http://aroma-project.org/node/66)
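
As a rough sketch of where the file is expected, assuming the root directory is 'annotationData' in the current working directory and that the file carries no extra tags: the chip type is the name without the ".cdf" extension, and the CDF sits in a subdirectory named after that chip type. The variable names here are only for illustration.

```r
# What was passed in the post below: a file name rather than a chip type
chipTypeArg <- "MoEx-1_0-st-v1.cdf"

# The chip type itself has no file extension
chipType <- tools::file_path_sans_ext(chipTypeArg)

# Assumed layout: annotationData/chipTypes/<chipType>/<chipType>.cdf
rootPath <- "annotationData"
cdfPath <- file.path(rootPath, "chipTypes", chipType, paste0(chipType, ".cdf"))

print(chipType)
print(cdfPath)
print(file.exists(cdfPath))  # TRUE only if the file is really there
```

If file.exists() gives FALSE from your working directory, the CDF is probably one level too high (directly in annotationData/chipTypes/) or the working directory is not the one you think it is.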

Cheers,
Mark

 Hi,
 I stored my CDF file in annotationData/chipTypes; nevertheless I cannot
 upload the file.
 Can anyone please tell me what I am doing wrong:

 cdf <- AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1.cdf")

 Error in list(`AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1.cdf")` =
 <environment>,  :

 [2010-03-09 14:50:58] Exception: Could not locate a file for this chip
 type: MoEx-1_0-st-v1.cdf
   at throw(Exception(...))
   at throw.default("Could not locate a file for this chip type: ",
 paste(c(chipT
   at throw("Could not locate a file for this chip type: ",
 paste(c(chipType, tag
   at method(static, ...)
   at AffymetrixCdfFile$byChipType("MoEx-1_0-st-v1.cdf")

 Many thanks for your help!











Re: [aroma.affymetrix] custom CDF and flat file problems

2010-01-17 Thread Mark Robinson

Hi Zaid.

Bit hard to tell from the information you've given us.

So, you built a custom CDF file.  Did you check its contents?  For  
example, you could do:


cdf <- AffymetrixCdfFile()
readCdf( getPathname(cdf), units=1:2 )

... and you could check that the indices/X-Y values match what your  
inputs are.
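
One way to sanity-check the X/Y values, sketched under two assumptions: that the array has 2560 columns (as on the Human Exon 1.0 ST layout), and that the probe IDs in the flat file follow the affxparser cell-index convention, cell = y * ncol + x + 1 with zero-based x and y. The three rows are copied from the filtered flat file quoted further down.

```r
# Three rows from the filtered flat file (Probe_ID, X, Y)
flat <- data.frame(
  Probe_ID = c(5827196, 5942976, 502148),
  X        = c(635, 1215, 387),
  Y        = c(2276, 2321, 196)
)

nbrOfColumns <- 2560  # assumption: 2560 x 2560 array

# Cell index implied by (X, Y); should equal Probe_ID if the assumptions hold
flat$expectedCell <- flat$Y * nbrOfColumns + flat$X + 1
flat$ok <- flat$expectedCell == flat$Probe_ID

print(flat)
```

If any row comes out FALSE, the X/Y columns were probably shifted or fused somewhere between the original flat file and the CDF.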


Assuming that is all ok, what commands did you use to fit the probe  
level model using that CDF to a dataset?


In my experience, when you get all zeros, that generally means you  
haven't actually fit the model, via something like:


plm <- ExonRmaPlm(...)
fit(plm, verbose=verbose)

...
# pull out chip effects using extractMatrix()

So, need a few more details.

Cheers,
Mark


On 13-Jan-10, at 12:17 PM, zaid wrote:


I was trying to use a custom CDF with the aroma package in R.

Basically, I have a custom flat file that I filtered to contain
only the columns used in the example provided in this group, then ran it
through the flat2Cdf function in R.

Then I used the binary CDF file to run the analysis on the CEL files
in the aroma R package. The results I obtained were all zeros.

Here's a snippet of the original flat file and the filtered flat file:

Original Flat file:
pr_text  pr_set_text  chip_x  chip_y  interog_pos  probe_sequence             temp   chr  dna_from  dna_to  strand  junction type  SNP  entrezgene_id  gene          mrna in  spliced est in  unspliced est in  mrna out  spliced est out
5827196  2315304      635     2276    0            ggtatgctgttcgaattcataagaa  52.76  1    554527    554551  +       exon           0    100131754      LOC100131754  0        0               0                 0         0
5942976  2315304      1215    2321    0            tgtatgagttggtcgtagcggaatc  57.68  1    554555    554579  +       exon           0    100131754      LOC100131754  0        0               0                 0         0
502148   2315304      387     196     0            catataagtaatgctagggtgagtg  54.4   1    554603    554627  +       exon           0    100131754      LOC100131754  0        0               0                 0         0
5836909  2315304      108     2280    0            tgtaatgggtatggagacatatcat  52.76  1    554625    554649  +       exon           0    100131754      LOC100131754  0        0               0                 0         0
4237863  2315305      1062    1655    0            aaactcctattatttactctatcaa  47.84  1    554703    554727  +       exon           0    100131754      LOC100131754  0        0               0                 0         0
2980983  2315305      1142    1164    0            ttaaactcctattatttactctatc  47.84  1    554705    554729  +       exon           0    100131754      LOC100131754  0        0               0                 0         0
3217941  2315307      20      1257    0            agcgctgtgatgagtgtgcctgcaa  60.96  1    554923    554947  +       exon           0    100131754      LOC100131754  0        0               0                 0         0
143826   2315307      465     56      0            taatcagtgcgagcttagcgc      56.04  1    554943    554967  +       exon           0    100131754      LOC100131754  0        0               0                 0         0

end of snippet

Filtered flat file:
Probe_ID  X     Y     Probe_Sequence             Group_ID   Unit_ID
5827196   635   2276  ggtatgctgttcgaattcataagaa  100131754  LOC100131754
5942976   1215  2321  tgtatgagttggtcgtagcggaatc  100131754  LOC100131754
502148    387   196   catataagtaatgctagggtgagtg  100131754  LOC100131754
5836909   108   2280  tgtaatgggtatggagacatatcat  100131754  LOC100131754
4237863   1062  1655  aaactcctattatttactctatcaa  100131754  LOC100131754
2980983   1142  1164  ttaaactcctattatttactctatc  100131754  LOC100131754
3217941   20    1257  agcgctgtgatgagtgtgcctgcaa  100131754  LOC100131754
143826    465   56    taatcagtgcgagcttagcgc      100131754  LOC100131754
3751638   1237  1465  gcagcttctgtggaacgagggttta  100131754  LOC100131754
5640155   474   2203  cttgcgtgaggaaatacttgatggc  100131754  LOC100131754
3909899   778   1527  aatggcccatttgggcagccg      100131754  LOC100131754
4012422   901   1567  gtgaattcttcgataatggcc      100131754  LOC100131754

end of snippet

any ideas?

thanks


--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852

[aroma.affymetrix] Re: Gene-Level Summarization of Expression Data

2010-01-17 Thread Mark Robinson

Hi Randy.

From that error message, it looks like there was a mix of CDF files  
being used (my guess is 54675 corresponds to the number of Affymetrix  
probesets, whereas 30625 corresponds to the Refseq reorganization of  
probesets).  Can you post the code you ran?


Cheers,
Mark

On 16-Jan-10, at 11:41 AM, Randy Gobbel wrote:


I'm also trying to get gene-level expression values, using HG-
U133_Plus_2 data.  I downloaded the custom CDF that combines probes
into probesets that correspond to RefSeq genes, linked from the
aroma.affymetrix group page for this chip type (Hs133P_Hs_REFSEQ.cdf),
and ran the same set of commands. It works up to the point of trying
to extract expression values, then dies with:

Exception: Range of argument 'indices' is out of range [1,30625]:
[1,54675]

At this point, I'm not sure what to do next. Suggestions?  It looks
like you were the creator of the CDF--is it the right one for this?

-Randy

On Jun 19 2009, 10:08 pm, Mark Robinson mrobin...@wehi.edu.au wrote:

Hi Steve.

I don't know how common this is.  Basically, a colleague found a gene
that was very differentially expressed when analyzing with the
Affymetrix probeset definitions, and found virtually nothing when using
the custom CDF that bundles all the probes for a gene together.  The
reason was simple.  There were several probesets designed for this
gene, and presumably they measure different isoforms.  The probes for
the DE probeset showed the difference, but all the other probesets
didn't.  When you use a robust linear model like RMA, outliers get
downweighted.  Because the DE probes accounted for a small proportion
of the probes (I think there were 3 or 4 other probesets at this
locus), their effect got washed out.
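
A toy illustration of that washing-out, with made-up numbers. A plain median stands in here for the robust fit; this is not RMA itself, only the same downweighting idea. One gene has 4 probesets of 5 probes each, and only the first probeset differs between samples A and B.

```r
# One gene, 4 probesets x 5 probes, log2 intensities in two samples
probeset <- rep(paste0("ps", 1:4), each = 5)
A <- rep(8, 20)
B <- ifelse(probeset == "ps1", 10, 8)  # only ps1 is differentially expressed

# Summarize per probeset (Affymetrix-style definition)
perSet <- tapply(B, probeset, median) - tapply(A, probeset, median)

# Summarize all 20 probes together (gene-level custom CDF)
pooled <- median(B) - median(A)

print(perSet)  # ps1 shows a log2 difference of 2, the others 0
print(pooled)  # 0: the 5 DE probes are outvoted by the other 15
```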

So, it's a tradeoff.  Sometimes (perhaps most of the time) you gain by
lumping them all together ... more information, more power to detect
changes.  But, sometimes (perhaps rarely) it can mislead.  I'm sure
I'm not the only one to observe such things.  The probe-level data
(usually?) doesn't lie.  But, since you are comparing across
platforms, you will undoubtedly find this as you go along.  Different
microarray designs often measure slightly different things.

One other thing.  If your CDF is not already binary, be sure to convert
it using affxparser's convertCdf().  Having this info stored in
binary format will make the processing much faster.  I think the MBNI
custom CDFs are text.

Cheers,
Mark

On 20/06/2009, at 6:55 AM, Steve P wrote:






Mark,



Thanks for the information. That is very helpful.


I want to do the latter, which is to combine probesets such that all
probes for a given gene (by some definition -- RefSeq, Ensembl, etc.)
are used to arrive at the summarized value.


I was able to obtain a custom CDF for the U133-A array. So I will try
that approach. But part of the reason I want to do this is to be able
to compare values across platforms, so I may need to find/build a
custom CDF for the other platform.


I would appreciate any cautionary advice you have about summarizing at
the gene level.



Regards,
-Steve



On Jun 17, 9:56 am, Steve Piccolo steve.picc...@gmail.com wrote:

Yesterday I posted this question to the list, but the spam blocker
didn't
let it through. Below my question is a response from Mark Robinson.



--- 
---



Following the example provided at
http://groups.google.com/group/aroma-affymetrix/web/gene-1-0-st-array...,
I am running the following code:



chipType <- "HT_HG-U133A"
dataSet = "myData"

library(aroma.affymetrix)
verbose <- Arguments$getVerbose(-8, timestamp=TRUE)

cdf <- AffymetrixCdfFile$byChipType(chipType)
cs <- AffymetrixCelSet$byName(dataSet, cdf=cdf)

bc <- RmaBackgroundCorrection(cs)
csBC <- process(bc,verbose=verbose)
qn <- QuantileNormalization(csBC)
csN <- process(qn, verbose=verbose)

plm <- RmaPlm(csN)
fit(plm, verbose=verbose)

ces <- getChipEffectSet(plm)
gExprs <- extractDataFrame(ces, units=NULL, addNames=TRUE)



This seems to be working beautifully.


However, I'm doing an analysis that requires my expression values to
be summarized at the gene level rather than the probeset level.

In the gExprs object that results from the above analysis, I get a
data.frame object in which each row contains expression values for a
given probeset across all samples. What I would love to see in each
row is an expression value for a given gene. I believe RMA has the
ability to do this, but I'm not sure how to do it via
aroma.affymetrix.


Any suggestions? I'm happy to provide any more details that would be
helpful.



Regards,
-Steve



--- 
---



Hi Steve.



As to your question, it depends on what you need.  When you say you
want every row to be a gene, do you just want to know the gene name that
goes with the probeset identifier

Re: [aroma.affymetrix] Re: a question about FIRMA score

2010-01-10 Thread Mark Robinson

Hi Jiang.

You need to take the log2 of the residuals.  CEL files store only  
positive numbers.
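
A small illustration with made-up numbers: values stored on the (positive) intensity scale behave like ratios around 1, and only after log2 do they become centred on 0 with both signs. The vector below is hypothetical and is not read from any CEL file.

```r
# Hypothetical residuals as stored: all positive, ratio scale
storedResiduals <- c(0.5, 1, 2, 4)

# On the log2 scale, values below 1 become negative and values above 1 positive
log2Residuals <- log2(storedResiduals)
print(log2Residuals)  # -1 0 1 2
```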


This question has been answered many times.  For starters, have a look  
at:

http://www.mail-archive.com/aroma-affymetrix@googlegroups.com/msg01015.html

Cheers,
Mark

On 8-Jan-10, at 2:53 AM, camelbbs wrote:


Hi,
can anyone help?
Jiang

On Jan 5, 9:47 am, camelbbs camel...@gmail.com wrote:

HI,
I found the FIRMA scores are all > 0, so what do you mean by
positive or negative? Also, I do not quite understand this sentence: "it
seems logical to find FIRMA scores that are different b/w your 2+
groups". If there are several samples in each group, how should I deal
with that?
Thanks,
Jiang



--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--







Re: [aroma.affymetrix] FIRMA SCORE from different test

2010-01-10 Thread Mark Robinson

Hi Sabrina.

Some comments below.

On 24-Dec-09, at 2:39 AM, sabrina wrote:


Hi, everyone:
I am trying to use aroma to detect splicing events. My dataset
consists of 3 groups (genetically different), with one as control and the
other two as mutants. Each group also has control and treatment
subgroups. My interest is to compare mutant with control under normal
conditions, to compare control under two conditions (normal and
treatment), and to look at interactions of mutant and control with condition.

I did two different runs for the second comparison (control group
under two conditions): 1. using all groups and all conditions for
normalization, PLM fitting and FIRMA scoring, then lmFit with a specific
contrast matrix to find the splicing events; 2. using only the control
group, under two conditions, for normalization, PLM fitting, FIRMA
scoring and lmFit to find the splicing events.


I'm not sure whether such a comparison will lead you to anything
meaningful.  It seems like the best approach would be to use all the
data together, since that should give the best estimates of the probe
effects.  Your #1 and #2 don't seem comparable to me.



I compared the results from these two runs, they were quite different
and B values from the topTable were very different.  I wonder what is
the right choice to fit my objective. Thanks


If you did this with gene expression data in limma, you'd probably  
find the same thing -- process the data in different ways and you'll  
get different B values.


Cheers,
Mark





Sabrina



--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--







Re: [aroma.affymetrix] Question for custom CDF of ST-Array

2010-01-10 Thread Mark Robinson

Hi Ranger.

See comments below.


On 8-Jan-10, at 8:03 AM, rangerq wrote:


Hi,

I use aroma.affymetrix to process Human Gene 1.0 ST arrays. At the step
that uses a custom CDF, could I use the regular HuGene ST CDF from
Affymetrix instead of the one provided here?

What is the difference between the unsupported CDF and the regular CDF?
Could you explain what 'unsupported' means? Which one can I
trust to annotate my data?


The only difference between the CDF that Affy provides and the one
that you can download from the aroma.affymetrix site is that ours has
been converted to binary, allowing it to be read a LOT faster.  The
content is identical.  As far as the "unsupported" business goes, it
just means that Affymetrix doesn't support it.  They make their
annotation available in a different format nowadays.  However, that
annotation is the same (or at least it was when I last checked it):


http://www.mail-archive.com/aroma-affymetrix@googlegroups.com/msg00281.html



If I want to make my own cdf, are there some instructions?


Does this help?
http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch

Cheers,
Mark


Thanks,

ranger QI


--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--







Re: [aroma.affymetrix] questiones about annotations for exon arrays: no gene symbol or refseq id for majority of core probe sets?

2010-01-10 Thread Mark Robinson

Hi Yupu.

I haven't explored this in any detail, but on a cursory inspection
(below), it appears that biomaRt has 244,801 probesets represented in
its database (which seems about right).


> bm <- getBM(attributes=c("affy_huex_1_0_st_v2","hgnc_symbol","chromosome_name","band"), mart=mart)
> dim(bm)
[1] 324334      4
> head(bm)
  affy_huex_1_0_st_v2 hgnc_symbol chromosome_name   band
1             3581777       IGHA2              14 q32.33
2             3581646       IGHA2              14 q32.33
3             3581642       IGHA2              14 q32.33
4             3581781       IGHA2              14 q32.33
5             3581783       IGHA2              14 q32.33
6             3581788       IGHA2              14 q32.33
> length(unique(bm$affy_huex_1_0_st_v2))
[1] 244801

Strictly speaking, this isn't an aroma.affymetrix question.  What I  
suggest you try is exploring what identifiers are not represented in  
the database and whether something is missing from biomaRt (or the  
Ensembl web page).
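
A sketch of that kind of check, on made-up identifiers. Here 'groupnames' and 'ann' are small hypothetical stand-ins for the objects in the post below, not real query results.

```r
# Hypothetical IDs passed to getBM() as 'values'
groupnames <- c("2315251", "2315373", "2315554", "2315633")

# Hypothetical result of the getBM() call
ann <- data.frame(
  affy_huex_1_0_st_v2 = c("2315373", "2315373", "2315633"),
  hgnc_symbol         = c("GENE1", "GENE1B", ""),
  stringsAsFactors    = FALSE
)

# IDs that the database did not return at all
missingIds <- setdiff(groupnames, ann$affy_huex_1_0_st_v2)

# IDs that were returned, but with an empty gene symbol
noSymbol <- unique(ann$affy_huex_1_0_st_v2[ann$hgnc_symbol == ""])

print(missingIds)
print(noSymbol)
```

Looking up a handful of the IDs in 'missingIds' by hand (e.g. on the Ensembl web page) should show whether they are genuinely unannotated or whether the wrong kind of identifier is being queried.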


Of course, you can also get annotation from other sources (e.g. from  
Affymetrix).


Hope that helps,
Mark


On 7-Jan-10, at 7:30 AM, yupu wrote:


Hi,

I am new to exon array analysis. I managed to follow the instructions
from 
http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis
to get the estimation of transcript:

trFit <- extractDataFrame(cesTr, units=1:3, addNames=TRUE)

Then I followed the following thread's idea of using biomaRt to get
the annotation information through the group id of trFit object:
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/1f4af7fca4352022/a3fe6980ffa7b925?lnk=gstq=questions+about+annotations#a3fe6980ffa7b925

groupnames = trFit[,2]

ann <- getBM(attributes = c("affy_huex_1_0_st_v2", "hgnc_symbol"),
filters = "affy_huex_1_0_st_v2", values = groupnames, mart = ensembl)

What surprised me is that a majority of these group ids don't have any gene
symbol or refseq id associated with them (even though I was using the core
probesets upstream)


length(groupnames)

[1] 18708


dim(ann)

[1] 7835    2

I am not sure if this is expected or I am doing something wrong here.

Thanks,
Yupu




--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--







Re: [aroma.affymetrix] Re: FIRMA score for each transcript

2009-12-16 Thread Mark Robinson
Hi Libing.

I'm afraid that aroma.affymetrix will not work on TXT files.  I  
suggest you check out the following functions in the 'affxparser'  
package:

?readCel (just so you know what is stored)

?createCel (to create the file)
?updateCel (to store intensities in the file)

Once you figure out the inputs for those functions, it should be pretty
straightforward to take your simulated data and push it into a CEL file.
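
As a rough illustration of the preparation step, here is a minimal sketch in base R that turns a text table of simulated intensities into the cell indices and intensity vector that updateCel() takes through its 'indices' and 'intensities' arguments. The column names (x, y, intensity) and the chip width are assumptions made for the example, not something your TXT file is known to contain.

```r
# Sketch only: the TXT layout (columns x, y, intensity) and the chip
# width 'ncols' are hypothetical and must be adapted to the real data.
txt <- "x\ty\tintensity
0\t0\t120.5
1\t0\t98.2
0\t1\t310.0
2\t1\t45.7"
sim <- read.delim(text = txt)

ncols <- 3L  # hypothetical number of cell columns on the array

# CEL files address cells by a one-based index that runs along the rows:
# index = y * ncols + x + 1, with zero-based x and y.
sim$cell <- sim$y * ncols + sim$x + 1L
sim <- sim[order(sim$cell), ]
print(sim)

# With a CEL file already created by createCel(), the values could then
# be stored along these lines (not run here):
#   affxparser::updateCel(pathname, indices = sim$cell,
#                         intensities = sim$intensity)
```

The header passed to createCel() has to describe the same chip type and dimensions as the CDF you intend to use; a header read from an existing CEL file of that chip type with readCelHeader() is one possible starting point.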

Hope that helps.

Cheers,
Mark

On 16-Dec-09, at 5:14 AM, Libing Wang wrote:

 Hi Mark,

 Thanks a lot for your help!
 Now I want to use aroma on some simulated data to calculate
 summarized intensities of probesets. The problem is that I only have
 a txt file with the original probe signal intensities, but aroma can
 only be fed CEL files. Is it possible to let aroma work with txt files?
 If not, is there any way to construct CEL files from txt files?
 Thanks!   /Libing

 On Fri, Nov 6, 2009 at 6:35 AM, Mark Robinson  
 mrobin...@wehi.edu.au wrote:
 Hi Libing.

 Are you after the probe IDs from the probe.tab file?  For example:

 Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start   stop    strand  probe sequence             target strandedness  category
 4485910   2315252       789      1752     build-34/hg16  chr1     407616  407640  +       GTAATGCTTGCCACATAGAGCACAG  Sense                main
 2412400   2315252       879      942      build-34/hg16  chr1     408027  408051  +       AAGCTGTCCAACACATTAGGGCCAC  Sense                main
 4260180   2315252       339      1664     build-34/hg16  chr1     408088  408112  +       GAACTGCAATCTGTAGGTGTCGGTA  Sense                main
 5750312   2315252       551      2246     build-34/hg16  chr1     408300  408324  +       TCCATCTGTGAATTAGGGTGTGGCC  Sense                main
 2959753   2315253       392      1156     build-34/hg16  chr1     408431  408455  +       AGATCCTCTTGTAAATCACTAGCTG  Sense                main
 294823    2315253       422      115      build-34/hg16  chr1     408433  408457  +       TGAGATCCTCTTGTAAATCACTAGC  Sense                main
 5504333   2315253       332      2150     build-34/hg16  chr1     408434  408458  +       ATGAGATCCTCTTGTAAATCACTAG  Sense                main
 1224013   2315253       332      478      build-34/hg16  chr1     408436  408460  +       TTATGAGATCCTCTTGTAAATCACT  Sense                main


 If so, you could make a lookup table from that and match them to the
 info in your CDF file.  For example:

   cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2",
                                       tag="coreR3,A20071112,EP")
   u <- readUnits(cdf, units=1, readBases=FALSE, readExpos=FALSE,
                  readType=FALSE, readDirection=FALSE)
   u
 $`2315251`
 $`2315251`$groups
 $`2315251`$groups$`2315252`
 $`2315251`$groups$`2315252`$x
 [1] 789 339 879 551

 $`2315251`$groups$`2315252`$y
 [1] 1752 1664  942 2246


 $`2315251`$groups$`2315253`
 $`2315251`$groups$`2315253`$x
 [1] 332 422 392 332

 $`2315251`$groups$`2315253`$y
 [1] 2150  115 1156  478

 ... so if you read your BG adjusted intensities into a matrix, you
 could annotate each row with the probe ID.

 Is that what you had in mind?  If so, hope that gets you started.
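
The lookup idea can be sketched in base R as follows. The numbers are the first four probe.tab rows and the (x, y) coordinates of group 2315252 quoted in this message; the object names (probeTab, grp, lookup) are made up for the illustration, and a real script would read the full probe.tab file and loop over all groups.

```r
# Probe annotation for probeset 2315252, copied from the probe.tab
# excerpt in this message.
probeTab <- data.frame(
  probeId    = c(4485910, 2412400, 4260180, 5750312),
  probesetId = c(2315252, 2315252, 2315252, 2315252),
  x = c(789, 879, 339, 551),
  y = c(1752, 942, 1664, 2246)
)

# (x, y) coordinates of group 2315252 in the order the CDF lists them.
grp <- list(x = c(789, 339, 879, 551), y = c(1752, 1664, 942, 2246))

# Build a lookup keyed on "x,y" and use it to label the CDF cells.
key <- paste(probeTab$x, probeTab$y, sep = ",")
lookup <- setNames(probeTab$probeId, key)
probeIds <- unname(lookup[paste(grp$x, grp$y, sep = ",")])
print(probeIds)
```

The resulting vector is in CDF order, so it can be used directly as row labels for intensities extracted in that same order.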

 Cheers,
 Mark



 On 3-Nov-09, at 6:50 AM, Libing Wang wrote:

  Hi Mark,
 
  Thanks for your help so far! Now I have a quick question for you. Is
  there any way to get the probe IDs for background-corrected probe
  intensities, once I have finished the following steps?
 
  bc <- RmaBackgroundCorrection(cs, tag="core,A20071112,EP")
  csBC <- process(bc, verbose=verbose)
 
  Thanks!
 
  Libing
 
  On Wed, Jun 17, 2009 at 6:18 PM, Mark Robinson
  mrobin...@wehi.edu.au wrote:
 
 
  Hi Libing.
 
  Doesn't 'addNames=TRUE' already do this for you?
 
 
  fs1 <- extractDataFrame(fs, units=1:2, addNames=TRUE)
  head(fs1[,1:6])
    unitName groupName unit group cell huex_wta_breast_A
  1  2315251   2315252    1     1    1         1.1150999
  2  2315251   2315253    1     2    2         0.9551846
  3  2315373   2315374    2     1    3         1.5354252
  4  2315373   2315375    2     2    4         0.6288152
  5  2315373   2315376    2     3    5         1.5658265
  6  2315373   2315377    2     4    6         1.2131032
  fs2 <- extractDataFrame(fs, units=1:2, addNames=FALSE)
  head(fs2[,1:6])
    unit group cell huex_wta_breast_A huex_wta_breast_B huex_wta_breast_C
  1    1     1    1         1.1150999         0.8552212         0.9177643
  2    1     2    2         0.9551846         1.1747438         0.8580346
  3    2     1    3         1.5354252         1.0427089         1.6461661
  4    2     2    4         0.6288152         0.7053325         0.6999596
  5    2     3    5         1.5658265         1.0576524         1.1404822
  6    2     4    6         1.2131032         1.0494679         0.7729633
 
  If not, please send your entire script and the output of
  sessionInfo().
 
  Cheers,
  Mark
 
 
  On 18/06/2009, at 1:02 AM, Libing Wang wrote:
 
   Hi Mark,
  
   I am wondering if it is possible to get the actual unit
   id(transcript cluster id) and group id(probeset id) for each firma
   score instead of artificial number from 1 to whatever in the firma
   score data frame.
  
   Thanks

Re: [aroma.affymetrix] probe_id in cdf file

2009-12-16 Thread Mark Robinson
Hi Renyi.

No, the 'Probe_ID' column from that example is not used anywhere.
Really, it's just the 'X', 'Y' and then the 2 columns for group and
unit that are used from the input TXT file.

And yes, X/Y are analogous to pmx/pmy from the bpmap file.
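
For anyone who still wants a meaningful number in that column, the (y*DIM+x+1) convention mentioned in the question can be sketched like this. The helper names cellIndex() and cellXY() are invented for the example, and the chip width of 2560 columns is an assumption used only for illustration.

```r
# Convention from the question: probe_id = y * DIM + x + 1, where DIM is
# the number of cell columns and x, y are zero-based coordinates.
cellIndex <- function(x, y, ncols) y * ncols + x + 1

# Inverse mapping, handy for sanity-checking a flat file.
cellXY <- function(id, ncols) {
  list(x = (id - 1) %% ncols, y = (id - 1) %/% ncols)
}

ncols <- 2560                  # hypothetical chip width
id <- cellIndex(x = 789, y = 1752, ncols = ncols)
print(id)
print(cellXY(id, ncols))
```

A sequential number works just as well for building the CDF, since the column is not read; the formula only has the advantage that the ID can be converted back to a position.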

Cheers,
Mark

On 16-Dec-09, at 5:27 AM, Renyi Liu wrote:

 Hi, Mark and Aroma.affymetrix fans,

 I have a quick question: when we create a custom CDF file according to
 the following page, does it matter what number we use for the probe_id
 field? I have seen somebody used (y*DIM+x+1). If it does not matter
 too much, I would like to just use a sequential number. Also I assume
 that x and y refer to pmx and pmy in the bpmap file. Is this
 assumption correct?

 http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch

 Many thanks for your help.

 Renyi



[aroma.affymetrix] Re: annotation of ST gene Arrays

2009-12-16 Thread Mark Robinson
Hi Wade.

I think the problem lies with the 'ragene10st*probeset*.db' library.

How about trying the symbols from the 'ragene10sttranscriptcluster.db'  
package:
http://www.bioconductor.org/packages/release/data/annotation/html/ragene10sttranscriptcluster.db.html

I can't remember when this change was made, but my 'hugene10st.db'  
example is now outdated.  You should use hugene10stprobeset.db for  
probesets or hugene10sttranscriptcluster.db for transcript clusters.

Hope that helps.

Cheers,
Mark

On 17-Dec-09, at 10:09 AM, Wade D wrote:

 Hi Mark and others,
 I am in a somewhat similar situation to the original person who started
 this discussion, so I am tacking on my question to your response from
 February.

 This is my first ST analysis, and I am using the Rat gene 1.0 ST. I
 followed the example given at
 
 http://groups.google.com/group/aroma-affymetrix/web/gene-1-0-st-array-analysis
 and everything has worked fine so far.

 Now, I would like to annotate my gene-level summaries. I tried using
 methods I typically do (from the annotate package) with
 ragene10stprobeset.db, but things didn't seem right. So I figured it
 was me, and I came back to the group help pages and found your post.
 Mimicking it below, it seems that I've either done something wrong, or
 there is a problem with ragene10stprobeset.db.

 library(ragene10stprobeset.db)
 symbols <- unlist(as.list(ragene10stprobesetSYMBOL))
 myids <- gExprs[,1]
 head(myids)
 [1] 1071 1073 1074 1075 10700013 10700014

 temp <- data.frame(affyid = myids, symbol = symbols[myids])
 #temp[!is.na(temp$symbol),]

 sum(!is.na(temp$symbol))
 [1] 237

 This is a disturbingly low number, so I figure something is amiss.
 Following your lead, I compare the CDF with what is on Affy's website
 in the transcript and probeset files...

 tr <- read.csv("RaGene-1_0-st-v1.na30.1.rn4.transcript.csv",
                header=TRUE, comment.char="#")
 ps <- read.csv("RaGene-1_0-st-v1.na30.rn4.probeset.csv",
                header=TRUE, comment.char="#")
 #chipType <- "RaGene-1_0-st-v1"
 #cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")
 un - getUnitNames(cdf)
 sum( un %in% ps$transcript_cluster_id )
 [1] 27342

 sum( un %in% tr$transcript_cluster_id )
 [1] 29169

 Everything looks reasonable here.

 sum(names(symbols) %in% ps$transcript_cluster_id )
 [1] 0
 sum(names(symbols) %in% tr$transcript_cluster_id )
 [1] 1872

 This is the problem it seems.
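
Whichever annotation package ends up being used, the same kind of overlap check can be done on a toy example, which also shows one pitfall worth ruling out: indexing a named vector with numeric IDs selects by position rather than by name. The IDs and gene names below are invented, and this is only a sketch of the check, not a diagnosis of the result above.

```r
# Toy annotation vector: names play the role of transcript cluster IDs.
symbols <- c("10700001" = "GeneA", "10700003" = "GeneB", "10700013" = "GeneC")

# IDs as they might come out of a numeric column of a results table.
myids <- c(10700001, 10700003, 10700099)

# How many of my IDs does the annotation know about at all?
overlap <- sum(as.character(myids) %in% names(symbols))

# Pitfall: a numeric index selects by position, not by name.
byPosition <- symbols[myids]                 # all NA here
byName     <- symbols[as.character(myids)]   # looks the IDs up by name

print(overlap)
print(unname(byName))
```

If the overlap is close to zero even after converting to character, the IDs and the annotation are most likely on different levels (probeset versus transcript cluster).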

 I wanted to ask others before I build my own annotation.db for
 ragene10st. I've done it for Illumina arrays before, but it has been
 awhile, and it is a little bit of a pain for Windows users to do. Just
 wanted to get a second opinion before I go down that road, especially
 since this is my first time dealing with ST arrays.

 Thanks,
 Wade




 On Feb 10, 3:13 am, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Simon.

 See comments below.

 I am using the mouse gene ST arrays and am having problems with
 annotation. When I write a csv file, the annotation is only the
 probeset_id, no gene names or accession numbers etc.

 That's what it should be.  Actually, it's the 'transcript_cluster_id'.
 Previously, Affy did not provide annotation at the probeset level.

 The CDF file just contains the identifiers.  Linking results (e.g.
 expression summaries) to the annotation can be done with other R
 packages.  For example, here is some code I gave Sebastien a few weeks
 ago that will get you started (just replace hugene10st.db with
 mogene10st.db):

 -
 Say you have some Affy identifiers:

   myids
 [1] 7950136 7955845 7955852 7955855 7955858 7955865
 7955869
 [8] 7955873 7955887 8016433

 Load package and read off the gene symbols:

   library(hugene10st.db)
   symbols <- unlist(as.list(hugene10stSYMBOL))
   data.frame(affyid = myids,symbol = symbols[myids])
  affyid symbol
 7950136 7950136 PHOX2A
 7955845 7955845 HOXC13
 7955852 7955852 HOXC12
 7955855 7955855 HOXC11
 7955858 7955858 HOXC10
 7955865 7955865  HOXC9
 7955869 7955869  HOXC8
 7955873 7955873  HOXC6
 7955887 7955887  HOXC5
 8016433 8016433  HOXB1

 Here are some other fields in hugene10st.db:

   hugene10st
 hugene10st              hugene10st.db::         hugene10stACCNUM
 hugene10stALIAS2PROBE   hugene10stCHR           hugene10stCHRLENGTHS
 hugene10stCHRLOC        hugene10stCHRLOCEND     hugene10stENSEMBL
 hugene10stENSEMBL2PROBE hugene10stENTREZID      hugene10stENZYME
 hugene10stENZYME2PROBE  hugene10stGENENAME      hugene10stGO
 hugene10stGO2ALLPROBES  hugene10stGO2PROBE      hugene10stMAP
 hugene10stMAPCOUNTS     hugene10stOMIM          hugene10stORGANISM
 hugene10stPATH          hugene10stPATH2PROBE    hugene10stPFAM
 hugene10stPMID          hugene10stPMID2PROBE    hugene10stPROSITE
 hugene10stREFSEQ        hugene10stSYMBOL        hugene10stUNIGENE
 hugene10stUNIPROT       hugene10st_dbconn       hugene10st_dbfile
 hugene10st_dbInfo       hugene10st_dbschema

 ...
 -

 These probesets
 also do not match the probeset_ids from MoGene-1_0-st-v1.na27.mm9

Re: [aroma.affymetrix] Re: custom CDF

2009-12-11 Thread Mark Robinson
Hi Zaid.

There is an example flat file (well, 10 lines of it) at that page that
Pierre suggested.  You'll want to filter the missing lines and make sure
all the data in a column is of the same type.
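
A minimal sketch of that clean-up in base R might look like the following. The column names Group_ID and Unit_ID are hypothetical (only Probe_ID, X and Y are named in the group's example), and the four data rows are invented, so adjust both to the layout of the real flat file.

```r
# Invented flat file: row 2 has a missing Y, row 3 a non-integer X.
txt <- "Probe_ID\tX\tY\tGroup_ID\tUnit_ID
1\t10\t20\tg1\tu1
2\t11\t\tg1\tu1
3\t12.5\t22\tg2\tu1
4\t13\t23\tg2\tu2"
flat <- read.delim(text = txt, na.strings = c("", "NA"),
                   stringsAsFactors = FALSE)

# Drop every line that has a missing value in any column.
flat <- flat[complete.cases(flat), ]

# Keep only lines whose coordinates are whole numbers, then store the
# coordinates as integers so the column has a single type.
isWhole <- flat$X == round(flat$X) & flat$Y == round(flat$Y)
flat <- flat[isWhole, ]
flat$X <- as.integer(flat$X)
flat$Y <- as.integer(flat$Y)
str(flat)
```

The cleaned table can then be written back out with write.table(..., sep="\t", quote=FALSE, row.names=FALSE) before it is handed to the CDF-building script.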

Cheers,
Mark

 No.

 My flat file has some lines with missing values and non integer
 values.
 Is there an example flat file that I could try out? Or some database
 that I could download them from?

 Thanks for your help

 On Dec 9, 10:08 pm, Pierre Neuvial pie...@stat.berkeley.edu wrote:
 Hi Zaid,

 Does your file satisfy the requirements detailed on the corresponding
 help page ?

 http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file...

 Pierre



 On Wed, Dec 9, 2009 at 4:18 PM, zaid z...@genomedx.com wrote:
  Hello,

  I'm trying to create a custom CDF file from a flat file using the R
  script provided in this group (flat2Cdf()).

  I'm running into errors such as incorrect number of columns, integer
  not found etc.

  Is there a standard flat file structure required? Or are there any
  flat files available for download?
  I just want to try the script and have a standard structure to work
  with.

  Thanks for the help.

  Z


Re: [aroma.affymetrix] Problem generating CDF file (Arabidopsis)

2009-12-06 Thread Mark Robinson
Hi Renyi.

The reason we require startpos > 0 is that many of the BPMAP files
we've looked at have control antigenomic probes in them and these all
have startpos=0, so this was an easy way to filter them out.  I think
you are right though that startpos could start at 0, so your
workaround of setting it to 1 should be fine.

Of course, the bpmapCluster2Cdf() script is not meant to be a cure-all  
for everyone's needs.  The source code is of course available to you  
should you wish to do something different.
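
To make the check and the workaround concrete, here is a small sketch on a mock object. The list 'bp' only imitates the two fields used in this thread (seqInfo and startpos) and its values are invented; with a real file the object would come from readBpmap(), and the shifted positions would still have to be written to a new BPMAP file.

```r
# Mock of the per-sequence list returned by a BPMAP reader; only the
# fields used here are included and the positions are invented.
bp <- list(
  chr4 = list(seqInfo = list(fullname = "At:TAIR9;chr4"),
              startpos = c(12L, 48L, 95L)),
  chr5 = list(seqInfo = list(fullname = "At:TAIR9;chr5"),
              startpos = c(0L, 37L, 81L))
)

# Count the probes at or below position 0 in each sequence.
nZero <- sapply(bp, function(u) sum(u$startpos <= 0))
print(nZero)

# Workaround discussed in this thread: move such probes to position 1.
bpFixed <- lapply(bp, function(u) {
  u$startpos[u$startpos <= 0] <- 1L
  u
})
print(sapply(bpFixed, function(u) min(u$startpos)))
```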

Cheers,
Mark

On 6-Dec-09, at 10:32 AM, Renyi Liu wrote:

 Hi, Mark,

 Thanks for your quick reply and your suggestion. You guessed right:
 the first startpos is 0 (the probe is obviously mapped right at the
 beginning of the chromosome). I will change that number to 1 for now,
 but as I understand it, startpos in a bpmap file does start at 0 (not
 1). Why does your script not allow it?

 Thanks,

 Renyi

 On Sat, Dec 5, 2009 at 2:43 PM, Mark Robinson  
 mrobin...@wehi.edu.au wrote:
 Hi Renyi.

 One thing to do is check that the genome positions in your BPMAP file
 for chr5 are all > 0.

 To do this, try:

 library(affxparser)
 bp <- readBpmap("At35b_MR_v04-2_TAIR9_unique.bpmap")

 z <- lapply(bp, FUN=function(u) {
   print(u$seqInfo[c("groupname","fullname")]);
   cat(sum(u$startpos<=0),"\n---\n")
 })

 If you see a non-zero number next to chr5, then that is the problem
 and you'll have to remove those when you create the custom BPMAP.
 Otherwise, post the output of that command and we'll have to
 investigate further.

 Cheers,
 Mark

 On 6-Dec-09, at 8:55 AM, Renyi Liu wrote:

 Hi, All,

 I am trying to generate a CDF file for the Arabidopsis Tiling array 1.0R
 from a custom bpmap file that I created (it contains only probes that
 map to a single location in the TAIR9 genome, no control probesets). I
 used the bpmapCluster2Cdf script with the following command:

 bpmapCluster2Cdf("At35b_MR_v04-2_TAIR9_unique.bpmap",
 "At35b_MR_v04", rows=2560, cols=2560, groupName="At", verbose=-20)

 It works well for all chromosomes except chr5 because there is a
 message saying "Skipping all 657459 probes for At:TAIR9;chr5". I
 certainly do not want to skip a whole chromosome. Can you please tell
 me what is going on and how I can correct it?

 Thanks,

 Renyi

 --
 
 Renyi Liu, PhD
 Assistant Professor
 Department of Botany and Plant Sciences
 3109 Batchelor Hall
 University of California, Riverside
 Riverside, CA 92521
 Email: renyi@ucr.edu
 Phone: (951)827-3987
 Fax: (951)827-4437


[aroma.affymetrix] Re: more question on tiling array

2009-12-06 Thread Mark Robinson
Hi Renyi.

Even though your questions aren't aroma.affymetrix-specific, I'll send  
my responses to the aroma.affymetrix list in case others have the same  
questions.

Comments below.

On 6-Dec-09, at 6:46 PM, Renyi Liu wrote:

 Hi, Mark,

 Thanks again for helping me with my previous question on CDF file. I
 have read your article on Promoter 1.0R Tiling array analysis and
 would like to apply MAT normalization to my dataset. Again, my dataset
 is from Arabidopsis and I need to create my own CDF file. For MAT
 analysis, we need to put copy number of probes into the bpmap file
 (and the CDF file). If you got a chance, I'd appreciate your help on a
 couple of more questions:

You may be able to use the BPMAP file for Arabidopsis from the MAT  
website:

http://liulab.dfci.harvard.edu/MAT/Download.htm

... that would already have copy number in there.

 (1) should I filter out probes with copy number > 10? (their xMAN paper
 states that they remove probes with copy number > 10)

Yes, that's what they do for the BPMAP files at the MAT website.

 (2) for probes that map to multiple locations, how many entries need
 to included? (i.e. should I just randomly choose one of the mapping
 locations and put in startpos?). If multiple locations are included,
 the total number of probes will be much larger than the original
 affymetrix bpmap file.

You would want to keep all of them I guess, except for the probes with
more than 10 hits that you've filtered.  So, yes, you would end up with
more probes than what you started with, but I'd expect that it wouldn't
be *much* larger, since (at least for human) only a small percentage of
probes map to multiple locations.
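
As a sketch of that bookkeeping, assuming the probe alignments sit in a table with one row per (probe, genomic hit), the filter could look like this in base R. The table 'hits' and its columns are invented for the example; this is not how the MAT BPMAP files were built.

```r
# Invented alignment table: p1 maps once, p2 twice, p3 eleven times.
hits <- data.frame(
  probe    = c("p1", "p2", "p2", rep("p3", 11)),
  startpos = c(100L, 250L, 4000L, seq(10L, by = 500L, length.out = 11)),
  stringsAsFactors = FALSE
)

# Copy number = number of hits per probe, attached to every row.
copyNumber <- table(hits$probe)
hits$copyNumber <- as.integer(copyNumber[hits$probe])

# Drop probes with more than 10 hits, keep every location of the rest.
maxCopies <- 10L
filtered <- hits[hits$copyNumber <= maxCopies, ]
print(filtered)
```

Here p2 stays in the table twice, once per location, each row carrying copy number 2, while all eleven rows of p3 are dropped.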

 (3) for MeDIP-chip data, which normalization method is the best choice
 in your opinion?

We typically use MAT for normalization, but you might be interested in
this comparison:

http://www.biomedcentral.com/1471-2105/10/204

Hope that helps.

Cheers,
Mark


 Many thanks,

 Renyi

 -- 
 
 Renyi Liu, PhD
 Assistant Professor
 Department of Botany and Plant Sciences
 3109 Batchelor Hall
 University of California, Riverside
 Riverside, CA 92521
 Email: renyi@ucr.edu
 Phone: (951)827-3987
 Fax: (951)827-4437


Re: [aroma.affymetrix] Discussion on gene-1-0-st-array-analysis

2009-12-04 Thread Mark Robinson
Hi Germán.

Try this link:
http://www.biomedcentral.com/1471-2105/10/156

Cheers,
Mark

On 5-Dec-09, at 12:33 AM, Germán González wrote:

 There has been some recent work that suggests you can use the Gene
 arrays to do splicing analysis.

 Can you give me sites or any references about that?

 thanks, german



Re: [aroma.affymetrix] Re: exon array analysis errors

2009-12-04 Thread Mark Robinson
Hi Yu Chuan.

As I mentioned before, I was unable to reproduce your error from a
test dataset on my system.  It's hard to know the history of your
environment, so what I suggest you do is start from a brand new
session, and run the commands start to finish.  There are a couple of
ways to do this.  By brute force, you could delete the relevant
plmData/ and probeData/ directories and when you run the commands,
aroma.affymetrix will just recreate them.  Alternatively (and more
elegantly?), you can add a 'force=TRUE' argument to all the process()
and fit() commands that you use.
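
For the brute-force route, the deletion can be done from within R. The sketch below works on a throwaway folder under tempdir(); the project root and the data set folder name are hypothetical, and only the plmData/ and probeData/ names come from the description above.

```r
# Set up a throwaway project with the two cache directories. The root
# and the data set folder name are invented for this illustration.
root <- file.path(tempdir(), "aromaProject")
for (d in c("plmData", "probeData")) {
  dir.create(file.path(root, d, "myDataSet,RBC,QN"), recursive = TRUE)
}

# Remove the cached results so that the next run recomputes them.
cached <- file.path(root, c("plmData", "probeData"))
unlink(cached, recursive = TRUE)
print(dir.exists(cached))
```

In a real analysis 'root' would be the working directory of the aroma.affymetrix session, and it is worth deleting only the subfolders of the affected data set rather than the whole directory.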

Hope that helps.

Cheers,
Mark

On 2-Dec-09, at 11:59 AM, Yu Chuan wrote:

 Mark,

 I pulled out some chip effects; the NUSE values are still all 0s.

 cesTr <- getChipEffectSet(plmTr)

 trFit <- extractDataFrame(cesTr, units=1:3, addNames=TRUE)
 dim(trFit)
 [1]  3 13
 trFit
   unitName groupName unit group cell 20091119_Colon4_Exon2
 1  2315251   2315252    1     1    1              21.13534
 2  2315373   2315374    2     1    3              21.74671
 3  2315554   2315586    3     1    7              22.94160
   20091119_Colon4_Exon3 20091119_UBR_Exon1 20091119_UBR_Exon2
 1              22.23353           21.05928           21.57542
 2              21.51784           21.39991           22.58863
 3              22.78790           22.42114           23.21064
   20091119_UBR_Exon3 20091119_UHR_Exon1 20091119_UHR_Exon2
 1           21.29856           22.37293           22.17496
 2           21.72278           22.74162           22.76515
 3           22.39577           23.52349           23.74061
   20091119_UHR_Exon3
 1           21.97057
 2           22.23168
 3           22.97983

 qamTr <- QualityAssessmentModel(plmTr)

 z <- plotNuse(qamTr)
 z
 $`20091119_Colon4_Exon2`
 $`20091119_Colon4_Exon2`$stats
 [1] 0 0 0 0 0

 $`20091119_Colon4_Exon2`$n
 [1] 18705

 $`20091119_Colon4_Exon2`$conf
 [1] 0 0

 $`20091119_Colon4_Exon2`$out
 numeric(0)


 $`20091119_Colon4_Exon3`
 $`20091119_Colon4_Exon3`$stats
 [1] 0 0 0 0 0

 $`20091119_Colon4_Exon3`$n
 [1] 18705

 $`20091119_Colon4_Exon3`$conf
 [1] 0 0

 $`20091119_Colon4_Exon3`$out
 numeric(0)


 $`20091119_UBR_Exon1`
 $`20091119_UBR_Exon1`$stats
 [1] 0 0 0 0 0

 $`20091119_UBR_Exon1`$n
 [1] 18705

 $`20091119_UBR_Exon1`$conf
 [1] 0 0

 $`20091119_UBR_Exon1`$out
 numeric(0)


 $`20091119_UBR_Exon2`
 $`20091119_UBR_Exon2`$stats
 [1] 0 0 0 0 0

 $`20091119_UBR_Exon2`$n
 [1] 18705

 $`20091119_UBR_Exon2`$conf
 [1] 0 0

 $`20091119_UBR_Exon2`$out
 numeric(0)


 $`20091119_UBR_Exon3`
 $`20091119_UBR_Exon3`$stats
 [1] 0 0 0 0 0

 $`20091119_UBR_Exon3`$n
 [1] 18705

 $`20091119_UBR_Exon3`$conf
 [1] 0 0

 $`20091119_UBR_Exon3`$out
 numeric(0)


 $`20091119_UHR_Exon1`
 $`20091119_UHR_Exon1`$stats
 [1] 0 0 0 0 0

 $`20091119_UHR_Exon1`$n
 [1] 18705

 $`20091119_UHR_Exon1`$conf
 [1] 0 0

 $`20091119_UHR_Exon1`$out
 numeric(0)


 $`20091119_UHR_Exon2`
 $`20091119_UHR_Exon2`$stats
 [1] 0 0 0 0 0

 $`20091119_UHR_Exon2`$n
 [1] 18705

 $`20091119_UHR_Exon2`$conf
 [1] 0 0

 $`20091119_UHR_Exon2`$out
 numeric(0)


 $`20091119_UHR_Exon3`
 $`20091119_UHR_Exon3`$stats
 [1] 0 0 0 0 0

 $`20091119_UHR_Exon3`$n
 [1] 18705

 $`20091119_UHR_Exon3`$conf
 [1] 0 0

 $`20091119_UHR_Exon3`$out
 numeric(0)


 attr(,"type")
 [1] "NUSE"


 On Nov 30, 1:28 am, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Yu Chuan.

 I'm still mystified by this.  Can you check that the PLM was
 successfully fit?  Pull out some chip effects, maybe.

 Cheers,
 Mark

 On 25-Nov-09, at 4:51 AM, Yu Chuan wrote:



 Mark,

 I think the below info. may help too. Looks like all the gene-level
 NUSE are 0. How could this happen?

 z <- plotNuse(qamTr)
 z
 $`20091119_Colon4_Exon2`
 $`20091119_Colon4_Exon2`$stats
 [1] 0 0 0 0 0

 $`20091119_Colon4_Exon2`$n
 [1] 18705

 $`20091119_Colon4_Exon2`$conf
 [1] 0 0

 $`20091119_Colon4_Exon2`$out
 numeric(0)

 $`20091119_Colon4_Exon3`
 $`20091119_Colon4_Exon3`$stats
 [1] 0 0 0 0 0

 $`20091119_Colon4_Exon3`$n
 [1] 18705

 $`20091119_Colon4_Exon3`$conf
 [1] 0 0

 $`20091119_Colon4_Exon3`$out
 numeric(0)

 $`20091119_UBR_Exon1`
 $`20091119_UBR_Exon1`$stats
 [1] 0 0 0 0 0

 $`20091119_UBR_Exon1`$n
 [1] 18705

 $`20091119_UBR_Exon1`$conf
 [1] 0 0

 $`20091119_UBR_Exon1`$out
 numeric(0)

 $`20091119_UBR_Exon2`
 $`20091119_UBR_Exon2`$stats
 [1] 0 0 0 0 0

 $`20091119_UBR_Exon2`$n
 [1] 18705

 $`20091119_UBR_Exon2`$conf
 [1] 0 0

 $`20091119_UBR_Exon2`$out
 numeric(0)

 $`20091119_UBR_Exon3`
 $`20091119_UBR_Exon3`$stats
 [1] 0 0 0 0 0

 $`20091119_UBR_Exon3`$n
 [1] 18705

 $`20091119_UBR_Exon3`$conf
 [1] 0 0

 $`20091119_UBR_Exon3`$out
 numeric(0)

 $`20091119_UHR_Exon1`
 $`20091119_UHR_Exon1`$stats
 [1] 0 0 0 0 0

 $`20091119_UHR_Exon1`$n
 [1] 18705

 $`20091119_UHR_Exon1`$conf
 [1] 0 0

 $`20091119_UHR_Exon1`$out
 numeric(0)

 $`20091119_UHR_Exon2`
 $`20091119_UHR_Exon2`$stats
 [1] 0 0 0 0 0

 $`20091119_UHR_Exon2`$n
 [1] 18705

 $`20091119_UHR_Exon2`$conf
 [1] 0 0

 $`20091119_UHR_Exon2`$out
 numeric(0)

 $`20091119_UHR_Exon3`
 $`20091119_UHR_Exon3`$stats

Re: [aroma.affymetrix] Re: apt- affymetrix power tool

2009-11-30 Thread Mark Robinson
Hi Elai/Zaid.

You'll want to be careful with all this (i.e. linear models on  
unnormalized data ... or maybe you are standardizing some other way),  
but yes you can run the probe level model fits on any AffymetrixCelSet  
object.

The standard RMA procedure would BG adjust, then quantile normalize,
then fit the PLMs:
cs <- AffymetrixCelSet$byName("BCGC_2006", cdf=cdf)
bc <- RmaBackgroundCorrection(cs, tag="coreR2")
csBC <- process(bc, verbose=verbose)
qn <- QuantileNormalization(csBC, typesToUpdate="pm")
csN <- process(qn, verbose=verbose)
plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
fit(plmTr, verbose=verbose)

You could replace this simply with:

csX <- AffymetrixCelSet$byName("BCGC_2006", cdf=cdf)
# [...maybe do something else to csX...]
# put in the AffymetrixCelSet here you want to fit PLM on
plmTr <- ExonRmaPlm(csX, mergeGroups=TRUE)
fit(plmTr, verbose=verbose)
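
To see what is given up by skipping the quantile normalization step, here is a plain base-R sketch of the textbook procedure on a small matrix (rows are probes, columns are arrays). It is not the code that QuantileNormalization runs, the function name quantileNormalize() is made up, and details such as tie handling are simplified.

```r
quantileNormalize <- function(x) {
  # Rank of every value within its column (ties broken by order).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted columns, row by row.
  target <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank.
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

x <- cbind(a = c(5, 2, 3, 4), b = c(4, 1, 4, 2), c = c(3, 4, 6, 8))
xn <- quantileNormalize(x)
print(xn)
```

After this step every column has exactly the same set of values, which is why fitting the PLM on data that skipped it leaves array-to-array intensity differences in the chip effects.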

Cheers,
Mark


On 30-Nov-09, at 3:32 PM, davic...@gmail.com wrote:

 Henrik
 Is it possible to use aroma to run an RMA implementation without
 quantile normalization? Zaid- have you tried this?
 Best,
 Elai
 CSO
 GenomeDx Biosciences

 On Nov 28, 5:51 am, Henrik Bengtsson h...@stat.berkeley.edu wrote:
 Hi Zaid,

 I think you have mistaken the aroma.affymetrix mailing list as  
 being a
 mailing list for Affymetrix software - this forum is only for
 aroma.affymetrix related topics.  Please use the appropriate official
 Affymetrix forum for their APT software:

  https://www.affymetrix.com/community/forums/index.jspa

 That way you also know you will get the correct answer from the
 correct source.  Your question might even have been answered there
 before (I don't know).

 /Henrik

 On Thu, Nov 26, 2009 at 9:22 PM, zaid z...@genomedx.com wrote:
 I tried running the 64 bit version of the command tool apt. I was not
 able to find any information to run the command with no normalization.

 I tried many different commands such as: apt-probeset-summarize -a
 rma-bg,pm-only,sea

 Any ideas on how I can run that tool with no normalization.

 Thanks

 --
 When reporting problems on aroma.affymetrix, make sure 1) to run  
 the latest version of the package, 2) to report the output of  
 sessionInfo() and traceback(), and 3) to post a complete code  
 example.

 You received this message because you are subscribed to the Google  
 Groups aroma.affymetrix group.
 To post to this group, send email to aroma-affymetrix@googlegroups.com
 To unsubscribe from this group, send email to 
 aroma-affymetrix-unsubscr...@googlegroups.com
 For more options, visit this group at
 http://groups.google.com/group/aroma-affymetrix?hl=en


--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--







Re: [aroma.affymetrix] Re: exon array analysis errors

2009-11-30 Thread Mark Robinson
 qamTr <- QualityAssessmentModel(plmTr)
 plotNuse(qamTr)
 plotRle(qamTr)

 Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars
 $yaxs) :
   need finite 'ylim' values
 In addition: There were 16 warnings (use warnings() to see them)  
 qamEx <- QualityAssessmentModel(plmEx)
 plotNuse(qamEx)
 plotRle(qamEx)
 z <- plotNuse(qamTr)
 plotBoxplotStats(z, ylim=c(-0.01,0.01))
 sessionInfo()

 R version 2.9.2 (2009-08-24)
 i386-pc-mingw32

 locale:
 LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
  [1] aroma.affymetrix_1.2.0 aroma.apd_0.1.6        affxparser_1.16.0
  [4] R.huge_0.1.9           aroma.core_1.2.0       aroma.light_1.12.2
  [7] matrixStats_0.1.6      R.rsp_0.3.6            R.filesets_0.5.3
 [10] digest_0.4.1           R.cache_0.1.9          R.utils_1.2.0
 [13] R.oo_1.5.0             R.methodsS3_1.0.3

 traceback()

 8: plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs)
 7: bxp(bxpStats, ylim = ylim, outline = outline, las = las, ...)
 6: plotBoxplotStats.list(stats, main = main, ylab = ylab, ...)
 5: plotBoxplotStats(stats, main = main, ylab = ylab, ...)
 4: plotBoxplot.ChipEffectSet(ces, type = "RLE", ...)
 3: plotBoxplot(ces, type = "RLE", ...)
 2: plotRle.QualityAssessmentModel(qamTr)
 1: plotRle(qamTr)

 On Nov 24, 1:50 am, Mark Robinson mrobin...@wehi.edu.au wrote:

 Hi Yu Chuan.

 Comments below.

 On 24-Nov-09, at 1:00 PM, Yu Chuan wrote:

 Hi,

 I am pre-processing 8 exon arrays (Hu-Ex-1_0-st-v2) and doing quality
 assessment. When I plotted the NUSE using plotNUSE, I found that the
 y-axis limit is too wide, such that the boxplots were all squeezed
 tightly around 0 and it's hard to see what's going on there. Is there
 any way I can change the y-axis limit? I tried

 I assume you mean tightly around 1?  That's where they should be.

 plotNuse(qamTr,ylim=c(-0.2,0.2))
 Error in boxplot.stats(stdvs/medianSE, ...) :
  unused argument(s) (ylim = c(-0.2, 0.2))

 An easy work-around for this is:

 z <- plotNuse(qamTr)
 plotBoxplotStats(z, ylim=c(.5,2))

 I'm unable to recreate these errors below on a local dataset.  They
 all work fine for me.  Here is my complete set of commands from a
 fresh R session, as described in the exon array vignette page:

 http://groups.google.com/group/aroma-affymetrix/web/human-exon-array- 
 ...

 
 library(aroma.affymetrix)
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
 cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)
 setCdf(cs, cdf)

 bc <- RmaBackgroundCorrection(cs, tag="coreR2")
 csBC <- process(bc, verbose=verbose)

 qn <- QuantileNormalization(csBC, typesToUpdate="pm")
 csN <- process(qn, verbose=verbose)

 plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
 fit(plmTr, verbose=verbose)

 rs <- calculateResiduals(plmTr, verbose=verbose)
 qamTr <- QualityAssessmentModel(plmTr)
 plotNuse(qamTr)

 z <- plotNuse(qamTr)
 plotBoxplotStats(z, ylim=c(.5,2))

 plotRle(qamTr)
 

 Have you fit the probe level model in advance of these commands?
 Given that your NUSE values are tightly around 0, I suspect maybe
 not.  Otherwise, can you give a complete code example, and maybe run
 it from a fresh R session and check whether that solves your problem.
 And, as usual, if you get an error, the output of traceback() is much
 appreciated ... and of course, the output of your sessionInfo().

 Hope that helps.

 Cheers,
 Mark

 ps. my sessionInfo():

   sessionInfo()
 R version 2.10.0 (2009-10-26)
 i386-apple-darwin9.8.0

 locale:
 [1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
  [1] preprocessCore_1.7.9   Biobase_2.5.8          aroma.affymetrix_1.2.0
  [4] aroma.apd_0.1.7        affxparser_1.17.5      R.huge_0.2.0
  [7] aroma.core_1.2.0       aroma.light_1.13.5     matrixStats_0.1.6
 [10] R.rsp_0.3.6            R.filesets_0.5.3       digest_0.4.1
 [13] R.cache_0.2.0          R.utils_1.2.2          R.oo_1.6.2
 [16] affy_1.23.12           R.methodsS3_1.0.3

 loaded via a namespace (and not attached):
 [1] affyio_1.13.5

 But it didn't work. In addition, I got the following error when I used
 plotRLE

 plotRle(qamTr)
 Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
  need finite 'ylim' values
 In addition: There were 16 warnings (use warnings() to see them)
 warnings()
 Warning messages:
 1: In min(x) : no non-missing arguments to min; returning Inf
 2: In max(x) : no non-missing arguments to max; returning -Inf
 3: In min(x) : no non-missing arguments to min; returning Inf
 4: In max(x) : no non-missing arguments to max; returning -Inf
 5: In min(x) : no non-missing arguments to min; returning Inf
 6: In max(x) : no non-missing arguments to max; returning -Inf

Re: [aroma.affymetrix] exon array analysis errors

2009-11-24 Thread Mark Robinson
Hi Yu Chuan.

Comments below.

On 24-Nov-09, at 1:00 PM, Yu Chuan wrote:

 Hi,

 I am pre-processing 8 exon arrays (Hu-Ex-1_0-st-v2) and doing quality
 assessment. When I plotted the NUSE using plotNUSE, I found that the  
 y-
 axis limit is too wide, such that the boxplots were all squeezed
 tightly around 0 and it's hard to see what's going on there. Is there
 any way I can change the y-axis limit? I tried

I assume you mean tightly around 1?  That's where they should be.


 plotNuse(qamTr,ylim=c(-0.2,0.2))
 Error in boxplot.stats(stdvs/medianSE, ...) :
  unused argument(s) (ylim = c(-0.2, 0.2))

An easy work-around for this is:

z <- plotNuse(qamTr)
plotBoxplotStats(z, ylim=c(.5,2))
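
The same two-step pattern (compute the boxplot statistics once, then draw them with your own axis limits) can be sketched in base R. This is only an illustration with simulated values; it assumes, as the traceback in this thread suggests, that the statistics end up being drawn by bxp(). The data and names below are made up.

```r
# Simulated NUSE-like values for 4 arrays, centred near 1 (made-up data).
set.seed(1)
nuse <- matrix(rnorm(400, mean = 1, sd = 0.05), ncol = 4,
               dimnames = list(NULL, paste0("array", 1:4)))

# Step 1: compute the boxplot statistics without drawing anything.
z <- boxplot(nuse, plot = FALSE)

# Step 2: draw from the precomputed statistics with a custom y-axis range.
pdf(NULL)                      # null device so the sketch runs headless
bxp(z, ylim = c(0.5, 2), ylab = "NUSE")
abline(h = 1, lty = 2)         # NUSE values are expected to centre on 1
invisible(dev.off())
```

Separating the computation from the drawing means the axis range can be changed without refitting or recomputing anything.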

I'm unable to recreate these errors below on a local dataset.  They  
all work fine for me.  Here is my complete set of commands from a  
fresh R session, as described in the exon array vignette page:

http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis


library(aroma.affymetrix)
verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
chipType <- "HuEx-1_0-st-v2"
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)
setCdf(cs, cdf)

# background correction
bc <- RmaBackgroundCorrection(cs, tag="coreR2")
csBC <- process(bc, verbose=verbose)

# quantile normalization
qn <- QuantileNormalization(csBC, typesToUpdate="pm")
csN <- process(qn, verbose=verbose)

# fit the transcript-level PLM (must be run before the QA plots)
plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
fit(plmTr, verbose=verbose)

rs <- calculateResiduals(plmTr, verbose=verbose)
qamTr <- QualityAssessmentModel(plmTr)
plotNuse(qamTr)

z <- plotNuse(qamTr)
plotBoxplotStats(z, ylim=c(.5,2))

plotRle(qamTr)



Have you fit the probe level model in advance of these commands?   
Given that your NUSE values are tightly around 0, I suspect maybe  
not.  Otherwise, can you give a complete code example, and maybe run  
it from a fresh R session and check whether that solves your problem.   
And, as usual, if you get an error, the output of traceback() is much  
appreciated ... and of course, the output of your sessionInfo().
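
For what it's worth, the "need finite 'ylim' values" error together with the min/max warnings quoted further down is what base R produces when every value to be plotted is missing. The sketch below reproduces that with a made-up all-NA vector, standing in (as an assumption) for chip effects from a model that was never fitted.

```r
# Stand-in for chip effects of a model that was never fitted: all missing.
chip_effects <- rep(NA_real_, 10)

# range() drops the NAs, finds nothing left, and returns c(Inf, -Inf),
# emitting the "no non-missing arguments to min/max" warnings.
ylim <- suppressWarnings(range(chip_effects, na.rm = TRUE))
print(ylim)

# A plot window cannot be set up with non-finite limits.
pdf(NULL)
plot.new()
res <- tryCatch(plot.window(xlim = c(0, 1), ylim = ylim),
                error = function(e) conditionMessage(e))
invisible(dev.off())
print(res)

# A cheap guard before plotting: make sure some finite values exist.
has_data <- any(is.finite(chip_effects))
```

If a check like has_data comes back FALSE on real estimates, that points at the fitting step rather than at the plotting functions.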

Hope that helps.

Cheers,
Mark

ps. my sessionInfo():

  sessionInfo()
R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
 [1] preprocessCore_1.7.9   Biobase_2.5.8          aroma.affymetrix_1.2.0
 [4] aroma.apd_0.1.7        affxparser_1.17.5      R.huge_0.2.0
 [7] aroma.core_1.2.0       aroma.light_1.13.5     matrixStats_0.1.6
[10] R.rsp_0.3.6            R.filesets_0.5.3       digest_0.4.1
[13] R.cache_0.2.0          R.utils_1.2.2          R.oo_1.6.2
[16] affy_1.23.12           R.methodsS3_1.0.3

loaded via a namespace (and not attached):
[1] affyio_1.13.5





 But it didn't work. In addition, I got the following error when I used
 plotRLE

 plotRle(qamTr)
 Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
  need finite 'ylim' values
 In addition: There were 16 warnings (use warnings() to see them)
 warnings()
 Warning messages:
 1: In min(x) : no non-missing arguments to min; returning Inf
 2: In max(x) : no non-missing arguments to max; returning -Inf
 3: In min(x) : no non-missing arguments to min; returning Inf
 4: In max(x) : no non-missing arguments to max; returning -Inf
 5: In min(x) : no non-missing arguments to min; returning Inf
 6: In max(x) : no non-missing arguments to max; returning -Inf
 7: In min(x) : no non-missing arguments to min; returning Inf
 8: In max(x) : no non-missing arguments to max; returning -Inf
 9: In min(x) : no non-missing arguments to min; returning Inf
 10: In max(x) : no non-missing arguments to max; returning -Inf
 11: In min(x) : no non-missing arguments to min; returning Inf
 12: In max(x) : no non-missing arguments to max; returning -Inf
 13: In min(x) : no non-missing arguments to min; returning Inf
 14: In max(x) : no non-missing arguments to max; returning -Inf
 15: In min(x) : no non-missing arguments to min; returning Inf
 16: In max(x) : no non-missing arguments to max; returning -Inf

 Any idea about how to fix this? Thanks!
 Yu Chuan



Re: [aroma.affymetrix] how to analysis the FIRMA score

2009-11-24 Thread Mark Robinson
Hi Jiang.

A couple quick comments below.

On 20-Nov-09, at 10:07 AM, camelbbs wrote:

 Hi,
 After I got the firma scores, how can i analyze it. I see boxplot of
 firma scores in the paper. So how i can get the same result.

You can use the boxplot() command on your matrix of FIRMA scores.
That's how the boxplot in the paper was made.


 I want to check the alternative splicing between our several samples.
 Now I have got the score of each one but How to compare them.

You've read the paper.  So, you'll know that extreme FIRMA scores  
(i.e. large positive/negative) represent putative differential  
splicing events.  So, in general, you are looking for large (in  
magnitude) values.  If you have replicates, maybe you want to look for  
significant changes in FIRMA scores between groups.
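
As a rough sketch of both suggestions (looking for extreme values, and testing between groups when replicates exist), here is a base-R example on simulated scores. Everything in it is made up for illustration: the matrix, the group labels, the planted event, the cutoff of 2 on the log2 scale, and the choice of a plain t-test.

```r
# Made-up FIRMA scores: 200 exons (rows) x 6 samples (columns),
# first three samples in group A, last three in group B.
set.seed(42)
group <- factor(rep(c("A", "B"), each = 3))
fs <- matrix(2^rnorm(200 * 6, sd = 0.3), nrow = 200,
             dimnames = list(paste0("exon", 1:200), paste0("s", 1:6)))
fs[7, group == "B"] <- fs[7, group == "B"] * 8   # one planted splicing event

# Work on the log2 scale, so scores are symmetric around 0.
lfs <- log2(fs)

# Per-sample overview, as with a boxplot of the score matrix.
pdf(NULL)
boxplot(lfs, ylab = "log2 FIRMA score")
invisible(dev.off())

# Exons with a large score (in magnitude) in at least one sample.
extreme <- rownames(lfs)[apply(abs(lfs), 1, max) > 2]

# With replicates: a per-exon two-sample t-test between the groups.
pvals <- apply(lfs, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"])$p.value
})
top <- names(sort(pvals))[1:5]
print(extreme)
print(top)
```

With only a few replicates per group, a moderated test (e.g. from limma) would likely be a better choice than the plain t-test used here.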

Hope that helps.

Cheers,
Mark



 Thanks very much.
 Jiang




extracting FIRMA scores (Was: [aroma.affymetrix] a question)

2009-11-17 Thread Mark Robinson
Hi Jiang.

It's helpful to use meaningful subject headers, so that others can
search the mailing lists.  So, I've moved your messages to a new
thread.  Comments below.

 Hi,
 I have a question can you help me. That's about using FIRMA. I  
 cannot get the result after I run FIRMA. I only found the score.CEL  
 files but they cannot be opened. So may you give me some suggestion?
 Thanks very much.
 Best,
 Jiang

You shouldn't need to deal directly with the CEL files.  You can
access the FIRMA scores by using extractDataFrame().  For example:

[...]
firma <- FirmaModel(plm)
fit(firma, verbose=verbose)

fs <- getFirmaScores(firma)
fsDf <- extractDataFrame(fs)

The 'fsDf' should be a data.frame object containing the FIRMA scores,  
assuming you've run all the previous steps mentioned in the vignette:
http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis
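
Once you have such a data.frame, the usual next step is to separate the annotation columns from the per-array scores. The sketch below uses a hand-built stand-in rather than real output; the column layout (unitName, groupName, unit, group, cell, then one column per array) follows output posted elsewhere on this list, and the sample names and values are made up.

```r
# Hand-built stand-in for the data.frame returned by extractDataFrame():
# five annotation columns followed by one column per array (values made up).
fsDf <- data.frame(
  unitName  = c("6838637", "6838637", "6838637"),
  groupName = c("4304927", "4330595", "4356771"),
  unit = 1L, group = 1:3, cell = 1:3,
  sampleA = c(0.46, 0.61, 1.66),
  sampleB = c(0.28, 0.92, 0.19),
  stringsAsFactors = FALSE
)

# Split the annotation from the scores and build a numeric matrix
# with one row per exon (group) and one column per array.
annot_cols <- c("unitName", "groupName", "unit", "group", "cell")
scores <- as.matrix(fsDf[, setdiff(names(fsDf), annot_cols)])
rownames(scores) <- fsDf$groupName

print(log2(scores))
```

A plain numeric matrix like 'scores' is what boxplot() and most downstream tools expect.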


 Hi,
 When I run FIRMA, I get the FIRMAscores.CEL files, but I cannot open
 them as text files.

They are binary files, but as mentioned above, you shouldn't need to  
deal with them directly.


 So how can I check my FIRMA results? The result folder is empty.
 Also, if I want to add sample group info for the exon arrays,
 what is the format? I wrote it like this:
 sample1 group1
 sample2 group1
 sample3 group2
 sample4 group3
 and put it in ..\annotationData\samples\. Is that right?
 Thanks,
 Jiang



FIRMA is a single sample method, in the sense that it doesn't need  
replicates.  Therefore, it does not need this information.

Cheers,
Mark




Re: [aroma.affymetrix] FIRMAGene with masked CEL files

2009-11-16 Thread Mark Robinson


[aroma.affymetrix] Re: Discussion on using-the-genomegraphs-package-with-firma

2009-10-26 Thread Mark Robinson

Hi Lasse.

Yes, the 'plm' mentioned on that page should be the 'plmTr' object  
(with mergeGroups=TRUE as you have) mentioned in the HuEx vignette:

http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis

Hope that helps.

Cheers,
Mark


On 26-Oct-09, at 9:20 PM, Lasse wrote:


 Thanks for sharing your code.

 I don't really understand - when getting the PLM for this instruction,
 are you supposed to set mergeGroups to TRUE like this?
 plm <- ExonRmaPlm(celSet, mergeGroups=TRUE)


 -L

 




[aroma.affymetrix] Re: error in exon array analysis: fit(plmTr, verbose=verbose)

2009-10-26 Thread Mark Robinson

Hi Elizabeth.

Indeed, this is certainly an R 2.10 problem, related to changes in
'preprocessCore' ...

preprocessCore v1.7.9 (BioC 2.5, R 2.10.x):
SEXP R_rlm_rma_default_model(SEXP Y, SEXP PsiCode, SEXP PsiK, SEXP Scales){

preprocessCore v1.6.0 (BioC 2.4, R 2.9.x):
SEXP R_rlm_rma_default_model(SEXP Y, SEXP PsiCode, SEXP PsiK){

The fix is probably quite easy, but we'll need to update this (I  
haven't begun my migration to 2.10 yet ...).  And, surely other  
routines will be affected.
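
One possible way for calling code to cope with a signature change like this is to branch on the package version. The sketch below is only an illustration of that idea, not the actual fix: the helper name rlmCallArgs is hypothetical, the 1.7.9 cutoff is simply the newer of the two versions quoted above (the real cutoff is not known here), and the placeholder passed for the extra Scales argument is an assumption. It only builds the argument list and does not call any native routine.

```r
# Hypothetical helper: build the argument list for the native routine
# depending on which preprocessCore version is in use.
rlmCallArgs <- function(y, psiCode, psiK, version) {
  args <- list(y, psiCode, psiK)
  # Assumption: the 4th argument (Scales) is expected from 1.7.9 onwards;
  # the true cutoff lies somewhere between 1.6.0 and 1.7.9.
  if (package_version(version) >= package_version("1.7.9")) {
    args <- c(args, list(NULL))   # placeholder value, also an assumption
  }
  args
}

y <- matrix(as.numeric(1:6), nrow = 2)
n_old <- length(rlmCallArgs(y, 0L, 1.345, "1.6.0"))
n_new <- length(rlmCallArgs(y, 0L, 1.345, "1.7.9"))
print(c(old = n_old, new = n_new))
```

In practice the version string would come from packageVersion("preprocessCore") rather than being passed in by hand.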

Henrik: are you planning a release in the near future that works with  
R 2.10?

Cheers,
Mark

On 27-Oct-09, at 11:35 AM, Elizabeth Purdom wrote:


 Hi Mark,
 I am running into the same problem with 2.10.0, which I just installed
 (see session info below). It appears to be a problem with the
 preprocessCore function that does the RMA fit having changed its
 signature, in which case I think this is a more general problem
 (i.e. not just for the exon array).
 Best,
 Elizabeth


 20091026 17:32:55|  Identifying non-fitted units in chip-effect file...done
 20091026 17:32:55| Identifying non-estimated units...done
 20091026 17:32:55| Getting model fit for 23885 units.
 Loading required package: preprocessCore
 simpleError in .Call(R_rlm_rma_default_model, y, psiCode, psiK,
 PACKAGE = rlmPkg): Incorrect number of arguments (3), expecting 4 for
 R_rlm_rma_default_model
 Error in list(`fit(plmColonCCL2run, verbose = verbose)` = environment,  :

 [2009-10-26 17:32:55] Exception: The fit function for requested exon RMA
 PLM failed
   at throw(Exception(...))
   at throw.default(The fit function for requested exon RMA PLM failed)
   at throw(The fit function for requested exon RMA PLM failed)
   at getFitUnitGroupFunction.ExonRmaPlm(this, ...)
   at getFitUnitGroupFunction(this, ...)
   at getFitUnitFunction.MultiArrayUnitModel(this)
   at getFitUnitFunction(this)
   at fit.ProbeLevelModel(plmColonCCL2run, verbose = verbose)
   at fit(plmColonCCL2run, verbose = verbose)
 20091026 17:32:55|Fitting model of class ExonRmaPlm:...done
 sessionInfo()
 R version 2.10.0 (2009-10-26)
 x86_64-apple-darwin9.8.0

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
  [1] preprocessCore_1.7.9   aroma.affymetrix_1.2.0 aroma.apd_0.1.7
  [4] affxparser_1.17.5      R.huge_0.2.0           aroma.core_1.2.0
  [7] aroma.light_1.13.6     matrixStats_0.1.6      R.rsp_0.3.6
 [10] R.filesets_0.5.3       digest_0.4.1           R.cache_0.2.0
 [13] R.utils_1.2.2          R.oo_1.6.2             R.methodsS3_1.0.3
 [16] projectManager_1.0     XML_2.6-0

 Mark Robinson wrote:
 Hi Hailei.

 For starters, can you give the *full* output of your sessionInfo()?  The
 error you are getting has something to do with the 'preprocessCore'
 package and I first want to check whether it is a package version mismatch
 error.

 I haven't used aroma.affymetrix on R 2.10 and I don't know if anyone else
 has either.  You could try all this on R 2.9.2 ...

 Cheers,
 Mark

 Dear All,

 I am analyzing human affy exon arrays for first time. I followed the
 steps listed in website:
 http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis

 In the summarization step, I hit an error when I began to fit the PLM to
 all of the data.

 Thanks
 Hailei


 my R session:
 sessionInfo()
 R version 2.10.0 Under development (unstable) (2009-08-10 r49148)
 x86_64-unknown-linux-gnu

 My script:
 library(aroma.affymetrix)
 verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 chipType <- "HuEx-1_0-st-v2"
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
 print(cdf)
 cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)
 print(cs)
 bc <- RmaBackgroundCorrection(cs, tag="coreR3")
 csBC <- process(bc, verbose=verbose)
 qn <- QuantileNormalization(csBC, typesToUpdate="pm")
 print(qn)
 csN <- process(qn, verbose=verbose)
 plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
 print(plmTr)
 fit(plmTr, verbose=verbose)

 The error is:

 fit(plmTr, verbose=verbose)
 20091015 15:49:54|Fitting model of class ExonRmaPlm:...
 ExonRmaPlm:
 Data set: tissues
 Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP
 Input tags: coreR3,QN
 Output tags: coreR3,QN,RMA,merged
 Parameters: (probeModel: chr pm; shift: num 0; flavor: chr
 affyPLM; treatNAsAs: chr weights; mergeGroups: logi TRUE).
 Path: plmData/tissues,coreR3,QN,RMA,merged/HuEx-1_0-st-v2
 RAM: 0.01MB
 20091015 15:49:54| Identifying non-estimated units...
 20091015 15:49:54|  Identifying non-fitted units in chip-effect
 file...
 20091015 15:49:54|   Pathname: plmData/tissues,coreR3,QN,RMA,merged/
 HuEx-1_0-st-v2/RD2009092837,chipEffects.CEL
 20091015 15:49:54|   Found indices cached on file
 20091015 15:49:54|   Reading data for these 18708 cells...
 20091015 15:49:54|   Reading data for these 18708 cells...done
 20091015 15:49:54|   Looking for stdvs = 0 indicating non-estimated
 units:
int [1:18708] 1 2 3 4 5 6 7

[aroma.affymetrix] Re: selection of CDF and PA call for exon array

2009-10-20 Thread Mark Robinson

Hi Yong.

Comments below.

On 21-Oct-09, at 9:11 AM, Yong wrote:


 Dear All,

 I am a newbie to the aroma.affymetrix package. After reading the user guide
 and the Human exon array analysis case study, I was able to process my
 mouse exon array data and generate gene-level expression intensities.
 But I still have two questions:

 1. There is one Ensembl gene based mouse CDF file available,
 http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st .
 Also, 
 http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/11.0.1/ensg.asp
 provides another one based on Ensembl genes. I am wondering which one
 is better for generation of gene-level summarization.

That depends on what you mean by, and how you define, "better".  These
different CDFs are just rearrangements of which probes go in which
probesets, based on the annotation sources.  From what I recall, you
won't be able to use the brainarray CDFs for FIRMA analysis, since
that requires the CDF to be stored in a hierarchical way (exon
probesets within gene probesets).  I would also guess that if those 2
CDFs are based on the same Ensembl gene build, then the gene-level
summarization would essentially be the same.
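
To make "hierarchical" concrete, here is a toy base-R picture of one unit. It is not the real CDF data structure; the list layout, names and cell indices are made up. It only shows the difference between keeping exon-level probesets inside a gene and merging them into a single gene-level probe set.

```r
# Hypothetical hierarchical unit: one gene (unit) holding exon-level
# probesets (groups), each with its own probe (cell) indices.
unit <- list(
  name = "gene1",
  groups = list(
    exon1 = c(11L, 12L, 13L, 14L),
    exon2 = c(21L, 22L, 23L, 24L),
    exon3 = c(31L, 32L, 33L)
  )
)

# Exon-level view: one probe set per exon, which FIRMA relies on.
probes_per_exon <- lengths(unit$groups)

# Gene-level view: all exon probesets merged into a single probe set,
# analogous in spirit to fitting the PLM with mergeGroups=TRUE.
gene_probes <- unlist(unit$groups, use.names = FALSE)

print(probes_per_exon)
print(length(gene_probes))
```

A flat, gene-only CDF would correspond to keeping just 'gene_probes', at which point the exon labels needed for exon-level scores are gone.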


 2. Is it possible to generate gene-level PA (presence/absence) call
 for exon array?

As far as I know, there is no implementation in aroma.affymetrix for  
this.

Cheers,
Mark




 Many thanks ahead.

 Yong Zhang
 Ph.D, Research Scholar
 Manyuan Long's Lab
 University of Chicago

 




[aroma.affymetrix] Re: FIRMA score and limma analysis

2009-10-18 Thread Mark Robinson

Hi Hailei.

This has been discussed before as well.  You might start with:

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/36d8c59d742fc503/
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/f4d015cae1848f51/

You should definitely take log2 before proceeding with FIRMA scores.
Since they are a summary of residuals for a probeset (see the paper
for details), you will get both positive and negative numbers on the
log2 scale; that is expected.
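
As a quick check, here is the log2 transform applied to the first row of the scores quoted below (values copied from that output; the vector name is just for illustration). Scores below 1 become negative and scores above 1 become positive.

```r
# FIRMA scores for the first exon in the quoted output (four arrays).
firma <- c(RD2009092839 = 0.4586444, RD2009092840 = 0.2794902,
           RD2009092841 = 1.8639828, RD2009092842 = 1.3710542)

# Scores below 1 map to negative log2 values, scores above 1 to positive ones.
log2_firma <- log2(firma)
print(round(log2_firma, 4))
print(sign(log2_firma))
```

So negative values after the transform are not a sign that anything went wrong.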

Cheers,
Mark


On 17-Oct-09, at 3:46 AM, hailei@gmail.com wrote:


 Dear All,

 I have 2 tumor samples and 2 control samples and want to find
 alternative splicing.

 After I got the FIRMA scores, could I use limma to find the alternative
 splicing?

 Before using limma, I log2-transformed the FIRMA scores, but I found there
 are a lot of negative values in the data set. Is that common?

 Original FIRMA score:
 head(exFirma)
   unitName groupName unit group cell RD2009092839 RD2009092840 RD2009092841
 1  6838637   4304927    1     1    1    0.4586444    0.2794902    1.8639828
 2  6838637   4330595    1     2    2    0.6102609    0.9152724    1.3151890
 3  6838637   4356771    1     3    3    1.6570473    0.1941053    1.3088931
 4  6838637   4366326    1     4    4    0.7417230    1.8255262    0.7095107
 5  6838637   4367951    1     5    5    1.0001453    1.3591065    0.6875097
 6  6838637   4396376    1     6    6    1.4089704    0.4405679    1.1440083
   RD2009092842
 1    1.3710542
 2    1.3612753
 3    0.8766123
 4    0.6292946
 5    1.2785655
 6    0.6761603

 log2 score:
 head(exFirma)
   unitName groupName unit group cell  RD2009092839 RD2009092840 RD2009092841
 1  6838637   4304927    1     1    1 -1.1245521013   -1.8391304    0.8983885
 2  6838637   4330595    1     2    2 -0.7125019273   -0.1277269    0.3952701
 3  6838637   4356771    1     3    3  0.7286147599   -2.3650883    0.3883473
 4  6838637   4366326    1     4    4 -0.4310475869    0.8683124   -0.4951036
 5  6838637   4367951    1     5    5  0.0002096316    0.4426586   -0.5405480
 6  6838637   4396376    1     6    6  0.4946412584   -1.1825638    0.1940975
   RD2009092842
 1    0.4552856
 2    0.4449589
 3   -0.1899892
 4   -0.6681926
 5    0.3545261
 6   -0.5645628

 Thanks

 Hailei
 




[aroma.affymetrix] Re: error in exon array analysis: fit(plmTr, verbose=verbose)

2009-10-15 Thread Mark Robinson

Hi Hailei.

For starters, can you give the *full* output of your sessionInfo()?  The
error you are getting has something to do with the 'preprocessCore'
package and I first want to check whether it is a package version mismatch
error.

I haven't used aroma.affymetrix on R 2.10 and I don't know if anyone else
has either.  You could try all this on R 2.9.2 ...

Cheers,
Mark


 Dear All,

 I am analyzing human affy exon arrays for first time. I followed the
 steps listed in website:
 http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-analysis

 In the summarization step, I hit an error when I began to fit the PLM to
 all of the data.

 Thanks
 Hailei


 my R session:
 sessionInfo()
 R version 2.10.0 Under development (unstable) (2009-08-10 r49148)
 x86_64-unknown-linux-gnu

 My script:
 library(aroma.affymetrix)
 verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 chipType <- "HuEx-1_0-st-v2"
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
 print(cdf)
 cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)
 print(cs)
 bc <- RmaBackgroundCorrection(cs, tag="coreR3")
 csBC <- process(bc, verbose=verbose)
 qn <- QuantileNormalization(csBC, typesToUpdate="pm")
 print(qn)
 csN <- process(qn, verbose=verbose)
 plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
 print(plmTr)
 fit(plmTr, verbose=verbose)

 The error is:

 fit(plmTr, verbose=verbose)
 20091015 15:49:54|Fitting model of class ExonRmaPlm:...
  ExonRmaPlm:
  Data set: tissues
  Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP
  Input tags: coreR3,QN
  Output tags: coreR3,QN,RMA,merged
  Parameters: (probeModel: chr pm; shift: num 0; flavor: chr
 affyPLM; treatNAsAs: chr weights; mergeGroups: logi TRUE).
  Path: plmData/tissues,coreR3,QN,RMA,merged/HuEx-1_0-st-v2
  RAM: 0.01MB
 20091015 15:49:54| Identifying non-estimated units...
 20091015 15:49:54|  Identifying non-fitted units in chip-effect
 file...
 20091015 15:49:54|   Pathname: plmData/tissues,coreR3,QN,RMA,merged/
 HuEx-1_0-st-v2/RD2009092837,chipEffects.CEL
 20091015 15:49:54|   Found indices cached on file
 20091015 15:49:54|   Reading data for these 18708 cells...
 20091015 15:49:54|   Reading data for these 18708 cells...done
 20091015 15:49:54|   Looking for stdvs = 0 indicating non-estimated
 units:
 int [1:18708] 1 2 3 4 5 6 7 8 9 10 ...
 20091015 15:49:54|  Identifying non-fitted units in chip-effect
 file...done
 20091015 15:49:54| Identifying non-estimated units...done
 20091015 15:49:54| Getting model fit for 18708 units.
 simpleError in .Call(R_rlm_rma_default_model, y, psiCode, psiK,
 PACKAGE = rlmPkg): Incorrect number of arguments (3), expecting 4 for
 R_rlm_rma_default_model
 Error in list(`fit(plmTr, verbose = verbose)` = environment,
 `fit.ProbeLevelModel(plmTr, verbose = verbose)` = environment,  :

 [2009-10-15 15:49:54] Exception: The fit function for requested exon
 RMA PLM failed
   at throw(Exception(...))
   at throw.default(The fit function for requested exon RMA PLM
 failed)
   at throw(The fit function for requested exon RMA PLM failed)
   at getFitUnitGroupFunction.ExonRmaPlm(this, ...)
   at getFitUnitGroupFunction(this, ...)
   at getFitUnitFunction.MultiArrayUnitModel(this)
   at getFitUnitFunction(this)
   at fit.ProbeLevelModel(plmTr, verbose = verbose)
   at fit(plmTr, verbose = verbose)
 20091015 15:49:54|Fitting model of class ExonRmaPlm:...done

 




--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: error in exon analysis

2009-10-15 Thread Mark Robinson

Hi Hailei.

This has come up recently:
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/94a7343c92e54946

... and what happened there is the:

fit(plmTr, verbose=verbose)

didn't get run.  Can you check this, perhaps from a fresh R session?
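
After rerunning, a quick way to see whether anything changed is to count the non-finite scores per sample. This is only a minimal base-R sketch on a toy data frame shaped like the extractDataFrame() output; the values and the helper name countNonFinite are made up for illustration and are not part of aroma.affymetrix:

```r
# Toy stand-in for the data frame returned by extractDataFrame(fs, ...);
# the values are invented, only the column layout follows the thread.
exFirma <- data.frame(
  unit = c(1, 1, 2, 2),
  group = c(1, 2, 1, 2),
  cell = 1:4,
  RD2009092835 = c(NaN, NaN, NaN, NaN),
  RD2009092836 = c(0.8, NaN, 1.2, 0.5),
  RD2009092837 = c(1.1, 0.9, NaN, 0.7)
)

# Hypothetical helper: number of NaN/NA/Inf entries in each sample column.
countNonFinite <- function(df,
                           idCols = c("unitName", "groupName", "unit", "group", "cell")) {
  sampleCols <- setdiff(names(df), idCols)
  vapply(df[sampleCols], function(x) sum(!is.finite(x)), integer(1))
}

nonFinite <- countNonFinite(exFirma)
print(nonFinite)

# A sample whose count equals nrow(exFirma) has no usable scores at all.
allMissing <- names(nonFinite)[nonFinite == nrow(exFirma)]
print(allMissing)
```

If every sample column comes back completely non-finite, that would be consistent with the model fit never having been run.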

Cheers,
Mark

ps. I think we are pushing the limits for RMA/FIRMA by running this on  
just 3 samples.



On 16-Oct-09, at 8:55 AM, hailei@gmail.com wrote:


 my fs shows:

 head(fs$files)
 [[1]]
 FirmaFile:
 Name: RD2009092835
 Tags: FIRMAscores
 Full name: RD2009092835,FIRMAscores
 Pathname: firmaData/tissues,coreR3,QN,RMA,merged,FIRMA,medres/HuEx-1_0-st-v2/RD2009092835,FIRMAscores.CEL
 File size: 2.72 MB (2846934 bytes)
 RAM: 104.76 MB
 File format: v4 (binary; XDA)
 Platform: Affymetrix
 Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP,monocell
 Timestamp: 2009-10-15 14:55:04

 [[2]]
 FirmaFile:
 Name: RD2009092836
 Tags: FIRMAscores
 Full name: RD2009092836,FIRMAscores
 Pathname: firmaData/tissues,coreR3,QN,RMA,merged,FIRMA,medres/HuEx-1_0-st-v2/RD2009092836,FIRMAscores.CEL
 File size: 2.72 MB (2846934 bytes)
 RAM: 0.00 MB
 File format: v4 (binary; XDA)
 Platform: Affymetrix
 Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP,monocell
 Timestamp: 2009-10-15 14:55:04

 [[3]]
 FirmaFile:
 Name: RD2009092837
 Tags: FIRMAscores
 Full name: RD2009092837,FIRMAscores
 Pathname: firmaData/tissues,coreR3,QN,RMA,merged,FIRMA,medres/HuEx-1_0-st-v2/RD2009092837,FIRMAscores.CEL
 File size: 2.72 MB (2846934 bytes)
 RAM: 0.00 MB
 File format: v4 (binary; XDA)
 Platform: Affymetrix
 Chip type: HuEx-1_0-st-v2,coreR3,A20071112,EP,monocell
 Timestamp: 2009-10-15 14:55:04



 On Oct 15, 5:44 pm, hailei zhang hailei@gmail.com wrote:
 Dear All,

 I want to get FIRMA scores for my exon array. But after I type this
 command:
 exFirma <- extractDataFrame(fs, addNames=TRUE, units=NULL)

 my FIRMA scores are all NaN: head(exFirma)

   unitName groupName unit group cell RD2009092835 RD2009092836 RD2009092837
 1  2315251   2315252    1     1    1          NaN          NaN          NaN
 2  2315251   2315253    1     2    2          NaN          NaN          NaN
 3  2315373   2315374    2     1    3          NaN          NaN          NaN
 4  2315373   2315375    2     2    4          NaN          NaN          NaN
 5  2315373   2315376    2     3    5          NaN          NaN          NaN
 6  2315373   2315377    2     4    6          NaN          NaN          NaN

  The following is my script:
 library(aroma.affymetrix)
 verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
 chipType <- "HuEx-1_0-st-v2"
 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
 print(cdf)
 cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)
 print(cs)
 bc <- RmaBackgroundCorrection(cs, tag="coreR3")
 csBC <- process(bc, verbose=verbose)
 qn <- QuantileNormalization(csBC, typesToUpdate="pm")
 print(qn)
 csN <- process(qn, verbose=verbose)
 plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
 fit(plmTr, verbose=verbose)
 rs <- calculateResiduals(plmTr, verbose=verbose)
 firma <- FirmaModel(plmTr)
 fit(firma, verbose=verbose)
 fs <- getFirmaScores(firma)
 exFirma <- extractDataFrame(fs, addNames=TRUE, units=NULL)
 savehistory(file="Firma_score.commond")
 My session information: sessionInfo()

 R version 2.9.2 (2009-08-24)
 x86_64-unknown-linux-gnu
 locale:
 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 other attached packages:
  [1] preprocessCore_1.6.0   aroma.affymetrix_1.2.0 aroma.apd_0.1.6
  [4] affxparser_1.16.0  R.huge_0.1.9   aroma.core_1.2.0
  [7] aroma.light_1.12.2 matrixStats_0.1.6  R.rsp_0.3.6
 [10] R.filesets_0.5.3   digest_0.4.1   R.cache_0.1.9
 [13] R.utils_1.2.0  R.oo_1.5.0 R.methodsS3_1.0.3
 loaded via a namespace (and not attached):
 [1] tools_2.9.2

 class(fs)
 [1] FirmaSet               ParameterCelSet        AffymetrixCelSet
 [4] AffymetrixFileSet      AromaPlatformInterface AromaMicroarrayDataSet
 [7] GenericDataFileSet     Object

 Thanks.

 Hailei
 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--







[aroma.affymetrix] Re: Exception: Unknown arguments: cdf, checkChipType

2009-10-14 Thread Mark Robinson
 arguments: , argsStr)
  at throw(Unknown arguments: , argsStr)
  at GenericDataFileSet(files = files, ...)
  at extend(GenericDataFileSet(files = files, ...),
 AromaMicroarrayDataSet)
  at AromaMicroarrayDataSet(files = files, ...)
  at extend(AromaMicroarrayDataSet(files = files, ...), c
 (AffymetrixFileSet, uses(AromaPlatformI
  at AffymetrixFileSet(files = files, ...)
  at extend(AffymetrixFileSet(files = files, ...),  
 AffymetrixCelSet,
 `cached:.intensities` = NULL
  at this(...)
  at newInstance.Class(clazz, ...)
  at newInstance(clazz, ...)
  at newInstance.Object(static, files, ...)
  at newInstance(static, files, ...)
  at method(static, ...)
  at staticMethod(path = probeData/MGH09,RBC/MoGene-1_0-st-v1,
 pattern = ^[^.].*[.](CEL|cel)$,
  at do.call(staticMethod, args = args)
  at getOutputDataSet0.AromaTransform(this, ..., verbose
 Background correcting data set...done

 3. QuantileNormalization(cs) Error:

 qn <- QuantileNormalization(cs)
 qn
 Error in list(`print(NA)` = environment, `print.Object(NA)` =
 environment,  :

 [2009-10-12 11:33:42] Exception: Unknown arguments: cdf,  
 checkChipType
  at throw(Exception(...))
  at throw.default(Unknown arguments: , argsStr)
  at throw(Unknown arguments: , argsStr)
  at GenericDataFileSet(files = files, ...)
  at extend(GenericDataFileSet(files = files, ...),
 AromaMicroarrayDataSet)
  at AromaMicroarrayDataSet(files = files, ...)
  at extend(AromaMicroarrayDataSet(files = files, ...), c
 (AffymetrixFileSet, u
  at AffymetrixFileSet(files = files, ...)
  at extend(AffymetrixFileSet(files = files, ...),  
 AffymetrixCelSet,
 `cached:.
  at this(...)
  at newInstance.Class(clazz, ...)
  at newInstance(clazz, ...)
  at newInstance.Object(static, files, ...)
  at newInstance(static, files, ...)
  at method(static, ...)
  at staticMethod(path = probeData/MGH09,QN/MoGene-1_0-st-v1,
 pattern = ^[^.]
  at do.call(staticMethod, args = args)


 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: AffymetrixCelSet, Could not locate a file for this chip type

2009-10-11 Thread Mark Robinson

Hi Carol.

This throws an error because your CDF has a tag (r3) and there is no way
to send a tag for the CDF file through the AffymetrixCelSet$byName
function.  I recommend that you just do this in 2 commands, as you've done
successfully.  Alternatively, you can remove the ,r3 from your CDF file
name to remove the tag and that single command should work.
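
To picture why the untagged lookup fails, here is a rough base-R sketch. It assumes the CDF file name is simply the chip type and tags joined by commas (which is what the paste(c(chipType, tags), collapse = ",") in the traceback suggests); cdfFilename is a hypothetical helper, not aroma.affymetrix code, and the directory is recreated in a temporary folder:

```r
# Hypothetical helper (not aroma.affymetrix code): build the CDF file name
# from a chip type and optional tags, joined by commas.
cdfFilename <- function(chipType, tags = NULL) {
  paste0(paste(c(chipType, tags), collapse = ","), ".cdf")
}

# Recreate the directory layout from the post in a temporary folder.
root <- tempfile("annotationData")
path <- file.path(root, "chipTypes", "MoGene-1_0-st-v1")
dir.create(path, recursive = TRUE)
file.create(file.path(path, "MoGene-1_0-st-v1,r3.cdf"))

withTag <- file.path(path, cdfFilename("MoGene-1_0-st-v1", tags = "r3"))
withoutTag <- file.path(path, cdfFilename("MoGene-1_0-st-v1"))

print(file.exists(withTag))     # the tagged name matches the file on disk
print(file.exists(withoutTag))  # the untagged name does not
```

Under that assumption, a call that only knows the chip type has no way of arriving at the ",r3" file, which matches the error reported.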

Cheers,
Mark




 Hi all,

 I'm just starting to analyze expression data from the MoGene-1_0-st-v1
 chip using aroma.affymetrix, but I'm running into an error when creating a
 cs object that I don't really understand.

 DATA SET UP:

 I started by reading the following to learn about setting up the data
files:
 http://groups.google.com/group/aroma-affymetrix/web/users-guide

 Here is my basic set up:

 \annotationData\chipTypes\MoGene-1_0-st-v1\MoGene-1_0-st-v1,r3.cdf
\rawData\MGH09\MoGene-1_0-st-v1\lots o CEL files


 WHAT WORKS:

 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")
 print (cdf)

 AffymetrixCdfFile:
 Path: annotationData/chipTypes/MoGene-1_0-st-v1
 Filename: MoGene-1_0-st-v1,r3.cdf
 Filesize: 67.42MB
 Chip type: MoGene-1_0-st-v1,r3
 RAM: 0.00MB
 File format: v3 (text; ASCII)
 Dimension: 1050x1050
 Number of cells: 1102500
 Number of units: 35512
 Cells per unit: 31.05
 Number of QC units: 1


 cs <- AffymetrixCelSet$byName("MGH09", cdf=cdf)
 print (cs)
 AffymetrixCelSet:
 Name: MGH09
 Tags:
 Path: rawData/MGH09/MoGene-1_0-st-v1
 Platform: Affymetrix
 Chip type: MoGene-1_0-st-v1,r3
 Number of arrays: 26
 Names: J001_3_11.5, J001_4_11.5, ..., J010_4_16.5
 Time period: 2009-09-23 17:51:49 -- 2009-09-23 21:45:31
 Total file size: 275.06MB
 RAM: 0.02MB


 WHAT DOESN'T WORK:

 What I tried first, and what I've found in other tutorials, is the
 following:

 cs <- AffymetrixCelSet$byName(data name, tags, chipType=chipType)

 Which I translated to:

 cs <- AffymetrixCelSet$byName("MGH09", chipType="MoGene-1_0-st-v1")

 But when I run this I get the error listed below. I'm wondering why one
 approach works just fine but this one doesn't.

 Error in list(`AffymetrixCelSet$byName(MGH09, chipType = MoGene-1_0-
st-v1)` = environment,  :

 [2009-10-10 06:55:28] Exception: Could not locate a file for this chip
type: MoGene-1_0-st-v1
   at throw(Exception(...))
   at throw.default(Could not locate a file for this chip type: ,
 paste(c(chipType, tags), collapse = ,))
   at throw(Could not locate a file for this chip type: , paste(c
 (chipType, tags), collapse = ,))
   at method(static, ...)
   at AffymetrixCdfFile$byChipType(chipType, nbrOfCells = nbrOfCells) at
fromFiles.AffymetrixCelSet(static, path = path, cdf = cdf, ...) at
fromFiles(static, path = path, cdf = cdf, ...)
   at withCallingHandlers(expr, warning = function(w) invokeRestart
 (muffleWarning))
   at suppressWarnings({
   at method(static, ...)
   at AffymetrixCelSet$byName(MGH09, chipType = MoGene-1_0-st-v1)

 









[aroma.affymetrix] Re: exon array analysis

2009-10-05 Thread Mark Robinson

Hi Enid.

I was unsuccessful in repeating your problem.  I ran the script below  
from a fresh session on the Affy tissues dataset using  
aroma.affymetrix 1.2.0 ... and I get results.  I don't think I've run  
FIRMA on as few as 4 samples as your example suggests, but in theory  
that shouldn't be the problem.

Can you post your full script and give your sessionInfo()?

Cheers,
Mark


---
library(aroma.affymetrix)

# setup
verbose <- Arguments$getVerbose(-20, timestamp=TRUE)
chipType <- "HuEx-1_0-st-v2"
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")
cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)

# BG adjust + QN
bc <- RmaBackgroundCorrection(cs, tag="coreR2")
csBC <- process(bc, verbose=verbose)
qn <- QuantileNormalization(csBC, typesToUpdate="pm")
csN <- process(qn, verbose=verbose)

# fit PLM, FIRMA
plm <- ExonRmaPlm(csN, mergeGroups=TRUE)
rs <- calculateResiduals(plm, verbose=verbose)
firma <- FirmaModel(plm)
fit(firma, verbose=verbose)

fs <- getFirmaScores(firma)
firmascore <- extractDataFrame(fs)
---


On 5-Oct-09, at 7:50 AM, Enid wrote:


 Dear all,

 I am analysing a set of exon array data, and have been following the
 human exon array analysis vignette, but am having trouble getting the
 firma scores

 After the firma analysis, I get a table full of NaN's.

 firma <- FirmaModel(plmTr)
 fit(firma, verbose=verbose)
 fs <- getFirmaScores(firma)
 firmascore <- extractDataFrame(fs)
 firmascore[1:10,]
    unit group cell P2008 P2009 P2010 P2011
 1     1     1    1   NaN   NaN   NaN   NaN
 2     1     2    2   NaN   NaN   NaN   NaN
 3     2     1    3   NaN   NaN   NaN   NaN
 4     2     2    4   NaN   NaN   NaN   NaN
 5     2     3    5   NaN   NaN   NaN   NaN
 6     2     4    6   NaN   NaN   NaN   NaN
 7     3     1    7   NaN   NaN   NaN   NaN
 8     3     2    8   NaN   NaN   NaN   NaN
 9     3     3    9   NaN   NaN   NaN   NaN
 10    3     4   10   NaN   NaN   NaN   NaN

 but the transcript summarisation seems to be ok.

 readUnits(plmTr,unit=1)
 $`2315251`
 $`2315251`[[1]]
 $`2315251`[[1]]$intensities
  [,1]  [,2]  [,3]  [,4]
 [1,]  6.418280 51.159870  5.914537  8.396150
 [2,] 11.297100  7.589603 11.975643  8.396150
 [3,]  9.184888 12.899874 83.409866 44.409882
 [4,]  8.778770  5.476091 82.909866  5.136143
 [5,]  4.446828  6.080955 11.454535  4.062980
 [6,]  5.914537  4.550271  7.589603  3.849267
 [7,]  8.038540  3.542727  3.808023  7.923751
 [8,] 13.260725  6.238900  4.660217  4.200828

 I am not sure what I'm doing wrong and would really appreciate any
 help.

 Thank you very much in advance,
 Enid

 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: QC of probe level model.

2009-10-03 Thread Mark Robinson

Hi Cathy.

Comments below.

On 29-Sep-09, at 10:11 PM, CathyMitchell wrote:

 Hi all,

 I am using the gene st array and would like to know a couple of things
 about the probe level model. After doing the RmaPlm one can do two
 types of QC, the NUSE and RLE plots. These however compare results for
 each array. I would like to be able to have a look at the individual
 genes/probes (be able to flag up problem genes/probes). Is there a way
 to plot these?

For plotting individual genes, you may be interested in this page (and  
discussion):
http://groups.google.com/group/aroma-affymetrix/web/using-the-genomegraphs-package-with-firma

As another approach, you could simply plot the data as line plots (1
line for each sample, 1 point for every probe).  For example, you
could use extractMatrix() to read your normalized data into a
matrix and then use matplot().
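
As a rough illustration of the line-plot idea, here is a base-R sketch on simulated data. The matrix y stands in for whatever a probes-by-samples extraction gives you for one gene; the sample names, the artificial outlier and the cutoff of 4 are all assumptions made for this example:

```r
# Simulated stand-in for a probes-by-samples matrix of normalized
# log2 intensities for one gene.
set.seed(1)
nProbes <- 25
sampleNames <- paste0("sample", 1:4)
y <- matrix(rnorm(nProbes * length(sampleNames), mean = 8, sd = 1),
            nrow = nProbes, dimnames = list(NULL, sampleNames))
y[10, "sample3"] <- 14  # an artificial problem probe in one sample

# One line per sample (column), one point per probe (row).
matplot(y, type = "b", pch = 1, lty = 1, col = seq_len(ncol(y)),
        xlab = "Probe (position within gene)", ylab = "log2 intensity")
legend("topleft", legend = colnames(y), col = seq_len(ncol(y)),
       lty = 1, bty = "n")

# Flag probes whose spread across samples is unusually large
# (the cutoff of 4 log2 units is arbitrary).
probeRange <- apply(y, 1, function(v) diff(range(v)))
flagged <- which(probeRange > 4)
print(flagged)
```

The same range-per-probe summary could be used to flag problem probes across many genes without looking at every plot.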


 Also is there an error reading for the probe level models? like a
 splice index or something?

I'm not sure exactly what you are after here.  Are you looking for  
standard errors of the probe/chip effects?  That information is not  
stored, although it is calculated from the underlying 'preprocessCore'  
package.

If you are looking at doing something along the lines of differential  
splicing with Gene 1.0ST arrays, you might be interested in this  
(apologies for the shameless self-promotion):

http://www.biomedcentral.com/1471-2105/10/156


Cheers,
Mark



 Thanks,
 Cathy
 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: Mat implementation - comparing MAT (pure) vs MAT aroma.affymetrix

2009-09-01 Thread Mark Robinson

Hi Lavinia.

I'm hoping that most of this is explained in the MAT Smoothing  
section at:
http://groups.google.com/group/aroma-affymetrix/web/promoter-tiling-array

In your case, the IPs would be Treatment (you would make + numbers in  
the design matrix) and Inputs would be Control (- numbers in the  
design matrix).

As an example, if your MAT tag file had samples/type:

ABCDEF
000111

you would specify your design matrix something like:

> design <- matrix( rep(c(-1,1),each=3), nc=1,
+ dimnames=list( toupper(letters[1:6]), "A+B+C-D-E-F") )
> design
  A+B+C-D-E-F
A          -1
B          -1
C          -1
D           1
E           1
F           1

Of course, the rows of your design matrix must match the order of the  
files from your AffymetrixCelSet.
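
To make the ordering point concrete, here is a small base-R sketch that builds the design in the order of the arrays rather than the order of the tag file. The shuffled arrayNames vector is hypothetical (in practice you would take the names from your own CEL set, e.g. via getNames(cs), which I'm assuming returns them in file order), and the column label is arbitrary:

```r
# Sample-to-group labels as in the example (0 = Input/Control, 1 = IP/Treatment).
type <- c(A = 0, B = 0, C = 0, D = 1, E = 1, F = 1)

# Hypothetical array order as it might come back from the CEL set;
# deliberately shuffled here.
arrayNames <- c("D", "A", "E", "B", "F", "C")

# Build the one-column design in the order of the arrays.
design <- matrix(ifelse(type[arrayNames] == 1, 1, -1), ncol = 1,
                 dimnames = list(arrayNames, "IP-Input"))
print(design)

# Guard against a silent mismatch before passing the design on.
stopifnot(identical(rownames(design), arrayNames))
```

Indexing by name like this avoids relying on the files happening to sort the same way as the tag file.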

Hope that helps.

Cheers,
Mark


On 1-Sep-09, at 2:47 PM, Lavinia wrote:


 Thanks Mark, very helpful.
 Sorry, one other question.
 With MAT (pure), you group your controls + inputs, e.g.
 Treatment (1) or Control (0) groups = 111000 for 3ChIP and 3Input
 How is this best done with MAT aroma.affymetrix (in the contrast
 matrix)?, it isn't immediately clear to me from the example how input
 and IP are treated?

 many thanks

 Lavinia.

 On Sep 1, 1:24 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Lavinia.

 Yes.

 Bandwidth(MAT)=probeWindow(aroma.affymetrix MAT)

 Also,

 MinProbe(MAT)=nProbes(aroma.affymetrix MAT)

 Cheers,
 Mark

 On 1-Sep-09, at 11:31 AM, Lavinia Gordon wrote:



 Hi,

 I have some older MAT pure results that I'd like to compare to  
 newer
 MAT aroma.affymetrix results.  Can I just check, from the MAT .tag
 (parameters file), does probeWindow correspond directly to  
 Bandwidth?

 many thanks

 Lavinia.

 --
 Mark Robinson, PhD (Melb)
 Epigenetics Laboratory, Garvan
 Bioinformatics Division, WEHI
 e: m.robin...@garvan.org.au
 e: mrobin...@wehi.edu.au
 p: +61 (0)3 9345 2628
 f: +61 (0)3 9347 0852
 --
 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: Mat implementation - comparing MAT (pure) vs MAT aroma.affymetrix

2009-08-31 Thread Mark Robinson

Hi Lavinia.

Yes.

Bandwidth(MAT)=probeWindow(aroma.affymetrix MAT)

Also,

MinProbe(MAT)=nProbes(aroma.affymetrix MAT)

Cheers,
Mark




On 1-Sep-09, at 11:31 AM, Lavinia Gordon wrote:


 Hi,

 I have some older MAT pure results that I'd like to compare to newer
 MAT aroma.affymetrix results.  Can I just check, from the MAT .tag
 (parameters file), does probeWindow correspond directly to Bandwidth?

 many thanks

 Lavinia.
 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: Discussion on gene-1-0-st-array-analysis

2009-08-30 Thread Mark Robinson

Hi Diya.

Just to add my 2 cents to this, although Mathieu has pretty much  
covered it.

There are no online tutorials (that I know of) for exactly what you  
want to do, but your proposed analysis is very standard.  lmFit() of  
limma can work directly on your matrix of logged data.  Consult the  
limma documentation (e.g. the limma user's guide) on how to do this.
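
For orientation only, a sketch of what the limma step might look like, assuming a simple design of 2 control and 2 treatment arrays. The matrix x is simulated and stands in for the matrix you extracted; the group labels and the spiked-in change are invented. The limma calls are skipped if the package is not installed:

```r
# Simulated stand-in for a genes-by-samples matrix on the intensity scale.
set.seed(42)
nGenes <- 200
group <- factor(c("control", "control", "treatment", "treatment"))
x <- matrix(2^rnorm(nGenes * 4, mean = 8, sd = 0.3), nrow = nGenes,
            dimnames = list(paste0("gene", seq_len(nGenes)), paste0("s", 1:4)))
# Spike in a 4-fold change for the first 10 genes in the treatment arrays.
x[1:10, group == "treatment"] <- x[1:10, group == "treatment"] * 4

y <- log2(x)                     # limma expects logged data
design <- model.matrix(~ group)  # intercept + treatment-vs-control column
print(design)

# limma is a Bioconductor package; skip the fit if it is not available.
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(y, design)
  fit <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = "grouptreatment", number = 10))
}
```

With only two arrays per group the moderated statistics lean heavily on the variance prior, so treat the resulting ranking with some caution.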

Cheers,
Mark



On 29-Aug-09, at 12:13 AM, Diya v wrote:

 Hi

 I have 2 control and 2 treatment groups of MoGene-1_0-st. I have the
 data normalized and a data matrix after fit(plm) is performed.

 I want to do statistical analysis for differentially expressed genes.

 Can I take the data matrix generated from aroma.affymetrix and do the
 analysis with limma?

 Is there any online tutorial for this?

 Thanks,
 Diya

 --- On Fri, 28/8/09, Mathieu Parent parent.math...@gmail.com wrote:

 From: Mathieu Parent parent.math...@gmail.com
 Subject: [aroma.affymetrix] Re: Discussion on gene-1-0-st-array- 
 analysis
 To: aroma-affymetrix@googlegroups.com
 Date: Friday, 28 August, 2009, 5:58 PM

 Hi,

 The way it has been proposed to me is to extract the matrix from
 the normalised and summarised data, log it, and pass it into the limma
 package for differential expression analysis.

 What is your experimental design?

 Math
 McGill University

 On Thu, Aug 27, 2009 at 3:46 PM, Diya Vaka biotechd...@gmail.com  
 wrote:

 Hello All,

 I want to know about the up- and down-regulated genes. So how am I
 supposed to proceed after this step?

 Diya






 


--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: installation fails

2009-08-13 Thread Mark Robinson

Hey Henrik.

If I run those commands, they work for me, but only because I already have
digest, etc. installed.  In order for it to give an error on my R 2.9.1,
I've removed a few packages (e.g. digest) from my default install
directory (/Users/mrobinson/Library/R/2.9/library/), so that it requires
an installation of those dependencies.  The error for me appears to be
related to not finding the digest package.  See below.

Does that help?
Mark

---
 source("http://www.braju.com/R/hbLite.R");
 hbLite("R.filesets");
Using repository: http://www.braju.com/R/repos
Identified packages to be processed: utils, R.methodsS3, methods, R.oo,
R.utils, digest, R.filesets
Installing external packages...
 Packages:  utils, methods, digest
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
Updating packages: utils, methods, digest
 01/03. utils: not available.
 02/03. methods: not available.
 03/03. digest: missing. Installing:
simpleError in .find_bundles(available): subscript out of bounds
Installing external packages...done
Installing braju.com packages...
 Packages:  R.methodsS3, R.oo, R.utils, R.filesets
Detected R option pkgType=mac.binary, which is not available. Enforcing
installation from source instead for packages: R.methodsS3, R.oo, R.utils,
R.filesets
Updating packages: R.methodsS3, R.oo, R.utils, R.filesets
 01/04. R.methodsS3: v1.0.3, i.e. up to date.
 02/04. R.oo: v1.4.8, i.e. up to date.
 03/04. R.utils: v1.1.7, i.e. up to date.
 04/04. R.filesets: missing. Installing:
Warning: dependency ‘digest’ is not available
trying URL 'http://www.braju.com/R/repos/R.filesets_0.5.3.tar.gz'
Content type 'application/x-tar' length 35408 bytes (34 Kb)
opened URL
==
downloaded 34 Kb

* Installing *source* package ‘R.filesets’ ...
** R
** inst
** preparing package for lazy loading
R.methodsS3 v1.0.3 (2008-07-02) successfully loaded. See ?R.methodsS3 for
help.
R.oo v1.4.8 (2009-05-18) successfully loaded. See ?R.oo for help.
R.utils v1.1.7 (2009-05-30) successfully loaded. See ?R.utils for help.
Error : package 'digest' required by 'R.filesets' could not be found
ERROR: lazy loading failed for package ‘R.filesets’
* Removing ‘/Users/mrobinson/Library/R/2.9/library/R.filesets’

The downloaded packages are in

‘/private/var/folders/8T/8TpdlpGXGzyAaNQm0Wp-zk++mHo/-Tmp-/Rtmp7MMjod/downloaded_packages’
Installing braju.com packages...done
Warning messages:
1: In hbLite(R.filesets) :
  Detected R option pkgType=mac.binary, which is not available.
Enforcing installation from source instead for packages: R.methodsS3,
R.oo, R.utils, R.filesets
2: In install.packages(pkg, lib = lib, ..., available = available) :
  installation of package 'R.filesets' had non-zero exit status
 sessionInfo()
R version 2.9.1 (2009-06-26)
i386-apple-darwin8.11.1

locale:
en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tcltk_2.9.1 tools_2.9.1
---


 Hi,

 this seems to be an OSX issue and I cannot reproduce it myself.  From
 the error message:

   simpleError: invalid version specification 1.1.3NA

 it looks like there are some parsing errors when parsing version
 numbers in the PACKAGES file on the repository server, but I'm not sure.

 See if this is only a problem with the aroma.affymetrix package or
 with all packages. Can you install individual packages from the
 braju.com server by:

 source("http://www.braju.com/R/hbLite.R");
 hbLite("R.filesets");
 library("R.filesets");

 Q1) Does this work?

 hbLite("aroma.core");
 library("aroma.core");

 Q2) Does this work?

 hbLite("aroma.affymetrix");
 library("aroma.affymetrix");

 Q3) Does this work?

 Mark (Robinson), you mentioned this problem a few weeks ago. Can you
 help me troubleshoot this one?

 /Henrik

 2009/7/31 mbaudis mbau...@gmail.com:

 Dear Henrik,

 currently, I have an installation problem after upgrading to R 2.9.1
 (Mac OS X 10.5.7):

 ...
 everything fine up to here
 ...

 Installing/updating: CRAN:aroma.core (= 1.1.0)
 Repositories: CRAN
 Package: aroma.core (= 1.1.0)
 Tags:
 Updating packages: aroma.core from repository 'DEFAULT'
  01/01. aroma.core: not available.
 Installing/updating: CRAN:matrixStats (= 0.1.4)
 Repositories: CRAN
 Package: matrixStats (= 0.1.4)
 Tags:
 Updating packages: matrixStats from repository 'DEFAULT'
  01/01. matrixStats: v0.1.6, i.e. up to date.
 Installing/updating: CRAN:RColorBrewer
 Repositories: CRAN
 Package: RColorBrewer
 Tags:
 Updating packages: RColorBrewer from repository 'DEFAULT'
  01/01. RColorBrewer: v1.0-2, i.e. up to date.
 Installing/updating: BIOC:aroma.light (= 1.12.2)
 Repositories: BIOC
 Package: aroma.light (= 1.12.2)
 Tags:
 Package up to date: aroma.light (= 1.12.2)
 Installing/updating: BIOC:affxparser (= 1.13.8)
 Repositories: BIOC
 Package

[aroma.affymetrix] Re: Re-run aroma.affymetrix

2009-08-09 Thread Mark Robinson

Hi Anbarasu.

Comments below.

 Hi Mark,

 Thanks for your suggestions. What I have tried so far is: I removed all
 outlier CEL files from rawData and re-ran the analysis. I was expecting
 slightly different intensity distributions of the chips (due to quantile
 normalization), but it seems I have the same distributions that I got
 with all chips, including outliers.

Amongst many chips, I would guess that removing a handful would have very
little effect on the overall distribution that each sample is
quantile-normalized to.  So, this doesn't surprise me.  Also, be sure that
you run the fit() and process() with force=TRUE, otherwise the code *may*
be going directly to cached results, regardless of your removal of files.
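
To give a feel for the size of the effect, here is a simulated base-R sketch. It uses the textbook quantile-normalization target (the mean of the sorted columns), which I'm assuming is close enough to what happens in practice for this purpose; the 120 arrays, 5 outliers and the shift of 1 log2 unit are invented numbers:

```r
set.seed(7)
nProbes <- 1000
nArrays <- 120
outliers <- 1:5

# Simulated log2 intensities; the outlier arrays are shifted upwards by 1.
y <- matrix(rnorm(nProbes * nArrays, mean = 8, sd = 1), nrow = nProbes)
y[, outliers] <- y[, outliers] + 1

# Quantile-normalization target: the mean of the sorted columns.
targetDistribution <- function(m) rowMeans(apply(m, 2, sort))

targetAll <- targetDistribution(y)
targetKept <- targetDistribution(y[, -outliers])

# Largest change in any quantile of the target after dropping the outliers.
maxShift <- max(abs(targetAll - targetKept))
print(maxShift)
```

Because each array contributes only 1/120 of the target, dropping a few arrays moves the target by a small fraction of the outliers' own shift.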


 I will try what you have suggested.  Do I need to use extract() for
 subsetting before or after normalization?  Are we ignoring the effect of
 these outlier chips in the normalization step (if I have to use extract()
 after normalization)?


I would do it after.  And yes, this ignores the effects of outlier chips,
which I suspect is minimal over a big dataset.

Cheers,
Mark


 Thanks again.

 Kind regards,
 Anbarasu

 On Thu, Aug 6, 2009 at 10:46 PM, Mark Robinson
 mrobin...@wehi.edu.au wrote:

 Hi Anbarasu.
 No, you don't have to remove all the files.  What you can do is use
 extract() to extract the files that you are interested in, create a
 new AffymetrixCelSet, and fit the probe-level models only on those
 samples.  You do need to be careful, though, and I suggest you use *tags*
 so that the output results are sent to a different location on disk.
 Here is an example:
 [...] # preprocessing as before
 csN1 <- extract(csN, 1:12)  # take a subset
 plmTr <- ExonRmaPlm(csN1, mergeGroups=TRUE, tag="*,subsetmerged")  # add a tag
 fit(plmTr, verbose=verbose)  # fit as normal
 Mark
 On 04/08/2009, at 8:22 PM, anbarasu wrote:
 
  Dear All,
 
  I was able to run the human exon array analysis with 120 chips. I
  have identified a few outlier chips and would like to re-run the
  analysis without these outliers. Do I need to remove all files (in
  plmData, probeData, and reports) that are created by
  aroma.affymetrix?
 
  Thanks in advance.
 
  Kind regards,
  Anbarasu
  
 --
 Mark Robinson, PhD (Melb)
 Epigenetics Laboratory, Garvan
 Bioinformatics Division, WEHI
 e: m.robin...@garvan.org.au
 e: mrobin...@wehi.edu.au
 p: +61 (0)3 9345 2628
 f: +61 (0)3 9347 0852
 --
 

 









[aroma.affymetrix] Re: RMA of aroma.affymetrix, affyPLM and affy

2009-08-09 Thread Mark Robinson

Hi Yiwen.

I think this thread will answer your question in detail:
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/1b0ab11fad9b4df3

In brief, the main difference is how the probe-level linear model is fit
-- median polish OR iteratively reweighted least squares with a specified
influence function.

Note that in aroma.affymetrix, you have the 'flavor' argument in the
RmaPlm and ExonRmaPlm objects.
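
As a rough illustration of that difference, here is a minimal sketch on simulated data. It is not taken from the internals of affy, affyPLM or aroma.affymetrix. It fits the same additive probe + array model to one probeset, once by median polish (stats::medpolish) and once by iteratively reweighted least squares with a Huber influence function (MASS::rlm). The matrix y, the planted outlier and all variable names are assumptions made up for the example.

```r
library(MASS)  # provides rlm(); MASS ships with standard R installations

# Simulated log2 intensities for one probeset: 11 probes x 6 arrays (hypothetical).
set.seed(1)
nProbes <- 11
nArrays <- 6
probeEffect <- rnorm(nProbes)
chipEffect <- seq(8, 10.5, by = 0.5)
y <- outer(probeEffect, chipEffect, "+") +
  matrix(rnorm(nProbes * nArrays, sd = 0.1), nrow = nProbes)
y[3, 2] <- y[3, 2] + 4  # one outlying probe on array 2

# (a) Median polish: chip effect = overall term + column effect.
mp <- medpolish(y, trace.iter = FALSE)
thetaMP <- mp$overall + mp$col

# (b) IRLS with a Huber influence function on the same additive model.
# as.vector(y) runs down columns, so arrays vary slowest.
d <- data.frame(
  y = as.vector(y),
  probe = factor(rep(seq_len(nProbes), times = nArrays)),
  array = factor(rep(seq_len(nArrays), each = nProbes))
)
fit <- rlm(y ~ 0 + array + probe, data = d, psi = psi.huber, maxit = 50,
           contrasts = list(probe = "contr.sum"))
thetaIRLS <- unname(coef(fit)[paste0("array", seq_len(nArrays))])

# The two fits constrain the probe effects differently (median zero versus
# sum zero), so compare the chip effects after centering.
print(round(cbind(truth = chipEffect - mean(chipEffect),
                  medianPolish = thetaMP - mean(thetaMP),
                  IRLS = thetaIRLS - mean(thetaIRLS)), 3))
```

On real data the two estimators can disagree more than in this toy case, and the background and normalization steps may differ between packages as well.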

Cheers,
Mark



 Hi,

 Following the "Reproducibility of other implementations / Replication
 test: RMA (background, normalization & summarization)" section in the
 aroma.affymetrix online documentation, I tried to compare the
 RMA summaries of the gene expression index generated by
 aroma.affymetrix, affyPLM and affy for a public dataset I am studying.

 I found that the RMA summaries generated by aroma.affymetrix and affyPLM
 (fitPLM) are highly consistent, while
 the values from aroma.affymetrix/affyPLM and affy (rma function) are
 quite different (Pearson correlation is only about 0.97) and there is
 a significant deviation from a straight line in the scatter plot. I was
 wondering what the main cause of the discrepancy is between RMA as
 calculated by aroma.affymetrix/affyPLM and by affy (rma function).


 Thanks  a lot.

 Yiwen Chen
 







[aroma.affymetrix] Re: Re-run aroma.affymetrix

2009-08-06 Thread Mark Robinson

Hi Anbarasu.

No, you don't have to remove all the files.  What you can do is use
extract() to extract the files that you are interested in, create
a new AffymetrixCelSet, and fit the probe-level models only on those
samples.  You do need to be careful, though, and I suggest you use
*tags* so that the output results are sent to a different location on
disk.  Here is an example:

[...] # preprocessing as before
csN1 <- extract(csN, 1:12)  # take a subset
plmTr <- ExonRmaPlm(csN1, mergeGroups=TRUE, tag="*,subsetmerged")  # add a tag
fit(plmTr, verbose=verbose)  # fit as normal

Hope that helps.
Mark

On 04/08/2009, at 8:22 PM, anbarasu wrote:


 Dear All,

 I was able to run the human exon array analysis with 120 chips. I have
 identified a few outlier chips and would like to re-run the analysis
 without these outliers. Do I need to remove all files (in
 plmData, probeData, and reports) that are created by aroma.affymetrix?

 Thanks in advance.

 Kind regards,
 Anbarasu
 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: MedianNormalization

2009-07-31 Thread Mark Robinson

Hi Cathrine.

A few comments below.

On 30/07/2009, at 2:33 AM, Cathy Mitchell wrote:

 To whom it may concern,

 Is there a way of median normalising across your arrays in  
 aroma.affymetrix or can you only quantile normalise?
 Is there a way of finding out all the other methods that are  
 available in aroma.affymetrix as the only information I've been able  
 to find is through the google groups.

There is a ScaleNormalization, but it appears to be specific to SNP  
chips.  It should be quite easy to implement, if it is not there.   
Maybe Henrik can comment on that one.

As for getting help, you can call:

help.start()

... and then find the aroma.affymetrix package and a bunch of documents.


 Is there a way to quantile normalise between replicate arrays only  
 instead of all of the arrays?
 How does one pull out a single array from the cel set?

Yes, you can do this with extract().  For example, say 'cs' is an  
AffymetrixCelSet object.  You can extract the first 2 samples with:

csSubset <- extract(cs, 1:2)

Hope that helps.
Mark




 Thank you very much.

 -- 
 Cathrine Mitchell


 


--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: Differential expression analysis

2009-07-31 Thread Mark Robinson

Bonjour Mathieu.

A few comments below.

On 30/07/2009, at 4:35 AM, Mathieu wrote:


 Salut !!

 I am analysing rat exon arrays from Affymetrix with aroma.affymetrix
 (sessionInfo() below).

 What I want, is a differential expression analysis between my two
 groups.

 Is extracting the matrix of plmTr and pass a log2() of that to LIMMA
 the right thing to do ? I am asking because it seems to be what people
 are doing to do such analysis here.

Yep.  That seems like a reasonably 'standard' thing to do.

 But by doing so, aren't we losing
 some analysis power by losing the different statistics involved by the
 fact that all those probes in a transcript may have different
 behaviors ? and all those of an exon ?

 It was suggested to me that I run LIMMA on all the probes of the array,
 get the t statistics out of it and then use a Wilcoxon rank test to compare
 each unit to the distribution of all the t statistics... That's
 feasible for me but will involve a lot of head scratching :)

You *could* do any number of things.  But, you would have to justify  
all these steps and demonstrate that it does better than the  
standard methods.  That is generally difficult to do.


 Honestly, I am not a very good statistician (yet!) and a beginner
 programmer, and I want a safe way to get the differential expression
 between my two groups, done using good statistics.

Seems like a good starting point would be RMA (or GCRMA) at the gene- 
level and a limma analysis afterwards.



 Second question. I don't get anything out of my analysis if I use a
 FDR correction. I thought that would be ok if I used the core cdf
 only, but it seems to not be the case. Nothing is significant between
 my two groups.. :/
 Is the following the right thing to do ?
 
 method <- "fdr"
 pval <- 0.05
 lfc <- 1   # log2(2)
 results <- decideTests(fit.eb, adjust.method=method, p.value=pval,
 lfc=lfc)
 

This all seems pretty standard.  As for the fact that nothing
is significant, that can be due to a number of things: data quality,
small sample size, or true differences that are very small.
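
To see why an FDR adjustment combined with a fold-change cutoff can leave nothing standing, here is a minimal base-R sketch. It does not call limma; it only mimics the two filters with p.adjust(). The p-values and log fold changes are simulated under "no true differences", and all variable names are made up for the example.

```r
# Hypothetical per-gene results: raw p-values and log2 fold changes.
set.seed(42)
nGenes <- 10000
pRaw <- runif(nGenes)              # no true differences: p-values are uniform
logFC <- rnorm(nGenes, sd = 0.3)   # small, noise-only fold changes

# "fdr" is an alias for the Benjamini-Hochberg method ("BH") in p.adjust().
pAdj <- p.adjust(pRaw, method = "fdr")

nRaw <- sum(pRaw < 0.05)                     # roughly 5% pass by chance alone
nAdj <- sum(pAdj < 0.05 & abs(logFC) >= 1)   # FDR control plus a 2-fold cutoff
cat("raw p < 0.05:", nRaw, "\n")
cat("adjusted p < 0.05 and |logFC| >= 1:", nAdj, "\n")
```

With few replicates the raw p-values of truly changed genes are often not small enough to survive the adjustment, so an empty result does not by itself mean the code is wrong.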

Hope that helps.

Cheers,
Mark




 Thanks a lot in advance ! If needed, I could forward the whole code
 but it's basically the user case example of the Human exon array.

 Merci,
 Mathieu
 McGill University
 *
 sessionInfo()
 R version 2.8.1 (2008-12-22)
 x86_64-unknown-linux-gnu

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
  [1] plotrix_2.5-2          limma_2.16.4           aroma.affymetrix_1.1.1
  [4] aroma.apd_0.1.6        affxparser_1.14.2      R.huge_0.1.8
  [7] aroma.core_1.1.2       aroma.light_1.12.2     matrixStats_0.1.6
 [10] R.rsp_0.3.4            R.filesets_0.5.2       digest_0.3.1
 [13] R.cache_0.1.7          R.utils_1.1.7          R.oo_1.4.8
 [16] EBImage_2.6.0          R.methodsS3_1.0.3

 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: CDF creation

2009-07-24 Thread Mark Robinson

Hi Naresh.

That particular script (flat2Cdf.R) creates a PM-only CDF.

But, if you understand the format of a CDF file -- you might read in
your original CDF with readCdf() from the 'affxparser' package to
understand it -- then you should be able to modify the script to include
MM probes.

Keep in mind that we make this script available to explain what we have
done in the past, not as a cure-all for CDF creation.

Cheers,
Mark


 Hello Group,

 I created a custom CDF for the HG-U133_Plus_2 array by redefining probesets
 (mapping the original probes to exons and treating the probes that map to
 the same exon as a new probeset). I downloaded the original CDF for this
 array from the Affy website. With the original CDF, I am able to get both
 PM and MM values when I access a particular probe, but with my customized
 CDF I get only PM values for a probe, not MM values. So could you please
 confirm whether this customized CDF does not contain MM values, or whether
 I am making a mistake in creating the CDF?

 If there is a mistake please suggest me some way to rectify it.
 I created CDF using
 Source :
 http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch

 Thanks and Regards
 Naresh P

 









[aroma.affymetrix] Re: Discussion on using-the-genomegraphs-package-with-firma

2009-07-16 Thread Mark Robinson

Hi John.

See below.


On 16/07/2009, at 9:30 AM, JFP wrote:

 Hi Mark,
 Thanks for making our code available, its a great help.  I have an
 error reproducing part of it which is bothering me.
 cdf
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/HuEx-1_0-st-v2
 Filename: HuEx-1_0-st-v2,mainR3,A20071112,EP.cdf
 Filesize: 207.11MB
 Chip type: HuEx-1_0-st-v2,mainR3,A20071112,EP
 RAM: 10.76MB
 File format: v4 (binary; XDA)
 Dimension: 2560x2560
 Number of cells: 6553600
 Number of units: 312355
 Cells per unit: 20.98
 Number of QC units: 1
 u <- indexOf(cdf,3400034)
 ugcM <- getUnitGroupCellMap(getCdf(ds), units=u, retNames=TRUE)
 ind <- getCellIndices(cdf,units=ugcM$cell,verbose=verbose,useNames=FALSE,unlist=TRUE)

Thanks for pointing this out.  You'll want to change this line above to:

ind <- getCellIndices(cdf,units=u,verbose=verbose,useNames=FALSE,unlist=TRUE)

Or, perhaps more elegantly:

ind <- ugcM$cell

Not sure how that got in there.  I've fixed the docs online.

 BTW (for fellow windows sufferers)
 The way I generated the small probeset file was with a perl script
 (one free implementation is with Activestate perl and UnxUtils). Here
 is the script:
 #!C:/Perl/bin/perl
 while(<>){
   next if(/^#/);
   chomp;
   s/"//g;
   @sl = split(/,/,$_);
   for $i (0..7) {print $sl[$i],",";}
   print $sl[8],"\n";
 }

 save it to foo.pl then (say using the unix.sh from UnxUtils) go
 perl foo.pl HuEx-1_0-st-v2.na28.hg18.probeset.csv


For those that suffer by using windows, you could alternatively  
install the cygwin tools and then have access to the standard battery  
of command line tools, such as grep/awk from the example.

Cheers,
Mark


 regards,
 John

 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: Calling enriched regions in tiling array experiments

2009-07-06 Thread Mark Robinson

Hi Lars.

Apologies for the slow response.

I do have some scripts for calling enriched regions, but they are not
really ready for public consumption.  And, they are geared more towards
ChIP-chip of modified histones / DNA methylation than either TF binding or
transcript expression.  We have implemented something similar to MAT's
region calling procedure that is hopefully a bit more flexible.

 Secondly, do you plan to implement further functions for tiling arrays, in
 particular for transcript discovery (which is  similar to identifying
 chip-chip regions but involves discrete steps between signal and no-signal
 regions) and for detection of differential splicing?

My work with Affy tiling arrays is ongoing so we are planning to implement
more things within aroma.affymetrix.  However, the development is slow
since at this stage, it is just me working on it.  I encourage you to
contribute a previously proposed method or even a new one.

This makes me think of segmentation, for which aroma.affymetrix does have
implementations (e.g. CbsModel), though more in the context of copy number
data.  Maybe you can use those routines.

Cheers,
Mark



 Dear Aroma team,

 I enjoyed using the MAT implementation in aroma, but now I wonder how to
 best proceed for calling of enriched regions? Are there any functions /
 scripts available?

 Secondly, do you plan to implement further functions for tiling arrays, in
 particular for transcript discovery (which is  similar to identifying
 chip-chip regions but involves discrete steps between signal and no-signal
 regions) and for detection of differential splicing?

 Thank you very much for your help.

 Best wishes,

 Lars




 







[aroma.affymetrix] Re: Gene-Level Summarization of Expression Data

2009-06-20 Thread Mark Robinson

Hi Steve.

I don't know how common this is.  Basically, a colleague found a gene  
that was very differentially expressed when analyzing using the  
Affymetrix probesets definition and found virtually nothing when using  
the custom CDF that bundles all the probes for a gene together.  The  
reason was simple.  There were several probesets designed for this  
gene and presumably they measure different isoforms.  The probes for  
the DE probeset showed the difference, but all the other probesets  
didn't.  When you use a robust linear model like RMA, outliers get  
downweighted.  Because the DE probes accounted for a small proportion
of the probes (I think there were 3 or 4 other probesets at this
locus), their effect got washed out.

So, it's a tradeoff.  Sometimes (perhaps most of the time) you gain by
lumping them all together ... more information, more power to detect  
changes.  But, sometimes (perhaps rarely) it can mislead.  I'm sure  
I'm not the only one to observe such things.  The probe-level data  
(usually?) doesn't lie.  But, since you are comparing across  
platforms, you will undoubtedly find this as you go along.  Different  
microarray designs often measure slightly different things.
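
A small simulation makes the washing-out effect visible. This is only a sketch under stated assumptions: median polish (stats::medpolish) stands in for the robust probe-level fit, the data are simulated, and the counts (4 probes in the DE probeset, 16 probes in the other probesets, 8 arrays) are made up for the example.

```r
set.seed(7)
nArrays <- 8
group <- rep(c("ctrl", "trt"), each = nArrays / 2)
nDE <- 4       # probes in the probeset that sees the change
nOther <- 16   # probes in the other probesets at the same locus
nProbes <- nDE + nOther

# Log2 intensities: baseline + probe effect + noise (all hypothetical).
probeEffect <- rnorm(nProbes, sd = 0.5)
y <- 8 + matrix(probeEffect, nrow = nProbes, ncol = nArrays) +
  matrix(rnorm(nProbes * nArrays, sd = 0.1), nrow = nProbes)

# Only the DE probeset responds: a 4-fold change (2 on the log2 scale).
y[1:nDE, group == "trt"] <- y[1:nDE, group == "trt"] + 2

# Chip effects from median polish: overall term + column effect.
chipEffects <- function(m) {
  mp <- medpolish(m, trace.iter = FALSE)
  mp$overall + mp$col
}
groupDiff <- function(theta) {
  mean(theta[group == "trt"]) - mean(theta[group == "ctrl"])
}

diffProbeset <- groupDiff(chipEffects(y[1:nDE, ]))  # DE probeset on its own
diffGene <- groupDiff(chipEffects(y))               # all 20 probes pooled
cat("probeset-level log2 difference:", round(diffProbeset, 2), "\n")
cat("gene-level log2 difference:    ", round(diffGene, 2), "\n")
```

The probeset-level summary recovers a difference close to 2, while the pooled gene-level summary stays close to 0, because the robust fit treats the four responding probes as outliers.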

One other thing.  If your CDF is not already in binary format, be sure
to convert it using affxparser's convertCdf().  Having this info stored
in binary format will make the processing much faster.  I think the MBNI
custom CDFs are text.

Cheers,
Mark


On 20/06/2009, at 6:55 AM, Steve P wrote:

 Mark,

 Thanks for the information. That is very helpful.

 I want to do the latter, which is to combine probesets such that all
 probes for a given gene (by some definition -- RefSeq, Ensembl, etc)
 are used to arrive at the summarized value.

 I was able to obtain a custom CDF for the U133-A array. So I will try
 that approach. But part of the reason I want to do this is to be able
 to compare values across platforms, so I may need to find/build a
 custom CDF for the other platform.

 I would appreciate any cautionary advice you have about summarizing at
 the gene level.

 Regards,
 -Steve

 On Jun 17, 9:56 am, Steve Piccolo steve.picc...@gmail.com wrote:
 Yesterday I posted this question to the list, but the spam blocker  
 didn't
 let it through. Below my question is a response from Mark Robinson.

 --- 
 ---

 Following the example provided at
 http://groups.google.com/group/aroma-affymetrix/web/gene-1-0-st-array ...,
 I am running the following code:

 chipType <- "HT_HG-U133A"
 dataSet = "myData"

 library(aroma.affymetrix)
 verbose <- Arguments$getVerbose(-8, timestamp=TRUE)

 cdf <- AffymetrixCdfFile$byChipType(chipType)
 cs <- AffymetrixCelSet$byName(dataSet, cdf=cdf)

 bc <- RmaBackgroundCorrection(cs)
 csBC <- process(bc,verbose=verbose)
 qn <- QuantileNormalization(csBC)
 csN <- process(qn, verbose=verbose)

 plm <- RmaPlm(csN)
 fit(plm, verbose=verbose)

 ces <- getChipEffectSet(plm)
 gExprs <- extractDataFrame(ces, units=NULL, addNames=TRUE)

 This seems to be working beautifully.

 However, I'm doing an analysis that requires my expression values to
 be summarized at the gene level rather than the probeset level.

 In the gExprs object that results from the above analysis, I get a
 data.frame object in which each row contains expression values for a
 given probeset across all samples. What I would love to see in each
 row is an expression value for a given gene. I believe RMA has the
 ability to do this, but I'm not sure how to do it via
 aroma.affymetrix.

 Any suggestions? I'm happy to provide any more details that would be
 helpful.

 Regards,
 -Steve

 --- 
 ---

 Hi Steve.

 As to your question, it depends on what you need.  When you say you want
 every row to be a gene, do you just want to know the gene name that goes
 with the probeset identifier, or do you want to combine probesets such that
 all probes for a given gene (by some definition -- RefSeq, Ensembl, etc) are
 used to arrive at the summarized value (a la the MBNI CustomCDF)?

 If the former, then there are annotation packages within R.

 If the latter, I have a few cautionary tales of doing this, since the
 different probesets for a given locus can be measuring different variants.
 But if you still want to do this, we need to make a CDF file specific to
 the annotation you want.  For the standard HG-U133 arrays, I know the MBNI
 guys made the CDFs and we could use those within aroma.affymetrix.  I don't
 know if they build custom CDFs for the HT- arrays.

 Hope that gets you started.

 Cheers,
 Mark
 --
 Mark Robinson, PhD (Melb)
 Epigenetics Laboratory, Garvan
 Bioinformatics Division, WEHI
 e: m.robin...@garvan.org.au
 e: mrobin...@wehi.edu.au
 p: +61 (0)3 9345 2628

[aroma.affymetrix] Re: FIRMA score for each transcript

2009-06-17 Thread Mark Robinson


Hi Libing.

Doesn't 'addNames=TRUE' already do this for you?


  fs1 <- extractDataFrame(fs, units=1:2, addNames=TRUE)
  head(fs1[,1:6])
  unitName groupName unit group cell huex_wta_breast_A
1  2315251   2315252    1     1    1         1.1150999
2  2315251   2315253    1     2    2         0.9551846
3  2315373   2315374    2     1    3         1.5354252
4  2315373   2315375    2     2    4         0.6288152
5  2315373   2315376    2     3    5         1.5658265
6  2315373   2315377    2     4    6         1.2131032
  fs2 <- extractDataFrame(fs, units=1:2, addNames=FALSE)
  head(fs2[,1:6])
  unit group cell huex_wta_breast_A huex_wta_breast_B huex_wta_breast_C
1    1     1    1         1.1150999         0.8552212         0.9177643
2    1     2    2         0.9551846         1.1747438         0.8580346
3    2     1    3         1.5354252         1.0427089         1.6461661
4    2     2    4         0.6288152         0.7053325         0.6999596
5    2     3    5         1.5658265         1.0576524         1.1404822
6    2     4    6         1.2131032         1.0494679         0.7729633

If not, please send your entire script and the output of sessionInfo().

Cheers,
Mark


On 18/06/2009, at 1:02 AM, Libing Wang wrote:

 Hi Mark,

 I am wondering if it is possible to get the actual unit  
 id(transcript cluster id) and group id(probeset id) for each firma  
 score instead of artificial number from 1 to whatever in the firma  
 score data frame.

 Thanks,

 Libing

 On Sat, Apr 11, 2009 at 5:48 PM, Mark Robinson  
 mrobin...@wehi.edu.au wrote:

 Hi Libing.

 As the error message suggests, there are no degrees of freedom for the
 fit, meaning you have no replicates.  It appears you only have 2 total
 samples, one for each group.  You wouldn't be able to use limma to do
 differential expression on any experiment with only 2 1-channel chips.

 If that is all the data you have, perhaps you are best off looking for
 large (positive or negative) values of the difference:

 fsdf <- extractDataFrame(fs, addNames=TRUE)
 fsdf[,6:ncol(fsdf)] <- log2(fsdf[,6:ncol(fsdf)])

 fsdf[,7] - fsdf[,6]  # B-A, assuming you've already taken logs

 Cheers,
 mark



  Hi Mark,
 
  I am trying to find differences of FIRMA scores between two chips and
  don't know what's wrong:
 
  cls <- c("A","B")
  mm <- model.matrix(~cls)
  Warning message:
  In model.matrix.default(~cls) : variable 'cls' converted to a factor
  fit <- lmFit(fsdf[,6:7], mm)
  Warning message:
  In lmFit(fsdf[, 6:7], mm) :
   Some coefficients not estimable: coefficient interpretation may vary.
  fit <- eBayes(fit)
  Error in ebayes(fit = fit, proportion = proportion, stdev.coef.lim =
  stdev.coef.lim) :
   No residual degrees of freedom in linear model fits
 
  Thanks,
 
  Libing
 
  On Tue, Apr 7, 2009 at 5:54 PM, Mark Robinson  
 mrobin...@wehi.edu.au
  wrote:
 
 
  Hi Libing.
 
  limma has quite an extensive user manual.  See link to it:
  http://www.bioconductor.org/packages/release/bioc/html/limma.html
 
  Your response still puzzles me.  You say your wording should've  
 been
  'splicing' not 'expression', but then you go on to say that you  
 want
  to do differential *expression* with limma.
 
  However, note that you can use limma on FIRMA scores as well, as
  discussed previously.  If that is what you are interested in, you
  might check the following thread:
 
  http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/36d8c59d742fc503/
 
  If you give a more detailed description of what it is you are  
 doing or
  want to do, I might be better able to help.
 
  Cheers,
  Mark
 
  On 08/04/2009, at 8:10 AM, Libing Wang wrote:
 
   Hi Mark,
  
   Thank you for your reply!
   Sorry for my wrong wording! It should be splicing not  
 expression.
  
   ... then you can use log2 of the chip effects here for an  
 analysis of
   differential expression with an appropriate design matrix with  
 limma.
   Is that what you are after?
  
   Yes, this is what I want. I think I need process Affymetrix  
 probeset
   file to correlate probesets and transcripts, then use limma to do
   the analysis. I am pretty new to limma, do you have any  
 suggestions?
  
   Thanks,
  
   Libing
  
   On Tue, Apr 7, 2009 at 4:35 PM, Mark Robinson
   mrobin...@wehi.edu.au wrote:
  
   Hi Libing.
  
  
   On 08/04/2009, at 1:42 AM, Libing Wang wrote:
  
Hi,
   
I am wondering if there is a way to compute a FIRMA score for  
 each
transcript. Currently I only have FIRMA score for each  
 probeset or
group. I did as follows:
   
1. plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
2. fit(plmTr)
3. firma <- FirmaModel(plmTr)
4. fit(firma)
5. fs <- getFirmaScores(firma)
  
   The short answer is that FIRMA scores are really a probeset-level
   statistic, not a gene/transcript-level statistic.  This is the
   recommended use of FIRMA.
  
  
 Or with the FIRMA score of each probeset, find out which
transcripts are differentially expressed

[aroma.affymetrix] Re: FIRMAGene

2009-06-11 Thread Mark Robinson

Hi Nick.

At present, FIRMAGene is not actually part of the aroma.affymetrix  
project, although it makes use of it.  So, I will reply to this off  
the aroma.affymetrix mailing list, except to say that FIRMAGene is now  
hosted by R-forge.  See the following link for details:

http://bioinf.wehi.edu.au/folders/firmagene/

When (and if) time permits, I plan to add FIRMAGene to  
aroma.affymetrix, so that it can share the same memory efficiency and  
mailing list support.

Cheers,
Mark


On 11/06/2009, at 10:31 PM, nmcgli...@googlemail.com wrote:


 Hello,

 I have two questions regarding FIRMAGene:

 1. The same as the first in this thread: using the code from sup3.r
 when I try to load the FIRMAGene library or execute the FIRMAGene
 command I get the following errors:

 library(FIRMAGene)
 Error in base::library(...) : there is no package called 'FIRMAGene'

 fg <- FIRMAGene(plm, idsToUse=u)
 Error: could not find function "FIRMAGene"

 I'm using aroma.affymetrix v1.1.0 with R2.9.0 on MacOSX 10.5.7

 2. I'm unsure of what this command is doing and how it needs to be
 changed to accommodate my own data:

 cls <- gsub("TisMap_","",gsub("_0[1-3]_v1_WTGene1","",getNames(cs)))

 Many thanks,

 Nick

 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--









[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

2009-06-11 Thread Mark Robinson

Hi Sabrina.

The Unit_ID can be any transcript cluster identifier of your  
choice.  The easiest may be to use the Affymetrix transcript cluster  
identifier itself ... available from:

http://www.affymetrix.com/analysis/downloads/current_exon/MoEx-1_0-st-v1.mm9.probeset.csv.zip

See the 'transcript_cluster_id' column.  Perhaps only take the core
probes, as defined in the 'level' column?

Note: we used Ensembl in that flat2Cdf() example since we were using a  
custom organization (i.e. non-Affy) of the probesets.
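
A minimal base-R sketch of that filtering step follows. The inline CSV is a tiny stand-in for the real annotation file: the column names 'probeset_id', 'transcript_cluster_id' and 'level' follow the Affymetrix file, but the rows (including the transcript cluster IDs) are hypothetical. For the real file you would pass the file name to read.csv() and most likely set comment.char = "#" to skip the header comments.

```r
# Tiny stand-in for the Affymetrix probeset annotation CSV (hypothetical rows).
annotationText <- "probeset_id,transcript_cluster_id,level
2315101,2315100,core
2315102,2315100,core
2315103,2315100,extended
2315374,2315373,core
2315375,2315373,full"
ann <- read.csv(text = annotationText, stringsAsFactors = FALSE)

# Keep only the core probesets.
core <- ann[ann$level == "core", ]

# Lookup from probeset (Group_ID) to transcript cluster (Unit_ID).
unitOfGroup <- setNames(core$transcript_cluster_id, core$probeset_id)
print(unitOfGroup)
```

The named vector unitOfGroup can then be indexed by probeset ID (as a character string) to fill in the Unit_ID column of a flat file.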

Cheers,
Mark


On 11/06/2009, at 10:58 PM, sabrina wrote:


 Hi, Mark:
 for the Unit_ID, does it have to be an Ensembl gene ID like ENSMUSG?
 Lots of genes do not have an Ensembl assignment in the Affy annotation
 file. There are lots of missing annotations, and I still have not found
 any good way to deal with it. Do you have any suggestions?

 Thanks

 Sabrina

 On Jun 10, 12:32 am, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Sabrina.

 How about you try and create a 'flat' file like the one described  
 at:http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file 
 ...

 Presumably, you will be comfortable with the Exon Array's 'probetab'
 file by now and possibly the Affymetrix annotation CSV file and so  
 you
 should have access to all this information.

 For example, from the following table:

 mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
 Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
 494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
 1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACACCAGAAGATGAACAATGG      Sense                main
 4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
 4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
 5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
 671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
 4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
 4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
 5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
 [snip]

 ... you see the second column is the probeset_id, which would be used
 as the Group_ID column for your flat file.  Depending on whether you
 are using the Ensembl CDF or the Affymetrix annotation, you would need
 to create a mapping to get the transcript cluster id column (here, the
 Unit_ID).  Everything else you need (Probe_Sequence, X, Y, Probe_ID)
 is within the table above.

 Then, it would be just a matter of filtering OUT those probes that
 overlap a SNP, which based on your mapping exercise, you must have a
 list of.  Then, make a call to the flat2Cdf() script and hopefully
 you'll be off and running.
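
The steps above can be sketched in base R as follows. The probe rows are copied from the probe.tab excerpt, but the transcript cluster ID (2315100) and the list of SNP-overlapping probes are hypothetical, and the column order that flat2Cdf() expects is not checked here, so treat the written file as a starting point rather than as a guaranteed-valid input.

```r
# A few probes from the probe.tab excerpt, using the flat-file column names.
probeTab <- data.frame(
  Probe_ID = c(494998, 4767517, 5760145, 671410),
  Group_ID = c(2315101, 2315101, 2315102, 2315102),  # probeset_id
  X = c(917, 796, 144, 689),
  Y = c(193, 1862, 2250, 262),
  Probe_Sequence = c("CACGGGAAGTCTGGGCTAAGAGACA",
                     "ATTAAGTTACATGCAGACAACAGGG",
                     "TCGGCCGTCGTCTTCTGCAGCTCTG",
                     "AAGTCGGCCGTCGTCTTCTGCAGCT"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from probeset to transcript cluster (Unit_ID).
unitOfGroup <- c("2315101" = 2315100, "2315102" = 2315100)

# Hypothetical result of the SNP mapping exercise: probes to drop.
snpProbes <- c(4767517)

# Filter out the SNP-overlapping probes and attach the Unit_ID.
flat <- probeTab[!(probeTab$Probe_ID %in% snpProbes), ]
flat$Unit_ID <- unname(unitOfGroup[as.character(flat$Group_ID)])

# Write a tab-delimited flat file.
flatFile <- tempfile(fileext = ".txt")
cols <- c("Probe_ID", "X", "Y", "Probe_Sequence", "Group_ID", "Unit_ID")
write.table(flat[, cols], file = flatFile, sep = "\t",
            quote = FALSE, row.names = FALSE)
print(flat[, cols])
```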

 Let me know how you go.

 Cheers,
 Mark

 On 10/06/2009, at 1:00 PM, sabrina wrote:





 Thanks , Mark!
 Can you show me /walk me through how to get a new snp-free CDF ? I
 finally got the right version of snp and probe mapping so I am ready
 to try it out!

 Sabrina

 On Jun 6, 3:14 am, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Sabrina.

 Comments below.

 On 06/06/2009, at 1:57 AM, sabrina wrote:

 Hi, Mark:
 I finally found the SNP data set that is suitable for my case. As I
 understand, aroma used RMA to estimate gene level and exon level
 intensities. After I estimate gene level (transcript level), I can use
 FIRMA to estimate residual for each exon and compose a score as
 described in the paper. My question is: if there is a SNP difference
 between two strains within one exon, should I exclude that exon from
 estimating transcript level value? My guess is probably no.

 If the SNP affects only 1 probe in an entire transcript, I would
 expect it to have very little impact on the gene-level summary.  And,
 especially so if there are a large number of total probes for that
 gene.  It may have a noticeable effect on the probe effect.

 So will it
 be a good idea if I exclude that exon after I calculate all FIRMA
 scores or should I exclude these exons after I estimate residuals,
 but only used these residuals not affected by SNPs for firma score
 estimation? Thanks

 Keep in mind the residuals are calculated at the probe-level, not the
 probeset-level.  The FIRMA score is then a summary of all the
 residuals for a probeset.

 I think you have (at least) 3 choices:

 1. (preferred, i would think) you could

[aroma.affymetrix] Re: CDF files for Affymetrix whole transcript arrays (Gene 1.0, Exon 1.0)

2009-06-11 Thread Mark Robinson

Hi Dick.

(I've copied the aroma.affymetrix list in case others have the same  
question).

I remember doing an update *late* last year (don't remember exactly  
when) and recreating the CDFs --- the contents were identical to  
previous ones.  That is, the Affymetrix annotation (at least how  
probesets are put into transcript clusters) doesn't change that much.   
I haven't checked recently -- do you know if Affy has made some major  
changes?  If so, I'm happy to recreate them.  But, just because they  
are dated July 2008, doesn't mean they have old information.

However, if you use the Ensembl builds (for Exon 1.0ST), these change  
more regularly.  The last one I built was Ensembl 50 and I see Ensembl  
is now at v54.

In general, building CDFs for Exon arrays is described at:
http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch


Cheers,
Mark


On 12/06/2009, at 9:23 AM, dbe...@u.washington.edu wrote:

 Hi Mark,

 I was wondering if there was any planned updates to the Mouse Exon 1.0
 ST cdf files?  I noticed following your link:
 http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st
 that the last date was about a year ago.

 Or, if you could direct me to some document that would tell me how to
 build custom cdfs, that would be good too.

 Thanks very much,
 Dick

 On Aug 4 2008, 10:05 pm, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi folks.

 I've been updating/creating CDF files for some of the recent
 Affymetrix whole transcript expression arrays (i.e. Human/Mouse/Rat
 Gene/Exon 1.0 ST).  You should be able to just download the relevant
 CDF, put it in the correct location (annotationData/chipTypes/ as
 well as putting the data in the correct location) and run your
 standard sorts of analyses (e.g. BG correction, normalization,
 summarization, quality assessment) as described in the user's guide:
 http://groups.google.com/group/aroma-affymetrix/web/users-guide

 So far, I have created default versions, based on the Affymetrix
 (probeset CSV/unsupported CDF) annotation, for:

 Human Gene 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/hugene-1-0-st

 Mouse Gene 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/mogene-1-0-st-v1

 Rat Gene 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/ragene-1-0-st-v1

 Mouse Exon 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st

 Rat Exon 1.0 ST: http://groups.google.com/group/aroma-affymetrix/web/raex-1-0-st-v1

 ...

 Human Exon 1.0 ST has seen a little more development as Ken, Elizabeth
 and I all have worked directly with data from these, so we have some
 custom CDFs (and more on the way), based on different annotation
 sources.  More information can be found at (and a big thank you to
 Elizabeth for the code that creates these):
 http://groups.google.com/group/aroma-affymetrix/web/huex-1-0-st-v2

 If there are others on this list who want custom CDFs designed for  
 any
 of these platforms, let me know and I can at least point you in the
 direction of how to create them.  This is typically not a hard thing
 to do.

 Cheers,
 Mark

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--






--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

2009-06-09 Thread Mark Robinson

Hi Sabrina.

How about you try and create a 'flat' file like the one described at:
http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch

Presumably, you will be comfortable with the Exon Array's 'probetab'  
file by now and possibly the Affymetrix annotation CSV file and so you  
should have access to all this information.

For example, from the following table:

mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACACCAGAAGATGAACAATGG      Sense                main
4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
[snip]

... you see the second column is the probeset_id, which would be used  
as the Group_ID column for your flat file.  Depending on whether you  
are using the Ensembl CDF or the Affymetrix annotation, you would need  
to create a mapping to get the transcript cluster id column (here, the  
Unit_ID).  Everything else you need (Probe_Sequence, X, Y, Probe_ID)  
is within the table above.

Then, it would be just a matter of filtering OUT those probes that  
overlap a SNP, which based on your mapping exercise, you must have a  
list of.  Then, make a call to the flat2Cdf() script and hopefully  
you'll be off and running.
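
The steps above could be sketched roughly as follows in base R. This is only an illustration under assumptions: the probeset-to-transcript-cluster mapping (ps2tc), the unit ids in it and the list of SNP-affected probes (snpProbes) are made up, and the exact column order and arguments that the flat2Cdf() script expects are not shown here and should be checked against the script itself.

```r
# Toy stand-in for a few rows of HuEx-1_0-st-v2.probe.tab (values from the table above).
probeTab <- data.frame(
  probe_id    = c(494998, 1734213, 5760145, 671410, 5388),
  probeset_id = c(2315101, 2315101, 2315102, 2315102, 2315103),
  x           = c(917, 1092, 144, 689, 267),
  y           = c(193, 677, 2250, 262, 2),
  sequence    = c("CACGGGAAGTCTGGGCTAAGAGACA", "ACACCAGAAGATGAACAATGG",
                  "TCGGCCGTCGTCTTCTGCAGCTCTG", "AAGTCGGCCGTCGTCTTCTGCAGCT",
                  "CTGTCTGTCGACCCAGCTGGAGGCA"),
  stringsAsFactors = FALSE
)

# Hypothetical probeset -> transcript cluster mapping (made-up unit ids).
ps2tc <- data.frame(
  probeset_id = c(2315101, 2315102, 2315103),
  unit_id     = c("TC_A", "TC_A", "TC_B"),
  stringsAsFactors = FALSE
)

# Hypothetical list of probe ids that overlap a SNP.
snpProbes <- c(1734213, 671410)

# Filter OUT the affected probes, then attach the unit id to each probe.
keep <- probeTab[!(probeTab$probe_id %in% snpProbes), ]
flat <- merge(keep, ps2tc, by = "probeset_id")

# Rename to the column names mentioned in the text.
flat <- data.frame(
  Probe_ID       = flat$probe_id,
  X              = flat$x,
  Y              = flat$y,
  Probe_Sequence = flat$sequence,
  Group_ID       = flat$probeset_id,
  Unit_ID        = flat$unit_id,
  stringsAsFactors = FALSE
)

# Write a tab-delimited flat file (here to a temporary directory).
write.table(flat, file = file.path(tempdir(), "snpfree.flat"),
            sep = "\t", quote = FALSE, row.names = FALSE)
```

Probesets that lose all of their probes simply drop out of the flat file, so it is worth checking how many probes remain per Group_ID before building the CDF.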

Let me know how you go.

Cheers,
Mark

On 10/06/2009, at 1:00 PM, sabrina wrote:


 Thanks , Mark!
 Can you show me /walk me through how to get a new snp-free CDF ? I
 finally got the right version of snp and probe mapping so I am ready
 to try it out!

 Sabrina

 On Jun 6, 3:14 am, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Sabrina.

 Comments below.

 On 06/06/2009, at 1:57 AM, sabrina wrote:



 Hi, Mark:
 I finally found the SNP data set that is suitable for my case. As I
 understand, aroma used RMA to estimate gene level and exon level
 intensities. After I estimate gene level (transcript level), I can  
 use
 FIRMA to estimate residual for each exon and compose a score as
 described in the paper . My question is: if there is a SNP  
 difference
 between two strains within one exon, should I exclude that exon from
 estimating transcript level value? My guess is probably no.

 If the SNP affects only 1 probe in an entire transcript, I would
 expect it to have very little impact on the gene-level summary.  And,
 especially so if there are a large number of total probes for that
 gene.  It may have a noticeable effect on the probe effect.

 So will it
 be a good idea if I exclude that exon after I calculate all FIRMA
 scores or  should I exclude these exons after I estimate residuals ,
 but only used these residuals not affected by SNPs for firma score
 estimation? Thanks

 Keep in mind the residuals are calculated at the probe-level, not the
 probeset-level.  The FIRMA score is then a summary of all the
 residuals for a probeset.

 I think you have (at least) 3 choices:

 1. (preferred, i would think) you could remove all affected *probes*
 (via the creation of a SNP-affected-probe-free CDF) in advance, then
 run FIRMA as normal.  I can help with this if you tell me which  
 probes
 are affected.

 2. remove the affected *probesets* afterwards, since you may not
 trust the FIRMA scores that are based on them.

 3. as you suggested, only calculate FIRMA scores from unaffected
 residuals.  But, the information you require to do this is the same
 information required to do #1, and it seems like #1 is preferred.

 The good thing about option #1 is you would still have some ability  
 to
 detect differential splicing for the probeset (instead of tossing it
 away), albeit with the smaller number of remaining unaffected probes.

 Cheers,
 Mark



 Sabrina

 On Apr 30, 3:46 am, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Sabrina.

 I have not had to deal with this myself, but I do know that it  
 exists
 and I can at least

[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

2009-06-06 Thread Mark Robinson

Hi Sabrina.

Comments below.

On 06/06/2009, at 1:57 AM, sabrina wrote:


 Hi, Mark:
 I finally found the SNP data set that is suitable for my case. As I
 understand, aroma used RMA to estimate gene level and exon level
 intensities. After I estimate gene level (transcript level), I can use
 FIRMA to estimate residual for each exon and compose a score as
 described in the paper . My question is: if there is a SNP difference
 between two strains within one exon, should I exclude that exon from
 estimating transcript level value? My guess is probably no.

If the SNP affects only 1 probe in an entire transcript, I would  
expect it to have very little impact on the gene-level summary.  And,  
especially so if there are a large number of total probes for that  
gene.  It may have a noticeable effect on the probe effect.


 So will it
 be a good idea if I exclude that exon after I calculate all FIRMA
 scores or  should I exclude these exons after I estimate residuals ,
 but only used these residuals not affected by SNPs for firma score
 estimation? Thanks

Keep in mind the residuals are calculated at the probe-level, not the  
probeset-level.  The FIRMA score is then a summary of all the  
residuals for a probeset.

I think you have (at least) 3 choices:

1. (preferred, i would think) you could remove all affected *probes*  
(via the creation of a SNP-affected-probe-free CDF) in advance, then  
run FIRMA as normal.  I can help with this if you tell me which probes  
are affected.

2. remove the affected *probesets* afterwards, since you may not  
trust the FIRMA scores that are based on them.

3. as you suggested, only calculate FIRMA scores from unaffected  
residuals.  But, the information you require to do this is the same  
information required to do #1, and it seems like #1 is preferred.

The good thing about option #1 is you would still have some ability to  
detect differential splicing for the probeset (instead of tossing it  
away), albeit with the smaller number of remaining unaffected probes.

Cheers,
Mark





 Sabrina

 On Apr 30, 3:46 am, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Sabrina.

 I have not had to deal with this myself, but I do know that it exists
 and I can at least suggest a possible route to exclude affected  
 exons.

 Presumably, there is a database (dbSNP?) that tells you the genome
 locations of each SNP for your strains.  There is also a probe.tab
 file from Affymetrix that gives you the mapped genome locations of
 each probe (or you could take the sequences from the same file and  
 map
 them yourself with a tool like BLAT).  It is then just a matter of
 looking whether each probe maps to a location on the genome that
 overlaps a SNP.  There is probably a Bioconductor tool for this or  
 you
 could create a hash, etc.

 There are a couple levels at which you might introduce this to your
 analysis.  You could remove individual probes that are affected.  On
 the aroma.affymetrix side, this would require creating a new CDF with
 those affected probes not included (a bit tricky but doable).  Or,  
 you
 could simply post-process your existing results and remove probesets
 that have an affected probe (easier but not as elegant).

 You might've also seen:

 Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A
 database for filtering out
 probes in the Affymetrix GeneChip(R) Human Exon 1.0 ST array
 potentially affected by SNPs.
 Bioinformation 2008, 2(10):469-470.

 Hope that gets you started.

 Cheers,
 Mark

 On 30/04/2009, at 6:07 AM, sabrina wrote:



 Hi, all:
 I am using Aroma for detecting exon skipping events around two groups
 (two different strains). I found out that several of my top hits
 indeed includes at least one SNP between two strains. I wonder if
 anyone has some suggestion about how to deal with this situation.  
 If I
 need to remove all affected exons from analysis, how can I do it? I
 never worked with SNP data before, can anyone give me a hint?  
 Thanks a
 lot!

 Sabrina

 --
 Mark Robinson
 Epigenetics Laboratory, Garvan
 Bioinformatics Division, WEHI
 e: m.robin...@garvan.org.au
 e: mrobin...@wehi.edu.au
 p: +61 (0)3 9345 2628
 f: +61 (0)3 9347 0852
 --
 

--
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--







[aroma.affymetrix] Re: FIRMAGene

2009-06-04 Thread Mark Robinson


Hi Ettore.

Comments below.

 I have the following question. In the sup3.R file the probe level
 model fitting is realised using the instructions:

 plm <- RmaPlm(csNU)
 fit(plm, verbose=verbose)

 where csNU is an object obtained after background correction, quantile
 normalisation and conversion of the cdf to a unique version.

The conversion to 'unique' is actually done both on the CDF and the data. 
This is simply to dance around the fact that a handful of probes are used
in multiple probesets.
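
To see which probes are shared between probesets, a check along these lines could be used. It is a sketch on a made-up probe-to-probeset table (the names map, probeset and cell are hypothetical), not the code that aroma.affymetrix runs internally.

```r
# Toy probe-to-probeset table: probe (cell) 12 is listed in two probesets.
map <- data.frame(
  probeset = c("ps1", "ps1", "ps2", "ps2", "ps3"),
  cell     = c(10, 12, 12, 15, 18),
  stringsAsFactors = FALSE
)

# Count how many times each cell index is listed.
counts <- table(map$cell)

# Cells listed more than once are the ones used in multiple probesets.
shared <- as.integer(names(counts)[counts > 1])
print(shared)
```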

 I suppose that this approach should enable the exon-level analysis of
 the Gene 1.0 data, as required by FIRMAGene. However I don't
 understand where is the difference since the methods are the same as
 in the gene-level analysis of such data.

I'm actually not sure what it is you are asking here.  Indeed, the
methodology of FIRMAGene operates on the results (specifically, the
residuals) of your standard RMA probe level model.  This is, however,
quite different to the standard DE analysis, if that is what you mean by
gene-level analysis.

Hope that helps.

Cheers,
Mark






 Thanks,


 Ettore M.


 On May 29, 3:16 pm, rhizomorph cognitiontechnic...@yahoo.com wrote:
 I have the same question as Ettore. I installed the aroma.affymetrix
 package (and all supporting packages), but nowhere can I find a source
 to download and install the FIRMAGene package that the SUP3.R script
 clearly calls for.

 Rhizomorph.

 On May 29, 3:15 am, ettore mosca ettore.mos...@gmail.com wrote:

  Dear aroma.affymetrix developers,

  I'm very interested in using Gene 1.0 ST platform for alternative
  splicing. I read in your paper Differential splicing using
  whole-transcript microarrays that FIRMAGene is freely available as R
  package but I can not load the library following the instruction in
 the
  third additional file sup3.r (I installed and loaded
 aroma.affymetrix
  successfully)

  How do I install and load FIRMAGene library?

  Thanks,

  Ettore M.

  --
  Ettore M.

 http://www.ettoremosca.it
 







[aroma.affymetrix] Re: Discussion on affymetrix-defined-transcript-clusters

2009-05-08 Thread Mark Robinson

Hi.

Check the page for the MoEx-1_0-st chip, it has CDF files you can  
download:
http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st

Cheers,
Mark

On 08/05/2009, at 11:35 PM, telos wrote:


 Hi,

 I'd like to run FIRMA on mouse Exon array data. For this, it seem that
 I need special mouse Exon array CDF files... any advice on how I can
 get hold of them?

 Thanks

 

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--








[aroma.affymetrix] Re: Compare the splicing pattern of two samples

2009-05-06 Thread Mark Robinson

 Hi Mark:

 After I have obtained the statistically significant differentially
 expression(DE) exons between the two groups using limma, I want further to
 explore the splicing pattern of the DE exons in each group. As I know, for
 each array, FIRMA scores each exon as to whether its probes systematically
 deviate from the expected gene expression level. Is there any way to get a
 summary of firma score for each exon across all the arrays belonging to
 the
 same group? Thanks!


How about the average of the FIRMA scores as a summary?
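
As a rough sketch of that idea, assuming the (log2) FIRMA scores are already in a numeric matrix with one row per probeset and one column per array (fsLog and group below are made-up toy data, not output from aroma.affymetrix):

```r
# Toy log2 FIRMA scores: 2 probesets (rows) x 4 arrays (columns).
fsLog <- matrix(c( 0.1,  0.3,  2.0,  2.2,
                  -1.0, -1.2, -0.9, -1.1),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("ps1", "ps2"), c("a1", "a2", "a3", "a4")))

# Group label of each array, in column order.
group <- c("control", "control", "treatment", "treatment")

# Average the scores within each group, probeset by probeset.
groupMeans <- sapply(split(seq_along(group), group),
                     function(idx) rowMeans(fsLog[, idx, drop = FALSE]))
print(groupMeans)
```

The result has one row per probeset and one column per group, so it can be placed next to the limma results for the same probesets.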

Mark



 Xinjun

 On Mon, May 4, 2009 at 7:40 AM, Mark Robinson mrobin...@wehi.edu.au
 wrote:


 Hi Xinjun.

 Here, 'unitName' is the transcript cluster id and 'groupName' is the
 probeset id, as defined by Affymetrix.  The 'unit', 'group' and 'cell'
 columns are indices and you may not need these.  To find out what the
 unitName and groupName correspond to, I would consult the Affymetrix
 annotation files.

 (Assuming Human Exon 1.0 ST), if you go to:

 http://www.affymetrix.com/products_services/arrays/specific/exon.affx#1_4

 and find the section Current NetAffx Annotation Files you'll find 2
 CSV files that you can download, one for transcript clusters and one
 for probesets.  These should give you genome coordinates, Genbank/
 RefSeq/Ensembl identifiers, gene symbols, etc.

 Hope that helps.
 Mark


 On 04/05/2009, at 12:02 AM, Xinjun Zhang wrote:

  Hi:
 
  Thanks very much for your great help. But I still have difficulty in
  understanding the first 5 columns of fsDF, as you have taken as an
  example:
 
   head(fsDF[,1:5])
    unitName groupName unit group cell
  1  2315251   2315252    1     1    1
  2  2315251   2315253    1     2    2
  3  2315373   2315374    2     1    3
  4  2315373   2315375    2     2    4
  5  2315373   2315376    2     3    5
  6  2315373   2315377    2     4    6
 
  What does each column, especially the unitName and groupName mean ?
  And how can I correlate unitName and groupName to gene name and exon
  number? Thanks in advance!
 
  Xinjun
 
  On Thu, Apr 30, 2009 at 3:31 PM, Mark Robinson
  mrobin...@wehi.edu.au wrote:
 
  Hi Xinjun.
 
  Comments below.
 
 
  On 30/04/2009, at 12:33 AM, Xinjun Zhang wrote:
 
   Hi:
  
   Sorry for the second ambiguous question. Now I will give it out in
   another way:
  
   After limma analysis:
  
   # two groups: CEU and YRI
   design
             CEU YRIvsCEU
   GSM188687   1        0
   GSM188688   1        0
   GSM188861   1        1
   GSM188862   1        1
  
   #Limma
   fsDF <- extractDataFrame(fs, addNames=TRUE)
   fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)])
   fit <- lmFit(fsDF[,-c(1:5)], design)
   fit <- eBayes(fit)
   fit$genes <- fsDF[,1]
   topTable(fit, coef=NULL, number=10, adjust="BH")
  
   Then I got output in RGui in this form:
  
   topTable(fit, coef="YRIvsCEU", adjust="BH")
               ID      logFC         t      P.Value adj.P.Val         B
   248067 3851537  10.781228  18.01769 5.069965e-07 0.1441163 -4.159301
   219150 3721400 -12.364204 -14.91257 1.798048e-06 0.2146088 -4.164325
   90041  2903401  -8.915503 -13.79270 3.021449e-06 0.2146088 -4.166979
   80808  2836738   7.811150  13.45085 3.568320e-06 0.2146088 -4.167917
   250529 3862018  -7.935552 -13.33698 3.774934e-06 0.2146088 -4.168245
   176674 3462843  10.559478  12.92835 4.637400e-06 0.2158173 -4.169490
   224640 3744039   7.930627  12.66410 5.314668e-06 0.2158173 -4.170356
   134937 3224650  -9.385466 -12.18252 6.860763e-06 0.2437758 -4.172073
   155392 3352948  -6.802731 -11.71350 8.878365e-06 0.2804133 -4.173938
   104503 3003193  -8.947865 -11.50151 1.000676e-05 0.2844473 -4.174852
   ## In the first row, is 248067 an NCBI gene ID and 3851537 a probeset
   ## ID?  And if the first column is a gene ID, what is strange to me is
   ## that some of the IDs are not human gene IDs.
 
  Be careful here.  The first number you see here (248067) is a row
  number that limma puts in, and is not a gene/probeset identifier of
  any kind.  The ID column is what you have put in fit$genes (see
  above, you have the command fit$genes <- fsDF[,1]).
 
  I would actually recommend that you put more in fit$genes, because
  what you have now is only the first column of the fsDF data
  frame, which gives the transcript_cluster_id.  So, you have the
  transcript_cluster_id, but you don't know what probeset_id this
  corresponds to.
 
  For example:
 
  head(fsDF[,1:5])
    unitName groupName unit group cell
  1  2315251   2315252    1     1    1
  2  2315251   2315253    1     2    2
  3  2315373   2315374    2     1    3
  4  2315373   2315375    2     2    4
  5  2315373   2315376    2     3    5
  6  2315373   2315377    2     4    6
 
  Maybe it would be better to do something like:
 
  ...
  fit$genes <- fsDF[,1:2]
 
  ... that way your output table will look something like:
 
  topTable(fit,coef=1,n=2)
       unitName groupName     logFC         t      P.Value adj.P.Val B
  4166  3581637   3582005 -2.581813 -12.65587 3.786135e-13
[aroma.affymetrix] Re: Compare the splicing pattern of two samples

2009-04-30 Thread Mark Robinson
 2315373
  0.118  -0.214   0.25  -0.32  0.80819  0.75689  0.05  0.94922  2315373
 -0.018   0.128  -0.04   0.21  0.96823  0.83876  0.03  0.9667   2315554
 -0.027   0.191  -0.06   0.31  0.95283  0.76783  0.07  0.9317   2315554
  0.276  -0.629   0.65  -1.05  0.53693  0.33006  0.56  0.59639  2315554
  0.405  -0.552   0.88  -0.85  0.40726  0.4232   0.44  0.66023  2315554
  0.109  -0.199   0.23  -0.3   0.82444  0.77509  0.04  0.95666  2315554
 -0.318   0.771  -0.63   1.08  0.54789  0.31597  0.6   0.57392  2315554
 .

 It got only a column called Genes ( it is probe id, I guess). So
 how can I use this output to find out the differentially expressed
 exons( and the corresponding genes) in the two groups? Is it clearer
 to you? Thanks in advance!

 Xinjun


As I mentioned above, you should change what you put in fit$genes to  
allow you to know what probeset_id each row corresponds to.  I'm  
guessing here, but you are probably interested in the YRIvsCEU  
parameter, so you'd probably sort on the p.value.YRIvsCEU column ...
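
A minimal sketch of that sorting step in base R, assuming the results table has been read into a data frame. The column name p.value.YRIvsCEU is the one mentioned above; the groupName values and the rows themselves are made up for illustration.

```r
# Toy results table with both identifiers kept next to the p-value.
res <- data.frame(
  unitName         = c(2315373, 2315554, 2315554),
  groupName        = c(2315374, 2315555, 2315556),   # made-up probeset ids
  p.value.YRIvsCEU = c(0.75689, 0.33006, 0.42320)
)

# Order the rows from smallest to largest p-value.
ranked <- res[order(res$p.value.YRIvsCEU), ]
print(ranked)
```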

Cheers,
Mark









 On Tue, Apr 28, 2009 at 4:29 PM, Mark Robinson  
 mrobin...@wehi.edu.au wrote:

 Hi Xinjun.

 Comments below.


 On 28/04/2009, at 12:25 PM, Xinjun Zhang wrote:

  Hi Mark:
 
  Thanks very much for your clarification! Now I have approached to
  limma analysis of FIRMA score to get differentially spliced genes
  ( and also splicing pattern of each ). But I still have some
  difficulty to understand the code ( in red ) below in Limma  
 analysis:
 
  #fs is the 'standard' FirmaSet-object
  fsDF <- extractDataFrame(fs, addNames=TRUE)
  fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)])  # I know why log2 is here but
  # confused by fsDF[,c(1:5)]: what does this expression mean?


 Note that it is -c(1:5), meaning operate on (here, take logs) all of
 the columns except 1:5 ... that is, because extractDataFrame gives
 some extra columns at the beginning that are NOT data, we only want to
 log the columns that have actual data.


  design <- cbind(Grp1=1,Grp2=c(rep(0,n_1),rep(1,n_2)))
  fit <- lmFit(fsDF[,-c(1:5)],design)
  fit <- eBayes(fit)
  fit$genes <- fsDF[,1]  # Can I also get separate splicing
  # patterns for the two differentially spliced genes from two group
  # (control and treatment)?


 I'm not sure what you are asking here.  The probesets where the Grp2
 coefficient is significantly different from 0 may highlight
 differentially spliced exons.  Does that help?

 Mark





  Thanks in advance!
 
  Xinjun
 
  On Mon, Apr 27, 2009 at 6:19 AM, Mark Robinson
  mrobin...@wehi.edu.au wrote:
 
 
  Hi Xinjun.
 
  Quick comments below.
 
 
   Hi Mark:
  
   Thanks very much for your help and I have have got a quick start
  on a
   small
   dataset that each group (control and treatment ) contains 4
  arrays. I have
   set up a file structure like this:
   =
   rawDate/
     controlGroup/
       HuEx-1_0-st-v1/
         GSMXX.CEL
         GSMXX.CEL

     treatmentGroup/
       HuEx-1_0-st-v1/
         GSMXX.CEL
         GSMXX.CEL
 
   ==*
 
 
  This setup will need to be changed.  You will want to put ALL  
 samples
  together to do the PLM fitting, normalization, FIRMA scoring, etc.
 
  Something like:
 
  rawData/
    thisExperiment/
      HuEx-1_0-st-v1/
        sample1.CEL
        sample2.CEL
        ...
        sampleN.CEL
 
 
   This is my code (my questions are in red):
  
   library(aroma.affymetrix)
  
   #Getting annotation data files
   chipType <- "HuEx-1_0-st-v1"
   cdf <- AffymetrixCdfFile$byChipType(chipType)
   print(cdf)
  
   #Defining CEL set
   cs <- AffymetrixCelSet$byName("controlGroup", cdf=cdf)
   print(cs)
  
   #Background Adjustment and Normalization
   bc <- RmaBackgroundCorrection(cs)
   csBC <- process(bc,verbose=verbose)
  
   #quantile normalization
   qn <- QuantileNormalization(csBC, typesToUpdate="pm")  ### I set the second
   ### parameter as "pm" as the chip type is Affymetrix exon array, is that right?
   print(qn)
   csN <- process(qn, verbose=verbose)
 
  This is fine.
 
  
   #Summarization
   getCdf(csN)
   ## Fit exon-by-exon: change the value of mergeGroups to FALSE in the
   ## ExonRmaPlm() call above.
   plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
   print(plmEx)
   #To fit the PLM to all of the data, do:
   fit(plmEx, verbose=verbose)

   And here is my problem:
   firma <- FirmaModel(plmTr)  # I have noticed that FIRMA analysis ONLY works
   # from the PLM based on transcripts. So when

[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

2009-04-30 Thread Mark Robinson

Hi Sabrina.

I have not had to deal with this myself, but I do know that the problem  
exists and I can at least suggest a possible route to exclude affected exons.

Presumably, there is a database (dbSNP?) that tells you the genome  
locations of each SNP for your strains.  There is also a probe.tab  
file from Affymetrix that gives you the mapped genome locations of  
each probe (or you could take the sequences from the same file and map  
them yourself with a tool like BLAT).  It is then just a matter of  
looking whether each probe maps to a location on the genome that  
overlaps a SNP.  There is probably a Bioconductor tool for this or you  
could create a hash, etc.
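
The overlap check itself could look something like this in base R. It is a sketch only: the probe coordinates are taken from the HuEx probe.tab rows quoted elsewhere in this thread, while the SNP table (snps) is entirely made up, and a dedicated Bioconductor interval tool would be the better choice for genome-wide data.

```r
# Probe locations (chromosome, start, stop) as found in the probe.tab file.
probes <- data.frame(
  probe_id = c(494998, 1734213, 4767517),
  seqname  = "chr1",
  start    = c(1788, 1973, 1992),
  stop     = c(1812, 1997, 2016),
  stringsAsFactors = FALSE
)

# Hypothetical SNP positions for the strains of interest.
snps <- data.frame(
  seqname = c("chr1", "chr1", "chr2"),
  pos     = c(1800, 2500, 1990),
  stringsAsFactors = FALSE
)

# A probe is affected if any SNP on the same chromosome falls
# inside its [start, stop] interval.
overlapsSnp <- vapply(seq_len(nrow(probes)), function(i) {
  any(snps$seqname == probes$seqname[i] &
      snps$pos >= probes$start[i] &
      snps$pos <= probes$stop[i])
}, logical(1))

affected <- probes$probe_id[overlapsSnp]
print(affected)
```

Note that the chr2 SNP at position 1990 does not flag the chr1 probe spanning 1973-1997, because the chromosome has to match as well as the position.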

There are a couple levels at which you might introduce this to your  
analysis.  You could remove individual probes that are affected.  On  
the aroma.affymetrix side, this would require creating a new CDF with  
those affected probes not included (a bit tricky but doable).  Or, you  
could simply post-process your existing results and remove probesets  
that have an affected probe (easier but not as elegant).

You might've also seen:

Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A  
database for filtering out
probes in the Affymetrix GeneChip(R) Human Exon 1.0 ST array  
potentially affected by SNPs.
Bioinformation 2008, 2(10):469-470.

Hope that gets you started.

Cheers,
Mark


On 30/04/2009, at 6:07 AM, sabrina wrote:


 Hi, all:
 I am using Aroma for detecting exon skipping events around two groups
 (two different strains). I found out that several of my top hits
 indeed includes at least one SNP between two strains. I wonder if
 anyone has some suggestion about how to deal with this situation. If I
 need to remove all affected exons from analysis, how can I do it? I
 never worked with SNP data before, can anyone give me a hint? Thanks a
 lot!

 Sabrina
 

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--








[aroma.affymetrix] Re: Compare the splicing pattern of two samples

2009-04-28 Thread Mark Robinson

Hi Xinjun.

Comments below.


On 28/04/2009, at 12:25 PM, Xinjun Zhang wrote:

 Hi Mark:

 Thanks very much for your clarification! Now I have approached to  
 limma analysis of FIRMA score to get differentially spliced genes  
 ( and also splicing pattern of each ). But I still have some  
 difficulty to understand the code ( in red ) below in Limma analysis:

 #fs is the 'standard' FirmaSet-object
 fsDF <- extractDataFrame(fs, addNames=TRUE)
 fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)])  # I know why log2 is here but
 # confused by fsDF[,c(1:5)]: what does this expression mean?


Note that it is -c(1:5), meaning operate on (here, take logs) all of  
the columns except 1:5 ... that is, because extractDataFrame gives  
some extra columns at the beginning that are NOT data, we only want to  
log the columns that have actual data.
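
A tiny illustration of the negative index, using a made-up data frame with the same five leading annotation columns followed by two data columns (array1 and array2 are hypothetical names):

```r
# Five annotation columns followed by two columns of actual data.
df <- data.frame(
  unitName  = c(2315251, 2315373),
  groupName = c(2315252, 2315374),
  unit      = 1:2,
  group     = c(1L, 1L),
  cell      = c(1L, 3L),
  array1    = c(2, 8),
  array2    = c(4, 16)
)

# -c(1:5) drops columns 1 to 5, so only the data columns are logged;
# the annotation columns are left untouched.
df[, -c(1:5)] <- log2(df[, -c(1:5)])
print(df)
```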


 design <- cbind(Grp1=1,Grp2=c(rep(0,n_1),rep(1,n_2)))
 fit <- lmFit(fsDF[,-c(1:5)],design)
 fit <- eBayes(fit)
 fit$genes <- fsDF[,1]  # Can I also get separate splicing
 # patterns for the two differentially spliced genes from two group
 # (control and treatment)?


I'm not sure what you are asking here.  The probesets where the Grp2  
coefficient is significantly different from 0 may highlight  
differentially spliced exons.  Does that help?
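
To make the meaning of the Grp2 coefficient concrete, here is a small base-R sketch for a single probeset. It uses ordinary least squares via lm.fit(), not limma's moderated statistics, and the scores are invented; it only illustrates that, with this design, Grp2 is the difference between the two group means.

```r
# Group sizes and design matrix as in the code quoted above.
n_1 <- 3
n_2 <- 3
design <- cbind(Grp1 = 1, Grp2 = c(rep(0, n_1), rep(1, n_2)))

# Invented log2 FIRMA scores for one probeset (3 controls, 3 treatments).
y <- c(0.1, -0.2, 0.1, 1.9, 2.2, 1.9)

# Plain least squares for this one probeset (limma fits the same linear
# model per row, then moderates the variances with eBayes()).
coefs <- lm.fit(design, y)$coefficients
print(coefs)

# Grp1 is the mean of group 1; Grp2 is the group 2 mean minus the group 1 mean.
groupDiff <- mean(y[(n_1 + 1):(n_1 + n_2)]) - mean(y[1:n_1])
print(groupDiff)
```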

Mark





 Thanks in advance!

 Xinjun

 On Mon, Apr 27, 2009 at 6:19 AM, Mark Robinson  
 mrobin...@wehi.edu.au wrote:


 Hi Xinjun.

 Quick comments below.


  Hi Mark:
 
  Thanks very much for your help. I have made a quick start on a small
  dataset in which each group (control and treatment) contains 4 arrays.
  I have set up a file structure like this:
  =
  rawData/
    controlGroup/
      HuEx-1_0-st-v1/
        GSMXX.CEL
        GSMXX.CEL

    treatmentGroup/
      HuEx-1_0-st-v1/
        GSMXX.CEL
        GSMXX.CEL
  ==


 This setup will need to be changed.  You will want to put ALL samples
 together to do the PLM fitting, normalization, FIRMA scoring, etc.

 Something like:

 rawData/
   thisExperiment/
     HuEx-1_0-st-v1/
       sample1.CEL
       sample2.CEL
       ...
       sampleN.CEL
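
 A minimal sketch of building that single-directory layout from R. It works in a temporary directory and creates empty placeholder files rather than real CEL files; the sample file names are hypothetical.

```r
# Build rawData/<data set>/<chip type>/ under a temporary directory.
root     <- file.path(tempdir(), "rawData")
dataSet  <- "thisExperiment"     # data set name from the layout above
chipType <- "HuEx-1_0-st-v1"     # chip type from the layout above
path <- file.path(root, dataSet, chipType)
dir.create(path, recursive = TRUE, showWarnings = FALSE)

# All samples (control and treatment) go into the same directory.
# These are empty placeholders, not real CEL files.
celFiles <- file.path(path, sprintf("sample%d.CEL", 1:4))
created <- file.create(celFiles)

print(list.files(root, recursive = TRUE))
```

 Group membership (control vs. treatment) is then handled later in the analysis, for example through the design matrix, rather than through the directory structure.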


  This is my code (my questions are in red):

  library(aroma.affymetrix)

  # Getting annotation data files
  chipType <- "HuEx-1_0-st-v1"
  cdf <- AffymetrixCdfFile$byChipType(chipType)
  print(cdf)

  # Defining CEL set
  cs <- AffymetrixCelSet$byName("controlGroup", cdf=cdf)
  print(cs)

  # Background adjustment and normalization
  bc <- RmaBackgroundCorrection(cs)
  csBC <- process(bc, verbose=verbose)

  # Quantile normalization
  qn <- QuantileNormalization(csBC, typesToUpdate="pm")  ### I set the second
  ### parameter to "pm" as the chip type is an Affymetrix exon array; is that
  ### right?
  print(qn)
  csN <- process(qn, verbose=verbose)

 This is fine.

 
  # Summarization
  getCdf(csN)
  ## Fit exon-by-exon: change the value of mergeGroups to FALSE in the
  ## ExonRmaPlm() call above.
  plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
  print(plmEx)
  # To fit the PLM to all of the data, do:
  fit(plmEx, verbose=verbose)

  And here is my problem:
  firma <- FirmaModel(plmTr)  # I have noticed that the FIRMA analysis ONLY works
  # from the PLM based on transcripts. So when the parameter is plmTr, I wonder
  # how it can detect the splicing events of genes. Should the parameter not be
  # plmEx?
  fit(firma, verbose=verbose)
  fs <- getFirmaScores(firma)

 Like it says on the group web page for Exon arrays: "The FIRMA analysis
 ONLY works from the PLM based on transcripts."  This is NOT an error.
 That's the way it works.

 The manuscript gives more details for why this is the case:
 http://bioinformatics.oxfordjournals.org/cgi/content/abstract/24/15/1707

 Hope that helps.

 Cheers,
 Mark







 
 
  On Fri, Apr 24, 2009 at 5:24 PM, Mark Robinson
  mrobin...@wehi.edu.au wrote:
 
 
  Hi Xinjun.
 
  Here is a quick sketch of what I might do.
 
  1. Run everything to get FIRMA scores.  See group page for running
  details and the Purdom Bioinformatics 2008 paper for methodological
  details.
 
  2a. If Nn or Nc > 1, use 'limma' to look for a difference in FIRMA
  scores between your two groups.  See threads:
 
  http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/36d8c59d742fc503/
 
  http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/7d2645bd76cc2023/
 
  2b. If you have, say, patient samples (and a good number of them), you
  might expect only a subset of your C or N patients to have a splicing
  aberration.  In this case, maybe you just want to look for large

[aroma.affymetrix] Re: Quality assessment of Gene ST Array

2009-04-27 Thread Mark Robinson

Hi Cathy.


On 27/04/2009, at 11:34 PM, Cathy Mitchell wrote:

 To whom it may concern

 I would like to know if there is a way of separating out the  
 foreground and background spatial plots? I assume the spatial plots  
 that are given as an example in the google.groups quality assessment  
 of raw data section. I am using Gene ST 1.0 Human arrays.

I'm afraid I don't know the answer to this.

Be sure to consider RLE/NUSE plots as part of your quality assessment.


 Is there a way of creating a box plot representing the log2 probe  
 intensities of each array on one box plot? As opposed to doing  
 plotDensities for each array?

See my post from the other day (Extracting raw data before  
normalization):

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/5af426f8f5e1625b

The extracting data part, using getCellIndices() and extractMatrix()
(... and boxplot) will be analogous for Gene 1.0 ST as for Exon 1.0 ST.
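
Once the probe-level intensities are in a matrix (probes in rows, arrays in columns), a single boxplot() call draws all arrays side by side. A minimal sketch on simulated data follows; the matrix here is made up and merely stands in for what extractMatrix() would return.

```r
set.seed(1)

# Simulated raw intensities: 1000 probes (rows) on 3 arrays (columns).
# In a real analysis this matrix would be extracted from the CEL set.
y <- matrix(2^rnorm(3000, mean = 8, sd = 1.5), ncol = 3,
            dimnames = list(NULL, c("array1", "array2", "array3")))

# boxplot() on a matrix draws one box per column, i.e. one per array.
bp <- boxplot(log2(y), ylab = "log2 probe intensity", las = 2)

# The five-number summaries used for the boxes, one column per array.
print(bp$stats)
```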


 Also is there a way to analyse the individual probes as opposed to  
 the probesets?

Not sure what you have in mind here, but using the above you will have  
probe-level data in hand.

Hope that helps.
Mark




 Thank you



 -- 
 Cathrine Mitchell


 


--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--








[aroma.affymetrix] Re: Error Occured When Convertting CDF File

2009-04-25 Thread Mark Robinson

Hi Xinjun.

Maybe a more-informed Windows user than myself can chime in here (I use
OSX/Linux almost exclusively), but perhaps it's as simple as adjusting your
memory limit.  As you can see below, you do get a "Reached total
allocation of 1535Mb" warning message.  One possible link to look at:

http://projetos.inpa.gov.br/i3geo/pacotes/r/win/library/base/html/Memory-limits.html

Cheers,
Mark



 Hi Mark:

 Thanks for the reminder. The ASCII CDF file was downloaded from Affymetrix
 (HuEx-1_0-st-v2.cdf.zip). The following is the error message:
 ==
 Reading CDF header...
 Reading CDF header...done
 Reading CDF QC units...
 Reading CDF QC units...done
 Reading CDF units...
 Reading CDF units...done
 Writing CDF structure...
 Error: Unable to allocate vectors of 24.4 MB (I translated this from
 Chinese; the PC has 4 GB of memory and was running only R)

 In addition : Warning messages:
 1: In which(raw == as.raw(255)) :
  Reached total allocation of 1535Mb: see help(memory.size)
 2: In which(raw == as.raw(255)) :
  Reached total allocation of 1535Mb: see help(memory.size)
 3: In which(raw == as.raw(255)) :
  Reached total allocation of 1535Mb: see help(memory.size)
 4: In which(raw == as.raw(255)) :
  Reached total allocation of 1535Mb: see help(memory.size)
 5: In which(raw == as.raw(255)) :
  Reached total allocation of 1535Mb: see help(memory.size)
 6: In which(raw == as.raw(255)) :
  Reached total allocation of 1535Mb: see help(memory.size)
 Timing stopped at: 21.87 0.13 22.04
 =

 And this is my code:
 ===
 library(affxparser)
 files <- list.files(pattern="[Cc][Dd][Ff]$")
 dir.create("converted")
 outPath <- "converted"
 outFiles <- paste(outPath, files, sep="/")
 for (kk in seq_along(files)) {
   convertCdf(files[kk], outFiles[kk], version=4, force=TRUE,
              verbose=TRUE)
 }
 ===


 Information returned by traceback():
 
 traceback()
 6: which(raw == as.raw(255))
 5: .initializeCdf(con = con, nRows = cdfHeader$nrows, nCols =
 cdfHeader$ncols,
nUnits = cdfHeader$nunits, nQcUnits = cdfHeader$nqcunits,
refSeq = cdfHeader$refseq, unitnames = unitNames, qcUnitLengths =
 qcUnitLengths,
unitLengths = unitLengths)
 4: writeCdfHeader(con = con, cdfheader, unitNames = names(cdf),
qcUnitLengths = qcUnitLengths, unitLengths = unitLengths,
verbose = verbose)
 3: writeCdf(outFilename, cdfheader = cdfHeader, cdf = cdfUnits,
cdfqc = cdfQcUnits, overwrite = TRUE, verbose = verbose2)
 2: system.time({
writeCdf(outFilename, cdfheader = cdfHeader, cdf = cdfUnits,
cdfqc = cdfQcUnits, overwrite = TRUE, verbose = verbose2)
})
 1: convertCdf(files[kk], outFiles[kk], version = 4, force = TRUE,
verbose = TRUE)





 On Sat, Apr 25, 2009 at 4:37 PM, Mark Robinson mrobin...@wehi.edu.au
 wrote:


 Hi Xinjun.

 First off, you might be able to make use of the binary CDF files already
 created (unless you are doing something non-standard).  See:

 http://groups.google.com/group/aroma-affymetrix/web/huex-1-0-st-v2

 As to your problem below, I'm not sure of the memory fingerprint for
 doing
 the conversion, but I'd be surprised if it was 2GB.  Perhaps you have
 objects in memory in your current workspace?  Do you still get the error
 from a fresh R session?
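
 One way to check for large leftover objects before retrying is sketched below. object_sizes() is a hypothetical helper written for this illustration (not part of any package), and the demonstration uses a scratch environment rather than the real workspace.

```r
# Hypothetical helper: size in bytes of every object in an environment,
# largest first. Defaults to the global environment (the workspace).
object_sizes <- function(env = globalenv()) {
  nms <- ls(envir = env)
  sizes <- vapply(nms,
                  function(nm) as.numeric(object.size(get(nm, envir = env))),
                  numeric(1))
  sort(sizes, decreasing = TRUE)
}

# Demonstration on a scratch environment with one large and one small object.
demo <- new.env()
assign("big", numeric(1e6), envir = demo)
assign("small", 1:10, envir = demo)
print(object_sizes(demo))

# Drop what is no longer needed and let R reclaim the memory.
rm("big", envir = demo)
invisible(gc())
```

 Calling object_sizes() with no arguments in an interactive session lists the workspace itself; starting a fresh R session remains the simplest test.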

 As always, when you get an error, it is good practice to give the output
 of both sessionInfo() and traceback() ... and even a code example
 wouldn't
 hurt.

 Hope that helps.
 Mark




 Hi:

 I am running convertCdf() with R 2.9.0 on my PC / Windows XP. I am
 going to convert HuEx-1_0-st-v2.cdf to binary format from ASCII
 format. But an error has occurred noted as unable to allocate vectors
 of 10.9 MB ( I have translated the error message from Chinese). The
 memory is about 2G. Is that the point or some other reasons? Thanks in
 advance.

 Xinjun


 




 


 







[aroma.affymetrix] Re: question for using custom Affy Exon array CDF

2009-03-31 Thread Mark Robinson

Hi Jing.

To be honest, I haven't explored this in any great detail.  I know  
that Elizabeth sometimes used the bigger CDFs for the BG correction/ 
normalization steps and then switched to the 'core' CDF for fitting  
the PLMs.  I'd expect only subtle changes in the 'normexp' BG  
adjustment since it would be fitted on 1M probes with either CDF, but  
as I mentioned, I have not studied it.

If all of your downstream analysis is focussed on the 'core' CDF, then  
it is probably sufficient to use the 'core' CDF for BG adjustment.   
This is what I do for a lot of my work, at least.

Cheers,
Mark

On 01/04/2009, at 6:25 AM, jing ma wrote:

 Dear Mark,

 When doing background subtraction for the Affy exon array, do you
 think it is sufficient to use the CORE CDF (I usually use
 HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf)?

 Thanks,
 Jing



 On Mon, Mar 16, 2009 at 5:35 PM, Mark Robinson  
 mrobin...@wehi.edu.au wrote:

 Hi Jing.

 See below.

 On 17/03/2009, at 2:54 AM, jing wrote:

  To whom it may concern,
 
  I'm analyzing some Affy human exon array data and hope to generate
  similar plots as seen in the supplementary figures in the Purdom  
 2008
  Bioinformatics paper. To do so, I need to get the normalized probe
  intensities and residuals.
 
  I've already followed the steps described in the Human Exon Array
  Analysis vignette and get the following:
 
  (1) ...
  csN <- process(qn, verbose=verbose)
 
  (2) ...
  res <- getResidualSet(plmTr)
 
  I tried the function extractDataFrame(...,addNames=TRUE) hoping to
  get the data plus column labels for my samples but it didn't work.  Is
  there any easy way to extract these two sets of data in matrix format
  similar to the FIRMA score matrix with probe ID and column labels?


 That's right.  extractDataFrame() is typically used for some kind of
 summarized data.  For example, you can use extractDataFrame() for
 pulling out FIRMA scores (summarized at the probeset level), or RMA
 summarized data (summarized at probeset or gene level).

 To pull out the raw/normalized data and the residuals, you can use
 extractMatrix() or readUnits().  I prefer the former.  It's probably
 best to suggest you look at the thread:

 http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/46d609076d9580fb

 Look for the commands after the # starting from PLM ... line.

 I've been meaning to put up a page giving a summary of these commands,
 including how to use exon array data with GenomeGraphs.  Hopefully I
 can find some time shortly to do that.

 Hope that helps.
 Mark




  Thanks,
 
  Jing
 
  Jing Ma
  Hartwell Center for Bioinformatics & Biotechnology
  St. Jude Children's Research Hospital
 
  

 --
 Mark Robinson
 Epigenetics Laboratory, Garvan
 Bioinformatics Division, WEHI
 e: m.robin...@garvan.org.au
 e: mrobin...@wehi.edu.au
 p: +61 (0)3 9345 2628
 f: +61 (0)3 9347 0852
 --








 


--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--








[aroma.affymetrix] Re: gcRMA for Gene ST Arrays

2009-03-25 Thread Mark Robinson

Hi Mario.

I will look into this.  I know this feature doesn't get used too often  
for these new arrays.  As you may know, it will need to be called  
differently than when using it for 3' IVT arrays (e.g. HG-U133).

Can you give me your sequence of commands?  I assume you have  
specified the set of negative control indices to use, right?

Cheers,
Mark


On 24/03/2009, at 8:57 PM, Mario Fasold wrote:


 Hi all! I'd like to use gcRMA correction on Human Gene 1.0 ST data.
 However, the method GcRmaBackgroundCorrection fails, probably since
 the probe_tab file has a slightly different layout for these chips
 (see error message below). Is there a way of telling the gcRMA
 function to use different columns of the probe_tab file?

 Best, Mario.

 Reading tab-delimited sequence file...
 Error in if (any(units < 1)) stop("Argument 'units' contains
 non-positive indices.") :
   missing value where TRUE/FALSE needed

 

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--








[aroma.affymetrix] Re: Write normalized intensities as a CEL file

2009-03-23 Thread Mark Robinson

Hi Libing.

Perhaps the combination of createCel() and updateCel() in the
'affxparser' package is what you are after.

Although, maybe if you tell us more about what it is you are trying to  
do, we can be of more help.

Cheers,
Mark


On 24/03/2009, at 2:08 AM, Libing wrote:


 Hi,

 I am wondering if writeCel() is the one I should use. There are no
 documents for its usage. Can anyone provide more details? Or some
 links. Thanks!

 

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--








[aroma.affymetrix] Re: Custom CDF Creation

2009-03-19 Thread Mark Robinson


Hi Jake.

As a starting point, you might have a look at:
http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch

In there is a script called 'flat2Cdf.R' that takes a flat file of probe
information (X/Y location on the chip, probe sequence, identifiers) and
creates a CDF file.  I assume you will have this information or can get
it.  Maybe create 2 such flat files.

Alternatively, you should also search the Bioconductor archives.  I do
recall some discussion on this awhile back and there were some scripts
generated that removed probes from CDF environments.

In any case, you could also familiarize yourself with the CDF format by
taking an existing CDF file and reading it in with readCdf() from the
'affxparser' package.  That is always good information to know.

As far as I can tell, you won't need to modify anything in the CEL files;
you'll just have to read from the CEL file twice, once using your
SNP-probes-removed CDF file and once using your SNP-probes-only CDF file.

Hope that helps.
Mark



 Hi everyone,

 I'm trying to perform probe-level analyses on the HG-U133Plus2 chip
 data. Basically I have a set of SNPs that overlap with probes from the
 chip. I wanted to analyze two different aspects of these
 allele-specific probes:

 1) Analyze the probesets without the allele-specific probe
 2) Analyze the allele-specific probe individually

 How do I generate custom CDFs for this purpose? Are the only changes
 that need to be made in the CDF or does one have to modify, say, CEL
 files as well?

 Any help would be appreciated.

 Thanks,

 Jake

 







[aroma.affymetrix] Re: Problem understanding FIRMA

2009-03-17 Thread Mark Robinson

Hi Christian.


 However, in this respect I have also the following question:
 How does using median polish compare to using
 R_rlm_rma_default_model?
 Are the final scores still of some use if you use medpol?


Short answer is I haven't investigated this too thoroughly.

But, my guess is that it wouldn't be too different.  That prediction  
is based on the fact that the chip effects are in the same ballpark,  
as you can see from the Aroma_vs_Affy (Aroma=R_rlm_rma, Affy=medpol)  
plot in the following thread:

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/1b0ab11fad9b4df3/f745ed0860546313

But, I'd be interested to hear more details if you do look into it more.

Cheers,
Mark




 Best regards
 Christian



 On Mar 16, 9:13 am, Mark Robinson mrobin...@wehi.edu.au wrote:
 Hi Christian.

 From what I can tell looking at your code (rather quickly, I must
 admit), there will be 2 differences between aroma.affymetrix and what
 you have:

 1. We use the 'preprocessCore' codebase for the robust fitting of the
 linear model (... but maybe you are just using median polish as an
 illustration).  For example, you might try:

 library(preprocessCore)
 f <- .Call("R_rlm_rma_default_model", log2(yTr), 0,
            1.345, PACKAGE="preprocessCore")
 [... and piece together the alpha, beta, etc ...]

 2. The estimate of standard error is calculated genewise, over
 residuals from all probes/samples (i.e. u.mad should be a scalar, not a
 vector).
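
 A rough base-R sketch of point 2, using the example intensities from the original question quoted further down this thread. Median polish stands in for the robust fit (so this is an illustration only, not the aroma.affymetrix implementation), and firmaLike is a name made up here.

```r
# Example intensities from the thread: rows 1:4 are exon 1, rows 5:8 exon 2.
yTr <- matrix(c(
    5.74954,  18.0296,   2.50436,  15.5857,  26.1744,   31.0075,
    9.59819,  23.0093,  22.01120,  70.1742,  32.8408,  102.0080,
  114.50800,  87.1742,  70.34080, 312.3410, 266.1740,  601.3410,
   66.34080,  52.0075,  67.34080, 184.1740, 266.1740,  147.0080,
  210.17400, 142.0080, 173.34100, 514.5080, 659.1740,  509.6740,
  104.00800,  84.3408,  70.34080, 333.5080, 324.1740,  231.0080,
  194.00800, 124.5080, 234.00800, 443.6740, 767.5080,  716.8410,
  319.34100, 282.6740, 283.50800, 656.0080, 807.6740,  954.6740),
  nrow = 8, byrow = TRUE,
  dimnames = list(as.character(1:8),
                  c("HeartA", "HeartB", "HeartC",
                    "MuscleA", "MuscleB", "MuscleC")))

# Transcript-level fit on the log2 scale (median polish as a stand-in);
# the residuals are what remains after probe and array effects.
res <- medpolish(log2(yTr), trace.iter = FALSE)$residuals

# ONE scale estimate per gene, pooled over all probes and all samples.
u.mad <- mad(as.vector(res), center = 0)

# Per-exon, per-array score: median of the standardized residuals.
exon <- rep(c("exon1", "exon2"), each = 4)
firmaLike <- apply(res / u.mad, 2, function(r) tapply(r, exon, median))
print(round(firmaLike, 3))
```

 Because u.mad is a single number, dividing the residual matrix by it has no recycling pitfalls, unlike dividing by a per-array vector.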

 Hope that helps.
 Mark

 On 16/03/2009, at 6:32 PM, cstratowa wrote:





 Dear all,

 After reading the FIRMA paper I would like to understand the
 implementation, but this is not easy since the source code is hard  
 to
 read. Nevertheless, I tried and would like to know if this is  
 correct.

 According to the page on exon array analysis you do the following:

 I, fit a summary of the entire transcript
 plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
 fit(plmTr, verbose=verbose)

 II, fit the FIRMA model for each exon
 firma <- FirmaModel(plmTr)
 fit(firma, verbose=verbose)

 However, I would like to understand the underlying source code.

 For this example let us assume that we have quantile-normalized
 intensities yTr for a transcript  containing two exons:
 yTr
      HeartA   HeartB    HeartC  MuscleA  MuscleB   MuscleC
 1   5.74954  18.0296   2.50436  15.5857  26.1744   31.0075
 2   9.59819  23.0093  22.01120  70.1742  32.8408  102.0080
 3 114.50800  87.1742  70.34080 312.3410 266.1740  601.3410
 4  66.34080  52.0075  67.34080 184.1740 266.1740  147.0080
 5 210.17400 142.0080 173.34100 514.5080 659.1740  509.6740
 6 104.00800  84.3408  70.34080 333.5080 324.1740  231.0080
 7 194.00800 124.5080 234.00800 443.6740 767.5080  716.8410
 8 319.34100 282.6740 283.50800 656.0080 807.6740  954.6740

 Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2.

 I, fit a summary of the entire transcript
 To simplify issues I will fit the data using median polish:
 # 1. fit median polish
 mp <- medpolish(log2(yTr))

 # 2. data set specific estimates (probe affinities)
 beta  <- mp$overall+mp$col
 thetaTr <- 2^beta

 # 3. array-specific estimates
 alpha <- mp$row
 alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)])
 phiTr <- 2^alpha

 II, fit FIRMA model for each exon
 # 1. calculate residuals
 phi   <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr))
 theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr),
                 byrow=TRUE)
 yhat  <- phi * theta
 eps   <- yTr/yhat  # rma uses y/yhat

 # 2. estimate of standard error
 u.mad <- apply(log2(eps), 2, mad, center=0)

 # 3. compute final score statistic
 # for 1. exon
 y1 <- log2(eps[1:4,])
 F1 <- apply(y1/u.mad, 2, median)
 F1
      HeartA      HeartB      HeartC     MuscleA     MuscleB     MuscleC
 -0.89938777 -0.03792624 -0.69409936  0.11536565 -0.61385296  1.08709568

 # for 2. exon
 y2 <- log2(eps[5:8,])
 F2 <- apply(y2/u.mad, 2, median)
 F2
      HeartA      HeartB      HeartC     MuscleA     MuscleB     MuscleC
 -0.02899616 -1.64645153 -0.70048533 -0.39996057  0.02666064 -1.46657055

 Now my question is:
 Is this calculation of the final score statistic F1 for exon 1 and  
 F2
 for exon 2 correct?
 Did I miss something?

 Best regards
 Christian

 --
 Mark Robinson
 Epigenetics Laboratory, Garvan
 Bioinformatics Division, WEHI
 e: m.robin...@garvan.org.au
 e: mrobin...@wehi.edu.au
 p: +61 (0)3 9345 2628
 f: +61 (0)3 9347 0852
 --
 

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--






[aroma.affymetrix] Re: Using GenomeGraphs with FIRMA (Was: FIRMA score)

2009-03-04 Thread Mark Robinson
 a CDF file where the groups are  
labelled with Ensembl identifiers.


 In your example plot, your first row shows the data ds <-
 getDataSet(plm) (I think there is a typo somewhere: ds <- cs). What
 exactly is this data?


Thanks for spotting the typo.  Good find.  Indeed, I should have  
written:

ds <- getDataSet(plm)
[...]
d <- log2(extractMatrix(ds, cells=ind, verbose=verbose))

So, in my plot, I was plotting unnormalized raw intensity data, not
the BG-adjusted, quantile-normalized data, since my 'cs' was defined as:

cs <- AffymetrixCelSet$byName("tissues", cdf=cdf)


 I thought the idea of a probe level model was to somehow merge all
 probe values into a single probeset value. So afterwards you have one
 summarized intensity for each probeset and each array. Following
 this thought, there should be only one intensity value for each
 probeset in the plot. But your plot shows (1st row) more than one
 value per probeset (mostly 4, one value per probe). So what exactly
 did you plot there?


I'm plotting all probes.  Top plot is the raw data, lower plot is the  
residuals ... then all the gene annotation at the bottom.

 In my plot above I used the values from celsetN <-
 process(QuantileNormalization(celsetBC, typesToUpdate="pm")) and
 expected to plot the normalized intensities... Did I get it wrong?
 I have my plot and the corresponding code attached.

Your 'plotdata1' is the normalized data.  Your 'plotdata2' and  
'plotdata3' are the chip effects from probeset-level summaries (PLM  
and FIRMA, respectively).  Therefore, the nProbes element of 'exon2'  
and 'exon3' doesn't actually match the data, does it?

What is the result of:

nrow(plotdata2$intensities) == sum(plotdata2$probesetdata[,4])


In my example:

  nrow(d) == sum(as.numeric(nProbes))
[1] TRUE

Is that a possible source of the problem?
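
The check above can be illustrated with toy numbers. In this sketch the probe counts and values are invented: probe-level data has one row per probe, so its row count matches sum(nProbes), whereas probeset-level summaries have one row per probeset and do not.

```r
set.seed(42)

# Hypothetical number of probes in each of three probesets.
nProbes <- c(4, 4, 3)

# Probe-level data: one row per probe, two arrays.
probeLevel <- matrix(rnorm(sum(nProbes) * 2), ncol = 2)

# Probeset-level summaries (e.g. chip effects): one row per probeset.
probesetLevel <- matrix(rnorm(length(nProbes) * 2), ncol = 2)

# Only the probe-level matrix is consistent with the nProbes annotation.
probeOk    <- nrow(probeLevel) == sum(nProbes)      # TRUE  (11 rows vs 11)
probesetOk <- nrow(probesetLevel) == sum(nProbes)   # FALSE (3 rows vs 11)
print(c(probeLevel = probeOk, probesetLevel = probesetOk))
```

For summarized data, the matching annotation would be one probe per probeset, i.e. nProbes set to 1 for every row.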


 Right now I'm pretty confused... I'm hoping for some enlightenment!
 Frank

Hopefully I haven't confused you more.  There are a lot of questions/ 
intricacies here!

Good luck.
Mark



 P.S.: Right now (2009-03-05, around 3 GMT+1) I can't connect to
 Ensembl through Biomart:
  library(GenomeGraphs)
  mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
 Opening and ending tag mismatch: meta line 3 and body
 Premature end of data in tag html line 1
 Error: 1: Opening and ending tag mismatch: meta line 3 and body
 2: Premature end of data in tag html line 1

 Hopefully this is just a temporary error... Does anybody have the  
 same problem?

 
 GenomeGraphs_ENSG0060237.pdf  GenomeGraphs_ENSG0060237.R


--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--








[aroma.affymetrix] Re: to use array list file

2009-02-12 Thread Mark Robinson

Hi Sun.

One of the handy features of aroma.affymetrix is that it pulls a lot  
of information from the directories in which files reside (e.g.  
platform, dataset name, etc).

If you are on unix/mac osx, you can use symbolic links for this.

For example, everything in your ./rawData/MYDATASET/MYPLATFORM
directory would be a link to the location on disk where the file
resides.  You could probably even write a script to create all the
symbolic links from the list of full pathnames in your text file ...
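
A minimal sketch of such a script in R, assuming a unix-like system where file.symlink() is supported. It runs entirely in a temporary directory with empty placeholder files; the directory and file names (runA, runB, celFiles.txt) are hypothetical, and the target follows the rawData/<data set>/<chip type>/ layout.

```r
# Stand-ins for CEL files scattered over several directories.
srcDirs <- file.path(tempdir(), c("runA", "runB"))
for (d in srcDirs) dir.create(d, showWarnings = FALSE)
src <- file.path(srcDirs, c("sample1.CEL", "sample2.CEL"))
invisible(file.create(src))   # empty placeholders, not real CEL files

# The text file with one full pathname per line.
listFile <- file.path(tempdir(), "celFiles.txt")
writeLines(normalizePath(src), listFile)

# Link every listed file into one data set directory.
target <- file.path(tempdir(), "rawData", "MYDATASET", "MYPLATFORM")
dir.create(target, recursive = TRUE, showWarnings = FALSE)
paths <- readLines(listFile)
links <- file.path(target, basename(paths))
unlink(links)                        # remove stale links from earlier runs
ok <- file.symlink(paths, links)     # TRUE for each link created

print(data.frame(link = basename(links), created = ok))
```

Note that files with the same base name in different source directories would collide in the target directory, so the names need to be unique.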

I don't know how to do this on Windows, but presumably there is
something similar.

HTH,
Mark

On 13/02/2009, at 11:24 AM, Wukong Sun wrote:

 Dear Dr. Bengtsson:


 I am wondering whether aroma-affymetrix can read in CEL files that  
 are scattered in several directories.
 For example, the user may want to provide a text file specifying the  
 full paths of CEL files.
 The reason is that I want to process these CEL files together,  
 however I don't want to copy them to one folder.

 Thanks.

 


--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--








[aroma.affymetrix] Re: probes missing from mogene10st.db

2009-02-11 Thread Mark Robinson

Hi Sebastien.

Have a look at:

http://thread.gmane.org/gmane.science.biology.informatics.conductor/19591/

If you want these probesets, you might consider creating the CDF  
directly from a pdInfoBuilder package (which works directly from the  
PGF/CLF files from Affy), as described at:

http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-r-packages-environments

Hope that helps.
Mark



On 11/02/2009, at 10:32 AM, Sebastien Gerega wrote:


 Hi,
 I have come across some missing probes in the mogene10st.db package. I
 am analysing my data using the aroma package and have included some of
 my code:

 ces = getChipEffectSet(plm)
 gExprs = extractDataFrame(ces, units=NULL, addNames=TRUE)
 affyIDs = gExprs$unitName

 affyIDs[which(affyIDs %in% names(unlist(as.list(mogene10stENTREZID))) == FALSE)]
 [1] 10344715 10346139 10348945 10351859 10361914 10363331 10364435 10372094 10375121
 [10] 10388263 10388269 10393404 10394238 10396419 10398315 10401420 10406248 10408136
 [19] 10408144 10408900 10409545 10412699 10418129 10422509 10424416 10427993 10428439
 [28] 10433478 10436198 10439287 10439409 10439974 10442153 10442256 10445378 10449547
 [37] 10450920 10453715 10457583 10457778 10458958 10459467 10461790 10462311 10467838
 [46] 10482137 10484695 10485716 10490984 10495594 10496888 10498497 10502199 10514325
 [55] 10514327 10514329 10514331 10521698 10527099 10528157 10528159 10528172 10529815
 [64] 10530140 10531631 10535467 10539979 10540931 10541349 10544538 10546227 10548697
 [73] 10550167 10553786 10553811 10559498 10571382 10579607 10579763 10583199 10584393
 [82] 10591182 10591184 10595787 10608198 10608200

 Are people aware of these missing probes? I have not been able to find
 any documentation about the issue. Should I send this to the BioC
 mailing list instead?
 thanks for your help,
 Sebastien

 

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--








[aroma.affymetrix] Re: annotation of ST gene Arrays

2009-02-10 Thread Mark Robinson

Hi Simon.

See comments below.

 I am using the mouse  gene ST arrays and am having problems with
 annotation. When i write a csv file, the annotation is only the
 probeset_id, no gene names or accession numbers etc.

That's what it should be.  Actually, it's the 'transcript_cluster_id'.
Previously, Affy did not provide annotation at the probeset level.

The CDF file just contains the identifiers.  Linking results (e.g.  
expression summaries) to the annotation can be done with other R  
packages.  For example, here is some code I gave Sebastien a few weeks  
ago that will get you started (just replace hugene10st.db with  
mogene10st.db):

-
Say you have some Affy identifiers:

> myids
[1] "7950136" "7955845" "7955852" "7955855" "7955858" "7955865" "7955869"
[8] "7955873" "7955887" "8016433"

Load package and read off the gene symbols:

> library(hugene10st.db)
> symbols <- unlist(as.list(hugene10stSYMBOL))
> data.frame(affyid = myids, symbol = symbols[myids])
 affyid symbol
7950136 7950136 PHOX2A
7955845 7955845 HOXC13
7955852 7955852 HOXC12
7955855 7955855 HOXC11
7955858 7955858 HOXC10
7955865 7955865  HOXC9
7955869 7955869  HOXC8
7955873 7955873  HOXC6
7955887 7955887  HOXC5
8016433 8016433  HOXB1

Here are some other fields in hugene10st.db:

  hugene10st
hugene10st                hugene10stCHRLENGTHS      hugene10stENTREZID
hugene10stGO2ALLPROBES    hugene10stORGANISM        hugene10stPMID2PROBE
hugene10stUNIPROT         hugene10st.db::           hugene10stCHRLOC
hugene10stENZYME          hugene10stGO2PROBE        hugene10stPATH
hugene10stPROSITE         hugene10st_dbInfo         hugene10stACCNUM
hugene10stCHRLOCEND       hugene10stENZYME2PROBE    hugene10stMAP
hugene10stPATH2PROBE      hugene10stREFSEQ          hugene10st_dbconn
hugene10stALIAS2PROBE     hugene10stENSEMBL         hugene10stGENENAME
hugene10stMAPCOUNTS       hugene10stPFAM            hugene10stSYMBOL
hugene10st_dbfile         hugene10stCHR             hugene10stENSEMBL2PROBE
hugene10stGO              hugene10stOMIM            hugene10stPMID
hugene10stUNIGENE         hugene10st_dbschema

...
-



 These probesets
 also do not match the probeset_ids from MoGene-1_0-st-v1.na27.mm9 off
 the affymetrix website.

Perhaps you want 'transcript_cluster_id's?

(CSV files from 
http://www.affymetrix.com/products_services/arrays/specific/mousegene_1_st.affx)

> tr <- read.csv("MoGene-1_0-st-v1.na27.mm9.transcript.csv", header=TRUE, comment.char="#")
> ps <- read.csv("MoGene-1_0-st-v1.na27.mm9.probeset.csv", header=TRUE, comment.char="#")

> cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1", verbose=verbose)
> un <- getUnitNames(cdf)
> sum( un %in% ps$transcript_cluster_id )
[1] 28815
> sum( un %in% tr$transcript_cluster_id )
[1] 35474

You may also be interested in the following thread, which explains the  
difference in number of probesets:
http://thread.gmane.org/gmane.science.biology.informatics.conductor/19591/


 here is my session:

 library('aroma.affymetrix')
 cdf <- AffymetrixCdfFile$byChipType("MoGene-1_0-st-v1", tags='r3')
 cs <- AffymetrixCelSet$byName(Files, cdf=cdf)
 bc <- RmaBackgroundCorrection(cs)
 csBC <- process(bc, verbose=verbose)
 qn <- QuantileNormalization(csBC, typesToUpdate="pm")
 csN <- process(qn, verbose=verbose)
 plm <- RmaPlm(csN)
 fit(plm, verbose=verbose)
 qam <- QualityAssessmentModel(plm)
 ces <- getChipEffectSet(plm)
 mat <- extractMatrix(ces)
 mat <- log2(mat)
 rownames(mat) <- getUnitNames(cdf)
 write.csv(mat, file="data.csv")

 I am sure there is a simple solution to this and I apologize as I am
 new to R. Any help would be much appreciated. Also, what are people
 opinions on the positive and negative controls probesets? Should
 these be included as part of a final gene list?
 Thank you in advance for any help.

Good question.  Some people use the controls for QC and some use them  
for adjusting for background (for example, the pool of GC content  
probes).  But, definitely if you were to follow this up with some kind  
of differential expression analysis (e.g. limma), I would discard the  
non-main probes.  For example:

> table(tr$category)

            control->affx  control->bgp->antigenomic                       main
                       22                         45                      28815
           normgene->exon           normgene->intron   rescue->FLmRNA->unmapped
                     1324                       5222                         91
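
As a rough sketch of that filtering step, here is how the non-main units could be dropped from a matrix of expression summaries. The tables below are toy stand-ins with made-up identifiers and values (only the column names 'transcript_cluster_id' and 'category' come from the annotation file above), so treat it as an illustration rather than a recipe:

```r
# Toy stand-ins (hypothetical values): 'tr' mimics the transcript annotation
# table and 'mat' mimics the unit-by-array matrix of expression summaries.
tr <- data.frame(
  transcript_cluster_id = c(10344614, 10344616, 10338001, 10338002),
  category = c("main", "main", "control->affx", "normgene->exon"),
  stringsAsFactors = FALSE
)
mat <- matrix(c(7.1, 8.3, 12.0, 5.5,
                7.4, 8.1, 11.8, 5.9),
              ncol = 2,
              dimnames = list(as.character(tr$transcript_cluster_id),
                              c("array1", "array2")))

# Identifiers of the 'main' transcript clusters only.
mainIds <- as.character(tr$transcript_cluster_id[tr$category == "main"])

# Keep the rows of the expression matrix that belong to 'main' units.
matMain <- mat[rownames(mat) %in% mainIds, , drop = FALSE]
print(matMain)
```

The same row subset would then be what goes into the differential expression step.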


Hope that helps.
Mark





Ensembl-centric CDFs from scratch (Was: Re: [aroma.affymetrix] Error: cannot allocate vector of size 692.8 Mb)

2009-02-05 Thread Mark Robinson

Hi Sabrina.

I created a new thread as this is a new question.  And a big one!


On 05/02/2009, at 6:06 AM, sabrina wrote:

 Hi, Mark:
 Quick question. I was looking at the example of creating cdfs from
 scratch. I downloaded biomaRt , but not sure where to get or how to
 create the exonBoundary file. Can you give me more information on  
 that?
 Thanks!

 Sabrina


First of all, you may want to start with the Ensembl-centric CDFs that  
Elizabeth has already made for the Mouse Exon 1.0 array.  You can find  
these from the link at:

http://groups.google.com/group/aroma-affymetrix/web/moex-1-0-st

Creating custom CDFs for exon arrays is not an easy procedure and  
while we've (mostly Elizabeth) made some effort to document and share  
the code of how we have done it, we (or at least I) don't expect it to  
be bulletproof and it may require additional effort on your part.

That said, to answer your question specifically, you'll need to make  
sure you get the exon coordinates (i.e. boundaries) when you download  
from Biomart.  Here is one example of what I downloaded from Biomart  
way back when for the Human array:

---
Ensembl Gene ID,Chromosome,Biotype,Exon Start (bp),Exon End (bp),Ensembl Exon ID,Constitutive Exon,Strand,Coding Start (bp),Coding End (bp)
ENSG00000184895,Y,protein_coding,2714896,2715740,ENSE00001494622,0,-1,2715030,2715644
ENSG00000184895,Y,protein_coding,2715030,2715644,ENSE00001299380,0,-1,2715030,2715644
ENSG00000129824,Y,protein_coding,2769527,2769668,ENSE00001494579,0,1,2769666,2769668
ENSG00000129824,Y,protein_coding,2770206,2770283,ENSE00001159432,1,1,2770206,2770283
ENSG00000129824,Y,protein_coding,2772118,2772298,ENSE00000891584,1,1,2772118,2772298
---

I did this from a query at http://www.ensembl.org/biomart/martview and
downloaded the result to a text file, but note that you should be able to
download this directly to an R data.frame using the biomaRt package.
These coordinates get matched up to the coordinates in the
probeset.csv file  ... and so on and so on.
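
To give a feel for the matching step, here is a minimal base-R sketch that assigns probesets to exons by coordinate containment. The exon coordinates are taken from the example rows above; the exon labels, the probeset table and the helper matchExon() are made up for illustration, and this is not the code that was used to build the custom CDFs:

```r
# Toy exon table (coordinates from the Biomart rows above; labels are made up).
exons <- data.frame(
  exon  = c("exonA", "exonB", "exonC"),
  chr   = c("Y", "Y", "Y"),
  start = c(2714896, 2769527, 2770206),
  end   = c(2715740, 2769668, 2770283),
  stringsAsFactors = FALSE
)

# Toy probeset table (hypothetical ids and coordinates).
probesets <- data.frame(
  probeset_id = c(1L, 2L, 3L),
  chr   = c("Y", "Y", "Y"),
  start = c(2715000, 2769600, 2800000),
  end   = c(2715100, 2769650, 2800100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each probeset, return the first exon on the same
# chromosome that fully contains it, or NA if there is none.
matchExon <- function(ps, ex) {
  sapply(seq_len(nrow(ps)), function(i) {
    hit <- which(ex$chr == ps$chr[i] &
                 ex$start <= ps$start[i] &
                 ex$end >= ps$end[i])
    if (length(hit) > 0) ex$exon[hit[1]] else NA_character_
  })
}

probesets$exon <- matchExon(probesets, exons)
print(probesets)
```

For a whole genome you would want something faster than a row-by-row loop, and you would also have to decide what to do with strand and with partial overlaps.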

HTH,
Mark







[aroma.affymetrix] Re: changing colours in plotRle

2009-02-03 Thread Mark Robinson


Hi Sebastien.

Note that plotRle() eventually makes a call to 'bxp' (in the graphics
package that is loaded by default) and any/all arguments are passed on. 
Have a look at ?bxp for what you can specify.

For example:

[...]
qamTr <- QualityAssessmentModel(plmTr)
plotRle(qamTr, boxwex=[something], boxfill=[something])
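
To see what those two arguments do, here is a small stand-alone sketch that calls bxp() directly on toy data, with one fill colour per array chosen by test group. The group labels, colours and data are invented; if the arguments really are passed through unchanged, the same 'boxfill' vector should work in the plotRle() call:

```r
# Hypothetical test groups for six arrays and one colour per group.
groups <- c("A", "A", "A", "B", "B", "B")
groupCols <- c(A = "steelblue", B = "tomato")
boxCols <- unname(groupCols[groups])

# Toy data standing in for per-array RLE values (one column per array).
set.seed(1)
rleVals <- matrix(rnorm(600, sd = 0.2), ncol = 6)

# bxp() draws from boxplot statistics; 'boxfill' and 'boxwex' are
# documented in ?bxp.
stats <- boxplot(as.data.frame(rleVals), plot = FALSE)
pdf(NULL)  # null device, so nothing is written to disk
bxp(stats, boxfill = boxCols, boxwex = 0.5, outline = FALSE)
invisible(dev.off())
```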

Cheers,
Mark


 Hi,
 is there a way to change the colour of each individual bargraph when
 calling plotRle? I would like to make the colours correspond to test
 groups.
 thanks,
 Sebastien


 







[aroma.affymetrix] Re: Error: cannot allocate vector of size 692.8 Mb

2009-01-28 Thread Mark Robinson

Hi Sabrina.

First of all, you are somewhat in uncharted waters here.  I
personally don't recommend using the 'full' CDF for FIRMA analysis.   
Others can disagree (and I would encourage some discussion about  
this ...), but my reasoning is that the majority of the probes in the  
full CDF are querying poorly annotated or predicted exonic regions of  
the genome.  From what I've seen (on Human Exon 1.0 data), the probe  
intensities for these are mostly at background and could have an  
unsavoury effect on the PLM modelling ... and passing that on to  
FIRMA.  By restricting to core/Ensembl probes, you lessen the effect  
of non-responding probes.

From thinking about it (although I haven't actually done this on my
datasets), I would suggest a two-stage analysis:

1. PLM/FIRMA analysis on a set of probes in well-annotated regions.

2. Differential expression analysis on the remaining probes --  
strongly differentially expressed ones may be indicative of  
transcripts/variants that are not covered in the well-annotated set.

To answer your question below, the table of FIRMA values shouldn't  
actually be all that large, so this should be possible.  One thing to  
check first of all is that you have a relatively clean R session.  Do  
you have tables/data frames/objects in memory that are consuming a  
bunch of space?  aroma.affymetrix is memory efficient, but it will  
need some room to work.  I just ran an example here on the FULL CDF  
and it appears to tick along just fine, though I can see if the memory  
spikes.

As always when reporting errors, it doesn't hurt to give the results  
of 'sessionInfo()', 'traceback()' ... and you could even set  
'verbose=-40' (say) in your call to 'extractDataFrame' to see where it  
all goes wrong.
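
For the "clean R session" check, something along these lines lists the objects by size. It is only a sketch: it uses a private environment with two made-up objects so that it does not touch your workspace; in a real session you would run the same thing on the global environment with plain ls() and get():

```r
# Work in a private environment so the sketch leaves the workspace alone.
env <- new.env()
assign("small", 1:10, envir = env)
assign("big", numeric(1e6), envir = env)

# Size of every object in the environment, in bytes, largest first.
sizes <- sapply(ls(env), function(nm) {
  as.numeric(object.size(get(nm, envir = env)))
})
sizes <- sort(sizes, decreasing = TRUE)
print(round(sizes / 2^20, 2))  # in Mb

# Remove what is no longer needed and let R reclaim the memory.
rm("big", envir = env)
invisible(gc())
```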

Hope that helps.

Cheers,
Mark



On 28/01/2009, at 3:45 AM, sabrina wrote:


 Hi, all:
 I am using FIRMA model for my Mouse Exon array analysis. I am testing
 the FULL annotation. After I used FirmaModel and fit it, I used the
 following code to extract Firma Scores:
 exFirma <- extractDataFrame(firmaScore, addNames=TRUE);


 however, I got the following error:

 Error: cannot allocate vector of size 692.8 Mb
 In addition: Warning messages:
 1: In getUnitGroupCellMap.FirmaFile(ce, ...) :
  Reached total allocation of 1535Mb: see help(memory.size)


 I used memory.size(), it gave me:  903.4713
 and memory.limit() is:  1535.875


 Does anyone have any suggestion to solve this problem? Thanks!!!

 Sabrina
 




[aroma.affymetrix] Re: exon array with technical replicates

2009-01-23 Thread Mark Robinson

Hi Sabrina.

Thanks for digging into this.  I had a quick look at that probeset
(6824548) and everything seems alright to me.  For example from the unix
prompt, I get:

unix88 516 % grep 6824548 MoEx-1_0-st-v1.na26.mm9.probeset.csv
4499490,chr14,-,51311295,51311393,4,6824548,268799,115885605,---,---,1,3,0,3,full,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,main
4897698,chr14,-,51300520,51300544,1,6824548,268797,115885601,---,---,1,1,0,1,core,0,0,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,main
5218980,chr14,-,51312156,51312189,4,6824548,268800,115885607,---,---,2,1,0,1,full,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,main
5466951,chr14,-,51313326,51313357,4,6824548,268801,115885611,---,---,3,1,0,1,full,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,main
5564205,chr14,-,51301330,51301394,4,6824548,268798,115885603,---,---,1,2,0,2,full,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,main

So, just 1 'core' probe and 4 'full' probes.  So, you'd find only the
single probe in the extended (core+extended) CDF as well.  If you went to
the full CDF (core+extended+full), you'd have the 5.

Hope that helps.

... getting back to your original query, it is probably worth throwing out
the 'groups' (transcript clusters) that have just a single probeset before
calling lmFit().
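
A minimal sketch of that filter, on a made-up table: it assumes the extracted data frame has a 'unitName' column (transcript cluster) and a 'groupName' column (probeset), and the ids and scores below are invented apart from the 6824548/4897698 pair discussed above:

```r
# Toy table: one row per probeset (group), labelled by its transcript
# cluster (unit).  Column names are assumed, values are made up.
exFirma <- data.frame(
  unitName  = c("6824548", "6747308", "6747308",
                "6747309", "6747309", "6747309"),
  groupName = c("4897698", "4332001", "4332002",
                "4332003", "4332004", "4332005"),
  array1 = c(1.2, 0.9, 1.1, 1.0, 0.8, 1.3),
  array2 = c(1.1, 1.0, 1.2, 0.9, 0.7, 1.4),
  stringsAsFactors = FALSE
)

# Count probesets per transcript cluster and keep clusters with more than one.
nGroups <- table(exFirma$unitName)
keepUnits <- names(nGroups)[nGroups > 1]
exFirmaMulti <- exFirma[exFirma$unitName %in% keepUnits, ]
print(exFirmaMulti)
```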

Cheers,
Mark






 Hi, Mark:
 I did options, it gave me the exact display as what you showed to
 me.:)
 Here is some of my confusion:
  I checked the exFirma scores of these NAs to see what
 transcript_cluster_id they were assigned to so that I could check
 whether they do have 5000 probes assigned to it. One of them is :
 6824548.  I went back to the file I saved for # of probes per exon to
 check how many probes it has. To my surprise, it only has one probeset
 and one probes: 6824548.4897698 which matches the probeset_id
 (4897698). Then I went ahead to download the affymetrix annotation
 file, transcript.csv and probeset.csv. In the transcript.csv file,
 6824548 has 17 probes in total, and in the probeset.csv, it shows that
 transcript_cluster_id : 6824548 has 5 probesets, 4897698 is one of
 them with only one probe assigned to it. I tried the extendedR1 cdf
 version, it gave me the same thing (in terms of # of probes per exon,
 using nProbesPerExon <- readCdfNbrOfCellsPerUnitGroup(getPathname(cdf))).
  I am bit confused. It seemed to me that in the cdf file, some of the
 probesets for the gene were missing for this transcript_cluster_id. I
 wonder if you can help me out on this . I noticed that you created the
 cdf files so you are the right person to ask  :)

 The code I generated the # of probes per exon is:
 nProbesPerExon <- readCdfNbrOfCellsPerUnitGroup(getPathname(cdf))
 nProbesPerExonVector <- unlist(nProbesPerExon)


 Thanks and Have a great weekend

 Sabrina


[aroma.affymetrix] Re: exon array with technical replicates

2009-01-22 Thread Mark Robinson

Hi Sabrina.

See below.

On 23/01/2009, at 3:17 AM, sabrina wrote:

 Hi, Mark:
 Thanks for the suggestions. I think I will go with one of the
 replicates for now, just to make things simple, later on I will deal
 with the replicates.
 Now, here is another problem I have. I used the following code to
 generate the firma score:

 firma <- FirmaModel(plm);
 fit(firma, verbose=verbose)
 firmaScore <- getFirmaScores(firma);

 exFirma <- extractDataFrame(firmaScore, addNames=TRUE);
 exFirma[,6:ncol(exFirma)] <- log2(exFirma[,6:ncol(exFirma)])

 then when I use limma fit, it gave me warnings:
 Warning message:
 In lmFit(exFirma[, 6:ncol(exFirma)], mm) :
  Some coefficients not estimable: coefficient interpretation may
 vary.

 It turned out that there are several rows of exFirma (even before the
 log2) were NAs. But when I check  the overall exon expressions (use
 ExonRmaPlm with para: mergeGroups=FALSE), for that specific exon (I
 take the group name as the exon id here), they have real values. I
 wonder where went wrong, perhaps fit(firma) step?


I'll make a wager that this is due to a small number of probesets that
have a large number (more than 500, say) of probes assigned to them.  You are using the
Mouse Exon array and I know there are some of these probesets in  
there.  In the interest of time, aroma.affymetrix changes over to  
median polish (faster), or skips the probeset altogether, depending on  
how large.

You can see the default settings in options().  For me (and probably  
you as well), it is:

> options("aroma.affymetrix.settings")
[...]

$aroma.affymetrix.settings$models
$aroma.affymetrix.settings$models$RmaPlm
$aroma.affymetrix.settings$models$RmaPlm$medianPolishThreshold
[1] 500   6

$aroma.affymetrix.settings$models$RmaPlm$skipThreshold
[1] 5000    1

So, I would just remove these rows before you use lmFit().
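
Something like the following would do it. This is a sketch on a made-up table that assumes, as in your code, that the first five columns are annotation and the scores start in column 6:

```r
# Toy table laid out like the extracted data frame: five annotation
# columns followed by one column of scores per array (values made up).
exFirma <- data.frame(
  unitName  = c("u1", "u1", "u2", "u2"),
  groupName = c("g1", "g2", "g3", "g4"),
  unit  = c(1L, 1L, 2L, 2L),
  group = c(1L, 2L, 1L, 2L),
  cell  = 1:4,
  array1 = c(1.5, NA, 0.8, 2.0),
  array2 = c(1.2, NA, 0.9, 1.7),
  stringsAsFactors = FALSE
)

# Flag rows with a missing score in any array, then drop them.
scoreCols <- 6:ncol(exFirma)
naRows <- !complete.cases(exFirma[, scoreCols])
exFirmaOk <- exFirma[!naRows, ]
print(exFirmaOk)
```

It is worth looking at which units the flagged rows belong to before discarding them, to confirm they really are the very large probesets.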

Cheers,
Mark






 Thanks!

 Sabrina





[aroma.affymetrix] Re: Can't read the CDF file

2009-01-21 Thread Mark Robinson

Anbarasu.

In the 'Setting up annotation files' page, it says "Aroma.affymetrix
searches for CDF files in the annotationData/ directory of the current
working directory ..."

so, create the annotationData directory within your current working  
directory, not within the directory where the package is installed to  
your machine.

For example, in my setup:

> getwd()
[1] "/Users/mrobinson/projects/microarray/exon"

> dir("/Users/mrobinson/projects/microarray/exon/annotationData/chipTypes/HuEx-1_0-st-v2/")
[1] "HuEx-1_0-st-v2,coreR3,A20071112,EP,monocell.CDF"
[2] "HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf"

(the monocell file is created later on)

The following command then works:

library(aroma.affymetrix)
cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2", tags="coreR3,A20071112,EP")
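
If it helps, the directory layout can be set up from within R. The sketch below uses a temporary directory as a stand-in for your project directory and an empty placeholder file instead of the real CDF, so the paths are illustrative only; in practice you would use getwd() and the file you downloaded:

```r
# Temporary directory standing in for the project (working) directory.
projectDir <- file.path(tempdir(), "exonProject")
chipType <- "HuEx-1_0-st-v2"
cdfName <- "HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf"

# annotationData/chipTypes/<chipType>/ below the working directory.
cdfDir <- file.path(projectDir, "annotationData", "chipTypes", chipType)
dir.create(cdfDir, recursive = TRUE, showWarnings = FALSE)

# An empty placeholder file stands in for the downloaded CDF.
downloaded <- file.path(projectDir, cdfName)
invisible(file.create(downloaded))
invisible(file.copy(downloaded, file.path(cdfDir, cdfName)))

print(list.files(projectDir, recursive = TRUE))
```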

Cheers,
Mark


On 22/01/2009, at 4:34 AM, anbarasu wrote:


 Hi,

 It's me again. I have just read 'Setting up annotation files '. I have
 placed the annotation file under: /Library/Frameworks/R.framework/
 Resources/library/aroma.affymetrix/annotationData/chipTypes/HuEx-1_0-
 st-v2

 and still getting the same error message. Any suggestions?

 sessionInfo()
 R version 2.8.1 (2008-12-22)
 i386-apple-darwin8.11.1

 locale:
 en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
  [1] aroma.affymetrix_1.0.0 aroma.apd_0.1.3 R.huge_0.1.6 affxparser_1.14.2 aroma.core_1.0.0 aroma.light_1.9.2
  [7] digest_0.3.1 matrixStats_0.1.3 R.rsp_0.3.4 R.cache_0.1.7 R.utils_1.1.3 R.oo_1.4.6
 [13] R.methodsS3_1.0.3




 On Jan 21, 4:42 pm, anbarasu anbarasu...@gmail.com wrote:
 Hi All,

 I have downloaded the CDF file 'HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf'
 and am trying to load it using

 cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")

 I am getting an error message:

 Error in list(`AffymetrixCdfFile$byChipType(chipType, tags =
 coreR3,A20071112,EP)` = environment,  :

 [2009-01-21 16:32:46] Exception: Could not create AffymetrixCdfFile
 object. No annotation chip type file with that chip type found:
 HuEx-1_0-st-v2
   at throw(Exception(...))
   at throw.default(Could not create , class(static)[1],  object.  
 No
 annotation chip type file with that chip type found: , chipType)
   at throw(Could not create , class(static)[1],  object. No
 annotation chip type file with that chip

 When I tried dir(), the file is listed. So, what's going wrong?

 dir()

 [1] "A0017022.CEL" "A0017023.CEL" "A0017024.CEL"
 [4] "A0017025.CEL" "A0017026.CEL" "A0017027.CEL"
 [7] "HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf"

 Any suggestion would be greatly appreciated.

 Thanks.
 Anbarasu
 




[aroma.affymetrix] Re: exon array with technical replicates

2009-01-20 Thread Mark Robinson

Hi Sabrina.


 Do you have biological replicates of some samples and technical
 replicates of others?  Or, just technical replicates of everything?

 My experiment has two groups, one has 5 samples (biological
 replicates, that is 5 mice from one strain), and the other has 4
 samples. Among 5 samples of the first group, there are two samples
 hybridized twice to two arrays, so I have 4 arrays for that two  
 samples
 ( That is what I meant as technical replicate). Does that make sense?

OK, now I get it.  Thanks.

 I suspect it would be difficult to justify adding a bunch of extra
 (adhoc) steps into the FIRMA pipeline.  I don't have a full
 understanding of your experiment, but what about just dealing with it
 when you operate on the FIRMA scores?  When you say average on plm,
 I assume this means an average of the chip effects for those two
 samples?

 Yes, that is what I meant. By averaging chipEffects, I actually
 average the gene signals from the two arrays.

 You could fit a PLM that estimates a single chip effect for
 those two samples and use that for calculating FIRMA scores.

 Do you mean to fit a plm for these two arrays for one sample
 separately from the other samples? If I just fit plm for these two
 arrays (say sample 1),will it estimate different probe affinity? If it
 does , then these chipEffects (gene signals to estimate FIRMA score)
 won't be compatible , am I correct? Thanks!


I'm not suggesting to fit multiple PLMs for the same gene and somehow  
combine them.  What I'm suggesting is a single PLM, but where there is  
only 1 chip effect parameter for those 2 samples.  From a design  
matrix point of view, this is conceptually straightforward.  Off the
top of my head though, I don't know how to get the 'preprocessCore'
code (this is used under the hood in aroma.affymetrix to do the  
fitting) to fit such a model.  It may have to be a one-off.
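
To make the design-matrix idea concrete, here is a toy sketch of just the chip-effect part of such a model, in which two arrays from the same sample share one column. The array layout and sample labels are invented, the probe-effect terms are left out, and this does not show how to fit it robustly with 'preprocessCore':

```r
# Six arrays from five samples: arrays 1 and 2 are the same sample
# (hypothetical layout).
arrays <- paste0("array", 1:6)
sampleOf <- factor(c("s1", "s1", "s2", "s3", "s4", "s5"))

# One indicator column per sample, so the two arrays of s1 share a
# single chip-effect parameter.
X <- model.matrix(~ 0 + sampleOf)
rownames(X) <- arrays
colnames(X) <- levels(sampleOf)
print(X)
```

With one array per sample this matrix would be the identity, which is the usual PLM setup.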

Another alternative is to average the FIRMA scores (subsequent to the  
standard PLM fitting) for these 2 samples and do a 4 versus 4  
comparison to look for changes in FIRMA scores.
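
A sketch of that averaging on a made-up score matrix follows. The sample labels and values are invented, and it assumes the scores are already on the scale you want to average on (e.g. log2):

```r
# Toy matrix of scores: rows are exons, columns are arrays (values made up).
scores <- matrix(c( 0.5, -0.2,  1.0,
                    0.7,  0.0,  1.2,
                    0.1,  0.3, -0.4,
                   -0.6,  0.2,  0.9,
                    0.0, -0.1,  0.4),
                 nrow = 3,
                 dimnames = list(paste0("exon", 1:3), paste0("array", 1:5)))

# Which sample each array came from: arrays 1 and 2 are technical replicates.
sampleOf <- c("s1", "s1", "s2", "s3", "s4")

# Average the columns that belong to the same sample.
avgScores <- sapply(unique(sampleOf), function(s) {
  rowMeans(scores[, sampleOf == s, drop = FALSE])
})
print(avgScores)
```

The result has one column per biological sample and can go into the group comparison as usual.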

And yet another alternative is to ignore this altogether.  It's
unlikely (maybe? feel free to disagree) that 1 technical replicate
amongst 8 biological replicates would cause you to underestimate the
variability so much as to significantly overstate the changes you
see ... but that's just a hunch.  Presumably, you'd also be validating
the major discoveries that you make.

Hope that helps.

Mark







[aroma.affymetrix] Re: exon array with technical replicates

2009-01-19 Thread Mark Robinson

Hi Sabrina.

Tricky question.

Do you have biological replicates of some samples and technical  
replicates of others?  Or, just technical replicates of everything?

I suspect it would be difficult to justify adding a bunch of extra
(ad hoc) steps into the FIRMA pipeline.  I don't have a full
understanding of your experiment, but what about just dealing with it  
when you operate on the FIRMA scores?  When you say average on plm,  
I assume this means an average of the chip effects for those two  
samples?  You could fit a PLM that estimates a single chip effect for  
those two samples and use that for calculating FIRMA scores.

Hope that helps.

Cheers,
Mark

On 20/01/2009, at 9:09 AM, sabrina wrote:


 Hi, all:
 I am working on Affy Mouse Exon Array . Because of the experiment
 design and quality of the hybridization, we have two arrays hybridized
 from one mouse (same biological sample). I assume that I should treat
 these two arrays as technical duplicates. If that is the case, I could
 do background correction, normalization and summary separately for
 these two arrays,(RMA, and  ExonRmaPlm), then before I use FIRMA to
 get firma scores, can I just do average on plm of these two arrays?
 But then what do I feed in FIRMAModel? ( The default one is just plm
 results  directly from ExonRmaPlm) .Or any suggestions about how to
 deal with this situation? My goal is to find novel splicing events,
 but right now I am just using core annotation to try it out. Thanks!

 Sabrina
 




Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2009-01-12 Thread Mark Robinson

Hi Andy.

I don't think you've gotten a response on this.  Sorry for the delay  
-- holidays.  Some comments below.


On 31/12/2008, at 1:18 AM, Andy_Paparountas wrote:


 Hi all ,

 I find this conversation very interesting. I am trying to
 analyze a set of 3 treatment and 3 control samples on the MoGeneSt10
 array. Thus far, with the code pwhite shared, I was able to do RMA
 background correction and quantile normalization, and got QC, RLE,
 NUSE and density plots.

 Q1. Is there any code to get similar results to affyQCreport? Or even,
 how can we use affyQCreport to get QC from these arrays?

As far as I know, affyQCreport has not been ported to  
aroma.affymetrix.  I usually make do with RLE, NUSE and density plots  
for my QC.  If there is something specific in affyQCreport that you  
like, it may be easy to port over.  Maybe you'd consider doing the  
implementation.



 Q2. I tried to export my data to an AffyBatch object in order to play
 around with older methods
 ab <- extractAffyBatch(cs)

 but I got a Warning message:
 CDF enviroment package 'mogene10stv1cdf' not installed. The 'affy'
 package will later try to download from Bioconductor and install it.

 Of course, 'mogene10stv1cdf' does not exist as far as I know;
 instead we should use mogene10st.db.

 But what should the exact code be to connect the normalized data to
 the annotation contained inside mogene10st.db ?

A couple points here.  First, it looks like Bioconductor is not  
currently supporting the 'affy' way of doing things for these new (1.0  
ST) chips.  If you skim the BioC mailing list archives, the suggestion  
is to use the 'oligo' package or 'xps'.  But, then you are outside the  
world of AffyBatch objects.  So, it doesn't make sense to use  
aroma.affymetrix's 'extractAffyBatch' for these chips.

Second, I believe 'mogene10st.db' only really maps the Gene 1.0 ST  
identifiers to GO attributes, UNIGENE ids, chromosome locations and a  
whole host of other things.  I don't think the physical probe  
locations are present within 'mogene10st.db', so it is not a  
replacement for the CDF file/environment.

Hope that helps.

Mark



 I would really appreciate some help here :)

 Thanks all.


 On 5 Dec, 17:43, pwhite...@gmail.com wrote:
 Hi Mark,

 Thanks for adding flavor="oligo" to RmaPlm. I verified it with the new
 release and the HGU133Plus2 data I have and it all looks good. Pairs plots
 are attached.

 Thanks,

 Peter

 On Thu, Dec 4, 2008 at 5:41 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

 Thanks Peter.

 Perhaps you can repeat this comparison after the next release (this
 will be very soon!) and split the aroma.affymetrix comparison into:

 - aroma.affy.oligo -- with RmaPlm(csN, flavor="oligo")
 - aroma.affy.affyPLM -- with flavor="affyPLM" (as you've done already)

 Perhaps the best way to look at all of this at once is with a single
 pairs() plot.

 Cheers,
 Mark

 On 05/12/2008, at 9:01 AM, pwhite...@gmail.com wrote:

 Dear Mark and Henrik,

 I wanted to confirm that your summary was correct regarding the
 different flavors for probeset summarization. I downloaded the MAQC
 HG_U133_Plus_2 array data from the MAQC website:

 http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip

 I then ran the analysis of the arrays from site 1, using just the A
 and B samples, with aroma.affymetrix, affy, affyPLM and oligo (see
 below for the complete code I used to do this). Basically the
 aroma.affymetrix and affyPLM data were essentially identical. The
 affy and oligo data were also essentially identical. As observed with
 the Gene ST array data, there were significant differences between
 aroma.affymetrix and affy or oligo. Plots are attached.

 The Gene ST arrays do not have any MM probes - as we are using RMA
 rather than GCRMA this should not have affected anything.

 Thanks,

 Peter

 #OLIGO ANALYSIS

 library(pd.hg.u133.plus.2)
 library(pdInfoBuilder)
 fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2", "CEL", full=T)[1:10]
 raw.oligo <- read.celfiles(filenames=fn, pkgname="pd.hg.u133.plus.2")
 eset.oligo <- rma(raw.oligo)
 data.oligo <- exprs(eset.oligo)

 #AFFY ANALYSIS

 library(affy)
 fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2", "CEL", full=T)[1:10]
 raw.affy <- ReadAffy(filenames=fn)
 eset.affy <- rma(raw.affy)
 data.affy <- exprs(eset.affy)

 #AFFY PLM ANALYSIS

 library(affyPLM)
 fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2", "CEL", full=T)[1:10]
 raw.affyPLM <- ReadAffy(filenames=fn)
 fit.affyPLM <- fitPLM(raw.affyPLM, verbos=9)
 data.affyPLM <- coefs(fit.affyPLM)

 #Analysis of MAQC on Human U133 Plus 2

 setwd("G:\\BGC_EXPERIMENTS\\MAQC_Analysis")
 library(aroma.affymetrix)
 prefixName <- "MAQC_Data"
 chip1 <- "HG-U133_Plus_2"
 cdf <- AffymetrixCdfFile$fromChipType("HG-U133_Plus_2")
 cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf, chipType=chip1)
 pattern <- "AFX_1_[AB]"
 idxs <- grep(pattern, getNames(cs))
 cs <- extract(cs, idxs)
 bc <- RmaBackgroundCorrection(cs)
 csBC <- process(bc)
 qn

[aroma.affymetrix] Re: SNP 6.0 processing

2008-12-16 Thread Mark Robinson

Hi Anguraj.

You'll need to give more information for us to be able to help you.

First of all, give us the output of 'sessionInfo()' (are you using the  
latest version?) and perhaps what you can do is try and repeat your  
sequence of commands from a *new* R session, call  
'library(aroma.affymetrix)' and then all of the commands leading up to  
this error.

Start with that.

Cheers,
Mark

On 17/12/2008, at 4:50 AM, angu wrote:


 Hi,
 I am trying to process SNP6.0 CEL files to get CN data. I am getting
 the following error. Could anybody help me to figure it out?

 plm <- AvgCnPlm(csC, mergeStrands=TRUE, combineAlleles=TRUE, shift=+300)
 fit(plm, verbose=verbose)
 Error in UseMethod("getCdf") : no applicable method for "getCdf"
 Calls: fit -> fit.ProbeLevelModel -> getCdf
 Execution halted

 I haven't pasted all of my code. I performed this as per the
 instructions provided in pages.

 Thanks,

 Anguraj


 









[aroma.affymetrix] Re: Batch Effects

2008-12-15 Thread Mark Robinson

Hi Sarah.

There is currently nothing readily available for batch effect  
adjustment in aroma.affymetrix (that I know of).  But, there are ways  
of doing it that wouldn't be too difficult to do as a once-off.  What  
did you have in mind?

... however, even with a batch adjustment, it may still be difficult  
to get reliable results on such an experiment.

Cheers,
Mark



On 15/12/2008, at 10:57 PM, srgrey...@gmail.com wrote:


 Hi All,
 I have six Rat Exon arrays that were done about two years apart.  The
 arrays done at each timepoint normalize very well to each other, but I
 cannot get decent normalization between batches.  Is there a way to
 adjust for batch effects in aroma.affy that can be incorporated into
 an exon array analysis?

 Thank you,
 Sarah

 









[aroma.affymetrix] Re: FIRMA score

2008-12-10 Thread Mark Robinson

 I assume that you referred me to the probeset annotation csv file ,
 not the transcript annotation csv file. :)

 I did get the exon coordinates! it seemed to me that the probeset_id
 in the probeset annotation file is equivalent to groupName of
 exFirma.

Yes!  The transcript CSV file may be useful for other things, but  
indeed not for exon coordinates.

In aroma.affymetrix terminology for exon arrays,  
unit=transcript_cluster_id, group=probeset_id and cell=probe.


 One question I have is in the exon array plot from GenomeGraphs, the
 unrData is the probe level data, in other words, in my case, I should
 use the normalized data, csN, is that correct?

If you look closely at the Purdom paper, there are a few things you  
may want to plot in the context of Ensembl annotation.  The normalized  
data is the obvious one, which you have linked to an object 'csN'.

In addition, you may want to plot the residuals:

rs <- calculateResiduals(plmTr)  # plmTr is the fitted ExonRmaPlm with mergeGroups=TRUE
d <- extractMatrix(rs, cells=...)

... or for example you may want to plot the raw data, adjusted by the  
probe effects.  For example, RMA fits the model

Y_{ij} = a_i + b_j

(Y_{ij} = normalized data, a_i = chip effects, b_j = probe effects),  
you may be interested in plotting:

Y_{ij} - \hat{b}_j

... i.e. raw data, adjusted by the estimated probe effects.
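
To make the adjustment above concrete, here is a minimal sketch in base R on made-up numbers. It uses stats::medpolish as a stand-in for the RMA fit (an assumption; aroma.affymetrix has its own fitting code), with probes in rows and arrays in columns. The matrix Y and all object names are hypothetical.

```r
# Toy log2 intensities: 4 probes (rows, j) x 3 arrays (columns, i).
# These numbers are invented for illustration only.
Y <- matrix(c(8.1, 8.4, 9.0,
              6.2, 6.6, 7.1,
              9.5, 9.7, 10.4,
              7.0, 7.5, 7.9),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

# Median polish decomposes Y into overall + row + col + residuals.
# Here the row effects play the role of the probe effects b_j, and
# overall + col plays the role of the chip effects a_i.
fit <- medpolish(Y, trace.iter = FALSE)
bHat <- fit$row

# Y_{ij} - \hat{b}_j: subtract each probe's estimated effect from its row.
Yadj <- sweep(Y, 1, bHat, "-")
```

What is left in Yadj is the chip effect plus the residual for each probe, which is what you would then plot against the annotation.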

 another question is graph related though. Since I have two groups
 among my 9 samples, when I plot exon array, is there any way to use
 different color coding for these samples on the same graph? Thanks!

Short answer:

library(GenomeGraphs)
?"ExonArray-class"

Long answer:  create your ExonArray object and use DisplayPars slot to  
specify colour and line width:

ea1 <- new("ExonArray", intensity = d, probeStart = ..., probeEnd = ...,
           probeId = ..., nProbes = ..., dp = DisplayPars(color = col, lwd = lwd,
           mapColor = "dodgerblue2", plotMap = TRUE))

where 'col' and 'lwd' are either length 1 (in which case all lines get  
the same width and colour) or vectors of the length of the number of  
samples ...
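
One way to build per-sample 'col' and 'lwd' vectors from a grouping variable is sketched below in base R. The group labels, group sizes and colour names are assumptions for illustration, not taken from the actual data set.

```r
# Hypothetical design: 9 samples in two groups (group sizes are assumed).
group <- factor(c("A", "A", "A", "A", "B", "B", "B", "B", "B"))

# One colour and one line width per sample, looked up by group.
groupCols <- c(A = "firebrick", B = "steelblue")
col <- unname(groupCols[as.character(group)])
lwd <- ifelse(group == "A", 1, 2)
```

The order of 'group' has to match the column order of the intensity matrix passed to the ExonArray object.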

Thanks for all these questions Sabrina.  This will help make a nice  
vignette ... when I get time :)

Unless you'd be interested in summarizing your approach! :)

Cheers,
Mark


 Sabrina

 On Dec 9, 5:02 pm, Mark Robinson [EMAIL PROTECTED] wrote:
 Hi Sabrina.

 Great work!  See below.

 On 10/12/2008, at 5:27 AM, sabrina wrote:





 Hi, Mark!
 Thank you so much! I think I pretty much figured out how to get the
 gene level and exon level expressions and get the comparison done. I checked
 GenomeGraphs as you suggested. I tried the code in the exon array
 section, and it worked fine. So here is the question related to my
 data set.

 In order to plot exon data, I used the following code to get exon
 summary (intensities) which is pbsetSummLog2:

 plmNoMerge <- ExonRmaPlm(csN, mergeGroups=FALSE)
 fit(plmNoMerge)
 readUnits(plmNoMerge, units=1)

 chpNoMerge <- getChipEffectSet(plmNoMerge)

 pbsetSumm <- extractMatrix(chpNoMerge, returnUgcMap=TRUE)
 pbsetSummLog2 <- log2(pbsetSumm);
 pbsetNames <- readCdfGroupNames(getPathname(getCdf(chpNoMerge)),
   unit=unique(attr(pbsetSumm, "unitGroupCellMap")[,"unit"]))
 rownames(pbsetSumm) <- unlist(pbsetNames)
 rownames(pbsetSummLog2) <- unlist(pbsetNames)

 Because each significant exon that was detected by FIRMA is associated
 with one gene, and that gene has several exons, I used the
 following to find all exons associated with that gene
 (x is the result from topTable):

 temp <- grep(exFirma[x$ID,1][1], exFirma[,1]);
 gp_temp <- exFirma[temp,2]

 gene1 <- pbsetSummLog2[temp,]

 which gives me:
          array1    array2   ...
 4308385  6.338784  6.896304
 4376965  1.973171  2.272406

 I know that 4308385 is the groupName (aka exon id), but I am not sure
 where to find the start and end positions of these individual exons.
 Can you show me how to do it?

 The reason I asked this is because in makeExonArray, I need the
 probeStart and probeEnd positions. Thanks!!!

 Well, you can get the probeset start and end positions from the
 'NetAffx Annotation Files' from Affymetrix.  For human, this would be
 at:

 http://www.affymetrix.com/products_services/arrays/specific/ 
 exon.affx...

 I think you said you were using mouse, so you can find this at:

 http://www.affymetrix.com/products_services/arrays/specific/ 
 mouse_exo...

 Find the 'Current NetAffx Annotation Files' section and download the
 csv.zip file.  Just to have a quick peek at a few columns of this
 file, if I run a unix tool called awk on the CSV file, I get:

 
 awk '{FS=","; print $7,$1,$2,$3,$4,$5,$16}' HuEx-1_0-st-v2.na25.hg18.probeset.csv | grep -v "^#" | more
 

 transcript_cluster_id probeset_id seqname strand start stop level
 ...
 2315373 2315374 chr1 + 742655 742719 core
 2315373 2315375 chr1 + 742869 743231 core
 2315373 2315376 chr1 + 743293 743434 core
 2315373 2315377 chr1 + 744094 744979 core
 2315380
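
The same lookup can be done from within R. The sketch below is only an illustration: it writes a tiny stand-in file containing just the columns printed above (the real probeset CSV has many more), and it assumes that the header comment lines start with '#'. The file and object names are hypothetical.

```r
# Write a tiny stand-in for the probeset CSV so the sketch is runnable.
# Only the columns shown above are included; values are copied from the
# awk output.
csvFile <- tempfile(fileext = ".csv")
writeLines(c(
  "#%stand-in for the comment header of the annotation file",
  "probeset_id,seqname,strand,start,stop,transcript_cluster_id,level",
  "2315374,chr1,+,742655,742719,2315373,core",
  "2315375,chr1,+,742869,743231,2315373,core"
), csvFile)

# Skip the comment header and read the table.
ann <- read.csv(csvFile, comment.char = "#", stringsAsFactors = FALSE)

# Look up start/stop for a set of probeset ids (group names), keeping
# the order of 'ids'.
ids <- c("2315375", "2315374")
idx <- match(ids, as.character(ann$probeset_id))
pos <- ann[idx, c("probeset_id", "start", "stop")]
```

The 'start' and 'stop' columns of 'pos' are then candidates for the probeStart and probeEnd arguments, provided the ids are in the same order as the rows of the intensity matrix.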

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Mark Robinson
 the mailing list?


 Easy! Now to get the same data using the Affy packages:


 BIOCONDUCTOR AFFY

 You first need to create or download your mogene10stv1cdf library from
 the Affy unsupported CDF file
 (https://stat.ethz.ch/pipermail/bioc-devel/2007-October/001403.html
 has some detail on how to do this).
 However, as Mark Robinson pointed out, there are potential issues with
 using the Affy unsupported CDF files. See the following for some
 details:

 https://stat.ethz.ch/pipermail/bioconductor/2007-November/020188.html

 library(affy)
 AffyRaw <- ReadAffy()
 AffyEset <- rma(AffyRaw)
 data.affy <- exprs(AffyEset)


 BIOCONDUCTOR OLIGO

 Download all the required Affy annotation files to your Mouse Gene v1
 ST array directory:

 http://www.affymetrix.com/support/technical/byproduct.affx?product=mogene-1_0-st-v1

 setwd("P:\\ANNOTATION\\AffyAnnotation\\Mouse\\MoGene-1_0-st-v1")
 library(pdInfoBuilder)
 pgfFile <- "MoGene-1_0-st-v1.r3.pgf"
 clfFile <- "MoGene-1_0-st-v1.r3.clf"
 transFile <- "MoGene-1_0-st-v1.na26.mm9.transcript.csv"
 probeFile <- "MoGene-1_0-st-v1.probe.tab"
 pkg <- new("AffyGenePDInfoPkgSeed", author="Peter White",
   email="[EMAIL PROTECTED]", version="0.1.3",
   genomebuild="UCSC mm9,  July 2007", chipName="MoGene10stv1",
   manufacturer="affymetrix", biocViews="AnnotationData",
   pgfFile=pgfFile, clfFile=clfFile, transFile=transFile,
   probeFile=probeFile)
 makePdInfoPackage(pkg, destDir=".")

 #This takes a little while to make the Package. Once created you will
 need to install the package from the Windows DOS prompt (navigate to
 the annotation directory with the newly created pd package to be
 installed):

 R CMD INSTALL pd.mogene.1.0.st.v1\

 Note that for this to work you need Rtools and your Path variable set up
 correctly, as described at:

 http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset

 Now return to R and set the working directory to your CEL file directory:

 library(pd.mogene.1.0.st.v1)
 library(oligo)
 OligoRaw <- read.celfiles(filenames=list.celfiles())
 OligoEset <- rma(OligoRaw)
 data.oligo <- exprs(OligoEset)


 COMPARING THE TWO DATASETS

 Here is what I did to compare the data generated by affy, oligo and
 aroma.affymetrix:

 dim(data.aroma)
 [1] 35512    16
 dim(data.affy)
 [1] 35512    16
 length(grep(TRUE, rownames(data.affy)==rownames(data.aroma)))
 [1] 35512

 FYI, sum(rownames(data.affy)==rownames(data.aroma)) gives you the
 same.  Replacing sum() with summary() will also work.


 The output from both the affy rma and aroma.affymetrix methods retains
 the same order of probes and CEL files, so the two files can be
 compared directly.

 That is probably because they work off the same CDF, but you should
 never rely on this/assume that this is always the case.  If you do,
 you should at least verify that the unit names (and group names)
 match.
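
A minimal sketch of such a check in base R, on toy matrices (the object names 'a' and 'b' are made up and stand in for two expression matrices with unit names as row names):

```r
# Two toy expression matrices with the same units in a different order.
a <- matrix(1:6, nrow = 3, dimnames = list(c("u1", "u2", "u3"), c("s1", "s2")))
b <- a[c("u3", "u1", "u2"), ]

# Never assume the row order agrees: check, and reorder if needed.
if (!identical(rownames(a), rownames(b))) {
  o <- match(rownames(a), rownames(b))
  stopifnot(!anyNA(o))   # every unit in 'a' must exist in 'b'
  b <- b[o, , drop = FALSE]
}
stopifnot(identical(rownames(a), rownames(b)))
```

Using identical() on the full vector of names fails loudly on any mismatch, which is safer than counting the TRUE values by eye.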

 However,

 dim(data.oligo)
 [1] 35557    16

 The normalized data file from the oligo package includes an additional
 45 transcript IDs (there's no annotation on what these are, but they
 contain anywhere from 9 to 489 probes per probeset).

 For the record, would you mind posting the names of these additional
 45 units here?  (I'm sure someone else will search the web later and
 find this thread very helpful).

 Fixed this problem as follows:

 o <- match(rownames(data.aroma), rownames(data.oligo))
 data.oligo <- data.oligo[o,]

 length(grep(TRUE, rownames(data.affy)==rownames(data.oligo)))
 [1] 35512
 length(grep(TRUE, rownames(data.aroma)==rownames(data.oligo)))
 [1] 35512

 Finally, there was one more issue with the aroma data. All elements in
 the 18th row of the dataset were flagged NA. The transcript ID for
 this probeset was 10338063. Looking at the Affy annotation, this
 appears to be a control probeset with 6,515 probes. Could it have been
 flagged NA by aroma.affymetrix because of this (it was OK with the
 oligo and affy rma analyses)?

 Nicely spotted.  Voila'.  From aroma.affymetrix's NEWS file:

 Version: 0.9.0 [2008-02-29]
 o TIME OPTIMIZATION: Now RmaPlm and ExonRmaPlm turn to median polish
  if there are more than 500 cells *and* 6 arrays in the unit group.
  Option: aroma.affymetrix.settings$models$RmaPlm$medianPolishThreshold.
  Moreover, if the unit group is ridiculously large (>5000 cells), the
  unit group is skipped and all returned estimates are NAs.
  Option: aroma.affymetrix.settings$models$RmaPlm$skipThreshold.
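
To illustrate how these two thresholds interact, here is a rough sketch of the decision in base R. The function chooseFit() is hypothetical and is not part of aroma.affymetrix; the default numbers come from the NEWS entry above, and the way the cell and array counts are combined is an assumption.

```r
# Hypothetical helper mirroring the NEWS entry; not aroma.affymetrix code.
# Threshold defaults are taken from the text above; how the package
# combines the cell and array counts internally is an assumption here.
chooseFit <- function(nCells, nArrays,
                      medianPolishThreshold = c(cells = 500, arrays = 6),
                      skipThreshold = 5000) {
  if (nCells > skipThreshold) {
    "skip"            # unit group skipped, estimates returned as NA
  } else if (nCells > medianPolishThreshold[["cells"]] &&
             nArrays > medianPolishThreshold[["arrays"]]) {
    "medianPolish"    # faster fallback for large unit groups
  } else {
    "defaultFit"      # placeholder label for the regular model fit
  }
}

# The control probeset discussed above: 6,515 probes on 16 arrays.
chooseFit(6515, 16)
```

With 6,515 probes the unit group is above the skip threshold, which is consistent with the row of NAs reported above.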


 e <- (data.aroma - data.affy)
 mean(as.vector(e^2), na.rm=T)
 [1] 0.1253547
 sd(as.vector(e^2), na.rm=T)
 [1] 0.2717275

 e <- (data.aroma - data.oligo)
 mean(as.vector(e^2), na.rm=T)
 [1] 0.1239203
 sd(as.vector(e^2), na.rm=T)
 [1] 0.2653593

 As you can see the data does not pass your mean and sd cutoffs of
 0.0001.

 e <- (data.affy - data.oligo)
 mean(as.vector(e^2), na.rm=T)
 [1] 0.001484371
 sd(as.vector(e^2), na.rm=T)
 [1] 0.002523521
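
The three comparisons above repeat the same two lines, so they could be wrapped in a small helper. This is only a sketch in base R; compareSq() and the toy matrices are made up for illustration.

```r
# Mean and sd of squared differences between two aligned matrices.
# NAs (e.g. skipped unit groups) are ignored.
compareSq <- function(x, y) {
  stopifnot(identical(dim(x), dim(y)))
  e2 <- as.vector((x - y)^2)
  c(mean = mean(e2, na.rm = TRUE), sd = sd(e2, na.rm = TRUE))
}

# Toy data; one row is NA in 'x', as with the skipped control probeset.
x <- matrix(c(1, 2, NA, 4, 5, NA), nrow = 3)
y <- matrix(c(1, 2, 3, 4, 7, 6), nrow = 3)
res <- compareSq(x, y)
```

On the real data this would be called as compareSq(data.aroma, data.affy) and so on, after the rows have been matched by name.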

 The difference between the affy and oligo analysis is much less
 striking. To visualize these differences I did the following plot, as
 an example I am just showing the data from
