[aroma.affymetrix] Discrepancy between .cdf and .cel files versions. GCRMA with custom probe file and negative controls

Marcin Jakub Kamiński Mon, 25 Aug 2014 10:51:30 -0700

Hello,
I'm analyzing data from *Affymetrix HuGene-1_0-st-v1* chips using both the RMA 
and GCRMA methods.
For this purpose I'm using the binary CDF file (HuGene-1_0-st-v1,r3.cdf.gz) 
listed at http://aroma-project.org/chipTypes/HuGene-1_0-st-v1. (Please 
note that the link on that page is wrong and points to the Ensembl version. 
The file I'm using was downloaded directly from the directory 
http://aroma-project.org/data/annotationData/chipTypes/HuGene-1_0-st-v1/ .)


The background adjustment (both RMA and GCRMA) completes successfully, but 
during the process I get the following messages for each file:

> Cannot create CEL file of version 4 
> (probeData/partProjectGene,rma/HuGene-1_0-st-v1/1Z-1_(HuGene-1_0-st-v1).CEL.tmp).
>  
> Template CEL file is of version 1: 
> rawData/partProjectGene/HuGene-1_0-st-v1/1Z-1_(HuGene-1_0-st-v1).CEL

I guess it's because the versions of the CEL files and the .cdf differ, but 
I have no idea whether this affects the analysis output. Should I be worried?
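
For context, the number in that message appears to be the CEL file-format version recorded in each file's header, not anything stored in the CDF. A minimal sketch for inspecting it, assuming the affxparser package (listed in the session info below) and its readCelHeader() function; celFormatVersion is a hypothetical helper name, and the path is the one from the message above:

```r
# Hypothetical helper: return the file-format version stored in a CEL header,
# or NA if affxparser is not installed or the file does not exist.
celFormatVersion <- function(pathname) {
  if (!requireNamespace("affxparser", quietly = TRUE) || !file.exists(pathname)) {
    return(NA_integer_)
  }
  as.integer(affxparser::readCelHeader(pathname)$version)
}

# Path taken from the message quoted above.
pathname <- "rawData/partProjectGene/HuGene-1_0-st-v1/1Z-1_(HuGene-1_0-st-v1).CEL"
print(celFormatVersion(pathname))
```

Running this on one raw file and on the corresponding file under probeData/ would show whether only the container format differs between input and output.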

Since I'm quite new to microarray normalization and aroma, the mechanism of 
GCRMA normalization is partially unclear to me. 
From what I've learned reading other topics/lists, because the HuGene array 
is PM-only, I should use the 'affinities' model and point to control probes.
The probe.tab file for the HuGene chip is quite different from the probe.tab 
for other chips (e.g. HG-U133_Plus_2):
> head(HuGene-1_0-st-v1, 2)
  Probe.ID Transcript.Cluster.ID probe.x probe.y          assembly seqname start  stop strand            probe.sequence target.strandedness category
1   438514               7896736     663     417 build-GRCh37/hg19    chr1 54904 54928      + AATGGCTTGTCCCTGTATTCTCAGC               Sense     main
2   685482               7896736     881     652 build-GRCh37/hg19    chr1 54906 54930      + GCAATGGCTTGTCCCTGTATTCTCA               Sense     main
> head(HG-U133_Plus_2, 2)
  Probe.Set.Name Probe.X Probe.Y Probe.Interrogation.Position            Probe.Sequence Target.Strandedness
1      1007_s_at     718     317                         3330 CACCCAGCTGGTCCTGTGGATGGGA           Antisense
2      1007_s_at    1105     483                         3443 GCCCCACTGGACAACACTGATTCCT           Antisense

Because of that, I modified the probe.tab file following this guide: 
http://compbio.sysbiol.cam.ac.uk/Resources/GeneST/ and it now looks as 
shown further below. 
The former *"Probe.ID"* column is now used as 
*"Probe.Interrogation.Position"*. This of course no longer makes sense, but 
can it affect the GCRMA background adjustment? Is this column even used by 
the script?
  Probe Set Name Probe X Probe Y Probe Interrogation Position            Probe Sequence Target Strandedness
1        7896736     663     417                       438514 AATGGCTTGTCCCTGTATTCTCAGC               Sense
2        7896736     881     652                       685482 GCAATGGCTTGTCCCTGTATTCTCA               Sense
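
The column mapping described above can be sketched in base R as follows. The data frame holds only the two rows shown earlier; probeTab and classic are hypothetical names, and the tab-separated output layout is an assumption based on the classic probe-tab header, not a confirmed requirement of gcrma or aroma.affymetrix:

```r
# The two HuGene rows shown earlier (only the columns needed for the mapping).
probeTab <- data.frame(
  Probe.ID              = c(438514L, 685482L),
  Transcript.Cluster.ID = c(7896736L, 7896736L),
  probe.x               = c(663L, 881L),
  probe.y               = c(417L, 652L),
  probe.sequence        = c("AATGGCTTGTCCCTGTATTCTCAGC",
                            "GCAATGGCTTGTCCCTGTATTCTCA"),
  target.strandedness   = c("Sense", "Sense"),
  stringsAsFactors      = FALSE
)

# Re-label the columns to the classic probe-tab header.
# Note: Probe.ID is only a stand-in for the interrogation position.
classic <- data.frame(
  "Probe Set Name"               = probeTab$Transcript.Cluster.ID,
  "Probe X"                      = probeTab$probe.x,
  "Probe Y"                      = probeTab$probe.y,
  "Probe Interrogation Position" = probeTab$Probe.ID,
  "Probe Sequence"               = probeTab$probe.sequence,
  "Target Strandedness"          = probeTab$target.strandedness,
  check.names = FALSE, stringsAsFactors = FALSE
)

# Write a tab-separated file (to a temporary location in this sketch).
outFile <- tempfile(fileext = ".tab")
write.table(classic, file = outFile, sep = "\t", quote = FALSE, row.names = FALSE)
print(classic)
```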


Another concern is about the probes I'm using to compute affinities. 
> table(HuGene-1_0-st-v1$category)
            control->affx control->bgp->antigenomic         main    normgene->exon   normgene->intron  rescue->FLmRNA->unmapped
                     4649                     16943       818005              4517              10990                      6389

At the moment I'm using the antigenomic probes only, with the following 
commands:
ctrlAntiIndex <- which(`HuGene-1_0-st-v1`$category == 'control->bgp->antigenomic')

bcGcA <- GcRmaBackgroundCorrection(cs, tags=c('gcrma','affinities'),
                                   type='affinities',
                                   indicesNegativeControl=ctrlAntiIndex)
However, I'm not sure whether I should use the 'control->affx' probes as 
well (or maybe instead). If not, should I filter them out, especially since 
some of them are shorter than 25-mers and so could distort the background 
correction? 
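
Whichever categories end up being used, the length filter itself is simple. A minimal sketch on made-up toy rows (the Probe.ID values and the three control sequences are invented for illustration; only the column names and category labels come from the probe.tab shown earlier):

```r
# Toy probe table: IDs and control sequences are invented for illustration.
probeTab <- data.frame(
  Probe.ID = c(101L, 102L, 103L, 104L),
  category = c("main", "control->bgp->antigenomic",
               "control->affx", "control->affx"),
  probe.sequence = c(
    "AATGGCTTGTCCCTGTATTCTCAGC",  # 25 bases
    "ACGTACGTACGTACGTACGTACGTA",  # 25 bases
    "ACGTACGTACGTACGTACGT",       # 20 bases, should be dropped
    "TGCATGCATGCATGCATGCATGCAT"   # 25 bases
  ),
  stringsAsFactors = FALSE
)

# Candidate control categories; which ones to include is the open question.
ctrlCategories <- c("control->bgp->antigenomic", "control->affx")

isControl <- probeTab$category %in% ctrlCategories
is25mer   <- nchar(probeTab$probe.sequence) == 25L

# Row positions of control probes that are full-length 25-mers.
keep <- which(isControl & is25mer)
print(probeTab[keep, ])
```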

Also, I'm not entirely sure which index I should provide for the 
'indicesNegativeControl' parameter. Currently it's the row index of the 
entry in the probe.tab file, but I don't know whether it should instead 
match the .cdf file, since the two come from different sources.
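
One observation that may help here: for the two rows shown earlier, Probe.ID equals y * 1050 + x + 1, which is the usual Affymetrix cell-index convention (zero-based x and y) on a grid 1050 cells wide. The sketch below checks this. The width of 1050 is inferred from those two rows, not read from the CDF, and whether indicesNegativeControl expects such cell indices is an assumption to verify against the aroma.affymetrix documentation:

```r
# Coordinates and IDs of the two probes shown earlier in this message.
probeX  <- c(663L, 881L)
probeY  <- c(417L, 652L)
probeId <- c(438514L, 685482L)

# Assumed grid width (inferred from the two rows, not read from the CDF).
nbrOfColumns <- 1050L

# Cell index from zero-based (x, y) coordinates: y * ncol + x + 1.
cellIndex <- probeY * nbrOfColumns + probeX + 1L

print(cellIndex)
print(identical(cellIndex, probeId))
```

If the same relation holds for the whole table, the Probe.ID column would already contain cell indices, which are generally different from row positions in probe.tab.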

I'd be happy if you could help me with any of those issues. 

Best regards,
Marcin Kaminski


> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250  LC_MONETARY=Polish_Poland.1250  LC_NUMERIC=C  LC_TIME=Polish_Poland.1250

attached base packages:
[1] grid  parallel  stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
 [1] VennDiagram_1.6.7  dendextend_0.17.1  bioDist_1.36.0  KernSmooth_2.23-12  RColorBrewer_1.0-5
 [6] limma_3.20.8  simpleaffy_2.40.0  genefilter_1.46.1  preprocessCore_1.26.1  aroma.light_2.0.0
[11] matrixStats_0.10.1  aroma.affymetrix_2.12.4  aroma.core_2.12.4  R.devices_2.9.2  R.filesets_2.5.9
[16] R.utils_1.32.6  R.oo_1.18.2  affyPLM_1.40.1  R.methodsS3_1.6.2  affxparser_1.36.0
[21] hugene10stv1gcrmacdf_1.40.0  AnnotationForge_1.6.1  org.Hs.eg.db_2.14.0  RSQLite_0.11.4  DBI_0.2-7
[26] gcrma_2.36.0  affy_1.42.3  AnnotationDbi_1.26.0  GenomeInfoDb_1.0.2  Biobase_2.24.0
[31] BiocGenerics_0.10.0  makecdfenv_1.40.0  affyio_1.32.0  BiocInstaller_1.14.2  rj_2.0.2-1

loaded via a namespace (and not attached):
 [1] annotate_1.42.1  aroma.apd_0.5.0  base64enc_0.1-2  Biostrings_2.32.1  digest_0.6.4  DNAcopy_1.38.1  IRanges_1.22.10  magrittr_1.0.1
 [9] PSCBS_0.43.0  R.cache_0.10.0  R.huge_0.8.0  R.rsp_0.19.3  rj.gd_2.0.0-1  splines_3.1.1  stats4_3.1.1  survival_2.37-7
[17] tools_3.1.1  whisker_0.3-2  XML_3.98-1.1  xtable_1.7-4  XVector_0.4.0  zlibbioc_1.10.0
