[aroma.affymetrix] Re: X chromosome CNV analysis

Henrik Bengtsson Thu, 09 Oct 2008 14:15:06 -0700

Hi.

On Tue, Oct 7, 2008 at 6:33 AM, marco <[EMAIL PROTECTED]> wrote:
>
> Dear Henrik,
>
>  It seems that the file is found, but I still have the same problem. I
> wonder if the format might be confusing for the software?


Yes, it looks like it finds the annotationData/samples/ploidy.saf
file.  As you say, it might be that the format of your file is
incorrect.  See what you get if you do:

# next rel: byName()
sas <- SampleAnnotationSet$fromPath("annotationData/samples");
saf <- getFile(sas, indexOf(sas, "ploidy"));
data <- readDataFrame(saf);
print(data);

You should get a data frame where each row is a sample and the columns
are all possible attributes.  For instance, using the HapMap270.saf
file I referred to in an earlier message you get:

# Warning to people reading this (now and in the future):
# This is still part of the internal API and hence not documented.
sas <- SampleAnnotationSet$fromPath("annotationData/samples");
saf <- getFile(sas, indexOf(sas, "HapMap270"));
data <- readDataFrame(saf);

     name      familyID individualID fatherID motherID gender   population tags
[1,] "NA12003" "1420"   "9"          "NA"     "NA"     "male"   "CEU"      "XY"
[2,] "NA12004" "1420"   "10"         "NA"     "NA"     "female" "CEU"      "XX"
[3,] "NA10838" "1420"   "1"          "9"      "10"     "male"   "CEU"      "XY"
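
If you want to sanity-check the layout of a file before handing it to the
package, here is a minimal sketch in base R.  It assumes the plain
"key:value" layout with one "name:" line per entry, as in the file quoted
further down in this thread.  parseSimpleSaf() is a hypothetical helper, not
part of aroma.affymetrix, and the package's own SAF reader may accept or
require more than this.

```r
# Hypothetical helper (NOT part of aroma.affymetrix): parse "key:value"
# lines into one row per entry, where every "name:" line starts a new entry.
parseSimpleSaf <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]               # ignore empty lines
  bad <- !grepl("^[^:]+:", lines)
  if (any(bad)) {
    stop("Not in 'key:value' form: ", paste(lines[bad], collapse = "; "))
  }
  keys <- trimws(sub(":.*$", "", lines))      # text before the first colon
  values <- trimws(sub("^[^:]+:", "", lines)) # text after the first colon
  if (length(keys) == 0L || keys[1] != "name") {
    stop("Expected the first entry to start with 'name:'")
  }
  recordId <- cumsum(keys == "name")          # entry index for each line
  fields <- unique(keys)
  out <- matrix(NA_character_, nrow = max(recordId), ncol = length(fields),
                dimnames = list(NULL, fields))
  out[cbind(recordId, match(keys, fields))] <- values
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Made-up example content; the last entry deliberately lacks a 'tags' line.
saf <- c("name:MD10", "tags:XY", "", "name:MD11", "tags:XX", "name:MD12")
data <- parseSimpleSaf(saf)
print(data)

# Entries for which no tags were given
missingTags <- data$name[is.na(data$tags)]
print(missingTags)
```

Entries listed in missingTags would be the first place to look if an array
ends up without the expected XX/XY tag.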

> Actually the important thing is that the arrays are processed with the
> right number of X/Y chromosomes.
> Is there any other way to check it?

No, not other than by checking the attributes (and the highly detailed
verbose output).

Cheers

/Henrik

>
> Cheers
> Marco
>
>> cs       <- AffymetrixCelSet$fromName("ESC_IBD", cdf=cdf,verbose=-20)
> Defining AffymetrixCelSet from files...
>  Defining an AffymetrixCelSet object from files...
>  Path: rawData/ESC_IBD/GenomeWideSNP_6
>  Pattern: [.](c|C)(e|E)(l|L)$
>  File class: AffymetrixCelFile
>  Scanning directory for files...
>   Found 26 files.
>  Scanning directory for files...done
>  Defining 26 files...
> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
> 21, 22, 23, 24, 25, 26,
>  Defining 26 files...done
>  Allocating a new AffymetrixCelSet instance...
>   Arguments:
>   Number of files:26
>    list()
>  Allocating a new AffymetrixCelSet instance...done
>  Updating newly allocated AffymetrixCelSet...
>   Updating AffymetrixCelSet...
>    Scanning for and applying sample annotation files...
>     Defining 1 files...
> 1,
>     Defining 1 files...done
>     SampleAnnotationSet:
>     Name: annotationData
>     Tags:
>     Full name: annotationData
>     Number of files: 1
>     Names: ploidy
>     Path (to the first file): annotationData/samples
>     Total file size: 0.00MB
>     RAM: 0.00MB
>    Scanning for and applying sample annotation files...done
>   Updating AffymetrixCelSet...done
>  Updating newly allocated AffymetrixCelSet...done
>  Defining an AffymetrixCelSet object from files...done
>  Retrieved files: 26...
>  The chip type according to the path is: GenomeWideSNP_6
>  Since 'checkChipType=FALSE', then the chip type specified by the
> directory name is used: GenomeWideSNP_6
>  Using prespecified CDF: GenomeWideSNP_6,Full
>  Updating the CDF for all files...
>  Updating the CDF for all files...done
>  Updating AffymetrixCelSet...
>   Scanning for and applying sample annotation files...
>    Defining 1 files...
> 1,
>    Defining 1 files...done
>    SampleAnnotationSet:
>    Name: annotationData
>    Tags:
>    Full name: annotationData
>    Number of files: 1
>    Names: ploidy
>    Path (to the first file): annotationData/samples
>    Total file size: 0.00MB
>    RAM: 0.00MB
>   Scanning for and applying sample annotation files...done
>  Updating AffymetrixCelSet...done
>  Retrieved files: 26...done
> Defining AffymetrixCelSet from files...done
>> cf    <- getFile(cs, indexOf(cs, "MD1"));
>> attrs <- getAttributes(cf);
> Error in order(names(attrs)) : argument 1 is not a vector
>>
>
>
>
> On Oct 5, 1:54 am, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
>> Hi.
>>
>>
>>
>> On Thu, Oct 2, 2008 at 2:39 AM, marco <[EMAIL PROTECTED]> wrote:
>>
>> > Dear Henrik,
>>
>> >  I tried the X chromosome variant but I am not sure I can get it to
>> > work.
>> > I made a *.saf file and placed it into annotationData/samples/
>> > File looks like this:
>>
>> > name:MD10
>> > tags:XY
>> > name:MD11
>> > tags:XY
>> > name:MD12
>> > tags:XY
>> > name:MD13
>> > ...
>> > ...
>>
>> This looks correct to me.  Maybe you should try to add an empty line
>> between the entries.  What is the full filename of this file?
>>
>>
>>
>>
>>
>> > Anyway I cannot get the function getAttributes to work, so I am unsure
>> > if the *.saf is read correctly.
>> > Below is the output:
>> >> cdf      <- AffymetrixCdfFile$fromChipType("GenomeWideSNP_6", tags="Full")
>> >> print(cdf)
>> > AffymetrixCdfFile:
>> > Path: annotationData/chipTypes/GenomeWideSNP_6
>> > Filename: GenomeWideSNP_6,Full.cdf
>> > Filesize: 470.44MB
>> > Chip type: GenomeWideSNP_6,Full
>> > RAM: 0.00MB
>> > File format: v4 (binary; XDA)
>> > Dimension: 2572x2680
>> > Number of cells: 6892960
>> > Number of units: 1881415
>> > Cells per unit: 3.66
>> > Number of QC units: 4
>> >> gi       <- getGenomeInformation(cdf)
>> >> print(gi)
>> > UgpGenomeInformation:
>> > Name: GenomeWideSNP_6
>> > Tags: Full,na24,HB20080214
>> > Pathname: annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na24,HB20080214.ugp
>> > File size: 8.97MB
>> > RAM: 0.00MB
>> > Chip type: GenomeWideSNP_6,Full
>> >> si       <- getSnpInformation(cdf)
>> >> print(si)
>> > UflSnpInformation:
>> > Name: GenomeWideSNP_6
>> > Tags: Full,na24,HB20080214
>> > Pathname: annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na24,HB20080214.ufl
>> > File size: 7.18MB
>> > RAM: 0.00MB
>> > Chip type: GenomeWideSNP_6,Full
>> > Number of enzymes: 2
>> >> cs       <- AffymetrixCelSet$fromName("ESC", cdf=cdf)
>>
>> If you do
>>
>> cs <- AffymetrixCelSet$fromName("ESC", cdf=cdf, verbose=-20)
>>
>> you should see in the output which SAF files are located and
>> that they are read, e.g.
>>
>> cdf <- AffymetrixCdfFile$byChipType("Mapping50K_Hind240");
>> csR <- AffymetrixCelSet$byName("HapMap270,100K,CEU,testSet", cdf=cdf,
>> verbose=-20);
>>
>> ...
>> 20081004 16:47:49|  Allocating a new AffymetrixCelSet instance...done
>> 20081004 16:47:49|  Updating newly allocated AffymetrixCelSet...
>> 20081004 16:47:49|   Updating AffymetrixCelSet...
>> 20081004 16:47:49|    Scanning for and applying sample annotation files...
>>      SampleAnnotationSet:
>>      Name: annotationData
>>      Tags:
>>      Full name: annotationData
>>      Number of files: 7
>>      Names: 000.default, AGRF_2007a, ..., HapMap270
>>      Path (to the first file): annotationData/samples
>>      Total file size: 0.03MB
>>      RAM: 0.00MB
>> 20081004 16:47:50|    Scanning for and applying sample annotation files...done
>> 20081004 16:47:50|   Updating AffymetrixCelSet...done
>> 20081004 16:47:50|  Updating newly allocated AffymetrixCelSet...done
>> 20081004 16:47:50| Defining an AffymetrixCelSet object from files...done
>> ...
>>
>>
>>
>> >> print(cs)
>> > AffymetrixCelSet:
>> > Name: ESC
>> > Tags:
>> > Path: rawData/ESC/GenomeWideSNP_6
>> > Platform: Affymetrix
>> > Chip type: GenomeWideSNP_6,Full
>> > Number of arrays: 26
>> > Names: MD10, MD11, ..., VT06_TER2102EP
>> > Time period: 2008-07-11 11:09:02 -- 2008-09-03 14:47:43
>> > Total file size: 1712.75MB
>> > RAM: 0.04MB
>> >> cf    <- getFile(cs, indexOf(cs, "MD10"));
>> > AffymetrixCelFile:
>> > Name: MD10
>> > Tags:
>> > Pathname: rawData/ESC/GenomeWideSNP_6/MD10.CEL
>> > File size: 65.88MB
>> > RAM: 0.01MB
>> > File format: v1 (binary; CC)
>> > Platform: Affymetrix
>> > Chip type: GenomeWideSNP_6,Full
>> > Timestamp: 2008-07-17 19:31:03
>> >> attrs <- getAttributes(cf);
>> > Error in order(names(attrs)) : argument 1 is not a vector
>>
>> This does indeed indicate that there were no attributes set, i.e. it
>> looks like the *.saf file was not located.  (In next release, this
>> will return NULL instead of giving an error).
>>
>> Did the above help?
>>
>> /Henrik
>>
>> >> sessionInfo()
>>
>> > R version 2.7.2 (2008-08-25)
>> > x86_64-unknown-linux-gnu
>>
>> > locale:
>> > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>
>> > attached base packages:
>> > [1] stats     graphics  grDevices datasets  utils     methods   base
>>
>> > other attached packages:
>> >  [1] aroma.affymetrix_0.9.4 aroma.apd_0.1.3        R.huge_0.1.6
>> >  [4] affxparser_1.12.2      aroma.core_0.9.4       sfit_0.1.5
>> >  [7] aroma.light_1.8.1      digest_0.3.1           matrixStats_0.1.3
>> > [10] R.rsp_0.3.4            R.cache_0.1.7          R.utils_1.0.4
>> > [13] R.oo_1.4.6             R.methodsS3_1.0.3
>>
>> > Best Regards
>>
>> > Marco
>>
>> > On Sep 19, 10:10 pm, "Henrik Bengtsson" <[EMAIL PROTECTED]>
>> > wrote:
>> >> Hi.
>>
>> >> On Fri, Sep 19, 2008 at 4:35 AM, marco <[EMAIL PROTECTED]> wrote:
>>
>> >> > Dear List,
>>
>> >> >  I wonder about how X chromosome is treated in aroma.affymetrix.
>> >> > Is the mix of male/female samples somehow taken into account, or X is
>> >> > processed as any other chromosome?
>>
>> >> Good idea ;)   Yes, the CRMA model does take into account the fact
>> >> that different samples have different ploidies on ChrX when using the
>> >> pool of arrays as a reference.  The idea is to calculate the robust
>> >> average across all arrays and correct for the bias that
>> >> non-copy-neutral samples introduce.   See Section '3.2.7 Reference
>> >> signals' in:
>>
>> >> H. Bengtsson; R. Irizarry; B. Carvalho; T. Speed, Estimation and
>> >> assessment of raw copy numbers at the single locus level,
>> >> Bioinformatics, 2008. [pmid: 18204055] [doi:
>> >> 10.1093/bioinformatics/btn016]
>>
>> >> for more details.  The model/method requires that at least one sample
>> >> is copy neutral, i.e. you need at least one "female" in order to
>> >> estimate a diploid reference on ChrX.   The same bias-correction
>> >> method can also be used when some of the samples have, say, trisomy
>> >> 21.   For
>> >> ChrY, our current model cannot give you a *diploid* ChrY reference,
>> >> but a *copy neutral* one, i.e. CN=1 (requires at least one "male").
>> >> To the best of my understanding, none of the other methods out there use
>> >> this, but instead it is common to see that only female samples are
>> >> used for the ChrX reference.  See the above CRMA paper to see how much
>> >> the ChrX CN estimates are improved when you use the above
>> >> bias-corrected method instead.
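
To make the idea concrete, here is a minimal sketch in base R.  It is not the
CRMA implementation and not aroma.affymetrix code: it assumes that the
(non-log) signal at a locus is proportional to the copy number, rescales each
sample to the copy-neutral level, and then takes the median across all
samples.  biasCorrectedReference() and all the numbers below are made up for
illustration.

```r
# Hypothetical helper (NOT from aroma.affymetrix).
#   theta:   loci-by-samples matrix of non-log ChrX signals
#   ploidy:  ChrX copy number of each sample (2 for "XX", 1 for "XY")
#   neutral: the copy-neutral level of the reference (2 = diploid)
biasCorrectedReference <- function(theta, ploidy, neutral = 2) {
  # The thread states that at least one copy-neutral sample is needed;
  # here that is simply enforced as a precondition.
  stopifnot(ncol(theta) == length(ploidy), any(ploidy == neutral))
  # Rescale every sample to the copy-neutral level (assumption: signal is
  # proportional to copy number)
  scaled <- sweep(theta, MARGIN = 2, STATS = neutral / ploidy, FUN = "*")
  # Robust average across all samples, locus by locus
  apply(scaled, MARGIN = 1, FUN = median, na.rm = TRUE)
}

ploidy <- c(2, 2, 1, 1, 1)            # two "XX" and three "XY" samples
trueRef <- c(1000, 1500, 800)         # made-up diploid levels at three loci
theta <- outer(trueRef, ploidy / 2)   # noise-free signals, for illustration

thetaR <- biasCorrectedReference(theta, ploidy)  # corrected reference
naive <- apply(theta, 1, median)                 # uncorrected pooled median

# Log-ratios relative to the corrected reference
femaleM <- log2(theta[, 1] / thetaR)
maleM <- log2(theta[, 3] / thetaR)
print(rbind(female = femaleM, male = maleM))
```

In this toy setup the uncorrected pooled median is pulled towards the male
level because males are in the majority, whereas the corrected reference
stays at the diploid level.  Against that reference the "XX" sample sits at
log2 ratio 0 and the "XY" sample at -1, under the proportionality assumption
above.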
>>
>> >> > In this case, are the female and male test arrays supposed to have
>> >> > log2 values that are, on average, above and below zero?
>>
>> >> I'm not sure what you mean by this, but maybe the above answered this
>> >> question too.
>>
>> >> So, how do you do this in aroma.affymetrix?  I have on purpose avoided
>> >> giving the details on this until someone asked for it, because it
>> >> involves the use of a new kind of non-finalized sample annotation
>> >> files (SAFs).  I don't want to bother people with alpha versions in
>> >> case the API/format changes.  It is unlikely that it will change much
>> >> but if you can accept that it might change, I created a new vignette
>> >> explaining how to do it:
>>
>> >> Vignette 'Sex-chromosome bias-corrected reference signals from pooled
>> >> average'
>> >> http://groups.google.com/group/aroma-affymetrix/web/sex-chromosome-bi...
>>
>> >> It also illustrates a lot of other things so it might be useful for
>> >> others too.
>>
>> >> Hope this helps
>>
>> >> Henrik
>>
>> >> > Regards
>>
>> >> > Marco
> >
>
