Hi.

2009/6/25 Lavinia Gordon <lavinia.gor...@mcri.edu.au>:
>
> Dear all
>
> A query re samples to use for a reference pool - the bigger the
> better? I am not so certain. I have looked at a number of similar
> samples repeatedly, and as these samples have arrived over a period of
> time, so my pool of normals has grown. I have re-run a few analyses
> altering the pool size, and the results are not always cleaner. My
> pool samples have all been run at the same place, however they are
> very heterogeneous. Does anyone have any thoughts/opinions on this?
> Any suggested QC/plots to try and select the best reference samples
> from my pool of ~100? [N.B. these are 6.0 chips].

When you say heterogeneous, do you mean that they contain a lot of CN aberrations, or do you mean that they have different noise levels?

If "not too many" things go on in your reference samples, then you should expect to get an improvement when you calculate the reference channel as the average over a larger and larger pool of samples. A few years ago I checked this on 500K data and I found a dramatic drop in SNRs when increasing from 5 to 10 to 20 reference samples, and then it flattened out.

However, if you look at Nannya et al. (2005) [1], their CNAG method tries to identify a subset of reference samples that gives the best SNRs. This amounts to assuming that there is a set of reference samples that are more "normal" than others. An alternative argument, which may make even more sense, is that there will always be some systematic effects remaining in the estimates, and if you can locate a set of reference samples that have similar remaining effects as your test sample, they will cancel out better than if other reference samples were used. Note that this strategy uses a different pool of references for each test sample. I know that the Broad Institute (Gaddy Getz, Scott Carter et al.) are doing something similar in the TCGA project, and they say they get better SNRs.

This is something I have wanted to look into for quite a while, but there hasn't been, and there still isn't, any time for me to do this. I think it is worth investigating how to obtain better reference signals from a pool of samples. Yet another useful project if someone has the time.

At least this should be a start

Henrik

REFERENCES:
[1] Nannya, Y.; Sanada, M.; Nakazaki, K.; Hosoya, N.; Wang, L.; Hangaishi, A.; Kurokawa, M.; Chiba, S.; Bailey, D. K.; Kennedy, G. C. & Ogawa, S., A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res, 2005, 65, 6071-6079. PMID: 16024607.

>
> with thanks,
>
> Lavinia Gordon.
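
To make the "different pool of references for each test sample" idea above concrete, here is a minimal sketch in plain Python. It is not aroma.affymetrix code, and it is not the CNAG or Broad procedure; all function names are hypothetical. It assumes signals are already on the log2 scale and ordered along the genome, uses the per-locus median as the pooled "average", and ranks references by a simple noise measure (the median absolute difference between neighbouring loci), which is my own choice for illustration.

```python
from statistics import median


def log_ratios(test, reference):
    """Per-locus difference between test and reference signals (log2 scale)."""
    return [t - r for t, r in zip(test, reference)]


def pooled_reference(samples):
    """Per-locus median across a list of reference samples."""
    return [median(values) for values in zip(*samples)]


def noise(values):
    """Median absolute difference between neighbouring loci (assumed noise proxy)."""
    return median(abs(b - a) for a, b in zip(values, values[1:]))


def select_references(test, pool, k):
    """Keep the k pool samples whose single-reference log ratios are quietest.

    Assumption: a reference sharing the test sample's residual systematic
    effects gives smoother log ratios, because those effects cancel out.
    """
    ranked = sorted(pool, key=lambda ref: noise(log_ratios(test, ref)))
    return ranked[:k]
```

One would then compare noise(log_ratios(test, pooled_reference(pool))) with the same quantity computed from pooled_reference(select_references(test, pool, k)) for a few values of k. Whether the selected subset actually wins on real 6.0 data is exactly the open question above.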