[aroma.affymetrix] Total Copy Number Analysis - Hardware requirements

2009-01-14 Thread connys

HI,
I have read the documentation of this tool, but could not find
anything regarding the hardware requirements (computational need). I
want to analyze about 1000 samples affy 6.0  and I would also be
interested to know of how this requirement scales up, let's say to the
10,000 samples.
Does anyone have any idea/suggestions?


--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Wrong layout in plotAllelePairs

2009-01-14 Thread Pierre Neuvial

Hi,

I have followed the steps of the CRMAv2 vignette:

http://groups.google.com/group/aroma-affymetrix/web/estimation-of-total-copy-numbers-using-the-crma-v2-method

using aroma.affymetrix version 1.0.0 and I got the following warning:

 plotAllelePairs(acc, array=array, what=input, xlim=xlim/3)
Warning message:
In matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs))) :
  data length [7] is not a sub-multiple or multiple of the number of rows [2]

This is because there are now *7* allele probe pair groups, not 6:

 names(getSetsOfProbes(acc)$snp)
[1] A/C A/G A/T C/G C/T G/T missing

As a result the plot layout is wrong:

 nbrOfPairs - length(getSetsOfProbes(acc)$snp)
 matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs)))
 [,1] [,2] [,3] [,4]
[1,]1357
[2,]2461
Warning message:
In matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs))) :
  data length [7] is not a sub-multiple or multiple of the number of rows [2]

Maybe the easiest fix is to plot only the non missing allele pairs.

Note that if one wants to use several central nucleotides instead of one, as in

alpha - c(0.1, 0.075, 0.05, 0.03, 0.01, 0.0025, 1e-3, 1e-4);
acc - AllelicCrosstalkCalibration(csR, alpha=alpha, pairBy=sequence, B=3)

the number of allele pairs grows quickly: in this particular case
there are 96 allele pairs + 1 missing.
So maybe we it would make sense to hardcode a not to complex layout,
e. g. layout(matrix(seq(length=6, nrow=2))) ?

Cheers,

Pierre

 sessionInfo()
R version 2.8.1 (2008-12-22)
i386-apple-darwin8.11.1

locale:
C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base

other attached packages:
 [1] sfit_0.1.5 aroma.affymetrix_1.0.0 aroma.apd_0.1.3
 [4] R.huge_0.1.6   affxparser_1.14.0  aroma.core_1.0.0
 [7] aroma.light_1.9.2  digest_0.3.1   matrixStats_0.1.3
[10] R.rsp_0.3.4R.cache_0.1.7  R.utils_1.1.3
[13] R.oo_1.4.6 R.methodsS3_1.0.3

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: ufl and ugp files for na27?

2009-01-14 Thread Henrik Bengtsson

Hi Christian,

I'm quite swamped but I've added links the scripts that I used to
generate the existing NetAffx 26 (na26) UFL and UGP to each of the
different SNP  CN chip type pages, e.g.

  
http://groups.google.com/group/aroma-affymetrix/web/mapping250k-nsp-mapping250k-sty

More comments below.

On Mon, Jan 12, 2009 at 6:24 AM, cstratowa
christian.strat...@vie.boehringer-ingelheim.com wrote:

 Dear Henrik

 Meanwhile I have created ufl and ugp files for both 100K and 500K
 arrays but not for GenomeWideSNP_6 aray.

The above scripts will show you how to do it for GenomeWideSNP_6.
Slightly more complicated since two NetAffx CSV files are involved.


 Can you confirm that the following code, which I use for both 100K and
 500K arrays, is correct:

 # retrieving annotation files
 chiptypes - c(Mapping50K_Hind240, Mapping50K_Xba240)
 cdfs - lapply(chiptypes, FUN=function(x){AffymetrixCdfFile$byChipType
 (x)})
 names(cdfs) - chiptypes
 print(cdfs)

 # importing data from NetAffx CSV files
 csvs - lapply(cdfs, FUN=function(cdf){AffymetrixNetAffxCsvFile
 $byChipType(getChipType(cdf), tags=.na27)})
 print(csvs)

 # allocating empty UFL (Unit Fragment Length) files
 ufls - lapply(cdfs, FUN=function(cdf){AromaUflFile$allocateFromCdf
 (cdf, tags=na27,CS20090112)})
 print(ufls)

 # import SNP data
 units - list();
 for (chipType in names(ufls)) {
   ufl - ufls[[chipType]];
   csv - csvs[[chipType]];
   units[[chipType]] - importFrom(ufl, csv, verbose=-50);
 }
 str(units)

 # allocating empty UGP (Unit Genome Position) files
 ugps - lapply(cdfs, FUN=function(cdf){AromaUgpFile$allocateFromCdf
 (cdf, tags=na27,CS20090112)})
 print(ugps)

 # import SNP data
 units - list();
 for (chipType in names(ugps)) {
   ugp - ugps[[chipType]];
   csv - csvs[[chipType]];
   units[[chipType]] - importFrom(ugp, csv, verbose=-50);
 }
 str(units)

This looks alright to me.  You might want to check toward the posted
NA26 scripts as well, because they are more recent.



 Here is the summary for the 100K arrays:
 # Summary 50K chips
 str(units)
 List of 2
  $ Mapping50K_Hind240: int [1:57244] 18632 18677 1631 18713 1630 18712
 18619 1639 18722 18608 ...
  $ Mapping50K_Xba240 : int [1:58960] 29181 18239 31302 19831 47750
 45114 19103 39711 19772 37811 ...

 ufl - AromaUflFile$byChipType(chiptypes[1], tags=na27,CS20090112);
 print(summaryOfUnits(ufl, enzymes=HindIII))
   snp cnp affxSnp other total
 enzyme1-only 56933   0   0 0 56933
 missing311   0   055   366
 total57244   0   055 57299

The NA26 version gives:
   snp cnp affxSnp other total
enzyme1-only 56933   0   0 0 56933
missing311   0   055   366
total57244   0   055 57299


 ufl - AromaUflFile$byChipType(chiptypes[2], tags=na27,CS20090112);
 print(summaryOfUnits(ufl, enzymes=XbaI))
   snp cnp affxSnp other total
 enzyme1-only 58616   0   0 0 58616
 missing344   0   055   399
 total58960   0   055 59015

NA26:
   snp cnp affxSnp other total
enzyme1-only 58616   0   0 0 58616
missing344   0   055   399
total58960   0   055 59015


 ugp - AromaUgpFile$byChipType(chiptypes[1], tags=na27,CS20090112);
 print(summary(ugp, enzymes=HindIII))
  chromosomeposition
  Min.   :  1.000   Min.   :48603
  1st Qu.:  4.000   1st Qu.: 34667112
  Median :  7.000   Median : 72677620
  Mean   :  8.402   Mean   : 80405004
  3rd Qu.: 12.000   3rd Qu.:114826216
  Max.   : 23.000   Max.   :246727435
  NA's   :363.000   NA's   :  363

NA26:

 print(summary(ugp));
 chromosomeposition
 Min.   :  1.000   Min.   :48603
 1st Qu.:  4.000   1st Qu.: 34667112
 Median :  7.000   Median : 72677621
 Mean   :  8.402   Mean   : 80405004
 3rd Qu.: 12.000   3rd Qu.:114826216
 Max.   : 23.000   Max.   :246727435
 NA's   :363.000   NA's   :  363
 print(table(ugp[,1]));

   123456789   10   11   12   13   14   15   16

4541 5072 3962 4342 4215 3968 3444 3549 2357 2743 2466 2592 2661 1931 1440 1145

  17   18   19   20   21   22   23
 985 1731  326  993  883  433 1157

 ugp - AromaUgpFile$byChipType(chiptypes[2], tags=na27,CS20090112);
 print(summary(ugp, enzymes=XbaI))
  chromosomeposition
  Min.   :  1.000   Min.   :93683
  1st Qu.:  4.000   1st Qu.: 34636629
  Median :  7.000   Median : 72249739
  Mean   :  8.507   Mean   : 80010574
  3rd Qu.: 12.000   3rd Qu.:114666170
  Max.   : 24.000   Max.   :246885089
  NA's   :390.000   NA's   :  390

NA26:

 print(summary(ugp));
 chromosomeposition
 Min.   :  1.000   Min.   :93683
 1st Qu.:  4.000   1st Qu.: 34636629
 Median :  7.000   Median : 72249739
 Mean   :  8.507   Mean   : 80010574
 3rd Qu.: 12.000   3rd Qu.:114666170
 Max.   : 24.000   Max.   :246885089
 NA's   :390.000   NA's   :  390
 print(table(ugp[,1]));

   123456789   10   11   12   13  

[aroma.affymetrix] Re: Wrong layout in plotAllelePairs

2009-01-14 Thread Pierre Neuvial

Hi again,

Henrik pointed to me that plotAllelePairs.AllelicCrosstalkCalibration
has an argument 'pairs' which can be used to specify the indexes of
the pairs that one wants to plot. In my case

plotAllelePairs(acc, array=array, pairs=1:6, what=input, xlim=xlim/3)

does the job.

Cheers,

Pierre


On Wed, Jan 14, 2009 at 11:56 AM, Pierre Neuvial
pie...@stat.berkeley.edu wrote:
 Hi,

 I have followed the steps of the CRMAv2 vignette:

 http://groups.google.com/group/aroma-affymetrix/web/estimation-of-total-copy-numbers-using-the-crma-v2-method

 using aroma.affymetrix version 1.0.0 and I got the following warning:

 plotAllelePairs(acc, array=array, what=input, xlim=xlim/3)
 Warning message:
 In matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs))) :
  data length [7] is not a sub-multiple or multiple of the number of rows [2]

 This is because there are now *7* allele probe pair groups, not 6:

 names(getSetsOfProbes(acc)$snp)
 [1] A/C A/G A/T C/G C/T G/T missing

 As a result the plot layout is wrong:

 nbrOfPairs - length(getSetsOfProbes(acc)$snp)
 matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs)))
 [,1] [,2] [,3] [,4]
 [1,]1357
 [2,]2461
 Warning message:
 In matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs))) :
  data length [7] is not a sub-multiple or multiple of the number of rows [2]

 Maybe the easiest fix is to plot only the non missing allele pairs.

 Note that if one wants to use several central nucleotides instead of one, as 
 in

 alpha - c(0.1, 0.075, 0.05, 0.03, 0.01, 0.0025, 1e-3, 1e-4);
 acc - AllelicCrosstalkCalibration(csR, alpha=alpha, pairBy=sequence, B=3)

 the number of allele pairs grows quickly: in this particular case
 there are 96 allele pairs + 1 missing.
 So maybe we it would make sense to hardcode a not to complex layout,
 e. g. layout(matrix(seq(length=6, nrow=2))) ?

 Cheers,

 Pierre

 sessionInfo()
 R version 2.8.1 (2008-12-22)
 i386-apple-darwin8.11.1

 locale:
 C

 attached base packages:
 [1] stats graphics  grDevices datasets  utils methods   base

 other attached packages:
  [1] sfit_0.1.5 aroma.affymetrix_1.0.0 aroma.apd_0.1.3
  [4] R.huge_0.1.6   affxparser_1.14.0  aroma.core_1.0.0
  [7] aroma.light_1.9.2  digest_0.3.1   matrixStats_0.1.3
 [10] R.rsp_0.3.4R.cache_0.1.7  R.utils_1.1.3
 [13] R.oo_1.4.6 R.methodsS3_1.0.3


--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: Total Copy Number Analysis - Hardware requirements

2009-01-14 Thread Henrik Bengtsson

Hi.

On Wed, Jan 14, 2009 at 7:02 AM, connys cornelia.sche...@googlemail.com wrote:

 HI,
 I have read the documentation of this tool, but could not find
 anything regarding the hardware requirements (computational need). I
 want to analyze about 1000 samples affy 6.0  and I would also be
 interested to know of how this requirement scales up, let's say to the
 10,000 samples.
 Does anyone have any idea/suggestions?

The short answer is that you can get by with ~1.5GB of RAM using any
operating system (this is pointed out on the front page).  Since
aroma.affymetrix access the file system quite a bit, you wish to have
a fast file system.  It is faster to work with data on a local HDD
than on a network-based file system.  Having a fast local drive will
help; don't know of any benchmarking but I guess the greater the HDD's
cache is, the better.  Of course, with more RAM, things will go a bit
faster (up to a certain level).  You might also want to get a 64-bit
OS for the future, although this is not (at all) required by
aroma.affymetrix. See also Page 'Improving processing time' in the
User Guide:

 http://groups.google.com/group/aroma-affymetrix/web/improving-processing-time


More details: Currently, (I believe) there is not a single method
implemented in aroma.affymetrix that is not bounded in memory
regardless of chiptype and number of samples.  In other words,
whatever method you apply, your memory usage will stay below a certain
upper limit.  People have reported that they've used aroma.affymetrix
to process 5000+ Affymetrix gene expression arrays without problems.
Note that we do this not by providing approximate algorithms, but more
clever designs of algorithms, so you will still get the same results
as if everything would be kept in memory.  The only exception to the
latter is our recent CrlmmModel for estimating genotypes according to
CRLMM; the CRLMM algorithm is hierarchical by nature and it is
therefore hard to design an exact estimator that is bounded in memory.
 To achieve bounded memory runs, SNPs are processed in chunks.   The
greater number of arrays modeled, the fewer number SNPs we have per
chunk, in order to keep the overall memory usage bounded.

Not sure what kind of GWS6 analysis you are planning, but the CRMA v2
method for estimating full-resolution raw CNs is a single-array method
by design, so it will definitely scale infinitely.  Even multi-array
methods such as CRMA (v1) should scale quite well, because of the
bounded algorithms used.

Hope this helps

/Henrik



 


--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---