[aroma.affymetrix] Re: Total Copy Number Analysis - Hardware requirements

2009-01-14 Thread Henrik Bengtsson

Hi.

On Wed, Jan 14, 2009 at 7:02 AM, connys  wrote:
>
> HI,
> I have read the documentation of this tool, but could not find
> anything regarding the hardware requirements (computational need). I
> want to analyze about 1000 samples affy 6.0  and I would also be
> interested to know of how this requirement scales up, let's say to the
> 10,000 samples.
> Does anyone have any idea/suggestions?

The short answer is that you can get by with ~1.5GB of RAM using any
operating system (this is pointed out on the front page).  Since
aroma.affymetrix access the file system quite a bit, you wish to have
a fast file system.  It is faster to work with data on a local HDD
than on a network-based file system.  Having a fast local drive will
help; don't know of any benchmarking but I guess the greater the HDD's
cache is, the better.  Of course, with more RAM, things will go a bit
faster (up to a certain level).  You might also want to get a 64-bit
OS for the future, although this is not (at all) required by
aroma.affymetrix. See also Page 'Improving processing time' in the
User Guide:

 http://groups.google.com/group/aroma-affymetrix/web/improving-processing-time


More details: Currently, (I believe) there is not a single method
implemented in aroma.affymetrix that is not bounded in memory
regardless of chiptype and number of samples.  In other words,
whatever method you apply, your memory usage will stay below a certain
upper limit.  People have reported that they've used aroma.affymetrix
to process 5000+ Affymetrix gene expression arrays without problems.
Note that we do this not by providing approximate algorithms, but more
clever designs of algorithms, so you will still get the same results
as if everything would be kept in memory.  The only exception to the
latter is our recent CrlmmModel for estimating genotypes according to
CRLMM; the CRLMM algorithm is hierarchical by nature and it is
therefore hard to design an exact estimator that is bounded in memory.
 To achieve bounded memory runs, SNPs are processed in chunks.   The
greater number of arrays modeled, the fewer number SNPs we have per
chunk, in order to keep the overall memory usage bounded.

Not sure what kind of GWS6 analysis you are planning, but the CRMA v2
method for estimating full-resolution raw CNs is a single-array method
by design, so it will definitely scale infinitely.  Even multi-array
methods such as CRMA (v1) should scale quite well, because of the
bounded algorithms used.

Hope this helps

/Henrik

>
>
> >
>

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: Wrong layout in plotAllelePairs

2009-01-14 Thread Pierre Neuvial

Hi again,

Henrik pointed to me that plotAllelePairs.AllelicCrosstalkCalibration
has an argument 'pairs' which can be used to specify the indexes of
the pairs that one wants to plot. In my case

plotAllelePairs(acc, array=array, pairs=1:6, what="input", xlim=xlim/3)

does the job.

Cheers,

Pierre


On Wed, Jan 14, 2009 at 11:56 AM, Pierre Neuvial
 wrote:
> Hi,
>
> I have followed the steps of the CRMAv2 vignette:
>
> http://groups.google.com/group/aroma-affymetrix/web/estimation-of-total-copy-numbers-using-the-crma-v2-method
>
> using aroma.affymetrix version 1.0.0 and I got the following warning:
>
>> plotAllelePairs(acc, array=array, what="input", xlim=xlim/3)
> Warning message:
> In matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs))) :
>  data length [7] is not a sub-multiple or multiple of the number of rows [2]
>
> This is because there are now *7* allele probe pair groups, not 6:
>
>> names(getSetsOfProbes(acc)$snp)
> [1] "A/C" "A/G" "A/T" "C/G" "C/T" "G/T" "missing"
>
> As a result the plot layout is wrong:
>
>> nbrOfPairs <- length(getSetsOfProbes(acc)$snp)
>> matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs)))
> [,1] [,2] [,3] [,4]
> [1,]1357
> [2,]2461
> Warning message:
> In matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs))) :
>  data length [7] is not a sub-multiple or multiple of the number of rows [2]
>
> Maybe the easiest fix is to plot only the non "missing" allele pairs.
>
> Note that if one wants to use several central nucleotides instead of one, as 
> in
>
> alpha <- c(0.1, 0.075, 0.05, 0.03, 0.01, 0.0025, 1e-3, 1e-4);
> acc <- AllelicCrosstalkCalibration(csR, alpha=alpha, pairBy="sequence", B=3)
>
> the number of allele pairs grows quickly: in this particular case
> there are 96 allele pairs + 1 missing.
> So maybe we it would make sense to hardcode a not to complex layout,
> e. g. layout(matrix(seq(length=6, nrow=2))) ?
>
> Cheers,
>
> Pierre
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-apple-darwin8.11.1
>
> locale:
> C
>
> attached base packages:
> [1] stats graphics  grDevices datasets  utils methods   base
>
> other attached packages:
>  [1] sfit_0.1.5 aroma.affymetrix_1.0.0 aroma.apd_0.1.3
>  [4] R.huge_0.1.6   affxparser_1.14.0  aroma.core_1.0.0
>  [7] aroma.light_1.9.2  digest_0.3.1   matrixStats_0.1.3
> [10] R.rsp_0.3.4R.cache_0.1.7  R.utils_1.1.3
> [13] R.oo_1.4.6 R.methodsS3_1.0.3
>

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: ufl and ugp files for na27?

2009-01-14 Thread Henrik Bengtsson

Hi Christian,

I'm quite swamped but I've added links the scripts that I used to
generate the existing NetAffx 26 (na26) UFL and UGP to each of the
different SNP & CN chip type pages, e.g.

  
http://groups.google.com/group/aroma-affymetrix/web/mapping250k-nsp-mapping250k-sty

More comments below.

On Mon, Jan 12, 2009 at 6:24 AM, cstratowa
 wrote:
>
> Dear Henrik
>
> Meanwhile I have created ufl and ugp files for both 100K and 500K
> arrays but not for GenomeWideSNP_6 aray.

The above scripts will show you how to do it for GenomeWideSNP_6.
Slightly more complicated since two NetAffx CSV files are involved.

>
> Can you confirm that the following code, which I use for both 100K and
> 500K arrays, is correct:
>
> # retrieving annotation files
> chiptypes <- c("Mapping50K_Hind240", "Mapping50K_Xba240")
> cdfs <- lapply(chiptypes, FUN=function(x){AffymetrixCdfFile$byChipType
> (x)})
> names(cdfs) <- chiptypes
> print(cdfs)
>
> # importing data from NetAffx CSV files
> csvs <- lapply(cdfs, FUN=function(cdf){AffymetrixNetAffxCsvFile
> $byChipType(getChipType(cdf), tags=".na27")})
> print(csvs)
>
> # allocating empty UFL (Unit Fragment Length) files
> ufls <- lapply(cdfs, FUN=function(cdf){AromaUflFile$allocateFromCdf
> (cdf, tags="na27,CS20090112")})
> print(ufls)
>
> # import SNP data
> units <- list();
> for (chipType in names(ufls)) {
>   ufl <- ufls[[chipType]];
>   csv <- csvs[[chipType]];
>   units[[chipType]] <- importFrom(ufl, csv, verbose=-50);
> }
> str(units)
>
> # allocating empty UGP (Unit Genome Position) files
> ugps <- lapply(cdfs, FUN=function(cdf){AromaUgpFile$allocateFromCdf
> (cdf, tags="na27,CS20090112")})
> print(ugps)
>
> # import SNP data
> units <- list();
> for (chipType in names(ugps)) {
>   ugp <- ugps[[chipType]];
>   csv <- csvs[[chipType]];
>   units[[chipType]] <- importFrom(ugp, csv, verbose=-50);
> }
> str(units)

This looks alright to me.  You might want to check toward the posted
NA26 scripts as well, because they are more recent.

>
>
> Here is the summary for the 100K arrays:
> # Summary 50K chips
>> str(units)
> List of 2
>  $ Mapping50K_Hind240: int [1:57244] 18632 18677 1631 18713 1630 18712
> 18619 1639 18722 18608 ...
>  $ Mapping50K_Xba240 : int [1:58960] 29181 18239 31302 19831 47750
> 45114 19103 39711 19772 37811 ...
>>
>> ufl <- AromaUflFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="HindIII"))
>   snp cnp affxSnp other total
> enzyme1-only 56933   0   0 0 56933
> missing311   0   055   366
> total57244   0   055 57299

The NA26 version gives:
   snp cnp affxSnp other total
enzyme1-only 56933   0   0 0 56933
missing311   0   055   366
total57244   0   055 57299


>> ufl <- AromaUflFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="XbaI"))
>   snp cnp affxSnp other total
> enzyme1-only 58616   0   0 0 58616
> missing344   0   055   399
> total58960   0   055 59015

NA26:
   snp cnp affxSnp other total
enzyme1-only 58616   0   0 0 58616
missing344   0   055   399
total58960   0   055 59015

>>
>> ugp <- AromaUgpFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="HindIII"))
>  chromosomeposition
>  Min.   :  1.000   Min.   :48603
>  1st Qu.:  4.000   1st Qu.: 34667112
>  Median :  7.000   Median : 72677620
>  Mean   :  8.402   Mean   : 80405004
>  3rd Qu.: 12.000   3rd Qu.:114826216
>  Max.   : 23.000   Max.   :246727435
>  NA's   :363.000   NA's   :  363

NA26:

> print(summary(ugp));
 chromosomeposition
 Min.   :  1.000   Min.   :48603
 1st Qu.:  4.000   1st Qu.: 34667112
 Median :  7.000   Median : 72677621
 Mean   :  8.402   Mean   : 80405004
 3rd Qu.: 12.000   3rd Qu.:114826216
 Max.   : 23.000   Max.   :246727435
 NA's   :363.000   NA's   :  363
> print(table(ugp[,1]));

   123456789   10   11   12   13   14   15   16

4541 5072 3962 4342 4215 3968 3444 3549 2357 2743 2466 2592 2661 1931 1440 1145

  17   18   19   20   21   22   23
 985 1731  326  993  883  433 1157

>> ugp <- AromaUgpFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="XbaI"))
>  chromosomeposition
>  Min.   :  1.000   Min.   :93683
>  1st Qu.:  4.000   1st Qu.: 34636629
>  Median :  7.000   Median : 72249739
>  Mean   :  8.507   Mean   : 80010574
>  3rd Qu.: 12.000   3rd Qu.:114666170
>  Max.   : 24.000   Max.   :246885089
>  NA's   :390.000   NA's   :  390

NA26:

> print(summary(ugp));
 chromosomeposition
 Min.   :  1.000   Min.   :93683
 1st Qu.:  4.000   1st Qu.: 34636629
 Median :  7.000   Median : 72249739
 Mean   :  8.507   Mean   : 80010574
 3rd Qu.: 12.000   3rd Qu.:114666170
 Max.   : 24.000   Max.   :246885089
 NA's   :390.000   NA's 

[aroma.affymetrix] Wrong layout in plotAllelePairs

2009-01-14 Thread Pierre Neuvial

Hi,

I have followed the steps of the CRMAv2 vignette:

http://groups.google.com/group/aroma-affymetrix/web/estimation-of-total-copy-numbers-using-the-crma-v2-method

using aroma.affymetrix version 1.0.0 and I got the following warning:

> plotAllelePairs(acc, array=array, what="input", xlim=xlim/3)
Warning message:
In matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs))) :
  data length [7] is not a sub-multiple or multiple of the number of rows [2]

This is because there are now *7* allele probe pair groups, not 6:

> names(getSetsOfProbes(acc)$snp)
[1] "A/C" "A/G" "A/T" "C/G" "C/T" "G/T" "missing"

As a result the plot layout is wrong:

> nbrOfPairs <- length(getSetsOfProbes(acc)$snp)
> matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs)))
 [,1] [,2] [,3] [,4]
[1,]1357
[2,]2461
Warning message:
In matrix(seq(length = nbrOfPairs), nrow = floor(sqrt(nbrOfPairs))) :
  data length [7] is not a sub-multiple or multiple of the number of rows [2]

Maybe the easiest fix is to plot only the non "missing" allele pairs.

Note that if one wants to use several central nucleotides instead of one, as in

alpha <- c(0.1, 0.075, 0.05, 0.03, 0.01, 0.0025, 1e-3, 1e-4);
acc <- AllelicCrosstalkCalibration(csR, alpha=alpha, pairBy="sequence", B=3)

the number of allele pairs grows quickly: in this particular case
there are 96 allele pairs + 1 missing.
So maybe we it would make sense to hardcode a not to complex layout,
e. g. layout(matrix(seq(length=6, nrow=2))) ?

Cheers,

Pierre

> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-apple-darwin8.11.1

locale:
C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base

other attached packages:
 [1] sfit_0.1.5 aroma.affymetrix_1.0.0 aroma.apd_0.1.3
 [4] R.huge_0.1.6   affxparser_1.14.0  aroma.core_1.0.0
 [7] aroma.light_1.9.2  digest_0.3.1   matrixStats_0.1.3
[10] R.rsp_0.3.4R.cache_0.1.7  R.utils_1.1.3
[13] R.oo_1.4.6 R.methodsS3_1.0.3

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Total Copy Number Analysis - Hardware requirements

2009-01-14 Thread connys

HI,
I have read the documentation of this tool, but could not find
anything regarding the hardware requirements (computational need). I
want to analyze about 1000 samples affy 6.0  and I would also be
interested to know of how this requirement scales up, let's say to the
10,000 samples.
Does anyone have any idea/suggestions?


--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---