> I was hoping to get some general advice. I know CRMAv2 is a single
> array method and thus makes processing different arrays in parallel
> possible.
> I was wondering how you would set up the rawData and annotationData
> directories when doing multi-staged analysis.
> For instance, for my project I have 20 patients.  I will be getting
> the data for the first 10 patients immediately, and then the data from
> the remaining 10 patients a few weeks later.  All experiments will be
> performed on the same chipType.
> I was thinking of this structure:
> --rawData/
>         --patient01/GenomeWideSNP_6/
>         --patient02/GenomeWideSNP_6/
>         --patient03/GenomeWideSNP_6/
>         ...
>         --patient20/GenomeWideSNP_6/

Even if they arrive one at a time, I would treat those samples as being
part of the same data set, e.g. with all CEL files placed under

rawData/MyDataSet/GenomeWideSNP_6/

A rule of thumb: set it up the way you would if you were to redo the
same analysis from scratch in the future.

> And then preprocessing each patient separately in an R session with:
>        dataSet <- "patient01";
>        chipType <- "GenomeWideSNP_6";
>        cdf <- AffymetrixCdfFile$byChipType(chipType, tags="Full");
>        dsList <- doCRMAv2(dataSet, cdf=cdf, combineAlleles=FALSE, verbose=verbose);

So, if you add them all to the same data set as they come in, just
rerun the above; already processed arrays will be detected and
"skipped".  The 'dsList' at the end will contain all arrays that
currently exist in the data set.

FYI, there is doASCRMAv2(), so that you do not have to specify
'combineAlleles', i.e.

dsList <- doASCRMAv2(dataSet, cdf=cdf, verbose=verbose);

Note also that you can do:

csR <- AffymetrixCelSet$byName("MyDataSet", cdf=cdf);
dsList <- doASCRMAv2(csR, verbose=verbose);

which is the same but more explicit, and you have the option to subset
'csR'.  That is, if you want to process different arrays on different
machines (which is what the subject of your message indicates), then
you can do for instance:

subset <- c(5,6,7);
csR <- AffymetrixCelSet$byName("MyDataSet", cdf=cdf);
csR <- extract(csR, subset);
dsList <- doASCRMAv2(csR, verbose=verbose);

If you batch process this, you can pass command line arguments to your
script and use commandArgs() to get them, i.e. you can set 'subset'
this way.
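
As a rough sketch of that idea, the following base-R snippet reads a
subset from the command line.  The helper parseSubset() and the
"--subset=" argument format are made up for illustration; they are not
part of aroma.affymetrix.

```r
# parseSubset() is a hypothetical helper (not part of aroma.affymetrix).
# It turns an argument such as "--subset=5,6,7" or "--subset=5:7" into
# an integer vector of array indices.
parseSubset <- function(args) {
  hit <- grep("^--subset=", args, value=TRUE);
  if (length(hit) == 0L) return(NULL);  # no subset given: use all arrays
  spec <- sub("^--subset=", "", hit[1L]);
  parts <- strsplit(spec, ",", fixed=TRUE)[[1L]];
  idxs <- unlist(lapply(parts, function(part) {
    range <- as.integer(strsplit(part, ":", fixed=TRUE)[[1L]]);
    # "5:7" expands to 5, 6, 7; a single number is kept as is
    if (length(range) == 2L) seq(from=range[1L], to=range[2L]) else range;
  }));
  stopifnot(!anyNA(idxs), all(idxs >= 1L));
  idxs;
}

# Only the arguments that follow the script name
subset <- parseSubset(commandArgs(trailingOnly=TRUE));

# Then, with aroma.affymetrix loaded and 'cdf' set up as above:
#   csR <- AffymetrixCelSet$byName("MyDataSet", cdf=cdf);
#   if (!is.null(subset)) csR <- extract(csR, subset);
#   dsList <- doASCRMAv2(csR, verbose=verbose);
```

Assuming the script is saved as, say, process.R (a hypothetical name),
each machine could then be started with something like
"Rscript process.R --subset=5:7".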

> Once I get all the arrays and have preprocessed them I would like to
> segment the data using CBS.  The first 10 patients are normal and the
> last 10 diseased -- i.e. tumor-normal arrays for each sample.
> However, since I processed each array individually each would have
> their own AromaUnitTotalCnBinarySet.  Would I just read each in
> individually, and then manipulate it in order to create the necessary
> matching of normal over tumor needed for the CBS algorithm?

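So, with all arrays kept in one data set as I suggested above, this will
not be an issue; you only need to pick out the tumor and the normal
arrays from that one set.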
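
To illustrate the matching step, here is a minimal base-R sketch that
pairs tumor and normal arrays by patient.  The sample names and the
"_N"/"_T" suffix convention are hypothetical; with real data the names
would come from the data set itself.  The aroma calls in the trailing
comments follow my understanding of the paired copy-number vignette and
should be treated as an assumption, not as tested code.

```r
# Hypothetical sample names, one normal (N) and one tumor (T) per patient.
sampleNames <- c("patient01_N", "patient02_N", "patient01_T", "patient02_T");

# Strip the "_N"/"_T" suffix to get the patient ID of each array.
patient <- sub("_[NT]$", "", sampleNames);
patients <- unique(patient);

# For each patient, look up the index of its tumor and its normal array.
idxT <- match(paste(patients, "T", sep="_"), sampleNames);
idxN <- match(paste(patients, "N", sep="_"), sampleNames);
stopifnot(!anyNA(idxT), !anyNA(idxN));  # every patient needs both arrays

pairs <- data.frame(patient=patients, tumor=idxT, normal=idxN);
print(pairs);

# With aroma.affymetrix loaded and 'dsList' from doASCRMAv2()
# (assumed usage, see the note above):
#   dsT <- extract(dsList$total, pairs$tumor);
#   dsN <- extract(dsList$total, pairs$normal);
#   sm <- CbsModel(dsT, dsN);
```

Because the pairing is done by name rather than by position, it keeps
working when more arrays are added to the data set later on.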

(FYI, one can use append() to merge data sets).

> If down the road we get another 20 arrays again with tumor normal
> samples how would I integrate these new arrays with my previous
> arrays?

As above.


> Just create additional directories:
> --rawData/
>         --patient21/GenomeWideSNP_6/
>         --patient22/GenomeWideSNP_6/
>         --patient23/GenomeWideSNP_6/
>         ...
>         --patient40/GenomeWideSNP_6/
> I hope I explained my question reasonably clearly.
> Thanks, Greg