[aroma.affymetrix] Re: GWS_6 hg19

2011-05-30 Thread cstratowa
Dear Henrik,

On May 27, 9:03 pm, Henrik Bengtsson  wrote:
> On Fri, May 27, 2011 at 4:02 AM, cstratowa
>
>  wrote:
> > Dear Henrik,
>
> > Thank you for this clarification.
>
> > Since I am still using an older version of aroma.affymetrix which does
> > not contain parameter 'lengthRange' I cannot specify the fragment
> > length range.
>
> Just curious, what's the reason for not updating?  It's should be a
> matter of seconds.


Due to our Firewall settings we have to download all packages
manually. Furthermore, we have to install the packages on all of our
servers and on the cluster, which I am using to distribute the GLAD
jobs.


>
> > However, at the moment I do not see any disadvantage in
> > not specifying lengthRange. When I compare the the CN ranges obtained
> > with GLAD between na30 and na31 there are only slight differences due
> > to the different annotations. What is your opinion?
>
> The UGP file that contain genomic positions are probably the same.
>
> If you are using the FragmentLengthNormalization with the na31 UFLs
> provided online, I'm rather sure that you do need to set 'lengthRange'
> otherwise you'll get an "identifiability" error, cf. thread 'CRMA v2
> failed at FragmentLengthNormalization for Mouse diversity array'
> started on 2011-01-13:
>
>  https://groups.google.com/forum/#!topic/aroma-affymetrix/iRJ7Fggi0JY
>
> Details on the fragment-lengths in na31 UFL provided online:
>
> > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6,Full");
> > ufl <- getAromaUflFile(cdf);
> > ufl
>
> AromaUflFile:
> Name: GenomeWideSNP_6
> Tags: Full,na31,hg19,HB20110328
> Full name: GenomeWideSNP_6,Full,na31,hg19,HB20110328
> Pathname: 
> annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na31,hg19,HB20110328.ufl
> File size: 7.18 MB (7526452 bytes)
> RAM: 0.00 MB
> Number of data rows: 1881415
> File format: v1
> Dimensions: 1881415x2
> Column classes: integer, integer
> Number of bytes per column: 2, 2
> Footer: 20110328 19:41:33 
> PDTAffymetrix rm>GenomeWideSNP_6,FullHenrik 
> Bengtsso
> n[...]<
> filename>GenomeWideSNP_6,Full.CDF493291745 sum>3fbe0f6e7c8a346105238a3f3d10d4ecGe
> nomeWideSNP_6.na31.annot.csv1501563495
> 3f59efe386ffc1af656d01be0ab035e9Genome
> WideSNP_6.cn.na31.annot.csv8792574806a
> 21fef62c71fb56bc5dda4499b70a7f
> Chip type: GenomeWideSNP_6,Full
> Platform: Affymetrix
>
> > summaryOfUnits(ufl)
>
>                 snp    cnp affxSnp other   total
> enzyme1-only    577   9609       0     0   10186
> enzyme2-only    502  23432       0     0   23934
> both         929322 912574       0     0 1841896
> missing        1545    211    3022   621    5399
> total        931946 945826    3022   621 1881415
>
> The above summary show that there are very few SNPs that exists only
> fragments generated by only one of the two enzymes.  The estimator of
> the fragment-length normalization model requires such SNPs.  By
> filtering by length using argument 'lengthRange' more and more SNPs
> will have a length value on one enzyme only.


Although I have updated my package to be able to use CRMAv2, I am
still using CRMA for the SNP6.0 arrays, i.e. correcting for allelic
cross-talk and for fragment length effects. The reason for this is
that I did some testing by comparing CRMAv2 and CRMA for SNP6.0 with
real datasets, and the only difference that I saw when looking e.g. at
ChromosomeExplorer was that there was a slight decrease in CN values
when using CRMAv2. Thus I preferred to stay with CRMA.

I am not sure if the fragment length correction for SNP6.0 depends on
my choice of using CRMA instead of CRMAv2. In any case not filtering
for fragment length does not seem to affect my CN computations. Do you
have any comments?

Best regards
Christian


>
> Hope this clarifies
>
> /Henrik
>
>
>
> > Best regards
> > Christian
>
> > On May 26, 11:01 pm, Henrik Bengtsson  > project.org> wrote:
> >> Hi.
>
> >> On Thu, May 26, 2011 at 4:05 AM, cstratowa
>
> >>  wrote:
> >> > Dear Henrik,
>
> >> > Thank you for the update.
>
> >> > Meanwhile I have already created the ufl/ugp files by simply modifying
> >> > your R-scripts GenomeWideSNP_6,UFL,na30.R  and
> >> > GenomeWideSNP_6,UGP,na30.R.
>
> >> > Thus, my question is: Did you do something special or do you still use
> >> > these R-scripts?
>
> >> Nope, nothing special, except that I made sure that the file footer
> >> reflects the fact that the date and my initials are correct.
>
> &g

[aroma.affymetrix] Re: GWS_6 hg19

2011-05-27 Thread cstratowa
Dear Henrik,

Thank you for this clarification.

Since I am still using an older version of aroma.affymetrix which does
not contain parameter 'lengthRange' I cannot specify the fragment
length range. However, at the moment I do not see any disadvantage in
not specifying lengthRange. When I compare the the CN ranges obtained
with GLAD between na30 and na31 there are only slight differences due
to the different annotations. What is your opinion?

Best regards
Christian

On May 26, 11:01 pm, Henrik Bengtsson  wrote:
> Hi.
>
> On Thu, May 26, 2011 at 4:05 AM, cstratowa
>
>  wrote:
> > Dear Henrik,
>
> > Thank you for the update.
>
> > Meanwhile I have already created the ufl/ugp files by simply modifying
> > your R-scripts GenomeWideSNP_6,UFL,na30.R  and
> > GenomeWideSNP_6,UGP,na30.R.
>
> > Thus, my question is: Did you do something special or do you still use
> > these R-scripts?
>
> Nope, nothing special, except that I made sure that the file footer
> reflects the fact that the date and my initials are correct.
>
> One more thing specific to the na30 -> na31 transition: Affymetrix no
> longer filter the PCR fragment-length data, which means that the new
> UFL files now contains fragment lengths of a large range.  You now
> have to specify the range as FragmentLengthNormalization(...,
> lengthRange=c(400,2000)).
>
>
>
> > BTW, on your new web-site I can no longer find these scripts. Are they
> > still available?
>
> Yes, I did remove the links from the website because it is too much
> work to also keep those links up-to-date with everything else.
> Instead, I have added the archive of all build scripts (sorted by chip
> type and NA revision) to the package itself so that they will be
> accessible via:
>
>   path <- system.file("buildScripts/", package="aroma.affymetrix");
>
> This will be available in the next release of aroma.affymetrix.  Until
> then, you can also grab/browse them via:
>
> https://r-forge.r-project.org/scm/viewvc.php/pkg/aroma.affymetrix/ins...
>
> As you'll see, that for GenomeWideSNP_6 I am moving towards setting up
> somewhat interactive user scripts for doing this.  You may have to
> install the R.menu package, i.e.
>
> source("http://aroma-project.org/hbLite.R";);
> hbLite("R.menu");
>
> Hope this helps
>
> Henrik
>
>
>
> > Best regards
> > Christian
>
> > On May 26, 3:36 am, Henrik Bengtsson  > project.org> wrote:
> >> A late follow up/to close this thread:
>
> >> The requested na31/hg19 annotation files are now available:
>
> >>http://aroma-project.org/chipTypes/GenomeWideSNP_6
>
> >> /Henrik
>
> >> On Mon, Mar 28, 2011 at 6:48 AM, cstratowa
>
> >>  wrote:
> >> > Dear Henrik,
>
> >> > I am also looking for the new (old) na31 annotation files for SNP6.
>
> >> > I have seen that for the Nsp/Sty arrays you have put the na31 (hg19)
> >> > ufl/ugp files on the Chip types web-site for download.
>
> >> > However, the web-site for SNP6 still contains the old na30 (hg18) ufl/
> >> > ugp files.
> >> > Is it possible to update this site to the new na31 (hg19) ufl/ugp
> >> > files, too?
>
> >> > Best regards
> >> > Christian
>
> >> > On Feb 2, 9:39 am, Mete Civelek  wrote:
> >> >> Hi All,
>
> >> >> A few weeks ago there was a discussion about ugp and ufl files based
> >> >> on the hg19 annotation. Are those files available for share?
>
> >> >> Thank you
>
> >> > --
> >> > When reporting problems on aroma.affymetrix, make sure 1) to run the 
> >> > latest version of the package, 2) to report the output of sessionInfo() 
> >> > and traceback(), and 3) to post a complete code example.
>
> >> > You received this message because you are subscribed to the Google 
> >> > Groups "aroma.affymetrix" group with 
> >> > websitehttp://www.aroma-project.org/.
> >> > To post to this group, send email to aroma-affymetrix@googlegroups.com
> >> > To unsubscribe and other options, go 
> >> > tohttp://www.aroma-project.org/forum/
>
> > --
> > When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
> > version of the package, 2) to report the output of sessionInfo() and 
> > traceback(), and 3) to post a complete code example.
>
> > You received this message because you are subscribed to the Google Groups 
> > "aroma.affymetrix" group with websitehttp://www.aroma-project.org/.
> > To post to this group, send email to aroma-affymetrix@googlegroups.com
> > To unsubscribe and other options, go tohttp://www.aroma-project.org/forum/

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


[aroma.affymetrix] Re: GWS_6 hg19

2011-05-26 Thread cstratowa
Dear Henrik,

Thank you for the update.

Meanwhile I have already created the ufl/ugp files by simply modifying
your R-scripts GenomeWideSNP_6,UFL,na30.R  and
GenomeWideSNP_6,UGP,na30.R.

Thus, my question is: Did you do something special or do you still use
these R-scripts?

BTW, on your new web-site I can no longer find these scripts. Are they
still available?

Best regards
Christian


On May 26, 3:36 am, Henrik Bengtsson  wrote:
> A late follow up/to close this thread:
>
> The requested na31/hg19 annotation files are now available:
>
> http://aroma-project.org/chipTypes/GenomeWideSNP_6
>
> /Henrik
>
> On Mon, Mar 28, 2011 at 6:48 AM, cstratowa
>
>  wrote:
> > Dear Henrik,
>
> > I am also looking for the new (old) na31 annotation files for SNP6.
>
> > I have seen that for the Nsp/Sty arrays you have put the na31 (hg19)
> > ufl/ugp files on the Chip types web-site for download.
>
> > However, the web-site for SNP6 still contains the old na30 (hg18) ufl/
> > ugp files.
> > Is it possible to update this site to the new na31 (hg19) ufl/ugp
> > files, too?
>
> > Best regards
> > Christian
>
> > On Feb 2, 9:39 am, Mete Civelek  wrote:
> >> Hi All,
>
> >> A few weeks ago there was a discussion about ugp and ufl files based
> >> on the hg19 annotation. Are those files available for share?
>
> >> Thank you
>
> > --
> > When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
> > version of the package, 2) to report the output of sessionInfo() and 
> > traceback(), and 3) to post a complete code example.
>
> > You received this message because you are subscribed to the Google Groups 
> > "aroma.affymetrix" group with websitehttp://www.aroma-project.org/.
> > To post to this group, send email to aroma-affymetrix@googlegroups.com
> > To unsubscribe and other options, go tohttp://www.aroma-project.org/forum/

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


[aroma.affymetrix] Re: GWS_6 hg19

2011-03-28 Thread cstratowa
Dear Henrik,

I am also looking for the new (old) na31 annotation files for SNP6.

I have seen that for the Nsp/Sty arrays you have put the na31 (hg19)
ufl/ugp files on the Chip types web-site for download.

However, the web-site for SNP6 still contains the old na30 (hg18) ufl/
ugp files.
Is it possible to update this site to the new na31 (hg19) ufl/ugp
files, too?

Best regards
Christian

On Feb 2, 9:39 am, Mete Civelek  wrote:
> Hi All,
>
> A few weeks ago there was a discussion about ugp and ufl files based
> on the hg19 annotation. Are those files available for share?
>
> Thank you

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


[aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-08-04 Thread cstratowa
Dear Henrik,

Thank you, this discussion was very helpful for me.

Your suggestion to compute getAverageFile() first is a good idea, I
will try it when I find time to change my code again. It is also good
to know that I can change ".Rcache" by using setCacheRootPath().

Best regards
Christian


On Aug 4, 10:21 am, Henrik Bengtsson 
wrote:
> Hi Christian,
>
> On Wed, Aug 4, 2010 at 9:04 AM, cstratowa
>
>  wrote:
> > Dear Henrik,
>
> > Thank you for your suggestion to use ceRef directly.
>
> > Regarding your explanation of getAverageFile() the question is where
> > the generated output will be saved.
>
> > As I have mentioned, each node creates first a plmData subdirectory,
> > e.g. "Prostate/Prostate21/plmData" and makes symbolic links to the
> > normalized CEL-files located in "Prostate/plmData". Thus the output of
> > getAverageFile() should be stored for each node separately.
>
> Ah, now I see; I've been reading it as you were linking the
> directories, not the individual CEL files.
>
>
>
> > This seems indeed to be the case, since e.g. the subdirectory
> > "Prostate/Prostate21/plmData/Prostate,ACC,-XY,QN,RMA,A+B,FLN,-XY/
> > Mapping250K_Nsp" contains the file ".average-intensities-median-
> > mad,a1c33926939ee43fbed83ae69301d215.CEL" created at a certain time
> > while subdirectory "Prostate/Prostate8/plmData/Prostate,ACC,-
> > XY,QN,RMA,A+B,FLN,-XY/Mapping250K_Nsp" contains a file with the same
> > name, i.e. ".average-intensities-median-
> > mad,a1c33926939ee43fbed83ae69301d215.CEL" created at a different
> > time.
>
> Yes.
>
> As I understand it now, you preprocess all of the data, and wait for
> everything to be done (all *,chipEffects.CEL files to be generated)
> before continuing with the above, correct?  If so, I'd suggest that
> you also wait for getAverageFile() to finish first.  Then that average/
> results file be available to all your cluster nodes as well.  I even
> think you don't have to link each CEL file separately, because nothing
> else should be written back to the data set.  It should be enough to
> link each data set directory, or even just plmData/ itself (not even
> sure the need to split it up anymore).
>
>
>
> > As far as I understand these are the files created by getAverageFile()
> > and thus each node creates its own file saved in its own subdirectory,
> > so there will be no problem.
>
> Yes.  Now I agree with you.
>
>
>
> > It seems that the problem was indeed the result of saveObject() stored
> > in ".Rcache", which caused the race conditions. Since the removal of
> > saveObject() I have until now experienced no problems.
>
> Yes.  You are correct.
>
> Since caching is mainly done for memoization purposes, that is, to
> load already calculated results that are computational expensive to
> obtain from file, it is recommended to store the cache in a fast
> place.  In other words, it is better if the .Rcache directory is on
> the local drive of the machine, rather than on a shared file system.
> If you had done that, then each machine would had to have do those
> calculations by themselves once, but when done the memoization would
> be faster and you would not have had any race conditions accessing the
> memoized results.  The default ~/.Rcache/ can be changed, 
> cf.http://www.aroma-project.org/archive/GoogleGroups/web/caching.
>
> This was a useful conversation to me; it made me see other ways for
> (unnecessary) race conditions to occur, and remind me how important it
> is to not overlook the smallest details in scientific communication
> since they can make big differences.
>
> Cheers,
>
> Henrik
>
>
>
> > Thank you for your help.
> > Best regards
> > Christian
>
> > On Aug 2, 2:54 pm, Henrik Bengtsson  wrote:
>
> > > Hi.
>
> > > On Mon, Jul 26, 2010 at 12:00 PM, cstratowa
>
> > >  wrote:
> > > > Dear Henrik,
>
> > > > Maybe, my explanation was not clear enough:
>
> > > > I have created my own package based on S4 classes, where one subclass
> > > > is "AromaSNP" with slots celset, normset, plmset, effectset as lists,
> > > > and methods readSNPData(), normalizeSNPData(), computeCN(),
> > > > computeRawCN(), among others. Furthermore, the package includes
> > > > scripts batch.aroma.norm.R, batch.aroma.model.R,
> > > > batch.aroma.combine.R, and a perl script which distributes these
> > > > scripts to the different cluster nodes.
>
> > > > 1, Normalization: Script batch.

[aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-08-04 Thread cstratowa
Dear Henrik,

Thank you for your suggestion to use ceRef directly.

Regarding your explanation of getAverageFile() the question is where
the generated output will be saved.

As I have mentioned, each node creates first a plmData subdirectory,
e.g. "Prostate/Prostate21/plmData" and makes symbolic links to the
normalized CEL-files located in "Prostate/plmData". Thus the output of
getAverageFile() should be stored for each node separately.

This seems indeed to be the case, since e.g. the subdirectory
"Prostate/Prostate21/plmData/Prostate,ACC,-XY,QN,RMA,A+B,FLN,-XY/
Mapping250K_Nsp" contains the file ".average-intensities-median-
mad,a1c33926939ee43fbed83ae69301d215.CEL" created at a certain time
while subdirectory "Prostate/Prostate8/plmData/Prostate,ACC,-
XY,QN,RMA,A+B,FLN,-XY/Mapping250K_Nsp" contains a file with the same
name, i.e. ".average-intensities-median-
mad,a1c33926939ee43fbed83ae69301d215.CEL" created at a different
time.

As far as I understand these are the files created by getAverageFile()
and thus each node creates its own file saved in its own subdirectory,
so there will be no problem.

It seems that the problem was indeed the result of saveObject() stored
in ".Rcache", which caused the race conditions. Since the removal of
saveObject() I have until now experienced no problems.

Thank you for your help.
Best regards
Christian


On Aug 2, 2:54 pm, Henrik Bengtsson  wrote:
> Hi.
>
> On Mon, Jul 26, 2010 at 12:00 PM, cstratowa
>
>
>
>  wrote:
> > Dear Henrik,
>
> > Maybe, my explanation was not clear enough:
>
> > I have created my own package based on S4 classes, where one subclass
> > is "AromaSNP" with slots celset, normset, plmset, effectset as lists,
> > and methods readSNPData(), normalizeSNPData(), computeCN(),
> > computeRawCN(), among others. Furthermore, the package includes
> > scripts batch.aroma.norm.R, batch.aroma.model.R,
> > batch.aroma.combine.R, and a perl script which distributes these
> > scripts to the different cluster nodes.
>
> > 1, Normalization: Script batch.aroma.norm.R creates first the
> > subdirectory structure which I have already described, and then does
> > the normalization. All normalization steps run on one server and the
> > results are saved as AromaSNP object "aroma" in "Prostate/
> > Prostate.Rdata". Furthermore, subdirectories "Prostate/probeData" and
> > "Prostate/plmData" are created.
>
> > 2, GLAD: Script batch.aroma.norm.R is called from each node
> > separately. For each node it creates first a plmData subdirectory,
> > e.g. "Prostate/Prostate21/plmData" and makes symbolic links to the
> > normalized CEL-files located in "Prostate/plmData". Then it loads
> > object "aroma" from "Prostate/Prostate.Rdata", whereby each node has a
> > separate RAM of 2GB. Slot "ar...@effectset" contains the normalized
> > data and is called from computeCN() as "cesList <- ar...@effectset".
> > This "cesList" (which is in the RAM of each node) is passed to "model
> > <- GladModel(cesList, refList)", and is thus used to compute
> > getAverageFile(), if refList=NULL (which is the default). Since each
> > node calls the same "cesList" object, function saveObject() writes to
> > identical filenames which causes the clash.
>
> > Here is the relevant code for each cluster node:
>
> > cesList <- ar...@effectset
> > refList <- NULL  #but see below
> > model <- GladModel(cesList, refList)
> > ce  <- ChromosomeExplorer(model, tags=tags)
> > setParallelSafe(ce, status=TRUE) #suggested by you long time ago
> > tmp <- process(ce,...)
> > cnrs <- getRegions(model, ...)
> > ar...@cnregion <- cnrs
>
> > Each node saves the result, i.e. object "aroma", in a different Rdata
> > file, e.g. "Prostate/Prostate.GLAD.21.Rdata" and saves the images in
> > its own reports subdirectory, e.g. "Prostate/Prostate21/reports".
>
> > 3, Script batch.aroma.combine.R combines the results of each Rdata
> > file in a final file "Prostate/Prostate.GLAD.All.Rdata", combines all
> > images from the different subdirectories in "Prostate/reports", and
> > finally deletes all temporary Rdata files and subdirectories.
>
> > Since you mention that I should call getAverageFile() before calling
> > GladModel(), here is one more information:
> > All necessary information is stored in a text-file "pheno.txt", which
> > is read initially and contains the names of the CEL-files, the
> > l

[aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-07-26 Thread cstratowa
Dear Henrik,

Maybe, my explanation was not clear enough:

I have created my own package based on S4 classes, where one subclass
is "AromaSNP" with slots celset, normset, plmset, effectset as lists,
and methods readSNPData(), normalizeSNPData(), computeCN(),
computeRawCN(), among others. Furthermore, the package includes
scripts batch.aroma.norm.R, batch.aroma.model.R,
batch.aroma.combine.R, and a perl script which distributes these
scripts to the different cluster nodes.


1, Normalization: Script batch.aroma.norm.R creates first the
subdirectory structure which I have already described, and then does
the normalization. All normalization steps run on one server and the
results are saved as AromaSNP object "aroma" in "Prostate/
Prostate.Rdata". Furthermore, subdirectories "Prostate/probeData" and
"Prostate/plmData" are created.


2, GLAD: Script batch.aroma.norm.R is called from each node
separately. For each node it creates first a plmData subdirectory,
e.g. "Prostate/Prostate21/plmData" and makes symbolic links to the
normalized CEL-files located in "Prostate/plmData". Then it loads
object "aroma" from "Prostate/Prostate.Rdata", whereby each node has a
separate RAM of 2GB. Slot "ar...@effectset" contains the normalized
data and is called from computeCN() as "cesList <- ar...@effectset".
This "cesList" (which is in the RAM of each node) is passed to "model
<- GladModel(cesList, refList)", and is thus used to compute
getAverageFile(), if refList=NULL (which is the default). Since each
node calls the same "cesList" object, function saveObject() writes to
identical filenames which causes the clash.

Here is the relevant code for each cluster node:

cesList <- ar...@effectset
refList <- NULL  #but see below
model <- GladModel(cesList, refList)
ce  <- ChromosomeExplorer(model, tags=tags)
setParallelSafe(ce, status=TRUE) #suggested by you long time ago
tmp <- process(ce,...)
cnrs <- getRegions(model, ...)
ar...@cnregion <- cnrs

Each node saves the result, i.e. object "aroma", in a different Rdata
file, e.g. "Prostate/Prostate.GLAD.21.Rdata" and saves the images in
its own reports subdirectory, e.g. "Prostate/Prostate21/reports".


3, Script batch.aroma.combine.R combines the results of each Rdata
file in a final file "Prostate/Prostate.GLAD.All.Rdata", combines all
images from the different subdirectories in "Prostate/reports", and
finally deletes all temporary Rdata files and subdirectories.


Since you mention that I should call getAverageFile() before calling
GladModel(), here is one more information:
All necessary information is stored in a text-file "pheno.txt", which
is read initially and contains the names of the CEL-files, the
location of each CEL-file, an alias name for each CEL-file, and
optionally columns "Reference" or "Pairs". By default "refList <-
NULL", however, when I activate option "Reference" then all CEL-files
with "1" in column "Reference" are used as reference. Here is the
relevant code:

cesList <- ar...@effectset
refList <- list();
for (chiptype in names(cesList)) {
   ces <- cesList[[chiptype]];
   refcol <- which(pheno[,reference] == 1);
   datcol <- which(pheno[,reference] == 0);
   if (reference == "Pairs") {
  cesList[[chiptype]] <- extract(ces, datcol);
  refList[[chiptype]] <- extract(ces, refcol);
   } else {
  cesRef <- extract(ces, refcol);
  ceRef  <- getAverageFile(cesRef);
  ## convert single ceRef file into a set of identical files of
same size/length as ces
  numarr <- nbrOfArrays(ces);
  cesRef <- rep(list(ceRef), numarr);
  cesRef <- newInstance(ces, cesRef, mergeStrands=ces
$mergeStrands, combineAlleles=ces$combineAlleles);
  refList[[chiptype]] <- cesRef;
   }#if
}#for
model <- GladModel(cesList, refList);

Thus, if really necessary, I can always create "pheno.txt" with
Reference column "1" for all CEL-files in order to call
getAverageFile(). However, I must admit that it is still not clear to
me what the advantage would be, especially now that you have removed
function saveObject().

I hope that this explanation could explain better what the different
steps are.

Best regards
Christian


On Jul 23, 4:35 pm, Henrik Bengtsson 
wrote:
> Hi.
>
> On Jul 22, 10:24 am, cstratowa 
>
>
> ingelheim.com> wrote:
> > Dear Henrik,
>
> > Thank you very much for changing the code for getAverageFile(), I will
> > try it and let you know.
>
> > Thank you also for the explanation of writing to a temporary file, now
> > I understand your intention.
>
> > Regarding race conditions: No, I do not assume that aroma.* takes care
&

[aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-07-22 Thread cstratowa
Dear Henrik,

Thank you very much for changing the code for getAverageFile(), I will
try it and let you know.

Thank you also for the explanation of writing to a temporary file, now
I understand your intention.

Regarding race conditions: No, I do not assume that aroma.* takes care
of potential race conditions. Here is what I do:

Assume that I have downloaded from GEO a prostate cancer dataset
consisting of 40 CEL-files. Then I create a directory "Prostate" and
subdirectories "Prostate/annotationData" and "Prostate/rawData"
following your required file structure.

 However, starting with the 2nd CEL-file I create subdirectories
"Prostate/Prostate2",...,"Prostate/Prostate40", each containing a
symbolic link to "../annotationData" and "../rawData" from "Prostate".
Thus when running GLAD each cluster node has its own directory to
write to, e.g. "Prostate/Prostate21/reports" for creating the images.
Only after all nodes have finished their computations, then I move the
relevant files to the main directory, e.g. all images are moved to
"Prostate/reports". Afterwards I delete the subdirectories
"Prostate2",...,"Prostate40" and their contents.

As you can see, using this setup there should not be any race
conditions. The only remaining problem are the temporary files which
you store in ".Rcache" in my home directory.

I know that you store the monocell files in ".Rcache/
aroma.affymetrix", so that the monocell files have to be created only
once. However, for the temporary files please allow me to suggest that
you create a temporary directory in your file structure, e.g.
"Prostate/tmp", where these files are stored. In my case this would
definitely solve my problem since each subdirectory would contain its
own temporary directory, e.g. "Prostate/Prostate21/tmp". I do not know
if this change would break any code or cause any problems, it is only
a naive suggestion. What is your  opinion?

Best regards
Christian


On Jul 21, 6:46 pm, Henrik Bengtsson 
wrote:
> Hi Christian.
>
> On Wed, Jul 21, 2010 at 2:59 PM, cstratowa
>
>
>
>  wrote:
> > Dear Henrik,
>
> > Thank you for this extensive explanation and sorry for the late reply
> > but I was pretty busy.
>
> > Yes, it did work before! As I mentioned with versions
> > aroma.affymetrix_1.1.0 and earlier I have never had a  problem doing
> > the analyses on cluster nodes.
>
> > Looking at the source code of different versions of saveObject() I
> > realize that using "saveObject(..,safe=FALSE)" would be the same as
> > using saveObject() from R.utils_0.9.1. Thus in principle this could
> > solve my problem. Is this correct?
>
> > Sadly, method AffymetrixCelSet::getAverageFile() in
> > aroma.affymetrix_1.6.2 does not allow to pass parameter "safe=FALSE"
> > to saveObject(). Is it possible for you to change it?
>
> I have decided to remove that debug code that calls saveObject(),
> because it is not really needed anymore.  The main reason why I remove
> it is because it is obsolete code.  The intention of that code snippet
> in getAverageFile() was never to protect against race conditions (it
> was just an unplanned side effect).
>
> Until next release, you can get a patched version as:
>
> library("aroma.affymetrix");
> downloadPackagePatch("aroma.affymetrix");
>
> Note, as I said in my previous reply, by processing (=here calling
> getAverageFile() on) the same data set on multiple hosts, you are
> potentially running into race conditions resulting in corrupt data.
> You should at least be aware of it and understand why this is the
> case.
>
>
>
> > It is still not clear to me why you create first a temporary file
> > which you then rename (although you mention power failures etc).
> > However, would it be possible to add a random number to the temporary
> > filename, e.g. "*.tmp.1948234", so that the problem with the existing
> > temporary file could be avoided?
>
> The main purpose of writing to a temporary file and then renaming is
> to make sure that the file is complete.  If something happens while
> writing the temporary file, the final file will not exist/be created.
> If one would write to the final file from the beginning, there is no
> way for us to know if the file was correctly created or not.  So,
> writing via a temporary file, we effectively have a way of creating
> files in one atomic action.
>
>
>
> > Probably you only need to change line 59 to:
>
> > pathnameT <- sprintf("%s.tmp.%i", pathname,
> > as.integer(runif(1,1,)))
>
> In order not to corrupt the temporary f

[aroma.affymetrix] Re: Problem with GLAD on linux cluster

2010-07-21 Thread cstratowa
Dear Henrik,

Thank you for this extensive explanation and sorry for the late reply
but I was pretty busy.

Yes, it did work before! As I mentioned with versions
aroma.affymetrix_1.1.0 and earlier I have never had a  problem doing
the analyses on cluster nodes.

Looking at the source code of different versions of saveObject() I
realize that using "saveObject(..,safe=FALSE)" would be the same as
using saveObject() from R.utils_0.9.1. Thus in principle this could
solve my problem. Is this correct?

Sadly, method AffymetrixCelSet::getAverageFile() in
aroma.affymetrix_1.6.2 does not allow to pass parameter "safe=FALSE"
to saveObject(). Is it possible for you to change it?

It is still not clear to me why you create first a temporary file
which you then rename (although you mention power failures etc).
However, would it be possible to add a random number to the temporary
filename, e.g. "*.tmp.1948234", so that the problem with the existing
temporary file could be avoided?

Probably you only need to change line 59 to:

pathnameT <- sprintf("%s.tmp.%i", pathname,
as.integer(runif(1,1,)))

Regarding your suggestion to wrap getAverageFile() in Mutex calls I
have no idea if there exists an R-package for this purpose. Neither
Rmpi nor snow seem to be suitable for this purpose (at least  not
without a complete re-write of my package).

One other question:
Is it allowed to delete the contents of directory .Rcache/
aroma.affymetrix/idChecks?

Best regards
Christian


On Jul 2, 12:47 am, Henrik Bengtsson  wrote:
> Hi Christian.
>
> On Tue, Jun 29, 2010 at 3:39 PM, cstratowa
>
>
>
>  wrote:
> > Dear Henrik,
>
> > Until now I have used aroma.affymetrix_1.1.0 with R-2.8.1 and could
> > run my analysis on our sge-cluster w/o any problems.
>
> > Now I have upgraded to R-2.11.1 and to aroma.affymetrix_1.6.2 and are
> > curently testing with 8 chips whether my package based on
> > aroma.affymetrix still works on the cluster. The normalization step on
> > a server did run fine, howeever, distributing the 8 samples on the
> > cluster to run GladModel() resulted in the problem that 3 of 8 cluster
> > nodes did stop with the following error message:
>
> > Loading required package: GLAD
> > ...
> > Loading required package: RColorBrewer
> > Loading required package: Cairo
> > Error in list(`computeCN(aroma, model = model, arrays = arrays[i],
> > chromosomes = 1:23, ref` = ,  :
>
> > [2010-06-29 15:08:49] Exception: Cannot save to file. Temporary file
> > already exists: ~/.Rcache/aroma.affymetrix/idChecks/
> > a1c33926939ee43fbed83ae69301d215.tmp
> >  at throw(Exception(...))
> >  at throw.default("Cannot save to file. Temporary file already
> > exists: ", pathn
> >  at throw("Cannot save to file. Temporary file already exists: ",
> > pathnameT)
> >  at saveObject.default(list(key = key, keyIds = lapply(key, digest2),
> > id = id),
> >  at saveObject(list(key = key, keyIds = lapply(key, digest2), id =
> > id), idPathn
> >  at getAverageFile.AffymetrixCelSet(ces, force = force, verbose =
> > less(verbose)
> >  at NextMethod(generic = "getAverageFile", object = this, indices =
> > indices, ..
> >  at getAverageFile.ChipEffectSet(ces, force = force, verbose =
> > less(verbose))
> >  at NextMethod(generic = "getAverageFile", object = this, ...)
> >  at getAverageFile.SnpChipEffectSet(ces, force = force, verbose =
> > less(verbose)
> >  at NextMethod(generic = "getAverageFile", object = this, ...)
> >  at getAverageFile.CnChipEffectS
> > Calls: computeCN ... saveObject.default -> throw -> throw.default ->
> > throw -> throw.Exception
> > Execution halted
>
> > Interestingly, on the other 5 nodes GladModel() seems to run fine.
>
> > Do you have any idea what the reason for this problem might be?
>
> This seems to be due to a race condition, because several processes
> calls getAverageFile() on the same data set (set of data files).  It
> has nothing to do with the GladModel - that is only calling
> getAverageFile() in order to calculate the average signal across all
> samples in the data set.
>
> More precisely, in this particular case it is saveObject() of R.utils
> that detects that there already exist a temporary file (added file
> name extension *.tmp) that is currently being created and written to
> by another process.  This temporary file is renamed to its final name
> when done.  The reason why didn't observe it before is most likely
> because this additional feature was added to saveObject() in R.utils
> v1.2.4:
>
> Version: 1.2.4 [2009-10-30]
> o ROBUSTIFICATION: Lowered

[aroma.affymetrix] Problem with GLAD on linux cluster

2010-06-29 Thread cstratowa
Dear Henrik,

Until now I have used aroma.affymetrix_1.1.0 with R-2.8.1 and could
run my analysis on our sge-cluster w/o any problems.

Now I have upgraded to R-2.11.1 and to aroma.affymetrix_1.6.2 and are
curently testing with 8 chips whether my package based on
aroma.affymetrix still works on the cluster. The normalization step on
a server did run fine, howeever, distributing the 8 samples on the
cluster to run GladModel() resulted in the problem that 3 of 8 cluster
nodes did stop with the following error message:

Loading required package: GLAD
...
Loading required package: RColorBrewer
Loading required package: Cairo
Error in list(`computeCN(aroma, model = model, arrays = arrays[i],
chromosomes = 1:23, ref` = ,  :

[2010-06-29 15:08:49] Exception: Cannot save to file. Temporary file
already exists: ~/.Rcache/aroma.affymetrix/idChecks/
a1c33926939ee43fbed83ae69301d215.tmp
  at throw(Exception(...))
  at throw.default("Cannot save to file. Temporary file already
exists: ", pathn
  at throw("Cannot save to file. Temporary file already exists: ",
pathnameT)
  at saveObject.default(list(key = key, keyIds = lapply(key, digest2),
id = id),
  at saveObject(list(key = key, keyIds = lapply(key, digest2), id =
id), idPathn
  at getAverageFile.AffymetrixCelSet(ces, force = force, verbose =
less(verbose)
  at NextMethod(generic = "getAverageFile", object = this, indices =
indices, ..
  at getAverageFile.ChipEffectSet(ces, force = force, verbose =
less(verbose))
  at NextMethod(generic = "getAverageFile", object = this, ...)
  at getAverageFile.SnpChipEffectSet(ces, force = force, verbose =
less(verbose)
  at NextMethod(generic = "getAverageFile", object = this, ...)
  at getAverageFile.CnChipEffectS
Calls: computeCN ... saveObject.default -> throw -> throw.default ->
throw -> throw.Exception
Execution halted

Interestingly, on the other 5 nodes GladModel() seems to run fine.

Do you have any idea what the reason for this problem might be?

> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-unknown-linux-gnu

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
base

other attached packages:
 [1] Cairo_1.4-5RColorBrewer_1.0-2
GLAD_2.10.0
 [4] biasnp_0.2.54  RODBC_1.3-1
aroma.affymetrix_1.6.2
 [7] aroma.apd_0.1.7affxparser_1.20.0
R.huge_0.2.0
[10] aroma.core_1.6.2   aroma.light_1.16.0
matrixStats_0.2.1
[13] R.rsp_0.3.6R.cache_0.3.0
R.filesets_0.8.2
[16] digest_0.4.2   R.utils_1.4.2
R.oo_1.7.3
[19] R.methodsS3_1.2.0

loaded via a namespace (and not attached):
[1] tools_2.11.1
Warning message:
'DESCRIPTION' file has 'Encoding' field and re-encoding is not
possible

Best regards
Christian

P.S.:
Is Google groups still the place to post questions?

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


[aroma.affymetrix] Re: Question for custom CDF of ST-Array

2010-02-08 Thread cstratowa
Dear Henrik

I have sent this offline to Affymetrix DevNet, so you will not find
any note.

I have already done this once for the MoGene and RaGene arrays where
Affymetrix has deleted one AFFX control in the PGF-files but not in
the corresponding annotation files, see:
https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-August/029049.html
However, this time they have added a few hundred probesets to the
annotation file which are not present in the PGF-file, see their
README file for the updated HuGene annotation file.

BTW, I really hope that Affymetrix (and you) will soon update the
annotation files for the SNP arrays to hg19, i.e. human genome build
37.

Best regards
Christian


On Feb 8, 11:09 am, Henrik Bengtsson 
wrote:
> Hi.
>
> On Mon, Feb 1, 2010 at 9:02 AM, cstratowa
>
>
>
>  wrote:
> > Dear Branko,
>
> > I have just seen your comment that you have an issue with XPS, which
> > is my BioC package. However, you have never contacted me at
> >  nor have sent a mail to the BioC mailing list. I
> > would really appreciate that you contact me first before making such a
> > statement.
>
> > Regarding your problem with HuGene-1_0-st-v1.na30.hg19.probeset._csv,
> > (which could be the reason of your problem with xps), Affymetrix had
> > included probesets in this annotation file which were not included in
> > the corresponding *.PGF file. For this reason I have contacted
> > Affymetrix DevNet and asked them to remove the extra probesets, which
> > they kindly did resulting in file HuGene-1_0-st-
> > v1.na30.1.hg19.probeset._csv.
>
> Thanks for this note/update.  Did you send this offline to Affymetrix
> or did you use their Affymetrix Scientific Community Forums
> [https://www.affymetrix.com/community/forums/index.jspa]?  I could
> find it in the latter, but there are quite a few forums too.
>
> Cheers,
>
> Henrik
>
>
>
> > Best regards
> > Christian
>
> > On Jan 28, 4:58 pm, branko  wrote:
> >> Hi all,
>
> >>   First to thank you Mark , Henrik for nice  tool and also  all others
> >> for  informative mailing list.
> >> This is my first email so hopefully not so many dummy questions.
>
> >> I found this thread as it is linked to my first 2  questions:
> >> Month  ago I did some basic analysis  with 300 chips where your tool
> >> got very handy. And I'm starting again to analyze things ...
> >>  (Btw. I found  about Aroma  on BioC list as I had issue with XPS
> >> package that I couldn’t resolve . I would like to have them both
> >> working so  Xps will come on board eventually.)
>
> >>  Now regarding this tread.
> >> 1)     When I compared HuGene-1_0-st-v1.na30.hg19.probeset._csv.  with
> >> the Aroma CDF file there are
> >>      33257  PSets  in csv while  Aroma has 33252 so 4 different. So
> >> maybe the CDF  update is needed in Aroma?
> >>      I also just saw some more recent update on Affymetrix to “na30.1”
> >> and I see I have used “na30” csv file.
>
> >>   Following  seems weird , many  ProbeSets have no gene symbol
> >> annotation in “main“ category :
>
> >>   tr <- read.csv ( "HuGene-1_0-st-v1.na30.hg19.transcript.csv" ,
> >> stringsAsFactors =F ,comment.char="#")
>
> >>   sum(tr$gene_assignment=="---" & tr$category=="main") #6711
> >>   sum(tr$unigene=="---" & tr$category=="main")  #7376
> >>   sum(tr$swissprot=="---" & tr$category=="main") # 8021
>
> >>    Affy 33257  Probesets are classified as:
> >>       TYPE                              TOTAL
> >> control->affx                           57
> >> control->bgp->antigenomic    45
> >> main                                    28829
> >> normgene->exon                 1195
> >> normgene->intron                  2904
> >> rescue->FLmRNA->unmapped    227
>
> >>    Do you  maybe know why are there so many e.g above 6711 without
> >> gene_assignment ?
>
> >> 2) One more related to 1st Q  : How to run normalization only on
> >> ProbeSet  classified as “main” ?  ( Affymetrix csv file says this
> >> class has 28829
> >>       Psets in main class)   Idea is of course  to minimize
> >> normalization bias of non-main class Psets.
>
> >> 3) Few  questions regarding Quality checks and basic data
> >> manipulations in Aroma:
> >>     I would like to give meaningful labels to  the Arrays in the e.g.
> >> box-plots (instead of CEL file names, as said i have 300) .
> 

[aroma.affymetrix] Re: Question for custom CDF of ST-Array

2010-02-08 Thread cstratowa
Dear Henrik

I have sent this offline to Affymetrix DevNet, so you will not find
any note.

I have already done this once for the MoGene and RaGene arrays where
Affymetrix has deleted one AFFX control in the PGF-files but not in
the corresponding annotation files, see:
https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-August/029049.html
However, this time they have added a few hundred probesets to the
annotation file which are not present in the PGF-file, see their
README file for the updated HuGene annotation file.

BTW, I really hope that Affymetrix (and you) will soon update the
annotation files for the SNP arrays to hg19, i.e. human genome build
37.

Best regards
Christian


On Feb 8, 11:09 am, Henrik Bengtsson 
wrote:
> Hi.
>
> On Mon, Feb 1, 2010 at 9:02 AM, cstratowa
>
>
>
>  wrote:
> > Dear Branko,
>
> > I have just seen your comment that you have an issue with XPS, which
> > is my BioC package. However, you have never contacted me at
> >  nor have sent a mail to the BioC mailing list. I
> > would really appreciate that you contact me first before making such a
> > statement.
>
> > Regarding your problem with HuGene-1_0-st-v1.na30.hg19.probeset._csv,
> > (which could be the reason of your problem with xps), Affymetrix had
> > included probesets in this annotation file which were not included in
> > the corresponding *.PGF file. For this reason I have contacted
> > Affymetrix DevNet and asked them to remove the extra probesets, which
> > they kindly did resulting in file HuGene-1_0-st-
> > v1.na30.1.hg19.probeset._csv.
>
> Thanks for this note/update.  Did you send this offline to Affymetrix
> or did you use their Affymetrix Scientific Community Forums
> [https://www.affymetrix.com/community/forums/index.jspa]?  I could
> find it in the latter, but there are quite a few forums too.
>
> Cheers,
>
> Henrik
>
>
>
> > Best regards
> > Christian
>
> > On Jan 28, 4:58 pm, branko  wrote:
> >> Hi all,
>
> >>   First to thank you Mark , Henrik for nice  tool and also  all others
> >> for  informative mailing list.
> >> This is my first email so hopefully not so many dummy questions.
>
> >> I found this thread as it is linked to my first 2  questions:
> >> Month  ago I did some basic analysis  with 300 chips where your tool
> >> got very handy. And I'm starting again to analyze things ...
> >>  (Btw. I found  about Aroma  on BioC list as I had issue with XPS
> >> package that I couldn’t resolve . I would like to have them both
> >> working so  Xps will come on board eventually.)
>
> >>  Now regarding this tread.
> >> 1)     When I compared HuGene-1_0-st-v1.na30.hg19.probeset._csv.  with
> >> the Aroma CDF file there are
> >>      33257  PSets  in csv while  Aroma has 33252 so 4 different. So
> >> maybe the CDF  update is needed in Aroma?
> >>      I also just saw some more recent update on Affymetrix to “na30.1”
> >> and I see I have used “na30” csv file.
>
> >>   Following  seems weird , many  ProbeSets have no gene symbol
> >> annotation in “main“ category :
>
> >>   tr <- read.csv ( "HuGene-1_0-st-v1.na30.hg19.transcript.csv" ,
> >> stringsAsFactors =F ,comment.char="#")
>
> >>   sum(tr$gene_assignment=="---" & tr$category=="main") #6711
> >>   sum(tr$unigene=="---" & tr$category=="main")  #7376
> >>   sum(tr$swissprot=="---" & tr$category=="main") # 8021
>
> >>    Affy 33257  Probesets are classified as:
> >>       TYPE                              TOTAL
> >> control->affx                           57
> >> control->bgp->antigenomic    45
> >> main                                    28829
> >> normgene->exon                 1195
> >> normgene->intron                  2904
> >> rescue->FLmRNA->unmapped    227
>
> >>    Do you  maybe know why are there so many e.g above 6711 without
> >> gene_assignment ?
>
> >> 2) One more related to 1st Q  : How to run normalization only on
> >> ProbeSet  classified as “main” ?  ( Affymetrix csv file says this
> >> class has 28829
> >>       Psets in main class)   Idea is of course  to minimize
> >> normalization bias of non-main class Psets.
>
> >> 3) Few  questions regarding Quality checks and basic data
> >> manipulations in Aroma:
> >>     I would like to give meaningful labels to  the Arrays in the e.g.
> >> box-plots (instead of CEL file names, as said i have 300) .
> 

[aroma.affymetrix] Re: Question for custom CDF of ST-Array

2010-02-01 Thread cstratowa
Dear Branko,

I have just seen your comment that you have an issue with XPS, which
is my BioC package. However, you have never contacted me at
 nor have sent a mail to the BioC mailing list. I
would really appreciate that you contact me first before making such a
statement.

Regarding your problem with HuGene-1_0-st-v1.na30.hg19.probeset._csv,
(which could be the reason of your problem with xps), Affymetrix had
included probesets in this annotation file which were not included in
the corresponding *.PGF file. For this reason I have contacted
Affymetrix DevNet and asked them to remove the extra probesets, which
they kindly did resulting in file HuGene-1_0-st-
v1.na30.1.hg19.probeset._csv.

Best regards
Christian

On Jan 28, 4:58 pm, branko  wrote:
> Hi all,
>
>   First to thank you Mark , Henrik for nice  tool and also  all others
> for  informative mailing list.
> This is my first email so hopefully not so many dummy questions.
>
> I found this thread as it is linked to my first 2  questions:
> Month  ago I did some basic analysis  with 300 chips where your tool
> got very handy. And I'm starting again to analyze things ...
>  (Btw. I found  about Aroma  on BioC list as I had issue with XPS
> package that I couldn’t resolve . I would like to have them both
> working so  Xps will come on board eventually.)
>
>  Now regarding this tread.
> 1)     When I compared HuGene-1_0-st-v1.na30.hg19.probeset._csv.  with
> the Aroma CDF file there are
>      33257  PSets  in csv while  Aroma has 33252 so 4 different. So
> maybe the CDF  update is needed in Aroma?
>      I also just saw some more recent update on Affymetrix to “na30.1”
> and I see I have used “na30” csv file.
>
>   Following  seems weird , many  ProbeSets have no gene symbol
> annotation in “main“ category :
>
>   tr <- read.csv ( "HuGene-1_0-st-v1.na30.hg19.transcript.csv" ,
> stringsAsFactors =F ,comment.char="#")
>
>   sum(tr$gene_assignment=="---" & tr$category=="main") #6711
>   sum(tr$unigene=="---" & tr$category=="main")  #7376
>   sum(tr$swissprot=="---" & tr$category=="main") # 8021
>
>    Affy 33257  Probesets are classified as:
>       TYPE                              TOTAL
> control->affx                           57
> control->bgp->antigenomic    45
> main                                    28829
> normgene->exon                 1195
> normgene->intron                  2904
> rescue->FLmRNA->unmapped    227
>
>    Do you  maybe know why are there so many e.g above 6711 without
> gene_assignment ?
>
> 2) One more related to 1st Q  : How to run normalization only on
> ProbeSet  classified as “main” ?  ( Affymetrix csv file says this
> class has 28829
>       Psets in main class)   Idea is of course  to minimize
> normalization bias of non-main class Psets.
>
> 3) Few  questions regarding Quality checks and basic data
> manipulations in Aroma:
>     I would like to give meaningful labels to  the Arrays in the e.g.
> box-plots (instead of CEL file names, as said i have 300) .
>     How can I do that?
>     Also how can I sort them ?
>     I ask this silly questions because Using R commands like str()
> doesn’t show me the
>     object fields etc. so I can’t use standard R matrix commands,
> also  help (“some Aroma command” ) doesn’t show enough information.
>     Sometimes it gives empty help page.
>     I could not find pdf manual in Aroma installed libraries nor in
> the Google group. I see only html file showing me all the functions &
> classes.
>      Is there easier way to look for functions than main html pages ?
>     Code of functions are not visible by just typing  func.name() , I
> guess I can always get source code and search but there is likely easy
> way to do it.
>
>      It is visible that  Aroma uses different classes than
> BioConductor.  I assume there is a good reason for that, but maybe you
> can give some link with explanation?
>
>     Oh and thank you for putting clean instructions for basic
> preprocessing steps to get Expression matrix out for further
> analysis.
>
> One example of my attempt to searching things :
>     cs <- AffymetrixCelSet$byName(dataSet, cdf=cdf)
>     print(cs)
>     str(cs)
>     #Classes 'AffymetrixCelSet', 'AffymetrixFileSet',
> 'AromaPlatformInterface', 'AromaMicroarrayDataSet',
> 'GenericDataFileSet', 'FullNameInterface', 'Object'  atomic [1:1] NA
> ….
>    ?classname put me on some track.
>
>    So I looked a bit and I saw GenericDataFileSet class provides
> function
>    getFullNames()  and it does  give me the my CEL filenames (btw.
> help on ?getFullNames gives me no help .)
>    setNames() point to basic library…  but  before going deeper and
> making some mess I hope you
>    can give me answer on above questions or some links ?
>
> 4) Last  one , regarding QC  issue with plotting … SO when doing Array
> (pseudo) image plots  my RGui in windows complains e.g.:
>
>   If I do:   cf <- getFile (cs, 1)
>                plotImage(cf, transform=list(log2), palette=rainbow
> (256))
>
>          

[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0

2009-07-09 Thread cstratowa

Dear Henrik,

Thank you for your reply. I must admit that I had removed all verbose
statements from the code fragment I have sent you. From my verbose
statements I knew already that the CEL set is empty but I did not
understand why.

After following your advice and stepping through each statement
manually it turned out to be a problem with "chiptype": For certain
parts of my code the chipname must be "GenomeWideSNP_6.Full", which is
the correct chip name, but for the aroma.affymetrix parts chiptype
must be "GenomeWideSNP_6". Thus in the paired case I forgot to remove
".Full".

Best regards
Christian


On Jul 8, 2:22 am, Henrik Bengtsson  wrote:
> Hi,
>
> the only thing I can conclude is that nbrOfFiles(this) == 0 in:
>
> > at Arguments$getIndex(idx, range = c(1, nbrOfFiles(this)))
>
> and from
>
> >  at getUnitNamesFile.AffymetrixCelSet(this)
>
> I learn that 'this' refers to a CEL set (which is empty).
>
> However, it is impossible for me to know where/what causes this.  I
> suggest that you add some verbose statements to computeCN() to narrow
> down at what lines of code the error occurs, or alternatively "unwrap"
> it from computeCN() and step through each of its commands manually.
>
> This is exactly why I add so many verbose statements and sanity checks
> in my code so that any errors/bugs are caught as soon as possible.
> See my code how to add verbose statements.
>
> Cheers,
>
> /Henrik
>
> On Thu, Jul 2, 2009 at 6:24 AM,
>
> cstratowa wrote:
>
> > Dear Henrik,
>
> > Sadly, paired analysis with SNP6.0 arrays results still in the Error
> > messages shown below.
>
> > Please note the following:
>
> > 1, With the new versions of aroma.* I have just successfully tested
> > CEL-files from Mapping250K_Sty arrays both with and w/o paired
> > analysis.
>
> > 2, As mentioned yesterday I have successfully tested CEL-files from
> > SNP6.0 arrays w/o paired analysis.
>
> > 3, Since the error seems to occur in function "computeCN()" please
> > note that I have already listed the relevant code in my initial
> > message.
>
> > Here is the error message I get:
>
> > Loading required package: DNAcopy
> > Error in list(`computeCN(aroma, model = model, arrays = arrays[i],
> > chromosomes = 1:23, ref` = ,  :
>
> > [2009-07-02 10:21:14] Exception: Range of argument 'idx' is out of
> > range [1,0]: [1,1]
> >  at throw(Exception(...))
> >  at throw.default(sprintf("Range of argument '%s' is out of range [%s,
> > %s]: [%s,
> >  at throw(sprintf("Range of argument '%s' is out of range [%s,%s]:
> > [%s,%s]", .n
> >  at getNumerics.Arguments(static, ..., asMode = "integer", disallow =
> > disallow)
> >  at getNumerics(static, ..., asMode = "integer", disallow = disallow)
> >  at getIntegers.Arguments(static, ..., range = range)
> >  at getIntegers(static, ..., range = range)
> >  at getIndices.Arguments(static, ..., length = length)
> >  at getIndices(static, ..., length = length)
> >  at method(static, ...)
> >  at Arguments$getIndex(idx, range = c(1, nbrOfFiles(this)))
> >  at getFile.GenericDataFileSet(this, 1)
> >  at getFile(this, 1)
> >  at getUnitNamesFile.AffymetrixCelSet(this)
> >  at getUnitNamesFile(this)
> >  at getChipType.AffymetrixCelSet(X[[1]], ...)
> >  at FUN(X[[1]], ...)
> >  at lapply(X, FUN, ...)
> >  at sapply.default(csList, FUN = getChipTyp
> > Calls: computeCN ... getNumerics.Arguments -> throw -> throw.default -
> >> throw -> throw.Exception
> > Execution halted
>
> > Do you have any idea what might be the reason?
>
> > Since I do not get this error with Sty arrays it must be special for
> > SNP6.0 arrays when doing paired analysis. Could it be that there may
> > be a problem with aroma functions "getNames()" or "extract()" for
> > SNP6.0?
>
> > Here once again the sessionInfo:
> >> sessionInfo()
> > R version 2.8.1 (2008-12-22)
> > x86_64-unknown-linux-gnu
>
> > locale:
> > C
>
> > attached base packages:
> > [1] tcltk     stats     graphics  grDevices utils     datasets
> > methods
> > [8] base
>
> > other attached packages:
> >  [1] GLAD_1.18.0            Cairo_1.4-5
> > DNAcopy_1.19.2
> >  [4] biasnp_0.2.43          RODBC_1.2-5
> > aroma.affymetrix_1.1.0
> >  [7] aroma.apd_0.1.3        R.huge_0.1.7
> > affxparser_1.14.2
> > [10] aroma.core_1.1.2       aroma.light_1.13.2
> > matrixStats_0.1.6
> &g

[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0

2009-07-02 Thread cstratowa

Dear Henrik,

Sadly, paired analysis with SNP6.0 arrays results still in the Error
messages shown below.

Please note the following:

1, With the new versions of aroma.* I have just successfully tested
CEL-files from Mapping250K_Sty arrays both with and w/o paired
analysis.

2, As mentioned yesterday I have successfully tested CEL-files from
SNP6.0 arrays w/o paired analysis.

3, Since the error seems to occur in function "computeCN()" please
note that I have already listed the relevant code in my initial
message.

Here is the error message I get:

Loading required package: DNAcopy
Error in list(`computeCN(aroma, model = model, arrays = arrays[i],
chromosomes = 1:23, ref` = ,  :

[2009-07-02 10:21:14] Exception: Range of argument 'idx' is out of
range [1,0]: [1,1]
  at throw(Exception(...))
  at throw.default(sprintf("Range of argument '%s' is out of range [%s,
%s]: [%s,
  at throw(sprintf("Range of argument '%s' is out of range [%s,%s]:
[%s,%s]", .n
  at getNumerics.Arguments(static, ..., asMode = "integer", disallow =
disallow)
  at getNumerics(static, ..., asMode = "integer", disallow = disallow)
  at getIntegers.Arguments(static, ..., range = range)
  at getIntegers(static, ..., range = range)
  at getIndices.Arguments(static, ..., length = length)
  at getIndices(static, ..., length = length)
  at method(static, ...)
  at Arguments$getIndex(idx, range = c(1, nbrOfFiles(this)))
  at getFile.GenericDataFileSet(this, 1)
  at getFile(this, 1)
  at getUnitNamesFile.AffymetrixCelSet(this)
  at getUnitNamesFile(this)
  at getChipType.AffymetrixCelSet(X[[1]], ...)
  at FUN(X[[1]], ...)
  at lapply(X, FUN, ...)
  at sapply.default(csList, FUN = getChipTyp
Calls: computeCN ... getNumerics.Arguments -> throw -> throw.default -
> throw -> throw.Exception
Execution halted

Do you have any idea what might be the reason?

Since I do not get this error with Sty arrays it must be special for
SNP6.0 arrays when doing paired analysis. Could it be that there may
be a problem with aroma functions "getNames()" or "extract()" for
SNP6.0?

Here once again the sessionInfo:
> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets
methods
[8] base

other attached packages:
 [1] GLAD_1.18.0Cairo_1.4-5
DNAcopy_1.19.2
 [4] biasnp_0.2.43  RODBC_1.2-5
aroma.affymetrix_1.1.0
 [7] aroma.apd_0.1.3R.huge_0.1.7
affxparser_1.14.2
[10] aroma.core_1.1.2   aroma.light_1.13.2
matrixStats_0.1.6
[13] R.rsp_0.4.0R.httpd_0.1.1
R.filesets_0.5.2
[16] digest_0.3.1   R.cache_0.1.7
R.utils_1.1.6
[19] R.oo_1.4.6 R.methodsS3_1.0.3

Best regards
Christian


On Jul 1, 3:31 pm, cstratowa  wrote:
> Dear Henrik,
>
> Thank you for your suggestions.
>
> The answer to your first reply is that the security settings prevent
> us from using biocLite() etc so we need to download all files and
> usually run install scripts.
>
> Following your suggestion I have now installed DNAcopy v 1.19.2 which
> caused some problems on the cluster because of a missing
> libgfortran.so. (As I have mentioned in an earlier post, I am running
> CBS and GLAD on cluster nodes in order to speed up processing.)
>
> I have just started another session and at the moment it seems to
> work, the current output is:
>
> Loading required package: DNAcopy
> Loading required package: Cairo
> Loading required package: GLAD
> [1] "Have fun with GLAD"
>
> However, I still need to test the paired analysis with the SNP6.0
> arrays, this will take some time.
>
> Here is how the sessionInfo is supposed to look like:> sessionInfo()
>
> R version 2.8.1 (2008-12-22)
> x86_64-unknown-linux-gnu
>
> locale:
> C
>
> attached base packages:
> [1] tcltk     stats     graphics  grDevices utils     datasets
> methods
> [8] base
>
> other attached packages:
>  [1] GLAD_1.18.0            Cairo_1.4-5
> DNAcopy_1.19.2
>  [4] biasnp_0.2.43          RODBC_1.2-5
> aroma.affymetrix_1.1.0
>  [7] aroma.apd_0.1.3        R.huge_0.1.7
> affxparser_1.14.2
> [10] aroma.core_1.1.2       aroma.light_1.13.2
> matrixStats_0.1.6
> [13] R.rsp_0.4.0            R.httpd_0.1.1
> R.filesets_0.5.2
> [16] digest_0.3.1           R.cache_0.1.7
> R.utils_1.1.6
> [19] R.oo_1.4.6             R.methodsS3_1.0.3
>
> Best regards
> Christian
>
> On Jun 29, 4:34 pm, Henrik Bengtsson  wrote:
>
> > Hi.
>
> > On Mon, Jun 29, 2009 at 5:14 AM,
>
> > cstratowa wrote:
>
> > > Dear Henrik,
>
> > > Meanwhile I have downloaded the latest versions from your repository
> > > and installed them on R-2.8.1. Please note that due to our settings it

[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0

2009-07-01 Thread cstratowa

Dear Henrik,

Thank you for your suggestions.

The answer to your first reply is that the security settings prevent
us from using biocLite() etc so we need to download all files and
usually run install scripts.

Following your suggestion I have now installed DNAcopy v 1.19.2 which
caused some problems on the cluster because of a missing
libgfortran.so. (As I have mentioned in an earlier post, I am running
CBS and GLAD on cluster nodes in order to speed up processing.)

I have just started another session and at the moment it seems to
work, the current output is:

Loading required package: DNAcopy
Loading required package: Cairo
Loading required package: GLAD
[1] "Have fun with GLAD"

However, I still need to test the paired analysis with the SNP6.0
arrays, this will take some time.

Here is how the sessionInfo is supposed to look like:
> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets
methods
[8] base

other attached packages:
 [1] GLAD_1.18.0Cairo_1.4-5
DNAcopy_1.19.2
 [4] biasnp_0.2.43  RODBC_1.2-5
aroma.affymetrix_1.1.0
 [7] aroma.apd_0.1.3R.huge_0.1.7
affxparser_1.14.2
[10] aroma.core_1.1.2   aroma.light_1.13.2
matrixStats_0.1.6
[13] R.rsp_0.4.0R.httpd_0.1.1
R.filesets_0.5.2
[16] digest_0.3.1   R.cache_0.1.7
R.utils_1.1.6
[19] R.oo_1.4.6 R.methodsS3_1.0.3

Best regards
Christian


On Jun 29, 4:34 pm, Henrik Bengtsson  wrote:
> Hi.
>
> On Mon, Jun 29, 2009 at 5:14 AM,
>
>
>
> cstratowa wrote:
>
> > Dear Henrik,
>
> > Meanwhile I have downloaded the latest versions from your repository
> > and installed them on R-2.8.1. Please note that due to our settings it
> > is NOT POSSIBLE for me to use hbInstall("aroma.affymetrix")!!!
>
> > Sadly, now CBS does no longer work at all. I get the message:
>
> > Loading required package: DNAcopy
> > Loading required package: Cairo
> > Error: package 'DNAcopy' does not have a name space
> > In addition: Warning messages:
> > 1: In log(theta/thetaRef, base = 2) : NaNs produced
> > 2: In log(theta * thetaRef, base = 2) : NaNs produced
> > Execution halted
>
> Output from traceback() would be helpful here.  I can make a guess
> where/why it occurs, but I'm not sure what you are doing.
>
>
>
>
>
> > Thus it seems that the new versions do not work with R-2.8.1, so I
> > need to downgrade to the old versions.
>
> > Here is the sessionInfo() for the new versions:
> >> sessionInfo()
> > R version 2.8.1 (2008-12-22)
> > x86_64-unknown-linux-gnu
>
> > locale:
> > C
>
> > attached base packages:
> > [1] tcltk     stats     graphics  grDevices utils     datasets
> > methods
> > [8] base
>
> > other attached packages:
> >  [1] biasnp_0.2.43          RODBC_1.2-5
> > aroma.affymetrix_1.1.0
> >  [4] aroma.apd_0.1.3        R.huge_0.1.7
> > affxparser_1.14.2
> >  [7] aroma.core_1.1.2       aroma.light_1.13.2
> > matrixStats_0.1.4
> > [10] R.rsp_0.4.0            R.httpd_0.1.1
> > R.filesets_0.5.2
> > [13] digest_0.3.1           R.cache_0.1.7
> > R.utils_1.1.6
> > [16] R.oo_1.4.6             R.methodsS3_1.0.3
>
> Version of DNAcopy?
>
> So, I am quite sure the aroma.* code is working on much older
> versions, but I cannot speak for external packages.
>
> However, in this case aroma.* do assume that DNAcopy has a namespace
> and that was most likely introduced in v1.18.0.  You might have to
> upgrade DNAcopy to v1.18.0 or newer.  I really recommend v 1.19.2
> because it is now much much faster (really!).  For people who can use
> our install scripts, this can be done by fooling bioc to think you
> have a newer R version as follows:
>
> source("http://www.braju.com/R/hbLite.R";);
> biocLite("DNAcopy", rver="2.10.0");
>
> If you cannot use this, you probably have to download the file
> *.tar.gz (*.zip) file and install it manually.
>
> Let us know if this solves the problem.
>
> /Henrik
>
>
>
> > Best regards
> > Christian
>
> > P.S.: I know that this is OSS and everybody is very busy, but
> > nevertheless I would like to mention, that I am not the only person
> > who is not happy with the policy of many open source software systems
> > to supply a new version every 6 months and no longer support older
> > versions. This is no problem when running the software on one's
> > personal PC but often causes problems in company settings where you
> > need to upgrade a lot of systems and the philosophy is not to change a
> > runn

[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0

2009-06-29 Thread cstratowa

Dear Henrik,

Meanwhile I have downloaded the latest versions from your repository
and installed them on R-2.8.1. Please note that due to our settings it
is NOT POSSIBLE for me to use hbInstall("aroma.affymetrix")!!!

Sadly, now CBS does no longer work at all. I get the message:

Loading required package: DNAcopy
Loading required package: Cairo
Error: package 'DNAcopy' does not have a name space
In addition: Warning messages:
1: In log(theta/thetaRef, base = 2) : NaNs produced
2: In log(theta * thetaRef, base = 2) : NaNs produced
Execution halted

Thus it seems that the new versions do not work with R-2.8.1, so I
need to downgrade to the old versions.

Here is the sessionInfo() for the new versions:
> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets
methods
[8] base

other attached packages:
 [1] biasnp_0.2.43  RODBC_1.2-5
aroma.affymetrix_1.1.0
 [4] aroma.apd_0.1.3R.huge_0.1.7
affxparser_1.14.2
 [7] aroma.core_1.1.2   aroma.light_1.13.2
matrixStats_0.1.4
[10] R.rsp_0.4.0R.httpd_0.1.1
R.filesets_0.5.2
[13] digest_0.3.1   R.cache_0.1.7
R.utils_1.1.6
[16] R.oo_1.4.6 R.methodsS3_1.0.3

Best regards
Christian

P.S.: I know that this is OSS and everybody is very busy, but
nevertheless I would like to mention, that I am not the only person
who is not happy with the policy of many open source software systems
to supply a new version every 6 months and no longer support older
versions. This is no problem when running the software on one's
personal PC but often causes problems in company settings where you
need to upgrade a lot of systems and the philosophy is not to change a
running system. In our case this means to update the servers and the
Linux clusters, too, since I have created a package which allows me to
run CBS and GLAD on the cluster nodes.


On Jun 26, 7:35 am, Henrik Bengtsson  wrote:
> 2009/6/22 cstratowa :
>
>
>
> > Does this update also work with R-2.8.1?
>
> I avoid increasing the dependency in the DESCRIPTION files unless I
> know they have to.  Currently I think it is roughly R 2.6.x that is
> specified.   It does not mean it is guaranteed to work.  However, I
> only support and run redundancy tests on the lastest stable version of
> R (now R 2.9.x).
>
> Did you try?
>
> /Henrik
>
>
>
> > Best regards
> > Christian
>
> > On Jun 18, 5:39 pm, Henrik Bengtsson  wrote:
> >> Hi,
>
> >> update aroma.affymetrix et al. and retry.
>
> >> /Henrik
>
> >> On Thu, Jun 18, 2009 at 5:01 AM,
>
> >> cstratowa wrote:
>
> >> > Dear all,
>
> >> > Below I show you the relevant code fragments of my function "computeCN
> >> > ()" to compute copy-number regions using model CBS. This function can
> >> > compute copy-number regions for the usual case (reference=NULL), for
> >> > pairs of samples (reference="Pairs") or using reference samples
> >> > (reference="Reference").
>
> >> > I am using this function for a long time for 100K and 500K mapping
> >> > arrays w/o any problems. However for the SNP6.0 array this function
> >> > works only for the usual case (reference=NULL), but not for the Pairs/
> >> > Reference cases.
>
> >> > For SNP6.0 array and Pairs/Reference cases I get always the following
> >> > error message:
>
> >> > Error in this$files[[1]] : subscript out of bounds
> >> > Calls: computeCN ... clearCache.AffymetrixCelSet -> getCdf ->
> >> > getCdf.AffymetrixCelSet -> getCdf
> >> > Execution halted
>
> >> > Do you have any idea why I get these errors?
> >> > Do I have to change the code for Pairs/Reference cases, and how?
>
> >> > Here is the relevant code of function "computeCN()":
>
> >> > "computeCN" <- function(object, pheno, arrays=NULL, reference=NULL,
> >> > chromosomes=NULL) {
> >> >   ## get normalized data
> >> >   cesList <- effectSet(object);
>
> >> >   ## get optional reference set
> >> >   if (is.null(reference)) {
> >> >      refList <- NULL;
> >> >   } else if (is.character(reference)) {
> >> >      ## extract reference samples (and data samples if paired)
> >> >      refList <- list();
> >> >      for (chiptype in names(cesList)) {
> >> >         ces      <- cesList[[chiptype]];
> >> >         cesnames <- getNames(ces);
>
> >> >         ## get row numbers containing r

[aroma.affymetrix] Re: Problem with paired copynumber analysis for SNP6.0

2009-06-21 Thread cstratowa

Does this update also work with R-2.8.1?

Best regards
Christian

On Jun 18, 5:39 pm, Henrik Bengtsson  wrote:
> Hi,
>
> update aroma.affymetrix et al. and retry.
>
> /Henrik
>
> On Thu, Jun 18, 2009 at 5:01 AM,
>
> cstratowa wrote:
>
> > Dear all,
>
> > Below I show you the relevant code fragments of my function "computeCN
> > ()" to compute copy-number regions using model CBS. This function can
> > compute copy-number regions for the usual case (reference=NULL), for
> > pairs of samples (reference="Pairs") or using reference samples
> > (reference="Reference").
>
> > I am using this function for a long time for 100K and 500K mapping
> > arrays w/o any problems. However for the SNP6.0 array this function
> > works only for the usual case (reference=NULL), but not for the Pairs/
> > Reference cases.
>
> > For SNP6.0 array and Pairs/Reference cases I get always the following
> > error message:
>
> > Error in this$files[[1]] : subscript out of bounds
> > Calls: computeCN ... clearCache.AffymetrixCelSet -> getCdf ->
> > getCdf.AffymetrixCelSet -> getCdf
> > Execution halted
>
> > Do you have any idea why I get these errors?
> > Do I have to change the code for Pairs/Reference cases, and how?
>
> > Here is the relevant code of function "computeCN()":
>
> > "computeCN" <- function(object, pheno, arrays=NULL, reference=NULL,
> > chromosomes=NULL) {
> >   ## get normalized data
> >   cesList <- effectSet(object);
>
> >   ## get optional reference set
> >   if (is.null(reference)) {
> >      refList <- NULL;
> >   } else if (is.character(reference)) {
> >      ## extract reference samples (and data samples if paired)
> >      refList <- list();
> >      for (chiptype in names(cesList)) {
> >         ces      <- cesList[[chiptype]];
> >         cesnames <- getNames(ces);
>
> >         ## get row numbers containing reference and data samples
> >         refcol <- which(pheno[,reference] == 1);
> >         datcol <- which(pheno[,reference] == 0);
>
> >         if (reference == "Pairs") {
> >            cesList[[chiptype]] <- extract(ces, datcol);
> >            refList[[chiptype]] <- extract(ces, refcol);
> >         } else if (reference == "Reference") {
> >            cesRef <- extract(ces, refcol);
> >            ceRef  <- getAverageFile(cesRef);
>
> >            ## convert single ceRef file into a set of identical files
> > of same size/length as ces
> >            numarr <- nbrOfArrays(ces);
> >            cesRef <- rep(list(ceRef), numarr);  # becomes a list of
> > CnChipEffectFile:s
> >            cesRef <- newInstance(ces, cesRef, mergeStrands=ces
> > $mergeStrands, combineAlleles=ces$combineAlleles);
> >            refList[[chiptype]] <- cesRef;
> >         }#if
> >      }#for
> >   }#if
>
> >   ## select model
> >   model <- CbsModel(cesList, refList);
>
> >   ## get index of arrays
> >   if (is.null(arrays)) {
> >      arrays <- indexOfArrays(model);
> >   } else {
> >      arrays <- which(getNames(model)==arrays);
> >   }#if
>
> >   ## fit model with Chromosome Explorer
> >   ce  <- ChromosomeExplorer(model, tags=tags);
> >   setParallelSafe(ce, status=TRUE); #necessary for running on cluster
> >   tmp <- process(ce, arrays=arrays, chromosomes=chromosomes,
> > force=force, verbose=verbose);
>
> >   ## get fitted CN regions from model
> >   cnrs <- getRegions(model, arrays=arrays, chromosomes=chromosomes,
> > organism="Human", hgVersion="hg18");
>
> >   return(object);
> > }#computeCN
>
> > Here is my sessionInfo:
>
> >> sessionInfo()
> > R version 2.8.1 (2008-12-22)
> > x86_64-unknown-linux-gnu
>
> > locale:
> > C
>
> > attached base packages:
> > [1] tcltk     stats     graphics  grDevices utils     datasets
> > methods
> > [8] base
>
> > other attached packages:
> >  [1] biasnp_0.2.41          RODBC_1.2-5
> > aroma.affymetrix_1.0.1
> >  [4] aroma.apd_0.1.3        R.huge_0.1.6
> > affxparser_1.14.2
> >  [7] aroma.core_1.0.1       aroma.light_1.11.1
> > digest_0.3.1
> > [10] matrixStats_0.1.4      R.rsp_0.4.0
> > R.httpd_0.1.1
> > [13] R.cache_0.1.7          R.utils_1.1.3
> > R.oo_1.4.6
> > [16] R.methodsS3_1.0.3
>
> > Thank you in advance for your help.
>
> > Best regards
> > Christian
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Problem with paired copynumber analysis for SNP6.0

2009-06-18 Thread cstratowa

Dear all,

Below I show you the relevant code fragments of my function "computeCN
()" to compute copy-number regions using model CBS. This function can
compute copy-number regions for the usual case (reference=NULL), for
pairs of samples (reference="Pairs") or using reference samples
(reference="Reference").

I am using this function for a long time for 100K and 500K mapping
arrays w/o any problems. However for the SNP6.0 array this function
works only for the usual case (reference=NULL), but not for the Pairs/
Reference cases.

For SNP6.0 array and Pairs/Reference cases I get always the following
error message:

Error in this$files[[1]] : subscript out of bounds
Calls: computeCN ... clearCache.AffymetrixCelSet -> getCdf ->
getCdf.AffymetrixCelSet -> getCdf
Execution halted

Do you have any idea why I get these errors?
Do I have to change the code for Pairs/Reference cases, and how?

Here is the relevant code of function "computeCN()":

"computeCN" <- function(object, pheno, arrays=NULL, reference=NULL,
chromosomes=NULL) {
   ## get normalized data
   cesList <- effectSet(object);

   ## get optional reference set
   if (is.null(reference)) {
  refList <- NULL;
   } else if (is.character(reference)) {
  ## extract reference samples (and data samples if paired)
  refList <- list();
  for (chiptype in names(cesList)) {
 ces  <- cesList[[chiptype]];
 cesnames <- getNames(ces);

 ## get row numbers containing reference and data samples
 refcol <- which(pheno[,reference] == 1);
 datcol <- which(pheno[,reference] == 0);

 if (reference == "Pairs") {
cesList[[chiptype]] <- extract(ces, datcol);
refList[[chiptype]] <- extract(ces, refcol);
 } else if (reference == "Reference") {
cesRef <- extract(ces, refcol);
ceRef  <- getAverageFile(cesRef);

## convert single ceRef file into a set of identical files
of same size/length as ces
numarr <- nbrOfArrays(ces);
cesRef <- rep(list(ceRef), numarr);  # becomes a list of
CnChipEffectFile:s
cesRef <- newInstance(ces, cesRef, mergeStrands=ces
$mergeStrands, combineAlleles=ces$combineAlleles);
refList[[chiptype]] <- cesRef;
 }#if
  }#for
   }#if

   ## select model
   model <- CbsModel(cesList, refList);

   ## get index of arrays
   if (is.null(arrays)) {
  arrays <- indexOfArrays(model);
   } else {
  arrays <- which(getNames(model)==arrays);
   }#if

   ## fit model with Chromosome Explorer
   ce  <- ChromosomeExplorer(model, tags=tags);
   setParallelSafe(ce, status=TRUE); #necessary for running on cluster
   tmp <- process(ce, arrays=arrays, chromosomes=chromosomes,
force=force, verbose=verbose);

   ## get fitted CN regions from model
   cnrs <- getRegions(model, arrays=arrays, chromosomes=chromosomes,
organism="Human", hgVersion="hg18");

   return(object);
}#computeCN

Here is my sessionInfo:

> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets
methods
[8] base

other attached packages:
 [1] biasnp_0.2.41  RODBC_1.2-5
aroma.affymetrix_1.0.1
 [4] aroma.apd_0.1.3R.huge_0.1.6
affxparser_1.14.2
 [7] aroma.core_1.0.1   aroma.light_1.11.1
digest_0.3.1
[10] matrixStats_0.1.4  R.rsp_0.4.0
R.httpd_0.1.1
[13] R.cache_0.1.7  R.utils_1.1.3
R.oo_1.4.6
[16] R.methodsS3_1.0.3

Thank you in advance for your help.

Best regards
Christian

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: Interpretation of GLAD results for SNP6.0

2009-04-14 Thread cstratowa

Dear Nicolas,

Thank you for your comments, I will delete all segments with count=1.

Since currently aroma.affymetrix supports both GLAD and CBS, I am
routinely using both algorithms for CN analysis.

Ultrasome looks interesting, but first I would need aroma.affymetrix
anyhow for normalization of the raw copynumbers, and second it is free
for academic use only.

Currently I am more worried about the normalization step, I would wish
that aroma.affymetrix could include the "reference alignment
procedure" of S.Pounds et al.

Best regards
Christian


On Apr 8, 12:11 am, stran...@broad.mit.edu wrote:
> Hi Christian,
>
> On Mar 10, 10:53 am, cstratowa 
> ingelheim.com> wrote:
> > Dear all,
>
> > I am currently determining copynumber regions for tumor cell lines
> > using GLAD as model for the SNP6.0 array. (This takes about 13 hrs per
> > array and per cluster node)
>
> Despite GLAD being a good segmentation algorithm, you may want to
> consider alternatives that are less computationally intensive and give
> equivalent results. CBS (package DNAcopy in bioconductor) runs in 1.5h
> for a SNP6.0 array. A fairly new but very promising program is
> Ultrasome (http://www.broad.mit.edu/ultrasome). It segments a SNP6.0
> array in 2 seconds (plus 1 minute to read the array). It's a
> standalone program.
>
> > My problem is the interpretation of the results.
> > Here are some examples:
>
> > 1. example:
> > "chromosome"  "start"       "stop"        "mean"        "count"       "call"
> > 1         3674045         10539859       0.154251128612596       4517   
> > "neutral"
> > 1         10544961        10544961       1.57871123504559        1      
> > "gain"
> > 1         10546623        10546623       -2.21390332082091       1      
> > "loss"
> > 1         10548849        21751915       0.150400582947561       6872   
> > "neutral"
>
> > This example has two regions where start=stop position (10544961,
> > 10546623).
> > How do you interpret this result?
> > How should regions with count=1 be interpreted?
> > Are these two regions considered to be outliers?
>
> SNP6.0 arrays are very noisy. You should be very careful when you
> consider any segment that has a size < 10 probes, and in any case, you
> should not believe any segment that has a size of one probe. In your
> case, I would just delete these segments. They are likely to be marked
> as "outliers" by GLAD.
>
> > 2. example:
> > "chromosome"  "start"       "stop"        "mean"        "count"       "call"
> > 6         18253390        18253886       -1.33576824261601       2      
> > "loss"
> > 6         18254264        29945167       -0.0986937482773974     7921   
> > "neutral"
> > 6         29945167        29945167       1.47894372023   1      "gain"
> > 6         29945294        30015940       0.225347608366043       42     
> > "gain"
>
> > Here you see two regions with different start positions but identical
> > stop positions (=29945167), where one region shows a gain but the
> > other region is neutral.
> > How do you interpret this result?
>
> Same problem, count is too small.
>
>
>
> > 3. example:
> > "chromosome"  "start"       "stop"        "mean"        "count"       "call"
> > 8         15672606        15672655       -2.45103201175721       2      
> > "loss"
> > 8         15672695        15672695       -4.68547262667068       1      
> > "loss"
> > 8         15672695        15678118       -2.65573944523429       10     
> > "loss"
> > 8         15678687        15678880       -2.01991964356680       3      
> > "loss"
>
> > In this case two regions have identical start positions (=15672695).
> > How can this be the case?
> > Could this be a bug, since it is assumed that two regions should not
> > overlap?
>
> > 4. example:
> > "chromosome"  "start"       "stop"        "mean"        "count"       "call"
> > 17        23813398        32995436       -0.272936124639797      5529   
> > "neutral"
> > 17        32999012        32999012       -2.26635873379834       1      
> > "loss"
> > 17        32999012        32999012       -8.28862000630996       1      
> > "loss"
> > 17        33000940        36676630     

[aroma.affymetrix] Re: Problem understanding FIRMA

2009-03-23 Thread cstratowa

Dear Mark

I have sent you the PNG file to your email.

Best regards
Christian

On Mar 23, 9:44 am, "Mark Robinson"  wrote:
> Hi Christian.
>
> Thanks for this.  That's great!
>
> Can you send me a copy of your PNG file to my email?
>
> I will add this 
> to:http://groups.google.com/group/aroma-affymetrix/web/redundancy-tests
>
> Cheers,
> Mark
>
>
>
> > Dear Mark,
>
> > I have now created the following function, which people can use to
> > compute firma scores:
>
> > # - - - - - - - - - - - - - - - - - - -
> > firma <- function(m, method="rlm") {
> > ## function to compute FIRMA scores (adapted from aroma.affymetrix)
> > ## m:      dataframe containing normalized probe intensities (linear
> > scale)
> > ##         with 1.column containing the transcript_cluster_id
> > ##         and 2. column containing the corresponding probeset_ids
> > ## method: fitting model, one of "rlm" or "mdp"
>
> >    ## convert expression levels to log2
> >    y <- as.matrix(log2(m[,3:ncol(m)]));
>
> >    ## dimensions
> >    K <- nrow(y);   # number of probes
> >    I <- ncol(y);   # number of arrays
>
> >    ## fit log-additive model
> >    if (method == "rlm") {
> >       fit <- .Call("R_rlm_rma_default_model", y, 0, 1.345,
> > PACKAGE="preprocessCore");
> >    } else if (method == "mdp") {
> >       mp <- medpolish(y, trace.iter=FALSE);
> >       fit <- list(Estimates = c(mp$overall + mp$col,  mp$row),
> >                   StdErrors = rep(0, length(c(mp$row, mp$col;
> >    } else {
> >       stop("method must be ");
> >    }#if
>
> >    ## extract parameters
> >    est <- fit$Estimates;
> >    se  <- fit$StdErrors;
>
> >    ## chip effects
> >    beta <- est[1:I];
>
> >    ## probe affinities
> >    if (K == 1) {
> >       ## if only one probe must have affinity=1 since sum constraint
> >       alpha <- 0;
> >    } else {
> >       ## affinities sum to zero (on log scale)
> >       alpha <- est[(I+1):length(est)];
> >       alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]);
> >    }#if
>
> >     ## estimates on the intensity scale
> >     theta <- 2^beta;
> >     phi   <- 2^alpha;
>
> >    ## calculate residuals
> >    phi   <- matrix(phi,   nrow=K, ncol=I, byrow=FALSE);
> >    theta <- matrix(theta, nrow=K, ncol=I, byrow=TRUE);
> >    yhat  <- phi *theta;
> >    eps   <- (2^y)/yhat;  # rma uses y/yhat
>
> >    ## estimate of standard error
> >    u.mad <- mad(unlist(log2(eps)), center=0);
>
> >    ## get probeset_ids
> >    id <- unique(m[,2]);
>
> >    ## firma scores
> >    fs <- sapply(id,
> > function(x){apply(log2(eps[which(m[,2]==x),,drop=F])/u.mad, 2,
> > median)});
> >    fs <- t(fs);
> >    rownames(fs) <- id;
>
> >    return(fs);
> > }#firma
> > # - - - - - - - - - - - - - - - - - - -
>
> > Now I can apply this function to the normalized intensities of gene
> > UNR:
>
> >  > library(preprocessCore)
> >  > score.mdp <- firma(unr, "mdp")
> >  > score.rml <- firma(unr, "rlm")
>
> > Then I can plot the firma scores for robust fitting ("rlm") compared
> > to
> > median-polish ("mdp"):
>
> >  > png(file="firmaplots6.png", width=540, height=800)
> >  > oldpar <- par(pty="m", mfcol=c(3, 2), mar=c(5,5,4,2))
> >  > for (i in 1:6) {
> >  >    plot(score.rml[,i], type="l", ylim=c(-3,5),
> > main=colnames(score.rml)[i], xlab="Probeset", ylab="Score")
> >  >    abline(h=0, lty=3)
> >  >    lines(score.mdp[,i], lty=2)
> >  > }
> >  > par(oldpar)
> >  > dev.off()
>
> > The attached plots compare "rlm" (solid lines) and "mdp" (dashed
> > lines)
> > and should reflect the firma scores for heart and muscle shown in
> > Figure
> > 3B of Purdom et al.
> > As you can see the difference between"rlm" and "mdp" is pretty small.
>
> > Best regards
> > Christian
>
> > P.S.: It seems that I cannot attach the png-file.
>
> > On Mar 18, 8:28 am, cstratowa  > ingelheim.com> wrote:
> >> Dear Mark,
>
> >> Probably, there is not much difference, maybe I will check it.
> >> I will let you know but it will take some time.
>

[aroma.affymetrix] Re: Problem understanding FIRMA

2009-03-23 Thread cstratowa

Dear Mark,

I have now created the following function, which people can use to
compute firma scores:

# - - - - - - - - - - - - - - - - - - -
firma <- function(m, method="rlm") {
## function to compute FIRMA scores (adapted from aroma.affymetrix)
## m:  dataframe containing normalized probe intensities (linear
scale)
## with 1.column containing the transcript_cluster_id
## and 2. column containing the corresponding probeset_ids
## method: fitting model, one of "rlm" or "mdp"

   ## convert expression levels to log2
   y <- as.matrix(log2(m[,3:ncol(m)]));

   ## dimensions
   K <- nrow(y);   # number of probes
   I <- ncol(y);   # number of arrays

   ## fit log-additive model
   if (method == "rlm") {
  fit <- .Call("R_rlm_rma_default_model", y, 0, 1.345,
PACKAGE="preprocessCore");
   } else if (method == "mdp") {
  mp <- medpolish(y, trace.iter=FALSE);
  fit <- list(Estimates = c(mp$overall + mp$col,  mp$row),
  StdErrors = rep(0, length(c(mp$row, mp$col;
   } else {
  stop("method must be ");
   }#if

   ## extract parameters
   est <- fit$Estimates;
   se  <- fit$StdErrors;

   ## chip effects
   beta <- est[1:I];

   ## probe affinities
   if (K == 1) {
  ## if only one probe must have affinity=1 since sum constraint
  alpha <- 0;
   } else {
  ## affinities sum to zero (on log scale)
  alpha <- est[(I+1):length(est)];
  alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]);
   }#if

## estimates on the intensity scale
theta <- 2^beta;
phi   <- 2^alpha;

   ## calculate residuals
   phi   <- matrix(phi,   nrow=K, ncol=I, byrow=FALSE);
   theta <- matrix(theta, nrow=K, ncol=I, byrow=TRUE);
   yhat  <- phi *theta;
   eps   <- (2^y)/yhat;  # rma uses y/yhat

   ## estimate of standard error
   u.mad <- mad(unlist(log2(eps)), center=0);

   ## get probeset_ids
   id <- unique(m[,2]);

   ## firma scores
   fs <- sapply(id,
function(x){apply(log2(eps[which(m[,2]==x),,drop=F])/u.mad, 2,
median)});
   fs <- t(fs);
   rownames(fs) <- id;

   return(fs);
}#firma
# - - - - - - - - - - - - - - - - - - -

Now I can apply this function to the normalized intensities of gene
UNR:

 > library(preprocessCore)
 > score.mdp <- firma(unr, "mdp")
 > score.rml <- firma(unr, "rlm")

Then I can plot the firma scores for robust fitting ("rlm") compared
to
median-polish ("mdp"):

 > png(file="firmaplots6.png", width=540, height=800)
 > oldpar <- par(pty="m", mfcol=c(3, 2), mar=c(5,5,4,2))
 > for (i in 1:6) {
 >plot(score.rml[,i], type="l", ylim=c(-3,5),
main=colnames(score.rml)[i], xlab="Probeset", ylab="Score")
 >abline(h=0, lty=3)
 >lines(score.mdp[,i], lty=2)
 > }
 > par(oldpar)
 > dev.off()

The attached plots compare "rlm" (solid lines) and "mdp" (dashed
lines)
and should reflect the firma scores for heart and muscle shown in
Figure
3B of Purdom et al.
As you can see the difference between"rlm" and "mdp" is pretty small.

Best regards
Christian

P.S.: It seems that I cannot attach the png-file.


On Mar 18, 8:28 am, cstratowa  wrote:
> Dear Mark,
>
> Probably, there is not much difference, maybe I will check it.
> I will let you know but it will take some time.
>
> Best regards
> Christian
>
> On Mar 17, 11:46 am, Mark Robinson  wrote:
>
> > Hi Christian.
>
> > > However, in this respect I have also the following question:
> > > How does using "median polish" compare to using
> > > "R_rlm_rma_default_model"?
> > > Are the final scores still of some use if you use medpol?
>
> > Short answer is I haven't investigated this too thoroughly.
>
> > But, my guess is that it wouldn't be "too" different.  That prediction  
> > is based on the fact that the chip effects are in the same ballpark,  
> > as you can see from the Aroma_vs_Affy (Aroma=R_rlm_rma, Affy=medpol)  
> > plot in the following thread:
>
> >http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/...
>
> > But, I'd be interested to hear more details if you do look into it more.
>
> > Cheers,
> > Mark
>
> > > Best regards
> > > Christian
>
> > > On Mar 16, 9:13 am, Mark Robinson  wrote:
> > >> Hi Christian.
>
> > >>  From what I can tell looking at your code (rather quickly, i must
> > >> admit), there will be 2 differences between aroma.affymetrix and what
> > >> you have:
>
> > >> 1. We use the 'preprocessCore' codebase for 

[aroma.affymetrix] Re: Problem understanding FIRMA

2009-03-18 Thread cstratowa

Dear Mark,

Probably, there is not much difference, maybe I will check it.
I will let you know but it will take some time.

Best regards
Christian


On Mar 17, 11:46 am, Mark Robinson  wrote:
> Hi Christian.
>
>
>
> > However, in this respect I have also the following question:
> > How does using "median polish" compare to using
> > "R_rlm_rma_default_model"?
> > Are the final scores still of some use if you use medpol?
>
> Short answer is I haven't investigated this too thoroughly.
>
> But, my guess is that it wouldn't be "too" different.  That prediction  
> is based on the fact that the chip effects are in the same ballpark,  
> as you can see from the Aroma_vs_Affy (Aroma=R_rlm_rma, Affy=medpol)  
> plot in the following thread:
>
> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/...
>
> But, I'd be interested to hear more details if you do look into it more.
>
> Cheers,
> Mark
>
>
>
> > Best regards
> > Christian
>
> > On Mar 16, 9:13 am, Mark Robinson  wrote:
> >> Hi Christian.
>
> >>  From what I can tell looking at your code (rather quickly, i must
> >> admit), there will be 2 differences between aroma.affymetrix and what
> >> you have:
>
> >> 1. We use the 'preprocessCore' codebase for the robust fitting of the
> >> linear model (... but maybe you are just using median polish as an
> >> illustration).  For example, you might try:
>
> >> library(preprocessCore)
> >> f <- .Call("R_rlm_rma_default_model", log2(yTr), 0,
> >> 1.345,PACKAGE="preprocessCore")
> >> [... and piece together the alpha, beta, etc ...]
>
> >> 2. The "estimate of standard error" is calculated genewise, over
> >> residuals from all probes/samples (i.e. u.mad should be a scalar  
> >> not a
> >> vector).
>
> >> Hope that helps.
> >> Mark
>
> >> On 16/03/2009, at 6:32 PM, cstratowa wrote:
>
> >>> Dear all,
>
> >>> After reading the FIRMA paper I would like to understand the
> >>> implementation, but this is not easy since the source code is hard  
> >>> to
> >>> read. Nevertheless, I tried and would like to know if this is  
> >>> correct.
>
> >>> According to the page on exon array analysis you do the following:
>
> >>> I, fit a summary of the entire transcript
> >>>> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> >>>> fit(plmTr, verbose=verbose)
>
> >>> II, fit the FIRMA model for each exon
> >>>> firma <- FirmaModel(plmTr)
> >>>> fit(firma, verbose=verbose)
>
> >>> However, I would like to understand the underlying source code.
>
> >>> For this example let us assume that we have quantile-normalized
> >>> intensities yTr for a transcript  containing two exons:
> >>>> yTr
> >>>     HeartA   HeartB    HeartC  MuscleA  MuscleB  MuscleC
> >>> 1   5.74954   18.0296    2.50436   15.5857   26.1744   31.0075
> >>> 2   9.59819   23.0093   22.01120   70.1742   32.8408  102.0080
> >>> 3 114.50800   87.1742   70.34080  312.3410  266.1740  601.3410
> >>> 4  66.34080   52.0075   67.34080  184.1740  266.1740  147.0080
> >>> 5 210.17400  142.0080  173.34100  514.5080  659.1740  509.6740
> >>> 6 104.00800   84.3408   70.34080  333.5080  324.1740  231.0080
> >>> 7 194.00800  124.5080  234.00800  443.6740  767.5080  716.8410
> >>> 8 319.34100  282.6740  283.50800  656.0080  807.6740  954.6740
>
> >>> Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2.
>
> >>> I, fit a summary of the entire transcript
> >>> To simplify issues I will fit the data using median polish:
> >>> # 1. fit median polish
> >>>> mp <- medpolish(log2(yTr))
>
> >>> # 2. data set specific estimates (probe affinities)
> >>>> beta  <- mp$overall+mp$col
> >>>> thetaTr <- 2^beta
>
> >>> # 3. array-specific estimates
> >>>> alpha <- mp$row
> >>>> alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)])
> >>>> phiTr <- 2^alpha
>
> >>> II, fit FIRMA model for each exon
> >>> # 1. calculate residuals
> >>>> phi   <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr))
> >>>> theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr),
> >>> byrow=TRUE)
> >>>> yhat  &l

[aroma.affymetrix] Re: Problem understanding FIRMA

2009-03-17 Thread cstratowa

Dear Mark,

Thank you for your fast reply. You are right:

ad 2, "estimate of standard error", when I do:
 > u.mad <- mad(unlist(log2(eps)), center=0)
 > u.mad
[1] 0.3797971

I get for the 1.probeset
 > y1 <- log2(eps[1:4,])
 > F1 <- apply(y1/u.mad, 2, median)
 HeartA  HeartB  HeartC MuscleA MuscleB
MuscleC
-0.37628338  0.13465003 -0.63489610  0.06787988 -0.30099279
0.99667714

and for the 2.probeset
 > y2 <- log2(eps[5:8,])
 > F2 <- apply(y2/u.mad, 2, median)
   HeartAHeartBHeartC   MuscleA
MuscleB MuscleC
-9.719282e-02 -7.810612e-01 -3.951703e-01 -4.654384e-01
-2.844947e-16
 -1.108233e+0

ad 1, Yes, for the moment I am using median polish as an illustration.

However, in this respect I have also the following question:
How does using "median polish" compare to using
"R_rlm_rma_default_model"?
Are the final scores still of some use if you use medpol?

Best regards
Christian



On Mar 16, 9:13 am, Mark Robinson  wrote:
> Hi Christian.
>
>  From what I can tell looking at your code (rather quickly, i must  
> admit), there will be 2 differences between aroma.affymetrix and what  
> you have:
>
> 1. We use the 'preprocessCore' codebase for the robust fitting of the  
> linear model (... but maybe you are just using median polish as an  
> illustration).  For example, you might try:
>
> library(preprocessCore)
> f <- .Call("R_rlm_rma_default_model", log2(yTr), 0,  
> 1.345,PACKAGE="preprocessCore")
> [... and piece together the alpha, beta, etc ...]
>
> 2. The "estimate of standard error" is calculated genewise, over  
> residuals from all probes/samples (i.e. u.mad should be a scalar not a  
> vector).
>
> Hope that helps.
> Mark
>
> On 16/03/2009, at 6:32 PM, cstratowa wrote:
>
>
>
>
>
> > Dear all,
>
> > After reading the FIRMA paper I would like to understand the
> > implementation, but this is not easy since the source code is hard to
> > read. Nevertheless, I tried and would like to know if this is correct.
>
> > According to the page on exon array analysis you do the following:
>
> > I, fit a summary of the entire transcript
> >> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> >> fit(plmTr, verbose=verbose)
>
> > II, fit the FIRMA model for each exon
> >> firma <- FirmaModel(plmTr)
> >> fit(firma, verbose=verbose)
>
> > However, I would like to understand the underlying source code.
>
> > For this example let us assume that we have quantile-normalized
> > intensities yTr for a transcript  containing two exons:
> >> yTr
> >     HeartA   HeartB    HeartC  MuscleA  MuscleB  MuscleC
> > 1   5.74954   18.0296    2.50436   15.5857   26.1744   31.0075
> > 2   9.59819   23.0093   22.01120   70.1742   32.8408  102.0080
> > 3 114.50800   87.1742   70.34080  312.3410  266.1740  601.3410
> > 4  66.34080   52.0075   67.34080  184.1740  266.1740  147.0080
> > 5 210.17400  142.0080  173.34100  514.5080  659.1740  509.6740
> > 6 104.00800   84.3408   70.34080  333.5080  324.1740  231.0080
> > 7 194.00800  124.5080  234.00800  443.6740  767.5080  716.8410
> > 8 319.34100  282.6740  283.50800  656.0080  807.6740  954.6740
>
> > Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2.
>
> > I, fit a summary of the entire transcript
> > To simplify issues I will fit the data using median polish:
> > # 1. fit median polish
> >> mp <- medpolish(log2(yTr))
>
> > # 2. data set specific estimates (probe affinities)
> >> beta  <- mp$overall+mp$col
> >> thetaTr <- 2^beta
>
> > # 3. array-specific estimates
> >> alpha <- mp$row
> >> alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)])
> >> phiTr <- 2^alpha
>
> > II, fit FIRMA model for each exon
> > # 1. calculate residuals
> >> phi   <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr))
> >> theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr),
> > byrow=TRUE)
> >> yhat  <- phi *theta
> >> eps   <- yTr/yhat    # rma uses y/yhat
>
> > # 2. estimate of standard error
> >> u.mad <- apply(log2(eps), 2, mad, center=0)
>
> > # 3. compute final score statisitc
> > # for 1. exon
> >> y1 <- log2(eps[1:4,])
> >> F1 <- apply(y1/u.mad, 2, median)
> >> F1
> >     HeartA      HeartB      HeartC     MuscleA     MuscleB
> > MuscleC
> > -0.89938777 -0.03792624 -0.69409936  0.11536565 -0.61385296
> > 1.08709568
>
> > # for 2. exon
> >> y2 <- log2(eps[5:8,])
> >> F2 <

[aroma.affymetrix] Problem understanding FIRMA

2009-03-16 Thread cstratowa

Dear all,

After reading the FIRMA paper I would like to understand the
implementation, but this is not easy since the source code is hard to
read. Nevertheless, I tried and would like to know if this is correct.

According to the page on exon array analysis you do the following:

I, fit a summary of the entire transcript
 > plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
 > fit(plmTr, verbose=verbose)

II, fit the FIRMA model for each exon
 > firma <- FirmaModel(plmTr)
 > fit(firma, verbose=verbose)


However, I would like to understand the underlying source code.

For this example let us assume that we have quantile-normalized
intensities yTr for a transcript  containing two exons:
 > yTr
 HeartA   HeartBHeartC  MuscleA  MuscleB  MuscleC
1   5.74954   18.02962.50436   15.5857   26.1744   31.0075
2   9.59819   23.0093   22.01120   70.1742   32.8408  102.0080
3 114.50800   87.1742   70.34080  312.3410  266.1740  601.3410
4  66.34080   52.0075   67.34080  184.1740  266.1740  147.0080
5 210.17400  142.0080  173.34100  514.5080  659.1740  509.6740
6 104.00800   84.3408   70.34080  333.5080  324.1740  231.0080
7 194.00800  124.5080  234.00800  443.6740  767.5080  716.8410
8 319.34100  282.6740  283.50800  656.0080  807.6740  954.6740

Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2.


I, fit a summary of the entire transcript
To simplify issues I will fit the data using median polish:
# 1. fit median polish
 > mp <- medpolish(log2(yTr))

# 2. data set specific estimates (probe affinities)
 > beta  <- mp$overall+mp$col
 > thetaTr <- 2^beta

# 3. array-specific estimates
 > alpha <- mp$row
 > alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)])
 > phiTr <- 2^alpha


II, fit FIRMA model for each exon
# 1. calculate residuals
 > phi   <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr))
 > theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr),
byrow=TRUE)
 > yhat  <- phi *theta
 > eps   <- yTr/yhat# rma uses y/yhat

# 2. estimate of standard error
 > u.mad <- apply(log2(eps), 2, mad, center=0)

# 3. compute final score statisitc
# for 1. exon
 > y1 <- log2(eps[1:4,])
 > F1 <- apply(y1/u.mad, 2, median)
 > F1
 HeartA  HeartB  HeartC MuscleA MuscleB
MuscleC
-0.89938777 -0.03792624 -0.69409936  0.11536565 -0.61385296
1.08709568

# for 2. exon
 > y2 <- log2(eps[5:8,])
 > F2 <- apply(y2/u.mad, 2, median)
 > F2
 HeartA  HeartB  HeartC MuscleA MuscleB
MuscleC
-0.02899616 -1.64645153 -0.70048533 -0.39996057  0.02666064
-1.46657055


Now my question is:
Is this calculation of the final score statistic F1 for exon 1 and F2
for exon 2 correct?
Did I miss something?

Best regards
Christian


--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Interpretation of GLAD results for SNP6.0

2009-03-10 Thread cstratowa

Dear all,

I am currently determining copynumber regions for tumor cell lines
using GLAD as model for the SNP6.0 array. (This takes about 13 hrs per
array and per cluster node)

My problem is the interpretation of the results.
Here are some examples:

1. example:
"chromosome""start" "stop"  "mean"  "count" "call"
1 3674045 10539859   0.154251128612596   4517   
"neutral"
1 1054496110544961   1.578711235045591  "gain"
1 1054662310546623   -2.21390332082091   1  "loss"
1 1054884921751915   0.150400582947561   6872   
"neutral"

This example has two regions where start=stop position (10544961,
10546623).
How do you interpret this result?
How should regions with count=1 be interpreted?
Are these two regions considered to be outliers?


2. example:
"chromosome""start" "stop"  "mean"  "count" "call"
6 1825339018253886   -1.33576824261601   2  "loss"
6 1825426429945167   -0.0986937482773974 7921   
"neutral"
6 2994516729945167   1.47894372023   1  "gain"
6 2994529430015940   0.225347608366043   42 "gain"

Here you see two regions with different start positions but identical
stop positions (=29945167), where one region shows a gain but the
other region is neutral.
How do you interpret this result?


3. example:
"chromosome""start" "stop"  "mean"  "count" "call"
8 1567260615672655   -2.45103201175721   2  "loss"
8 1567269515672695   -4.68547262667068   1  "loss"
8 1567269515678118   -2.65573944523429   10 "loss"
8 1567868715678880   -2.01991964356680   3  "loss"

In this case two regions have identical start positions (=15672695).
How can this be the case?
Could this be a bug, since it is assumed that two regions should not
overlap?


4. example:
"chromosome""start" "stop"  "mean"  "count" "call"
172381339832995436   -0.272936124639797  5529   
"neutral"
173299901232999012   -2.26635873379834   1  "loss"
173299901232999012   -8.28862000630996   1  "loss"
173300094036676630   -0.295688057544771  1841   
"neutral"

This example seems to be a combination of examples 2 and 3:
Here you have two identical regions with identical positions and
start=stop (=32999012)
How can two regions be identical but show a different mean (-2.266 vs
-8.288)?
Could this be a bug?


In summary my questions are:
How should I interpret the results?
Would it be better to eliminate these regions?

Thank you in advance.
Best regards
Christian

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: Broken link to Mapping50K_Xba240,na26,HB20080916.ufl

2009-02-24 Thread cstratowa

Thank you, Christian

On Feb 23, 4:52 pm, Henrik Bengtsson  wrote:
> Fixed. /Henrik
>
> On Mon, Feb 23, 2009 at 6:24 AM, cstratowa
>
>  wrote:
>
> > Dear Henrik
>
> > On your web-page the link to:
> > Mapping50K_Xba240,na26,HB20080916.ufl
> > is broken
>
> > Best regards
> > Christian
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Broken link to Mapping50K_Xba240,na26,HB20080916.ufl

2009-02-23 Thread cstratowa

Dear Henrik

On your web-page the link to:
Mapping50K_Xba240,na26,HB20080916.ufl
is broken

Best regards
Christian
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: ufl and ugp files for na27?

2009-01-15 Thread cstratowa
p)...done
 Reading (unitName, fragmentLength+) from file...done


2. Running "importFrom(ufl, csv, verbose=-50)" for StyI has the
following partial output:

  Reading AffymetrixNetAffxCsvFile...done
  Extracting fragment lengths from ([enzyme], lengths, start, stop)...
   Inferring if enzyme names are specified...
Has enzyme names: TRUE
   Inferring if enzyme names are specified...done
   Identifying number of enzymes...
nbrOfEnzymes
 1  2  3
==> 93436 144865  3
Max number of enzymes: 3
   Identifying number of enzymes...done
   Splitting into subunits and padding with NAs...
   Splitting into subunits and padding with NAs...done
   Extracting enzyme names...
Identified enzymes: NspI, StyI
   Extracting enzyme names...done
   Identifying the location of the fragment lengths...
Offset: 3
   Identifying the location of the fragment lengths...done
   Extracting fragment lengths...
Summary of *all* fragment lengths:
   Min. 1st Qu.  MedianMean 3rd Qu.Max.NA's
 22 481 695 788 9662000  332341
   Extracting fragment lengths...done
   Sorting data by enzyme...
   V1  V2 V3
 Min.   :   22   Min.   : 115   Min.   :NA
 1st Qu.:  672   1st Qu.: 440   1st Qu.:NA
 Median : 1078   Median : 598   Median :NA
 Mean   : 1082   Mean   : 609   Mean   :   NaN
 3rd Qu.: 1499   3rd Qu.: 771   3rd Qu.:NA
 Max.   : 2000   Max.   :1266   Max.   :NA
==> NA's   :93436   NA's   : 607   NA's   :238304
   Sorting data by enzyme...done
int [1:238304, 1] NA 1139 599 1872 1857 1465 382 1651 NA 1833 ...
  Extracting fragment lengths from ([enzyme], lengths, start,
stop)...done
 Reading (unitName, fragmentLength+) from file...done

As you see the higlighted lines for NspI and STyI are:
   V1  V2 V3
==> NA's   : 701.0   NA's   :131564   NA's   :262264
==> NA's   :93436   NA's   : 607   NA's   :238304

This means missing = 701 appears in V1 for NspI, but missing = 607
appears in V2 for StyI.
What does this mean?

Best regards
Christian


On Jan 15, 1:50 am, Henrik Bengtsson  wrote:
> Hi Christian,
>
> I'm quite swamped but I've added links the scripts that I used to
> generate the existing NetAffx 26 (na26) UFL and UGP to each of the
> different SNP & CN chip type pages, e.g.
>
>  http://groups.google.com/group/aroma-affymetrix/web/mapping250k-nsp-m...
>
> More comments below.
>
> On Mon, Jan 12, 2009 at 6:24 AM, cstratowa
>
>  wrote:
>
> > Dear Henrik
>
> > Meanwhile I have created ufl and ugp files for both 100K and 500K
> > arrays but not for GenomeWideSNP_6 aray.
>
> The above scripts will show you how to do it for GenomeWideSNP_6.
> Slightly more complicated since two NetAffx CSV files are involved.
>
>
>
>
>
> > Can you confirm that the following code, which I use for both 100K and
> > 500K arrays, is correct:
>
> > # retrieving annotation files
> > chiptypes <- c("Mapping50K_Hind240", "Mapping50K_Xba240")
> > cdfs <- lapply(chiptypes, FUN=function(x){AffymetrixCdfFile$byChipType
> > (x)})
> > names(cdfs) <- chiptypes
> > print(cdfs)
>
> > # importing data from NetAffx CSV files
> > csvs <- lapply(cdfs, FUN=function(cdf){AffymetrixNetAffxCsvFile
> > $byChipType(getChipType(cdf), tags=".na27")})
> > print(csvs)
>
> > # allocating empty UFL (Unit Fragment Length) files
> > ufls <- lapply(cdfs, FUN=function(cdf){AromaUflFile$allocateFromCdf
> > (cdf, tags="na27,CS20090112")})
> > print(ufls)
>
> > # import SNP data
> > units <- list();
> > for (chipType in names(ufls)) {
> >   ufl <- ufls[[chipType]];
> >   csv <- csvs[[chipType]];
> >   units[[chipType]] <- importFrom(ufl, csv, verbose=-50);
> > }
> > str(units)
>
> > # allocating empty UGP (Unit Genome Position) files
> > ugps <- lapply(cdfs, FUN=function(cdf){AromaUgpFile$allocateFromCdf
> > (cdf, tags="na27,CS20090112")})
> > print(ugps)
>
> > # import SNP data
> > units <- list();
> > for (chipType in names(ugps)) {
> >   ugp <- ugps[[chipType]];
> >   csv <- csvs[[chipType]];
> >   units[[chipType]] <- importFrom(ugp, csv, verbose=-50);
> > }
> > str(units)
>
> This looks alright to me.  You might want to check toward the posted
> NA26 scripts as well, because they are more recent.
>
>
>
>
>
> > Here is the summary for the 100K arrays:
> > # Summary 50K chips
> >> str(units)
> > List of 2
> >  $ Mapping50K_Hind240: int [1:57244] 18632

[aroma.affymetrix] Re: ufl and ugp files for na27?

2009-01-12 Thread cstratowa
's   :775.000   NA's   :  775


Could it be that function summaryOfUnits() does not work as expected?
For ufl it prints "enzyme1-only" instead  of e.g. "NspI-only"
For ugp it prints an error: no applicable method


Is it correct that for na27 there are now 93436 missing SNPs for StyI
compared to only 75 missing SNPs for na24 (as shown on your page about
building UFL files)?


Can I build the ufl and ugp files in the same way for GenomeWideSNP_6?
I assume that I need to repeat everything done for ".na27" with
".cn.na27"?
Will these files be identical to the files supplied by you or do you
use the file "GenomeWideSNP_6_build36_SNPandCN.tab", which only you
have (as decribed in your pages)?


Here is the sessionInfo:
> sessionInfo()
R version 2.7.1 (2008-06-23)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods
base

other attached packages:
 [1] aroma.affymetrix_0.9.4 aroma.apd_0.1.3
R.huge_0.1.6
 [4] affxparser_1.12.2  aroma.core_0.9.4
sfit_0.1.5
 [7] aroma.light_1.8.1  digest_0.3.1
matrixStats_0.1.3
[10] R.rsp_0.3.4R.cache_0.1.7
R.utils_1.0.4
[13] R.oo_1.4.5 R.methodsS3_1.0.3
>

Best regards
Christian


On Jan 8, 2:39 pm, cstratowa  wrote:
> Dear Henrik
>
> Would it be possible for you to supply the ufl and ugp files for the
> new Affymetrix xxx.na27.annot.csv files for the mapping arrays?
>
> Thank you in advance
> Best regards
> Christian
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] ufl and ugp files for na27?

2009-01-08 Thread cstratowa

Dear Henrik

Would it be possible for you to supply the ufl and ugp files for the
new Affymetrix xxx.na27.annot.csv files for the mapping arrays?

Thank you in advance
Best regards
Christian

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---