Re: annotationData over network using Windows Shortcuts *.lnk (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread pwhiteusa

Sorry I missed the 0 and -1 at the edge of the screen. See below, but
basically when the mode was set to 1 (executable), it returned the
error (-1), the other modes all returned a 0.

On Dec 2, 4:22 pm, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> thanks.  Almost there.  Unfortunately, you left out the one answer I
> was most interest in; see below.
>
>
>
> On Tue, Dec 2, 2008 at 12:34 PM, pwhiteusa <[EMAIL PROTECTED]> wrote:
>
> > Hi Henrik,
>
> > The results are interspersed below:
>
> > On Nov 26, 4:27 pm, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
> >> Hi,
>
> >> so this is a bit confusing.  I've been trying to reproduce it myself
> >> over a local Windows network, but I cannot.  I did however locate a
> >> potential problem where R incorrectly believes it does not have the
> >> permission to read the file, which might be what you are experience.
>
> >> Could you please try the following?  If you get an error, please
> >> report what traceback() gives.
>
> >> library("aroma.affymetrix");
>
> >> # This should be the absolute pathname to the CDF file
> >> pathname <- 
> >> "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf"
>
> >> # Assert that the path is correct
> >> stopifnot(isDirectory(dirname(pathname)));
>
> > TRUE
>
> >> # Get some file info
> >> print(as.list(file.info(pathname)));
>
> > $size
> > [1] 18486014
>
> > $isdir
> > [1] FALSE
>
> > $mode
> > [1] "666"
>
> > $mtime
> > [1] "2008-10-23 10:26:09 EDT"
>
> > $ctime
> > [1] "2008-11-20 12:45:36 EST"
>
> > $atime
> > [1] "2008-11-29 00:30:18 EST"
>
> > $exe
> > [1] "no"
>
> >> # Check file permissions

print(file.access(pathname, mode=0));

//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
 
0

print(file.access(pathname, mode=1));
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
 
-1
print(file.access(pathname, mode=2));
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
 
0
print(file.access(pathname, mode=4));
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
 
0


> > All return the same:
>
> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> > chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
>
> HERE: That is the "name" of the result. They also return either 0 or
> -1.  What is the value for each of them?  My hypothesis is that they
> return an incorrect value (-1).  If so, I know why and have a
> workaround.
>
> /Henrik
>
>
>
> >> # Try to read the CDF reader using affxparser
> >> hdr <- readCdfHeader(pathname);
> >> str(hdr);
>
> > List of 12
> >  $ ncols      : int 1050
> >  $ nrows      : int 1050
> >  $ nunits     : int 35512
> >  $ nqcunits   : int 1
> >  $ refseq     : chr ""
> >  $ chiptype   : chr "RES2K3FS01/RESAffymetrix/ANNOTATION/
> > aromaAffymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-
> > st-v1,r3"
> >  $ filename   : chr "RES2K3FS01/RESAffymetrix/ANNOTATION/
> > aromaAffymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-
> > st-v1,r3.cdf"
> >  $ rows       : int 1050
> >  $ cols       : int 1050
> >  $ probesets  : int 35512
> >  $ qcprobesets: int 1
> >  $ reference  : chr ""
>
> >> cdf <- AffymetrixCdfFile(pathname);
> >> print(cdf);
>
> > AffymetrixCdfFile:
> > Path: \\RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> > annotationData/chipTypes/MoGene-1_0-st-v1
> > Filename: MoGene-1_0-st-v1,r3.cdf
> > Filesize: 17.63MB
> > Chip type: RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> > annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3
> > RAM: 0.00MB
> > File format: v4 (binary; XDA)
> > Dimension: 1050x1050
> > Number of cells: 1102500
> > Number of units: 35512
> > Cells per unit: 31.05
> > Number of QC units: 1
>
> >> pathname2 <- AffymetrixCdfFile$findByChipType("MoGene-1_0-st-v1",
> >> tags="r3", verbose=-100);
>
> > Searching for annotation data file(s)...
> >  Name: MoGene-1_0-st-v1
> >  Tags:
> >  Set: chipTypes
> >  Filename pattern: ^MoGene-1_0-st-v1,r3[.](c|C)(d|D)(f|F)$
> >  Paths (from argument):
> >  Paths (default): annotationData
> >  Paths (final): //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> > annotationData/chipTypes/MoGene-1_0-st-v1
> >  Arguments to findFiles:
> >  List of 5
> >  $ pattern  : chr "^MoGene-1_0-st-v1,r3[.](c|C)(d|D)(f|F)$"
> >  $ allFiles : logi FALSE
> >  $ paths    : Named chr "//RES2K3FS01/RESAffymetrix/ANNOTATION/
> > aromaAffymetrix/annotationData/chipTypes/MoGene-1_0-st-v1"
> >   ..- attr(*, "names")= chr "annotationData/chipTypes/MoGene-1_0-st-
> > v1"
> >  $ recursive: logi TRUE
> >  $ firstOnly: logi FALSE
> >  Pathname:
> > Searching for annotation data file(s)...done
> > Searching for annotation data file(s)...
> >  Name:

[aroma.affymetrix] Re: Error with chromosomeExplorer

2008-12-03 Thread mortiz

hi,

ive proccessed chromosomeExplorer for all the chromosomes and i get
many plots but also the error (although it doesnt stop cause of it)...
if i do the traceback i get this

>traceback()
32: unlist(args)
31: as.list(unlist(args))
30: par("lwd")
29: int_abline(a = a, b = b, h = h, v = v, untf = untf, ...)
28: abline(h = log2(level), col = "blue", lty = 2)
27: withCallingHandlers(expr, warning = function(w) invokeRestart
("muffleWarning"))
26: suppressWarnings({
newPlot(this, xlim = xlim, ylim = ylim, flavor = "ce", unit =
unit,
...)
drawXAxisRuler(xrange = c(0, nbrOfBases)/10^unit, ticksBy =
ticksBy)
if (!identical(this$.plotCytoband, FALSE)) {
drawCytoband(this, chromosome = chromosome, unit = unit)
}
cnLevels <- c(1/2, 1, 3/2)
for (level in cnLevels) {
abline(h = log2(level), col = "blue", lty = 2)
}
rawCns <- extractRawCopyNumbers(fit)
verbose && print(verbose, rawCns, level = -50)
n <- nbrOfLoci(rawCns, na.rm = TRUE)
stext(text = sprintf("n=%d", n), side = 4, pos = 0, line = 0,
cex = 0.8)
pointsRawCNs(fit, unit = unit, ...)
cnRegions <- extractCopyNumberRegions(fit)
verbose && print(verbose, cnRegions, level = -50)
drawLevels(cnRegions, lwd = 4, col = "black", xScale =
1/10^unit)
drawExtraAnnotations(fit)
onFitAddGenotypeCalls(fit, callList = callList, arrayName =
arrayName,
resScale = 10^unit * resScale, ...)
})
25: doTryCatch(return(expr), name, parentenv, handler)
24: tryCatchOne(expr, names, parentenv, handlers[[1]])
23: tryCatchList(expr, classes, parentenv, handlers)
22: tryCatch({
verbose && enter(verbose, "Plotting graph")
opar <- par(xaxs = "r")
suppressWarnings({
newPlot(this, xlim = xlim, ylim = ylim, flavor = "ce",
unit = unit, ...)
drawXAxisRuler(xrange = c(0, nbrOfBases)/10^unit, ticksBy
= ticksBy)
if (!identical(this$.plotCytoband, FALSE)) {
drawCytoband(this, chromosome = chromosome, unit =
unit)
}
cnLevels <- c(1/2, 1, 3/2)
for (level in cnLevels) {
abline(h = log2(level), col = "blue", lty = 2)
}
rawCns <- extractRawCopyNumbers(fit)
verbose && print(verbose, rawCns, level = -50)
n <- nbrOfLoci(rawCns, na.rm = TRUE)
stext(text = sprintf("n=%d", n), side = 4, pos = 0, line =
0,
cex = 0.8)
pointsRawCNs(fit, unit = unit, ...)
cnRegions <- extractCopyNumberRegions(fit)
verbose && print(verbose, cnRegions, level = -50)
drawLevels(cnRegions, lwd = 4, col = "black", xScale =
1/10^unit)
drawExtraAnnotations(fit)
onFitAddGenotypeCalls(fit, callList = callList, arrayName
= arrayName,
resScale = 10^unit * resScale, ...)
})
stext(chipType, side = 4, pos = 1, line = 0, cex = 0.8)
verbose && exit(verbose)
}, error = function(ex) {
print(ex)
}, finally = {
par(opar)
if (!imageFormat %in% c("screen", "current"))
dev.off()
})
21: doTryCatch(return(expr), name, parentenv, handler)
20: tryCatchOne(expr, names, parentenv, handlers[[1]])
19: tryCatchList(expr, classes, parentenv, handlers)
18: tryCatch({
arrayFullName <- gsub("^(.*),chr[0-9][0-9].*$", "\\1",
fullname)
arrayName <- gsub("^([^,]*).*$", "\\1", arrayFullName)
nbrOfBases <- genome$nbrOfBases[chromosome]
widthMb <- nbrOfBases/10^unit
if (is.null(xlim)) {
xlim <- c(0, getChromosomeLength(chromosome))/10^unit
}
verbose && enter(verbose, sprintf("Plotting %s for chromosome
%02d [%.2fMB]",
arrayName, chromosome, widthMb))
for (zz in seq(along = zooms)) {
zoom <- zooms[zz]
imgName <- sprintf("%s,chr%02d,x%04d.%s", arrayFullName,
chromosome, zoom, imageFormat)
pathname <- filePath(path, imgName)
pathname <- gsub(" ", "_", pathname)
if (!imageFormat %in% c("screen", "current")) {
if (skip && isFile(pathname)) {
next
}
}
ticksBy <- 10^ceiling(log10(pixelsPerTick/(zoom *
pixelsPerMb)))
width <- round(zoom * widthMb * pixelsPerMb + sum
(xmargin))
verbose && printf(verbose, "Pathname: %s\n", pathname)
verbose && printf(verbose, "Dimensions: %dx%d\n", width,
height)
verbose && printf(verbose, "Ticks by: %f\n", ticksBy)
if (!is.null(plotDev))
plotDev(pathname, width = width, height = height)
tryCatch({
verbose && enter(verbose, "Plotting graph")
opar <- par(xaxs = "r")
suppressWarnings({
 

[aroma.affymetrix] How to reload CBS/GLAD segmentation result files (.xdr) and generate output by ChromosomeExplorer

2008-12-03 Thread Harry

Hi,

I am currently analyzing over 3000 Affy 500k samples on our Linux
cluster. However, the cluster does not support ChromosomeExplorer. So
my plan is to run all the analyzes till I get the segmentation results
on the cluster, which will be located in gladData or cbsData folder,
and then copy all these segmentation result folders to my desktop
computer to generate ChromosomeExplorer output. So my question is how
can I reload the results (.xdr files) back to R without repeating all
the analyzes? and are those .xdr files sufficient to generate
ChromosomeExplorer output or I need more files from previous steps?

Thanks!

Harry

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: Batch Normalization

2008-12-03 Thread Henrik Bengtsson

Hi.

On Tue, Dec 2, 2008 at 3:46 PM, joshy <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> I have already normalized a set of Affymetrix SNP 6.0 CEL files. The
> following steps are used
>
> AllelicCrosstalkCalibration()
> AvgCnPlm( ,mergeStrands = TRUE, combineAlleles=TRUE)
> getChipEffect()
> FragmentLengthNormalization()
>
>
> IF I need to add few more CEL Files to this study, do I have to
> renormalize all the arrays together again.

First, please post the complete script/code as you use it (it's free)
to avoid any risk of ambiguity.  This will be even more important as
we soon release the CRMA v2 method, which will look very similar but
uses slightly different arguments.

I will try to guide what you need to do, but it makes sense if I first
explain the background a bit.


THEORY:
Theoretically, the allelic-crosstalk calibration and the averaging
probe-level summarization are both real single-array methods, that is,
you can process each array independent of the others and the outcome
of each array remain the same regardless of what other arrays you have
in the data set.

The fragment-length normalization as described in the CRMA paper uses
a reference which is calculated from the pool of all arrays.  This is
the version that is implemented in aroma.affymetrix v0.9.5 and
described online.  However, the normalization can equally well be done
towards a constant (as mentioned in the CRMA paper; and described in
detail in the submitted CRMA v2 paper), which then also makes this
step a truly single-array normalization method.  This approach will be
available in the next release (very soon) [which you then will be able
to specify as FragmentLengthNormalization(ces, target="zero")].

To conclude, in CRMA [v1] all steps but the last are single array by
nature, and in CRMA v2 the complete pipeline will be single array by
nature.

PRACTICE:
Traditionally, most Affymetrix analysis has been based on multi-array
methods.  The framework for probe-level summarization in
aroma.affymetrix is based on this as well, which currently complicated
single-array processing.  I am working on how to modify
aroma.affymetrix such that this can be done easily.  Until then, you
can do then following:

Step 1) For each batch of arrays (say old and new arrays), crosstalk
calibration and probe summarize (but don't do fragment-length
normalization) them as separate data set:

library("aroma.affymetrix");
log <- Arguments$getVerbose(-10, timestamp=TRUE);

cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");

cesList <- list();
for (dataSet in c("DataSet1", "DataSet2")) {
  csR <- AffymetrixCelSet$byName(dataSet, cdf=cdf);
  acc <- AllelicCrosstalkCalibration(csR);
  csC <- process(acc, verbose=log);
  plm <- AvgCnPlm(csC, mergeStrands=TRUE, combineAlleles=TRUE);
  units <- fit(plm, verbose=log);
  ces <- getChipEffectSet(plm);
  cesList[[dataSet]] <- ces;
}
print(cesList);

Note, if a data sets is already processed it will be quick.  Any
number of data sets can be merged this way.

2) The above will give you a list 'cesList' of ChipEffectSet:s (here
CnChipEffectSet:s).  We can now combine these into one big data set as
follows:

ces <- clone(cesList[[1]]);
cesList <- cesList[-1];
for (kk in seq(along=cesList)) {
  append(ces, cesList[[kk]]);
}
rm(cesList);
setTags(ces, c("*,combined"));
print(ces);

You should see that the combined data set will contain all arrays -
sum up the counts you get from print(cesList).

Whenever you use append(ces, other) on a data set, then fullname of
the combined data set will be that of 'ces', i.e. getFullName(ces).
To avoid mixing it up with the original set cesList[[1]], we add the
tag 'combined' (you can choose whatever you want).  For example, you
will get something like:

print(getFullName(cesList[[1]]))
[1] DataSet1,ACC,ra,-XY,AVG,+300,A+B
print(getFullName(cesList[[2]]))
[1] DataSet2,ACC,ra,-XY,AVG,+300,A+B

print(getFullName(ces))
[1] DataSet1,ACC,ra,-XY,AVG,+300,A+B,combined

Step 3) Now you have a ChipEffectSet 'ces' of all chip-effect sets
combined which you can pass to the fragment-length normalization, e.g.

fln <- FragmentLengthNormalization(ces);
cesN <- process(fln, verbose=log);

As mentioned above, in next release, you will be able to put the
fragment-length normalization within the above for loop (CRMA v2),
saving you even more time/hassle.  Further in the future
aroma.affymetrix will be clever enough so you won't have to worry
about any of the above (except understanding the difference between
single- and multi-array models/methods).

Hope this helps.

Henrik

REFERENCES:

[CRMA paper] H. Bengtsson; R. Irizarry; B. Carvalho; T. Speed,
Estimation and assessment of raw copy numbers at the single locus
level, Bioinformatics, 2008. [pmid: 18204055] [doi:
10.1093/bioinformatics/btn016]


http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/269282aae0842811


>
> Thanks a lot in advance
>
> Joshy
> >
>

--~--~-~--~~~---~--~~
When reporting

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Henrik Bengtsson
Hi,

thanks for sharing all this.

On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa <[EMAIL PROTECTED]> wrote:
>
> Hi All,
>
> Here is the exact code I used to analyze Gene ST data for an
> experiment performed with the MoGene-1_0-st-v1 array.
>
> AROMA.AFFYMETRIX
>
> library(aroma.affymetrix)
> cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3")
> prefixName <- "Pierson"
> cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf)
> bc <- RmaBackgroundCorrection(cs)
> csBC <- process(bc)
> qn <- QuantileNormalization(csBC, typesToUpdate="pm")
> csN <- process(qn)
> plm <- RmaPlm(csN, flavor="affyPLM")  #flavor="oligo", must library
> (oligo)
> fit(plm)
> ces <- getChipEffectSet(plm)
> getExprs <- function(ces, ...) {
>  cdf <- getCdf(ces)
>  theta <- extractMatrix(ces, ..., returnUgcMap=TRUE)
>  ugcMap <- attr(theta, "unitGroupCellMap")
>  un<-getUnitNames(cdf, ugcMap[,"unit"])
>  rownames(theta) <- un
>  log2(theta)
> }
> data.aroma <- getExprs(ces)

I think that getExprs() call can be replaced by:

data.aroma <- extractDataFrame(ces, addNames=TRUE)

The difference is that the unit names will be in a separate column and
not as row names. You will also get group names and more, but those
you can drop if you want to.

Mark, is it correct that we can "deprecate" the suggestion to use
getExprs()?  Btw, is this documented somewhere online, or is this
knowledge only from the mailing list?

>
> Easy! Now to get the same data using the Affy packages:
>
>
> BIOCONDUCTOR AFFY
>
> You first need to create or download your mogene10stv1cdf library from
> the Affy unsupported CDF file (https://stat.ethz.ch/pipermail/bioc-
> devel/2007-October/001403.html has some detail on how to do this).
> However, as Mark Robinson pointed out there are potential issues with
> using the Affy unsupported CDF files. See the following for some
> details:
>
> https://stat.ethz.ch/pipermail/bioconductor/2007-November/020188.html
>
> library(affy)
> AffyRaw <- ReadAffy()
> AffyEset <- rma(AffyRaw)
> data.affy <- exprs(AffyEset)
>
>
> BIOCONDUCTOR OLIGO
>
> Download all the required Affy annotation files to your Mouse Gene v1
> ST array directory:
>
> http://www.affymetrix.com/support/technical/byproduct.affx?product=mogene-1_0-st-v1
>
> setwd("P:\\ANNOTATION\\AffyAnnotation\\Mouse\\MoGene-1_0-st-v1")
> library(pdInfoBuilder)
> pgfFile <- "MoGene-1_0-st-v1.r3.pgf"
> clfFile <- "MoGene-1_0-st-v1.r3.clf"
> transFile <- "MoGene-1_0-st-v1.na26.mm9.transcript.csv"
> probeFile <- "MoGene-1_0-st-v1.probe.tab"
> pkg <- new("AffyGenePDInfoPkgSeed", author="Peter White",
> email="[EMAIL PROTECTED]", version="0.1.3",
> genomebuild="UCSC mm9,  July 2007", chipName="MoGene10stv1",
> manufacturer="affymetrix", biocViews="AnnotationData",
> pgfFile=pgfFile, clfFile=clfFile, transFile=transFile,
> probeFile=probeFile)
> makePdInfoPackage(pkg, destDir=".")
>
> #This takes a little while to make the Package. Once created you will
> need to install the package from the Windows DOS prompt (navigate to
> the annotation directory with the newly created pd package to be
> installed):
>
> R CMD INSTALL pd.mogene.1.0.st.v1\
>
> Note for this to work you need RTools and you Path variable set up
> correctly as described at:
>
> http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset)
>
> Now return to R, set the working directory to your CEL file directory:
>
> library(pd.mogene.1.0.st.v1)
> library(oligo)
> OligoRaw<-read.celfiles(filenames=list.celfiles())
> OligoEset<-rma(OligoRaw)
> data.oligo<-exprs(OligoEset)
>
>
> COMPARING THE TWO DATASETS
>
> Here is what I did to compare the data generate by affy, oligo and
> aroma.affymetrix:
>
> dim(data.aroma)
> [1] 3551216
> dim(data.affy)
> [1] 3551216
> length(grep(TRUE, rownames(data.affy)==rownames(data.aroma)))
> [1] 35512

FYI, sum(rownames(data.affy)==rownames(data.aroma)) gives you the
same.  Replacing sum() with summary() will also work.

>
> The output from both the affy rma and aroma.affymetrix methods retains
> the same order of probes and cel files so the two files can be
> compared directly.

That is probably because they work of the same CDF, but you should
never rely on this/assume that this is always the case.  If you do,
you should at least verify that the unit names (and group names)
match.

> However,
>
> dim(data.oligo)
> [1] 3555716
>
> The normalized data file from the Oligo package includes an additional
> 45 Transcript IDs (there's no annotation on what these are but they
> contain anywhere from 9 to 489 probes per probeset).

For the record, would you mind posting the names of these "additional"
45 units here?  (I'm sure someone else will search the web later and
find this thread very helpful).

> Fixed this problem as follows:
>
> o <- match(rownames(data.aroma), rownames(data.oligo))
> data.oligo <- data.oligo[o,]
>
>> length(grep(TRUE, rownames(data.affy)==rownames(data.oligo)))
> [1] 35512
>> length(grep(TRUE, rownames(data.aroma)==rownames(data.oligo)))
> [1] 

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Mark Robinson

Hi all.

First of all, thanks Peter for 1) doing this testing and 2) for  
spelling everything out.  I expect to refer people to this thread in  
the future, so thanks for that.

Just wanted to add 3 more tidbits of hopefully useful information.

1. I dug a bit into why flavor="oligo" doesn't work within  
aroma.affymetrix.  It turns out it was a simple fix.  Since I don't  
use it regularly (it doesn't give probe affinities!) and the  
underlying 'oligo' functions had changed, it stopped working.  Its  
corrected now.  I've checked in the fix, so flavor='oligo' will be  
available in the next release.  In my tests, it appears VERY close to  
'affy' ... and since its based on 'oligo' code, it should be VERY VERY  
similar.

...
plm1 <- RmaPlm(csN,flavor="oligo")
fit(plm1,verbose=verbose)
ces <- getChipEffectSet(plm1)
data.aroma.oligo <- getExprs(ces)
...

 > mean( (data.affy[,1]-data.aroma.oligo[,1])^2 )
[1] 0.0003193267

2. I dug a bit into the unsupported CDF and the 'platformDesign'  
objects from oligo and from what I can tell, the probes used in the  
33252 units (I'm looking at Human) within aroma.affymetrix are  
identical to the probes used within oligo (as built with  
pdInfoBuilder) ... not a single probe no accounted for.  In case you  
haven't dug into pdInfoBuilder before and the SQLite db behind, here  
are some commands you may find useful ...

---
library(pd.hugene.1.0.st.v1)
library(pdInfoBuilder)
fn <- dir("rawData/tissues/HuGene-1_0-st-v1","CEL",full=TRUE)[1:3]
x <- read.celfiles(filenames=fn,pkgname="pd.hugene.1.0.st.v1")
pd <-getPlatformDesign(x)
ff <- dbGetQuery(db(pd), "select * from pmfeature")

# three 3 lines speed up the splitting ...
ffs <- split(ff, substr(ff$fsetid,1,4))
ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)),  
recursive=FALSE)
names(ffs) <- substr(names(ffs), 6,nchar(names(ffs)))

cs <- AffymetrixCelSet$fromName(name, chipType=chip)
cdf <- getCdf(cs)
cdfCells <- readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE,  
readBases=FALSE, readExpos=FALSE,
  readType=FALSE,  
readDirection=FALSE,stratifyBy=c("pm"), readIndices=TRUE, verbose=0)

un <- unique(ff$fsetid)
ids <- intersect(un,names(cdfCells))

mf <- match(ids,names(ffs))
mc <- match(ids,names(cdfCells))

matches <- matrix(NA,nr=length(ids),nc=3)
rownames(matches) <- ids
colnames(matches) <- c("pdInfoBuilder","unsupportedCDF","union")

for(i in 1:nrow(matches)) {
   a <- cdfCells[[ ids[i] ]]$groups[[1]]$indices
   b <- ffs[[ ids[i] ]]$fid
   matches[i,1] <- length(b)
   matches[i,2] <- length(a)
   matches[i,3] <- length(union(a,b))
   cat(ids[i],"\n")
}
---

... this gives ..
 > matches[1:5,]
 pdInfoBuilder unsupportedCDF union
798132627 2727
809500542 4242
810031010 1010
794811715 1515
815587725 2525
 > table(matches[,3]-matches[,1])
 0
33252
 > table(matches[,3]-matches[,2])
 0
33252

3.  This doesn't address the problem of the missing probesets.  I'm  
happy to go and collect these if people want them, but based on the  
reply from Affymetrix (thanks to Mark Cowley for the leg work here),  
they are probably 'low-coverage transcript clusters' that can be  
'safely ignored'.  See:

http://article.gmane.org/gmane.science.biology.informatics.conductor/19738


SUMMARY:

[aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo

[aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM

... where '=' means 'for all intents and purposes equivalent', not  
strictly equal.

Cheers,
Mark





On 04/12/2008, at 7:43 AM, Henrik Bengtsson wrote:

> Hi,
>
> thanks for sharing all this.
>
> On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa <[EMAIL PROTECTED]>  
> wrote:
>>
>> Hi All,
>>
>> Here is the exact code I used to analyze Gene ST data for an
>> experiment performed with the MoGene-1_0-st-v1 array.
>>
>> AROMA.AFFYMETRIX
>>
>> library(aroma.affymetrix)
>> cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3")
>> prefixName <- "Pierson"
>> cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf)
>> bc <- RmaBackgroundCorrection(cs)
>> csBC <- process(bc)
>> qn <- QuantileNormalization(csBC, typesToUpdate="pm")
>> csN <- process(qn)
>> plm <- RmaPlm(csN, flavor="affyPLM")  #flavor="oligo", must library
>> (oligo)
>> fit(plm)
>> ces <- getChipEffectSet(plm)
>> getExprs <- function(ces, ...) {
>> cdf <- getCdf(ces)
>> theta <- extractMatrix(ces, ..., returnUgcMap=TRUE)
>> ugcMap <- attr(theta, "unitGroupCellMap")
>> un<-getUnitNames(cdf, ugcMap[,"unit"])
>> rownames(theta) <- un
>> log2(theta)
>> }
>> data.aroma <- getExprs(ces)
>
> I think that getExprs() call can be replaced by:
>
> data.aroma <- extractDataFrame(ces, addNames=TRUE)
>
> The difference is that the unit names will be in a separate column and
> not as row names. You will also get group names and more, but those
> you can drop if you want to.
>
> Ma

[aroma.affymetrix] Re: FIRMA score

2008-12-03 Thread sabrina

Hi, Mark:

Thank you so much for the detailed explanation!
Actually I am working on Mouse Exon arrays.  I did check the UCSC
genome browser, but they do not have the Exon Array available  as
Human Exon.

My study is trying to find alternative splicing events between two
groups with identical genotypes but different treatments. each group
has 4 or 5 biological replicates. So I did all as described in the
examples from Human Exon array, then used the following codes to find
the significant genes with potential splicing events:


...
exFirma<-extractDataFrame(firmaScore,addNames=TRUE);
exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)])


library(limma)

design <- cbind(Grp1=c(1,1,1,1,1,0,0,0,0),Grp2=c(0,0,0,0,0,1,1,1,1))

fit<-lmFit(exFirma[,6:ncol(exFirma)],design)
fit<-eBayes(fit)
x<-topTable(fit)

However I only got about 10 IDs with all of which adjusted P values as
1. I wonder if there is anything I did wrong here. Also when I checked
the IDs back to Affymetrix annotation file, I could only find one
match, I wonder what went wrong. Thanks!

BTW, I used the coreR1 , A20080718, MR, cdf version

Sabrina

On Dec 2, 6:27 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> Hi Sabrina.
>
> Indeed, you'll want to set 'mergeGroups=TRUE' for the probe level  
> model (ExonRmaPlm object) that you send to 'FirmaModel' ...
>
> -http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-...
>   says:
> ...
> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> print(plmTr)
> ...
> firma <- FirmaModel(plmTr)
> fit(firma, verbose=verbose)
> -
>
> NO, the unitName is not the same as gene id in NCBI.  If you are using  
> the CDF from:
>
> http://groups.google.com/group/aroma-affymetrix/web/affymetrix-define...
>
> (e.g. HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf) the unitName is an  
> Affymetrix-defined identifier.
>
> If you go to:
>
> http://www.affymetrix.com/products_services/arrays/specific/exon.affx...
>
> you will see a 'HuEx-1_0-st-v2 Probeset Annotations' file that you can  
> download.  In there, you will find the annotations you need (unitName  
> corresponds to the 'transcript_cluster_id' column and groupName  
> corresponds to the 'probeset_id' column)
>
> Also, you may be interested to know that the UCSC genome browser has a  
> track called 'Affy HuEx 1.0' under the set of 'Expression' tracks.  
> Unfortunately, its not searchable on the browser, but quite useful if  
> you can locate it in the browser and find out which exon is  
> differentially spliced.
>
> Hope that helps.
> Mark
>
> On 03/12/2008, at 5:56 AM, sabrina wrote:
>
>
>
> > Hi, all:
> > I just started learning to use firma analysis. My impression from
> > previous discussions of other users is that we should only use plm
> > with mergeGroup=True, is that correct? Another question is : Is
> > unitName same as gene uid in NCBI database? how can I map groupName to
> > EXON name from NCBI? Thanks !!!
>
> > Sabrina
>
> --
> Mark Robinson
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: [EMAIL PROTECTED]
> e: [EMAIL PROTECTED]
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> --
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: FIRMA score

2008-12-03 Thread Mark Robinson


Sabrina.

See below.


On 04/12/2008, at 8:41 AM, sabrina wrote:
> Hi, Mark:
>
> Thank you so much for the detailed explanation!
> Actually I am working on Mouse Exon arrays.  I did check the UCSC
> genome browser, but they do not have the Exon Array available  as
> Human Exon.


Right! Good point.


> My study is trying to find alternative splicing events between two
> groups with identical genotypes but different treatments. each group
> has 4 or 5 biological replicates. So I did all as described in the
> examples from Human Exon array, then used the following codes to find
> the significant genes with potential splicing events:
>
>
> ...
> exFirma<-extractDataFrame(firmaScore,addNames=TRUE);
> exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)])
>
>
> library(limma)
>
> design <- cbind(Grp1=c(1,1,1,1,1,0,0,0,0),Grp2=c(0,0,0,0,0,1,1,1,1))
>
> fit<-lmFit(exFirma[,6:ncol(exFirma)],design)
> fit<-eBayes(fit)
> x<-topTable(fit)


Careful here as to what you are fitting and testing with limma.  You  
are fitting a parameterization that would require a contrast to be fit  
( ... such as 'contrasts.fit' ... ) in order to assess the differences  
between your 2 groups.  Did you get this example from a web page or a  
thread somewhere?  If so, I should correct it.

Probably what you want is:

design <- cbind(Grp1=1,Grp2=c(0,0,0,0,0,1,1,1,1))

...

topTable(fit,coef=2)

Under this parameterization, setting coef=2 will test the difference  
between the 2 groups.  Note that topTable by default gives the top 10  
values, regardless of statistics.

Does that make sense?


> However I only got about 10 IDs with all of which adjusted P values as
> 1. I wonder if there is anything I did wrong here. Also when I checked
> the IDs back to Affymetrix annotation file, I could only find one
> match, I wonder what went wrong.

How exactly did you match them back?

Mark




> Thanks!
>
> BTW, I used the coreR1 , A20080718, MR, cdf version
>
> Sabrina
>
> On Dec 2, 6:27 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
>> Hi Sabrina.
>>
>> Indeed, you'll want to set 'mergeGroups=TRUE' for the probe level
>> model (ExonRmaPlm object) that you send to 'FirmaModel' ...
>>
>> -http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-
>>  
>> ...
>>   says:
>> ...
>> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
>> print(plmTr)
>> ...
>> firma <- FirmaModel(plmTr)
>> fit(firma, verbose=verbose)
>> -
>>
>> NO, the unitName is not the same as gene id in NCBI.  If you are  
>> using
>> the CDF from:
>>
>> http://groups.google.com/group/aroma-affymetrix/web/affymetrix- 
>> define...
>>
>> (e.g. HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf) the unitName is an
>> Affymetrix-defined identifier.
>>
>> If you go to:
>>
>> http://www.affymetrix.com/products_services/arrays/specific/ 
>> exon.affx...
>>
>> you will see a 'HuEx-1_0-st-v2 Probeset Annotations' file that you  
>> can
>> download.  In there, you will find the annotations you need (unitName
>> corresponds to the 'transcript_cluster_id' column and groupName
>> corresponds to the 'probeset_id' column)
>>
>> Also, you may be interested to know that the UCSC genome browser  
>> has a
>> track called 'Affy HuEx 1.0' under the set of 'Expression' tracks.
>> Unfortunately, its not searchable on the browser, but quite useful if
>> you can locate it in the browser and find out which exon is
>> differentially spliced.
>>
>> Hope that helps.
>> Mark
>>
>> On 03/12/2008, at 5:56 AM, sabrina wrote:
>>
>>
>>
>>> Hi, all:
>>> I just started learning to use firma analysis. My impression from
>>> previous discussions of other users is that we should only use plm
>>> with mergeGroup=True, is that correct? Another question is : Is
>>> unitName same as gene uid in NCBI database? how can I map  
>>> groupName to
>>> EXON name from NCBI? Thanks !!!
>>
>>> Sabrina
>>
>> --
>> Mark Robinson
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: [EMAIL PROTECTED]
>> e: [EMAIL PROTECTED]
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>> --
> >

--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: [EMAIL PROTECTED]
e: [EMAIL PROTECTED]
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--





--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Henrik Bengtsson

Hi,

thanks Mark for this.

So, it all has to do with *how* the log-additive probe-level model is
*fitted*, correct?  Thus, the model is the same but the algorithms
differ.  That gives us some sense of how much variance we get from
using different algorithms regardless of model.   Simulation studies
(under the model) could then show if for instance one of the
algorithms is more biased than others.

Thanks for fixing the flavor="oligo".  It will be part of the next release.

Cheers

Henrik

On Wed, Dec 3, 2008 at 1:36 PM, Mark Robinson <[EMAIL PROTECTED]> wrote:
>
> Hi all.
>
> First of all, thanks Peter for 1) doing this testing and 2) for
> spelling everything out.  I expect to refer people to this thread in
> the future, so thanks for that.
>
> Just wanted to add 3 more tidbits of hopefully useful information.
>
> 1. I dug a bit into why flavor="oligo" doesn't work within
> aroma.affymetrix.  It turns out it was a simple fix.  Since I don't
> use it regularly (it doesn't give probe affinities!) and the
> underlying 'oligo' functions had changed, it stopped working.  Its
> corrected now.  I've checked in the fix, so flavor='oligo' will be
> available in the next release.  In my tests, it appears VERY close to
> 'affy' ... and since its based on 'oligo' code, it should be VERY VERY
> similar.
>
> ...
> plm1 <- RmaPlm(csN,flavor="oligo")
> fit(plm1,verbose=verbose)
> ces <- getChipEffectSet(plm1)
> data.aroma.oligo <- getExprs(ces)
> ...
>
>  > mean( (data.affy[,1]-data.aroma.oligo[,1])^2 )
> [1] 0.0003193267
>
> 2. I dug a bit into the unsupported CDF and the 'platformDesign'
> objects from oligo and from what I can tell, the probes used in the
> 33252 units (I'm looking at Human) within aroma.affymetrix are
> identical to the probes used within oligo (as built with
> pdInfoBuilder) ... not a single probe no accounted for.  In case you
> haven't dug into pdInfoBuilder before and the SQLite db behind, here
> are some commands you may find useful ...
>
> ---
> library(pd.hugene.1.0.st.v1)
> library(pdInfoBuilder)
> fn <- dir("rawData/tissues/HuGene-1_0-st-v1","CEL",full=TRUE)[1:3]
> x <- read.celfiles(filenames=fn,pkgname="pd.hugene.1.0.st.v1")
> pd <-getPlatformDesign(x)
> ff <- dbGetQuery(db(pd), "select * from pmfeature")
>
> # three 3 lines speed up the splitting ...
> ffs <- split(ff, substr(ff$fsetid,1,4))
> ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)),
> recursive=FALSE)
> names(ffs) <- substr(names(ffs), 6,nchar(names(ffs)))
>
> cs <- AffymetrixCelSet$fromName(name, chipType=chip)
> cdf <- getCdf(cs)
> cdfCells <- readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE,
> readBases=FALSE, readExpos=FALSE,
>  readType=FALSE,
> readDirection=FALSE,stratifyBy=c("pm"), readIndices=TRUE, verbose=0)
>
> un <- unique(ff$fsetid)
> ids <- intersect(un,names(cdfCells))
>
> mf <- match(ids,names(ffs))
> mc <- match(ids,names(cdfCells))
>
> matches <- matrix(NA,nr=length(ids),nc=3)
> rownames(matches) <- ids
> colnames(matches) <- c("pdInfoBuilder","unsupportedCDF","union")
>
> for(i in 1:nrow(matches)) {
>   a <- cdfCells[[ ids[i] ]]$groups[[1]]$indices
>   b <- ffs[[ ids[i] ]]$fid
>   matches[i,1] <- length(b)
>   matches[i,2] <- length(a)
>   matches[i,3] <- length(union(a,b))
>   cat(ids[i],"\n")
> }
> ---
>
> ... this gives ..
>  > matches[1:5,]
> pdInfoBuilder unsupportedCDF union
> 798132627 2727
> 809500542 4242
> 810031010 1010
> 794811715 1515
> 815587725 2525
>  > table(matches[,3]-matches[,1])
> 0
> 33252
>  > table(matches[,3]-matches[,2])
> 0
> 33252
>
> 3.  This doesn't address the problem of the missing probesets.  I'm
> happy to go and collect these if people want them, but based on the
> reply from Affymetrix (thanks to Mark Cowley for the leg work here),
> they are probably 'low-coverage transcript clusters' that can be
> 'safely ignored'.  See:
>
> http://article.gmane.org/gmane.science.biology.informatics.conductor/19738
>
>
> SUMMARY:
>
> [aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo
>
> [aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM
>
> ... where '=' means 'for all intents and purposes equivalent', not
> strictly equal.
>
> Cheers,
> Mark
>
>
>
>
>
> On 04/12/2008, at 7:43 AM, Henrik Bengtsson wrote:
>
>> Hi,
>>
>> thanks for sharing all this.
>>
>> On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Hi All,
>>>
>>> Here is the exact code I used to analyze Gene ST data for an
>>> experiment performed with the MoGene-1_0-st-v1 array.
>>>
>>> AROMA.AFFYMETRIX
>>>
>>> library(aroma.affymetrix)
>>> cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3")
>>> prefixName <- "Pierson"
>>> cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf)
>>> bc <- RmaBackgroundCorrection(cs)
>>> csBC <- process(bc)
>>> qn <- QuantileNormalization

[aroma.affymetrix] Re: Estimated time left: 36844.5min

2008-12-03 Thread Henrik Bengtsson

Hi Nico.

On Tue, Dec 2, 2008 at 12:04 PM, Nicolas Stransky
<[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I'm trying to analyze 973 SNP6.0 arrays with aroma.affymetrix. The
> early steps of the analysis went fine, but the probe level
> summarization step takes a huge amount of time, as you can see in the
> short log I show below.
> Here is the command I used:
> fit(plm, verbose=verbose, ram=10)
>
> The machine has 128GB of RAM and a lot of RAM is also used by Linux
> cached files. probeData and plmData are local on the hard drive.
>
> Is there something that I could do to speed up the process? It seems
> to me that it doesn't scale with computation times I have observed for
> smaller numbers of arrays.


> 20081201 20:15:30|  Started: 20081125 15:01:07
> 20081201 20:15:30|  Estimated time left: 36844.5min
> 20081201 20:15:30|  ETA: 20081227 10:20:02
> 20081201 20:15:30| Fitting chunk #179 of 916...done


Wow!  This is a great chance for you to take a vacation and at the
same time tell your boss you're working on the problem, which you are
;)

FITTING CN UNITS FASTER:
Starting with aroma.affymetrix v0.9.5 you can fit all CN units/probes
much much faster by:

if (length(findUnitsTodo(plm)) > 0) {
  units <- fitCnProbes(plm, verbose=log);
  str(units);  # Displays the indices of all CN units.
}

The reason for this special method is because all CN units are
single-probe units, which means that there is nothing much to fit, but
just taking the chip effect estimate to be the probe intensity.
Hopefully, in a not to far future, this will be automatically done
internally, but for now you have to do it explicitly.   Also, until
next release (v0.9.6), calling this method twice will refit the CN
probes; in next release it will skip already fitted CN probes.

Then when you call:

fit(plm, verbose=verbose, ram=10)

only the SNP units have to be fitted.  The above will probably bring
down the processing time to half of what you have now.

FYI, since the SNPs comes before an CN units in the CDF, this means
that your running process (if you still run), are still working on the
SNPs.  That is, you can interrupt it, call the above fitCnProbes(),
and then restart fit().


PROCESSING DATA IN PARALLEL:
As argued in other threads, parts of the CRMA method are single array
by nature (especially with AvgCnPlm) and the upcoming CRMA v2 will be
truly a single-array method.  This means that you can process subsets
of your data set in parallel on multiple machines and then combine
them at the end of the pipeline.  Currently this has to be done
manually.  Here is an example illustrating how it could be done
(though, I haven't had time to verify it):

library("aroma.affymetrix");
log <- Arguments$getVerbose(-10, timestamp=TRUE);

# Setup the complete data set
cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");
csR <- AffymetrixCelSet$byName"CompleteDataSet", cdf=cdf);

# Split the data set in chunks of 20 arrays.  Each chunk should be
# processed by one host.
arrays <- splitInChunks(1:nbrOfArrays(csR), chunkSize=20);

# Assume a host processing Chunk #3

# Extract 3rd chunk, and add a tag indicating which arrays are processed
csR <- extract(csR, arrays[[3]]);
arraysTag <- seqToHumanReadable(arrays[[3]]);  # e.g. "41-60"
setTags(csR, c("*", arraysTag));

acc <- AllelicCrosstalkCalibration(csR);
csC <- process(acc, verbose=log);
plm <- AvgCnPlm(csC, mergeStrands=TRUE, combineAlleles=TRUE);
if (length(findUnitsTodo(plm)) > 0) {
  dummy <- fitCnProbes(plm, verbose=log);   # Fits all CN probes
  fit(plm, verbose=log); # Fits the remaining units, i.e. the SNPs
}
ces <- getChipEffectSet(plm);

You will end up with multiple chip-effect data sets with tags "1-20",
"21-40", and so on.  What remains is to merge them back at the end,
before passing them to the fragment-length normalization.  For this,
you can use the approach as I suggest in today's thread 'Batch
Normalization' on Dec 3, 2008:

  
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/4bff6520f5ad0a70#

FYI, if you want to skip ahead (i.e. skip calling all processing
steps, which has some overhead) and just setup the chip-effect data
sets, you can do this as:

cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");
csR <- AffymetrixCelSet$byName"CompleteDataSet", cdf=cdf);

cdfM <- getMonocellCdf(cdf);

arrays <- splitInChunks(1:nbrOfArrays(csR), chunkSize=20);
ces <- NULL;
for (kk in seq(along=arrays)) {
  arraysTag <- seqToHumanReadable(arrays[[kk]]);
  tags <- c(arraysTag, "ACC,ra,-XY,AVG,+300,A+B,FLN,-XY");
  cesKK <- CnChipEffectSet$byName("CompleteDataSet", tags=tags, cdf=cdfM,
mergeStrands=TRUE,
combineAlleles=TRUE);
  print(cesKK);
  if (is.null(ces)) {
ces <- cesKK;
  } else {
append(ces, cesKK);
  }
} # for (kk ...)

# Finally, update the "arrays" tag.
tags <- getTags(ces);
arraysTag <- seqToHumanReadable(1:nbrOfArrays(csR));
tags <- gsub("1-20", arraysTag, fixed=TRUE, tags);
setTag

[aroma.affymetrix] Re: Error with chromosomeExplorer

2008-12-03 Thread Henrik Bengtsson

Hi,

do you know if this only happens to the larger plots, e.g. large
chromosomes and/or large amount of zoom.  What zoom tags do you have
on the images that are succefully generated?

Let's see what PNG device driver you are using.  What do you get if you do:

pngDev <- findPngDevice(transparent=FALSE);
print(pngDev);

I believe it is your PNG device driver that is not working, and
changing it to another one (there are a few different alternatives
depending on system setup), could help.  The following threads reports
similar problems and how they were solved:

1) Thread 'R crashing during CNV analysis', Nov 11-20, 2008:
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/9853243d78fabf2

2) Thread 'ChromosomeExplorer memory leaks' on Aug 18-20, 2008:
http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/e06bc012b5359dea

Let us know if this helps and if so what your solution was.

Henrik


On Wed, Dec 3, 2008 at 6:40 AM, mortiz <[EMAIL PROTECTED]> wrote:
>
> hi,
>
> ive proccessed chromosomeExplorer for all the chromosomes and i get
> many plots but also the error (although it doesnt stop cause of it)...
> if i do the traceback i get this
>
>>traceback()
> 32: unlist(args)
> 31: as.list(unlist(args))
> 30: par("lwd")
> 29: int_abline(a = a, b = b, h = h, v = v, untf = untf, ...)
> 28: abline(h = log2(level), col = "blue", lty = 2)
> 27: withCallingHandlers(expr, warning = function(w) invokeRestart
> ("muffleWarning"))
> 26: suppressWarnings({
>newPlot(this, xlim = xlim, ylim = ylim, flavor = "ce", unit =
> unit,
>...)
>drawXAxisRuler(xrange = c(0, nbrOfBases)/10^unit, ticksBy =
> ticksBy)
>if (!identical(this$.plotCytoband, FALSE)) {
>drawCytoband(this, chromosome = chromosome, unit = unit)
>}
>cnLevels <- c(1/2, 1, 3/2)
>for (level in cnLevels) {
>abline(h = log2(level), col = "blue", lty = 2)
>}
>rawCns <- extractRawCopyNumbers(fit)
>verbose && print(verbose, rawCns, level = -50)
>n <- nbrOfLoci(rawCns, na.rm = TRUE)
>stext(text = sprintf("n=%d", n), side = 4, pos = 0, line = 0,
>cex = 0.8)
>pointsRawCNs(fit, unit = unit, ...)
>cnRegions <- extractCopyNumberRegions(fit)
>verbose && print(verbose, cnRegions, level = -50)
>drawLevels(cnRegions, lwd = 4, col = "black", xScale =
> 1/10^unit)
>drawExtraAnnotations(fit)
>onFitAddGenotypeCalls(fit, callList = callList, arrayName =
> arrayName,
>resScale = 10^unit * resScale, ...)
>})
> 25: doTryCatch(return(expr), name, parentenv, handler)
> 24: tryCatchOne(expr, names, parentenv, handlers[[1]])
> 23: tryCatchList(expr, classes, parentenv, handlers)
> 22: tryCatch({
>verbose && enter(verbose, "Plotting graph")
>opar <- par(xaxs = "r")
>suppressWarnings({
>newPlot(this, xlim = xlim, ylim = ylim, flavor = "ce",
>unit = unit, ...)
>drawXAxisRuler(xrange = c(0, nbrOfBases)/10^unit, ticksBy
> = ticksBy)
>if (!identical(this$.plotCytoband, FALSE)) {
>drawCytoband(this, chromosome = chromosome, unit =
> unit)
>}
>cnLevels <- c(1/2, 1, 3/2)
>for (level in cnLevels) {
>abline(h = log2(level), col = "blue", lty = 2)
>}
>rawCns <- extractRawCopyNumbers(fit)
>verbose && print(verbose, rawCns, level = -50)
>n <- nbrOfLoci(rawCns, na.rm = TRUE)
>stext(text = sprintf("n=%d", n), side = 4, pos = 0, line =
> 0,
>cex = 0.8)
>pointsRawCNs(fit, unit = unit, ...)
>cnRegions <- extractCopyNumberRegions(fit)
>verbose && print(verbose, cnRegions, level = -50)
>drawLevels(cnRegions, lwd = 4, col = "black", xScale =
> 1/10^unit)
>drawExtraAnnotations(fit)
>onFitAddGenotypeCalls(fit, callList = callList, arrayName
> = arrayName,
>resScale = 10^unit * resScale, ...)
>})
>stext(chipType, side = 4, pos = 1, line = 0, cex = 0.8)
>verbose && exit(verbose)
>}, error = function(ex) {
>print(ex)
>}, finally = {
>par(opar)
>if (!imageFormat %in% c("screen", "current"))
>dev.off()
>})
> 21: doTryCatch(return(expr), name, parentenv, handler)
> 20: tryCatchOne(expr, names, parentenv, handlers[[1]])
> 19: tryCatchList(expr, classes, parentenv, handlers)
> 18: tryCatch({
>arrayFullName <- gsub("^(.*),chr[0-9][0-9].*$", "\\1",
> fullname)
>arrayName <- gsub("^([^,]*).*$", "\\1", arrayFullName)
>nbrOfBases <- genome$nbrOfBases[chromosome]
>widthMb <- nbrOfBases/10^unit
>if (is.null(xlim)) {
>xlim <- c(0, getChromosomeLength(chromosome))/10^unit
>}
>verbose && enter(verbose, sprintf("Plotting %s for chromosome
> %02d [%.2fMB]",
>  

[aroma.affymetrix] Re: How to reload CBS/GLAD segmentation result files (.xdr) and generate output by ChromosomeExplorer

2008-12-03 Thread Henrik Bengtsson

Hi,

my 2nd wow today (3000 arrays).

The cbsData/ and gladData/ directories only stores the fitted
segmentation (saved as *.xdr files).  In order to plot the CN
estimates, ChromosomeExplorer will also need the chip effect data set.
 In other words, you will have to copy both the chip effect (*.CEL)
files in the plmData/ directory as well as the corresponding cbsData/
(gladData/) directories, to get something like:

cbsData/
  MyDataSet,ACC,-XY,RMA,+300,A+B,FLN,-XY/
Mapping250K_Nsp+Sty/
  *.xdr

plmData/
  MyDataSet,ACC,-XY,RMA,+300,A+B,FLN,-XY/
Mapping250K_Nsp/
  *.chipEffects.CEL
Mapping250K_Sty/
  *.chipEffects.CEL

Don't forget the .average-intensities-*.CEL files in the plmData/
subdirectories, which contain the robust averages of all 3000 arrays,
otherwise they have to be recomputed as well.

Then setup the chip effect data sets as:

cesList <- list();
for (chipType in c("Mapping250K_Nsp", "Mapping250K_Sty")) {
  cdf <- AffymetrixCdfFile$byChipType(chipType);
  cdfM <- getMonocellCdf(cdf);
  ces <- CnChipEffectSet$byName("MyDataSet,ACC,-XY,RMA,+300,A+B,FLN,-XY",
cdf=cdfM, mergeStrands=TRUE, combineAlleles=TRUE);
  cesList[[chipType]] <- ces;
}

and continue as "usual", e.g.

cbs <- CbsModel(cesList);

or whatever you do.

Hope this helps

Henrik

On Wed, Dec 3, 2008 at 9:19 AM, Harry <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I am currently analyzing over 3000 Affy 500k samples on our Linux
> cluster. However, the cluster does not support ChromosomeExplorer. So
> my plan is to run all the analyzes till I get the segmentation results
> on the cluster, which will be located in gladData or cbsData folder,
> and then copy all these segmentation result folders to my desktop
> computer to generate ChromosomeExplorer output. So my question is how
> can I reload the results (.xdr files) back to R without repeating all
> the analyzes? and are those .xdr files sufficient to generate
> ChromosomeExplorer output or I need more files from previous steps?
>
> Thanks!
>
> Harry
>
> >
>

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Mark Robinson


On 04/12/2008, at 10:17 AM, Henrik Bengtsson wrote:

> So, it all has to do with *how* the log-additive probe-level model is
> *fitted*, correct?


Correct.  Identical linear model, different procedure for fitting.

(as a bit of an aside ... I think of these things in terms of  
influence functions -- median polish will have a different IF than the  
defaults in affyPLM's robust fit)

M.


> Thus, the model is the same but the algorithms
> differ.  That gives us some sense of how much variance we get from
> using different algorithms regardless of model.   Simulation studies
> (under the model) could then show if for instance one of the
> algorithms is more biased than others.
>
> Thanks for fixing the flavor="oligo".  It will be part of the next  
> release.
>
> Cheers
>
> Henrik
>
> On Wed, Dec 3, 2008 at 1:36 PM, Mark Robinson  
> <[EMAIL PROTECTED]> wrote:
>>
>> Hi all.
>>
>> First of all, thanks Peter for 1) doing this testing and 2) for
>> spelling everything out.  I expect to refer people to this thread in
>> the future, so thanks for that.
>>
>> Just wanted to add 3 more tidbits of hopefully useful information.
>>
>> 1. I dug a bit into why flavor="oligo" doesn't work within
>> aroma.affymetrix.  It turns out it was a simple fix.  Since I don't
>> use it regularly (it doesn't give probe affinities!) and the
>> underlying 'oligo' functions had changed, it stopped working.  Its
>> corrected now.  I've checked in the fix, so flavor='oligo' will be
>> available in the next release.  In my tests, it appears VERY close to
>> 'affy' ... and since its based on 'oligo' code, it should be VERY  
>> VERY
>> similar.
>>
>> ...
>> plm1 <- RmaPlm(csN,flavor="oligo")
>> fit(plm1,verbose=verbose)
>> ces <- getChipEffectSet(plm1)
>> data.aroma.oligo <- getExprs(ces)
>> ...
>>
>>> mean( (data.affy[,1]-data.aroma.oligo[,1])^2 )
>> [1] 0.0003193267
>>
>> 2. I dug a bit into the unsupported CDF and the 'platformDesign'
>> objects from oligo and from what I can tell, the probes used in the
>> 33252 units (I'm looking at Human) within aroma.affymetrix are
>> identical to the probes used within oligo (as built with
>> pdInfoBuilder) ... not a single probe no accounted for.  In case you
>> haven't dug into pdInfoBuilder before and the SQLite db behind, here
>> are some commands you may find useful ...
>>
>> ---
>> library(pd.hugene.1.0.st.v1)
>> library(pdInfoBuilder)
>> fn <- dir("rawData/tissues/HuGene-1_0-st-v1","CEL",full=TRUE)[1:3]
>> x <- read.celfiles(filenames=fn,pkgname="pd.hugene.1.0.st.v1")
>> pd <-getPlatformDesign(x)
>> ff <- dbGetQuery(db(pd), "select * from pmfeature")
>>
>> # three 3 lines speed up the splitting ...
>> ffs <- split(ff, substr(ff$fsetid,1,4))
>> ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)),
>> recursive=FALSE)
>> names(ffs) <- substr(names(ffs), 6,nchar(names(ffs)))
>>
>> cs <- AffymetrixCelSet$fromName(name, chipType=chip)
>> cdf <- getCdf(cs)
>> cdfCells <- readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE,
>> readBases=FALSE, readExpos=FALSE,
>> readType=FALSE,
>> readDirection=FALSE,stratifyBy=c("pm"), readIndices=TRUE, verbose=0)
>>
>> un <- unique(ff$fsetid)
>> ids <- intersect(un,names(cdfCells))
>>
>> mf <- match(ids,names(ffs))
>> mc <- match(ids,names(cdfCells))
>>
>> matches <- matrix(NA,nr=length(ids),nc=3)
>> rownames(matches) <- ids
>> colnames(matches) <- c("pdInfoBuilder","unsupportedCDF","union")
>>
>> for(i in 1:nrow(matches)) {
>>  a <- cdfCells[[ ids[i] ]]$groups[[1]]$indices
>>  b <- ffs[[ ids[i] ]]$fid
>>  matches[i,1] <- length(b)
>>  matches[i,2] <- length(a)
>>  matches[i,3] <- length(union(a,b))
>>  cat(ids[i],"\n")
>> }
>> ---
>>
>> ... this gives ..
>>> matches[1:5,]
>>pdInfoBuilder unsupportedCDF union
>> 798132627 2727
>> 809500542 4242
>> 810031010 1010
>> 794811715 1515
>> 815587725 2525
>>> table(matches[,3]-matches[,1])
>>0
>> 33252
>>> table(matches[,3]-matches[,2])
>>0
>> 33252
>>
>> 3.  This doesn't address the problem of the missing probesets.  I'm
>> happy to go and collect these if people want them, but based on the
>> reply from Affymetrix (thanks to Mark Cowley for the leg work here),
>> they are probably 'low-coverage transcript clusters' that can be
>> 'safely ignored'.  See:
>>
>> http://article.gmane.org/gmane.science.biology.informatics.conductor/19738
>>
>>
>> SUMMARY:
>>
>> [aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo
>>
>> [aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM
>>
>> ... where '=' means 'for all intents and purposes equivalent', not
>> strictly equal.
>>
>> Cheers,
>> Mark
>>
>>
>>
>>
>>
>> On 04/12/2008, at 7:43 AM, Henrik Bengtsson wrote:
>>
>>> Hi,
>>>
>>> thanks for sharing all this.
>>>
>>> On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa <[EMAIL PROTECTED]>
>>> wrote:

 Hi All,

 Here

Re: annotationData over network using Windows Shortcuts *.lnk (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Henrik Bengtsson

Hi,

On Wed, Dec 3, 2008 at 6:36 AM, pwhiteusa <[EMAIL PROTECTED]> wrote:
>
> Sorry I missed the 0 and -1 at the edge of the screen. See below, but
> basically when the mode was set to 1 (executable), it returned the
> error (-1), the other modes all returned a 0.

thanks.  Then it wasn't what I thought.  However...

I've finally got access to a Windows network and I think I managed to
reproduce your problem.  It came down to a bug in filePath() of
R.utils that did not handle pathnames over networks with *backslashes*
at the beginning.  For instance, "RES2K3FS01/..." became
"\\RES2K3FS01/..." which in turn was interpreted as "/RES2K3FS01/..."
and that is not a known/valid path in Windows.

I've fixed this in R.utils v1.1.1.  Please install the new version by:

source("http://www.braju.com/R/hbLite.R";);
hbLite("R.utils");

and let me know if this solves the problem.  Then I'll commit the
updates to CRAN.

Thanks

/Henrik

>
> On Dec 2, 4:22 pm, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> thanks.  Almost there.  Unfortunately, you left out the one answer I
>> was most interest in; see below.
>>
>>
>>
>> On Tue, Dec 2, 2008 at 12:34 PM, pwhiteusa <[EMAIL PROTECTED]> wrote:
>>
>> > Hi Henrik,
>>
>> > The results are interspersed below:
>>
>> > On Nov 26, 4:27 pm, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
>> >> Hi,
>>
>> >> so this is a bit confusing.  I've been trying to reproduce it myself
>> >> over a local Windows network, but I cannot.  I did however locate a
>> >> potential problem where R incorrectly believes it does not have the
>> >> permission to read the file, which might be what you are experience.
>>
>> >> Could you please try the following?  If you get an error, please
>> >> report what traceback() gives.
>>
>> >> library("aroma.affymetrix");
>>
>> >> # This should be the absolute pathname to the CDF file
>> >> pathname <- 
>> >> "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf"
>>
>> >> # Assert that the path is correct
>> >> stopifnot(isDirectory(dirname(pathname)));
>>
>> > TRUE
>>
>> >> # Get some file info
>> >> print(as.list(file.info(pathname)));
>>
>> > $size
>> > [1] 18486014
>>
>> > $isdir
>> > [1] FALSE
>>
>> > $mode
>> > [1] "666"
>>
>> > $mtime
>> > [1] "2008-10-23 10:26:09 EDT"
>>
>> > $ctime
>> > [1] "2008-11-20 12:45:36 EST"
>>
>> > $atime
>> > [1] "2008-11-29 00:30:18 EST"
>>
>> > $exe
>> > [1] "no"
>>
>> >> # Check file permissions
>
> print(file.access(pathname, mode=0));
>
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
>
> 0
>
> print(file.access(pathname, mode=1));
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
>
> -1
> print(file.access(pathname, mode=2));
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
>
> 0
> print(file.access(pathname, mode=4));
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
>
> 0
>
>
>> > All return the same:
>>
>> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
>> > chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf
>>
>> HERE: That is the "name" of the result. They also return either 0 or
>> -1.  What is the value for each of them?  My hypothesis is that they
>> return an incorrect value (-1).  If so, I know why and have a
>> workaround.
>>
>> /Henrik
>>
>>
>>
>> >> # Try to read the CDF reader using affxparser
>> >> hdr <- readCdfHeader(pathname);
>> >> str(hdr);
>>
>> > List of 12
>> >  $ ncols  : int 1050
>> >  $ nrows  : int 1050
>> >  $ nunits : int 35512
>> >  $ nqcunits   : int 1
>> >  $ refseq : chr ""
>> >  $ chiptype   : chr "RES2K3FS01/RESAffymetrix/ANNOTATION/
>> > aromaAffymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-
>> > st-v1,r3"
>> >  $ filename   : chr "RES2K3FS01/RESAffymetrix/ANNOTATION/
>> > aromaAffymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-
>> > st-v1,r3.cdf"
>> >  $ rows   : int 1050
>> >  $ cols   : int 1050
>> >  $ probesets  : int 35512
>> >  $ qcprobesets: int 1
>> >  $ reference  : chr ""
>>
>> >> cdf <- AffymetrixCdfFile(pathname);
>> >> print(cdf);
>>
>> > AffymetrixCdfFile:
>> > Path: \\RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
>> > annotationData/chipTypes/MoGene-1_0-st-v1
>> > Filename: MoGene-1_0-st-v1,r3.cdf
>> > Filesize: 17.63MB
>> > Chip type: RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
>> > annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3
>> > RAM: 0.00MB
>> > File format: v4 (binary; XDA)
>> > Dimension: 1050x1050
>> > Number of cells: 1102500
>> > Number of units: 35512
>> > Cells per unit: 31.05
>> > Number of QC units: 1
>>
>> >> pathname2 <- AffymetrixCdfFile$findByChipType("MoGene-1_0-st-v1",