[aroma.affymetrix] Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-11-26 Thread Mark Robinson

Hi all.

Just to follow up on these comments here.

'fitPLM' with default parameters in the affyPLM package should give  
practically identical results to the 'standard' pipeline (RMA bg corr  
+ quantile + fit) within aroma.affymetrix, assuming the underlying  
annotation is the same.  This was an easy comparison back in the day  
of 3' IVT arrays.  Now, its a little more difficult.

If anyone is willing, I'd be keen to know if these two actually do  
give the same results on the newer chips i.e. is the underlying  
annotation the same?  I seem to recall that because these newer chips  
occasionally have probes that are shared amongst different probesets,  
that the older style affy package would not be able to handle it.  For  
example, Jim MacDonald's post:

http://article.gmane.org/gmane.science.biology.informatics.conductor/19184

Jim says there that you "don't want to use affy" for these chips (not  
100% sure why).  He suggests the whole pdInfoBuilder/oligo thing which  
at one time had some bugs, but is probably more stable now.  I haven't  
dug deeper as to whether the annotation that 'fitPLM' uses by default  
('hugene10st.db' presumably?) matches the annotation that would be  
used by aroma.affymetrix (the converted-to-binary 'unsupported' CDF  
file).  I know Mark Cowley did find some inconsistencies:

http://article.gmane.org/gmane.science.biology.informatics.conductor/19738

This makes me think that we may want to alternatively construct the  
XXGene 1.0 CDFs (XX=Hu,Mo,Ra) directly from the PGF/CLF files instead  
of from the unsupported CDF.  I suspect that there will only be minor  
changes, so I haven't looked too deeply into it.

Anyone want to check?

In addition to what Henrik says about flavour="affyPLM" ... for a lot  
of my work, there is definitely additional value in using the  
auxiliary information from the PLMs (i.e. weights, residuals) ... you  
don't get this directly with oligo/median polish.


Few more specific notes ...

>> library(affy)
>> TestData <- ReadAffy()
>> TestEset <- rma(TestData)
>>
>> If you plot(AromaEset[,1],TestEset[,1]) you can visualize how
>> different the data is.


I assume you ensured the probesets and samples are in the same order?   
(Or, is this somehow covered by the plot method ...)  I can't tell  
from this sequence of commands.  I don't know what this plot looks  
like, so I don't whether to be alarmed or not.


>> However, I noticed that my GeneST normalized data is quite different
>> from the data that I produce using the Affy package. When looking at
>> the controls on the array I see that the Aroma normalized data is
>> between 5-10% lower than that produced by the Bioconductor Affy
>> packages. However, for some probes this difference is can be quite
>> large (values are averaged across 16 samples):
>>
>> ProbeBionconductorAroma
>> 10341096 (neg_control)   6079 852
>> 10341735 (neg_control)   25   87
>> 10340969 (pos_control)   3953 1758
>> 10338477 (pos_control)   293  611
>>
>> Ultimately, when my downstream analysis looks for differential
>> expression the differences between the two analysis approaches become
>> minimal, but I did notice that the Aroma package seems to call more
>> control probes and probes with no known gene as being differentially
>> expressed (i.e. it looks noisier). These probes were all pretty close
>> to my 2 fold cutoff, and at 3 fold or greater the data looked the
>> same.


Tough to really tell.  You'd probably want to average on the log2  
scale, not the linear scale.  This could be a difference in median- 
polish versus robust PLM, or could be a differnce in annotation.   
Requires some digging.

Also, you may want to remove control probesets before doing DE  
analysis --- for one thing, you pay a slightly lesser penalty for  
multiple testing.

Cheers,
Mark


--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: [EMAIL PROTECTED]
e: [EMAIL PROTECTED]
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--





--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: Generation of annotation for custom/unknown cel/cdfs

2008-11-26 Thread Henrik Bengtsson

Hi.

On Wed, Nov 26, 2008 at 2:08 PM, David Rosenberg
<[EMAIL PROTECTED]> wrote:
> Thank you much for your help.  A couple of follow-ups:
> 1. In terms of 'x', 'y' and 'other' chromosomes (and positions), should I
> just assign 'dummy' values? Should any be left NA?

For the human genome, I map ChrX=23 and ChrY=24.  This seems to be the
de facto standard.   For mitochondrial DNA, I've assigned it ChrM=25.
I don't know of a standard for this.

It is ok to have NAs, that is, for those units you cannot map
estimates to the genome.  They will not show up in the
ChromosomeExplorer and they will not be part of any segmentation
either and so.

> 2. Lacking a NetAffx csv file, how might I determine fragment lengths?

I have no aces for this.  If you know the restriction enzymes used,
you can find out at what locations they digest the DNA.  This
information will define the lengths of all digested fragments.  Then
you have to map the units in your CDF to these fragments.   I haven't
done this myself and you will have to turn to external resources for
identifying the locations etc.

You could also try to contact Affymetrix to see if they have this
information for your custom CDF.

Cheers

/Henrik

> Thanks again.
>
> On Nov 26, 2008 1:12 AM, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
>
>
> Hi.
>
> On Tue, Nov 25, 2008 at 10:56 AM, David Rosenberg
> <[EMAIL PROTECTED]> wrote: > > I apolog...
>
> The UGP and UFL files are handled by the AromaUgpFile and the
> AromaUflFile classes which are both subclasses of the
> AromaTabularBinaryFile which always defines the underlying binary file
> format (see
> http://groups.google.com/group/aroma-affymetrix/web/aromatabularbinaryfile).
>
> See Page 'Aroma Unit Genome Position (UGP) files'
> (http://groups.google.com/group/aroma-affymetrix/web/unit-genome-position-ugp-files)
> for examples how to import NetAffx CSV data or other data to UGP files
> or manually assign the values.  Note that the records of an UGP file
> (and also an UFL file) are indexed by units, that is, each row maps to
> the corresponding unit (with the same index) in the CDF.  Since your
> unit names (we avoid the term probeset because a unit can contain
> multiple groups/sets of probes) have format
> -- you can infer the (unit,
> chromosome, position) from the unit names alone, in case you don't
> have an NetAffx CSV file.  For example:
>
> unitNames <- getUnitNames(cdf);
> pattern <- "(.*)-(.*)-(.*)";
> chr <- gsub(pattern, "\\2", unitNames);
> pos <- gsub(pattern, "\\3", unitNames);
>
> chr <- as.integer(chr);  # If there are 'X' and 'Y' you have to deal
> with those too.
> pos <- as.integer(pos);
>
> Then, allocate an empty UGP file and assign (chr, pos) to it:
>
> ugp <- AromaUgpFile$allocateFromCdf(cdf, tags="mm33,DM20081126");
> ugp[,1] <- chr;
> ugp[,2] <- pos;
>
> That should do.
>
> You can do similar things for your UFL file.  See Page 'Unit Fragment
> Length (UFL) files'
> (http://groups.google.com/group/aroma-affymetrix/web/unit-fragment-length-ufl-files)
> for further details.
>
> Hope this helps
>
> Henrik
>
>> > > > >
>

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: Generation of annotation for custom/unknown cel/cdfs

2008-11-26 Thread David Rosenberg


Thank you much for your help.  A couple of follow-ups:
1. In terms of 'x', 'y' and 'other' chromosomes (and positions), should I 
just assign 'dummy' values? Should any be left NA?
2. Lacking a NetAffx csv file, how might I determine fragment lengths?
Thanks again.

On Nov 26, 2008 1:12 AM, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:


Hi.

On Tue, Nov 25, 2008 at 10:56 AM, David Rosenberg <
[EMAIL PROTECTED]> wrote: > > I apolog...
The UGP and UFL files are handled by the AromaUgpFile and the
AromaUflFile classes which are both subclasses of the
AromaTabularBinaryFile which always defines the underlying binary file
format (see 
http://groups.google.com/group/aroma-affymetrix/web/aromatabularbinaryfile).

See Page 'Aroma Unit Genome Position (UGP) files'
(
http://groups.google.com/group/aroma-affymetrix/web/unit-genome-position-ugp-files
)
for examples how to import NetAffx CSV data or other data to UGP files
or manually assign the values.  Note that the records of an UGP file
(and also an UFL file) are indexed by units, that is, each row maps to
the corresponding unit (with the same index) in the CDF.  Since your
unit names (we avoid the term probeset because a unit can contain
multiple groups/sets of probes) have format
-- you can infer the (unit,
chromosome, position) from the unit names alone, in case you don't
have an NetAffx CSV file.  For example:

unitNames <- getUnitNames(cdf);
pattern <- "(.*)-(.*)-(.*)";
chr <- gsub(pattern, "\\2", unitNames);
pos <- gsub(pattern, "\\3", unitNames);

chr <- as.integer(chr);  # If there are 'X' and 'Y' you have to deal
with those too.
pos <- as.integer(pos);

Then, allocate an empty UGP file and assign (chr, pos) to it:

ugp <- AromaUgpFile$allocateFromCdf(cdf, tags="mm33,DM20081126");
ugp[,1] <- chr;
ugp[,2] <- pos;

That should do.

You can do similar things for your UFL file.  See Page 'Unit Fragment
Length (UFL) files'
(
http://groups.google.com/group/aroma-affymetrix/web/unit-fragment-length-ufl-files
)
for further details.

Hope this helps

Henrik

> > > > --~--~-~--~~~---~--~~ When reporting 
problems on aroma.af...


--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: annotationData over network using Windows Shortcuts *.lnk (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-11-26 Thread Henrik Bengtsson

Hi,

so this is a bit confusing.  I've been trying to reproduce it myself
over a local Windows network, but I cannot.  I did however locate a
potential problem where R incorrectly believes it does not have the
permission to read the file, which might be what you are experience.

Could you please try the following?  If you get an error, please
report what traceback() gives.

library("aroma.affymetrix");

# This should be the absolute pathname to the CDF file
pathname <- 
"//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/chipTypes/MoGene-1_0-st-v1/MoGene-1_0-st-v1,r3.cdf"

# Assert that the path is correct
stopifnot(isDirectory(dirname(pathname)));

# Get some file info
print(as.list(file.info(pathname)));

# Check file permissions
print(file.access(pathname, mode=0));
print(file.access(pathname, mode=1));
print(file.access(pathname, mode=2));
print(file.access(pathname, mode=4));

# Try to read the CDF reader using affxparser
hdr <- readCdfHeader(pathname);
str(hdr);

cdf <- AffymetrixCdfFile(pathname);
print(cdf);

pathname2 <- AffymetrixCdfFile$findByChipType("MoGene-1_0-st-v1",
tags="r3", verbose=-100);
print(pathname2);

print(as.list(file.info(pathname2)));

cdf <- AffymetrixCdfFile$byChipType("MoGene-1_0-st-v1", tags="r3",
verbose=-100);
print(pathname2);


/Henrik


On Wed, Nov 26, 2008 at 8:46 AM, pwhiteusa <[EMAIL PROTECTED]> wrote:
>
> Hi Henrik,
>
> This was the output this time around:
>
>> path2 <- filePath(path, "chipTypes", "MoGene-1_0-st-v1")
>> print(path2);
> [1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> annotationData/chipTypes/MoGene-1_0-st-v1"
>> print(file.info(path2));
>
> size
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v10
>
> isdir
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1  TRUE
>
> mode
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1  777
>
> mtime
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1 2008-11-20 12:45:36
>
> ctime
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1 2008-11-20 12:45:36
>
> atime
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1 2008-11-25 18:07:38
>
> exe
> //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> chipTypes/MoGene-1_0-st-v1  no
>
>> print(list.files(path=path2));
> [1] "MoGene-1_0-st-v1,r3.cdf"
>
> Cheers,
>
> Peter
>
>
> On Nov 24, 2:59 pm, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> one typo by me, please retry with:
>>
>> path2 <- filePath(path, "chipTypes", "MoGene-1_0-st-v1");
>>
>> [I forgot 'chipTypes']
>>
>> /Henrik
>>
>> On Mon, Nov 24, 2008 at 11:56 AM, pwhiteusa <[EMAIL PROTECTED]> wrote:
>>
>> > The answers are dispersed below
>>
>> >> > I just wanted to update you on the issue of creating Windows shortcuts
>> >> > to point to the correct directory.
>>
>> >> > So the CDF files I want to use is on a mapped network drive - it is
>> >> > mapped to my "P Drive":
>>
>> >> > P:\ANNOTATION\aromaAffymetrix\annotationData\chipTypes\HuGene-1_0-st-
>> >> > v1\HuGene-1_0-st-v1,r3.cdf
>>
>> >> > I created the following shortcut in my working experiment directory,
>> >> > "P:\EXPERIMENTS\NCHRI\TEST\CEL\annotationData.lnk", with the following
>> >> > in the Target box:
>>
>> >> > P:\ANNOTATION\aromaAffymetrix\annotationData
>>
>> >> Did you create that by "drag'n'drop' by a right clicking or did you
>> >> edit the "Target box" manually?  Always do the drag'n'drop, because
>> >> the property box of a Windows Shortcut does not show all internal
>> >> fields or the shortcut and editing it is not safe.
>>
>> > Yes, I used drag and drop. I just checked the Target box to confirm.
>>
>> >> > When I type in the command
>>
>> >> > cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3")
>>
>> >> > it throws an error because it is unable to find the file. Displaying
>> >> > the path reveals:
>>
>> >> >> Arguments$getReadablePath("annotationData")
>> >> > [1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
>> >> > annotationData"
>>
>> >> What happens if you do:
>>
>> >> path <- Arguments$getReadablePath("annotationData");
>> >> print(list.files(path=path));
>>
>> > $ path
>> > [1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
>> > annotationData"
>> > $ print(list.files(path=path))
>> > [1] "chipTypes"
>>
>> >> path2 <- filePath(path, "MoGene-1_0-st-v1");
>> >> print(path2);
>> >> print(file.info(path2));
>> >> print(list.files(path=path2));
>>
>> > $ print(path2);
>> > [1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
>> > annotationData/MoGene-1_0-st-v1"
>> > $ print(file.info(path2));
>>
>> > size
>> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
>> > MoGene-1_0-st-v1   NA
>>
>> > isdir
>> > //R

[aroma.affymetrix] Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-11-26 Thread Henrik Bengtsson

Hi.

a comment on RmaPlm and argument 'flavor':  The RmaPlm class is only
summarizing the probe signals - normalization etc are done before
RmaPlm.   The summarization model is the log-additive model with probe
affinities and chip effects.   The 'flavor' argument specifies which
implementation of the fitting algorithm to use.

The default is "affyPLM", which indicates that it uses the
implementation in 'affyPLM', which now has moved to the
'preprocessCore' package.  Both 'affyPLM' and 'preprocessCore' are
developed and maintained by Ben Bolstad.

The "oligo" flavor was using the fitting algorithm in the 'oligo'
package, which I think in turn originates from the 'affy' package,
which in turn was an early version of Ben Bolstad's code.  I haven't
tried this in awhile, and it appears that 'oligo' has been updated and
the internal function for fitting the model is no longer available.
This is why you get the error message.

Since BB maintain preprocessCore and there has been a lot of
improvements in the algorithm (e.g. it supports probe-specific weights
and much more), I recommend that you use the "affyPLM" flavor.  Since
this is default, you do not have to specify the argument 'flavor' at
all.

For reproducibility of the RMA pipeline in aroma.affymetrix when
compared with 'affyPLM', see Page 'Reproducibility of other
implementations':

  http://groups.google.com/group/aroma-affymetrix/web/redundancy-tests

As you see, the results are remarkable similar.  FYI, for each update
of the package this comparison is part of the redundancy testing.
Note that only PMs are quantile normalized, this might be a reason for
you observing differences.  There is more meat on how
QuantileNormalization and plotDensity() behaves on Page 'Empirical
probe-signal densities and rank-based quantile normalization':

  
http://groups.google.com/group/aroma-affymetrix/web/empirical-probe-signal-densities-and-rank-based-quantile-normalization

I am also not sure how similar affyPLM and Affy is, but I trust
affyPLM more than Affy, because I see affyPLM as a more up to date
version of Affy.  Please feel free to contribute with code for
comparing toward Affy.  If you do, please use the same data set.

Cheers

Henrik


On Wed, Nov 26, 2008 at 12:44 PM, pwhiteusa <[EMAIL PROTECTED]> wrote:
>
> Hi Manasa,
>
> So I get exactly the same results with either of the following:
>
> plm <- RmaPlm(csN, flavor="affyPLM") #according to the documentation
> this is the default
> plm <- ExonRmaPlm(csN, flavor = "affyPLM", mergeGroups=TRUE) #seems to
> take longer to run
>
> However, I noticed that my GeneST normalized data is quite different
> from the data that I produce using the Affy package. When looking at
> the controls on the array I see that the Aroma normalized data is
> between 5-10% lower than that produced by the Bioconductor Affy
> packages. However, for some probes this difference is can be quite
> large (values are averaged across 16 samples):
>
> ProbeBionconductorAroma
> 10341096 (neg_control)   6079 852
> 10341735 (neg_control)   25   87
> 10340969 (pos_control)   3953 1758
> 10338477 (pos_control)   293  611
>
> Ultimately, when my downstream analysis looks for differential
> expression the differences between the two analysis approaches become
> minimal, but I did notice that the Aroma package seems to call more
> control probes and probes with no known gene as being differentially
> expressed (i.e. it looks noisier). These probes were all pretty close
> to my 2 fold cutoff, and at 3 fold or greater the data looked the
> same.
>
> Do you know where the packages diverge if their probe summation
> approach or is it a difference in background correction? I did attempt
> to use the flavor="oligo" (as described in the ?RmaPlm help file) but
> this returned the following error:
>
>> plm2 <- RmaPlm(csN, flavor="oligo")
>> fit(plm2)
> Exception: The fit function for requested RMA PLM flavor failed: oligo
>
> Thanks,
>
> Peter
>
> P.S. Here is a summary of the code I used for the Bioconductor
> approach:
>
> library(affy)
> TestData <- ReadAffy()
> TestEset <- rma(TestData)
>
> If you plot(AromaEset[,1],TestEset[,1]) you can visualize how
> different the data is.
>
> On Nov 24, 11:42 pm, ManasaR <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> Just to say that i've been working with some GeneST1.0 data following
>> the tips above and it has been a breeze thanks to you guys. I noticed
>> something and thought i should mention it - though it might be trivial
>> and most users might have already discovered it. Just in case there
>> are people like me out there :-)
>>
>> With regards to the first reply Mark made above and the following
>> point,
>>
>> >1. Process Gene 1.0 ST data much the same as Exon array data, except
>> >that you'll need to replace 'ExonRmaPlm' with 'RmaPlm'
>>
>> RmaPlm does not have "mergeGroups" as an argument. Im assuming t

[aroma.affymetrix] Re: any way to get the name of the SNP as rs#?

2008-11-26 Thread Henrik Bengtsson

Hi,

On Wed, Nov 26, 2008 at 7:46 AM, mortiz <[EMAIL PROTECTED]> wrote:
>
> hi everyone,
>
> i was wondering if there is any function which gives as a result the
> names of the SNPs in their "rs#" form?

this mapping information is available in the Affymetrix NetAffx CSV
files, which you can download from the Affymetrix support page for
each chip type.  You find links from the chip type pages on the
aroma.affymetrix site.

An NetAffx CSV file is a large tab-delimited file. Put in under to
corresponding annotationData/ directory and aroma.affymetrix will be
able to find it.  Here is an example how to read such files in
aroma.affymetrix:

The pathname of the NetAffx CSV file is:
annotationData/chipTypes/Mapping250K_Nsp/Affymetrix/Mapping250K_Nsp.na26.annot.csv

library("aroma.affymetrix");
chipType <- "Mapping250K_Nsp";
csv <- AffymetrixNetAffxCsvFile$byChipType(chipType, tags=".na26");
print(csv);

AffymetrixNetAffxCsvFile:
Name: Mapping250K_Nsp.na26.annot
Tags:
Pathname: 
annotationData/chipTypes/Mapping250K_Nsp/Affymetrix/Mapping250K_Nsp.na26.annot.csv
File size: 426.48MB
RAM: 0.03MB
Number of data rows: NA
Columns [26]: 'probeSetID', 'affySNPID', 'dbSNPRSID', 'chromosome', 'physicalPos
ition', 'strand', 'chrXPseudo-autosomalRegion1', 'cytoband', 'flank', 'alleleA',
 'alleleB', 'associatedGene', 'geneticMap', 'microsatellite', 'fragmentEnzymeTyp
eLengthStartStop', 'alleleFrequencies', 'heterozygousAlleleFrequencies', 'number
Ofindividuals/NumberOfChromosomes', 'inHapmap', 'strandVersusdbSNP', 'copyNumber
Variation', 'probeCount', 'chrXPseudo-autosomalRegion2', 'inFinalList', 'minorAl
lele', 'minorAlleleFrequency'
Number of text lines: NA

# We only want to read two of the columns.  We use regular expression
to identify
# them by their names, and we read them as "character" strings.  Everything else
# will be ignored (the corresponding 'colClasses' will be set to "NULL").
colClassPatterns <- c("(probeSetID|dbSNPRSID)"="character");
df <- readDataFrame(csv, colClassPatterns=colClassPatterns, rows=10:20);
print(df);

  probeSetID  dbSNPRSID
10 SNP_A-1780611  rs7660291
11 SNP_A-1780610  rs1340013
12 SNP_A-1780415  rs7730126
13 SNP_A-1780576  rs6675133
14 SNP_A-1780413 rs10834942
15 SNP_A-1780412 rs17419765
16 SNP_A-1780574 rs12233450
17 SNP_A-1780379  rs7995987
18 SNP_A-1780572 rs10091369
19 SNP_A-1780378  rs7828844
20 SNP_A-1780377 rs11040883

To read all entries, just drop the 'rows' argument.

Hope this helps

Henrik



>
> thanks!
>
> maria
> >
>

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: How do you analyze Gene ST Data?

2008-11-26 Thread pwhiteusa

Hi Manasa,

So I get exactly the same results with either of the following:

plm <- RmaPlm(csN, flavor="affyPLM") #according to the documentation
this is the default
plm <- ExonRmaPlm(csN, flavor = "affyPLM", mergeGroups=TRUE) #seems to
take longer to run

However, I noticed that my GeneST normalized data is quite different
from the data that I produce using the Affy package. When looking at
the controls on the array I see that the Aroma normalized data is
between 5-10% lower than that produced by the Bioconductor Affy
packages. However, for some probes this difference is can be quite
large (values are averaged across 16 samples):

ProbeBionconductorAroma
10341096 (neg_control)   6079 852
10341735 (neg_control)   25   87
10340969 (pos_control)   3953 1758
10338477 (pos_control)   293  611

Ultimately, when my downstream analysis looks for differential
expression the differences between the two analysis approaches become
minimal, but I did notice that the Aroma package seems to call more
control probes and probes with no known gene as being differentially
expressed (i.e. it looks noisier). These probes were all pretty close
to my 2 fold cutoff, and at 3 fold or greater the data looked the
same.

Do you know where the packages diverge if their probe summation
approach or is it a difference in background correction? I did attempt
to use the flavor="oligo" (as described in the ?RmaPlm help file) but
this returned the following error:

> plm2 <- RmaPlm(csN, flavor="oligo")
> fit(plm2)
Exception: The fit function for requested RMA PLM flavor failed: oligo

Thanks,

Peter

P.S. Here is a summary of the code I used for the Bioconductor
approach:

library(affy)
TestData <- ReadAffy()
TestEset <- rma(TestData)

If you plot(AromaEset[,1],TestEset[,1]) you can visualize how
different the data is.

On Nov 24, 11:42 pm, ManasaR <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Just to say that i've been working with some GeneST1.0 data following
> the tips above and it has been a breeze thanks to you guys. I noticed
> something and thought i should mention it - though it might be trivial
> and most users might have already discovered it. Just in case there
> are people like me out there :-)
>
> With regards to the first reply Mark made above and the following
> point,
>
> >1. Process Gene 1.0 ST data much the same as Exon array data, except
> >that you'll need to replace 'ExonRmaPlm' with 'RmaPlm'
>
> RmaPlm does not have "mergeGroups" as an argument. Im assuming this
> might be a default but do correct me if im wrong since i do want
> values for each transcript separately. So if you have error messages
> at this point,
>
> >plmTr <- RmaPlm(csN, mergeGroups=TRUE)
> >print(plmTr)
>
> this is probably why
>
> Also, i declared flavour = "affyPLM" since that's what i was using to
> analyse this data outside of aroma.affymetrix and i wasnt sure what
> the default was.
>
> Cheers,
>
> Manasa
>
> On Nov 25, 6:59 am, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
>
> > one typo by me, please retry with:
>
> > path2 <- filePath(path, "chipTypes", "MoGene-1_0-st-v1");
>
> > [I forgot 'chipTypes']
>
> > /Henrik
>
> > On Mon, Nov 24, 2008 at 11:56 AM, pwhiteusa <[EMAIL PROTECTED]> wrote:
>
> > > The answers are dispersed below
>
> > >> > I just wanted to update you on the issue of creating Windows shortcuts
> > >> > to point to the correct directory.
>
> > >> > So the CDF files I want to use is on a mapped network drive - it is
> > >> > mapped to my "P Drive":
>
> > >> > P:\ANNOTATION\aromaAffymetrix\annotationData\chipTypes\HuGene-1_0-st-
> > >> > v1\HuGene-1_0-st-v1,r3.cdf
>
> > >> > I created the following shortcut in my working experiment directory,
> > >> > "P:\EXPERIMENTS\NCHRI\TEST\CEL\annotationData.lnk", with the following
> > >> > in the Target box:
>
> > >> > P:\ANNOTATION\aromaAffymetrix\annotationData
>
> > >> Did you create that by "drag'n'drop' by a right clicking or did you
> > >> edit the "Target box" manually?  Always do the drag'n'drop, because
> > >> the property box of a Windows Shortcut does not show all internal
> > >> fields or the shortcut and editing it is not safe.
>
> > > Yes, I used drag and drop. I just checked the Target box to confirm.
>
> > >> > When I type in the command
>
> > >> > cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3")
>
> > >> > it throws an error because it is unable to find the file. Displaying
> > >> > the path reveals:
>
> > >> >> Arguments$getReadablePath("annotationData")
> > >> > [1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> > >> > annotationData"
>
> > >> What happens if you do:
>
> > >> path <- Arguments$getReadablePath("annotationData");
> > >> print(list.files(path=path));
>
> > > $ path
> > > [1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> > > annotationData"
> > > $ print(list.files(path=path))
> > > [1] "c

[aroma.affymetrix] Update on reported issues with generating PNGs using the built-in cairo device

2008-11-26 Thread Henrik Bengtsson

Hi,

previously a few people have reported problems when generating PNGs
using the new built-in cairo device.   From the NEWS file of R v2.8.0
*patched* [http://cran.r-project.org/bin/windows/base/NEWS.R-2.8.0pat]
their is one bug fix that might fix this problem:

BUG FIXES
...
o   bmp(), jpeg(), and png() on unix with type="cairo" would segfault
on closure if the output file could not be opened.

So, in case you experience segfaults/core dumps when running the
ChromosomeExplorer or ArrayExplorer, try to update to a newer version
of R v2.8.0 *patched*.  If anyone notices that this solves the
problem, please let us know.

Cheers

Henrik

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: Discussion on affymetrix-defined-transcript-clusters

2008-11-26 Thread Joe Fass

Thanks Henrik and Mark,

I had found the comment at the dChip forum that you found, Henrik, but
wasn't convinced ... I'll try contacting Affymetrix and try doing some
analysis, and post back here if that clears this up.

~Joe


On Nov 25, 6:19 pm, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
> I found this comment by Cheng Li at the dChip forum: "You can rename
> "HuEx-1_0-st-v2.pgf" to "HuEx-1_0-st-v1.pgf". I understand the two
> should be for the same physical array, but just different annotation
> of probes into probeset."
> [http://dchip.forum5.com/viewtopic.php?t=214&mforum=dchip]
>
> More a long the same lines from Affymetrix: "changed the chiptype
> setting in the CEL files from HuEx-1_0-st-v1 to HuEx-1_0-st-v2: This
> should mitigate chiptype mismatches when using 3rd party software with
> the unsupported CDF file."
> [http://www.affymetrix.com/analysis/netaffx/exon/announcements.affx]
>
> Don't take it is a fact, but as a start.  I'd say, send an email to
> Affymetrix support (see their webpage) for a definite answer.  If you
> find out more about this, please let us know.
>
> Also, at some stage there seems to have existed a 'HuEx-1_0-st-ta1'
> chip type as well, 
> cf.http://www.affymetrix.com/support/developer/powertools/changelog/file...
>
> Cheers
>
> Henrik
>
> On Tue, Nov 25, 2008 at 5:36 PM, Mark Robinson <[EMAIL PROTECTED]> wrote:
>
> > Hi Joe.
>
> > While I've analyzed several Exon array datasets, I have never seen any
> > HuEx-1_0-st-v1 CEL files. Did you get a very early version of these
> > chips?  From what I can tell from the Affymetrix web site, there is no
> > annotation or anything available for this.  But, typically the
> > annotation from Affy is versioned separately.  For example, from:
>
> >http://www.affymetrix.com/products_services/arrays/specific/exon.affx...
>
> > you can see that the 'Annotations CSV' has version na21, na22, etc.
> > This seems to be quite separate from the v1/v2 notation.
>
> > I know Ken had analyzed some data from an earlier version of the Exon
> > chip, but that was not made commerically available.
>
> > All this to say:  I don't know. :)
>
> > Cheers,
> > Mark
>
> > On 26/11/2008, at 12:22 PM, Henrik Bengtsson wrote:
>
> >> I'll let the experts on exon arrays answer this, but for clarification
> >> this question is related to page:
>
> >>http://groups.google.com/group/aroma-affymetrix/web/affymetrix-define...
>
> >> /Henrik
>
> >> On Tue, Nov 25, 2008 at 4:57 PM, Joe Fass <[EMAIL PROTECTED]>
> >> wrote:
>
> >>> This may not be the best place to ask this, but can I ask for a quick
> >>> clarification about the v1 / v2 notation?  Does this simply refer to
> >>> more updated annotations?  Is it valid to use the v2 cdf for v1 cel
> >>> files?
>
> > --
> > Mark Robinson
> > Epigenetics Laboratory, Garvan
> > Bioinformatics Division, WEHI
> > e: [EMAIL PROTECTED]
> > e: [EMAIL PROTECTED]
> > p: +61 (0)3 9345 2628
> > f: +61 (0)3 9347 0852
> > --
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: How do you analyze Gene ST Data?

2008-11-26 Thread pwhiteusa

Hi Henrik,

This was the output this time around:

> path2 <- filePath(path, "chipTypes", "MoGene-1_0-st-v1")
> print(path2);
[1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
annotationData/chipTypes/MoGene-1_0-st-v1"
> print(file.info(path2));
 
size
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v10
 
isdir
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1  TRUE
 
mode
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1  777
 
mtime
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1 2008-11-20 12:45:36
 
ctime
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1 2008-11-20 12:45:36
 
atime
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1 2008-11-25 18:07:38
 
exe
//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
chipTypes/MoGene-1_0-st-v1  no

> print(list.files(path=path2));
[1] "MoGene-1_0-st-v1,r3.cdf"

Cheers,

Peter


On Nov 24, 2:59 pm, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> one typo by me, please retry with:
>
> path2 <- filePath(path, "chipTypes", "MoGene-1_0-st-v1");
>
> [I forgot 'chipTypes']
>
> /Henrik
>
> On Mon, Nov 24, 2008 at 11:56 AM, pwhiteusa <[EMAIL PROTECTED]> wrote:
>
> > The answers are dispersed below
>
> >> > I just wanted to update you on the issue of creating Windows shortcuts
> >> > to point to the correct directory.
>
> >> > So the CDF files I want to use is on a mapped network drive - it is
> >> > mapped to my "P Drive":
>
> >> > P:\ANNOTATION\aromaAffymetrix\annotationData\chipTypes\HuGene-1_0-st-
> >> > v1\HuGene-1_0-st-v1,r3.cdf
>
> >> > I created the following shortcut in my working experiment directory,
> >> > "P:\EXPERIMENTS\NCHRI\TEST\CEL\annotationData.lnk", with the following
> >> > in the Target box:
>
> >> > P:\ANNOTATION\aromaAffymetrix\annotationData
>
> >> Did you create that by "drag'n'drop' by a right clicking or did you
> >> edit the "Target box" manually?  Always do the drag'n'drop, because
> >> the property box of a Windows Shortcut does not show all internal
> >> fields or the shortcut and editing it is not safe.
>
> > Yes, I used drag and drop. I just checked the Target box to confirm.
>
> >> > When I type in the command
>
> >> > cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3")
>
> >> > it throws an error because it is unable to find the file. Displaying
> >> > the path reveals:
>
> >> >> Arguments$getReadablePath("annotationData")
> >> > [1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> >> > annotationData"
>
> >> What happens if you do:
>
> >> path <- Arguments$getReadablePath("annotationData");
> >> print(list.files(path=path));
>
> > $ path
> > [1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> > annotationData"
> > $ print(list.files(path=path))
> > [1] "chipTypes"
>
> >> path2 <- filePath(path, "MoGene-1_0-st-v1");
> >> print(path2);
> >> print(file.info(path2));
> >> print(list.files(path=path2));
>
> > $ print(path2);
> > [1] "//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> > annotationData/MoGene-1_0-st-v1"
> > $ print(file.info(path2));
>
> > size
> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> > MoGene-1_0-st-v1   NA
>
> > isdir
> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> > MoGene-1_0-st-v1    NA
>
> > mode
> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> > MoGene-1_0-st-v1 
>
> > mtime
> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> > MoGene-1_0-st-v1  
>
> > ctime
> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> > MoGene-1_0-st-v1  
>
> > atime
> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> > MoGene-1_0-st-v1  
>
> > exe
> > //RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/annotationData/
> > MoGene-1_0-st-v1 
>
> > $ print(list.files(path=path2));
> > character(0)
> > Warning message:
> > In list.files(path = path2) :
> >  list.files: '//RES2K3FS01/RESAffymetrix/ANNOTATION/aromaAffymetrix/
> > annotationData/MoGene-1_0-st-v1' is not a readable directory
>
> >> > So it looks as though the issue is that "P:\" is being replaced by R
> >> > with the actual name of the Network drive. If I copy the
> >> > annotationData directory to my C:\ and create the shortcut to point
> >> > there it works just fine, but the arguments command returns "C:\":
>
> >> >> Arguments$getReadablePath("annotationData")
> >> > [1] "C:/annotationData"
>
> >> So, if I understand you correctly, you are saying that "shortcuting"
> >> to another location on the local C: drive works, but when you try to
> >> do the same for a drive unit (P:) mounted to a network drive, then you
> >> do not get the drive unit path, but th

[aroma.affymetrix] any way to get the name of the SNP as rs#?

2008-11-26 Thread mortiz

hi everyone,

i was wondering if there is any function which gives as a result the
names of the SNPs in their "rs#" form?

thanks!

maria
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---