Hi Henrik,

One more question about this array.
During the analysis (Step 4 - Normalization for PCR fragment-length
effect) I am getting the following error:

[2010-06-09 11:51:49] Exception: Cannot fit normalization function to
enzyme, because there are no (finite) data points that are unique to
this enzyme: 1
  at throw(Exception(...))
  at throw.default("Cannot fit normalization function to enzyme,
because there are no (finite) data points that are unique to this
enzyme: ", ee)
  at throw("Cannot fit normalization function to enzyme, because there
are no (finite) data points that are unique to this enzyme: ", ee)
  at normalizeFragmentLength.default(y, fragmentLengths = fl,
targetFcns = targetFcns, subsetToFit = subset, onMissing =
onMissing, ...)
  at normalizeFragmentLength(y, fragmentLengths = fl, targetFcns =
targetFcns, subsetToFit = subset, onMissing = onMissing, ...)
  at process.FragmentLengthNormalization(fln, verbose = verbose)

I think this is caused by the fact that the UFL file also contains
"theoretical" values for all fragments, so no single SNP/CN probe is
annotated as produced by only one enzyme.
What do you think about filtering out fragments bigger than 5 kb (or
even smaller)? They should not be amplified during the PCR anyway.
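The filtering suggested above amounts to blanking out any fragment-length value above the PCR cutoff, so that each unit's remaining lengths come from at most one enzyme. A minimal sketch of that step (plain Python with a made-up helper name and a plain-list representation; this is not the aroma.affymetrix API, where the edit would go through the UFL accessors):

```python
# Sketch of the proposed filtering: treat fragment lengths above a PCR
# cutoff (5 kb here) as missing, since such fragments would not be
# amplified anyway. Helper name and data layout are illustrative only.

MAX_PCR_LENGTH = 5000  # proposed 5 kb cutoff

def censor_long_fragments(lengths, cutoff=MAX_PCR_LENGTH):
    """Replace fragment lengths above `cutoff` with None (i.e. NA)."""
    return [n if n is not None and n <= cutoff else None for n in lengths]

# E.g. censor_long_fragments([614, 22095, 1146]) -> [614, None, 1146]
```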

Best Regards

Robert



On Jun 3, 3:22 pm, Henrik Bengtsson <h...@stat.berkeley.edu> wrote:
> On Thu, Jun 3, 2010 at 12:21 PM, Ivanek, Robert <robert.iva...@fmi.ch> wrote:
> > Hi Henrik,
>
> > I think you are right, the fragment sizes are theoretical ones. I
> > would guess that the long fragments are also reported because the
> > same SNP is present in a short fragment produced by the other enzyme.
>
> > Thank you very much for the patch.
>
> > Would you mind updating the MOUSEDIVm520650 chipType page and adding
> > the UGP and UFL files there?
>
> Ideally, users contribute the UGP and UFL files too, but this time I've
> done it myself since I had already done most of the work.  Please compare
> them to the files you built yourself.
>
> /Henrik
>
>
>
> > Best Regards
>
> > Robert
>
> > On Jun 2, 6:47 pm, Henrik Bengtsson <h...@stat.berkeley.edu> wrote:
> >> Hi.
>
> >> On Wed, Jun 2, 2010 at 11:16 AM, Ivanek, Robert <robert.iva...@fmi.ch> 
> >> wrote:
> >> > Hi Henrik,
>
> >> > I investigated the error a bit and found out that some of the
> >> > fragments reported in the NetAffx files are really long.
> >> > Why did they get a negative value of -32768 and not a positive one?
>
> >> Thanks for reporting.  It turns out to be a bug in aroma.core causing
> >> it to censor values into [-32767,32768], whereas it should have been
> >> [-32768,32767].  Thus, the fragment lengths that were too large were
> >> written as 32768, which when read back became -32768 (that's how
> >> signed integers wrap around when out of range).  They should have
> >> been written as 32767.
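The wrap-around described above is plain two's-complement truncation, and it can be reproduced directly; a small Python sketch (the helper name is made up for illustration):

```python
import struct

# Reinterpret the low 16 bits of an integer as a signed int16. This is
# what happens when an out-of-range value is written to a 16-bit column
# and read back: 32768 wraps around to -32768, while 32767 survives.
def as_int16(value):
    return struct.unpack('<h', struct.pack('<H', value & 0xFFFF))[0]

# as_int16(32767) stays 32767; as_int16(32768) wraps to -32768.
```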
>
> >> I have fixed this in the next release of aroma.core.  Until that is
> >> released, you can install a patch as explained in:
>
> >>  http://aroma-project.org/howtos/updateOrPatch
>
> >> With the patch, you will get correct censoring and more informative
> >> warnings, e.g.
>
> >> Warning messages:
> >> 1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = 
> >> hdr,  :
> >>   33 values to be assigned were out of range [-32768,32767] and
> >> therefore censored to fit the range. Of these, 33 values in
> >> [35102,655381] were too large.
> >> 2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr = 
> >> hdr,  :
> >>   21 values to be assigned were out of range [-32768,32767] and
> >> therefore censored to fit the range. Of these, 21 values in
> >> [50496,56758] were too large.
>
> >> About the very large fragment lengths:  My guess is that they are
> >> "theoretical" fragments lengths.  After running the PCR in the assay,
> >> very long fragments are not amplified and hence filtered out.  For the
> >> specific enzyme, you should not get any hybridization signal for very
> >> long fragments.  It is possible that you have signal from the cuts of
> >> the other enzyme.   Maybe someone else has a better explanation of why
> >> they are so long and still on the array?   You could also drop a
> >> message on the Affymetrix forums and ask.
>
> >> /Henrik
>
> >> > Robert
>
> >> > On Jun 1, 7:16 pm, "Ivanek, Robert" <robert.iva...@fmi.ch> wrote:
> >> >> Hi Henrik,
>
> >> >> Thanks for the answer and also the ACS file.
> >> >> I have one more question regarding the UFL file generation.
>
> >> >> I tried it using the NetAffx files and got the following warnings:
>
> >> >> R> ufl <- AromaUflFile$allocateFromCdf(cdf, nbrOfEnzymes=2,
> >> >> tags=c("na30", "RI20100601"))
> >> >> R> csv <- AffymetrixNetAffxCsvFile$byChipType(chipType, tags=".na30");
> >> >> R> units <- importFrom(ufl, csv);
> >> >> Warning messages:
> >> >> 1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr =
> >> >> hdr,  :
> >> >>   Values to be assigned were out of range [-32767,32768] and therefore
> >> >> censored to fit the range.
> >> >> 2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr =
> >> >> hdr,  :
> >> >>   Values to be assigned were out of range [-32767,32768] and therefore
> >> >> censored to fit the range.
>
> >> >> R> csv <- AffymetrixNetAffxCsvFile$byChipType(chipType,
> >> >> tags=".cn.na30");
> >> >> R> units <- importFrom(ufl, csv);
> >> >> Warning messages:
> >> >> 1: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr =
> >> >> hdr,  :
> >> >>   Values to be assigned were out of range [-32767,32768] and therefore
> >> >> censored to fit the range.
> >> >> 2: In updateDataColumn.AromaTabularBinaryFile(this, .con = con, .hdr =
> >> >> hdr,  :
> >> >>   Values to be assigned were out of range [-32767,32768] and therefore
> >> >> censored to fit the range.
>
> >> >> And the summary produces the following:
> >> >> R> summary(ufl)
> >> >>  length           length.02
> >> >>  Min.   :-32768   Min.   :-32768
> >> >>  1st Qu.:   614   1st Qu.:   541
> >> >>  Median :  1146   Median :   997
> >> >>  Mean   :  1601   Mean   :  1466
> >> >>  3rd Qu.:  2195   3rd Qu.:  2000
> >> >>  Max.   : 22095   Max.   : 30002
> >> >>  NA's   :230775   NA's   :230775
>
> >> >> Would you be so kind as to also build the UFL and UGP files?
>
> >> >> Best Regards
>
> >> >> Robert
>
> >> >> On May 30, 7:27 pm, Henrik Bengtsson <h...@stat.berkeley.edu> wrote:
>
> >> >> > Hi.
>
> >> >> > On Wed, May 26, 2010 at 3:24 PM, Ivanek, Robert 
> >> >> > <robert.iva...@fmi.ch> wrote:
> >> >> > > Dear Sir or Madam,
>
> >> >> > > I would like to analyse the copy number variation data from 
> >> >> > > Affymetrix
> >> >> > > Mouse Diversity Array. I have not found any information on your 
> >> >> > > website
> >> >> > > about this particular array.
>
> >> >> > I have created a page for this:
>
> >> >> > http://aroma-project.org/chipTypes/MOUSEDIVm520650
>
> >> >> > > I have tried to build the annotation files which are required
> >> >> > > by aroma, but without success. I have a few questions
> >> >> > > regarding that:
>
> >> >> > > 1: Is aroma.affymetrix able to analyse the "Mouse Diversity Array"?
>
> >> >> > Yes - there is no reason why it shouldn't: it uses a standard
> >> >> > CDF etc.  As you've noted, UGP (and UFL) files have not been
> >> >> > created by anyone yet.
>
> >> >> > For CN analysis, at least the UGP (genome positions) annotation data
> >> >> > file needs to be there.
>
> >> >> > > 2: I tried to build the "UGP" file directly from NetAffx annotation
> >> >> > > files using the code on your website, however I am getting the 
> >> >> > > following
> >> >> > > error.
>
> >> >> > > ##
> >> >> > > library("aroma.affymetrix")
> >> >> > > ##
> >> >> > > ## create UGP from NetAffx files
> >> >> > > cdf <- AffymetrixCdfFile$byChipType("MOUSEDIVm520650")
> >> >> > > ##
> >> >> > > ## Creates an empty UGP file for the CDF, if missing.
> >> >> > > ugp <- AromaUgpFile$allocateFromCdf(cdf, tags=c("na30", 
> >> >> > > "RI20100526"))
> >> >> > > ##
> >> >> > > ## Import NetAffx unit position data
> >> >> > > csv <- AffymetrixNetAffxCsvFile$byChipType("MOUSEDIVm520650",
> >> >> > > tags=".na30")
>
> >> >> > > Error in 
> >> >> > > list(`AffymetrixNetAffxCsvFile$byChipType("MOUSEDIVm520650",
> >> >> > > tags = ".na30")` = <environment>,  :
>
> >> >> > > [2010-05-26 15:11:00] Exception: File format error of the tabular 
> >> >> > > file
> >> >> > > ('annotationData/chipTypes/MOUSEDIVm520650/NetAffx/MOUSEDIVm520650.na30.annot.csv'):
> >> >> > >  \
> >> >> > > line 1 did not have 12 elements
> >> >> > >  at throw(Exception(...))
> >> >> > >  at throw.default("File format error of the tabular file ('",
> >> >> > > getPathname(this), "'): ", ex$message)
> >> >> > >  at throw("File format error of the tabular file ('",
> >> >> > > getPathname(this), "'): ", ex$message)
> >> >> > >  at value[[3]](cond)
> >> >> > >  at tryCatchOne(expr, names, parentenv, handlers[[1]])
> >> >> > >  at tryCatchList(expr, classes, parentenv, handlers)
> >> >> > >  at tryCatch({
> >> >> > >  at verify.TabularTextFile(this, ...)
> >> >> > >  at verify(this, ...)
> >> >> > >  at this(...)
> >> >> > >  at newInstance.Class(clazz, ...)
> >> >> > >  at newInstance(clazz, ...)
> >> >> > >  at newInstance.Object(static, pathname)
> >> >> > >  at newInstance(static, pathname)
> >> >> > >  at method(static, ...)
> >> >> > >  at AffymetrixNetAffxCsvFile$byChipType("MOUSEDIVm520650", tags =
> >> >> > > ".na30")
> >> >> > > In addition: Warning message:
> >> >> > > In read.table(3L, header = TRUE, colClasses = c(NA_character_,
> >> >> > > NA_character_,  :
> >> >> > >  not all columns named in 'colClasses' exist
>
> >> >> > I had a look at the MOUSEDIVm520650.na30.annot.csv file.  The line
> >> >> > containing column names, that is:
>
> >> >> > "Probe Set ID","dbSNP RS ID","Chromosome","Physical
> >> >> > Position","Strand","Cytoband","Allele A","Allele B","Associated
> >> >> > Gene","Genetic Map","Fragment Enzyme Type Length Start Stop",
>
> >> >> > contains a trailing comma (,) that shouldn't be there ("file format
> >> >> > error").  This cause R to think there should be 12 and not 11 columns
> >> >> > in the data set.  Open the file in an editor and remove that trailing
> >> >> > comma and any whitespace after "Fragment Enzyme Type Length Start
> >> >> > Stop".  Then save the file.  That should solve the problem.
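For a file this large, the same header fix can also be scripted instead of done by hand in an editor; a minimal Python sketch (the function name is made up for illustration):

```python
# Strip the stray trailing comma and any surrounding whitespace from the
# header line of a NetAffx annotation CSV, leaving all data lines alone.
def fix_header(lines):
    header = lines[0].rstrip().rstrip(',').rstrip()
    return [header + '\n'] + lines[1:]

# Usage (path as in the error message above):
# with open('MOUSEDIVm520650.na30.annot.csv') as f:
#     lines = f.readlines()
# with open('MOUSEDIVm520650.na30.annot.csv', 'w') as f:
#     f.writelines(fix_header(lines))
```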
>
> >> >> > The other CSV file - MOUSEDIVm520650.cn.na30.annot.csv - does not have
> >> >> > this problem.
>
> >> >> > > 3. I also tried the "manual" approach using the tab-delimited
> >> >> > > file; however, it seems to me that the mitochondria probes
> >> >> > > are skipped (NA values in ugp[,1] but valid values in ugp[,2]).
>
> >> >> > The Affymetrix NetAffx CSV files use "M" for the mitochondrial
> >> >> > chromosome.  In aroma we encode this as integer 25.
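That encoding can be sketched as a small lookup (Python for illustration; only M = 25 is stated in this thread, so mapping X to 23 and Y to 24 here is an assumption based on aroma's usual convention):

```python
# Map NetAffx chromosome labels to an integer encoding. "M"
# (mitochondria) becomes 25 as described above; X = 23 and Y = 24 are
# assumed conventions; numeric chromosomes pass through unchanged.
def encode_chromosome(chrom):
    special = {'X': 23, 'Y': 24, 'M': 25, 'MT': 25}
    chrom = str(chrom).upper()
    return special.get(chrom, int(chrom) if chrom.isdigit() else None)
```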
>
> >> >> > > Another
> >> >> > > problem is that some positions for other chromosomes are not loaded 
> >> >> > > in
> >> >> > > properly (valid values in ugp[,1] but NA values in ugp[,2]).
>
> >> >> > You don't show how you read the data "manually", so it is hard to say
> >> >> > what you are doing wrong here.  But note that there are quite a few
> >> >> > arguments in read.table() that you need to set correctly in order to
> >> >> > read Affymetrix NetAffx CSV files (it doesn't make it easier that
> >> >> > Affymetrix changes the file format once in a while and has stray
> >> >> > erroneous symbols such as the above comma).
>
> >> >> > Also, search our forum for 'MOUSEDIVm520650', because about a year ago
> >> >> > David Rosenberg discussed this chip type and I think he did create
> >> >> > various annotation data files for the
>
> ...
>

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/
