[aroma.affymetrix] Re: about snp 250sty paired CN

Henrik Bengtsson Thu, 30 Oct 2008 09:08:45 -0700

Hi,

this is probably due to a corrupt transfer of the CEL file.


Since the area under each curve should be one, it is clear that for
the red curve some of the density is not shown.  I'm quite sure it is
all your zeros that are not shown, because they are all -Inf on the
log-scale.  So, you have a lot of zeros in Array #91 (and maybe others
too).  Why?  See below.  How to avoid/detect it?  See below.
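
A quick way to check this is to count the zero intensities directly,
since log2(0) is -Inf and such values cannot contribute to a density
curve.  The following is only a sketch on simulated data: the vector
'y' is a hypothetical stand-in for the probe intensities of one array,
and the size of the zero patch is made up for illustration.

```r
# Simulated probe intensities (hypothetical data, not from a real CEL file)
set.seed(1)
y <- round(2^rnorm(10000, mean = 9, sd = 1))  # strictly positive signals
y[2001:5000] <- 0                             # a corrupted patch of zeros

ly <- log2(y)                      # zeros become -Inf on the log scale
fracZero <- mean(y == 0)           # fraction of zero signals
fracFinite <- mean(is.finite(ly))  # fraction that can appear in a density plot

print(c(fracZero = fracZero, fracFinite = fracFinite))

# A density estimate can only be computed from the finite values
d <- density(ly[is.finite(ly)])
```

If the plotted curve is scaled by the total number of probes, the
visible area should be roughly 'fracFinite' rather than one.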

I've seen this myself and I know of at least one more case where a CEL
file looked perfectly fine, but it had a lot of zeros.  In my case I
copied the files to an external hard drive, [flew across the world],
and copied them to another server.  It was only by chance, from looking
at spatial plots, that I discovered perfectly horizontal short white
stripes all over the place - these were all due to zero signals.  When
I looked at the CEL file, not only the intensities, but also the
standard deviations and the number of pixels were all zeros, which
didn't make sense.  I went back to the source server and ran MD5
[http://en.wikipedia.org/wiki/MD5] on each CEL file to generate
checksums and discovered that my files there and here didn't match.
Something went wrong in the transfer.  I believe it might have to do
with copying to an external USB drive.  One hypothesis I have is that
first an empty file of the *correct size* is allocated on the drive,
and then the file is copied block by block.  Blocks are transferred via
the computer's file cache, and it might have been that not all blocks
were written to the drive.  This is why you have to "eject" the drive
before pulling the USB cable (I think I did it in my case, but I
cannot remember).
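
To illustrate why the file size alone does not reveal this kind of
damage, here is a small sketch in base R.  It only simulates the
hypothesis above: the file contents, the block position and the block
size are all made up, and tools::md5sum() is used in place of whatever
MD5 tool is at hand.

```r
# Write a small binary file standing in for a CEL file (hypothetical data)
src <- tempfile(fileext = ".bin")
set.seed(42)
bytes <- as.raw(sample(1:255, 4096, replace = TRUE))
writeBin(bytes, src)

# Simulate a faulty copy: the file has the correct size, but one block
# was never written and is left as zeros
dst <- tempfile(fileext = ".bin")
bad <- bytes
bad[1025:2048] <- as.raw(0)
writeBin(bad, dst)

# Compare file sizes and MD5 checksums of the source and the copy
sameSize <- (file.size(src) == file.size(dst))
sameMd5 <- (unname(tools::md5sum(src)) == unname(tools::md5sum(dst)))
print(c(sameSize = sameSize, sameMd5 = sameMd5))
```

The two files have identical sizes, yet their checksums differ, which
is the pattern described above.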

To avoid this, when you transfer CEL files, the safest approach is to
gzip/zip them first.  Both formats store a CRC-32 checksum of the
contents, so decompression will detect if something went wrong.  You
cannot gunzip/unzip a file that has patches of missing blocks (zeros);
that will give an error.

I would go back to where you got the CEL file from and retransfer it
(if it still exists).

If you have aroma.affymetrix installed on your source computer as
well, then you can generate MD5 checksums before transferring the
files, by doing:

cdf <- AffymetrixCdfFile$byChipType("Mapping250K_Sty");
cs <- AffymetrixCelSet$byName("snp_all", cdf=cdf);
res <- lapply(cs, writeChecksum);

which will generate *.CEL.md5 files.  Then copy/transfer both *.CEL
and *.CEL.md5 files.  On the new computer, do:

cdf <- AffymetrixCdfFile$byChipType("Mapping250K_Sty");
cs <- AffymetrixCelSet$byName("snp_all", cdf=cdf);
res <- lapply(cs, validateChecksum);

This will *validate* each CEL file against the known MD5 checksum (the
value in the *.CEL.md5 file).  If a discrepancy is found, an error
will be thrown.

You can use the above approach to generate *.CEL.md5 files on the
source computer, copy only the *.CEL.md5 files to your existing
directory and run the validation test.  That will help you identify
which files went wrong.  Your Array #91 will give an error if the
zeros were introduced during the transfer.
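
If aroma.affymetrix is not available on the source computer, the same
idea can be sketched in base R.  The helpers writeMd5Sidecar() and
validateMd5Sidecar() below are hypothetical and not part of
aroma.affymetrix, and the one-line *.md5 file format used here is an
assumption that may not match what writeChecksum() produces.

```r
# Hypothetical helpers built on base R; not part of aroma.affymetrix
writeMd5Sidecar <- function(pathname) {
  md5 <- unname(tools::md5sum(pathname))
  writeLines(md5, paste0(pathname, ".md5"))
  invisible(md5)
}

validateMd5Sidecar <- function(pathname) {
  expected <- trimws(readLines(paste0(pathname, ".md5"), n = 1L))
  actual <- unname(tools::md5sum(pathname))
  if (!identical(actual, expected)) {
    stop("MD5 mismatch for file: ", pathname)
  }
  invisible(TRUE)
}

# Usage on a temporary directory with fake "CEL" files
dir <- tempfile("celdir")
dir.create(dir)
files <- file.path(dir, c("a.CEL", "b.CEL"))
for (f in files) writeBin(as.raw(sample(1:255, 1024, replace = TRUE)), f)
invisible(lapply(files, writeMd5Sidecar))  # done on the source computer

# Corrupt the second file after its checksum was written
writeBin(raw(1024), files[2])

# Validate each file; an error is turned into FALSE
ok <- vapply(files, function(f) {
  tryCatch(validateMd5Sidecar(f), error = function(e) FALSE)
}, logical(1))
print(unname(ok))
```

Here the untouched file validates and the overwritten one is flagged.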

Hope this helps

Henrik


On Thu, Oct 30, 2008 at 6:58 AM, hailei zhang <[EMAIL PROTECTED]> wrote:
> Hi Henrik,
>
>
> Attached is the density plot of Arrays 91:100; the red curve is Array
> 91. Thanks.
>
> Hailei
>
> On Thu, Oct 30, 2008 at 9:39 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> wrote:
>>
>>
>> Hi Henrik,
>>
>> I plotted the densities of arrays 91:100 and found that Array 91 is
>> very different from the others. Its density values are much lower than
>> the others.
>> I do not know how to post the density picture here. I will send it to
>> your private email.
>>
>> I think if I remove this array, then I will get the object. I will try
>> it later.
>>
>> Thanks.
>>
>> Hailei
>>
>> On Oct 29, 5:42 pm, "Henrik Bengtsson" <[EMAIL PROTECTED]> wrote:
>> > Hi.
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Oct 29, 2008 at 1:03 PM, hailei zhang <[EMAIL PROTECTED]>
>> > wrote:
>> > > Hi,
>> >
>> > > I have 228 paired SNP arrays; the platform is snp250k_sty. I am
>> > > following the steps provided on the aroma website.
>> > > When running the step "Calibration for allelic crosstalk", I ran
>> > > into a problem. After I run the command
>> > > "csAcc <-process(acc,verbose=log)", I cannot get the csAcc object.
>> >
>> > > Thanks.
>> >
>> > > Hailei
>> >
>> > > When running the process function, I got this output:
>> > >  List of 2
>> > >     $ snps   :List of 6
>> > >      ..$ A/C: int [1:128036, 1:2] 2624 2678 2704 2706 2822 2840 3108 3180 3262 3488 ...
>> > >      .. ..- attr(*, "dimnames")=List of 2
>> > >      .. .. ..$ : NULL
>> > >      .. .. ..$ : chr [1:2] "A" "C"
>> > >      ..$ A/G: int [1:557849, 1:2] 2588 2594 2604 2630 2636 2640 2642 2648 2662 2682 ...
>> > >      .. ..- attr(*, "dimnames")=List of 2
>> > >      .. .. ..$ : NULL
>> > >      .. .. ..$ : chr [1:2] "A" "G"
>> > >      ..$ A/T: int [1:103336, 1:2] 2634 2654 2780 2800 2916 3054 3260 3388 3586 3622 ...
>> > >      .. ..- attr(*, "dimnames")=List of 2
>> > >      .. .. ..$ : NULL
>> > >      .. .. ..$ : chr [1:2] "A" "T"
>> > >      ..$ C/G: int [1:156296, 1:2] 2572 2714 2730 2740 3006 3016 3030 3080 3136 3170 ...
>> > >      .. ..- attr(*, "dimnames")=List of 2
>> > >      .. .. ..$ : NULL
>> > >      .. .. ..$ : chr [1:2] "C" "G"
>> > >      ..$ C/T: int [1:537767, 1:2] 2576 2578 2580 2646 2664 2672 2676 2680 2708 2734 ...
>> > >      .. ..- attr(*, "dimnames")=List of 2
>> > >      .. .. ..$ : NULL
>> > >      .. .. ..$ : chr [1:2] "C" "T"
>> > >      ..$ G/T: int [1:118188, 1:2] 2602 2606 2656 2712 2854 2904 2966 2990 3004 3040 ...
>> > >      .. ..- attr(*, "dimnames")=List of 2
>> > >      .. .. ..$ : NULL
>> > >      .. .. ..$ : chr [1:2] "G" "T"
>> > >     $ nonSNPs: NULL
>> > >     - attr(*, "version")= int 1
>> > > 20081029 15:41:17|   Reading all probe intensities...
>> > > 20081029 15:41:18|   Reading all probe intensities...done
>> > > 20081029 15:41:18|   Fitting calibration model...
>> > > 20081029 15:41:18|    Allele probe-pair group #1 ('A/C') of 6...
>> > > 20081029 15:41:18|     Fitting...
>> > > 20081029 15:41:18|      Model/algorithm flavor: sfit
>> > > 20081029 15:41:18|      Model parameters:
>> > >       List of 3
>> > >        $ alpha: num [1:5] 0.1 0.075 0.05 0.03 0.01
>> > >        $ q    : num 2
>> > >        $ Q    : num 98
>> > > 20081029 15:41:18|      Number of data points: 128036
>> > > Error in solve.default(W) :
>> > >   system is computationally singular: reciprocal condition number = 3.02973e-25
>> > > 20081029 15:41:19|     Fitting...done
>> > > 20081029 15:41:19|    Allele probe-pair group #1 ('A/C') of 6...done
>> > > 20081029 15:41:19|   Fitting calibration model...done
>> > > 20081029 15:41:19|  Array #91
>> > > ('CHAMS_p_Sty31_(CO-124089)_Mapping250K_Sty_H09_112458') of 228...done
>> > > 20081029 15:41:19| Calibrating 228 arrays...done
>> > > 20081029 15:41:19|Calibrating data set for allelic cross talk...done
>> >
>> > Wow, I have run ACC (using the sfit algorithm) on tons of CEL files
>> > and I have never observed a singularity problem so far.  There is
>> > obviously something with Array #91 that causes the crosstalk estimator
>> > to fail.  You could do some simple density plots to see if there is
>> > something funny going on with Array #91, e.g.
>> >
>> > # Extract Arrays 91-100
>> > cs2 <- extract(cs, 91:100);
>> > col <- rep("black", nbrOfArrays(cs2));
>> > col[1] <- "red";
>> > plotDensity(cs2, types="pm", col=col, lwd=2, ylim=c(0,0.5));
>> >
>> > Does Array #91 ("red") look the same as the others?
>> >
>> > What about if you do some spatial plots of the probe intensities?
>> >  
>> > See http://groups.google.com/group/aroma-affymetrix/web/exploratory-analy...
>> > for examples.  If you use the ArrayExplorer you can specify which
>> > arrays you want to plot, e.g. process(ae, arrays=91:95).  Does Array
>> > #91 look peculiar?
>> >
>> > If you can zip the
>> > CHAMS_p_Sty31_(CO-124089)_Mapping250K_Sty_H09_112458.CEL file and post
>> > it somewhere where I can download it, I can have a look at it. You can
>> > send me the download details offline to my private address if you want
>> > to.
>> >
>> > /Henrik
>> >
>> >
>> >
>> >
>> >
>> > > The following are my commands:
>> >
>> > > library("aroma.affymetrix")
>> > > chipTypes <- c("Mapping250K_Sty")
>> > > cdfs <- AffymetrixCdfFile$byChipType(chipTypes)
>> > > dataSetName <- "snp_all"
>> > > csRawList <- list();
>> > > pairs <-
>> > > read.table(file="/home/hzhang/aroma/pairs_all.txt",sep="\t",header=T)
>> > > pairs <- as.matrix(pairs)
>> > > dim(pairs)
>> > > cs <- AffymetrixCelSet$byName(dataSetName, chipType=chipTypes);
>> > > stopifnot(all(getNames(cs) %in% pairs));
>> > > csRawList[[chipTypes]] <- cs;
>> > > gis <- getGenomeInformation(cdfs)
>> > > sis <- getSnpInformation(cdfs)
>> > > log <- Arguments$getVerbose(-4,timestamp=TRUE)
>> > > cs <- csRawList[[chipTypes]]
>> > > acc <- AllelicCrosstalkCalibration(cs)
>> > > csAcc <-process(acc,verbose=log)
>> > > ls()
>> >>
>

