Hi Christian,

I'm quite swamped, but I've added links to the scripts that I used to generate the existing NetAffx 26 (na26) UFL and UGP files to each of the different SNP & CN chip type pages, e.g.

http://groups.google.com/group/aroma-affymetrix/web/mapping250k-nsp-mapping250k-sty

More comments below.

On Mon, Jan 12, 2009 at 6:24 AM, cstratowa <christian.strat...@vie.boehringer-ingelheim.com> wrote:
>
> Dear Henrik
>
> Meanwhile I have created ufl and ugp files for both 100K and 500K
> arrays but not for the GenomeWideSNP_6 array.

The above scripts will show you how to do it for GenomeWideSNP_6. It is slightly more complicated since two NetAffx CSV files are involved.

>
> Can you confirm that the following code, which I use for both 100K and
> 500K arrays, is correct:
>
> # retrieving annotation files
> chiptypes <- c("Mapping50K_Hind240", "Mapping50K_Xba240")
> cdfs <- lapply(chiptypes, FUN=function(x){AffymetrixCdfFile$byChipType(x)})
> names(cdfs) <- chiptypes
> print(cdfs)
>
> # importing data from NetAffx CSV files
> csvs <- lapply(cdfs, FUN=function(cdf){AffymetrixNetAffxCsvFile$byChipType(getChipType(cdf), tags=".na27")})
> print(csvs)
>
> # allocating empty UFL (Unit Fragment Length) files
> ufls <- lapply(cdfs, FUN=function(cdf){AromaUflFile$allocateFromCdf(cdf, tags="na27,CS20090112")})
> print(ufls)
>
> # import SNP data
> units <- list();
> for (chipType in names(ufls)) {
>   ufl <- ufls[[chipType]];
>   csv <- csvs[[chipType]];
>   units[[chipType]] <- importFrom(ufl, csv, verbose=-50);
> }
> str(units)
>
> # allocating empty UGP (Unit Genome Position) files
> ugps <- lapply(cdfs, FUN=function(cdf){AromaUgpFile$allocateFromCdf(cdf, tags="na27,CS20090112")})
> print(ugps)
>
> # import SNP data
> units <- list();
> for (chipType in names(ugps)) {
>   ugp <- ugps[[chipType]];
>   csv <- csvs[[chipType]];
>   units[[chipType]] <- importFrom(ugp, csv, verbose=-50);
> }
> str(units)

This looks alright to me. You might want to check it against the posted NA26 scripts as well, because they are more recent.
>
>
> Here is the summary for the 100K arrays:
> # Summary 50K chips
>> str(units)
> List of 2
>  $ Mapping50K_Hind240: int [1:57244] 18632 18677 1631 18713 1630 18712 18619 1639 18722 18608 ...
>  $ Mapping50K_Xba240 : int [1:58960] 29181 18239 31302 19831 47750 45114 19103 39711 19772 37811 ...
>>
>> ufl <- AromaUflFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="HindIII"))
>                snp cnp affxSnp other total
> enzyme1-only 56933   0       0     0 56933
> missing        311   0       0    55   366
> total        57244   0       0    55 57299

The NA26 version gives:

               snp cnp affxSnp other total
enzyme1-only 56933   0       0     0 56933
missing        311   0       0    55   366
total        57244   0       0    55 57299

>> ufl <- AromaUflFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="XbaI"))
>                snp cnp affxSnp other total
> enzyme1-only 58616   0       0     0 58616
> missing        344   0       0    55   399
> total        58960   0       0    55 59015

NA26:

               snp cnp affxSnp other total
enzyme1-only 58616   0       0     0 58616
missing        344   0       0    55   399
total        58960   0       0    55 59015

>>
>> ugp <- AromaUgpFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="HindIII"))
>    chromosome         position
>  Min.   :  1.000   Min.   :    48603
>  1st Qu.:  4.000   1st Qu.: 34667112
>  Median :  7.000   Median : 72677620
>  Mean   :  8.402   Mean   : 80405004
>  3rd Qu.: 12.000   3rd Qu.:114826216
>  Max.   : 23.000   Max.   :246727435
>  NA's   :363.000   NA's   :      363

NA26:

> print(summary(ugp));
   chromosome         position
 Min.   :  1.000   Min.   :    48603
 1st Qu.:  4.000   1st Qu.: 34667112
 Median :  7.000   Median : 72677621
 Mean   :  8.402   Mean   : 80405004
 3rd Qu.: 12.000   3rd Qu.:114826216
 Max.   : 23.000   Max.   :246727435
 NA's   :363.000   NA's   :      363
> print(table(ugp[,1]));

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
4541 5072 3962 4342 4215 3968 3444 3549 2357 2743 2466 2592 2661 1931 1440 1145
  17   18   19   20   21   22   23
 985 1731  326  993  883  433 1157

>> ugp <- AromaUgpFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="XbaI"))
>    chromosome         position
>  Min.   :  1.000   Min.   :    93683
>  1st Qu.:  4.000   1st Qu.: 34636629
>  Median :  7.000   Median : 72249739
>  Mean   :  8.507   Mean   : 80010574
>  3rd Qu.: 12.000   3rd Qu.:114666170
>  Max.   : 24.000   Max.   :246885089
>  NA's   :390.000   NA's   :      390

NA26:

> print(summary(ugp));
   chromosome         position
 Min.   :  1.000   Min.   :    93683
 1st Qu.:  4.000   1st Qu.: 34636629
 Median :  7.000   Median : 72249739
 Mean   :  8.507   Mean   : 80010574
 3rd Qu.: 12.000   3rd Qu.:114666170
 Max.   : 24.000   Max.   :246885089
 NA's   :390.000   NA's   :      390
> print(table(ugp[,1]));

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
4669 5274 3864 4231 4149 4111 3612 3422 2439 2953 2896 2685 2567 2083 1590 1245
  17   18   19   20   21   22   23   24
 971 1837  364 1101 1027  330 1204    1

>
>
> Here is the summary for the 500K arrays:
> # Summary 500K chips
>> str(units)
> List of 2
>  $ Mapping250K_Sty: int [1:238304] 15133 175423 164715 237140 112643 189587 162193 79611 196992 73555 ...
>  $ Mapping250K_Nsp: int [1:262264] 34952 76 74370 232354 3677 72977 73533 176215 161345 238482 ...
>>
>> ufl <- AromaUflFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="StyI"))
>                 snp cnp affxSnp other  total
> enzyme1-only 144868   0       0     0 144868
> missing       93436   0       0    74  93510
> total        238304   0       0    74 238378

NA26:

                snp cnp affxSnp other  total
enzyme1-only 237697   0       0     0 237697
missing         607   0       0    74    681
total        238304   0       0    74 238378

>> ufl <- AromaUflFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="NspI"))
>                 snp cnp affxSnp other  total
> enzyme1-only 261563   0       0     0 261563
> missing         701   0       0    74    775
> total        262264   0       0    74 262338

NA26:

                snp cnp affxSnp other  total
enzyme1-only 261563   0       0     0 261563
missing         701   0       0    74    775
total        262264   0       0    74 262338

>>
>> ugp <- AromaUgpFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="StyI"))
>    chromosome         position
>  Min.   :  1.000   Min.   :     2994
>  1st Qu.:  4.000   1st Qu.: 31306881
>  Median :  8.000   Median : 67082398
>  Mean   :  9.117   Mean   : 77333484
>  3rd Qu.: 13.000   3rd Qu.:114799352
>  Max.   : 23.000   Max.   :247135059
>  NA's   :677.000   NA's   :      677

NA26:

> print(summary(ugp));
   chromosome         position
 Min.   :  1.000   Min.   :     2994
 1st Qu.:  4.000   1st Qu.: 31306881
 Median :  8.000   Median : 67082398
 Mean   :  9.117   Mean   : 77333484
 3rd Qu.: 13.000   3rd Qu.:114799352
 Max.   : 23.000   Max.   :247135059
 NA's   :677.000   NA's   :      677
> print(table(ugp[,1]));

    1     2     3     4     5     6     7     8     9    10    11    12    13
20273 19147 15396 13236 14854 14296 11822 12601 10900 14190 12928 11863  8052
   14    15    16    17    18    19    20    21    22    23
 7531  7323  8272  6403  6726  3670  6555  3182  3669  4812

>> ugp <- AromaUgpFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="NspI"))
>    chromosome         position
>  Min.   :  1.000   Min.   :    17408
>  1st Qu.:  4.000   1st Qu.: 32574796
>  Median :  8.000   Median : 70596240
>  Mean   :  8.758   Mean   : 79224244
>  3rd Qu.: 13.000   3rd Qu.:114776300
>  Max.   : 23.000   Max.   :247110269
>  NA's   :775.000   NA's   :      775

NA26:

> print(summary(ugp));
   chromosome         position
 Min.   :  1.000   Min.   :    17408
 1st Qu.:  4.000   1st Qu.: 32574797
 Median :  8.000   Median : 70596240
 Mean   :  8.758   Mean   : 79224244
 3rd Qu.: 13.000   3rd Qu.:114776301
 Max.   : 23.000   Max.   :247110269
 NA's   :775.000   NA's   :      775
> print(table(ugp[,1]));

    1     2     3     4     5     6     7     8     9    10    11    12    13
19810 22178 18347 19016 17133 17097 13912 14820 11899 14241 13254 13026 11094
   14    15    16    17    18    19    20    21    22    23
 8165  6982  7005  4830  8136  2661  5823  3927  2511  5696

>
>
> Could it be that function summaryOfUnits() does not work as expected?
> For ufl it prints "enzyme1-only" instead of e.g. "NspI-only"

I've noticed that too; the enzyme name is not reported. But don't worry, the files will contain the same information regardless.

> For ugp it prints an error: no applicable method

summaryOfUnits() is only available for AromaUflFile objects, not AromaUgpFile objects.

>
>
> Is it correct that for na27 there are now 93436 missing SNPs for StyI
> compared to only 75 missing SNPs for na24 (as shown on your page about
> building UFL files)?

No, that does not seem to be correct.
This could be an error in the NetAffx CSV file, e.g. they [Affymetrix] once added Windows newlines in odd places. However, since the same CSV file is used to import the UGP data and that seems to be alright, I don't think this is the case here. It could also be that they've changed the column names so that importFrom() no longer recognizes the format, but this sounds less likely since some of the SNPs are still found, and it seems correct on Mapping250K_Nsp. A more far-fetched, but still not impossible, explanation is that they are now reusing the annotation data from the GWS6 platform, where the two enzymes have different distributions. This would be wrong to do, but I can see how someone could make such a mistake.

Could you please try to identify any obvious differences between the contents of the NA26 and NA27 NetAffx CSV files? They're available on the Affymetrix webpage (link via the aroma.affymetrix chip type page). If they haven't added new columns, a simple Unix diff might work.

>
>
> Can I build the ufl and ugp files in the same way for GenomeWideSNP_6?
> I assume that I need to repeat everything done for ".na27" with
> ".cn.na27"?
> Will these files be identical to the files supplied by you or do you
> use the file "GenomeWideSNP_6_build36_SNPandCN.tab", which only you
> have (as described in your pages)?
See the posted scripts on the GenomeWideSNP_6 chip type page:

http://groups.google.com/group/aroma-affymetrix/web/genomewidesnp-6

Let me know if any of this helps.

Henrik

>
>
> Here is the sessionInfo:
>> sessionInfo()
> R version 2.7.1 (2008-06-23)
> x86_64-unknown-linux-gnu
>
> locale:
> C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
>  [1] aroma.affymetrix_0.9.4 aroma.apd_0.1.3        R.huge_0.1.6
>  [4] affxparser_1.12.2      aroma.core_0.9.4       sfit_0.1.5
>  [7] aroma.light_1.8.1      digest_0.3.1           matrixStats_0.1.3
> [10] R.rsp_0.3.4            R.cache_0.1.7          R.utils_1.0.4
> [13] R.oo_1.4.5             R.methodsS3_1.0.3
>>
>
> Best regards
> Christian
>
>
> On Jan 8, 2:39 pm, cstratowa <christian.strat...@vie.boehringer-ingelheim.com> wrote:
>> Dear Henrik
>>
>> Would it be possible for you to supply the ufl and ugp files for the
>> new Affymetrix xxx.na27.annot.csv files for the mapping arrays?
>>
>> Thank you in advance
>> Best regards
>> Christian
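As a rough sketch of the NA26 versus NA27 comparison suggested above (this is not taken from the posted scripts), the column names of the two NetAffx CSV files could be compared in base R before resorting to a full diff. The helper name compareCsvColumns, the toy files, and their column names are all made up for illustration, and it assumes that any metadata lines at the top of the files start with "#".

```r
# Hypothetical helper, not part of aroma.affymetrix: report the column
# names that differ between two annotation CSV files, using base R only.
compareCsvColumns <- function(pathA, pathB, commentChar = "#") {
  readHeader <- function(path) {
    # Assumption: leading metadata lines start with commentChar and are
    # skipped. nrows = 1 is enough because only the header is needed.
    names(read.csv(path, comment.char = commentChar, nrows = 1,
                   check.names = FALSE, stringsAsFactors = FALSE))
  }
  a <- readHeader(pathA)
  b <- readHeader(pathB)
  list(onlyInA = setdiff(a, b), onlyInB = setdiff(b, a))
}

# Toy files with invented column names, standing in for the real CSV files
fileA <- tempfile(fileext = ".csv")
fileB <- tempfile(fileext = ".csv")
writeLines(c("# toy annotation file A",
             '"unitName","chromosome","fragmentLength"',
             '"unit1","1","500"'), fileA)
writeLines(c("# toy annotation file B",
             '"unitName","chromosome","fragment_length"',
             '"unit1","1","500"'), fileB)

res <- compareCsvColumns(fileA, fileB)
print(res)
```

If both lists come back empty for the real files, the column names are unchanged and the difference has to be in the cell contents, in which case a line-by-line diff of the Mapping250K_Sty files is the next thing to try.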