[aroma.affymetrix] Re: ufl and ugp files for na27?

Henrik Bengtsson Wed, 14 Jan 2009 16:50:20 -0800

Hi Christian,

I'm quite swamped, but I've added links to the scripts that I used to
generate the existing NetAffx 26 (na26) UFL and UGP files to each of the
different SNP & CN chip type pages, e.g.

  http://groups.google.com/group/aroma-affymetrix/web/mapping250k-nsp-mapping250k-sty

More comments below.

On Mon, Jan 12, 2009 at 6:24 AM, cstratowa
<christian.strat...@vie.boehringer-ingelheim.com> wrote:
>
> Dear Henrik
>
> Meanwhile I have created ufl and ugp files for both 100K and 500K
> arrays but not for the GenomeWideSNP_6 array.

The above scripts will show you how to do it for GenomeWideSNP_6.
Slightly more complicated since two NetAffx CSV files are involved.

>
> Can you confirm that the following code, which I use for both 100K and
> 500K arrays, is correct:
>
> # retrieving annotation files
> chiptypes <- c("Mapping50K_Hind240", "Mapping50K_Xba240")
> cdfs <- lapply(chiptypes, FUN=function(x) {
>   AffymetrixCdfFile$byChipType(x)
> })
> names(cdfs) <- chiptypes
> print(cdfs)
>
> # importing data from NetAffx CSV files
> csvs <- lapply(cdfs, FUN=function(cdf) {
>   AffymetrixNetAffxCsvFile$byChipType(getChipType(cdf), tags=".na27")
> })
> print(csvs)
>
> # allocating empty UFL (Unit Fragment Length) files
> ufls <- lapply(cdfs, FUN=function(cdf) {
>   AromaUflFile$allocateFromCdf(cdf, tags="na27,CS20090112")
> })
> print(ufls)
>
> # import SNP data
> units <- list();
> for (chipType in names(ufls)) {
>   ufl <- ufls[[chipType]];
>   csv <- csvs[[chipType]];
>   units[[chipType]] <- importFrom(ufl, csv, verbose=-50);
> }
> str(units)
>
> # allocating empty UGP (Unit Genome Position) files
> ugps <- lapply(cdfs, FUN=function(cdf) {
>   AromaUgpFile$allocateFromCdf(cdf, tags="na27,CS20090112")
> })
> print(ugps)
>
> # import SNP data
> units <- list();
> for (chipType in names(ugps)) {
>   ugp <- ugps[[chipType]];
>   csv <- csvs[[chipType]];
>   units[[chipType]] <- importFrom(ugp, csv, verbose=-50);
> }
> str(units)

This looks alright to me.  You might want to check it against the posted
NA26 scripts as well, because they are more recent.

>
>
> Here is the summary for the 100K arrays:
> # Summary 50K chips
>> str(units)
> List of 2
>  $ Mapping50K_Hind240: int [1:57244] 18632 18677 1631 18713 1630 18712
> 18619 1639 18722 18608 ...
>  $ Mapping50K_Xba240 : int [1:58960] 29181 18239 31302 19831 47750
> 45114 19103 39711 19772 37811 ...
>>
>> ufl <- AromaUflFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="HindIII"))
>               snp cnp affxSnp other total
> enzyme1-only 56933   0       0     0 56933
> missing        311   0       0    55   366
> total        57244   0       0    55 57299

The NA26 version gives:
               snp cnp affxSnp other total
enzyme1-only 56933   0       0     0 56933
missing        311   0       0    55   366
total        57244   0       0    55 57299


>> ufl <- AromaUflFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="XbaI"))
>               snp cnp affxSnp other total
> enzyme1-only 58616   0       0     0 58616
> missing        344   0       0    55   399
> total        58960   0       0    55 59015

NA26:
               snp cnp affxSnp other total
enzyme1-only 58616   0       0     0 58616
missing        344   0       0    55   399
total        58960   0       0    55 59015

>>
>> ugp <- AromaUgpFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="HindIII"))
>  chromosome        position
>  Min.   :  1.000   Min.   :    48603
>  1st Qu.:  4.000   1st Qu.: 34667112
>  Median :  7.000   Median : 72677620
>  Mean   :  8.402   Mean   : 80405004
>  3rd Qu.: 12.000   3rd Qu.:114826216
>  Max.   : 23.000   Max.   :246727435
>  NA's   :363.000   NA's   :      363

NA26:

> print(summary(ugp));
 chromosome        position
 Min.   :  1.000   Min.   :    48603
 1st Qu.:  4.000   1st Qu.: 34667112
 Median :  7.000   Median : 72677621
 Mean   :  8.402   Mean   : 80405004
 3rd Qu.: 12.000   3rd Qu.:114826216
 Max.   : 23.000   Max.   :246727435
 NA's   :363.000   NA's   :      363
> print(table(ugp[,1]));

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
4541 5072 3962 4342 4215 3968 3444 3549 2357 2743 2466 2592 2661 1931 1440 1145
  17   18   19   20   21   22   23
 985 1731  326  993  883  433 1157

>> ugp <- AromaUgpFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="XbaI"))
>  chromosome        position
>  Min.   :  1.000   Min.   :    93683
>  1st Qu.:  4.000   1st Qu.: 34636629
>  Median :  7.000   Median : 72249739
>  Mean   :  8.507   Mean   : 80010574
>  3rd Qu.: 12.000   3rd Qu.:114666170
>  Max.   : 24.000   Max.   :246885089
>  NA's   :390.000   NA's   :      390

NA26:

> print(summary(ugp));
 chromosome        position
 Min.   :  1.000   Min.   :    93683
 1st Qu.:  4.000   1st Qu.: 34636629
 Median :  7.000   Median : 72249739
 Mean   :  8.507   Mean   : 80010574
 3rd Qu.: 12.000   3rd Qu.:114666170
 Max.   : 24.000   Max.   :246885089
 NA's   :390.000   NA's   :      390
> print(table(ugp[,1]));

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
4669 5274 3864 4231 4149 4111 3612 3422 2439 2953 2896 2685 2567 2083 1590 1245
  17   18   19   20   21   22   23   24
 971 1837  364 1101 1027  330 1204    1


>
>
> Here is the summary for the 500K arrays:
> # Summary 500K chips
>> str(units)
> List of 2
>  $ Mapping250K_Sty: int [1:238304] 15133 175423 164715 237140 112643
> 189587 162193 79611 196992 73555 ...
>  $ Mapping250K_Nsp: int [1:262264] 34952 76 74370 232354 3677 72977
> 73533 176215 161345 238482 ...
>>
>> ufl <- AromaUflFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="StyI"))
>                snp cnp affxSnp other  total
> enzyme1-only 144868   0       0     0 144868
> missing       93436   0       0    74  93510
> total        238304   0       0    74 238378

NA26:
                snp cnp affxSnp other  total
enzyme1-only 237697   0       0     0 237697
missing         607   0       0    74    681
total        238304   0       0    74 238378
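
To put the StyI difference in numbers, here is a small sketch that uses
only the SNP counts from the two summaries above.  The object names are
just for illustration.

```r
# SNP counts copied from the two StyI summaries (snp column only).
styI <- rbind(
  NA27 = c(found = 144868, missing = 93436),
  NA26 = c(found = 237697, missing = 607)
)

# Both annotation releases should cover the same number of SNP units.
total <- rowSums(styI)
print(total)

# Percentage of SNPs without a fragment length, per release.
pctMissing <- 100 * styI[, "missing"] / total
print(round(pctMissing, 2))

# SNPs that have a fragment length in NA26 but not in NA27.
lost <- styI[["NA26", "found"]] - styI[["NA27", "found"]]
print(lost)
```

So roughly 39% of the StyI SNPs lack a fragment length in the NA27-based
file, compared with about 0.25% in the NA26-based one.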

>> ufl <- AromaUflFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summaryOfUnits(ufl, enzymes="NspI"))
>                snp cnp affxSnp other  total
> enzyme1-only 261563   0       0     0 261563
> missing         701   0       0    74    775
> total        262264   0       0    74 262338

NA26:
                snp cnp affxSnp other  total
enzyme1-only 261563   0       0     0 261563
missing         701   0       0    74    775
total        262264   0       0    74 262338

>>
>> ugp <- AromaUgpFile$byChipType(chiptypes[1], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="StyI"))
>  chromosome        position
>  Min.   :  1.000   Min.   :     2994
>  1st Qu.:  4.000   1st Qu.: 31306881
>  Median :  8.000   Median : 67082398
>  Mean   :  9.117   Mean   : 77333484
>  3rd Qu.: 13.000   3rd Qu.:114799352
>  Max.   : 23.000   Max.   :247135059
>  NA's   :677.000   NA's   :      677

NA26:

> print(summary(ugp));
 chromosome        position
 Min.   :  1.000   Min.   :     2994
 1st Qu.:  4.000   1st Qu.: 31306881
 Median :  8.000   Median : 67082398
 Mean   :  9.117   Mean   : 77333484
 3rd Qu.: 13.000   3rd Qu.:114799352
 Max.   : 23.000   Max.   :247135059
 NA's   :677.000   NA's   :      677
> print(table(ugp[,1]));

    1     2     3     4     5     6     7     8     9    10    11    12    13
20273 19147 15396 13236 14854 14296 11822 12601 10900 14190 12928 11863  8052
   14    15    16    17    18    19    20    21    22    23
 7531  7323  8272  6403  6726  3670  6555  3182  3669  4812


>> ugp <- AromaUgpFile$byChipType(chiptypes[2], tags="na27,CS20090112");
>> print(summary(ugp, enzymes="NspI"))
>  chromosome        position
>  Min.   :  1.000   Min.   :    17408
>  1st Qu.:  4.000   1st Qu.: 32574796
>  Median :  8.000   Median : 70596240
>  Mean   :  8.758   Mean   : 79224244
>  3rd Qu.: 13.000   3rd Qu.:114776300
>  Max.   : 23.000   Max.   :247110269
>  NA's   :775.000   NA's   :      775

NA26:

> print(summary(ugp));
 chromosome        position
 Min.   :  1.000   Min.   :    17408
 1st Qu.:  4.000   1st Qu.: 32574797
 Median :  8.000   Median : 70596240
 Mean   :  8.758   Mean   : 79224244
 3rd Qu.: 13.000   3rd Qu.:114776301
 Max.   : 23.000   Max.   :247110269
 NA's   :775.000   NA's   :      775
> print(table(ugp[,1]));

    1     2     3     4     5     6     7     8     9    10    11    12    13
19810 22178 18347 19016 17133 17097 13912 14820 11899 14241 13254 13026 11094
   14    15    16    17    18    19    20    21    22    23
 8165  6982  7005  4830  8136  2661  5823  3927  2511  5696

>
>
> Could it be that function summaryOfUnits() does not work as expected?
> For ufl it prints "enzyme1-only" instead  of e.g. "NspI-only"

I've noticed that too; the enzyme name is not reported.  But don't
worry, the files will contain the same information regardless.

> For ugp it prints an error: no applicable method

summaryOfUnits() is only available for AromaUflFile objects, not
AromaUgpFile objects.
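
The "no applicable method" message is what R's S3 dispatch reports when a
generic has no method for the class of the object it is given.  A toy
illustration follows; ToyUfl, ToyUgp and describeUnits() are made-up names
and not part of aroma.affymetrix.

```r
# A generic with a method for one (hypothetical) class only.
describeUnits <- function(x, ...) UseMethod("describeUnits")
describeUnits.ToyUfl <- function(x, ...) {
  sprintf("%d units, %d without a length",
          length(x$lengths), sum(is.na(x$lengths)))
}

ufl <- structure(list(lengths = c(120L, NA, 340L)), class = "ToyUfl")
ugp <- structure(list(position = c(100L, 200L)), class = "ToyUgp")

# Works: a method exists for class ToyUfl.
uflText <- describeUnits(ufl)
print(uflText)

# Fails: no method for class ToyUgp, so dispatch signals an error.
ugpResult <- tryCatch(describeUnits(ugp), error = function(e) e)
print(inherits(ugpResult, "error"))
```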

>
>
> Is it correct that for na27 there are now 93436 missing SNPs for StyI
> compared to only 75 missing SNPs for na24 (as shown on your page about
> building UFL files)?

No, that does not seem to be correct.  This could be an error in the
NetAffx CSV file, e.g. they [Affymetrix] once added Windows newlines
in odd places.  However, since the same CSV file is used to import
the UGP data and that seems to be alright, I don't think this is the
case here.  It could also be that they've changed the column names so
that importFrom() no longer recognizes the format, but this sounds
less likely since some of the SNPs are still found, and it seems
correct on Mapping250K_Nsp.  A more far-fetched, but still not
impossible, explanation is that they are now reusing the annotation
data from the GWS6 platform, where the two enzymes have different
distributions.  This would be wrong to do, but I can see how someone
could make such a mistake.
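
One quick way to look for stray newlines is to count the fields on each
line of the CSV file with base R's count.fields().  The sketch below runs
on a tiny made-up file (its column names and values are invented); for a
real NetAffx file you would pass its path instead.

```r
# Toy CSV in which one record was split in two by a stray newline.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,chromosome,fragment_length",
  "SNP_A-1,1,500",
  "SNP_A-2,2",        # record broken here ...
  ",650",             # ... and continued on the next line
  "SNP_A-3,3,700"
), path)

# Number of comma-separated fields per line; comment lines are skipped.
nFields <- count.fields(path, sep = ",", quote = "\"", comment.char = "#")

# Lines whose field count differs from the header's are suspicious.
expected <- nFields[1]
badLines <- which(is.na(nFields) | nFields != expected)

print(table(nFields, useNA = "ifany"))
print(badLines)
```

Note that the reported indices count only the lines count.fields() kept,
so they are shifted relative to the raw file if it starts with comment
or blank lines.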

Could you please try to identify any obvious differences between
the contents of the NA26 and NA27 NetAffx CSV files?  They're
available on the Affymetrix webpage (link via the aroma.affymetrix chip
type page).  If they haven't added any new columns, a simple Unix diff
might work.
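
If a plain diff is too noisy, the comparison can also be sketched in base
R.  The example below is only an illustration: the two files are tiny
stand-ins written to temporary paths, the column names are invented (not
the real NetAffx headers), and treating "---" as the missing-value marker
is an assumption.

```r
# Two tiny stand-in files with invented column names.
oldPath <- tempfile(fileext = ".csv")
newPath <- tempfile(fileext = ".csv")
writeLines(c(
  "## made-up comment header",
  '"Probe Set ID","Chromosome","Fragment Length"',
  '"SNP_A-1","1","512"',
  '"SNP_A-2","2","---"'
), oldPath)
writeLines(c(
  "## made-up comment header",
  '"Probe Set ID","Chromosome","Fragment Enzyme Length"',
  '"SNP_A-1","1","---"',
  '"SNP_A-2","2","---"'
), newPath)

# Read everything as text, skip '#' comment lines, keep headers verbatim.
readAnnot <- function(path) {
  read.csv(path, comment.char = "#", check.names = FALSE,
           colClasses = "character")
}
oldCsv <- readAnnot(oldPath)
newCsv <- readAnnot(newPath)

# Columns present in one file but not in the other.
onlyInOld <- setdiff(names(oldCsv), names(newCsv))
onlyInNew <- setdiff(names(newCsv), names(oldCsv))
print(onlyInOld)
print(onlyInNew)

# Number of missing entries per column (assumed marker: "---").
countMissing <- function(df, missing = "---") {
  vapply(df, function(x) sum(is.na(x) | x == missing | x == ""), integer(1))
}
print(countMissing(oldCsv))
print(countMissing(newCsv))
```

A renamed column or a jump in the number of missing entries in the
fragment-length column would point to the cause.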

>
>
> Can I build the ufl and ugp files in the same way for GenomeWideSNP_6?
> I assume that I need to repeat everything done for ".na27" with
> ".cn.na27"?
> Will these files be identical to the files supplied by you or do you
> use the file "GenomeWideSNP_6_build36_SNPandCN.tab", which only you
> have (as described in your pages)?

See posted scripts on the GenomeWideSNP_6 chip type page:

  http://groups.google.com/group/aroma-affymetrix/web/genomewidesnp-6


Let me know if any of this helps.

Henrik

>
>
> Here is the sessionInfo:
>> sessionInfo()
> R version 2.7.1 (2008-06-23)
> x86_64-unknown-linux-gnu
>
> locale:
> C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods
> base
>
> other attached packages:
>  [1] aroma.affymetrix_0.9.4 aroma.apd_0.1.3
> R.huge_0.1.6
>  [4] affxparser_1.12.2      aroma.core_0.9.4
> sfit_0.1.5
>  [7] aroma.light_1.8.1      digest_0.3.1
> matrixStats_0.1.3
> [10] R.rsp_0.3.4            R.cache_0.1.7
> R.utils_1.0.4
> [13] R.oo_1.4.5             R.methodsS3_1.0.3
>>
>
> Best regards
> Christian
>
>
> On Jan 8, 2:39 pm, cstratowa
> <christian.strat...@vie.boehringer-ingelheim.com> wrote:
>> Dear Henrik
>>
>> Would it be possible for you to supply the ufl and ugp files for the
>> new Affymetrix xxx.na27.annot.csv files for the mapping arrays?
>>
>> Thank you in advance
>> Best regards
>> Christian
> >
>

