Hello Shuying,
This occurs because we generate our CpG Island data using a
repeat-masked genome, so any CG sites within repeat regions are not
counted. In the sequence you pasted, the masked bases are the ones
shown in lowercase (repeatMasking=lower).

It may be helpful to turn on the RepeatMasker track in the 'Variation and 
Repeats' section to see which parts of the genome have been masked.
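
As a rough illustration (a sketch only, not the code used to build the
track), the Python below counts CG sites in a soft-masked sequence two
ways: all CG sites regardless of case, and only those where both bases
are uppercase, i.e. outside the masked region. The function name
count_cg is made up for this example, and treating a CG as uncounted
whenever either base is lowercase is an assumption about how masking
affects the count. The test string is the third line of the sequence
quoted below.

```python
def count_cg(seq):
    """Return (total, unmasked) CG counts for a soft-masked sequence.

    Assumption: lowercase bases are repeat-masked, and a CG site is
    treated as unmasked only if both of its bases are uppercase.
    """
    total = 0
    unmasked = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair.upper() == "CG":
            total += 1
            if pair == "CG":
                unmasked += 1
    return total, unmasked


# Third line of the hg18 sequence from the original message.
line3 = "CGGAggcggggcccggccgcccgcggACCCTCCCTCCCGGCCTTCCGCCA"
total, unmasked = count_cg(line3)
print(total, unmasked, total - unmasked)
```

On that line the lowercase stretch holds 5 CG sites, which matches the
gap between your count of 63 and the 58 shown in the browser.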

I hope this clears things up for you. Please contact us again if you
have further inquiries.

Best
Antonio Coelho
UCSC Genome Bioinformatics Group


----- Original Message -----
From: "S. Sun" <[email protected]>
To: [email protected]
Cc: [email protected]
Sent: Monday, June 14, 2010 8:34:30 PM GMT -08:00 US/Canada Pacific
Subject: [Genome] Questions about CG count in one CpG island: 
chrX:38071367-38071954.

Hello,

I have a simple question about the number of CG sites in one CpG
island: chrX:38071367-38071954.

The UCSC genome browser shows that it has 58 CG sites, but when I
count the CG sites myself, I get 63. I got 63 from both my R code
and manual counting (i.e., searching for "CG" in a text file on
Linux and highlighting the matches). In fact, I got 63 for both the
DNA sequence I downloaded from the UCSC genome browser and the hg18
sequence I got from the R Bioconductor package. See the following for
more details. Do you have any idea why we get this inconsistent
result? Is it because those 5 CG sites are located in the repeat
region, so they are not included? If so, why are these 5 CG sites
handled this way?

############ UCSC hg18 version DNA sequences ############################

>hg18_cpgIslandExt_CpG: 58 range=chrX:38071367-38071954 5'pad=0 3'pad=0 strand=+ repeatMasking=lower

CGTCCGGTCCTCTGCCCTCAGTCATTCGCGGGAGCGCAACCAGCGATCCC  # 7  (including
the CG formed by the C ending this line and the G starting the next)
GCCCCAGTCCGGCTGCCAAGCCTGGGGCCTGTCCCCCTACAGGGCCGATC  # 2
CGGAggcggggcccggccgcccgcggACCCTCCCTCCCGGCCTTCCGCCA  # 8
CCGGCGCGGGCGCAACTCACCGGGCATCAGCTCTTCCGGCTCCCTCATGC  # 6
CACGGGCAGTACGGGCAGCCTGCGCCGGGGCCAGGAGGCTGTAGAGGACG  # 5
GTTTGGTCGGGGCTAAAGCAGCTACTCCGCACCGACGCGGGCCGCGAAAG  # 7
CCCCCAAGTTCCGCATGGCGAAACTCCGGAGATCAACTACAACCGCGCTC  # 5
CCGGAAGTCAACAAACAGCCGCTACGGGCAACGGGGGCGGAGCTTGGGAA  # 5
TGCAAGGCGGGACAGGCGCCGTTGGGGAGGGGAACGGAGGCCGGGTGGCT  # 5
GGTAAGGGGCAGGCTCAGGCACAGCGGAGGGGCAGTAGAGACCACGCGCC  # 3
CTCTGGCGGCCTGGAGCAGAGAGGCGGCCACGCCGCGCAGTGATGCTGTG  # 5
GAGTCCGCGCCCTTGTGCCGTTGGAGGTCCAGGCGCCG              # 5

###################### From R Bioconductor genome sequence
CGTCCGGTCCTCTGCCCTCAGTCATTCGCGGGAGCGCAACCAGCGATCCC # 7
GCCCCAGTCCGGCTGCCAAGCCTGGGGCCTGTCCCCCTACAGGGCCGATC # 2
CGGAGGCGGGGCCCGGCCGCCCGCGGACCCTCCCTCCCGGCCTTCCGCCA # 8
CCGGCGCGGGCGCAACTCACCGGGCATCAGCTCTTCCGGCTCCCTCATGC # 6
CACGGGCAGTACGGGCAGCCTGCGCCGGGGCCAGGAGGCTGTAGAGGACG # 5
GTTTGGTCGGGGCTAAAGCAGCTACTCCGCACCGACGCGGGCCGCGAAAG # 7
CCCCCAAGTTCCGCATGGCGAAACTCCGGAGATCAACTACAACCGCGCTC # 5
CCGGAAGTCAACAAACAGCCGCTACGGGCAACGGGGGCGGAGCTTGGGAA # 5
TGCAAGGCGGGACAGGCGCCGTTGGGGAGGGGAACGGAGGCCGGGTGGCT # 5
GGTAAGGGGCAGGCTCAGGCACAGCGGAGGGGCAGTAGAGACCACGCGCC # 3
CTCTGGCGGCCTGGAGCAGAGAGGCGGCCACGCCGCGCAGTGATGCTGTG # 5
GAGTCCGCGCCCTTGTGCCGTTGGAGGTCCAGGCGCCG                     # 5

Shuying
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome