Hello Maria,

The field cpgIslandExt.name is not an identifier for a particular region. Rather, this name is a concatenation of a fixed word, "CpG:", plus a space, then the value of cpgIslandExt.cpgNum.
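The Python sketch below illustrates this point. In the cpgIslandExt table, cpgNum is the number of CpG dinucleotides in the island, so two unrelated islands can easily share the same value and therefore the same name. The sketch assumes the 4-column layout described in the question (chrom, chromStart, chromEnd, name). The rows are made up for illustration, and the helper names are hypothetical. Because a name can repeat, the sketch keys each island on its genome position instead.

```python
from collections import defaultdict

# Made-up rows in the 4-column layout described in the question:
# chrom, chromStart, chromEnd, name. Coordinates are illustrative only.
ROWS = [
    ("chr1", 1000, 1400, "CpG: 30"),
    ("chr1", 52000, 52350, "CpG: 30"),
    ("chr2", 7000, 7900, "CpG: 84"),
]

PREFIX = "CpG: "  # the fixed word "CpG:" followed by a single space


def cpg_num_from_name(name: str) -> int:
    """Recover cpgNum from a name built as 'CpG:' + ' ' + cpgNum."""
    if not name.startswith(PREFIX):
        raise ValueError(f"unexpected name format: {name!r}")
    return int(name[len(PREFIX):])


def island_key(row: tuple) -> tuple:
    """Identify an island by its genome position rather than its name."""
    chrom, start, end, _name = row
    return (chrom, start, end)


# The same name can label several unrelated islands...
positions_by_name = defaultdict(list)
for row in ROWS:
    positions_by_name[row[3]].append(island_key(row))

# ...whereas (chrom, chromStart, chromEnd) identifies each island.
islands = {island_key(row): cpg_num_from_name(row[3]) for row in ROWS}

for name, positions in positions_by_name.items():
    print(name, "->", positions)
```

In this made-up example, "CpG: 30" maps to two positions about 50 kb apart. This matches the situation described in the question, where islands with the same number were more than 10 kb apart.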
Format: CpG: <cpgIslandExt.cpgNum>

Individual CpG Islands in the track are best identified uniquely by their genome position (chrom, chromStart, chromEnd).

Regarding chromosome names, the Gateway page for each assembly explains the dataset (Source, Methods, Credits, etc.).

How to locate the gateway page for an assembly:
http://genome.ucsc.edu -> Genome Browser -> set clade, genome, assembly

As an example, the relevant information from the Credits section of the gateway page for hg19 is shown below. Note that assemblies can be formatted differently, so it is important to review each one individually.

   Chromosome naming scheme

   In addition to the "regular" chromosomes, the hg19 browser contains nine haplotype chromosomes and 59 unplaced contigs. If an unplaced contig is localized to a chromosome, the contig name is appended to the regular chromosome name, as in chr1_gl000191_random. If the chromosome is unknown, the contig is represented with the name "chrUn" followed by the contig identifier, as in chrUn_gl000211. Note that the chrUn contigs are no longer placed in a single, artificial chromosome as they were in previous UCSC assemblies. See the sequences* page for a complete list of hg19 chromosome names.

   * this is a link to view the sequences on an HTML page

To download a text file of the same data, use the Table Browser and extract the table "chromInfo":
1) http://genome.ucsc.edu/cgi-bin/hgTables
2) choose clade, genome, assembly
3) group = All Tables, table = chromInfo
4) name the file for download and click on "get output"

We hope this helps! If you need more information, please let us know.

Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 4/22/10 4:16 AM, Maria Iglesias wrote:
> Hi,
>
> I am not familiar with working with genome annotation data, so I have
> questions about the output files of CpG island and Refgene from the UCSC
> browser.
>
> First, I downloaded a table with all the CpG island annotations. The
> file has 4 columns: the 1st is the chrom name, the 2nd and 3rd are the
> start and end positions, and the 4th is the number or name given to each
> CpG island. I noticed there are different positions with the same number
> in the fourth column. Why does this happen? Sometimes it could be due to
> the length of the CpG island (range 201-3000 bp), but at other times these
> positions are more than 10 kb apart.
>
> The second question: in both cases, with the CpG island file and Refgene,
> I got all the features from chromosome 1 to chromosome 22, but later in
> the table other annotations appear that I don't really understand:
>
> chr6_dbb_hap3
> .
> .
> ChrUN_...
>
> chr17_ctg5_hap1
> ...
> chr1_gl000191_random
>
> What are they? Should I use them?
>
> Thanks a lot in advance for your help.
>
> María Jesús
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
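The hg19 naming scheme quoted in the reply can also be applied in code, for example to sort the extra sequences in the question into categories, or to drop them from a table. The following is a minimal Python sketch under stated assumptions. It assumes hg19-style names. The "regular" set (chr1 to chr22, chrX, chrY, and chrM) is an assumption, as is recognizing haplotypes by the substring "_hap". The category labels and function names are hypothetical. The chromInfo parser assumes a tab-separated download with the sequence name in the first column and header lines starting with "#". Other assemblies may use different conventions, so check each gateway page.

```python
import re

# Assumed "regular" hg19 names: chr1..chr22, chrX, chrY, plus chrM.
REGULAR = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def classify_chrom(name: str) -> str:
    """Classify an hg19-style sequence name (hypothetical labels)."""
    if REGULAR.fullmatch(name):
        return "regular"
    if name.startswith("chrUn_"):
        return "unplaced, chromosome unknown"  # e.g. chrUn_gl000211
    if name.endswith("_random"):
        return "unplaced, chromosome known"    # e.g. chr1_gl000191_random
    if "_hap" in name:
        return "haplotype"                     # assumed substring convention
    return "other"


def load_chrom_names(text: str) -> list[str]:
    """Read names from a chromInfo text download (first tab-separated column)."""
    names = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and assumed '#' header lines
        names.append(line.split("\t", 1)[0])
    return names


def keep_regular(rows):
    """Keep only BED-like rows (chrom in the first field) on regular chromosomes."""
    return [row for row in rows if classify_chrom(row[0]) == "regular"]


for name in ["chr1", "chr6_dbb_hap3", "chr17_ctg5_hap1",
             "chr1_gl000191_random", "chrUn_gl000211"]:
    print(name, "->", classify_chrom(name))
```

Whether to keep the haplotype and unplaced sequences depends on the analysis. The sketch only shows how to tell them apart. It does not recommend including or excluding them.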
