Hi Jia,

You can get coordinates for all intergenic regions using either the 
Table Browser or the public MySQL server:

Using the Table Browser, first get a track that represents all genes 
(using your chosen gene set):

1. Select a gene set (e.g. track Ensembl Genes, table ensGene)
2. Set output format to "selected fields from primary and related tables"
3. Send output to a local file (simply enter a file name in the 
'output file' field)
4. Click 'get output'
5. Check chrom, txStart, and txEnd
6. Click 'get output' again and save the file.
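
If you want to double-check the downloaded file locally, here is a 
minimal Python sketch that reads it into (chrom, start, end) tuples. It 
assumes the output is tab-separated with a header line starting with 
'#'; the function name read_intervals is just for illustration, and the 
sample values are made up.

```python
import io
from typing import Iterable, List, Tuple

# (chrom, start, end), 0-based half-open like txStart/txEnd
Interval = Tuple[str, int, int]


def read_intervals(lines: Iterable[str]) -> List[Interval]:
    """Parse tab-separated chrom/start/end lines, skipping '#' headers and blanks."""
    intervals: List[Interval] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # header such as "#chrom\ttxStart\ttxEnd", or an empty line
        fields = line.split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


if __name__ == "__main__":
    # Toy values for illustration only, not real gene coordinates.
    sample = io.StringIO("#chrom\ttxStart\ttxEnd\nchr1\t100\t500\nchr1\t400\t900\n")
    print(read_intervals(sample))  # [('chr1', 100, 500), ('chr1', 400, 900)]
```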

Now upload this file as a new custom track in the Table Browser ('add 
custom tracks' button). (Make sure to give the track a unique name; 
don't use the default, or your next custom track will overwrite this one.)

You can now collapse those items down into single non-overlapping items:

1. Select your custom track in the table browser.
2. Create an intersection (click the Create button next to "intersection:")
3. Intersect the track with itself and select: Base-pair-wise 
intersection (AND) of 'MyTrack1' and 'MyTrack1' (where MyTrack1 will be 
the name you gave your track). Click submit for the intersection.
4. Select output format=custom track and get output (name the track and 
load it back into the Table Browser)
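
For reference, the self-intersection above has the effect of merging 
overlapping transcripts into single blocks. A rough local equivalent is 
the usual sort-and-sweep interval merge, sketched below in Python under 
the assumption of half-open coordinates. This is not the Table 
Browser's own code, its handling of items that merely touch may differ, 
and merge_intervals is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or abutting intervals into non-overlapping blocks."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged: List[Interval] = []
    for chrom in sorted(by_chrom):
        spans = sorted(by_chrom[chrom])  # sweep left to right by start
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlaps or touches the current block
                cur_end = max(cur_end, end)
            else:  # gap found: close the current block, open a new one
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


if __name__ == "__main__":
    toy = [("chr1", 100, 500), ("chr1", 400, 900), ("chr1", 1000, 1200), ("chr2", 50, 80)]
    print(merge_intervals(toy))
    # [('chr1', 100, 900), ('chr1', 1000, 1200), ('chr2', 50, 80)]
```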

Now, run the same self-intersection again using the new custom track, 
but this time, on the Intersection screen, check BOTH of the boxes next 
to the options: "Complement MyTrack1/2 before base-pair-wise 
intersection/union" and submit.
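
Checking both complement boxes makes the intersection compute 
(NOT A) AND (NOT A), which is simply NOT A: every base outside the 
merged genes. A minimal Python sketch of that complement step, assuming 
a dictionary of chromosome sizes (for example from the chromInfo table) 
and half-open coordinates; complement and chrom_sizes are illustrative 
names, and the values in the usage line are made up.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def complement(merged: List[Interval], chrom_sizes: Dict[str, int]) -> List[Interval]:
    """Return the gaps between intervals on each chromosome, out to its full size."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in merged:
        by_chrom.setdefault(chrom, []).append((start, end))

    gaps: List[Interval] = []
    for chrom in sorted(chrom_sizes):
        pos = 0  # first base not yet covered on this chromosome
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > pos:
                gaps.append((chrom, pos, start))
            pos = max(pos, end)
        if pos < chrom_sizes[chrom]:
            gaps.append((chrom, pos, chrom_sizes[chrom]))  # tail of the chromosome
    return gaps


if __name__ == "__main__":
    print(complement([("chr1", 100, 900), ("chr1", 1000, 1200)], {"chr1": 2000, "chr2": 300}))
    # [('chr1', 0, 100), ('chr1', 900, 1000), ('chr1', 1200, 2000), ('chr2', 0, 300)]
```

Chromosomes with no genes at all (chr2 in the toy example) come back 
as a single gap spanning the whole chromosome.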

You can now download this as a BED file, which will contain the 
coordinates of regions "not in any gene". You can also create a custom 
track from this data and view it together with Ensembl Genes (both in 
dense mode) to get a visual check of the "not in any gene" file you created.
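
Before using the downloaded BED file, a quick sanity check can help: 
the regions should not overlap, and their lengths (end - start, since 
BED coordinates are 0-based and half-open) add up to the total 
intergenic size. A hypothetical Python sketch of both checks, with 
made-up example values:

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def is_non_overlapping(intervals: List[Interval]) -> bool:
    """True if no two intervals on the same chromosome share a base."""
    prev_chrom, prev_end = None, 0
    for chrom, start, end in sorted(intervals):
        if chrom == prev_chrom and start < prev_end:
            return False
        prev_chrom, prev_end = chrom, end
    return True


def total_bases(intervals: List[Interval]) -> int:
    """Sum of interval lengths; with half-open coordinates each is end - start."""
    return sum(end - start for _, start, end in intervals)


if __name__ == "__main__":
    regions = [("chr1", 0, 100), ("chr1", 900, 1000)]
    print(is_non_overlapping(regions), total_bases(regions))  # True 200
```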

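Alternatively, you can also accomplish the above using the public MySQL 
server (hgsql below is UCSC's wrapper around the mysql client; from 
outside UCSC, substitute mysql --user=genome 
--host=genome-mysql.soe.ucsc.edu -A):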

To get a BED file of all genes (here the UCSC Genes track, table knownGene):

$ hgsql -N -e "select chrom,txStart,txEnd,name from knownGene" hg19 > hg19.knownGene.bed

To get a BED file that represents the entire genome:

$ hgsql -N -e "select chrom,0,size,chrom from chromInfo;" hg19 > hg19.chromSizes.bed

Now you can intersect these two BED files with a NOT (keep only the 
parts of the genome that do not overlap any gene), and you get 
everything that is not in a gene.
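
If you prefer to do the NOT step yourself rather than with a dedicated 
tool, it amounts to subtracting the gene intervals from the 
whole-genome intervals. The Python sketch below is one way to do it, 
assuming both files are tab-separated with chrom, start, end in the 
first three columns (as the two hgsql commands produce); read_bed and 
subtract are hypothetical helper names, and overlapping genes do not 
need to be merged first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Keep the first three tab-separated columns (chrom, start, end) of each line."""
    out: List[Interval] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and not fields[0].startswith("#"):
            out.append((fields[0], int(fields[1]), int(fields[2])))
    return out


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the parts of the intervals in `a` not covered by any interval in `b`."""
    spans: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in b:
        spans[chrom].append((start, end))
    for chrom_spans in spans.values():
        chrom_spans.sort()

    result: List[Interval] = []
    for chrom, a_start, a_end in sorted(a):
        pos = a_start  # first base of this `a` interval not yet accounted for
        for b_start, b_end in spans.get(chrom, []):
            if b_end <= pos:
                continue  # this gene ends before the current position
            if b_start >= a_end:
                break  # remaining genes start past this interval
            if b_start > pos:
                result.append((chrom, pos, b_start))  # uncovered gap
            pos = max(pos, b_end)
        if pos < a_end:
            result.append((chrom, pos, a_end))
    return result
```

For example, subtract(read_bed(open("hg19.chromSizes.bed")), 
read_bed(open("hg19.knownGene.bed"))) would give the intergenic 
regions. Because it uses transcript bounds (txStart/txEnd), introns 
count as genic here.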

Please let us know if you have any additional questions: [email protected]

-
Greg Roe
UCSC Genome Bioinformatics Group





On 10/24/11 2:25 PM, Zeng, Jia wrote:
> Dear UCSC genome browser staff:
>
>
> I would like to extract the coordinates of different human genomic regions. I 
> used the Ensembl Genes track to extract the coordinates for exons, introns, and UTRs. 
> First, I downloaded the sorted start and end positions for all the Ensembl 
> transcripts and thought that each intergenic region should run from the end 
> position of one transcript to the start of the following gene. But I found 
> that many transcripts actually come from the same gene and overlap heavily, 
> so I couldn't do it this way. Is there any way to 
> get the whole-genome intergenic coordinates? Thank you.
>
>
> Jia Zeng
>
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome