Hi Carly,

See answers interspersed below:

> I am aware that I can check the boxes chrom, chromeSTART, and
> chromeEND and then copy and paste that into the Genome Browser. Is it
> possible for the tables to provide the location of the promoter (+/-
> 500bps to the left and right of that region) instead of a being
> thrown onto a random area of the gene?

The Table Browser makes it fairly easy to get regions that are some 
number of bases upstream of a gene; it is slightly more work to get a 
region that extends both upstream and downstream of the transcription 
start site.  Depending on how you do it, you may or may not wind up with 
gene names in your output.

There are some different options for getting a custom track that has 
your regions of interest with gene names.  One way would be to start by 
getting the BED file as suggested before (be sure to include "name" in 
the output), and then use either your own tools (such as Excel) to add 
or subtract 500 bases from the appropriate lines, or use Galaxy: 
http://main.g2.bx.psu.edu/.  Galaxy works in conjunction with the Table 
Browser, and it has a lot more data and text manipulation tools.
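
If you go the do-it-yourself route, the arithmetic can be sketched in a 
few lines of Python.  This is only an illustration: it assumes a 
six-column BED input (chrom, start, end, name, score, strand), and the 
function name promoter_window and its flank parameter are made up for 
this example.  It takes the transcription start site to be the start 
coordinate on the + strand and the end coordinate on the - strand.

```python
def promoter_window(bed_line, flank=500):
    """Return a BED line covering `flank` bases either side of the TSS.

    Assumes six tab-separated BED fields:
    chrom, start, end, name, score, strand.
    """
    fields = bed_line.rstrip("\n").split("\t")[:6]
    chrom, start, end, name, score, strand = fields
    # The TSS is the start coordinate on '+' and the end coordinate on '-'.
    tss = int(start) if strand == "+" else int(end)
    new_start = max(0, tss - flank)  # do not run off the start of the chromosome
    new_end = tss + flank
    return "\t".join([chrom, str(new_start), str(new_end), name, score, strand])


if __name__ == "__main__":
    # Made-up example line, not a real gene.
    print(promoter_window("chr1\t12000\t15000\texampleGene\t0\t+"))
```

Applying this to every line of the BED file gives a file you can load 
as a custom track with the gene names still attached.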

Perhaps an easier way is to generate a BED file using MySQL to query the 
tables directly.  These two queries will generate BED files from the 
knownCanonical and knownGene tables (knownCanonical contains one 
representative transcript for each cluster of transcripts in UCSC Genes 
-- see more on the description page: 
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=knownGene):

mysql> select knownCanonical.chrom, chromStart-500 as start, 
chromStart+500 as end, name, 0 as score, strand from knownGene, 
knownCanonical where knownGene.name=knownCanonical.transcript and 
strand='+';

mysql> select knownCanonical.chrom, chromEnd-500 as start, 
chromEnd+500 as end, name, 0 as score, strand from knownGene, 
knownCanonical where knownGene.name=knownCanonical.transcript and 
strand='-';

The results from these two queries can be concatenated into one file and 
uploaded as a custom track.  Here is a session that contains exactly that:

http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Rhead&hgS_otherUserSessionName=CarlyHomPromoters
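
If you would rather build that file yourself, a sketch like the 
following joins the two result sets into one BED custom track.  The 
function name and track name are placeholders, and it assumes each 
query's output was saved as tab-separated text, possibly with a header 
row of column names.

```python
def combine_bed(result_texts, track_name="promoters"):
    """Merge tab-separated query results into one BED custom track string."""
    lines = ['track name="%s" description="TSS +/- 500 bp"' % track_name]
    for text in result_texts:
        for line in text.splitlines():
            fields = line.split("\t")
            # Skip blank lines and header rows (second column is not a number).
            if len(fields) < 3 or not fields[1].lstrip("-").isdigit():
                continue
            # A start within 500 bases of the chromosome start would go
            # negative after subtracting the flank; clamp it to 0.
            fields[1] = str(max(0, int(fields[1])))
            lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up rows standing in for the '+' and '-' strand query results.
    plus = "chrom\tstart\tend\tname\tscore\tstrand\nchr1\t1000\t2000\tgeneA\t0\t+\n"
    minus = "chr2\t3000\t4000\tgeneB\t0\t-\n"
    print(combine_bed([plus, minus]), end="")
```

The resulting text can be pasted into the custom track upload page or 
saved to a file and uploaded.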

Feel free to use the Table Browser to download the contents of the 
custom track; use either "all fields from selected table" or BED as 
the output format.

> Also, how exactly is the histone methylation score measured? I want
> to be able to select a specific score range (ex:>= 500), but I am
> unsure of what to qualify as a significant enough of a signal since I
> do not know what the score is being measured relative to and what the
> top score possible is.

When you have the table selected in the Table Browser, hit the "describe 
table schema" button.  You should see a description of the score, and 
usually, a link to the range of scores that occur in the table.
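
Once you have downloaded a table as BED, another way to see what a 
cutoff such as 500 means is to look at the scores yourself.  The sketch 
below is only illustrative: the function name is made up, and it assumes 
the score sits in the fifth BED column, which the BED format defines as 
an integer from 0 to 1000.

```python
def score_summary(bed_lines, cutoff=500):
    """Return (min, max, n_at_or_above_cutoff, n_total) for BED column 5."""
    scores = []
    for line in bed_lines:
        fields = line.split("\t")
        # Skip track/browser lines and anything without a numeric score column.
        if len(fields) < 5 or not fields[4].strip().isdigit():
            continue
        scores.append(int(fields[4]))
    if not scores:
        return None
    passing = sum(1 for s in scores if s >= cutoff)
    return min(scores), max(scores), passing, len(scores)


if __name__ == "__main__":
    # Made-up rows, not real peak data.
    rows = ["chr1\t10\t20\tp1\t300\t.", "chr1\t30\t40\tp2\t750\t."]
    print(score_summary(rows))
```

Seeing how many regions pass at a few different cutoffs can help you 
decide what counts as a strong enough signal for your purposes.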

> Lastly, is there a way to add the gene name to the table output? It
> makes it a lot easier than having to determine the gene names when
> some genes are very close to each other and have an area of overlap.
> I know that the gene name check box is available in the group
> knownCANONICAL, but I can't seem to find it when I am in the
> Regulation group with the 'selected fields from primary and related
> tables' output format. If you could get back to me on these items
> that would be great. Thank you!

Generally, when you do an intersection in the Table Browser, the fields 
of the table that are selected first are the fields that are retained in 
the output.  (We don't have a way to get fields from both sets of 
tables, but Galaxy does.)  Because you are creating a filter for the 
second table, you will need to first create a custom track of the 
regions in your second table that pass your filter.  Then select your 
promoter custom track and intersect it with your second custom track.
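
If you end up doing the intersection outside the Table Browser, the idea 
can be sketched as follows.  This is a simple illustration rather than 
how the Table Browser does it: the function name is made up, both inputs 
are assumed to be lists of BED lines, and the nested scan would be slow 
for very large files.  It keeps the fields of the first (promoter) set, 
so the gene names are retained, and it reports a promoter if any 
filtered region overlaps it.

```python
def promoters_with_peaks(promoter_lines, peak_lines):
    """Return promoter BED lines that overlap at least one peak BED line."""
    # Index peaks by chromosome as (start, end) pairs.
    peaks = {}
    for line in peak_lines:
        chrom, start, end = line.split("\t")[:3]
        peaks.setdefault(chrom, []).append((int(start), int(end)))

    kept = []
    for line in promoter_lines:
        chrom, start, end = line.split("\t")[:3]
        start, end = int(start), int(end)
        # BED intervals are half-open, so two intervals overlap when
        # each one starts before the other ends.
        if any(p_start < end and start < p_end
               for p_start, p_end in peaks.get(chrom, [])):
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up rows, not real data.
    promoters = ["chr1\t500\t1500\tgeneA\t0\t+", "chr1\t5000\t6000\tgeneB\t0\t-"]
    peaks = ["chr1\t1400\t1600\tpk1\t800\t."]
    print(promoters_with_peaks(promoters, peaks))
```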

I hope this helps!  If you have further questions, please reply to 
[email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 2/4/12 1:41 AM, Carly Hom wrote:
> Hello, I received a response from you receiving instruction on how to
> filter the browser according to these instructions:
>
> Hi my name is Carly Hom and I am an undergraduate student researcher at
> Arizona State University working with Dr. Karmella Haynes. In my current
> lab I am using Synthetic Biology and Bioinformatics to investigate reliable
> and predictable reactivation of dormant genes that can help treat cancer
> and enable tissue re-growth. By determining which silenced genes will
> switch to an active state in osteosarcoma cells, with the presence of the
> synthetic transcription factor PC-TF, my work will establish a
> comprehensive method for predicting the effect of rationally designed
> protein-based drugs. Pc-TF, a synthetic transcription factor developed by
> Dr. Haynes, regulates cell states by binding the repressive
> trimethyl-histone H3 lysine 27 signal (H3K27me3) and switching silenced
> genes to an active state in osteosarcoma cells. Since a comprehensive ChIP
> map is not available for osteosarcoma, I will be identifying genes
> associated with H3K27me3 in liver (HepG2) and fibroblast (BJ) cell lines.
> Overall, I will need to collect about 1000 genes from the ENCODE database
> that show a significant enough H3K27me3 signal at the promoter of the gene.
> I have already figured out how to project only information from the HepG2
> and BJ cell lines in relation to H3K27me3, but by just clicking to move
> through the cell line to find genes will take entirely too long and can
> cause me to miss important genes. At the request of Dr. Haynes I am asking
> if ENCODE has some sort of filter program that will provide a list of genes
> where the promoter site shows a high level of the H3K27me3 histone
> methylation. I will need it to be able to find the beginning of the gene's
> promoter on the UCSC Genome Browser and then show about 500bps to the left
> and 500bps to the right of the promoter. Ultimately, I want to be able to
> navigate through the genes in this cell line that show a significant enough
> H3K27me3 signal at the promoter (everything else with a low H3K27me3 signal
> I do not care about). If you could get back to me on whether this is even
> possible to do within the Genome Browser, and if yes, how I would be able
> to do this that would be great. Thank you!
>
>
> After a couple of emails back and forth this was the best response we
> received:
>
> Hello, Karmella.
> To expand upon Luvina's instructions, to create the filter, perform the
> following steps in the Table Browser:
> 1. Select the following options:
> Clade: Mammal
> Genome: Human
> Assembly: Feb. 2009 (GRCh37/hg19)
> Group: Genes and Gene Prediction Tracks
> Track: UCSC Genes
> Table: knownCanonical
> 2. Next to "filter", click the "create" button
> 3. In the "Linked Tables" section, scroll down and check the hg19.knownGene
> checkbox
> 4. Scroll to the bottom of the page and click the "Allow Filtering Using
> Fields in Checked Tables" button
> 5. In the "hg19.knownGene based filters" section, the third line should read
> "strand does match +"
> 6. Click the "submit" button
> Also note that these tables:
> wgEncodeUwHistoneHepg2H3k27me3StdPkRep1
> wgEncodeUwHistoneHepg2H3k27me3StdPkRep2
> wgEncodeUwHistoneBjH3k27me3StdPkRep1
> wgEncodeUwHistoneBjH3k27me3StdPkRep2
> contain the methylation scores for H3k27me3 in Hepg and Bj cell lines as
> calculated according to the process outlined in the UW Histone track
> description here:
> http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=wgEncodeUwHistone
> and in the reference contained therein. You may also be interested in these
> additional tables:
> wgEncodeUwHistoneHepg2H3k27me3StdHotspotsRep1
> wgEncodeUwHistoneHepg2H3k27me3StdHotspotsRep2
> wgEncodeUwHistoneBjH3k27me3StdHotspotsRep1
> wgEncodeUwHistoneBjH3k27me3StdHotspotsRep2
> which also contain methylation hotspot data. You can read more about both
> sets of tables in the aforementioned description.
> Please be aware that the encode tables contain all the methylation scores,
> not just the high scores. If you're only interested in the high methylation
> scores, you'll need to filter the encode tables similar to my above example:
> 1. Select the following options:
> Clade: Mammal
> Genome: Human
> Assembly: Feb. 2009 (GRCh37/hg19)
> Group: Regulation
> Track: UW Histone
> Table: select the appropriate tables
> 2. Next to "filter", click the "create" button
> 3. Edit the "score" line so that it contains the values you are interested
> in such as "score is>= 500"
> 4. Click the "submit" button
>
>
> These instructions helped me out a lot, but I still need couple more things
> to be done. If certain things are not possible let me know. I am aware that
> I can check the boxes chrom, chromeSTART, and chromeEND and then copy and
> paste that into the Genome Browser. Is it possible for the tables to
> provide the location of the promoter (+/- 500bps to the left and right of
> that region) instead of a being thrown onto a random area of the gene?
>   Also, how exactly is the histone methylation score measured? I want to be
> able to select a specific score range (ex:>= 500), but I am unsure of what
> to qualify as a significant enough of a signal since I do not know what the
> score is being measured relative to and what the top score possible is.
> Lastly, is there a way to add the gene name to the table output? It makes
> it a lot easier than having to determine the gene names when some genes are
> very close to each other and have an area of overlap. I know that the gene
> name check box is available in the group knownCANONICAL, but I can't seem
> to find it when I am in the Regulation group with the 'selected fields from
> primary and related tables' output format. If you could get back to me on
> these items that would be great. Thank you!
>
> - Carly
>
