Hi Ann,

I looked at your file, and it appears to be a FASTA file of all the records from the all_sts_primer table of the mm9 assembly that have "D1MIT" in the name.

The best way to do this is to A) create a custom track of introns, another of 5' UTR exons, another of coding exons, etc., and B) intersect each of these custom tracks in turn with the all_sts_primer table while filtering out the non-DMit records.

A) First, let's create a custom track of 5' UTR exons. Go to the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) and select the following:

   clade: mammal
   genome: Mouse
   assembly: July 2007 (NCBI37/mm9)
   group: Genes and Gene Prediction Tracks
   track: UCSC Genes
   table: knownGene
   region: genome
   output format: custom track

and click "submit". On the next page, change the name at the top to something like "5UTRknownGene", choose the "5' UTR exons" button and click "get custom track in table browser".

Repeat the steps above for introns, coding exons (CDS) and 3' UTR exons, giving each custom track its own name. Note that if you choose "Exons", the track will include both UTR and coding exons.

B) Now, go back to the Table Browser and select the following:

   clade: mammal
   genome: Mouse
   assembly: July 2007 (NCBI37/mm9)
   group: Mapping and Sequencing Tracks
   track: STS Markers
   table: all_sts_primer
   region: genome
   filter: click on "create". At the bottom of the page, enter the following into the Free-form query section: qName like "%D1MIT%". Click "submit".
   intersection: click on "create". Select:
      group: Custom Tracks
      track: 5UTRknownGene (or whatever you named your track in step A)
      table: there will be only one table, so there is no need to select one
      choose: "All all_sts_primer records that have any overlap with 5UTRknownGene"
      and click "submit"
   output format: BED - browser extensible data

and click "get output". The fourth column of the output should contain the names of the DMit loci that are located in a 5' UTR.

Repeat B for your previously created custom tracks of introns, coding exons (CDS) and 3' UTR exons. Note that any DMit loci not found within 5' UTR exons, introns, coding exons (CDS) or 3' UTR exons are, necessarily, intergenic.
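If it helps, the four BED outputs from step B could be combined into one tab-delimited table (which Excel or Access can open) with a short script. The following is only a sketch in Python under a few assumptions: the function names and category labels are made up for illustration, and it expects one extra BED file of all the D1MIT records, obtained by running step B with the filter but without any intersection, so that the loci left over can be labeled intergenic.

```python
import csv
import io

# Column labels are illustrative; one per custom track from step A.
CATEGORIES = ["5UTR", "CDS", "intron", "3UTR"]


def read_bed_names(handle):
    """Return the set of names (fourth BED column) found in a BED file."""
    names = set()
    for line in handle:
        line = line.strip()
        # Skip blank lines and the header lines the Table Browser may add.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()  # assumes locus names contain no spaces
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def classify(all_names, hits_by_category):
    """Build one row per locus: name, yes/no per category, yes/no intergenic."""
    rows = []
    for name in sorted(all_names):
        flags = [name in hits_by_category.get(cat, set()) for cat in CATEGORIES]
        # A locus that overlaps none of the four tracks is called intergenic.
        intergenic = "no" if any(flags) else "yes"
        rows.append([name] + ["yes" if f else "no" for f in flags] + [intergenic])
    return rows


def write_table(rows, handle):
    """Write the rows as tab-delimited text with a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["locus"] + CATEGORIES + ["intergenic"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up example records, only to show the expected input shape.
    all_bed = io.StringIO(
        "chr1\t100\t200\tD1Mit1\t0\t+\n"
        "chr1\t300\t400\tD1Mit2\t0\t+\n"
    )
    intron_bed = io.StringIO("chr1\t100\t200\tD1Mit1\t0\t+\n")
    hits = {"intron": read_bed_names(intron_bed)}
    out = io.StringIO()
    write_table(classify(read_bed_names(all_bed), hits), out)
    print(out.getvalue(), end="")
```

With real data you would open the saved BED files instead of the in-memory examples. A locus can get "yes" in more than one column, for example when it spans an exon/intron boundary or overlaps different transcripts.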

I hope this information is helpful. Please feel free to contact the mail list again if you require further assistance.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 3/28/11 6:51 PM, Ann Eileen Miller Baker wrote:
> 28Mr11
> Mary Goldman,
> Thanks for more details, trying to help me learn enough to
> find the info I seek. I don't keep the correct email and
> would be grateful if you forwarded this email to them or whenever
> you respond you include the correct email and I will forward.
> Email is fine: I suggested phone because I thought it would
> be faster. I guess that recombination rates might be guestimated
> (to an order of magnitude?) with recombinant inbred lines. Any
> other (indirect way) to guestimate recombination rates appreciated.
> (1a) let me be clearer- the main data I am looking for is
> ""where mouse usat (microsatellite loci DMit) are located relative
> to ((regulatory genes (promoters, 5' UTR, intergenes etc and
> genes (exons)))""
> (1b) in brief, I wish to do for mouse usat DMit what Payseur
> (attachment) did for human usat
> (2) I have no colleague who refers me to UCSC data on location of
> usats (microsatellite loci) with respect to regulatory genes
> (3) I know you are trying to do your best to help me, but can
> you suggest where I might look to find this information as a file
> (rather than entering each of the >> 7500 usat individually)
> (4) I will be looking at wikipedia and the helix websites you
> suggested asap (in middle of some analyses now, yet wanted
> to answer your comments, questions as much as I could.
> (5) How I would like to see output: I WANT TO SEE ALL
> because I don't know which DMit locus is in exons, introns etc:
> (5a) DMit locus name
> (5b) is the DMit locus in an
> (5b1) exon?
> (5b2) coding exon?
> (5b3) intron
> (5b4) intergene
> (5b5) 5'UTR (most likely regulatory gene location)
> (5b6) 3'UTR
> (5b7) I don't know what cds are (coding?) but will look up in
> wikipedia/google
> (5c) These output columns could be summarized as "no evidence
> for selection (introns)" versus "strong evidence for selection (exons
> in genes important for survival; this would include key promoters too)"
> (6) I have DMit amplified sequences that a colleague sent me
> (attached)- the DMit name is embedded within a string of other
> info
> Many thanks for trying to help- we make progress; I will read what
> you suggested asap. Payseur's paragraph in the MatMeth explains
> probably more clearly than I can now (still learning).
> Look forward to hearing from you; will send on to correct email;
> may write more after reading what you suggested.
> Ann
> On Mon, Mar 28, 2011 at 4:55 PM, Mary Goldman <[email protected]> wrote:
>
> Hi Ann,
>
> I'm sorry but I've looked again and I can't find data on DMits
> here. Perhaps your colleague knows which track he used to obtain
> the data from UCSC?
>
> To better help you there are a couple of things that need to be
> clarified:
>
> 1. Do you have genomic coordinates or sequences? If you are unsure
> send a small sample in your reply.
> 2. Specify exactly what loci types you are looking for
> (intergenic, 3' UTR, CDS, intron or 5'UTR). Please note that
> there is no recombination rate data available for mouse.
> 3. What are your desired output columns? I'm not sure, but I think
> they are DMit locus name, DMit sequence and locus type
> (intron/exon/etc).
>
> Before you reply, I highly recommend watching both OpenHelix
> tutorials below (the first one is about the Genome Browser
> generally, while the second one focuses on Custom Tracks and the
> Table Browser, both of which are software features that you will
> be using):
> http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml
> http://www.openhelix.com//cgi/tutorialInfo.cgi?id=28
>
> Additionally, this Wikipedia article may help you understand what
> we mean by "assembly" here at the Genome Browser:
> http://en.wikipedia.org/wiki/Genome_project. In particular, this
> section should help make it clear why there are several assemblies
> for each organism:
>
> http://en.wikipedia.org/wiki/Genome_project#When_is_a_genome_project_finished.3F.
>
> Finally, I'm very sorry but we are not funded to provide help via
> phone. Please reply to the mailing list with any further replies
> and questions.
>
> Best,
>
> Mary
> ------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
>
>
> On 3/22/11 6:14 PM, Ann Eileen Miller Baker wrote:
>
> ---------- Forwarded message ----------
> From: Ann Eileen Miller Baker <[email protected]>
> Date: Tue, Mar 22, 2011 at 2:19 PM
> Subject: Re: [Genome] pls reply to [email protected]
> To: Vanessa Kirkup Swing <[email protected]>
>
>
> 22Mr11
> Dr. Kirkup Swing,
> 0. I am asking for more info because now I am reading a study paralleling
> my work for which the author made further distinctions than what I
> originally requested
> 1. thnx- the original DMit amplified sequences came from JAX Informatics;
> a colleague sent me the updated sequences which he cited as
> coming from UCSC- so your sending me back to JAX (now reproduction
> not informatics) seems odd, but I am willing to try to follow your
> suggestions once you respond to my more detailed objectives (below,
> including ## annotations within your helpful suggestions##
> 2. let me try to be clearer
> 3. i have ca. 7500 DMit loci (amplified sequences) for usats
> (microsatellites) making it logistically difficult to handle
> one locus at a time, which I think
> you imply below (these DMit amplification sequences are in your
> ucsc mouse DMit usat archive if I understand my colleague)
> 4. i am looking for excel files that would have for all DMit usat loci
> the location of the amplified loci with respect to the following; i.e.,
> which DMit usat loci are in introns? coding exons? etc):
> 4a. intron
> 4b. exon versus coding exons
> 4c. intergenic
> 4d. 3' and 5' UTR
> 4e. upstream from transcription start site
> 5. is there some way to do a "bulk submission" of all these DMit
> loci rather than piecemeal (one at a time as I think you imply)?
> Ann
> ## see below because I don't know what to list for what you
> state I need to submit
>
> On Mon, Mar 21, 2011 at 3:50 PM, Vanessa Kirkup Swing
> <[email protected]> wrote:
>
> Hi Ann,
>
> Sorry for the delayed response. We suggest you try contacting the mailing
> list (http://reproductivegenomics.jax.org/mailing_list.html) at the
> Jackson Laboratory to see if you can obtain the sequences
> and positions for the DMit microsatellite markers.
>
> Once you have those, go to the main page on our site. Click on "Tables"
> from the blue navigation bar to get to our table browser.
>
> Set the clade, genome, and assembly.
>
> ## what goes into clade? I assume genome implies Mus musculus domesticus;
> i don't know what assembly means
> ## I need the output to come in excel or text delimited files because I do
> the analyses in EXCEL and in ACCESS; i.e., these formats require a
> "dedicated" column for each kind of data
> ## my colleague sent me the DMit amplified sequences in FASTA format;
> however, since the amplified sequence, has no regular (predictable)
> structure, there is no way to program to get data into "dedicated
> columns"; i.e., colA DMit locus name; colB amplified sequence
>
> ## may I be sent your phone# so I can try calling if you fail to understand
> this? I am usually at 707 786 5342 except for Fridays; this week has an
> unpredictable schedule because it is partly a vacation week; otherwise that
> phone I will answer
>
> Set the following:
>
> track: "UCSC Genes"
> table: "knownGene"
>
> ###
> ((this is where we need to have bulk submission since I have ca. 7500 DMit
>
> region: "position" and click on "define regions." Paste your regions in the
> box and click "submit"
> output format: "all fields from selected table"
>
> click "get output"
>
> ## sorry but I am not good at making deductions unless I know well what
> I am doing
>
> exon= expressed gene
> intron= where the boundary of the exon is; the intron is outside the exon
> boundary
>
> ## given what I wrote above, how can the pcr silico be of use?
>
> The result is the record for your gene. The exon/intron boundaries can be
> deduced from the exonStarts and exonEnds fields. If there are many fields in
> the record that you don't need, you can hit the back button and change the
> output format to "selected fields from primary and related tables." Click
> "get output" and select the fields you wish to see in your results, be sure
> to include exonStarts and exonEnds since these fields contain the data you
> asked about.
>
> Also, you may be interested in UCSC In-Silico PCR for mapping PCR products.
> To get to it, click on "PCR" from the blue navigation bar.
>
> Hope this helps! If you have further questions related to the Genome
> Browser, please contact the mailing list.
>
> ## we are further along, but I want to hold off trying to use UCSC (which I
> tried w/o help and got nowhere) but I think you are only giving me
> - intron
> - exon
> whereas I also need
> - recombination hotspots
> - coding exon versus noncoding exon
> - 3' and 5' UTR
> - intergenic
>
> Vanessa Kirkup Swing
> UCSC Genome Bioinformatics Group
>
> ----- Original Message -----
> From: "Ann Eileen Miller Baker" <[email protected]>
> To: [email protected]
> Sent: Wednesday, March 16, 2011 7:53:57 PM
> Subject: [Genome] pls reply to [email protected]
>
> pls reply to [email protected] because I am unclear how to access
> the normal channel where replies are put
>
> (1) How can I find mouse DMit (microsatellite loci) amplified sequences?
> (url)
> (2) alleles at inbred strains for DMit microsatellite loci? (url)
> (3) How can I find which DMit microsatellite loci are in
> - introns
> - exons
> - recombination hot spots
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
