On Fri, Mar 27, 2009 at 7:37 AM, <[email protected]> wrote:
> Send Genome mailing list submissions to
>        [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://www.soe.ucsc.edu/mailman/listinfo/genome
> or, via email, send a message with subject or body 'help' to
>        [email protected]
>
> You can reach the person managing the list at
>        [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Genome digest..."
>
>
> Today's Topics:
>
>    1. Re: convert large genomic regions from mouse to human
>       (Hiram Clawson)
>    2. Re: SelfChain table (Atif Shahab)
>    3. Questions about Blat -fastmap option. (Royden Clark)
>    4. Re: Questions about Blat -fastmap option. (Galt Barber)
>    5. Coloring sub-parts of a track (Dhiral Phadke)
>    6. Re: Coloring sub-parts of a track (Hiram Clawson)
>    7. gladHumESOtherData (Charu Gupta Kumar)
>    8. quality score 98 (Michael Hiller)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 24 Mar 2009 22:26:34 -0700
> From: Hiram Clawson <[email protected]>
> Subject: Re: [Genome] convert large genomic regions from mouse to human
> To: Jiantao Shi <[email protected]>
> Cc: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Good Evening Jiantao:
>
> The hg18 genome aligns to less than 40% of the Mouse genome:
>
> http://genomewiki.ucsc.edu/index.php/Mm9_multiple_alignment
>
> It would be expected that many regions do not translate unless
> these are all highly conserved regions.
>
> Take one of your regions that does not map and carefully
> examine the human chain and net tracks to see how it breaks up.
>
> --Hiram
>
> Jiantao Shi wrote:
> > Hi,
> >
> > I have about 3000 mouse (mm8) genomic regions (2000 bp on average)
> > identified by a ChIP-seq experiment and downloaded from a public data set,
> > and I want to convert these coordinates to those of human (hg18).
> > However, I got nothing returned using the liftOver web server on the UCSC
> > Genome website. So I downloaded the liftOver program and ran it locally.
> > Again, only about 400 of these regions were successfully converted; the
> > others were reported as "Partially deleted in new". Any suggestions?
> >
> > Best,
> > Jiantao Shi
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 25 Mar 2009 23:10:17 +0800
> From: Atif Shahab <[email protected]>
> Subject: Re: [Genome] SelfChain table
> To: Jennifer Jackson <[email protected]>
> Cc: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> thanks! That clarifies the issue.
>
> - atif
>
> On 3/25/2009 8:12 AM, Jennifer Jackson wrote:
> > Hello,
> > Please see our documentation
> > http://genome.ucsc.edu/goldenPath/help/chain.html
> > http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
> >
> > The coordinates for this chr20 chain are in the (-) strand. Chain file
> > formats are different from many of our other file formats in that they
> > are based on the reverse-complement strand (not converted to be
> > positive strand). However, they are still ordered smallest -> largest
> > in value. And the smallest coordinate is still zero-based. These chain
> > coordinates are converted to be positive stranded and to be a 1-based
> > start when displayed in the browser chain description page.
> >
> > The chromosome 20 length is: 62,435,964
> >
> > Doing the math to compare the browser data points to the chain file
> > data points:
> >
> > start: 62435964 - 19465389 = 42970575
> > end: 62435964 - 19472380 = 42963584 +1 (to convert the zero-based
> > "smallest" coordinate) = 42963585
> >
> > I hope I have addressed all of your questions, but let me know if I
> > missed anything,
> > Jennifer Jackson
> > UCSC Genome Bioinformatics Group
> >
> > Atif Shahab wrote:
> >>
> >> Sorry, copied the wrong row.
> >> Following is the row I find in chr11.chainSelf
> >>
> >> 725, 212246, 'chr11', 134452384, 18466024, 18469085, 'chr20',
> >> 62435964, '-', 19465389, 19472380, 80309, 77.2
> >>
> >> while what I see in the UCSC browser is
> >>
> >> *Human position:* chr11:18466025-18469085 size: 3061
> >> *Strand:* -
> >> *Human position:* chr20:42963585-42970575
> >> <http://ucsc.gis.a-star.edu.sg/cgi-bin/hgTracks?db=hg18&position=chr20%3A42963585-42970575>
> >> size: 6991
> >> *Chain ID:* 80309
> >> *Score:* 212246 *Approximate Score within browser window:* 1240
> >>
> >> The browser reports the position to be in
> >> chr20:42963585-42970575 while what I get from the chr11.chainSelf is
> >> chr20:19465389-19472380.
> >>
> >> - atif
> >>
> >> On 3/24/2009 2:44 AM, Jennifer Jackson wrote:
> >>> Hello,
> >>>
> >>> The data reported for the region chr11:56259903-56260492 on the
> >>> track description page represents a subset of the data from the
> >>> entire chain record (chr11_chainSelf.id = 1118118). That is why the
> >>> score is approximate. The track description page explains the
> >>> display conventions in the Description section.
> >>>
> >>> We hope this helps to explain the data,
> >>> Jennifer Jackson
> >>> UCSC Genome Bioinformatics Group
> >>>
> >>> Atif Shahab wrote:
> >>>> Hi,
> >>>>
> >>>> In the UCSC browser I get the following info
> >>>>
> >>>> *Human position:* chr11:56259903-56260492 size: 590
> >>>> *Strand:* -
> >>>> *Human position:* chr20:56904056-56908860
> >>>> <http://ucsc.gis.a-star.edu.sg/cgi-bin/hgTracks?db=hg18&position=chr20%3A56904056-56908860>
> >>>> size: 4805
> >>>> *Chain ID:* 1118118
> >>>> *Score:* 38233 *Approximate Score within browser window:* 2221
> >>>>
> >>>> But when I look into the mysql table for chr11.
> >>>> I find the following
> >>>>
> >>>> 725, 212246, 'chr11', 134452384, 18466024, 18469085, 'chr20',
> >>>> 62435964, '-', 19465389, 19472380, 80309, 77.2
> >>>>
> >>>> Notice that the chr20 start/end are different from what is shown in
> >>>> the browser. Does the browser use some formula to compute the
> >>>> start/end?
> >>>>
> >>>> Also the chain track displays boxes joined by single/double lines.
> >>>> Which table contains the info regarding the start/end positions of
> >>>> these boxes?
> >>>>
> >>>> - atif
> >>>> _______________________________________________
> >>>> Genome maillist - [email protected]
> >>>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 25 Mar 2009 12:17:18 -0400
> From: Royden Clark <[email protected]>
> Subject: [Genome] Questions about Blat -fastmap option.
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> Greetings UCSC Genome Group,
>
> Sorry if this question has already been answered but I could not find
> it in the archives or the faqs.
>
> The documentation for blat at
> http://genome.ucsc.edu/goldenPath/help/blatSpec.html
> states
>
>    -fastMap Run for fast DNA/DNA remapping - not allowing introns,
>             requiring high %ID
>
> The question is what is the %ID it requires?
> Can that % be modified?
>
> Thank you
> Royden Clark
>
> IT Specialist II
> University of Virginia
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 25 Mar 2009 12:37:25 -0700 (PDT)
> From: Galt Barber <[email protected]>
> Subject: Re: [Genome] Questions about Blat -fastmap option.
> To: Royden Clark <[email protected]>
> Cc: [email protected]
> Message-ID: <pine.gso.4.64.0903251153350.25...@sundance>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
>
> I think the idea for -fastMap is that it doesn't
> try to chain alignments together, so no introns.
>
> The "high %ID" required for said fastMap is not modifiable.
> Presumably it's very high (>90%).
>
> Of course simple %-identity threshold filtering is set with -minIdentity.
>
> fastMap does not enhance sensitivity but it might
> be good for speed in situations where gaps are
> not expected.
>
> I see that it is used for liftOver from one assembly
> to the next of the same species,
> and for mapping clones to contigs.
>
> Essentially any place where you expect really long
> identical alignments with no gaps or introns.
>
> Normally blat runs a banded dynamic programming algorithm
> to finalize and extend alignments. This is an expensive
> and slow step and does not scale well with long queries.
>
> fastMap helps blat perform better when there are these
> long identical regions by skipping those extra costly slow
> steps. Even so, applications usually chunk the query
> into 3kb pieces and run cluster jobs.
> Blat/gfServer have a default limit query size of 40kb.
> It just runs too slowly for longer queries.
>
> Several examples of fastMap usage look like this:
> blat -minScore=100 -minIdentity=98 -fastMap
>
> fastMap is for mapping a chunk of virtually identical
> dna onto a larger object like a clone or scaffold or chrom.
>
> It will tolerate a few substitutions but not much else.
> Certainly no long gaps.
>
> -Galt
>
>
> On Wed, 25 Mar 2009, Royden Clark wrote:
>
> > Greetings UCSC Genome Group,
> >
> > Sorry if this question has already been answered but I could not find
> > it in the archives or the faqs.
> >
> > The documentation for blat at
> > http://genome.ucsc.edu/goldenPath/help/blatSpec.html
> > states
> >
> >    -fastMap Run for fast DNA/DNA remapping - not allowing introns,
> >             requiring high %ID
> >
> > The question is what is the %ID it requires?
> > Can that % be modified?
> >
> > Thank you
> > Royden Clark
> >
> > IT Specialist II
> > University of Virginia
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 26 Mar 2009 11:58:47 -0700 (PDT)
> From: Dhiral Phadke <[email protected]>
> Subject: [Genome] Coloring sub-parts of a track
> To: [email protected]
> Cc: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=iso-8859-1
>
> Hi,
>
> I have a track that I would like to display using 2 different colors. For
> example:
>
> track type=bedGraph name="TEST1-BedGraph Format" description="BedGraph format" visibility=full color=255,0,0 priority=20
> chr19 59302000 59302300 -1.0
> chr19 59302300 59302600 -0.75
> chr19 59302600 59302900 -0.50
> chr19 59302900 59303200 -0.25
> chr19 59303200 59303500 0.0
> chr19 59303500 59303800 0.25
> chr19 59303800 59304100 0.50
> chr19 59304100 59304400 0.75
> chr19 59304400 59304700 1.00
>
> I would like to display positions 59302000 - 59303500 in red and positions
> 59303500 - 59304700 in green. Is this possible? How can this be done?
>
> Thanks a lot.
>
> -Dhiral.
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 26 Mar 2009 12:02:44 -0700
> From: Hiram Clawson <[email protected]>
> Subject: Re: [Genome] Coloring sub-parts of a track
> To: [email protected]
> Cc: [email protected], [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Try using color=0,255,0 altColor=255,0,0
>
> This will display positive values in green, and negative values in red.
> You only have control over the color via this positive/negative indication.
> You do not have control of individual colors for individual plot positions.
>
> --Hiram
>
> Dhiral Phadke wrote:
> > Hi,
> >
> > I have a track that I would like to display using 2 different colors. For
> > example:
> >
> > track type=bedGraph name="TEST1-BedGraph Format" description="BedGraph format" visibility=full color=255,0,0 priority=20
> > chr19 59302000 59302300 -1.0
> > chr19 59302300 59302600 -0.75
> > chr19 59302600 59302900 -0.50
> > chr19 59302900 59303200 -0.25
> > chr19 59303200 59303500 0.0
> > chr19 59303500 59303800 0.25
> > chr19 59303800 59304100 0.50
> > chr19 59304100 59304400 0.75
> > chr19 59304400 59304700 1.00
> >
> > I would like to display positions 59302000 - 59303500 in red and
> > positions 59303500 - 59304700 in green. Is this possible? How can this be
> > done?
> >
> > Thanks a lot.
> >
> > -Dhiral.
>
>
> ------------------------------
>
> Message: 7
> Date: Thu, 26 Mar 2009 16:08:53 -0500
> From: "Charu Gupta Kumar" <[email protected]>
> Subject: [Genome] gladHumESOtherData
> To: <[email protected]>
> Message-ID: <003801c9ae57$120c6600$362532...@edu>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> I was curious what this table data shows. What are qVal and hVal? Why is
> there a single tissue name associated with a probe?
>
> Thanks,
>
> Charu
>
> Charu Gupta Kumar
> University of Illinois at Urbana-Champaign
>
>
> ------------------------------
>
> Message: 8
> Date: Thu, 26 Mar 2009 15:04:23 -0700
> From: Michael Hiller <[email protected]>
> Subject: [Genome] quality score 98
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dear UCSC team,
>
> I have a question concerning the "manually set" quality score 98 that
> represents missing quality scores.
> The chimp browser for chr21 or chrY does not show quality scores, which
> is fine, since there are no qual scores.
> However, the hg18 44-way alignment contains for chimp chr21 or chrY the
> qual score 0, which comes from mafAddQRows that encodes scores 0 .. <5
> and 98 as 0.
>
> s panTro2.chr21 13793045 70 + 46489110 CTTGTGTGCCACCATCCCTGACTTTGTTGATAAGGGCATCAGGCTACATCCCTCTGGTACTCAGTGGTAA
> q panTro2.chr21 0000000000000000000000000000000000000000000000000000000000000000000000
>
> That means that any attempt to filter out bad quality from a maf will
> fail for chr21 and Y because one cannot distinguish between a real
> quality score of say 3 and missing data (98) because both end up as 0.
>
> I have the following questions/suggestions:
> 1. Is there any species where 98 represents a real quality score (I mean
> 97 < 98 < 99) or is 98 always missing data?
> 2. Would it make sense to encode score 98 in the maf as '.' like it is
> done for gaps? Then one can distinguish between bad qual and missing data.
> 3. For chimp: chrY, chrY_random and chr21 have no quality scores in the
> browser display and in the quality wib table. However, the region
> chr7:87674857-92389096 has quality score 98. And these regions in the
> hg18 44-way maf contain a 0 in the q lines. Is the region
> chr7:87674857-92389096 different from chrY or chr21? And why is it
> treated differently?
>
> I think the quality score annotation of mafs is very useful, especially
> because of many low coverage genomes.
>
> Thanks a lot for your help.
> - Michael
>
>
> ------------------------------
>
> End of Genome Digest, Vol 74, Issue 32
> **************************************
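On the SelfChain thread (Message 2): the minus-strand arithmetic Jennifer describes can be written out as a small Python sketch. This is only an illustration of her worked example, not UCSC code. The function name `minus_chain_to_browser` and its argument names are made up here, and it assumes the chain table stores a zero-based start counted on the reverse-complement strand, as her message says.

```python
def minus_chain_to_browser(chrom_size, chain_start, chain_end):
    """Convert a minus-strand chain interval to browser-style coordinates.

    chain_start and chain_end are the values stored in the chain table,
    counted on the reverse-complement strand with a zero-based start
    (assumption taken from the explanation in Message 2).
    Returns a (start, end) pair on the positive strand with a 1-based start.
    """
    # The chain end, flipped to the positive strand, becomes the smallest
    # coordinate; add 1 to turn the zero-based start into a 1-based start.
    start = chrom_size - chain_end + 1
    # The chain start, flipped to the positive strand, becomes the end.
    end = chrom_size - chain_start
    return start, end


# Values from the chr11.chainSelf row quoted in Message 2 (chain ID 80309).
start, end = minus_chain_to_browser(62435964, 19465389, 19472380)
print(f"chr20:{start}-{end} size: {end - start + 1}")
```

With the chr20 length and the chain row from the thread, this gives chr20:42963585-42970575 with size 6991, which is the position and size the browser page reports.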

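On the track coloring thread (Messages 5 and 6): a rough Python sketch of Hiram's suggestion, building the custom track text with both `color` and `altColor` set. The helper name `bedgraph_track` is hypothetical, and the only track settings used are the ones that already appear in the thread. In Dhiral's data the negative values happen to fall in the region wanted in red, so coloring by sign would give roughly the requested split.

```python
def bedgraph_track(name, rows, color="0,255,0", alt_color="255,0,0"):
    """Return bedGraph custom-track text with sign-based coloring.

    Per Message 6, color is used for positive values and altColor for
    negative values; individual positions cannot be colored separately.
    """
    header = (
        f'track type=bedGraph name="{name}" visibility=full '
        f"color={color} altColor={alt_color}"
    )
    lines = [header]
    for chrom, start, end, value in rows:
        lines.append(f"{chrom} {start} {end} {value}")
    return "\n".join(lines)


# A few rows taken from the example in Message 5.
rows = [
    ("chr19", 59302000, 59302300, -1.0),
    ("chr19", 59302900, 59303200, -0.25),
    ("chr19", 59303500, 59303800, 0.25),
    ("chr19", 59304400, 59304700, 1.0),
]
print(bedgraph_track("TEST1-BedGraph Format", rows))
```

The printed text can be pasted into the custom track form as-is; changing the two color arguments swaps which sign gets which color.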