On Tue, May 24, 2011 at 9:30 AM, Galt Barber <[email protected]> wrote:
>
> Hi, Brent!
>
> The psl record found in kgTargetAli has already been changed for genepred
> use.
>
> If you just blat uc002imy.2 to hg19 you can see everything that's happening.
>
> 1. 16 bases of the poly-A tail are removed from the picture.
> 2. A small exon of 5 bases is merged into its neighbor exon.
>    (This wipes out 1 tiny q-gap.)
> 3. 2 tiny q-gaps are ignored since they cannot be represented in genepred.
>
> The original blat psl shows them, though.
>
>>uc002imy.2 (COPZ2) length=923
> ggcggcgagcggaatgcagcggcccgaggcctggccacgtccgcacccgggggagggggc
> cgcggcggcccaggccgggggcccggcgccgcctgctcgagccggggagccctcggggct
> gcggttgcaggaaccttccctctacaccatcaaggctgttttcatcctagataatgacgg
> gcgccggctgctggccaagtattatgatgacacattcccctccatgaaggagcagatggt
> tttcgagaaaaatgtcttcaacaagaccagccggactgagagtgagattgcattttttgg
> gggtatgaccatcgtctacaagaacagcattgacctcttcctatacgtggtgggctcatc
> ctacgagaatgagctgatgctcatgtctgttctcacctgcctgtttgagtctctgaacca
> catgttaaggaagaacgtggagaagcgctggttgctggagaacatggacggagccttctt
> ggtgctggacgagattgtggatggcggtgtgattctggagagtgacccccagcaagtgat
> ccagaaggtgaattttagggcagatgatggcggcttgactgaacagagtgtggcccaggt
> tcttcagtctgccaaggaacaaattaaatggtcgttattgaaatgaaggctgtggattca
> aggctccctgccccccagatcatttccccaatcctggcaaaagcccaaagatcccagggt
> caggagagacccctctgtatccccaggtccctcccagaactgactcctaaggtctccagc
> cagggcttctgagatgcaaaggtttggcctcaggagagtcaccttttctcacggccctgg
> ccttaactcatatcttaggcattcctggccccagggccctaataaacctgcttttgtctt
> ctgccaaaaaaaaaaaaaaaaaa
>
> The quick answer is that there were tiny 1 and 2 bp gaps on
> the query side that caused the alignment to be broken on the
> target side.
>
> People are used to seeing gaps of size zero on the query side
> all the time. They are not used to seeing them on the target side.
> This is just the flip side of having an insert on the opposite side.
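A minimal Python sketch of the point above, assuming PSL-style blocks given as hypothetical (qStart, tStart, size) triples. The block values are made up for illustration and are NOT the real uc002imy.2 alignment, and `classify_gaps` is a hypothetical helper. Only the target-side gap survives into genePred, so a query insert with a 0-base target gap shows up there as exonEnd == next exonStart.

```python
# Hypothetical PSL-style blocks, NOT the real uc002imy.2 alignment.
# Each block is (qStart, tStart, size) in 0-based coordinates.
BLOCKS = [(0, 1000, 50), (50, 1500, 40), (92, 1540, 30), (123, 1572, 20)]


def classify_gaps(blocks):
    """Return (qGap, tGap, label) for each pair of consecutive blocks."""
    result = []
    for (q1, t1, size), (q2, t2, _) in zip(blocks, blocks[1:]):
        q_gap = q2 - (q1 + size)  # query bases skipped between the blocks
        t_gap = t2 - (t1 + size)  # target bases skipped between the blocks
        if q_gap == 0 and t_gap == 0:
            label = "adjacent blocks"
        elif q_gap == 0:
            label = "intron-like: gap on target only"
        elif t_gap == 0:
            label = "query insert: zero-length gap on target"
        else:
            label = "gap on both sides"
        # genePred keeps only target coordinates, so t_gap becomes the
        # "intron" length there, even when it is 0 or 2 bases.
        result.append((q_gap, t_gap, label))
    return result


if __name__ == "__main__":
    for q_gap, t_gap, label in classify_gaps(BLOCKS):
        print(q_gap, t_gap, label)
```

The last two pairs mimic the situation described here: a 2-base query gap with a 0-base target gap, then a 1-base query gap with a 2-base target gap.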
>
> I suppose a human looking at these would merge them together.
> The second-to-last intron is size 0, and the last intron is only
> size 2, which is not biologically realistic.
>
> So really, there are 9 exons here, not 11 (genepred) or 12 (blat psl).
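To make the 11-to-9 arithmetic concrete, here is a rough sketch that merges neighboring genePred exons whenever the target gap between them is shorter than a cutoff. `MIN_INTRON = 5` and `merge_tiny_introns` are hypothetical; nothing in this thread specifies a threshold. It works on target coordinates only, so it cannot recover the query bases hidden by the tiny q-gaps; for that you would need the PSL.

```python
# exonStarts / exonEnds for uc002imy.2 (genePred, 0-based, half-open).
EXON_STARTS = [46103534, 46105837, 46106490, 46109521, 46110051, 46110576,
               46111228, 46114216, 46115032, 46115092, 46115124]
EXON_ENDS = [46103841, 46105876, 46106542, 46109599, 46110107, 46110668,
             46111310, 46114291, 46115092, 46115122, 46115152]

MIN_INTRON = 5  # hypothetical cutoff, not taken from any UCSC pipeline


def merge_tiny_introns(starts, ends, min_intron=MIN_INTRON):
    """Merge consecutive exons separated by fewer than min_intron bases."""
    merged = [[starts[0], ends[0]]]
    for start, end in zip(starts[1:], ends[1:]):
        if start - merged[-1][1] < min_intron:
            merged[-1][1] = end  # swallow the tiny "intron"
        else:
            merged.append([start, end])
    return [tuple(exon) for exon in merged]


if __name__ == "__main__":
    exons = merge_tiny_introns(EXON_STARTS, EXON_ENDS)
    print(len(exons), exons[-1])
```

With this cutoff the 0-base and 2-base gaps collapse, leaving 9 exons whose last one spans 46115032-46115152.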
>
> -Galt


Thanks Galt, this makes sense. I will tinker with BLAT and try to
understand fully.
-B

>
> On 5/24/2011 8:18 AM, Brent Pedersen wrote:
>>
>> On Sun, May 22, 2011 at 8:02 AM, Brent Pedersen <[email protected]> wrote:
>>>
>>> Hi, I have pulled some data from MySQL like this:
>>>
>>> mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D $ORG -P 3306 \
>>>   -e "select chrom,txStart,txEnd,cdsStart,cdsEnd,K.name,X.geneSymbol,
>>>         proteinID,strand,exonStarts,exonEnds
>>>       from knownGene as K, kgXref as X where X.kgId=K.name"
>>>
>>> I have a couple of questions about the data. First, a row like this:
>>>
>>> chrom       chr17
>>> txStart     46103534
>>> txEnd       46115152
>>> cdsStart    46103793
>>> cdsEnd      46115139
>>> name        uc002imy.2
>>> geneSymbol  COPZ2
>>> proteinID   Q9P299
>>> strand      -
>>> exonStarts  46103534,46105837,46106490,46109521,46110051,46110576,46111228,46114216,46115032,46115092,46115124,
>>> exonEnds    46103841,46105876,46106542,46109599,46110107,46110668,46111310,46114291,46115092,46115122,46115152,
>>>
>>> Note that the 2nd-to-last exonStart is the same as the 3rd-from-last
>>> exonEnd: 46115092. Does this mean a 0-length intron? And what does
>>> that mean within a transcript?
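One way to see the 0-length intron directly: in genePred, exon i spans [exonStarts[i], exonEnds[i]), so the gap before exon i+1 is exonStarts[i+1] - exonEnds[i]. A small sketch, where `parse_coords` and `intron_lengths` are hypothetical helper names:

```python
# exonStarts / exonEnds for uc002imy.2, copied from the row above.
# UCSC lists are comma-terminated, hence the trailing commas.
EXON_STARTS = (
    "46103534,46105837,46106490,46109521,46110051,46110576,"
    "46111228,46114216,46115032,46115092,46115124,"
)
EXON_ENDS = (
    "46103841,46105876,46106542,46109599,46110107,46110668,"
    "46111310,46114291,46115092,46115122,46115152,"
)


def parse_coords(field):
    """Turn a comma-terminated UCSC list such as '1,2,3,' into ints."""
    return [int(x) for x in field.split(",") if x]


def intron_lengths(starts, ends):
    """Gap between exon i and exon i+1, using half-open coordinates."""
    return [next_start - end for end, next_start in zip(ends, starts[1:])]


if __name__ == "__main__":
    print(intron_lengths(parse_coords(EXON_STARTS), parse_coords(EXON_ENDS)))
    # prints [1996, 614, 2979, 452, 469, 560, 2906, 741, 0, 2]
```

The 0 is the case asked about; the 2 right after it is the tiny last intron discussed in Galt's reply.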
>>
>> Can anyone comment on this? Is there some way I can clarify the question?
>> I find 185 cases like this in hg19.
>> Thanks,
>> -Brent
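A rough sketch of how such rows could be counted from the query output, assuming it was saved as tab-separated text (mysql batch output) whose last two columns are exonStarts and exonEnds, with an optional header line starting with "chrom". The helper names are hypothetical, and this counts output rows rather than distinct transcripts.

```python
import sys


def has_zero_length_intron(starts_field, ends_field):
    """True if any exon ends exactly where the next one starts."""
    starts = [int(x) for x in starts_field.split(",") if x]
    ends = [int(x) for x in ends_field.split(",") if x]
    return any(end == next_start for end, next_start in zip(ends, starts[1:]))


def count_rows(lines):
    """Count rows whose exonStarts/exonEnds (last two columns) abut."""
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("chrom\t"):  # skip blanks and header
            continue
        fields = line.split("\t")
        if has_zero_length_intron(fields[-2], fields[-1]):
            count += 1
    return count


if __name__ == "__main__":
    # e.g. mysql ... > rows.tsv; python this_script.py < rows.tsv
    print(count_rows(sys.stdin))
```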
>>
>>
>>>
>>> Second question: for this same row, is it correct to infer that the
>>> first exon in (0-based) BED format would be:
>>>  start=46103534, end=46103841
>>> and the first intron would be:
>>>  start=46103841, end=46105837
>>>
>>> But then the problem is that start == end for the 0-length intron.
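A sketch of that inference, assuming (as the FAQ link below suggests) that genePred uses the same 0-based, half-open coordinates as BED. Intron i then runs from exonEnds[i] to exonStarts[i+1], so the first intron comes out as 46103841-46105837. The zero-length one is set aside instead of being written as an empty interval, since some tools may reject start == end. `intron_bed_lines` and the `_intronN` naming are hypothetical choices.

```python
# Exon coordinates for uc002imy.2 from the row above (0-based, half-open).
EXON_STARTS = [46103534, 46105837, 46106490, 46109521, 46110051, 46110576,
               46111228, 46114216, 46115032, 46115092, 46115124]
EXON_ENDS = [46103841, 46105876, 46106542, 46109599, 46110107, 46110668,
             46111310, 46114291, 46115092, 46115122, 46115152]


def intron_bed_lines(chrom, starts, ends, name, strand):
    """Return (BED6 lines for non-empty introns, indexes of empty introns).

    Intron i runs from exonEnds[i] to exonStarts[i+1]. Introns are numbered
    in genomic order, not transcript order, even on the minus strand.
    """
    lines, empty = [], []
    for i, (start, end) in enumerate(zip(ends, starts[1:])):
        if start == end:
            empty.append(i)  # zero-length: nothing to write
            continue
        lines.append(f"{chrom}\t{start}\t{end}\t{name}_intron{i}\t0\t{strand}")
    return lines, empty


if __name__ == "__main__":
    bed, empty = intron_bed_lines("chr17", EXON_STARTS, EXON_ENDS,
                                  "uc002imy.2", "-")
    print("\n".join(bed))
    print("zero-length introns at index:", empty)
```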
>>>
>>> I have seen http://genome.ucsc.edu/FAQ/FAQtracks#tracks1, so the
>>> internal format matches the BED format, correct?
>>>
>>> thanks,
>>> -Brent
>>>
>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
>
