On Sun, May 22, 2011 at 8:02 AM, Brent Pedersen <[email protected]> wrote:
> hi, I have grabbed some data from mysql like this:
>
> mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D $ORG -P 3306 -e "select
> chrom,txStart,txEnd,cdsStart,cdsEnd,K.name,X.geneSymbol,proteinID,strand,exonStarts,exonEnds
> from knownGene as K,kgXref as X where X.kgId=K.name
>
> I have a couple questions about the data. First, a row like this:
>
> chrom txStart txEnd cdsStart cdsEnd name geneSymbol proteinID strand exonStarts exonEnds
> chr17 46103534 46115152 46103793 46115139 uc002imy.2 COPZ2 Q9P299 -
> 46103534,46105837,46106490,46109521,46110051,46110576,46111228,46114216,46115032,46115092,46115124,
> 46103841,46105876,46106542,46109599,46110107,46110668,46111310,46114291,46115092,46115122,46115152,
>
> note that the 2nd-to-last exonStart is the same as the 3rd-from-last
> exonEnd: 46115092. Does this mean a 0 length intron? And what does
> that mean within a transcript?
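
To make the case concrete, here is a minimal Python sketch of how the exon and intron intervals for this row can be derived and how an exonEnd that equals the next exonStart shows up. It assumes the table coordinates are 0-based and half-open, and the helper names (parse_coords, exons_and_introns) are made up for illustration; this is not how any UCSC tool does it.

```python
def parse_coords(field):
    """Parse a comma-separated coordinate list; the trailing comma is ignored."""
    return [int(x) for x in field.split(",") if x]


def exons_and_introns(exon_starts, exon_ends):
    """Return (exons, introns) as lists of (start, end) tuples.

    Assumes 0-based, half-open coordinates, so intron i runs from the
    end of exon i to the start of exon i+1.
    """
    starts = parse_coords(exon_starts)
    ends = parse_coords(exon_ends)
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    exons = list(zip(starts, ends))
    introns = list(zip(ends[:-1], starts[1:]))
    return exons, introns


# exonStarts / exonEnds copied from the uc002imy.2 (COPZ2) row above
EXON_STARTS = ("46103534,46105837,46106490,46109521,46110051,46110576,"
               "46111228,46114216,46115032,46115092,46115124,")
EXON_ENDS = ("46103841,46105876,46106542,46109599,46110107,46110668,"
             "46111310,46114291,46115092,46115122,46115152,")

exons, introns = exons_and_introns(EXON_STARTS, EXON_ENDS)

# Introns where start == end, i.e. two exons that directly abut
zero_length = [(s, e) for s, e in introns if s == e]

print(exons[0])     # (46103534, 46103841)
print(introns[0])   # (46103841, 46105837)
print(zero_length)  # [(46115092, 46115092)]
```

Running the same check over every row of the query output is one way to count how often adjacent exons abut like this.
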
Can anyone comment on this? Is there some way I can clarify the question?
I find cases like this 185 times in hg19.

Thanks,
-Brent

>
> Second question: for this same row; is it correct to infer that the
> first exon in (0-based) bed format would be:
> start=46103534, end=46103841
> and the first intron would be:
> start=46103841 end=46105837
>
> but then the problem is that start == end for the 0-length intron.
>
> I have seen this: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1 so the
> internal format matches the BED format, correct?
>
> thanks,
> -Brent
> _______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
