Included below is genePredToBed, a simple awk script that converts
genePred format to 12-column BED format.

--Hiram

Rathi Thiagarajan wrote:
> Hi Hiram,
> 
> Thank you very much for gene track. It's exactly what I wanted!
> 
> I am currently trying to get this table into BED format, but I
> noticed that the last two columns actually contain the exonStarts and
> exonEnds rather than blockSizes and blockStarts (which would be
> derived from exonStarts). Is the blockSizes information readily
> available somewhere I can access it?
> 
> Thanks again for all your help.
> 
> Cheers,
> Rathi
> 
> On Sun, 04 Apr 2010 06:03:03 +1000, Hiram Clawson <[email protected]> 
> wrote:
> 
>> Good Afternoon Rathi:
>>
>> You can get a 'single coverage' gene track out of the mm9 refGene
>> table with the following mysql and kent source tree commands:
>>
>> $ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A mm9 -Ne \
>>      "select name,chrom,strand,txStart,txEnd,cdsStart,cdsEnd,exonCount,exonStarts,exonEnds
>>       from refGene" > mm9.refGene.gp
>>
>> $ genePredSingleCover mm9.refGene.gp stdout | sort \
>>      > mm9.refGene.singleCover.gp
>>
>> See also:
>> http://genome.ucsc.edu/admin/cvs.html
>> http://genome.ucsc.edu/admin/jk-install.html
>>
>> There are also two scripts and a configuration file that can fetch
>> and build the source tree:
>>
>> http://genome-test.cse.ucsc.edu/~kent/src/unzipped/product/scripts/kentSrcUpdate.sh
>> http://genome-test.cse.ucsc.edu/~kent/src/unzipped/product/scripts/beta.cvsup.pl
>> http://genome-test.cse.ucsc.edu/~kent/src/unzipped/product/scripts/browserEnvironment.txt
>>
>> --Hiram

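#!/usr/bin/awk -f

#
# Convert genePred file to a bed file (on stdout)
#
# Input genePred columns (tab-separated):
#   name chrom strand txStart txEnd cdsStart cdsEnd
#   exonCount exonStarts exonEnds
# Derived BED block columns:
#   blockSizes  = exonEnds - exonStarts
#   blockStarts = exonStarts - txStart (relative to chromStart)
#
BEGIN {
     FS="\t";
     OFS="\t";
}
{
     name=$1
     chrom=$2
     strand=$3
     start=$4
     end=$5
     cdsStart=$6
     cdsEnd=$7
     blkCnt=$8

     # exonStarts/exonEnds are comma-separated lists with a trailing comma
     delete starts
     split($9, starts, ",");
     delete ends
     split($10, ends, ",");
     blkStarts=""
     blkSizes=""
     for (i = 1; i <= blkCnt; i++) {
         blkSizes = blkSizes (ends[i]-starts[i]) ",";
         blkStarts = blkStarts (starts[i]-start) ",";
     }

     # BED12: chrom chromStart chromEnd name score strand
     #        thickStart thickEnd itemRgb blockCount blockSizes blockStarts
     # (score fixed at 1000, itemRgb fixed at 0)
     print chrom, start, end, name, 1000, strand, cdsStart, cdsEnd, 0, blkCnt,
           blkSizes, blkStarts
}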
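To see the block arithmetic on a concrete case, here is a small self-contained shell sketch that applies the same computation as genePredToBed to one made-up genePred line. The transcript tx1, its coordinates, and the wrapper function name gp2bed are hypothetical, chosen only for illustration; this is not part of the kent source tree.

```shell
#!/bin/sh
# Sketch: the genePred -> BED12 block arithmetic used by genePredToBed,
# wrapped in a hypothetical helper function gp2bed (stdin -> stdout).
gp2bed() {
    awk 'BEGIN { FS = "\t"; OFS = "\t" }
    {
        n = $8 + 0                  # exonCount
        split($9, starts, ",")      # exonStarts (trailing comma gives an empty last element)
        split($10, ends, ",")       # exonEnds
        sizes = ""; offsets = ""
        for (i = 1; i <= n; i++) {
            sizes = sizes (ends[i] - starts[i]) ","     # blockSizes
            offsets = offsets (starts[i] - $4) ","      # blockStarts, relative to txStart
        }
        # chrom txStart txEnd name score strand cdsStart cdsEnd itemRgb blockCount blockSizes blockStarts
        print $2, $4, $5, $1, 1000, $3, $6, $7, 0, n, sizes, offsets
    }'
}

# Hypothetical three-exon transcript: exons 100-200, 250-300, 400-500.
printf 'tx1\tchr1\t+\t100\t500\t150\t450\t3\t100,250,400,\t200,300,500,\n' | gp2bed
# prints (tab-separated):
# chr1 100 500 tx1 1000 + 150 450 0 3 100,50,100, 0,150,300,
```

Note that the last blockStart plus the last blockSize equals txEnd - txStart (300 + 100 = 400 here), which BED12 requires. On real data the script itself would be run on the single-cover genePred file, for example: awk -f genePredToBed mm9.refGene.singleCover.gp > mm9.refGene.singleCover.bed (the output file name is arbitrary).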
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome