Hello Ricardo, After considering your question with some internal discussion, the most accurate method may be the simplest. Compare probes to transcripts, generate alignment, analyze results.
Alternatively, there are three programs in our code tree that do comparisons between tracks mapped to genomic. None do this exact task in one step. Some experimentation on your part would be required. http://genome.ucsc.edu/FAQ/FAQdownloads#download27 Thank you, Jennifer Jackson UCSC Genome Bioinformatics Group ------------------------------- hgMapToGene - Map a track to a genePred track. usage: hgMapToGene database track geneTrack mapTable where: database - is a browser database (ex. hg16) track - the track to map to the genePred track This must be bed12 psl or genePred itself geneTrack - a genePred format track newMapTable - a new table to create with two columns <geneTrackId><trackId> pslMap - map PSLs alignments to new targets using alignments of the old target to the new target. Given inPsl and mapPsl, where the target of inPsl is the query of mapPsl, create a new PSL with the query of inPsl aligned to the target of mapPsl. If inPsl is a protein to nucleotide alignment and mapPsl is a nucleotide to nucleotide alignment, the resulting alignment is nucleotide to nucleotide alignment of a hypothetical mRNA that would code for the protein. This is useful as it gives base alignments of spliced codons. A chain file may be used instead mapPsl usage: pslMap [options] inPsl mapFile outPsl Options: -chainMapFile - mapFile is a chain file instead of a psl file -swapMap - swap query and target sides of map file. -swapIn - swap query and target sides of inPsl file. -suffix=str - append str to the query ids in the output alignment. Useful with protein alignments, where the result is not actually and alignment of the protein. -keepTranslated - if either psl is translated, the output psl will be translated (both strands explicted). Normally an untranslated psl will always be created -mapInfo=file - output a file with information about each mapping. The file has the following columns: o srcQName, srcQStart, srcQEnd, srcQSize - qName, etc of psl being mapped (source alignment) o srcTName, srcTStart, srcTEnd - tName, etc of psl being mapped o srcStrand - strand of psl being mapped o srcAligned - number of aligned based in psl being mapped o mappingQName, mappingQStart, mappingQEnd - qName, etc of mapping psl used to map alignment o mappingTName, mappingTStart, mappingTEnd - tName, etc of mapping psl o mappingStrand - strand of mapping psl o mappingId - chain id, or psl file row o mappedQName mappedQStart, mappedQEnd - qName, etc of mapped psl o mappedTName, mappedTStart, mappedTEnd - tName, etc of mapped psl o mappedStrand - strand of mapped psl o mappedAligned - number of aligned bases that were mapped o qStartTrunc - aligned bases at qStart not mapped due to mapping psl/chain not covering the entire soruce psl. This is from the start of the query in the positive direction. o qEndTrunc - similary for qEnd -simplifyMappingIds - simplifying mapping ids (inPsl target name and mapFile query name) before matching them. This first drops everything after the last `-', and then drops everything after the last remaining `.'. featureBits - Correlate tables via bitmap projections. usage: featureBits database table(s) This will return the number of bits in all the tables anded together Pipe warning: output goes to stderr. Options: -bed=output.bed Put intersection into bed format. Can use stdout. -fa=output.fa Put sequence in intersection into .fa file -faMerge For fa output merge overlapping features. -minSize=N Minimum size to output (default 1) -chrom=chrN Restrict to one chromosome -chromSize=sizefile Read chrom sizes from file instead of database. -or Or tables together instead of anding them -not Output negation of resulting bit set. -countGaps Count gaps in denominator -noRandom Don't include _random (or Un) chromosomes -noHap Don't include _hap chromosomes -dots=N Output dot every N chroms (scaffolds) processed -minFeatureSize=n Don't include bits of the track that are smaller than minFeatureSize, useful for differentiating between alignment gaps and introns. -bin=output.bin Put bin counts in output file -binSize=N Bin size for generating counts in bin file (default 500000) -binOverlap=N Bin overlap for generating counts in bin file (default 250000) -bedRegionIn=input.bed Read in a bed file for bin counts in specific regions and write to bedRegionsOut -bedRegionOut=output.bed Write a bed file of bin counts in specific regions from bedRegionIn -enrichment Calculates coverage and enrichment assuming first table is reference gene track and second track something else '-where=some sql pattern' Restrict to features matching some sql pattern You can include a '!' before a table name to negate it. Some table names can be followed by modifiers such as: :exon:N Break into exons and add N to each end of each exon :cds Break into coding exons :intron:N Break into introns, remove N from each end :utr5, :utr3 Break into 5' or 3' UTRs :upstream:N Consider the region of N bases before region :end:N Consider the region of N bases after region :score:N Consider records with score >= N :upstreamAll:N Like upstream, but doesn't filter out genes that have txStart==cdsStart or txEnd==cdsEnd :endAll:N Like end, but doesn't filter out genes that have txStart==cdsStart or txEnd==cdsEnd The tables can be bed, psl, or chain files, or a directory full of such files as well as actual database tables. To count the bits used in dir/chrN_something*.bed you'd do: featureBits database dir/_something.bed Ricardo Verdugo wrote: > Hello: > > I have been looking for a tool that can compare two lists of multi- > block genomic features (or genome alignments in PSL, e.g.) and output > the list of features that overlap. I have found the intersectBed (BED > format) and the fjoin (GFF format) programs, which do this but in a > one block basis, i.e. they intersect one block or exon at a time, > regardless of wether other blocks also match or not. > > I am interested in annotating microarray probes with genes and > transcripts. If a probe align to the genome say, in two blocks > separated by a gap, this probe should be annotated to a transcript > only if both blocks match (overlap) two consecutive exons in a > transcript. > > I would like to know if you have any program that can do this before I > decide if I should write my own. > > Thank you in advance, > > Ricardo Verdugo > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Ricardo A. Verdugo S., D.V.M. Ph.D. > Postdoctoral Fellow > Gary Churchill Group > The Jackson Laboratory > 600 Main Street, > Bar Harbor, ME 04609 > [email protected] > +1-207-288-6715 > +1-207-288-6847 Fax > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > _______________________________________________ > Genome maillist - [email protected] > https://lists.soe.ucsc.edu/mailman/listinfo/genome _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
