1. See attached graphic. If there is no human ortholog for a given mouse or
yeast gene, how can there be links to a human gene in Gene Details and Gene
Sorter? Those links were correct, so for some reason the page failed to find
the ortholog and its browser location, even though both are given at the Gene
Details and Gene Sorter links.
I think the problem here is New UCSC Genes vs. Old UCSC Genes, with certain old
uc numbers no longer corresponding to anything. That may mean the underlying
tables for "Orthologous Genes in Other Species" were never updated to New UCSC
Genes. That is, Gene Sorter and this table aren't on the 'same page'.
Example:
Human Gene RPL11 (uc001bhk.3)
S. cerevisiae Gene RPL11A (YPR102C)
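If the stale-ID hypothesis above is right, it could be checked by testing which uc IDs in the ortholog table still resolve against the current UCSC Genes set. Below is a minimal Python sketch, assuming both ID lists have already been extracted (e.g. from knownGene and from the ortholog table); the function name `classify_ids` is hypothetical, and apart from uc001bhk.3 the IDs are invented for illustration, not taken from any real table.

```python
def classify_ids(ortholog_ids, current_ids):
    """Split ortholog-table IDs into current, version-changed, and missing."""
    # Map accession without version suffix (e.g. "uc001bhk") to the current full ID.
    # If several current versions share a base, the last one seen wins.
    current_by_base = {cid.split(".")[0]: cid for cid in current_ids}
    result = {"current": [], "version_changed": {}, "missing": []}
    for oid in ortholog_ids:
        if oid in current_ids:
            result["current"].append(oid)
            continue
        base = oid.split(".")[0]
        if base in current_by_base:
            # Same accession, different version: the row points at an older build.
            result["version_changed"][oid] = current_by_base[base]
        else:
            # The accession no longer exists at all in the current gene set.
            result["missing"].append(oid)
    return result


current = {"uc001bhk.3"}
# Hypothetical ortholog-table contents (invented IDs except uc001bhk.3).
ortholog = ["uc001bhk.3", "uc001bhk.2", "uc009zzz.1"]
report = classify_ids(ortholog, current)
print(report)
```

Rows landing in "version_changed" or "missing" would be the ones where Gene Sorter and the ortholog table disagree.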
2. I am seeing hundreds of cases where a single human gene has multiple yeast
'orthologs'. This contradicts our claim that we are using best reciprocal
blastp, which would put orthologs in a strict 1-1 relationship. The real
problem is that yeast has many duplicated proteins that are nearly identical to
each other. Humans also have many duplicated proteins that are nearly identical
to each other and homologous to the yeast set, which makes matching them up
very problematic.
yeast human
YGR214W uc003cjr.2
YLR048W uc003cjr.2
YHR216W uc003vmx.2
YLR432W uc003vmx.2
YML056C uc003vmx.2
YBL068W uc004cvb.2
YER099C uc004cvb.2
YHL011C uc004cvb.2
sacCer3.sgdGene
sacCer3.hgBlastTab fields
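To make the 1-1 argument concrete, here is a minimal Python sketch (function names hypothetical). `many_to_one` groups the yeast/human pairs from the table above and flags human IDs with more than one yeast partner. `reciprocal_best_hits` is a textbook best-reciprocal-hit filter, not necessarily what the UCSC pipeline runs: because each query keeps only its single top hit in both directions, no target can appear twice in its output. The bit scores in the example are invented to mimic two near-identical yeast paralogs.

```python
from collections import defaultdict

# Yeast/human pairs copied from the table above.
PAIRS = [
    ("YGR214W", "uc003cjr.2"),
    ("YLR048W", "uc003cjr.2"),
    ("YHR216W", "uc003vmx.2"),
    ("YLR432W", "uc003vmx.2"),
    ("YML056C", "uc003vmx.2"),
    ("YBL068W", "uc004cvb.2"),
    ("YER099C", "uc004cvb.2"),
    ("YHL011C", "uc004cvb.2"),
]


def many_to_one(pairs):
    """Group yeast genes by human ID; keep human IDs with more than one partner."""
    by_human = defaultdict(list)
    for yeast, human in pairs:
        by_human[human].append(yeast)
    return {human: yeasts for human, yeasts in by_human.items() if len(yeasts) > 1}


def reciprocal_best_hits(a_to_b, b_to_a):
    """Keep (a, b) only when b is a's top hit AND a is b's top hit.

    Each argument maps a query ID to {target ID: score}, higher = better.
    Exact score ties are broken by dict order, which is arbitrary.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in a_to_b.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in b_to_a.items() if hits}
    return [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]


print(many_to_one(PAIRS))

# Hypothetical bit scores for two near-identical yeast paralogs (NOT real data).
yeast_to_human = {"YGR214W": {"uc003cjr.2": 500.0}, "YLR048W": {"uc003cjr.2": 498.0}}
human_to_yeast = {"uc003cjr.2": {"YGR214W": 500.0, "YLR048W": 498.0}}
print(reciprocal_best_hits(yeast_to_human, human_to_yeast))
```

Under a strict filter like this, the slightly lower-scoring paralog YLR048W is dropped rather than listed as a second ortholog, so every human ID in the table having several yeast partners suggests the table was not built with strict best reciprocal hits alone.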
3. A prose problem deep in the tables:
http://genome-test.cse.ucsc.edu/cgi-bin/hgGene?hgsid=3502800&hgg_do_otherProteinAli=on&hgg_otherPepTable=mm9.knownGenePep&hgg_otherId=uc009mke.1
"The single best exon chains extending over more than 60% of the query protein
were included. Exon chains that extended over 60% of the query and matched at
least 60% of the protein's amino acids were also included."
That should read:
"The single best exon chains extending over more than 60% of the query protein
were included. Other exon chains that extended over 60% of the query and
matched at least 60% of the protein's amino acids were also included."
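One reading of the corrected sentence, sketched in Python: the single best chain spanning more than 60% of the query is always kept, and "Other" chains are added only if they also span more than 60% and match at least 60% of the amino acids, so the best chain is not counted twice. The `ExonChain` fields and the use of a numeric score to pick "best" are assumptions for illustration, not the actual table-building code.

```python
from dataclasses import dataclass


@dataclass
class ExonChain:
    name: str
    score: float        # ranking score; "best" = highest score (assumption)
    query_span: float   # fraction of the query protein the chain extends over
    aa_identity: float  # fraction of the protein's amino acids matched


def select_chains(chains, min_span=0.60, min_identity=0.60):
    """Best chain spanning > min_span of the query, plus OTHER chains that also
    span > min_span and match at least min_identity of the amino acids."""
    spanning = [c for c in chains if c.query_span > min_span]
    if not spanning:
        return []
    best = max(spanning, key=lambda c: c.score)
    others = [c for c in spanning if c is not best and c.aa_identity >= min_identity]
    return [best] + others


# Invented example chains.
chains = [
    ExonChain("a", score=100.0, query_span=0.90, aa_identity=0.40),
    ExonChain("b", score=80.0, query_span=0.70, aa_identity=0.65),
    ExonChain("c", score=70.0, query_span=0.50, aa_identity=0.90),
]
print([c.name for c in select_chains(chains)])  # ['a', 'b']
```

Chain "a" is kept as the best spanning chain despite low identity; "c" is excluded because it spans only half the query.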
4. More unclear prose:
Schema for Human Proteins - Human Proteins Mapped by Chained tBLASTn
"ID (including gaps) 97.9%, coverage (of both) 100.0%,..."
I don't know what "coverage (of both)" could possibly mean. We have qStart,
qEnd, tStart, tEnd, which make sense: the match starts at a given position in
the query and ends some distance later, and that match corresponds to a
position range in the target. I don't see how or why those two ranges should be
condensed into a single coverage figure.
query    qStart  qEnd  tStart  tEnd
YAL012W  8       392   17      397
YAL061W  2       289   10      271
YAL060W  9       237   17      220
YAL058W  49      443   82      466
YAL054C  66      707   37      692
YAL048C  2       623   1       591
YAL046C  19      107   13      102
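A short Python sketch of why a single "coverage (of both)" number is ambiguous: in a gapped alignment the query range and the target range usually have different lengths, so any coverage percentage has to say which sequence it refers to (and would also need the full protein lengths, which these rows do not give). The column order (query, qStart, qEnd, tStart, tEnd) is assumed from the fields named above, and the coordinates are assumed to be 1-based and inclusive as in BLAST tabular output; `aligned_spans` is a hypothetical helper.

```python
# Rows copied from the table above; column order assumed to be
# query, qStart, qEnd, tStart, tEnd.
ROWS = """\
YAL012W 8 392 17 397
YAL061W 2 289 10 271
YAL060W 9 237 17 220
YAL058W 49 443 82 466
YAL054C 66 707 37 692
YAL048C 2 623 1 591
YAL046C 19 107 13 102
"""


def aligned_spans(rows):
    """Return {query: (query_span, target_span)} in residues."""
    spans = {}
    for line in rows.strip().splitlines():
        name, q_start, q_end, t_start, t_end = line.split()
        # Assumes 1-based, inclusive coordinates (as in BLAST tabular output);
        # for 0-based half-open coordinates, drop the "+ 1".
        spans[name] = (int(q_end) - int(q_start) + 1, int(t_end) - int(t_start) + 1)
    return spans


for name, (q_span, t_span) in aligned_spans(ROWS).items():
    print(f"{name}\tquery {q_span}\ttarget {t_span}\tdiff {q_span - t_span}")
```

For YAL061W, for example, the query range is 288 residues but the target range only 262, so "coverage" of the query and of the target are two different numbers.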
_______________________________________________
Genome maillist
https://lists.soe.ucsc.edu/mailman/listinfo/genome