Hello,

I'm new to this mailing list, so apologies if this is the wrong place to ask the following question. Feel free to redirect me if needed!

I work in the Bioinformatics service of our Unit, and someone asked me whether they could get a list of the most conserved elements in vertebrates. I thought, "Easy, I just have to download the phastConsElements46way table and take the highest-scoring ones." I decided to check a few of them manually and was horrified to see that all (or most) of them seem to be artifacts caused by human genomic DNA contaminating the assemblies of other species.

One example is the longest element: chr5:69686054-6970347 in GRCh37, lod=14726, score=995. Looking at the Multiz alignment tracks, it appears to be conserved only in Xenopus and not in other vertebrates. And when I realigned it to the corresponding Xenopus genomic sequence (scaffold_7921: 87-17248), it was virtually identical (>97% over 17 kb), undoubtedly a contamination!

Moreover, I looked at several other elements further down the list, and almost all of the top ones (the longest ones) are similar: not conserved in any vertebrate except Xenopus or zebrafish. These pieces of DNA do contain LINE or LTR repeats, so they are present in the human genome in multiple copies, but that does not explain such high conservation in frog or fish. It can only be explained by genome contamination.

Obviously, this is a problem at the assembly level, but I was also wondering whether these elements should be filtered out of the phastCons element list?
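For anyone wanting to repeat the ranking step described above, here is a minimal sketch. It assumes the table was downloaded as tab-separated text with the columns bin, chrom, chromStart, chromEnd, name (of the form lod=N) and score. The function name top_elements and the demo rows are made up for illustration and are not real data.

```python
import csv
import io


def top_elements(handle, n=10, key="length"):
    """Return the n highest-ranked rows of a phastConsElements-style table.

    Assumed column order (an assumption, check your download):
    bin, chrom, chromStart, chromEnd, name ("lod=N"), score.
    key may be "length", "lod" or "score".
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and header/comment lines starting with '#'.
        if not fields or fields[0].startswith("#"):
            continue
        _bin, chrom, start, end, name, score = fields[:6]
        start, end = int(start), int(end)
        # Rows without a "lod=N" name get lod 0 so sorting by lod still works.
        lod = int(name.split("=", 1)[1]) if name.startswith("lod=") else 0
        rows.append({
            "chrom": chrom,
            "start": start,
            "end": end,
            "length": end - start,  # half-open interval: end minus start
            "lod": lod,
            "score": int(score),
        })
    # Sort descending on the chosen column and keep the first n rows.
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


# Toy rows, invented purely to show the call; not taken from the real table.
demo = (
    "585\tchr1\t100\t250\tlod=40\t300\n"
    "586\tchr2\t1000\t5000\tlod=900\t800\n"
    "587\tchr3\t50\t75\tlod=12\t200\n"
)
top = top_elements(io.StringIO(demo), n=2, key="score")
for row in top:
    print(row["chrom"], row["start"], row["end"], row["lod"], row["score"])
```

Ranking by length, lod or score gives different "top" lists, so it is worth checking which one matches what was asked for before inspecting candidates by hand.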
Philippe

--
Philippe Gautier
Bioinformatics Service
MRC - Human Genetics Unit
Western General Hospital
Crewe Road
Edinburgh EH4 2XU
U.K.
tel: 0131 332 24 71
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
