Hi folks -- yes, this is an unfortunate problem, but I've always  
resisted tackling it at the level of the phastCons tracks.  It really  
would be best to filter these elements out of the assemblies.  Barring  
this, I would suggest addressing it at the level of the alignments,  
because it's not just phastCons that is affected -- any method that  
makes use of patterns of conservation in the multiple alignments is  
likely to be confused by these regions.
Adam
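
A minimal sketch of what an alignment-level check along these lines might look like, not a description of any existing UCSC or phastCons filter. It assumes each multiple-alignment block has already been parsed into a dict mapping an assembly name to its aligned, gapped sequence. The names `percent_identity` and `is_suspect`, the assembly labels (hg19, xenTro2, danRer6, mm9) and both thresholds are illustrative assumptions. A block is flagged when a distant species is near-identical to human over a long stretch while no other species supports the conservation, which is the pattern reported in the original message.

```python
def percent_identity(ref, other):
    """Return (identity, n_compared) over columns where both rows have a base."""
    matches = compared = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and unknown bases
        compared += 1
        if a == b:
            matches += 1
    return (matches / compared if compared else 0.0), compared


def is_suspect(block, ref="hg19", distant=("xenTro2", "danRer6"),
               min_identity=0.95, min_columns=500):
    """Flag a block where a distant species is near-identical to the reference
    and no other species is aligned as long and as well.

    All defaults are hypothetical and would need tuning on real data.
    """
    ref_seq = block[ref]
    stats = {sp: percent_identity(ref_seq, seq)
             for sp, seq in block.items() if sp != ref}
    for sp in distant:
        ident, n = stats.get(sp, (0.0, 0))
        if ident < min_identity or n < min_columns:
            continue  # this distant species is not suspiciously similar
        others = [stats[o] for o in stats if o not in distant]
        # Suspect only if every other species is barely aligned or less similar.
        if all(o_n < min_columns or o_ident < ident for o_ident, o_n in others):
            return True
    return False
```

Blocks flagged this way would still need manual inspection, since the heuristic cannot by itself tell contamination apart from an unusual but genuine alignment.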

On Feb 5, 2010, at 12:49 PM, Jim Kent wrote:

> I remember facing this issue of conservation via human contamination  
> when we were first
> doing comparative genomics when the mouse was sequenced.  It's one  
> reason we didn't call
> the ultraconserved regions at that point.  It wasn't until the rat  
> sequence was available and
> they were conserved there that we were convinced they weren't  
> artifacts.  In the process we
> did flag them in the mouse and get the assemblers to remove ones  
> where there was not
> excellent evidence joining them to non-conserved regions in the  
> mouse assembly.
>
> So, I am not surprised this is a problem.  The best solution is to  
> get the xenopus and zebrafish
> assemblies cleaned up. I'll cc this message to zfish- 
> [email protected], the help link for
> Zebrafish, and to Dan Rokhsar, who I know did some work at least in  
> the past on Xenopus.
> I'll also cc Adam Siepel, the author of phastCons, and our own David  
> Haussler to collect
> their thoughts on the best way to proceed.
>
> Take care
>       Jim
>
> On Feb 5, 2010, at 1:41 AM, Philippe Gautier wrote:
>
>> Hello,
>> I'm new on this mailing list so, apologies if it's the wrong place to
>> ask the following question. Feel free to redirect me if needed!
>> I'm working in a Bioinformatics service in our Unit and someone  
>> asked me
>> if they could get a list of most conserved elements in vertebrates. I
>> thought "easy, I just have to download the phastConsElements46way  
>> table
>> and take the highest-scoring ones".
>> I decided to check "manually" a few of them and was horrified to see
>> that all (or most) seem to be artifacts due to human genomic DNA
>> contaminant in other species.
>> One example: the longest element:
>> chr5:69686054-6970347 in GRCh37, lod=14726, score=995.
>> looks like it is conserved only in Xenopus and not other vertebrates
>> (looking at the Multi Z alignment tracks). And when I realigned it to
>> the corresponding Xenopus genomic sequence (scaffold_7921:  
>> 87-17248) it
>> is virtually identical (>97% over 17Kb), undoubtedly a contamination!
>> Moreover, I looked at several other elements down the list and almost
>> all the top ones (the longest ones) are similar: not conserved in any
>> vertebrate, except in Xenopus or Zebrafish. These pieces of DNA do
>> contain LINE or LTR repeats so, are present in the human genome in
>> multiple copies, but that does not explain such a high conservation  
>> in
>> frog or fish, and could only be explained by genome contamination.
>> Obviously, it is a problem at the assembly level, but I was also
>> wondering if these elements should not be filtered out of the  
>> phastCons
>> element list?
>>
>> Philippe
>>
>> -- 
>> Philippe Gautier
>> Bioinformatics Service
>> MRC - Human Genetics Unit
>> Western General Hospital
>> Crewe Road
>> Edinburgh EH4 2XU
>> U.K.
>> tel: 0131 332 24 71
>>
>>
>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
