Hi all,

Referencing
http://listserver.ebi.ac.uk/mailing-lists-archives/mart-dev/msg01094.html,
I observed duplicated rows ('gene duplication') when exporting genes
filtered by a single chromosome. I wonder whether the explanation is the
same as for that case, despite the single-chromosome filter.

Here is the query I used (via http://www.biomart.org/biomart/martview/):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" header = "0" count = ""
softwareVersion = "0.5" >
        <Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
                <Attribute name = "ensembl_gene_id" />
                <Attribute name = "chromosome_name" />
                <Attribute name = "description" />
                <Attribute name = "start_position" />
                <Attribute name = "end_position" />
                <Attribute name = "strand" />
                <Filter name = "chromosome_name" value = "X"/>
                <Filter name = "with_entrezgene" excluded = "0"/>
                <Filter name = "transcript_status" value = "KNOWN"/>
                <Filter name = "biotype" value = "protein_coding"/>
        </Dataset>
</Query>

Here is a link to the results via the web service:
http://www.biomart.org/biomart/martservice?query=%3C?xml%20version=%221.0%22%20encoding=%22UTF-8%22?%3E%3C!DOCTYPE%20Query%3E%3CQuery%20%20virtualSchemaName%20=%20%22default%22%20header%20=%20%220%22%20count%20=%20%22%22%20softwareVersion%20=%20%220.5%22%20%3E%3CDataset%20name%20=%20%22rnorvegicus_gene_ensembl%22%20interface%20=%20%22default%22%20%3E%3CAttribute%20name%20=%20%22ensembl_gene_id%22%20/%3E%3CAttribute%20name%20=%20%22chromosome_name%22%20/%3E%3CAttribute%20name%20=%20%22description%22%20/%3E%3CAttribute%20name%20=%20%22start_position%22%20/%3E%3CAttribute%20name%20=%20%22end_position%22%20/%3E%3CAttribute%20name%20=%20%22strand%22%20/%3E%3CFilter%20name%20=%20%22chromosome_name%22%20value%20=%20%22X%22/%3E%3CFilter%20name%20=%20%22with_entrezgene%22%20excluded%20=%20%220%22/%3E%3CFilter%20name%20=%20%22transcript_status%22%20value%20=%20%22KNOWN%22/%3E%3CFilter%20name%20=%20%22biotype%22%20value%20=%20%22protein_coding%22/%3E%3C/Dataset%3E%3C/Query%3E
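For reference, the link above is just the XML query, percent-encoded and passed as the `query` parameter of martservice. Below is a minimal Python sketch of how such a link could be built. It assumes that standard `urllib.parse.quote` encoding is acceptable to the server. Its output differs cosmetically from the link above, which leaves characters such as `?`, `!` and `=` unescaped. `build_martservice_url` is a hypothetical helper name, and no request is actually sent:

```python
from urllib.parse import quote

# Base endpoint, taken from the link above.
MARTSERVICE = "http://www.biomart.org/biomart/martservice"

# The same query as above, joined onto one string (whitespace between tags is insignificant).
QUERY_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE Query>'
    '<Query  virtualSchemaName = "default" header = "0" count = "" softwareVersion = "0.5" >'
    '<Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >'
    '<Attribute name = "ensembl_gene_id" />'
    '<Attribute name = "chromosome_name" />'
    '<Attribute name = "description" />'
    '<Attribute name = "start_position" />'
    '<Attribute name = "end_position" />'
    '<Attribute name = "strand" />'
    '<Filter name = "chromosome_name" value = "X"/>'
    '<Filter name = "with_entrezgene" excluded = "0"/>'
    '<Filter name = "transcript_status" value = "KNOWN"/>'
    '<Filter name = "biotype" value = "protein_coding"/>'
    '</Dataset>'
    '</Query>'
)


def build_martservice_url(query_xml: str, base: str = MARTSERVICE) -> str:
    """Hypothetical helper: percent-encode the XML and append it as ?query=..."""
    # safe="" escapes every reserved character (including '/', '?', '='),
    # so the whole XML document travels as a single parameter value.
    return base + "?query=" + quote(query_xml, safe="")


if __name__ == "__main__":
    print(build_martservice_url(QUERY_XML))
```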

And here is a portion of the output (sorted in ascending order by ensembl_gene_id):

ENSRNOG00000002437      X               124527886       124528491       -1
ENSRNOG00000002449      X       melanoma antigen family D, 2 [Source:RefSeq_peptide;Acc:NP_536727]   40056332        40064505        -1
ENSRNOG00000002449      X       melanoma antigen family D, 2 [Source:RefSeq_peptide;Acc:NP_536727]   40056332        40064505        -1
ENSRNOG00000002451      X               94617360        94687407        -1

And one more sample with a duplicate:
ENSRNOG00000003622      X       cytochrome b-245, beta polypeptide [Source:RefSeq_peptide;Acc:NP_076455]   25514572        25547181        -1
ENSRNOG00000003667      X       Dystrophin (Fragment). [Source:Uniprot/SWISSPROT;Acc:P11530]   69607890        71671414        1
ENSRNOG00000003667      X       Dystrophin (Fragment). [Source:Uniprot/SWISSPROT;Acc:P11530]   69607890        71671414        1
ENSRNOG00000003674      X       pirin [Source:RefSeq_peptide;Acc:NP_001009474]        50864981        50974861        -1

There may be other duplicates; I did not look for more.
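To scan the full export for further duplicates, something like the following Python sketch could be used. It assumes the results were saved as tab-separated text with no header row (header = "0" in the query). The file name `results.tsv` and both helper functions are hypothetical, and the sample rows are the ones quoted above. Comparing the two checks shows whether the duplicates are exact copies (identical in every column) or the same gene ID appearing with different attribute values:

```python
from collections import Counter


def find_duplicate_rows(lines):
    """Return {row: count} for every non-empty row that occurs more than once."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return {row: n for row, n in counts.items() if n > 1}


def find_duplicate_gene_ids(lines):
    """Return {gene_id: count} for gene IDs (first column) seen on more than one row."""
    counts = Counter(line.split("\t", 1)[0] for line in lines if line.strip())
    return {gene: n for gene, n in counts.items() if n > 1}


# Sample rows copied from the output above (assumed to be tab-separated).
SAMPLE = [
    "ENSRNOG00000002437\tX\t\t124527886\t124528491\t-1",
    "ENSRNOG00000002449\tX\tmelanoma antigen family D, 2 "
    "[Source:RefSeq_peptide;Acc:NP_536727]\t40056332\t40064505\t-1",
    "ENSRNOG00000002449\tX\tmelanoma antigen family D, 2 "
    "[Source:RefSeq_peptide;Acc:NP_536727]\t40056332\t40064505\t-1",
    "ENSRNOG00000002451\tX\t\t94617360\t94687407\t-1",
]

if __name__ == "__main__":
    # For a full export, read the lines from a file instead, e.g.:
    #   with open("results.tsv") as fh:  # hypothetical file name
    #       lines = fh.readlines()
    lines = SAMPLE
    for row, n in find_duplicate_rows(lines).items():
        print(f"exact duplicate x{n}: {row}")
    for gene, n in find_duplicate_gene_ids(lines).items():
        print(f"gene ID on {n} rows: {gene}")
```

If a gene ID appears in the second report but its rows are not all in the first, the repeated rows differ in at least one attribute rather than being exact copies.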

So is the reason the same as described in
http://listserver.ebi.ac.uk/mailing-lists-archives/mart-dev/msg01094.html,
or is this something different?

Thanks in advance,

--
Sincerely yours,
Bogdan Tokovenko,
PhD student at the Laboratory of Protein Biosynthesis,
Department of Genetic Information Translation Mechanisms,
Institute of Molecular Biology and Genetics, Kyiv, Ukraine
http://bogdan.org.ua/
