-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Yes, it is exactly the same case.

You are not just filtering on a single chromosome - you are also
filtering on a number of other things, including with_entrezgene and
transcript_status. transcript_status is a transcript-level
filter/attribute, so the query is run against the transcript table
instead of the gene table. The transcript table contains multiple
entries per gene, so each gene is returned once per matching
transcript. Because every attribute you selected is gene-level, those
rows look identical - if you selected transcript ids in your attributes
you would see that each row actually refers to a different transcript.
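
As a rough sketch, this is your query with one extra attribute added so
that the transcript behind each row becomes visible. The attribute name
ensembl_transcript_id is my assumption for this dataset; check the
attribute list in martview if it is rejected.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" header = "0" count = ""
softwareVersion = "0.5" >
    <Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
        <Attribute name = "ensembl_gene_id" />
        <!-- assumed attribute name, not part of the original query -->
        <Attribute name = "ensembl_transcript_id" />
        <Attribute name = "chromosome_name" />
        <Attribute name = "description" />
        <Attribute name = "start_position" />
        <Attribute name = "end_position" />
        <Attribute name = "strand" />
        <Filter name = "chromosome_name" value = "X"/>
        <Filter name = "with_entrezgene" excluded = "0"/>
        <Filter name = "transcript_status" value = "KNOWN"/>
        <Filter name = "biotype" value = "protein_coding"/>
    </Dataset>
</Query>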

If you filtered on chromosome alone and removed those other filters,
the duplicate results would disappear.
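
A sketch of that reduced query, assuming nothing else changes: the
attributes are the ones from your original query and only the
chromosome_name filter is kept, so no transcript-level filter is
involved.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" header = "0" count = ""
softwareVersion = "0.5" >
    <Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
        <Attribute name = "ensembl_gene_id" />
        <Attribute name = "chromosome_name" />
        <Attribute name = "description" />
        <Attribute name = "start_position" />
        <Attribute name = "end_position" />
        <Attribute name = "strand" />
        <!-- the only filter left: gene-level, one row per gene expected -->
        <Filter name = "chromosome_name" value = "X"/>
    </Dataset>
</Query>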

We are adding an optional 'unique results only' feature to the
forthcoming 0.6 release which will help reduce these situations.

cheers,
Richard

Bogdan wrote:
> Hi all,
> 
> referencing
> http://listserver.ebi.ac.uk/mailing-lists-archives/mart-dev/msg01094.html
> ,
> I observed 'gene duplication' when attempting to export genes filtered
> by a single chromosome. I wonder if the explanation is the same as for
> the aforementioned case - despite the single-chromosome filter.
> 
> here's the query used (via http://www.biomart.org/biomart/martview/):
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE Query>
> <Query  virtualSchemaName = "default" header = "0" count = ""
> softwareVersion = "0.5" >
>     <Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
>              <Attribute name = "ensembl_gene_id" />
>              <Attribute name = "chromosome_name" />
>              <Attribute name = "description" />
>              <Attribute name = "start_position" />
>              <Attribute name = "end_position" />
>              <Attribute name = "strand" />
>             <Filter name = "chromosome_name" value = "X"/>
>             <Filter name = "with_entrezgene" excluded = "0"/>
>             <Filter name = "transcript_status" value = "KNOWN"/>
>             <Filter name = "biotype" value = "protein_coding"/>
>     </Dataset>
> </Query>
> 
> here's a link to results via webservice:
> http://www.biomart.org/biomart/martservice?query=%3C?xml%20version=%221.0%22%20encoding=%22UTF-8%22?%3E%3C!DOCTYPE%20Query%3E%3CQuery%20%20virtualSchemaName%20=%20%22default%22%20header%20=%20%220%22%20count%20=%20%22%22%20softwareVersion%20=%20%220.5%22%20%3E%3CDataset%20name%20=%20%22rnorvegicus_gene_ensembl%22%20interface%20=%20%22default%22%20%3E%3CAttribute%20name%20=%20%22ensembl_gene_id%22%20/%3E%3CAttribute%20name%20=%20%22chromosome_name%22%20/%3E%3CAttribute%20name%20=%20%22description%22%20/%3E%3CAttribute%20name%20=%20%22start_position%22%20/%3E%3CAttribute%20name%20=%20%22end_position%22%20/%3E%3CAttribute%20name%20=%20%22strand%22%20/%3E%3CFilter%20name%20=%20%22chromosome_name%22%20value%20=%20%22X%22/%3E%3CFilter%20name%20=%20%22with_entrezgene%22%20excluded%20=%20%220%22/%3E%3CFilter%20name%20=%20%22transcript_status%22%20value%20=%20%22KNOWN%22/%3E%3CFilter%20name%20=%20%22biotype%22%20value%20=%20%22protein_coding%22/%3E%3C/Dataset%3E%3C/Query%3E
> 
> 
> And here's a portion of the output (sorted ASC by ensembl_gene_id):
> 
> ENSRNOG00000002437    X        124527886    124528491    -1
> ENSRNOG00000002449    X    melanoma antigen family D, 2 [Source:RefSeq_peptide;Acc:NP_536727]    40056332    40064505    -1
> ENSRNOG00000002449    X    melanoma antigen family D, 2 [Source:RefSeq_peptide;Acc:NP_536727]    40056332    40064505    -1
> ENSRNOG00000002451    X        94617360    94687407    -1
> 
> and one more duplicate sample:
> ENSRNOG00000003622    X    cytochrome b-245, beta polypeptide [Source:RefSeq_peptide;Acc:NP_076455]    25514572    25547181    -1
> ENSRNOG00000003667    X    Dystrophin (Fragment). [Source:Uniprot/SWISSPROT;Acc:P11530]    69607890    71671414    1
> ENSRNOG00000003667    X    Dystrophin (Fragment). [Source:Uniprot/SWISSPROT;Acc:P11530]    69607890    71671414    1
> ENSRNOG00000003674    X    pirin [Source:RefSeq_peptide;Acc:NP_001009474]    50864981    50974861    -1
> 
> there might be other duplicates, I just didn't look for more.
> 
> so is the reason the same as described in
> http://listserver.ebi.ac.uk/mailing-lists-archives/mart-dev/msg01094.html
> , or is this something different?
> 
> Thanks in advance,
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZVGb4C5LeMEKA/QRAphQAJkB5zZ7yoo6cJLzTFGITi+EhhJdPACfXWC+
ud7JTlQtAxNPT/+4StQouc8=
=sjk6
-----END PGP SIGNATURE-----
