-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Yes, it is exactly the same case.

You are not just filtering on a single chromosome - you are also filtering on a number of other things, including with_entrezgene and transcript_status. Transcript_status is a transcript-level filter/attribute, and therefore it uses the transcript table to do the query instead of the gene table. The transcript table contains multiple entries per gene, and so you see multiple results - if you selected transcript ids in your attributes you would see that each row actually refers to a different transcript.

If you filtered on chromosome alone and removed those other filters, you'd see the duplicate results would disappear.

We are adding an optional 'unique results only' feature to the forthcoming 0.6 release which will help reduce these situations.

cheers,
Richard

Bogdan wrote:
> Hi all,
>
> referencing
> http://listserver.ebi.ac.uk/mailing-lists-archives/mart-dev/msg01094.html
> ,
> I observed 'gene duplication' when attempting to export genes filtered
> by a single chromosome. I wonder if the explanation is the same as for
> the aforementioned case - despite the single-chromosome filter.
>
> here's the query used (via http://www.biomart.org/biomart/martview/):
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE Query>
> <Query virtualSchemaName = "default" header = "0" count = ""
> softwareVersion = "0.5" >
> <Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
> <Attribute name = "ensembl_gene_id" />
> <Attribute name = "chromosome_name" />
> <Attribute name = "description" />
> <Attribute name = "start_position" />
> <Attribute name = "end_position" />
> <Attribute name = "strand" />
> <Filter name = "chromosome_name" value = "X"/>
> <Filter name = "with_entrezgene" excluded = "0"/>
> <Filter name = "transcript_status" value = "KNOWN"/>
> <Filter name = "biotype" value = "protein_coding"/>
> </Dataset>
> </Query>
>
> here's a link to results via webservice:
> http://www.biomart.org/biomart/martservice?query=%3C?xml%20version=%221.0%22%20encoding=%22UTF-8%22?%3E%3C!DOCTYPE%20Query%3E%3CQuery%20%20virtualSchemaName%20=%20%22default%22%20header%20=%20%220%22%20count%20=%20%22%22%20softwareVersion%20=%20%220.5%22%20%3E%3CDataset%20name%20=%20%22rnorvegicus_gene_ensembl%22%20interface%20=%20%22default%22%20%3E%3CAttribute%20name%20=%20%22ensembl_gene_id%22%20/%3E%3CAttribute%20name%20=%20%22chromosome_name%22%20/%3E%3CAttribute%20name%20=%20%22description%22%20/%3E%3CAttribute%20name%20=%20%22start_position%22%20/%3E%3CAttribute%20name%20=%20%22end_position%22%20/%3E%3CAttribute%20name%20=%20%22strand%22%20/%3E%3CFilter%20name%20=%20%22chromosome_name%22%20value%20=%20%22X%22/%3E%3CFilter%20name%20=%20%22with_entrezgene%22%20excluded%20=%20%220%22/%3E%3CFilter%20name%20=%20%22transcript_status%22%20value%20=%20%22KNOWN%22/%3E%3CFilter%20name%20=%20%22biotype%22%20value%20=%20%22protein_coding%22/%3E%3C/Dataset%3E%3C/Query%3E
>
>
> And here's a portion of the output (sorted ASC by ensembl_gene_id):
>
> ENSRNOG00000002437 X 124527886 124528491 -1
> ENSRNOG00000002449 X melanoma antigen family D, 2
> [Source:RefSeq_peptide;Acc:NP_536727] 40056332 40064505 -1
> ENSRNOG00000002449 X melanoma antigen family D, 2
> [Source:RefSeq_peptide;Acc:NP_536727] 40056332 40064505 -1
> ENSRNOG00000002451 X 94617360 94687407 -1
>
> and one more duplicate sample:
> ENSRNOG00000003622 X cytochrome b-245, beta polypeptide
> [Source:RefSeq_peptide;Acc:NP_076455] 25514572 25547181 -1
> ENSRNOG00000003667 X Dystrophin (Fragment).
> [Source:Uniprot/SWISSPROT;Acc:P11530] 69607890 71671414 1
> ENSRNOG00000003667 X Dystrophin (Fragment).
> [Source:Uniprot/SWISSPROT;Acc:P11530] 69607890 71671414 1
> ENSRNOG00000003674 X pirin
> [Source:RefSeq_peptide;Acc:NP_001009474] 50864981 50974861 -1
>
> there might be other duplicates, I just didn't look for more.
>
> so is the reason the same as described in
> http://listserver.ebi.ac.uk/mailing-lists-archives/mart-dev/msg01094.html
> , or this is something different?
>
> Thanks in advance,
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZVGb4C5LeMEKA/QRAphQAJkB5zZ7yoo6cJLzTFGITi+EhhJdPACfXWC+
ud7JTlQtAxNPT/+4StQouc8=
=sjk6
-----END PGP SIGNATURE-----
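
Until a 'unique results only' option is available, one possible workaround is to collapse the identical rows on the client side. The following is only a rough sketch, not part of BioMart: it assumes the webservice output is tab-separated with one result per line, and the function name unique_rows is made up for illustration. The sample rows are taken from the output quoted above, with the tab positions assumed.

```python
def unique_rows(tsv_text):
    """Return rows of tab-separated text with exact duplicates removed.

    Rows keep the order in which they were first seen. Assumes one
    result per line and tab-separated columns (an assumption about
    the martservice output format).
    """
    seen = set()
    rows = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = tuple(line.split("\t"))
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


# Sample rows from the quoted output; tab positions are assumed.
sample = "\n".join([
    "ENSRNOG00000002437\tX\t\t124527886\t124528491\t-1",
    "ENSRNOG00000002449\tX\tmelanoma antigen family D, 2 "
    "[Source:RefSeq_peptide;Acc:NP_536727]\t40056332\t40064505\t-1",
    "ENSRNOG00000002449\tX\tmelanoma antigen family D, 2 "
    "[Source:RefSeq_peptide;Acc:NP_536727]\t40056332\t40064505\t-1",
    "ENSRNOG00000002451\tX\t\t94617360\t94687407\t-1",
])

for row in unique_rows(sample):
    print("\t".join(row))
```

Note that this only hides the symptom: the rows are identical because the transcript-level filter yields one row per matching transcript, so adding the transcript id to the attributes would make each row distinct instead.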
