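Hi Peter,
The genes in your output are located in the pseudoautosomal regions (PARs),
which means they exist on both the X and the Y chromosome. Each copy is
exported, but both copies carry the same Ensembl Gene ID, so they show up as
duplicates with identical headers.
Hope this explains what you are seeing.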
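If you only need one record per gene, a minimal sketch like the one below
could drop the exact repeats after downloading. It assumes plain FASTA text
(gzip-compressed when the filename ends in .gz, as with the martresults link)
and treats two records as duplicates only when both the header and the
sequence match, which is what you observed. The function names and the
OTHER_GENE record are hypothetical, for illustration only.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

# (header without the leading '>', concatenated sequence)
Record = Tuple[str, str]


def open_fasta(path: str):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def dedupe_fasta(
    records: Iterable[Record],
) -> Tuple[List[Record], List[Tuple[int, int, str]]]:
    """Keep the first copy of each exact (header, sequence) pair.

    Returns the unique records plus (first_index, duplicate_index, header)
    for every exact repeat, e.g. a PAR gene exported once for X and once
    for Y. Indices are 0-based positions in the input stream.
    """
    first_seen = {}
    unique: List[Record] = []
    dups: List[Tuple[int, int, str]] = []
    for i, rec in enumerate(records):
        if rec in first_seen:
            dups.append((first_seen[rec], i, rec[0]))
        else:
            first_seen[rec] = i
            unique.append(rec)
    return unique, dups


# Tiny hypothetical input: the same PAR gene appears twice.
EXAMPLE = """>ENSG00000185960
TAAAAAGAAAAG
TGTTTCC
>OTHER_GENE
ACGT
>ENSG00000185960
TAAAAAGAAAAG
TGTTTCC
"""

unique, dups = dedupe_fasta(read_fasta(EXAMPLE.splitlines()))
for first, dup, header in dups:
    print(f"record {dup} repeats record {first}: {header}")
```

For a real export you would pass `open_fasta("martquery_....txt.gz")`
to `read_fasta` instead of the example lines. Keying on the full
(header, sequence) pair is deliberately conservative: if two records ever
shared an ID but differed in sequence, both would be kept for inspection.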

Cheers
Syed

On Wed, 2007-05-23 at 11:32 -0400, Peter Andrews wrote:
> I was exporting some upstream sequences for Homo sapiens. Of the
> 31,545 genes exported (no filters) I received 21 duplicates. Both the
> fasta header line with '>' and the upstream sequence were identical in
> all cases. Here is some debugging output showing details:
> 
> NOTE 1th duplicate, at fasta input record 30639: [ENSG00000185960,
> ENSG00000185960.4]. gene identifier 'ENSG00000185960' previously found
> at fasta input record 8429 which has these geneIds: [ENSG00000185960,
> ENSG00000185960.4].  Do the sequences match? true Partial old
> sequence:
> TAAAAAGAAAAGTGTTTCCTCCCTGGCTGGAGGACCCAGGAGGAGGTCCCAGTTTTCCGGTGGGGATGGGCGTGGAGTAGGGGGCGGGGAAGGGATGAGG
>  Partial new sequence: 
> TAAAAAGAAAAGTGTTTCCTCCCTGGCTGGAGGACCCAGGAGGAGGTCCCAGTTTTCCGGTGGGGATGGGCGTGGAGTAGGGGGCGGGGAAGGGATGAGG
> NOTE 2th duplicate, at fasta input record 30727: [ENSG00000197976,
> ENSG00000197976.2]. gene identifier 'ENSG00000197976' previously found
> at fasta input record 9268 which has these geneIds: [ENSG00000197976,
> ENSG00000197976.2].  Do the sequences match? true Partial old
> sequence:
> CCTTCCCCTCCCCTCCCCTCCTTTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCTTCCCCTCCATTCCCCTCCCTTC
>  Partial new sequence: 
> CCTTCCCCTCCCCTCCCCTCCTTTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCTTCCCCTCCATTCCCCTCCCTTC
> NOTE 3th duplicate, at fasta input record 30730: [ENSG00000182162,
> ENSG00000182162.2]. gene identifier 'ENSG00000182162' previously found
> at fasta input record 9310 which has these geneIds: [ENSG00000182162,
> ENSG00000182162.2].  Do the sequences match? true Partial old
> sequence:
> TTTATTTGTTTATTTATTTATTTTTTGAGACAGAGTTTCGCTCTTGTTGCCCAGGCTGGGGTGCAGCGGCATGATCTCGGCTCACTGCAACCTCCGCCTC
>  Partial new sequence: 
> TTTATTTGTTTATTTATTTATTTTTTGAGACAGAGTTTCGCTCTTGTTGCCCAGGCTGGGGTGCAGCGGCATGATCTCGGCTCACTGCAACCTCCGCCTC
> NOTE 4th duplicate, at fasta input record 30798: [ENSG00000205681,
> ENSG00000205681.1]. gene identifier 'ENSG00000205681' previously found
> at fasta input record 10007 which has these geneIds: [ENSG00000205681,
> ENSG00000205681.1].  Do the sequences match? true Partial old
> sequence:
> GCTATGGCGCTTGGCTACCTGAGTCTTTATTCTGCCTTCCAGGTGCTTGTTGGTTGGATAACTTTGGGTAGGTTCTTGTACCTCTTTGAGCTTCAAGACT
>  Partial new sequence: 
> GCTATGGCGCTTGGCTACCTGAGTCTTTATTCTGCCTTCCAGGTGCTTGTTGGTTGGATAACTTTGGGTAGGTTCTTGTACCTCTTTGAGCTTCAAGACT
> NOTE 5th duplicate, at fasta input record 30820: [ENSG00000124343,
> ENSG00000124343.2]. gene identifier 'ENSG00000124343' previously found
> at fasta input record 7623 which has these geneIds: [ENSG00000124343,
> ENSG00000124343.2].  Do the sequences match? true Partial old
> sequence:
> CTAATCTCCAGTGATCCGCTCACCTCAGCCACCCAAAGTGCTGGGATTACAGACGTGAGCCACCGGGCCCAGCCAGCAGGGCTGATTTCTTCTGATGCTG
>  Partial new sequence: 
> CTAATCTCCAGTGATCCGCTCACCTCAGCCACCCAAAGTGCTGGGATTACAGACGTGAGCCACCGGGCCCAGCCAGCAGGGCTGATTTCTTCTGATGCTG
> NOTE 6th duplicate, at fasta input record 30844: [ENSG00000124333,
> ENSG00000124333.4]. gene identifier 'ENSG00000124333' previously found
> at fasta input record 19603 which has these geneIds: [ENSG00000124333,
> ENSG00000124333.4].  Do the sequences match? true Partial old
> sequence:
> AGGAAAAATAGCTAATGCATGCTGGGCTTTAATACCTAGGTGATGGGTTGATAGGTGCAGCAAATTACCATGGCACACATTTACCTGTATAACAAACCTG
>  Partial new sequence: 
> AGGAAAAATAGCTAATGCATGCTGGGCTTTAATACCTAGGTGATGGGTTGATAGGTGCAGCAAATTACCATGGCACACATTTACCTGTATAACAAACCTG
> NOTE 7th duplicate, at fasta input record 30934: [ENSG00000198223,
> ENSG00000198223.3]. gene identifier 'ENSG00000198223' previously found
> at fasta input record 8798 which has these geneIds: [ENSG00000198223,
> ENSG00000198223.3].  Do the sequences match? true Partial old
> sequence:
> TCCTGCAGGAATGGGGAGGCTAAGACGGTAGAGGTGCAGCCTGGTCAGCCATCTTTCACCTTTGCTGATGTTGCTATCCAGGTGTTTTCCATTGCATGTG
>  Partial new sequence: 
> TCCTGCAGGAATGGGGAGGCTAAGACGGTAGAGGTGCAGCCTGGTCAGCCATCTTTCACCTTTGCTGATGTTGCTATCCAGGTGTTTTCCATTGCATGTG
> NOTE 8th duplicate, at fasta input record 30968: [ENSG00000205755,
> ENSG00000205755.1]. gene identifier 'ENSG00000205755' previously found
> at fasta input record 9187 which has these geneIds: [ENSG00000205755,
> ENSG00000205755.1].  Do the sequences match? true Partial old
> sequence:
> GACGGAGTCTTGCTCTTGTCGCCCAGGCTGGAGTGCCGTGGCACGATCTCAGCTCACTGCCAACTCCGCCTCCCGGGTTCACGCCATTCTCCTGCCTCAG
>  Partial new sequence: 
> GACGGAGTCTTGCTCTTGTCGCCCAGGCTGGAGTGCCGTGGCACGATCTCAGCTCACTGCCAACTCCGCCTCCCGGGTTCACGCCATTCTCCTGCCTCAG
> NOTE 9th duplicate, at fasta input record 31013: [ENSG00000196433,
> ENSG00000196433.2]. gene identifier 'ENSG00000196433' previously found
> at fasta input record 9741 which has these geneIds: [ENSG00000196433,
> ENSG00000196433.2].  Do the sequences match? true Partial old
> sequence:
> GCCAATATAGTGAAACCCTGTCTCTACGAAAAATACAAAAATTAGCCAGGTATGGTGGCAGGTGCTTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGAA
>  Partial new sequence: 
> GCCAATATAGTGAAACCCTGTCTCTACGAAAAATACAAAAATTAGCCAGGTATGGTGGCAGGTGCTTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGAA
> NOTE 10th duplicate, at fasta input record 31022: [ENSG00000168939,
> ENSG00000168939.2]. gene identifier 'ENSG00000168939' previously found
> at fasta input record 20849 which has these geneIds: [ENSG00000168939,
> ENSG00000168939.2].  Do the sequences match? true Partial old
> sequence:
> GAGACAGCCTGAGTCAGCCTGAGTTAAAATCCTAGATCTGCAAACTGCCAACTGTGTAACCTTGGACAAGTTACTTAAGGTCTTTGGACCTTGGTTTCTC
>  Partial new sequence: 
> GAGACAGCCTGAGTCAGCCTGAGTTAAAATCCTAGATCTGCAAACTGCCAACTGTGTAACCTTGGACAAGTTACTTAAGGTCTTTGGACCTTGGTTTCTC
> NOTE 11th duplicate, at fasta input record 31055: [ENSG00000169100,
> ENSG00000169100.3]. gene identifier 'ENSG00000169100' previously found
> at fasta input record 7624 which has these geneIds: [ENSG00000169100,
> ENSG00000169100.3].  Do the sequences match? true Partial old
> sequence:
> AGCCAGCCTCATCTGGAAATAGCAGCTCTGGTCCCGGCCTCGCTGAGGCACTGAAAACCAGCACCAGGGCCCCGTCCAGCCCGGCCTCGCTGAGGCTGGG
>  Partial new sequence: 
> AGCCAGCCTCATCTGGAAATAGCAGCTCTGGTCCCGGCCTCGCTGAGGCACTGAAAACCAGCACCAGGGCCCCGTCCAGCCCGGCCTCGCTGAGGCTGGG
> NOTE 12th duplicate, at fasta input record 31115: [ENSG00000185291,
> ENSG00000185291.3]. gene identifier 'ENSG00000185291' previously found
> at fasta input record 8154 which has these geneIds: [ENSG00000185291,
> ENSG00000185291.3].  Do the sequences match? true Partial old
> sequence:
> AGGCTGGTCTTGAACCCCTGACCTCAGGTGATGCACCCACCTTGGCCTCCCACAGAGCTGGGATTACAGGCGTGAGCCACTGGGCCCCGCCCTGTATTTG
>  Partial new sequence: 
> AGGCTGGTCTTGAACCCCTGACCTCAGGTGATGCACCCACCTTGGCCTCCCACAGAGCTGGGATTACAGGCGTGAGCCACTGGGCCCCGCCCTGTATTTG
> NOTE 13th duplicate, at fasta input record 31130: [ENSG00000124334,
> ENSG00000124334.6]. gene identifier 'ENSG00000124334' previously found
> at fasta input record 19934 which has these geneIds: [ENSG00000124334,
> ENSG00000124334.6].  Do the sequences match? true Partial old
> sequence:
> CTTTTCTCTTAAGCATGGGTGACATAGTACTCTTTCTTCATGTGTTTGATAAATTTGTTTTTATCTTAGAAATTGTGAATGGTATACATTGTTGAGACTG
>  Partial new sequence: 
> CTTTTCTCTTAAGCATGGGTGACATAGTACTCTTTCTTCATGTGTTTGATAAATTTGTTTTTATCTTAGAAATTGTGAATGGTATACATTGTTGAGACTG
> NOTE 14th duplicate, at fasta input record 31198: [ENSG00000169084,
> ENSG00000169084.3]. gene identifier 'ENSG00000169084' previously found
> at fasta input record 9163 which has these geneIds: [ENSG00000169084,
> ENSG00000169084.3].  Do the sequences match? true Partial old
> sequence:
> ATTACCTGAGGTCAGGAGTTTGAGACCAGCCAGGCCAACATGGTGAAATCCCATCTCTATTAAAAATACGAAAATTATTTGGGTGTGCTGGTGCATGCCT
>  Partial new sequence: 
> ATTACCTGAGGTCAGGAGTTTGAGACCAGCCAGGCCAACATGGTGAAATCCCATCTCTATTAAAAATACGAAAATTATTTGGGTGTGCTGGTGCATGCCT
> NOTE 15th duplicate, at fasta input record 31327: [ENSG00000182484,
> ENSG00000182484.4]. gene identifier 'ENSG00000182484' previously found
> at fasta input record 19614 which has these geneIds: [ENSG00000182484,
> ENSG00000182484.4].  Do the sequences match? true Partial old
> sequence:
> ATGCATTCAGAAAACTTTAGATCACGGTTGAGAAGAATCAAAAATATTAAATCAAATGCAGATACTCCTTGTTTAGGAGCAGTACACTCATTATTGTTAG
>  Partial new sequence: 
> ATGCATTCAGAAAACTTTAGATCACGGTTGAGAAGAATCAAAAATATTAAATCAAATGCAGATACTCCTTGTTTAGGAGCAGTACACTCATTATTGTTAG
> NOTE 16th duplicate, at fasta input record 31342: [ENSG00000002586,
> ENSG00000002586.7]. gene identifier 'ENSG00000002586' previously found
> at fasta input record 8086 which has these geneIds: [ENSG00000002586,
> ENSG00000002586.7].  Do the sequences match? true Partial old
> sequence:
> AGCCTGTACCCCAGAACTTAAAGTATAATAATAACAATAATAAAAAGACAGGTGTTATCTCAGAGCCCCTGACTCAGTCGGCTGGGCAGCAAGTATGCCA
>  Partial new sequence: 
> AGCCTGTACCCCAGAACTTAAAGTATAATAATAACAATAATAAAAAGACAGGTGTTATCTCAGAGCCCCTGACTCAGTCGGCTGGGCAGCAAGTATGCCA
> NOTE 17th duplicate, at fasta input record 31373: [ENSG00000182378,
> ENSG00000182378.3]. gene identifier 'ENSG00000182378' previously found
> at fasta input record 8467 which has these geneIds: [ENSG00000182378,
> ENSG00000182378.3].  Do the sequences match? true Partial old
> sequence:
> GACCACAGTCCACATCACACCAGGACACGGAGGAAGGGCCAGGCCTCATGACCACAGTCCAGATCACACCAGGACACAGAGGAAGGGCCGGGCCCTGTGA
>  Partial new sequence: 
> GACCACAGTCCACATCACACCAGGACACGGAGGAAGGGCCAGGCCTCATGACCACAGTCCAGATCACACCAGGACACAGAGGAAGGGCCGGGCCCTGTGA
> NOTE 18th duplicate, at fasta input record 31428: [ENSG00000169093,
> ENSG00000169093.5]. gene identifier 'ENSG00000169093' previously found
> at fasta input record 9036 which has these geneIds: [ENSG00000169093,
> ENSG00000169093.5].  Do the sequences match? true Partial old
> sequence:
> TATTCCTTGATTTCAGATGTCTGGGCTCCAGAGCTGTAATACAATTAAGTTTTGCTGTTTTAAGCCCCAGGGTTTTGAGTGACAGTTACCAGCAACCCCC
>  Partial new sequence: 
> TATTCCTTGATTTCAGATGTCTGGGCTCCAGAGCTGTAATACAATTAAGTTTTGCTGTTTTAAGCCCCAGGGTTTTGAGTGACAGTTACCAGCAACCCCC
> NOTE 19th duplicate, at fasta input record 31442: [ENSG00000167393,
> ENSG00000167393.7]. gene identifier 'ENSG00000167393' previously found
> at fasta input record 9170 which has these geneIds: [ENSG00000167393,
> ENSG00000167393.7].  Do the sequences match? true Partial old
> sequence:
> CCCAGCAAACTCTGCAACACCTCAGGCCCTGCCAGCCTTGGGGGCCCGACAGCACCTCTTTGTTCTCCCAGAGCAAAGCCTGCACGGAGTGGGCCCCCGG
>  Partial new sequence: 
> CCCAGCAAACTCTGCAACACCTCAGGCCCTGCCAGCCTTGGGGGCCCGACAGCACCTCTTTGTTCTCCCAGAGCAAAGCCTGCACGGAGTGGGCCCCCGG
> NOTE 20th duplicate, at fasta input record 31485: [ENSG00000178605,
> ENSG00000178605.4]. gene identifier 'ENSG00000178605' previously found
> at fasta input record 9699 which has these geneIds: [ENSG00000178605,
> ENSG00000178605.4].  Do the sequences match? true Partial old
> sequence:
> NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>  Partial new sequence: 
> NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> NOTE 21th duplicate, at fasta input record 31508: [ENSG00000169098,
> ENSG00000169098.5]. gene identifier 'ENSG00000169098' previously found
> at fasta input record 9900 which has these geneIds: [ENSG00000169098,
> ENSG00000169098.5].  Do the sequences match? true Partial old
> sequence:
> GCCGGGCACGGTGGCTCACGCCTGCAATGCCAGCACTTTAGGAGGCCGAGGTGGGCAGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATG
>  Partial new sequence: 
> GCCGGGCACGGTGGCTCACGCCTGCAATGCCAGCACTTTAGGAGGCCGAGGTGGGCAGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATG
> 
> 
> For those interested, the results should still be available for a
> little while:
> http://www.biomart.org/biomart/martresults?file=martquery_0523154530_544.txt.gz
> 
> Ideas?
> 
> Thanks,
> 
> Peter Andrews
> 
> 
> -- 
> --------------
> Peter Andrews
> Computational Genetics Lab
> Dartmouth Hitchcock Medical Center
> (603) 653-3598
-- 
======================================
Syed Haider.
EMBL-European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
======================================
