Hello all,
First, I would like to thank the developers who are making BioMart accessible to us bench scientists!

My question concerns the behavior of the "Unique Only" option. To retrieve all 3' UTRs together with the 100 bp upstream of each on the transcript, I used the following parameters against the Ensembl NCBI36 database:

   $query->setDataset("hsapiens_gene_ensembl");
   $query->addFilter("upstream_flank", ["100"]);
   $query->addAttribute("ensembl_gene_id");
   $query->addAttribute("ensembl_transcript_id");
   $query->addAttribute("3utr");
   $query->addAttribute("external_gene_id");
   $query->addAttribute("external_gene_db");
   $query->addAttribute("chromosome_name");
   $query->addAttribute("start_position");
   $query->addAttribute("end_position");
   $query->addAttribute("biotype");
   $query->addAttribute("transcript_start");
   $query->addAttribute("transcript_end");
   $query->addAttribute("ensembl_exon_id");
   $query->addAttribute("exon_chrom_start");
   $query->addAttribute("exon_chrom_end");
   $query->addAttribute("strand");
   $query->addAttribute("rank");

This retrieves 61356 sequences, the same number I get when all 3' UTRs are requested without the 100 bp flank...bravo. I was then curious what would happen if I fetched "Unique rows only". When I set "Unique Only" to true and fetch all 3' UTRs with the 100 bp upstream flank, I get only 8916 sequences.
OK, fair enough...
BUT when I set "Unique Only" to true and fetch all 3' UTRs WITHOUT the 100 bp upstream flank, I get 14322 sequences. This does not make sense to me: I would expect at least as many unique sequences with the 100 bp upstream included, because the flank adds complexity to each sequence and should yield more unique hits, not fewer.

I cannot figure out why I am getting the behavior described above.
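
One way to cross-check these counts outside of BioMart would be to count distinct sequences directly in the exported results. The following is only a minimal Perl sketch, assuming the results were saved as FASTA text; the subroutine names parse_fasta and count_fasta and the toy records are hypothetical and are not part of the BioMart API. It reports the number of records, the number of distinct sequences, and the number of distinct (header, sequence) rows, which need not be the same thing.

```perl
use strict;
use warnings;

# Parse FASTA text into a list of [header, sequence] pairs.
# (Hypothetical helper, not part of BioMart.)
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];       # start a new record
        } elsif (@records) {
            $line =~ s/\s+//g;             # drop stray whitespace
            $records[-1][1] .= uc $line;   # append to the current sequence
        }
    }
    return @records;
}

# Return (total records, distinct sequences, distinct header+sequence rows).
sub count_fasta {
    my ($text) = @_;
    my @records = parse_fasta($text);
    my (%seqs, %rows);
    for my $rec (@records) {
        my ($header, $seq) = @$rec;
        $seqs{$seq} = 1;
        $rows{"$header\n$seq"} = 1;
    }
    return (scalar(@records), scalar(keys %seqs), scalar(keys %rows));
}

# Made-up toy input: two transcripts share the same sequence.
my $fasta = <<'END';
>geneA|tx1
ACGTACGT
>geneA|tx2
ACGTACGT
>geneB|tx3
TTGACA
END

my ($total, $unique_seqs, $unique_rows) = count_fasta($fasta);
print "records=$total unique_sequences=$unique_seqs unique_rows=$unique_rows\n";
# prints: records=3 unique_sequences=2 unique_rows=3
```

Running the same count on the flanked and unflanked exports would show whether the distinct-sequence totals really differ in the direction reported above. If the flank is simply prepended at a fixed length of 100 bp (an assumption), two different 3' UTRs can never become identical after flanking, so the flanked count should not be the smaller one.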

Could someone point me in the direction of some good documentation on this and/or explain to me the behavior I am observing?

Thank You!
-Lee Brooks
