No probs Lionel,

We will get back to you as soon as your results are ready.

Cheers
Syed




Lionel Brooks 3rd wrote:
Hi Syed,

That would be fantastic! We need the sequences because we will be building a tiling array which should contain all known UTRs.

Our BioMart shopping list is:

1. all 3' UTRs plus 100 bp of upstream flanking sequence, at the transcript level (exonic, not intronic)
2. all 3' UTRs (without any flanking sequence)
3. all 5' UTRs plus 100 bp of downstream flanking sequence on the transcript
4. all 5' UTRs (without any flanking sequence)
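Items 1 and 3 are easiest to reason about in spliced transcript (cDNA) coordinates. Introns are already absent there, so any flank taken from that sequence is exonic by construction. The Perl sketch below illustrates the idea on a toy transcript. The sub names are hypothetical. It assumes the cDNA is given 5' to 3', the 5' UTR starts at the first base, the 3' UTR runs to the last base, and offsets are 0-based.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of BioMart: take a UTR plus a flank from a
# spliced transcript (cDNA) string. Because the string holds only exonic
# sequence, the flank can never include intronic bases.
# Assumes 5'->3' orientation and 0-based offsets.

# 3' UTR (from $utr_start to the end) plus up to $flank bases upstream of it.
sub utr3_with_upstream_flank {
    my ($cdna, $utr_start, $flank) = @_;
    my $from = $utr_start - $flank;
    $from = 0 if $from < 0;            # clip at the start of the transcript
    return substr($cdna, $from);
}

# 5' UTR (from 0 to $utr_end, exclusive) plus up to $flank bases downstream.
sub utr5_with_downstream_flank {
    my ($cdna, $utr_end, $flank) = @_;
    my $to = $utr_end + $flank;
    $to = length($cdna) if $to > length($cdna);   # clip at the transcript end
    return substr($cdna, 0, $to);
}
```

For a toy transcript `"AAAA" . "ATGCCCTAA" . "GGGG"` (5' UTR ending at offset 4, 3' UTR starting at offset 13), `utr3_with_upstream_flank($cdna, 13, 3)` gives `TAAGGGG` and `utr5_with_downstream_flank($cdna, 4, 3)` gives `AAAAATG`.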

Preferably these would be in separate files with the sequences indexed by gene ID, transcript ID and genomic coordinates.

We would be especially grateful if redundancy in the sequence set could be reduced with the 'Unique only' option, as we expect a good deal of redundancy across splice isoforms, but we certainly understand that this is a fairly tall order.
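Redundancy can also be reduced after download. The following is a minimal Perl sketch of such a post-processing step, not a BioMart feature. It assumes each record has already been parsed into a header/sequence pair. The sub name and the `;` separator are hypothetical choices. Identical sequences are merged into one record whose header lists every original header, so the gene ID, transcript ID and coordinate indexing is not lost.

```perl
use strict;
use warnings;

# Hypothetical post-processing step: collapse records with identical
# sequences (e.g. one UTR shared by several splice isoforms) into a single
# record. Every original header is kept, joined with ';'.
# Input: list of [header, sequence] array refs.
# Output: one [joined_headers, sequence] per distinct sequence, first-seen order.
sub collapse_identical {
    my @records = @_;
    my (%headers_for, @order);
    for my $rec (@records) {
        my ($header, $seq) = @$rec;
        # Remember the order in which each distinct sequence first appears.
        push @order, $seq unless exists $headers_for{$seq};
        push @{ $headers_for{$seq} }, $header;
    }
    return map { [ join(';', @{ $headers_for{$_} }), $_ ] } @order;
}
```

Keying the hash on the full sequence string keeps the sketch simple. Comparison is exact and case-sensitive.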

Thank you for helping us out Syed! Your help has already been indispensable.

Sincerely,
Lee




Syed Haider wrote:

Hi Lionel,

If all you need are 3' UTR or 5' UTR sequences, or a specific set of queries, we can run them locally with the help of the Ensembl team and send you the *correct* sequences. This should be fixed in the Ensembl v52 release (Ensembl team to confirm).

Cheers
Syed



Lionel Brooks 3rd wrote:
Wow... thank you for the swift response, Syed!
This is important information for us indeed!!

Syed Haider wrote:
Hi Lionel,

Before we dive into an investigation, I would like to flag that the 5' UTR and 3' UTR sequences in the current Ensembl release are incorrect in BioMart, due to a configuration bug in the Ensembl Configs, which are managed by the Ensembl Mart team. I'm cc'ing the Ensembl experts on this email, who should be able to help you with this.

Cheers
Syed



Lionel Brooks 3rd wrote:
Hello all,
First, I would like to say thank you to the developers that are making Biomart accessible for us bench scientists!

My question is about the behavior of the "Unique Only" option. To retrieve all 3' UTRs together with the accompanying 100 bp upstream on the transcripts, I used the following parameters (my database of choice was Ensembl NCBI36):

   $query->setDataset("hsapiens_gene_ensembl");
   $query->addFilter("upstream_flank", ["100"]);
   $query->addAttribute("ensembl_gene_id");
   $query->addAttribute("ensembl_transcript_id");
   $query->addAttribute("3utr");
   $query->addAttribute("external_gene_id");
   $query->addAttribute("external_gene_db");
   $query->addAttribute("chromosome_name");
   $query->addAttribute("start_position");
   $query->addAttribute("end_position");
   $query->addAttribute("biotype");
   $query->addAttribute("transcript_start");
   $query->addAttribute("transcript_end");
   $query->addAttribute("ensembl_exon_id");
   $query->addAttribute("exon_chrom_start");
   $query->addAttribute("exon_chrom_end");
   $query->addAttribute("strand");
   $query->addAttribute("rank");

This retrieves 61356 sequences, the same number retrieved when all 3' UTRs are requested without the 100 bp upstream flank... bravo. Next, I was curious what would happen with "Unique rows only". When I set "Unique Only" to true and fetch all 3' UTRs with 100 bp upstream, I get only 8916 sequences.
OK, fair enough...
BUT when I set "Unique Only" to true and fetch all 3' UTRs WITHOUT the 100 bp upstream flank, I get 14322 sequences. This does not make sense to me. I would expect more unique sequences with the 100 bp upstream, because the extra sequence adds complexity and should yield more unique hits.
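This expectation can be checked independently by exporting both queries without "Unique Only" and counting distinct sequences locally. A minimal Perl sketch follows. The sub name is hypothetical. The sketch assumes plain FASTA output and compares only the sequences, not the headers, so it may not match whatever "Unique Only" compares on the server side.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text of a FASTA file, return
# (number of records, number of distinct sequences).
# Sequence lines are concatenated per record with whitespace removed;
# comparison is exact and case-sensitive. Lines before the first header are ignored.
sub fasta_counts {
    my ($text) = @_;
    my (@seqs, $cur);
    for my $line (split /\n/, $text) {
        if ($line =~ /^>/) {
            push @seqs, $cur if defined $cur;   # close the previous record
            $cur = '';
        } elsif (defined $cur) {
            $line =~ s/\s+//g;                  # also strips stray "\r"
            $cur .= $line;
        }
    }
    push @seqs, $cur if defined $cur;           # close the last record
    my %distinct;
    $distinct{$_} = 1 for @seqs;
    return (scalar @seqs, scalar keys %distinct);
}
```

Running it on the with-flank and without-flank exports gives a record count and a distinct-sequence count for each, which can then be set against the 8916 and 14322 figures above.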

I cannot figure out why I am getting the behavior described above.

Could someone point me in the direction of some good documentation on this and/or explain to me the behavior I am observing?

Thank You!
-Lee Brooks
