Dear Damian, thank you for your reply.
OK, we are improving the user warning and images for the forthcoming release :-) "Downstream flank" refers to the region downstream of the gene. As it doesn't really make sense to join the upstream and downstream flanks when only flanks are selected, we disabled using them both together; it just returns the upstream flank, as you experienced. Apologies for the confusion.
For "flank-coding region", for both "gene" and "transcript", the image does show both the upstream and downstream flanks of the gene/transcript, but for "flank" only the upstream sequence is highlighted in the image. That was the source of confusion in my case.
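Because "upstream" and "downstream" are defined relative to the gene's own 5' and 3' ends, their chromosome positions swap sides on the minus strand. The small Python sketch below illustrates this. `flank_interval` is a hypothetical helper, not part of BioMart. It assumes 1-based, inclusive gene coordinates and strand given as `1` or `-1`. When both flanks are requested it computes only the upstream one, mirroring the behaviour described in the reply above.

```python
def flank_interval(gene_start, gene_end, strand, upstream=0, downstream=0):
    """Return the (start, end) chromosome interval, 1-based and inclusive,
    of a gene flank. Hypothetical helper for illustration only."""
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if not (upstream or downstream):
        raise ValueError("request an upstream or a downstream flank")
    # When both are given, only the upstream flank is computed, mirroring
    # the behaviour described in the reply above.
    length = upstream if upstream else downstream
    five_prime_side = bool(upstream)
    # The gene's 5' end is gene_start on the plus strand and gene_end on the
    # minus strand, so "upstream" flips sides with the strand.
    if (strand == 1) == five_prime_side:
        # Flank lies at lower chromosome coordinates (clamped at position 1).
        return (max(1, gene_start - length), gene_start - 1)
    # Flank lies at higher chromosome coordinates.
    return (gene_end + 1, gene_end + length)


# The upstream flank of a minus-strand gene sits past its end coordinate.
print(flank_interval(5000, 8000, -1, upstream=1000))  # (8001, 9000)
```

On the plus strand the same call would return `(4000, 4999)`. Each flank is a separate interval on its own side of the gene, so a flank-only request has no single contiguous stretch that joins them.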
> <Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
>   <Attribute name = "gene_stable_id" />
>   <Attribute name = "coding_gene_flank" />
>   <Filter name = "upstream_flank" value = "1000"/>
>   <Filter name = "transcript_status" value = "KNOWN"/>
>   <Filter name = "ensembl_gene_id" value = "ENSRNOG00000006899"/>
> </Dataset>

This should give you your 1000 bp upstream of the TSS. Is it not doing this, or are you looking for something different? Let me know and I will try to help.
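The `<Dataset>` fragments in this thread are normally sent inside a `<Query>` element. The Python sketch below shows one way to build such a query and encode it as a URL. It is a sketch under assumptions, not a reference client. The endpoint `http://www.ensembl.org/biomart/martservice`, the `query` URL parameter and the `Query` attributes (`virtualSchemaName`, `formatter`, `header`, `uniqueRows`) are assumptions to check against your BioMart server. The attribute and filter names are copied from the thread and may not exist in current Ensembl releases. `build_query` and `martservice_url` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint: the Ensembl BioMart REST service. Point this at the
# BioMart server you actually use.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, attributes, filters, formatter="FASTA"):
    """Wrap a <Dataset> in a <Query> element (assumed martservice format)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,  # FASTA suits sequence attributes
        "header": "0",
        "uniqueRows": "0",
    })
    dataset_el = ET.SubElement(query, "Dataset",
                               {"name": dataset, "interface": "default"})
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    for name, value in filters.items():
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": value})
    return ET.tostring(query, encoding="unicode")


def martservice_url(query_xml):
    """URL-encode the query XML as the 'query' parameter of a GET request."""
    return MARTSERVICE + "?" + urlencode({"query": query_xml})


xml_text = build_query(
    "rnorvegicus_gene_ensembl",
    ["gene_stable_id", "coding_gene_flank"],
    {"upstream_flank": "1000",
     "transcript_status": "KNOWN",
     "ensembl_gene_id": "ENSRNOG00000006899"},
)
print(martservice_url(xml_text))
# Fetching is left out; urllib.request.urlopen(url) would send the request.
```

Building the XML with `ElementTree` rather than by string concatenation keeps attribute values properly escaped.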
It does give 1 kbp upstream, but I'm looking for the 1 kbp upstream of the TSS *plus* the stretch of sequence from the TSS down to the translation start site (i.e. the 5'UTR). I can do this with the following sample query:

<Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
  <Attribute name = "gene_stable_id" />
  <Attribute name = "5utr" />
  <Attribute name = "5utr_start" />
  <Attribute name = "5utr_end" />
  <Attribute name = "transcript_chrom_strand" />
  <Filter name = "upstream_flank" value = "1000"/>
  <Filter name = "transcript_status" value = "KNOWN"/>
  <Filter name = "ensembl_gene_id" value = "ENSRNOG00000014029"/>
</Dataset>

However, I have a problem interpreting the results for genes with multiple 5'UTRs defined (like ENSRNOG00000014029 in the sample query above). I do not understand what multiple 5'UTRs should mean for a single gene. Based on the query results, it appears that UTRs are linked to the gene and not to the gene's transcripts, so multiple 5'UTRs shouldn't mean the UTRs of different transcripts. Then what sequence do I get with the following query, issued for the multiple-5'UTR gene?

<Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
  <Attribute name = "gene_stable_id" />
  <Attribute name = "5utr" />
  <Attribute name = "transcript_chrom_strand" />
  <Filter name = "transcript_status" value = "KNOWN"/>
  <Filter name = "ensembl_gene_id" value = "ENSRNOG00000014029"/>
</Dataset>

I aligned the sequence returned by this query to the "Unspliced (Gene)" sequence of the same gene: there were 379 bp of identities followed by some 122 bp of non-identical sequence (the full length of the 5'UTR returned is 501 bp). Hence the question: _what exactly_ is returned by the 5'UTR query?

Thank you for your answer,
--
Sincerely yours,
Bogdan Tokovenko,
PhD student at the Laboratory of Protein Biosynthesis,
Department of Genetic Information Translation Mechanisms,
Institute of Molecular Biology and Genetics, Kyiv, Ukraine
http://bogdan.org.ua/
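One possible reading of the 379 bp + 122 bp observation is that each returned 5'UTR is *spliced* (exonic sequence only) and belongs to one transcript. Under that assumption, a gene with several transcripts yields several 5'UTRs, and a spliced UTR matches the unspliced gene sequence only up to the first intron. The Python sketch below works under that assumption, using a toy chromosome and made-up coordinates. It also shows the combination asked for above: the upstream flank of the TSS followed by the 5'UTR. `five_prime_utr`, `upstream_plus_utr` and `revcomp` are hypothetical helpers, not BioMart functions.

```python
# Complement table covering upper- and lower-case bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_utr(genome, exons, cds_start, cds_end, strand):
    """Spliced 5'UTR of ONE transcript, read 5'->3'.

    genome: chromosome sequence, position 1 at index 0.
    exons: (start, end) pairs, 1-based inclusive, in any order.
    cds_start <= cds_end: genomic bounds of the coding region.
    """
    pieces = []
    for start, end in sorted(exons):
        if strand == 1:
            # Plus strand: the UTR is the exonic sequence left of the CDS.
            s, e = start, min(end, cds_start - 1)
        else:
            # Minus strand: the UTR is the exonic sequence right of the CDS.
            s, e = max(start, cds_end + 1), end
        if s <= e:
            pieces.append(genome[s - 1:e])
    utr = "".join(pieces)  # introns are skipped, so this is spliced
    return utr if strand == 1 else revcomp(utr)


def upstream_plus_utr(genome, exons, cds_start, cds_end, strand, flank=1000):
    """`flank` bases upstream of the transcript start, followed by its 5'UTR."""
    tx_start = min(s for s, _ in exons)
    tx_end = max(e for _, e in exons)
    if strand == 1:
        upstream = genome[max(0, tx_start - 1 - flank):tx_start - 1]
    else:
        upstream = revcomp(genome[tx_end:tx_end + flank])
    return upstream + five_prime_utr(genome, exons, cds_start, cds_end, strand)


# Toy chromosome: exon 1 = 6..10, intron (lower case) = 11..15, exon 2 = 16..25.
genome = "ACGTA" + "AAAAA" + "ccccc" + "GGTTTTTTTT" + "CATGC"
exons = [(6, 10), (16, 25)]
print(five_prime_utr(genome, exons, 18, 25, 1))        # AAAAAGG
print(upstream_plus_utr(genome, exons, 18, 25, 1, 3))  # GTAAAAAAGG
```

In the toy case, the spliced UTR `AAAAAGG` matches the unspliced sequence `AAAAAcccccGG` only over exon 1, then diverges where the intron was removed. A second transcript with a different first exon would give a different UTR for the same gene, even if the output header shows only the gene ID.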
