Hi Damian,
I am just not sure what the best policy is: when you ask for 800bp upstream + UTR for all genes, is it more confusing to get a "no UTR" response for the ones without a UTR, or to just silently get the 800bp? I think, as usual, the best solution will be some sort of middle ground, where the output is 800bp but with a clear indication of "no UTR". We are going to review all our sequence outputs soon, perhaps adding some clearer markup of coding vs non-coding etc., so this may become part of that solution. If anyone has any suggestions, please forward them to us.
I'd say that silently getting exactly 800bp is fine: the sequence length itself already carries the information that the UTR is absent. But my view might be biased towards my specific task 8=)

I also completely agree with what you say about a "middle ground" solution. But I can't think of a way to add a "no UTR" message to a FASTA-formatted file without breaking the file format... The only two options I came up with after a minute's thought are these:

1. - for upstream+utr, return the usual fasta record:

>ENSX12
ATGC...

   - for upstream+utr (but no utr defined), return *two* fasta records, one for the 800bp upstream sequence and another for the utr message only:

>ENSX12
ATGC...
>ENSX12
No UTR is defined

2. Return a single fasta record per entity, but with a selectable "[error] messages" column (which might be added as an attribute):

>ENSRNOG000002345|123456|134567|No 5'UTR is defined for this gene
ATGC...(800bp of sequence)

In this second scheme, messages are always the last |-separated column, and might be empty for the majority of returned sequences.

I would like to know whether you already have some kind of convention for returning both the sequence and an informative message in fasta format, and what that convention is.

-- 
Sincerely yours,
Bogdan Tokovenko,
PhD student at the Laboratory of Protein Biosynthesis,
Department of Genetic Information Translation Mechanisms,
Institute of Molecular Biology and Genetics,
Kyiv, Ukraine
http://bogdan.org.ua/
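
P.S. To make the second scheme a bit more concrete, here is a minimal Python sketch of how such headers could be written and read back. It is only an illustration of the idea, not an existing convention: the function names format_record and parse_header are hypothetical, the middle columns are treated as opaque attribute values, and I assume the message itself never contains a "|" character.

```python
def format_record(gene_id, attributes, sequence, message=""):
    """Build one FASTA record whose header always ends with a message column.

    `attributes` is a list of extra header columns (hypothetical; their
    meaning depends on which attributes were selected for export).
    The message column is always present and is empty when there is
    nothing to report.
    """
    columns = [gene_id] + [str(a) for a in attributes] + [message]
    return ">" + "|".join(columns) + "\n" + sequence + "\n"


def parse_header(header_line):
    """Split a header written by format_record back into its parts."""
    columns = header_line.strip()
    if columns.startswith(">"):
        columns = columns[1:]
    fields = columns.split("|")
    # The last column is always the (possibly empty) message.
    return {
        "id": fields[0],
        "attributes": fields[1:-1],
        "message": fields[-1],
    }


if __name__ == "__main__":
    record = format_record(
        "ENSRNOG000002345",
        [123456, 134567],
        "ATGC",
        "No 5'UTR is defined for this gene",
    )
    print(record, end="")
    print(parse_header(record.splitlines()[0]))
```

Because the message is always the last column, a reader that does not care about it can simply ignore the final field, and the file stays a valid FASTA file either way.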
