On Sat, 7 Jul 2007, Bogdan wrote:

Hi Damian,

I am just not sure what the best policy is. When you ask for 800bp
upstream + UTR for all genes, is it more confusing to get a "no UTR"
response for the ones without a UTR, or to silently get just the
800bp? I think, as usual, the best solution will be some sort of middle
ground where the output is 800bp but with a clear indication of "no UTR".
We are going to review all our sequence outputs soon, maybe with some
clearer markup of coding vs non-coding etc., so this may become part of
that solution. If anyone has any suggestions, please forward them to us.

I'd say that silently getting exactly 800bp is fine, since the
sequence length itself carries the information about the absence of
the UTR. But my view might be biased towards my specific task 8=)

I also completely agree with what you say about a "middle ground
solution". But I can't imagine a way to add a "no UTR" message to a
FASTA-formatted file without breaking the file format...

The only two options I came up with after a minute's thought are these:

1.
- for upstream+UTR, return the usual FASTA record:
>ENSX12
ATGC...

- for upstream+UTR (but no UTR defined), return *two* FASTA records:
one for the 800bp upstream sequence, and another for the UTR message only:
>ENSX12
ATGC...
>ENSX12
No UTR is defined

2.
return a single FASTA record per entity, but with a selectable
"[error] messages" column (which might be added as an attribute):
>ENSRNOG000002345|123456|134567|No 5'UTR is defined for this gene
ATGC...(800bp of sequence)

In this second scheme, messages are always the last |-separated
column, and might be empty for the majority of returned sequences.
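
To make the second scheme concrete, here is a rough Python sketch of
how such a header could be written and read back. The function names
and the field order (ID, start, end, message) are my own assumptions
for illustration, not an existing convention, and it assumes the
message itself never contains a "|" character.

```python
def format_record(gene_id, start, end, sequence, message=""):
    """Build one FASTA record whose header ends with a message column.

    Hypothetical helper: the field layout is assumed, not an existing format.
    """
    # The message is always the last |-separated field; it may be empty.
    header = "|".join([gene_id, str(start), str(end), message])
    return f">{header}\n{sequence}\n"


def parse_header(header_line):
    """Split a header line into its attribute fields and the trailing message."""
    fields = header_line.strip().lstrip(">").split("|")
    *attributes, message = fields
    return attributes, message


record = format_record(
    "ENSRNOG000002345", 123456, 134567, "ATGC",
    "No 5'UTR is defined for this gene",
)
attributes, message = parse_header(record.splitlines()[0])
if message:
    print(f"{attributes[0]}: {message}")
```

Because the message lives in the header line, the sequence line stays
pure sequence, so tools that only read the sequence are unaffected.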


I would like to know whether you already have some kind of convention
for returning both the sequence and an informative message in FASTA
format, and what that convention is.

We don't have any convention yet, but we will probably try to come up with one.

Thanks for the suggestions
Damian


