Dear Brooke,

Thank you for sending me the fasta-subseq executable.
I tested the program. For future reference, it indeed assumes 1-based inclusive coordinates (so "fasta-subseq filename 1 10" returns the first ten bases in the sequence). If strand is '-', then it first takes the reverse complement of the whole sequence, and only then finds the start and end of the subsequence. So "fasta-subseq filename 1 10 -" is in general not the reverse complement of "fasta-subseq filename 1 10".

The headers are modified as follows. Without reverse complementing, the new header is >filename:start-end:___ (that is, with three underscores at the end). With reverse complementing, the new header is >filename:start-end:__- (that is, with two underscores and then a single dash).

Thanks,
-Michiel.

--- On Fri, 3/16/12, Brooke Rhead <[email protected]> wrote:

> From: Brooke Rhead <[email protected]>
> Subject: Re: [Genome] fasta-subseq source code
> To: "Michiel de Hoon" <[email protected]>
> Cc: [email protected]
> Date: Friday, March 16, 2012, 4:01 PM
> Hi Michiel,
>
> We do not have the source code, either (and we are in the process of
> changing our programs to use faFrag instead of fasta-subseq to avoid
> problems should the binary be lost in the future), but the usage
> statement indicates that it uses 1-based coordinates:
>
> $ ./fasta-subseq -help
> usage: seqfile lo hi [strand] (1 indexed)
>
> If you like, we can send you our binary so that you can test what it
> does with the Fasta headers.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
> On 3/15/12 8:26 PM, Michiel de Hoon wrote:
> > Dear all,
> >
> > I am looking for the source code (or a binary) of the fasta-subseq
> > program that is used in blastz-run-ucsc to abridge repeat regions.
> > This previous message on the mailing list:
> >
> > https://lists.soe.ucsc.edu/pipermail/genome/2006-June/010902.html
> >
> > says that this program was compiled from PSU source code. However, I
> > couldn't find this program or its source code there. Does anybody
> > know where to find this program? If not, is its usage described
> > somewhere in detail? In particular I am wondering if fasta-subseq
> > uses 1-based coordinates or 0-based coordinates, and if it modifies
> > the header lines in the Fasta file in some way.
> >
> > Thanks,
> > Michiel de Hoon
> > RIKEN Omics Science Center
> > _______________________________________________
> > Genome maillist - [email protected]
> > https://lists.soe.ucsc.edu/mailman/listinfo/genome
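
P.S. For anyone who needs to reproduce this behaviour without the binary, the coordinate and strand handling reported at the top of this message can be sketched in Python. This is only an illustration based on those observations, not the fasta-subseq source. The function names are made up for this example, and how the real program treats ambiguity codes or out-of-range coordinates was not tested, so those parts are assumptions.

```python
# Illustrative sketch of the behaviour reported in this thread.
# NOT the fasta-subseq source; all names here are hypothetical.

# Assumption: only A, C, G, T (either case) are complemented;
# any other character is passed through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def subseq(seq, lo, hi, strand="+"):
    """Return bases lo..hi, 1-based and inclusive.

    For strand '-', the whole sequence is reverse-complemented first,
    and lo/hi are then applied to that reversed sequence.
    """
    if strand == "-":
        seq = reverse_complement(seq)
    return seq[lo - 1:hi]


def subseq_header(filename, lo, hi, strand="+"):
    """Build the header as observed: three underscores, or '__-' for '-'."""
    suffix = "__-" if strand == "-" else "___"
    return ">%s:%d-%d:%s" % (filename, lo, hi, suffix)


if __name__ == "__main__":
    seq = "AACCGGTTAC"
    print(subseq_header("filename", 1, 4), subseq(seq, 1, 4))
    print(subseq_header("filename", 1, 4, "-"), subseq(seq, 1, 4, "-"))
```

With this toy sequence, the '-' result for 1-4 is the reverse complement of bases 7-10 of the original, not of bases 1-4, which is the point made above about the two commands not being reverse complements of each other.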
