Thanks to all for the suggestions. A solution to the GeneBegin..GeneEnd problem has been worked out; see the attachment, for those interested.
But for me the more important problem is making a FASTA repository that is a subset of the gene files in a much larger repository. This is desirable both before and after using Usearch (http://www.drive5.com/usearch/intro.html) to select a minimally homologous gene set for a species. RNA genes, cryptic viruses, and SINE/LINE genes are among the undesirables to be eliminated.

Specifically, is there a command, using ENTRET or its relatives, that accepts a list like

637008924
637008927
640691430
640691431
637008928
637008954
637008980

for extraction and repacking into a single smaller repository? If not, could you recommend a software tool or suite for this type of job?

MarvS

On Tue, Feb 15, 2011 at 3:59 AM, Peter Rice <p...@ebi.ac.uk> wrote:
> On 14/02/2011 23:35, Marvin Stodolsky wrote:
>>
>> This is elementary I’m sure, but I’ve been unable to work out the
>> syntax from the documentation.
>> More minor issue.
>>
>> When using infoseq to extract all the fasta Headers from a sequence
>> Repository, the GeneBegin..GeneEnd (like 234466..234589) often fails to
>> come as a uniform field/fields in a resultant spreadsheet. Is there a Fix
>> for this?
>
> I don't see the genebegin and geneend in EMBOSS infoseq output. Are they
> part of the sequence ID in the FASTA file?
>
> You can use a delimiter between items for infoseq using:
>
> -nocolumn
>
> on the command line.
>
> For import into a spreadsheet you can set the delimiter to be tab with:
>
> -nocolumn -delimiter "\t"
>
> on the command line. That should then import nicely into a spreadsheet.
>
> Hope that helps
>
> Peter Rice
> EMBOSS Team
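
For the ID-list extraction asked about above, here is a minimal Python sketch that does not use EMBOSS at all. It assumes the repository is a single multi-record FASTA file and that each ID (such as 637008924) is the first whitespace-delimited token of its header line. The function names read_fasta and extract_subset are hypothetical, chosen only for illustration, and the sample records are made up.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence_lines) for each record in a FASTA stream."""
    header = None
    seq_lines = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, seq_lines
            header = line[1:]
            seq_lines = []
        elif header is not None and line:
            seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def extract_subset(source, wanted_ids, out):
    """Copy records whose ID is in wanted_ids from source to out.

    Assumption: the ID is the first whitespace-delimited token of the header.
    Returns the set of IDs actually found, so missing ones can be reported.
    """
    found = set()
    for header, seq_lines in read_fasta(source):
        tokens = header.split()
        record_id = tokens[0] if tokens else ""
        if record_id in wanted_ids:
            out.write(">" + header + "\n")
            for seq in seq_lines:
                out.write(seq + "\n")
            found.add(record_id)
    return found


if __name__ == "__main__":
    # Made-up records standing in for the large repository.
    repository = io.StringIO(
        ">637008924 geneA 234466..234589\nATGCATGC\n"
        ">637008925 geneB\nGGGGCCCC\n"
        ">637008927 geneC\nTTTTAAAA\n"
    )
    wanted = {"637008924", "637008927", "640691430"}
    subset = io.StringIO()
    found = extract_subset(repository, wanted, subset)
    print(subset.getvalue(), end="")
    print("not found:", sorted(wanted - found))
```

With real files, the two StringIO objects would be replaced by open() handles for the large repository and the new smaller one, and the wanted IDs would be read from the list file, one per line.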