[ruby.parslet] Parsing the NCBI Genetic Code Table

Stefan Rohlfing Sun, 07 Aug 2011 21:22:58 -0700

Hi,

I am trying to parse the NCBI genetic code table
<ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt>:

https://github.com/bytesource/CodonTableParser/blob/master/data/codons.txt

to extract the lines of each block that contain "name", "id", "ncbieaa",
or "sncbieaa".

As each line contains either content I am interested in or text that can
be discarded, I started by parsing the document on a per-line basis:

https://github.com/bytesource/CodonTableParser/blob/master/parser.rb
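
To illustrate what I mean by "per-line", here is a minimal sketch in plain
Ruby. It is not the Parslet grammar linked above, just the same idea written
with a regular expression. The method name extract_fields and the sample text
are made up for illustration; the sample is an abbreviated approximation of
one block, not real data from the file, and the sketch assumes that each value
fits on a single line.

```ruby
# Keys I want to keep, as listed above.
KEYS = %w[name id ncbieaa sncbieaa].freeze

# Optional indent, one of the keys, whitespace, then a value that is
# either a double-quoted string or a bare integer.
LINE_PATTERN = /\A\s*(#{KEYS.join('|')})\s+(?:"([^"]*)"|(\d+))/

# Returns [key, value] pairs for the lines of interest; all other
# lines are silently discarded.
def extract_fields(text)
  text.each_line.map do |line|
    m = LINE_PATTERN.match(line)
    m && [m[1], m[2] || m[3]]
  end.compact
end

# Hypothetical, shortened sample block (NOT copied from the real file).
SAMPLE = <<~TEXT
  {
   name "Standard" ,
   name "SGC0" ,
   id 1 ,
   ncbieaa  "FFLLSSSS",
   sncbieaa "---M----"
   -- Base1  TTTTTTTT
  }
TEXT

extract_fields(SAMPLE).each { |key, value| puts "#{key}: #{value}" }
```

Lines such as the opening brace or the "-- Base1" comment do not match the
pattern and are dropped, which is the behaviour I am trying to reproduce with
Parslet.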

Unfortunately, parsing the file resulted in an error message that tells me
Parslet failed to parse line 233, which is the very last line of the file:

Expected at least 1 of LINE NEWLINE at line 1 char 1.
`- Expected at least 1 of LINE NEWLINE at line 1 char 1.
   `- Failed to match sequence (LINE NEWLINE) at line 233 char 1.
      `- Failed to match sequence (LF CR?) at line 233 char 1.
         `- Premature end of input at line 233 char 1.

However, apart from knowing where the problem is located, I have difficulty
finding out where my code went wrong.

I already read Parslet's documentation without finding a solution, so now I
hope someone on this list might help me with my problem.

On a side note, I am often not sure when to use 'repeat(1)' instead of just
'repeat'. I know the latter repeats the rule zero or more times, but how do I
decide when zero is enough? Is there a rule to follow?
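
As far as I understand it, the difference is the same as between '*' and '+'
in a regular expression. The sketch below uses plain Ruby regexes only as an
analogy; it is not Parslet code, and the constant and method names are made up
for illustration.

```ruby
# Zero or more "line + newline" units: roughly (line >> newline).repeat
ZERO_OR_MORE = /\A(?:[a-z]+\n)*\z/

# One or more "line + newline" units: roughly (line >> newline).repeat(1)
ONE_OR_MORE = /\A(?:[a-z]+\n)+\z/

# True if the whole input is accepted by the given pattern.
def accepts?(pattern, input)
  pattern.match?(input)
end

puts accepts?(ZERO_OR_MORE, "")        # true:  zero repetitions are fine
puts accepts?(ONE_OR_MORE, "")         # false: at least one is required
puts accepts?(ONE_OR_MORE, "abc\n")    # true
puts accepts?(ZERO_OR_MORE, "abc")     # false: the newline is missing
```

So the two only differ on input where the repeated rule does not occur at all,
and my guess is that 'repeat(1)' is the right choice whenever such input
should count as an error.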

Thanks again in advance!

Stefan
