Re: [ruby.parslet] Parsing the NCBI Genetic Code Table

Melissa Whittington Tue, 09 Aug 2011 06:07:11 -0700

Stefan,

Ah! I missed one important mistake that I've easily made myself
before. You can't use 'match' to match multiple characters; the
regular expression can only match one character at a time. I find that
slightly unintuitive, and it gives no warning if you try to do this.


I tried this:
  rule(:content)         {str('  id ') >> match('\d').repeat >> textdata.repeat}
  rule(:no_value)        {textdata.repeat(1)}

Because the parser tries :content first, it only falls back to :no_value
when :content fails to match. That matched all the lines with "id".

For me, learning parslet has been fairly trial and error too. And
google thinks 'parsley' is a much better word to search for than
'parslet', heh.

-mj

On Tue, Aug 9, 2011 at 4:57 AM, Stefan Rohlfing
<[email protected]> wrote:
> Melissa,
> Thanks for your help!
> However, after fixing the problems you pointed me to I got stuck again
> https://github.com/bytesource/CodonTableParser/blob/master/parser.rb
> and I am realizing that I am more or less relying on trial & error here. In
> other words, I am still lacking the knowledge of translating a document into
> its Backus Naur form with which I can then feed the parser (Parslet).
> As I have no background in computer science, I would be interested in any
> resources (printed or online) you have found valuable in laying the basis
> for building a parser. This question is for everyone, as I am always
> interested in different opinions.
> Stefan
>
> On Mon, Aug 8, 2011 at 19:49, Melissa Whittington
> <[email protected]> wrote:
>>
>> Whoops, I meant "The :file rule's repeat is what is describing multiple
>> lines."
>>
>> -mj
>>
>> On Mon, Aug 8, 2011 at 7:47 AM, Melissa Whittington
>> <[email protected]> wrote:
>> > Stefan,
>> >
>> > The reason you're getting that error on the last line is because there
>> > will be no newline at the end of the last line, so just switch it to
>> > 'newline.maybe'.
>> >
>> > Your :line rule also does not need the .repeat because there will only
>> > be one of either a :codon or a :comment and not more. The :line rule's
>> > repeat is what is describing multiple lines.
>> >
>> > Also, I don't know what "repeat(1)" by itself does, but you probably
>> > don't mean that?
>> >
>> > Don't forget that 'any' only matches one character. You should probably
>> > not use 'any', either. Your :content and :no_value rules should match
>> > everything on a line (sans a possible newline). You could use
>> > 'any.repeat' to parse the rest of the line, but it will try to parse
>> > *anything*, including newlines, and continue on to the next lines, which
>> > is not what you want.
>> >
>> > So, it'll probably be helpful to be a little more descriptive.
>> >
>> > Hope that helps you make a little more progress!
>> >
>> > -mj
>> >
>> > On Mon, Aug 8, 2011 at 12:21 AM, Stefan Rohlfing
>> > <[email protected]> wrote:
>> >> Hi,
>> >> I am trying to parse the NCBI genetic code table:
>> >>
>> >> https://github.com/bytesource/CodonTableParser/blob/master/data/codons.txt
>> >> to extract those lines of each block that contain either "name", "id",
>> >> "ncbieaa", or "sncbieaa".
>> >> As each line either contains the content I am interested in or text
>> >> that can be discarded, I started by first parsing the document on a
>> >> per-line basis:
>> >> https://github.com/bytesource/CodonTableParser/blob/master/parser.rb
>> >> Unfortunately, parsing the file resulted in an error message that tells
>> >> me Parslet failed to parse line 233, which is the very last line of the
>> >> file:
>> >> Expected at least 1 of LINE NEWLINE at line 1 char 1.
>> >> `- Expected at least 1 of LINE NEWLINE at line 1 char 1.
>> >>    `- Failed to match sequence (LINE NEWLINE) at line 233 char 1.
>> >>       `- Failed to match sequence (LF CR?) at line 233 char 1.
>> >>          `- Premature end of input at line 233 char 1.
>> >> However, apart from knowing where the problem is located, I have
>> >> difficulties finding out where my code went wrong.
>> >> I already read Parslet's documentation without finding a solution, so
>> >> now I hope someone on this list might help me with my problem.
>> >> On a side note, I am often not sure when to use 'repeat(1)' instead of
>> >> just repeat. I know the latter repeats the rule zero or more times, but
>> >> how do I decide when zero is enough? Is there a rule to follow?
>> >> Thanks again in advance!
>> >> Stefan
