On 3/9/07, John W. Krahn <[EMAIL PROTECTED]> wrote:
Chas Owens wrote:
> On 3/8/07, Dharshana Eswaran <[EMAIL PROTECTED]> wrote:
>>
>> I need to extract few strings from one file and paste it to another file.
> snip
>
> This doesn't seem like a good job for split.  The split function is
> good for parsing X separated records where X is either constant or
> simple.  What you have there a grammar.  Specifically a subset of the
> C grammar for #define.  With grammars you want to use either a regex
> or Parse::RecDescent depending on the complexity of the grammar.  In
> this case the grammar is simple enough that a regex does fine.  I have
> created a regex that parses your record into a name, base number,
> operator, and modifying number.  The last two values are optional.  I
> have used the x option on the regex to make it readable since it is so
> large (anything bigger than 80 characters should probably use the x
> option).  You can learn more about regexes in perldoc perlre and
> perldoc perlretut.
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> while (<DATA>) {
>     my ($name, $base, $op, $mod) = m{
>         ^                  # start of string
>         \s*                # optional spaces
>         \#define           # the start of the macro

The C preprocessor allows whitespace between '#' and 'define'.

          \# \s* define      # the start of the macro

>         \s+                # mandatory spaces
>         (\w+)              # capture the name of the macro
>         \s*                # optional spaces

If the next token is a left parenthesis then the whitespace is not
optional otherwise it would be a macro definition.  Also, if the next
token is a word character then the whitespace is not optional.

>         \(                 # the open paren
>         \s*                # optional spaces
>         (
>             \w+ |
>             \d+ |
>             0x[a-fA-F0-9]

That only matches a single hexadecimal digit, you probably want
0x[a-fA-F0-9]+ instead.

>         )                  # capture a word, int, or hex
>         \s*                # optional spaces
>         (?:
>             (              # capture the various int operators
>                 [+-|^*/%] |

                  [+-|^&*/%] |

>                 << |
>                 >>
>             )

What about:  TOKEN & ~TOKEN
Or:  TOKEN * -TOKEN   :-)

>             \s*            # optional spaces
>             (              # capture a word, int, or hex
>                 \w+ |
>                 \d+ |
>                 0x[a-fA-F0-9])

                  0x[a-fA-F0-9]+)

>         )?                 # but make the last two captures optional
>         \)                 # the close paren
>     }x;
>     $op = $mod = '' unless defined $op;
>     print;
>     print "\tname is $name, base is $base, modified by $mod using $op\n"
>         if $name;
> }

John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order.       -- Larry Wall
Thank you for the suggestions. I tried both ways.

> 0x[a-fA-F0-9]
>
> That only matches a single hexadecimal digit, you probably want
> 0x[a-fA-F0-9]+ instead.

When I tried Chas's idea, even without this correction I could match hex
values of more than one digit.

But I did not understand the lines below from John:

> What about:  TOKEN & ~TOKEN
> Or:  TOKEN * -TOKEN   :-)

Can you please explain them?

Thanks once again for the immediate response.

Thanks and Regards,
Dharshana
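A minimal sketch of both points, assuming a simplified version of the pattern quoted above. The helper name parse_define and the sample macros are hypothetical, and the operator class is written as [-+|^&*/%] (hyphen first so it is literal), which is an assumption rather than what was posted. It suggests why multi-digit hex values were captured even without the "+": the \w+ alternative is tried first and already matches all of 0x1F. It also shows what John's two examples appear to be pointing at: the pattern has no place for a unary ~ or - in front of the second operand, so such lines do not match at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread). Returns the list
# (name, base, op, mod) on a match, or an empty list on failure.
sub parse_define {
    my ($line) = @_;
    my @fields = $line =~ m{
        ^ \s* \# \s* define             # '#define', space allowed after '#'
        \s+ (\w+)                       # macro name
        \s* \( \s*                      # open paren
        ( \w+ | \d+ | 0x[a-fA-F0-9] )   # base, alternatives as first posted
        \s*
        (?:
            ( [-+|^&*/%] | << | >> )    # binary operator only
            \s*
            ( \w+ | \d+ | 0x[a-fA-F0-9] )   # modifier, no unary prefix allowed
        )?
        \s* \)                          # close paren
    }x;
    return @fields;
}

for my $line (
    '#define FOO (0x1F)',               # multi-digit hex, matched by \w+
    '#define SUM (A + 0x10)',           # plain binary expression
    '#define MASK (FLAGS & ~OTHER)',    # unary ~ before second operand
    '#define AREA (WIDTH * -HEIGHT)',   # unary - before second operand
) {
    my @f = parse_define($line);
    if (@f) {
        # Unmatched optional captures come back as undef.
        my ($name, $base, $op, $mod) = map { defined $_ ? $_ : '' } @f;
        print "$line => name=$name base=$base op=$op mod=$mod\n";
    }
    else {
        print "$line => no match\n";
    }
}
```

With these inputs the first two lines should match and the last two should report no match. Handling the unary cases would need something like an optional [-~!] before each operand, at which point a real parser such as Parse::RecDescent may be the better fit.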