Regex help

2012-04-20 Thread Simon Wistow
My Regex-fu has failed me ... I don't do much heavy duty data munging 
any more and apparently the skills of The Old Ways[tm] have forsaken me. 
Or at least - I'm now lazy. After spending 5 minutes trying to figure it 
out I just mailed the list.

Plus my copy of Mr Friedl's book is currently packed up in a box 
awaiting lugging to my new place.

Anyway, the problem I have is that I'm trying to parse a line of 
DCPU16[*] assembler. It's possible the issue lies in trying to parse 
using regexps but that's a debate for another day. Anyway this regex

my ($label, $op, $a, $b) = $line =~ m!
^
(?::(\w+)  \s+)? # optional label
([A-Za-z]+)\s+   # opcode
([^,\s]+) (?:, \s+   # operand
([^,\s]+))?\s*   # optional second operand
$
!x;


currently parses lines such as


  :label SET A, 2
 JSR label 
 SET [0x1000+I], [PC]

just fine. However I'm trying to add a new opcode DAT which can take any 
number of operands

   DAT 0x170, "Hello ", 0x2e1 (, )

and it fails there. 

Running this

my ($label, $op, @operands) = $line =~ m!
^
(?::(\w+)  \s+)? # optional label
([A-Za-z]+)\s+   # opcode
([^,\s]+) (?:, \s+   # operand
([^,\s]+))*\s*   # optional further operands
$
!x;

on

FOO A, B, C

results in @operands being ('A', 'C');

So, rather than attempting to actually work this out myself, I'm asking
the lazyweb.

Cheers!

Simon

[*] Notch (of Minecraft fame)'s virtual CPU for his new game. I've been
noodling with an Assembler/Disassembler/Emulator.







Re: Regex help

2012-04-20 Thread Andy Armstrong
On 20 Apr 2012, at 16:29, Simon Wistow wrote:
>^
>(?::(\w+)  \s+)? # optional label
>([A-Za-z]+)\s+   # opcode
>([^,\s]+) (?:, \s+   # operand
>([^,\s]+))*\s*   # optional second opcode

You don't get a distinct capture for each application of the pattern - so the 
third operand overwrites the second. Maybe put the * inside the parens and then 
parse the list of operands in a second step?
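
A minimal, self-contained illustration of Andy's point, assuming the input line has already been trimmed. Group 4 sits inside the (?: ... )* quantifier, so every repetition overwrites it and only the last operand comes back, which is the ('A', 'C') Simon saw. The second match follows Andy's suggestion: capture the whole operand list in one group, then split it in a separate step.

```perl
use strict;
use warnings;

my $line = 'FOO A, B, C';

# Simon's pattern: group 4 is inside (?: ... )*, so each repetition
# overwrites it and only its final value is returned.
my ($label, $op, @operands) = $line =~ m!
    ^
    (?::(\w+) \s+)?          # optional label
    ([A-Za-z]+) \s+          # opcode
    ([^,\s]+)                # first operand
    (?:, \s+ ([^,\s]+))*     # later operands, all sharing group 4
    \s* $
!x;
print join(', ', @operands), "\n";    # prints "A, C"

# Andy's suggestion: put the repetition inside a single capture group,
# then split the captured list in a second step.
my (undef, undef, $list) = $line =~ m!
    ^
    (?::(\w+) \s+)?
    ([A-Za-z]+) \s+
    ([^,\s]+ (?:, \s+ [^,\s]+)*)   # whole operand list in one group
    \s* $
!x;
my @all = split /\s*,\s*/, $list;
print join(', ', @all), "\n";         # prints "A, B, C"
```

The split on /\s*,\s*/ is only safe while no operand contains a comma of its own, such as a quoted DAT string.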

-- 
Andy Armstrong, Hexten






Re: Regex help

2012-04-20 Thread Dagfinn Ilmari Mannsåker
Simon Wistow  writes:

> my ($label, $op, @operands) = $line =~ m!
> ^
> (?::(\w+)  \s+)? # optional label
> ([A-Za-z]+)\s+   # opcode
> ([^,\s]+) (?:, \s+   # operand
> ([^,\s]+))*\s*   # optional second opcode
> $
> !x;

You could use (?{ push @operands, $^N}) after the operand capture
groups, rather than relying on the returned captures.
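
A sketch of ilmari's suggestion, wrapped in a hypothetical parse_line helper. @operands is a package variable (declared with our) so the (?{ }) blocks can see it. Those blocks also run on paths the regex engine later abandons, so the helper clears the array before every match and ignores it when the match fails; for patterns that backtrack heavily, perlre describes using local inside the code block so such changes are unwound.

```perl
use strict;
use warnings;

# Package variable so the (?{ }) blocks below can see it.
our @operands;

# Hypothetical helper: returns (label, opcode, operands...) or an empty list.
sub parse_line {
    my ($line) = @_;
    @operands = ();   # code blocks also run on abandoned paths, so reset
    my ($label, $op) = $line =~ m!
        ^
        (?::(\w+) \s+)?                                     # optional label
        ([A-Za-z]+) \s+                                     # opcode
        ([^,\s]+)          (?{ push @operands, $^N })       # first operand
        (?:, \s+ ([^,\s]+) (?{ push @operands, $^N }) )*    # further operands
        \s* $
    !x or return;
    return ($label, $op, @operands);
}

my @r = parse_line('FOO A, B, C');
print join(', ', @r[2 .. $#r]), "\n";    # prints "A, B, C"
```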


-- 
ilmari
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen


Re: Regex help

2012-04-20 Thread Dave Hodgkinson
Or use a proper parser?

On 20 Apr 2012, at 16:41, Andy Armstrong wrote:

> On 20 Apr 2012, at 16:29, Simon Wistow wrote:
>>   ^
>>   (?::(\w+)  \s+)? # optional label
>>   ([A-Za-z]+)\s+   # opcode
>>   ([^,\s]+) (?:, \s+   # operand
>>   ([^,\s]+))*\s*   # optional second opcode
> 
> You don't get a distinct capture for each application of the pattern - so the 
> third operand overwrites the second. Maybe put the * inside the parens and 
> then parse the list of operands in a second step?
> 
> -- 
> Andy Armstrong, Hexten
> 
> 
> 
> 




Re: Regex help

2012-04-20 Thread Anthony Lucas

How about a parser using Regexp? Parser::MGC?

You can implement a proper parser, or just a few staged regexp, or anything 
in-between.
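
Somewhere in that "in-between" range, a small scanner built from core Perl's \G anchor and the /gc flags can walk the line token by token instead of describing it with one regex. This is a sketch, not Parser::MGC's API: parse_asm_line is a hypothetical name, and it assumes comments have already been stripped and that strings are double-quoted with backslash escapes.

```perl
use strict;
use warnings;

# Hypothetical scanner: walks one DCPU16 line with \G and /gc instead of
# matching it with a single regex. Assumes comments are already stripped.
sub parse_asm_line {
    my ($line) = @_;
    my ($label, @operands);

    $line =~ /\G\s*/gc;                            # skip leading blanks
    $label = $1 if $line =~ /\G:(\w+)\s+/gc;       # optional :label
    $line =~ /\G([A-Za-z]+)/gc
        or die "expected an opcode in: $line\n";
    my $op = $1;

    # Operands are either a double-quoted string (backslash escapes allowed)
    # or a run of characters without commas or quotes, separated by commas.
    if ($line =~ /\G\s+(?=\S)/gc) {
        while (1) {
            if ($line =~ /\G("(?:[^"\\]|\\.)*")/gc) {
                push @operands, $1;
            }
            elsif ($line =~ /\G([^,"\s](?:[^,"]*[^,"\s])?)/gc) {
                push @operands, $1;
            }
            else {
                die "bad operand at offset ", pos($line), " in: $line\n";
            }
            last unless $line =~ /\G\s*,\s*/gc;    # no comma: list is done
        }
    }
    $line =~ /\G\s*\z/gc or die "unexpected trailing text in: $line\n";
    return ($label, $op, @operands);
}

my ($label, $op, @ops) = parse_asm_line('DAT 0x20, "hello, goodbye", 0x10');
print scalar(@ops), " operands: ", join(' | ', @ops), "\n";
```

Because /c leaves pos() where it was when a match fails, each alternative can be tried in turn at the same position without re-scanning the line.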


-Original Message-
From: Dave Hodgkinson 
Sender: london.pm-boun...@london.pm.org
Date: Fri, 20 Apr 2012 17:15:29 
To: London.pm Perl M[ou]ngers
Reply-To: "London.pm Perl M[ou]ngers"
Subject: Re: Regex help






Re: Regex help

2012-04-20 Thread Randal L. Schwartz
> "Dave" == Dave Hodgkinson  writes:

Dave> Or use a proper parser?

What's the state of "Perl6 grammars in Perl5" at the moment?

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
 <http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


Re: Regex help

2012-04-20 Thread Simon Wistow
On Fri, Apr 20, 2012 at 08:37:42PM +, Anthony Lucas said:
> 
> How about a parser using Regexp? Parser::MGC?
> 
> You can implement a proper parser, or just a few staged regexp, or anything 
> in-between.

I ended up just doing this

my ($label, $op, $operands) = $line =~ m!
^
(?::(\w+)  \s+)? # optional label
([A-Za-z]+)\s+   # opcode
(.+)   \s*   # operands
$
!x;
# TODO this won't cope with commas inside quotes 
# e.g DAT 0x20, "hello, goodbye", 0x10
my @operands = split /\s*,\s*/, ($operands || "");

at some point I should properly parse the assembler file using a grammar 
but since the spec is still in flux I'll go with my bodged parser for 
now.

If any of you care you can see the code here

http://search.cpan.org/~simonw/CPU-Emulator-DCPU16-0.3/
https://github.com/simonwistow/CPU-Emulator-DCPU16

Simon




Re: Regex help

2012-04-20 Thread Uri Guttman

On 04/20/2012 11:29 AM, Simon Wistow wrote:



> just fine. However I'm trying to add a new opcode DAT which can take any
> number of operands
>
>    DAT 0x170, "Hello ", 0x2e1 (, )
>
> and it fails there.
>
> Running this
>
> my ($label, $op, @operands) = $line =~ m!
> ^
> (?::(\w+)  \s+)? # optional label
> ([A-Za-z]+)\s+   # opcode
> ([^,\s]+) (?:, \s+   # operand
> ([^,\s]+))*\s*   # optional second opcode
> $
> !x;
>
> on
>
>   FOO A, B, C
>
> results in @operands being ('A', 'C');



i would take a simpler approach. just grab the whole optional string
after the opcode (all the operands) and then split that on comma (and
optional trailing whitespace). why try to do it all in one regex when
that is simpler and should work fine. so your last regex component would
be something like \s*(.+)? or \s*(.*)

uri








Re: Regex help

2012-04-20 Thread Uri Guttman

On 04/20/2012 05:23 PM, Simon Wistow wrote:

> On Fri, Apr 20, 2012 at 08:37:42PM +, Anthony Lucas said:
>>
>> How about a parser using Regexp? Parser::MGC?
>>
>> You can implement a proper parser, or just a few staged regexp, or anything
>> in-between.
>
> I ended up just doing this
>
> my ($label, $op, $operands) = $line =~ m!
> ^
> (?::(\w+)  \s+)? # optional label
> ([A-Za-z]+)\s+   # opcode
> (.+)   \s*   # operands
> $
> !x;
> # TODO this won't cope with commas inside quotes
> # e.g DAT 0x20, "hello, goodbye", 0x10
> my @operands = split /\s*,\s*/, ($operands || "");



and that is exactly what i suggested in another post (haven't seen it 
hit the list yet). :)


you can use Text::Balanced to handle quoted strings.
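
One way to apply the Text::Balanced hint to Simon's TODO about commas inside quotes is sketched below. split_operands is a hypothetical helper; it assumes only double-quoted strings can contain commas, uses extract_delimited for those, and takes everything else up to the next comma.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Hypothetical helper: split an operand list on commas, keeping
# double-quoted strings (which may contain commas) intact.
sub split_operands {
    my ($text) = @_;
    my @ops;
    $text =~ s/^\s+//;
    return @ops unless length $text;

    while (1) {
        if ($text =~ /^"/) {
            # In list context extract_delimited returns
            # (extracted, remainder, skipped prefix); extracted is undef on failure.
            my ($quoted, $rest) = extract_delimited($text, '"');
            die "unterminated string in: $text\n" unless defined $quoted && length $quoted;
            push @ops, $quoted;
            $text = $rest;
        }
        elsif ($text =~ s/^([^,"]+?)\s*(?=,|\z)//) {
            push @ops, $1;                 # plain operand, trailing blanks dropped
        }
        else {
            die "cannot parse operands at: $text\n";
        }
        $text =~ s/^\s+//;
        last unless $text =~ s/^,\s*//;    # no comma: end of the list
    }
    die "unexpected trailing text: $text\n" if length $text;
    return @ops;
}

print join(' | ', split_operands('0x20, "hello, goodbye", 0x10')), "\n";
```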

uri


Re: Regex help

2012-04-21 Thread Nicholas Clark
On Fri, Apr 20, 2012 at 10:23:36PM +0100, Simon Wistow wrote:

> at some point I should properly parse the assembler file using a grammar 
> but since the spec is still in flux I'll go with my bodged parser for 

So there's still a chance it will converge onto the VAX instruction set?

Nicholas Clark


Re: Regex help

2012-04-21 Thread Torsten Knorr
($label, $opcode) = m/^\s*
(?::(\w+)\s+)?   # label
([A-Za-z]+)\s+  # opcode
(?:([^,\s]+)(?{push(@operands, $^N)}),?\s*)* # operands
$/x;

Torsten




Re: Regex help

2012-04-21 Thread Dave Hodgkinson

On 21 Apr 2012, at 19:13, Torsten Knorr wrote:

> ($label, $opcode) = m/^\s*
>(?::(\w+)\s+)?   # label
>([A-Za-z]+)\s+  # opcode
>(?:([^,\s]+)(?{push(@operands, $^N)}),?\s*)* # operands
>$/x;

~~~NO CARRIER

Your modem cable is loose.



Re: Regex help

2012-04-23 Thread David Cantrell
On Fri, Apr 20, 2012 at 10:23:36PM +0100, Simon Wistow wrote:

> # TODO this won't cope with commas inside quotes 
> # e.g DAT 0x20, "hello, goodbye", 0x10
> my @operands = split /\s*,\s*/, ($operands || "");

Text::CSV_XS is your special friend, if you can rely on the operand list
being reasonably well-formed.

Alternatively you might want to take a look at CPU::Z80::Assembler.
Paulo Custodio ripped out my crappy regex-based parser and replaced it
with one that doesn't suck, so you may be able to re-use bits of that.
The DEFM etc pseudo-instructions take arbitrarily long lists of args.

-- 
David Cantrell | top google result for "topless karaoke murders"

 If you can't imagine how I do something, it's
 because I have a better imagination than you