> On 4/16/07, Oleg Kobchenko <[EMAIL PROTECTED]> wrote:
> > Although ". or 0&". can serve as scanners, together with ;: ,
> > they won't be useful for something like a date format:
> >
> >   ;:'2001-01-01 00:00:00'

--- Raul Miller <[EMAIL PROTECTED]> wrote:
> Actually, dyadic ;: works rather nicely for
> lexical analysis.

On 4/16/07, Oleg Kobchenko <[EMAIL PROTECTED]> wrote:
Can you define an (ad)verb that will accept a one-line
pattern for ;: ? Because what can beat this?

 '(....)-(..)-(..) (..):(..):(..)' <rxscan input

And then, I wrote:
> I would not classify that as lexical analysis.  I would
> classify that as ad-hoc pattern matching.

But perhaps I should elaborate on this point.  What's the
distinction between lexical analysis and pattern matching?

If I wanted to do lexical analysis in this context (and
I'm not sure why I would), here's a ;: verb which would
serve:

  lexi=:(0;(0 10#:0 100#:1121 1022 1220);a.e.'-: ');:]

This breaks the input into a sequence of lexical tokens.
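
As a quick check, here is a sketch of lexi applied to the date string
from earlier in the thread. The name "expected" is only illustrative,
and the token list is a hand trace of the state table, not pasted
session output:

```j
NB. lexi as defined above: runs of digits and runs of delimiters
NB. ('-', ':' and ' ') each come out as a token of their own
lexi=: (0;(0 10#:0 100#:1121 1022 1220);a.e.'-: ');:]

NB. hand-traced expectation, built with cut so that every token is a list
expected=: <;._1 '|2001|-|01|-|01| |00|:|00|:|00'

expected -: lexi '2001-01-01 00:00:00'   NB. should be 1 if the trace is right
```

Note that the delimiters are kept as tokens here, which is what makes
this look like lexical analysis rather than field extraction.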

If I wanted the result given by the above rxscan expression
for the given input, here's a ;: verb which would serve:

  split=: (0;(0 10#:0 100#:1100 1003);a.e.'-: ');:]
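
For instance, a small sketch of split on the same date string. The
name "fields" is only illustrative, and the expected result is a hand
trace of the state table rather than pasted session output:

```j
NB. split as defined above: keeps the digit runs, drops '-', ':' and ' '
split=: (0;(0 10#:0 100#:1100 1003);a.e.'-: ');:]

fields=: split '2001-01-01 00:00:00'

fields -: '2001';'01';'01';'00';'00';'00'   NB. should be 1 if the trace is right
".&> fields                                  NB. fields as numbers: 2001 1 1 0 0 0
```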

Of course, neither of the above is an exact work-alike for
  '(....)-(..)-(..) (..):(..):(..)' <rxscan input [ require 'regex'

For example, the rxscan version can pull an ISO date out
of the middle of a line containing other text, whereas my split
verb would include that other text in its result.
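
To make that difference concrete, a sketch with a made-up input line
(the surrounding words 'at' and 'UTC' are only for illustration; the
expected result is again a hand trace):

```j
NB. split as defined above
split=: (0;(0 10#:0 100#:1100 1003);a.e.'-: ');:]

NB. text around the date is not skipped: it is split on the same
NB. delimiters and shows up as extra fields
r=: split 'at 2001-01-01 00:00:00 UTC'

r -: 'at';'2001';'01';'01';'00';'00';'00';'UTC'   NB. should be 1
```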

Anyway, my point is that textual pattern matching using
regular expressions is only loosely related to lexical
analysis.

See also:
http://en.wikipedia.org/wiki/Lexical_analysis
http://en.wikipedia.org/wiki/Regular_expression
http://en.wikipedia.org/wiki/Pattern_matching


P.S. Since not many people use ;: currently, here's some
abbreviated documentation for this context:

The left argument of ;: here has three boxes.  The first
box is the function code 0 (meaning the result should be a
boxed list of character lists, one per word).  The second box
defines the state machine, and the third box gives a class
index for each character in a.

In my state machines, each four-digit number encodes one
state (row) of the machine, with the first number being
state 0, the starting state.  The first two digits are the
next state and the operation to perform when encountering an
ordinary character, and the last two digits are the next state
and the operation to perform when encountering a delimiter
character.
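
As a small illustration of that encoding, here is the table from split
decoded into the rank-3 array that ;: actually receives (the name t is
only for this example):

```j
NB. decode each four-digit number into two (next state, opcode) pairs
t=: 0 10 #: 0 100 #: 1100 1003

$t            NB. 2 2 2 : states by character classes by (next state, opcode)
t             NB. 1 1
              NB. 0 0
              NB.
              NB. 1 0
              NB. 0 3

(<1 1) { t    NB. state 1, delimiter class: go to state 0, opcode 3 (0 3)
```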

The array a.e.'-: ' distinguishes between ordinary characters
(0s in that array) and delimiter characters (1s in that array).
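
A short sketch of how that class array lines up with an input string
(the names m and y are only for this example):

```j
y=: '2001-01-01 00:00:00'
m=: a. e. '-: '     NB. one 0/1 entry per character of the alphabet a.

#m                  NB. 256
(a. i. y) { m       NB. class of each character of y:
                    NB. 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0
```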

Opcodes are:
  0 -- continue building current word or ignoring current non-word
  1 -- ignore current (non-word or word) and start new word
  2 -- end current word and start new word
  3 -- end current word and start non-word

--
Raul