> On 4/16/07, Oleg Kobchenko <[EMAIL PROTECTED]> wrote:
> > Although ". or 0&". can serve as scanners, together with ;: ,
> > they won't be useful for something like a date format:
> >
> >    ;:'2001-01-01 00:00:00'
--- Raul Miller <[EMAIL PROTECTED]> wrote:
> Actually, dyadic ;: works rather nicely for
> lexical analysis.
On 4/16/07, Oleg Kobchenko <[EMAIL PROTECTED]> wrote:
> Can you define an (ad)verb that will accept a one-line pattern for ;: ?
> Because what can beat this?
>
>    '(....)-(..)-(..) (..):(..):(..)' <rxscan input

And then, I wrote:
. I would not classify that as lexical analysis.  I would
. classify that as ad-hoc pattern matching.

But perhaps I should elaborate on this point.

What's the distinction between lexical analysis and pattern matching?

If I wanted to do lexical analysis in this context (and I'm not sure
why I would), here's a ;: verb which would serve:

   lexi=:(0;(0 10#:0 100#:1121 1022 1220);a.e.'-: ');:]

This breaks the input into a sequence of lexical tokens.

If I wanted the result given by the above rxscan expression for the
given input, here's a ;: verb which would serve:

   split=: (0;(0 10#:0 100#:1100 1003);a.e.'-: ');:]

Of course, neither of the above is an exact work-alike for

   '(....)-(..)-(..) (..):(..):(..)' <rxscan input [ require 'regex'

For example, the rxscan version can pull an iso date out of the middle
of a line with other text in it, where my split verb would incorporate
this other text in its result.

Anyways, my point -- textual pattern matching using regular expressions
is only loosely related to lexical analysis.

See also:
http://en.wikipedia.org/wiki/Lexical_analysis
http://en.wikipedia.org/wiki/Regular_expression
http://en.wikipedia.org/wiki/Pattern_matching

P.S. Since not many people use ;: currently, here's some abbreviated
documentation for this context:

The left arg for ;: here has three boxes.  The first box is 0 (meaning
the result should be a boxed list of character lists).  The second boxed
argument defines the state machine, and the third boxed argument defines
a class index for each character.

In my state machines, the four-digit numbers correspond to states in the
state machine (with 0 being the starting state).  The first two digits
are the next state and current operation to use when encountering an
ordinary character, and the final two digits are the next state and
current operation to use when encountering delimiter characters.
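
To make the four-digit encoding concrete, here is a small sketch of how
the two state-table expressions expand.  Only the expressions already
used in lexi and split appear; the names st_lexi and st_split are
introduced purely for illustration and are not part of the verbs above.

```j
   NB. st_lexi and st_split are hypothetical names, used only for this sketch
   st_lexi=: 0 10 #: 0 100 #: 1121 1022 1220
   $ st_lexi     NB. 3 states, 2 character classes, (next state, operation) pairs
3 2 2
   st_lexi
1 1
2 1

1 0
2 2

1 2
2 0
   st_split=: 0 10 #: 0 100 #: 1100 1003
   st_split
1 1
0 0

1 0
0 3
```

Each plane is one state; its first row applies to ordinary characters
and its second row to delimiters.  So in state 0 of lexi, an ordinary
character gives next state 1 with operation 1, and a delimiter gives
next state 2 with operation 1.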
The array a.e.'-: ' distinguishes between ordinary characters (0s in
that array) and delimiter characters (1s in that array).

Opcodes are:
0 -- continue building current word or ignoring current non-word
1 -- ignore current (non-word or word) and start new word
2 -- end current word and start new word
3 -- end current word and start non-word

-- 
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
