Re: [Haskell-cafe] Regular Expressions, was Re: Interest in helping w/ Haskell standard

2005-10-18 Thread Martin Sulzmann

Semantic subtyping issue:

Assume we have a function f of type f :: Reg (r*) - ...
to which we pass a value x of type Reg (r,r*).
We have that (r,r*) is a semantic subtype of r*,
hence, the code f x is accepted in languages such as XDuce/CDuce.

I'm just saying that the fact regexp can be represented via GADTs
doesn't imply that we get the same expressive power of
languages such as XDuce/CDuce.

Martin


Hongwei Xi writes:
  Hi Martin:
  
  Thanks for the message.
  
  No, I am not on the list.
  
  Back then, we did something similar but for a different
  purpose. The idea is like this:
  
  We wanted to represent an XML document as a pair:
  
  XML (r) * Reg (r)
  
  where r stands for some regexp and Reg(r) is a
  singleton type. When performing search over XML(r),
  we can use Reg(r) as some kind of pilot value
  (which is a lot smaller than XML(r)) to guide the
  search. For instance, it is often easy to tell from
  Reg(r) that an item to be found is not in XML(r) and
  thus we can skip searching XML(r) entirely.
  This is sort of like indexing scheme in database.
  
  BTW, I am not sure what kind of 'semantic typing'
  you have in mind. An example?
  
  --Hongwei
  
  Computer Science Department
  Boston University
  111 Cummington Street
  Boston, MA 02215
  
  Email: [EMAIL PROTECTED]
  Url: http://www.cs.bu.edu/~hwxi
  Tel: +1 617 358 2511 (office)
  Fax: +1 617 353 6457 (department)
  
  
  
  On Mon, 17 Oct 2005, Martin Sulzmann wrote:
  
  
  Very interesting Conor. Do you know Xi et al's APLAS'03 paper?
  (Hongwei, I'm not sure whether you're on this list).
  Xi et al. use GRDTs (aka GADTs aka first-class phantom types)
  to represent XML documents. There're may be some connections
  between what you're doing and Xi et al's work.
  
  I believe that there's an awful lot you can  do with GADTs 
  (in the context of regular expressions). But there're two things 
  you can't accomplish:
  semantic subtyping and regular expression pattern matching
  a la XDuce/CDuce. Unless somebody out there proves me wrong.
  
  Martin
  
  
  
  Hi folks,
  
  Inspired by Ralf's post, I thought I'd just GADTize a dependently typed=20
  program I wrote in 2001.
  
  Wolfgang Jeltsch wrote:
  
   Now lets consider using an algebraic datatype for regexps:
  
data RegExp
=3D Empty | Single Char | RegExp :+: RegExp | RegExp :|: 
   RegExpt | Ite=
  r RegExp
  
  Manipulating regular expressions now becomes easy and safe =E2=80=93 you=
   are just not=20
  able to create syntactically incorrect regular expressions since durin=
  g=20
  runtime you don't deal with syntax at all.
   =20
  
  
  A fancier variation on the same theme...
  
data RegExp :: * - * - * where
  Zero   :: RegExp tok Empty
  One:: RegExp tok ()
  Check  :: (tok - Bool) - RegExp tok tok
  Plus   :: RegExp tok a - RegExp tok b - RegExp tok (Either a b)
  Mult   :: RegExp tok a - RegExp tok b - RegExp tok (a, b)
  Star   :: RegExp tok a - RegExp tok [a]
  
data Empty
  
  The intuition is that a RegExp tok output is a regular expression=20
  explaining how to parse a list of tok as an output. Here, Zero is the=20
  regexp which does not accept anything, One accepts just the empty=20
  string, Plus is choice and Mult is sequential composition; Check lets=20
  you decide whether you like a single token.
  
  Regular expressions may be seen as an extended language of polynomials=20
  with tokens for variables; this parser works by repeated application of=20
  the remainder theorem.
  
parse :: RegExp tok x - [tok] - Maybe x
parse r []   =3D empty r
parse r (t : ts) =3D case divide t r of
  Div q f - return f `ap` parse q ts
  
  Example
  
  *RegExp parse (Star (Mult (Star (Check (=3D=3D 'a'))) (Star (Check 
  (=3D=3D=
  =20
  'b') abaabaaa
  Just [(a,b),(aa,b),(aaa,)]
  
  The 'remainder' explains if a regular expression accepts the empty=20
  string, and if so, how. The Star case is a convenient=20
  underapproximation, ruling out repeated empty values.
  =20
empty :: RegExp tok a - Maybe a
empty Zero =3D mzero
empty One  =3D return ()
empty (Check _)=3D mzero
empty (Plus r1 r2) =3D (return Left  `ap` empty r1) `mplus`
 (return Right `ap` empty r2)
empty (Mult r1 r2) =3D return (,) `ap` empty r1 `ap` empty r2
empty (Star _) =3D return []
  
  The 'quotient' explains how to parse the tail of the list, and how to=20
  recover the meaning of the whole list from the meaning of the tail.
  
data Division tok x =3D forall y. Div (RegExp tok y) (y - x)
  
  Here's how it's done. I didn't expect to need scoped type variables, but=20
  I did...
  
divide :: tok - RegExp tok x - Division tok x
divide t Zero  =3D Div Zero naughtE
divide t One   =3D Div Zero naughtE
divide t (Check p) | p t   =3D Div One (const t)
   | 

Re: [Haskell-cafe] Regular Expressions, was Re: Interest in helping w/ Haskell standard

2005-10-18 Thread Hongwei Xi
On Tue, 18 Oct 2005, Martin Sulzmann wrote:


Semantic subtyping issue:

Assume we have a function f of type f :: Reg (r*) - ...
to which we pass a value x of type Reg (r,r*).
We have that (r,r*) is a semantic subtype of r*,
hence, the code f x is accepted in languages such as XDuce/CDuce.

I see. If I understand you correctly, this can be done like this:

1. Introducing a type constructor sub (T1, T2) to mean that
   T1 is a subtype of T2

2. Then introducing constructors to represent semantic subtyping
   rules (These constructors are justified semantically)

3. Then introducing the following function

   coerce: forall T1, T2. (sub (T1, T2), T1) - T2

   and prove that coerce can be erased at run-time.

--

In your example,

we have a proof of the type sub (Reg (r, r*), Reg (r*)); let us call the
proof pf; for x of the type Reg (r, r*), we can do f (coerce (pf, x)).

I'm just saying that the fact regexp can be represented via GADTs
doesn't imply that we get the same expressive power of languages
such as XDuce/CDuce.

My view is that XDuce/CDuce provides an automatic approach to constructing
the part: coerce (pf, ...). But in terms of type theory, I am yet to see
why it is more expressive.

--

We are currently debating whether the above approach to semantic subtyping
should be added into ATS. The trouble is that there seems no good way of
verifying semantic subtyping rules. Maybe we should just blame the user
if things go wrong (including core dump) :)

Cheers,

--Hongwei

Computer Science Department
Boston University
111 Cummington Street
Boston, MA 02215

Email: [EMAIL PROTECTED]
Url: http://www.cs.bu.edu/~hwxi
Tel: +1 617 358 2511 (office)
Fax: +1 617 353 6457 (department)



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Regular Expressions, was Re: Interest in helping w/ Haskell standard

2005-10-17 Thread Martin Sulzmann

Very interesting Conor. Do you know Xi et al's APLAS'03 paper?
(Hongwei, I'm not sure whether you're on this list).
Xi et al. use GRDTs (aka GADTs aka first-class phantom types)
to represent XML documents. There're may be some connections
between what you're doing and Xi et al's work.

I believe that there's an awful lot you can  do with GADTs 
(in the context of regular expressions). But there're two things 
you can't accomplish:
semantic subtyping and regular expression pattern matching
a la XDuce/CDuce. Unless somebody out there proves me wrong.

Martin



Hi folks,

Inspired by Ralf's post, I thought I'd just GADTize a dependently typed=20
program I wrote in 2001.

Wolfgang Jeltsch wrote:

 Now lets consider using an algebraic datatype for regexps:

   data RegExp
   =3D Empty | Single Char | RegExp :+: RegExp | RegExp :|: 
 RegExpt | Ite=
r RegExp

Manipulating regular expressions now becomes easy and safe =E2=80=93 you=
 are just not=20
able to create syntactically incorrect regular expressions since durin=
g=20
runtime you don't deal with syntax at all.
 =20


A fancier variation on the same theme...

  data RegExp :: * - * - * where
Zero   :: RegExp tok Empty
One:: RegExp tok ()
Check  :: (tok - Bool) - RegExp tok tok
Plus   :: RegExp tok a - RegExp tok b - RegExp tok (Either a b)
Mult   :: RegExp tok a - RegExp tok b - RegExp tok (a, b)
Star   :: RegExp tok a - RegExp tok [a]

  data Empty

The intuition is that a RegExp tok output is a regular expression=20
explaining how to parse a list of tok as an output. Here, Zero is the=20
regexp which does not accept anything, One accepts just the empty=20
string, Plus is choice and Mult is sequential composition; Check lets=20
you decide whether you like a single token.

Regular expressions may be seen as an extended language of polynomials=20
with tokens for variables; this parser works by repeated application of=20
the remainder theorem.

  parse :: RegExp tok x - [tok] - Maybe x
  parse r []   =3D empty r
  parse r (t : ts) =3D case divide t r of
Div q f - return f `ap` parse q ts

Example

*RegExp parse (Star (Mult (Star (Check (=3D=3D 'a'))) (Star (Check (=3D=3D=
=20
'b') abaabaaa
Just [(a,b),(aa,b),(aaa,)]

The 'remainder' explains if a regular expression accepts the empty=20
string, and if so, how. The Star case is a convenient=20
underapproximation, ruling out repeated empty values.
=20
  empty :: RegExp tok a - Maybe a
  empty Zero =3D mzero
  empty One  =3D return ()
  empty (Check _)=3D mzero
  empty (Plus r1 r2) =3D (return Left  `ap` empty r1) `mplus`
   (return Right `ap` empty r2)
  empty (Mult r1 r2) =3D return (,) `ap` empty r1 `ap` empty r2
  empty (Star _) =3D return []

The 'quotient' explains how to parse the tail of the list, and how to=20
recover the meaning of the whole list from the meaning of the tail.

  data Division tok x =3D forall y. Div (RegExp tok y) (y - x)

Here's how it's done. I didn't expect to need scoped type variables, but=20
I did...

  divide :: tok - RegExp tok x - Division tok x
  divide t Zero  =3D Div Zero naughtE
  divide t One   =3D Div Zero naughtE
  divide t (Check p) | p t   =3D Div One (const t)
 | otherwise =3D Div Zero naughtE
  divide t (Plus (r1 :: RegExp tok a) (r2 :: RegExp tok b)) =3D
case (divide t r1, divide t r2) of
  (Div (q1 :: RegExp tok a') (f1 :: a' - a),
Div (q2 :: RegExp tok b') (f2 :: b' - b)) -
Div (Plus q1 q2) (f1 +++ f2)
  divide t (Mult r1 r2) =3D case (empty r1, divide t r1, divide t r2) of
(Nothing, Div q1 f1, _) - Div (Mult q1 r2) (f1 *** id)
(Just x1, Div q1 f1, Div q2 f2) -
  Div (Plus (Mult q1 r2) q2) (either (f1 *** id) (((,) x1) . f2))
  divide t (Star r) =3D case (divide t r) of
Div q f - Div (Mult q (Star r)) (\ (y, xs) - (f y : xs))

Bureaucracy.

  (***) :: (a - b) - (c - d) - (a, c) - (b, d)
  (f *** g) (a, c) =3D (f a, g c)

  (+++) :: (a - b) - (c - d) - Either a c - Either b d
  (f +++ g) (Left a)  =3D Left  (f a)
  (f +++ g) (Right c) =3D Right (g c)

  naughtE :: Empty - x
  naughtE =3D undefined

It's not the most efficient parser in the world (doing some algebraic=20
simplification on the fly wouldn't hurt), but it shows the sort of stuff=20
you can do.

Have fun

Conor
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Regular Expressions, was Re: Interest in helping w/ Haskell standard

2005-10-15 Thread Conor McBride

Hi folks,

Inspired by Ralf's post, I thought I'd just GADTize a dependently typed 
program I wrote in 2001.


Wolfgang Jeltsch wrote:


Now lets consider using an algebraic datatype for regexps:

data RegExp
= Empty | Single Char | RegExp :+: RegExp | RegExp :|: RegExpt 
| Iter RegExp

Manipulating regular expressions now becomes easy and safe – you are just not 
able to create syntactically incorrect regular expressions since during 
runtime you don't deal with syntax at all.
 



A fancier variation on the same theme...

 data RegExp :: * - * - * where
   Zero   :: RegExp tok Empty
   One:: RegExp tok ()
   Check  :: (tok - Bool) - RegExp tok tok
   Plus   :: RegExp tok a - RegExp tok b - RegExp tok (Either a b)
   Mult   :: RegExp tok a - RegExp tok b - RegExp tok (a, b)
   Star   :: RegExp tok a - RegExp tok [a]

 data Empty

The intuition is that a RegExp tok output is a regular expression 
explaining how to parse a list of tok as an output. Here, Zero is the 
regexp which does not accept anything, One accepts just the empty 
string, Plus is choice and Mult is sequential composition; Check lets 
you decide whether you like a single token.


Regular expressions may be seen as an extended language of polynomials 
with tokens for variables; this parser works by repeated application of 
the remainder theorem.


 parse :: RegExp tok x - [tok] - Maybe x
 parse r []   = empty r
 parse r (t : ts) = case divide t r of
   Div q f - return f `ap` parse q ts

Example

*RegExp parse (Star (Mult (Star (Check (== 'a'))) (Star (Check (== 
'b') abaabaaa

Just [(a,b),(aa,b),(aaa,)]

The 'remainder' explains if a regular expression accepts the empty 
string, and if so, how. The Star case is a convenient 
underapproximation, ruling out repeated empty values.


 empty :: RegExp tok a - Maybe a
 empty Zero = mzero
 empty One  = return ()
 empty (Check _)= mzero
 empty (Plus r1 r2) = (return Left  `ap` empty r1) `mplus`
  (return Right `ap` empty r2)
 empty (Mult r1 r2) = return (,) `ap` empty r1 `ap` empty r2
 empty (Star _) = return []

The 'quotient' explains how to parse the tail of the list, and how to 
recover the meaning of the whole list from the meaning of the tail.


 data Division tok x = forall y. Div (RegExp tok y) (y - x)

Here's how it's done. I didn't expect to need scoped type variables, but 
I did...


 divide :: tok - RegExp tok x - Division tok x
 divide t Zero  = Div Zero naughtE
 divide t One   = Div Zero naughtE
 divide t (Check p) | p t   = Div One (const t)
| otherwise = Div Zero naughtE
 divide t (Plus (r1 :: RegExp tok a) (r2 :: RegExp tok b)) =
   case (divide t r1, divide t r2) of
 (Div (q1 :: RegExp tok a') (f1 :: a' - a),
   Div (q2 :: RegExp tok b') (f2 :: b' - b)) -
   Div (Plus q1 q2) (f1 +++ f2)
 divide t (Mult r1 r2) = case (empty r1, divide t r1, divide t r2) of
   (Nothing, Div q1 f1, _) - Div (Mult q1 r2) (f1 *** id)
   (Just x1, Div q1 f1, Div q2 f2) -
 Div (Plus (Mult q1 r2) q2) (either (f1 *** id) (((,) x1) . f2))
 divide t (Star r) = case (divide t r) of
   Div q f - Div (Mult q (Star r)) (\ (y, xs) - (f y : xs))

Bureaucracy.

 (***) :: (a - b) - (c - d) - (a, c) - (b, d)
 (f *** g) (a, c) = (f a, g c)

 (+++) :: (a - b) - (c - d) - Either a c - Either b d
 (f +++ g) (Left a)  = Left  (f a)
 (f +++ g) (Right c) = Right (g c)

 naughtE :: Empty - x
 naughtE = undefined

It's not the most efficient parser in the world (doing some algebraic 
simplification on the fly wouldn't hurt), but it shows the sort of stuff 
you can do.


Have fun

Conor
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Regular Expressions, was Re: Interest in helping w/ Haskell standard

2005-10-15 Thread Bulat Ziganshin
Hello Conor,

Saturday, October 15, 2005, 4:47:02 PM, you wrote:

 Now lets consider using an algebraic datatype for regexps:

   data RegExp
   = Empty | Single Char | RegExp :+: RegExp | RegExp :|: RegExpt 
 | Iter RegExp

btw, a year ago i written RE processing library, which used Parsec
both to parse and compile regexpr itself and to parse input string according to
compiled regexpr. i think, this have no practical meaning, but may be
included in parsec library as intersting example of its usage :)
unluckily, i dont debugged it and so don't send it to parsec author


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe