Re: [Chicken-users] Using irregex safely & responsibly

2010-10-08 Thread Felix
From: Jim Ursetto 
Subject: Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: 
dev-snapshot 4.6.3]
Date: Fri, 8 Oct 2010 16:00:05 -0500

> Why can't the compatibility code be included in the new irregex unit?
> In other words, the old procedure names and behavior could be
> deprecated but left in so that 1) we don't have to add a blob of
> compatibility code to every egg, and 2) eggs using the old irregex API
> would be compatible with all Chicken versions without rebuilding.
> It's not very nice to the end-user to just remove procedures without
> going through a deprecation phase.

That's a good idea. This code could be added to irregex.scm without
touching irregex-core.scm (the upstream code).


cheers,
felix

___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Using irregex safely & responsibly

2010-10-10 Thread Alex Shinn
Jim Ursetto  writes:

> There is some inconsistency in the docs:
>
> irregex-match-num-submatches: Returns the number of numbered
> submatches that are defined in the
> irregex or match object.
> irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
> named submatch or index is defined in the {{match}} object.
>
> But below, *-valid-index? says undefined when *-num-submatches says defined:

Not quite, *-valid-index? says "invalid", not "undefined".

*-num-submatches just tells you the total number of
submatches that are defined in the regexp, regardless of
what has been matched, and irregex-match-num-submatches on a
match result will always return the same result as
irregex-num-submatches on the corresponding regexp.

> The valid-index? predicate does not return a boolean #t value:
>
> #;9> (irregex-match-valid-index? m 3)
> 0

It returns #t for this in the upstream irregex.

> I prefer the old behavior for consistency because if irregex tells me
> that 3 submatches exist, I expect to be able to access them without an
> exception being thrown.

*-valid-index? just states whether the submatch _may_ exist.

We could add a utility irregex-match-matched-index? to test
if a specific index was successfully matched.

An index which could never be a valid submatch should
arguably always throw an error.

An index which is valid, but failed to match, could either
throw an error or return #f.  The -index and -substring
operations are inconsistent in this respect, so we should
fix that.

It may be good to provide both sets, with a /default version
analogous to SRFI-69 hash-table-ref and
hash-table-ref/default:

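  (irregex-match-substring <match> <invalid-index>)    => error
  (irregex-match-substring <match> <unmatched-index>)  => error

  (irregex-match-substring/default <match> <invalid-index> #f)    => error
  (irregex-match-substring/default <match> <unmatched-index> #f)  => #f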
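
The proposed /default variant can be sketched on top of the existing API. This is a minimal sketch under several assumptions: CHICKEN 4.6.2 or later (where irregex is its own unit), numeric indices only, and irregex-match-substring returning #f for a submatch that did not take part in the match (as in Jim's transcript). The name match-substring/default is hypothetical and is not an irregex procedure.

```scheme
(use irregex)  ; CHICKEN 4.6.2+: irregex is a core unit

;; Hypothetical helper, NOT part of irregex; mirrors SRFI-69
;; hash-table-ref/default.  Numeric indices only.
;;   index outside 0..num-submatches  => error
;;   valid index that did not match   => DEFAULT
(define (match-substring/default m i default)
  (if (and (integer? i)
           (exact? i)
           (<= 0 i (irregex-match-num-submatches m)))
      (or (irregex-match-substring m i) default)
      (error 'match-substring/default "not a valid submatch index" i)))

;; Usage, with the regexp from Jim's transcript:
(define m (irregex-search (irregex "(abc)|(def)|(ghi)") "ghi"))
(match-substring/default m 3 #f)     ; => "ghi"  (submatch 3 matched)
(match-substring/default m 2 'none)  ; => none   (submatch 2 did not match)
```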

Thoughts?

-- 
Alex



Re: [Chicken-users] Using irregex safely & responsibly

2010-10-11 Thread Peter Bex
On Mon, Oct 11, 2010 at 01:17:49PM +0900, Alex Shinn wrote:
> > The valid-index? predicate does not return a boolean #t value:
> >
> > #;9> (irregex-match-valid-index? m 3)
> > 0
> 
> It returns #t for this in the upstream irregex.

I'll look into that. It's probably a bug introduced by a
Chicken-specific optimization.

> *-valid-index? just states whether the submatch _may_ exist.
> 
> We could add a utility irregex-match-matched-index? to test
> if a specific index was successfully matched.

That's a horrible name.  I think we shouldn't need this if
the procedures just returned #f in case of no match.

> An index which could never be a valid submatch should
> arguably always throw an error.

Agreed.

> An index which is valid, but failed to match, could either
> throw an error or return #f.  The -index and -substring
> operations are inconsistent in this respect, so we should
> fix that.

IMHO they all should behave like -substring; return #f if
there was no match.

> It may be good to provide both sets, with a /default version
> analogous to SRFI-69 hash-table-ref and
> hash-table-ref/default:
> 
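>   (irregex-match-substring <match> <invalid-index>)    => error
>   (irregex-match-substring <match> <unmatched-index>)  => error
> 
>   (irregex-match-substring/default <match> <invalid-index> #f)    => error
>   (irregex-match-substring/default <match> <unmatched-index> #f)  => #f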
> 
> Thoughts?

I think this is pointless.  The hash table has a way to specify a
default value because it's possible to have #f as a value in your
hash table, which makes returning #f ambiguous.  That's why there's
a way to specify the default.

However, in case of substring and index operations, the result is
always an integer/a string.  Returning #f is completely unambiguous
in those cases, so I don't see the need to add yet another procedure.

It would be preferable to have this behaviour:

 (irregex-match-substring <match> <invalid-index>)      => error
 (irregex-match-substring <match> <unmatched-index>)    => #f

 (irregex-match-start-index <match> <invalid-index>)    => error
 (irregex-match-start-index <match> <unmatched-index>)  => #f
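
That behaviour can be approximated with a small wrapper. A minimal sketch, assuming CHICKEN 4.6.2 or later, numeric indices only, and that irregex-match-substring returns #f for an unmatched submatch while irregex-match-start-index throws (both as shown in Jim's transcript). match-start-index/maybe is a hypothetical name, not part of irregex.

```scheme
(use irregex)

;; Hypothetical helper, NOT part of irregex.  Numeric indices only.
;;   index outside 0..num-submatches  => error
;;   valid index that did not match   => #f
;;   otherwise                        => start index of the submatch
(define (match-start-index/maybe m i)
  (cond ((not (and (integer? i)
                   (exact? i)
                   (<= 0 i (irregex-match-num-submatches m))))
         (error 'match-start-index/maybe "not a valid submatch index" i))
        ;; -substring already returns #f for "no match", so check it
        ;; before calling -start-index, which would throw instead.
        ((irregex-match-substring m i)
         (irregex-match-start-index m i))
        (else #f)))

(define m (irregex-search (irregex "(abc)|(def)|(ghi)") "ghi"))
(match-start-index/maybe m 3)  ; => 0
(match-start-index/maybe m 2)  ; => #f
```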

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
-- Donald Knuth



Re: [Chicken-users] Using irregex safely & responsibly

2010-10-11 Thread Jim Ursetto
On Mon, Oct 11, 2010 at 02:51, Peter Bex  wrote:
> On Mon, Oct 11, 2010 at 01:17:49PM +0900, Alex Shinn wrote:

> However, in case of substring and index operations, the result is
> always an integer/a string.  Returning #f is completely unambiguous
> in those cases, so I don't see the need to add yet another procedure.
>
> It would be preferable to have this behaviour:
>
>  (irregex-match-substring <match> <invalid-index>)      => error
>  (irregex-match-substring <match> <unmatched-index>)    => #f
>
>  (irregex-match-start-index <match> <invalid-index>)    => error
>  (irregex-match-start-index <match> <unmatched-index>)  => #f

I agree with Peter, the /default procedures seem like a needless
abstraction as a totally unambiguous #f is common practice.  For
example, srfi-13 string-index.  Unless this practice is going to be
deprecated somehow by R7RS.



Re: [Chicken-users] Using irregex safely & responsibly

2010-10-11 Thread Jim Ursetto
On Sun, Oct 10, 2010 at 23:17, Alex Shinn  wrote:
> Jim Ursetto  writes:
>
>> There is some inconsistency in the docs:
>>
>> irregex-match-num-submatches: Returns the number of numbered
>> submatches that are defined in the
>> irregex or match object.
>> irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
>> named submatch or index is defined in the {{match}} object.
>>
>> But below, *-valid-index? says undefined when *-num-submatches says defined:
>
> Not quite, *-valid-index? says "invalid", not "undefined".

Indeed, but it says "defined" in the Chicken docs reproduced above,
which is why I said there was an inconsistency in the docs.

This was more a note for the Chicken docs to be clarified (I couldn't
find this documented in the official irregex docs at all).

Jim



Re: [Chicken-users] Using irregex safely & responsibly

2010-10-11 Thread Peter Bex
On Mon, Oct 11, 2010 at 09:51:15AM +0200, Peter Bex wrote:
> > > #;9> (irregex-match-valid-index? m 3)
> > > 0
> > 
> > It returns #t for this in the upstream irregex.
> 
> I'll look into that. It's probably a bug introduced by a
> Chicken-specific optimization.

Yeah, it was a small oversight in a manual merge of a failed patch hunk
for irregex upstream changeset 9c903144d459.
It has been fixed in experimental 0ea0570b4555c737e35288ba9f43e45b25539913.

Cheers,
Peter



Re: [Chicken-users] Using irregex safely & responsibly

2010-10-11 Thread Alex Shinn
Jim Ursetto  writes:

> I agree with Peter, the /default procedures seem like a needless
> abstraction as a totally unambiguous #f is common practice.  For
> example, srfi-13 string-index.

No, in retrospect I'm not sure why I didn't suggest that to
begin with - I think I've been working too much with type
inference lately, which makes such ambiguous return types
undesirable.

-- 
Alex



[Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-07 Thread Peter Bex
Hello!

Since there are a few pitfalls to updating eggs to work with the new
Chicken 4.6.2, I've decided to draw up a quick list of the ones I noticed.

If you're just using the regex API or the regex API with a few irregex
things, usually all you need is to add (needs regex) to your egg's meta
file and you're done (but read on!).

If you are only using the irregex API and don't want to drag in another
dependency just to keep older Chickens happy, here's a way to depend
on just irregex in either old or new Chicken.  Add the following to your
.setup file:

(define regex-version
  (if (version>=? (chicken-version) "4.6.2")
      'total-irregex
      'irregex-through-regex))

and alter your compilation line to read something like this:

(compile -s -D ,regex-version my-egg.scm -j my-egg)

In your egg's file, where you would previously use this idiom:

(require-library regex)  ; or (use regex) for the lazy & sloppy
(import irregex)

you can now replace it with this block (you can delete emulation of
procedures you are sure you aren't using):

(cond-expand
 (total-irregex
  (use irregex))
 (else
  (require-library regex)
  ;; Old names for the start/end procedures
  (import (rename irregex
                  (irregex-match-start irregex-match-start-index)
                  (irregex-match-end irregex-match-end-index)))
  ;; The old irregex-submatches accepted both regexps and match objects
  (define irregex-num-submatches irregex-submatches)
  (define irregex-match-num-submatches irregex-submatches)
  ;; Emulate the predicate by trying to fetch the submatch's start index
  (define (irregex-match-valid-index? m i)
    (and (irregex-match-start-index m i) #t))
  ;; Copied from irregex-core.scm
  (define (maybe-string->sre obj)
    (if (string? obj) (string->sre obj) obj))))

We need to do this because the irregex-match-{start,end} procedures have
been renamed with an "-index" suffix.  But be careful!  Their semantics
have also changed slightly.  If you pass the index of a submatch that did
not match, irregex-match-start-index will throw an exception, so you first
need to test whether it matched.  That's what irregex-match-valid-index?
is for; in older Chickens we simulate it by simply attempting to fetch
the start index and checking whether that succeeded.
We also define irregex-[match-]num-submatches as irregex-submatches.
The old procedure accepted both match objects and irregex objects, while
the new procedures are more consistent with the rest of the API and only
accept their corresponding types.  The maybe-string->sre definition is
just copied from the irregex-core.scm file.
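
With the compatibility block above in place, code can follow the test-before-fetch idiom just described. A minimal sketch: submatch-span is a hypothetical helper name, and it assumes irregex-match-valid-index? returns a false value for a submatch that did not match, as in the 4.6.2-era behaviour discussed in this thread.

```scheme
(use irregex)  ; or the cond-expand compatibility block above

;; Hypothetical helper: return (start . end) for submatch I, or #f if
;; that submatch did not take part in the match.  Testing with
;; irregex-match-valid-index? first avoids the exception that
;; irregex-match-start-index raises for an unmatched submatch.
(define (submatch-span m i)
  (and (irregex-match-valid-index? m i)
       (cons (irregex-match-start-index m i)
             (irregex-match-end-index m i))))

(define m (irregex-search (irregex "(abc)|(def)|(ghi)") "ghi"))
(submatch-span m 3)  ; => (0 . 3)
```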

One last pitfall to take care of is the fact that the new irregex has
a few new procedures.  For example, the old Chicken's irregex simply
didn't have the chunked matching API at all.  Here's a list of things
that are unavailable in older Chickens:

irregex-extract
irregex-fold/chunked
irregex-match?
irregex-match-end-index       (but see emulation code above)
irregex-match-end-chunk
irregex-match-names
irregex-match-start-index     (but see emulation code above)
irregex-match-start-chunk
irregex-match-subchunk
irregex-match-valid-index?    (but see emulation code above)
irregex-match/chunked
irregex-num-submatches        (but see emulation code above)
irregex-opt
irregex-quote
irregex-split
make-irregex-chunker
maybe-string->sre             (but see emulation code above)

Cheers,
Peter



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-07 Thread Jim Ursetto
On Thu, Oct 7, 2010 at 15:53, Peter Bex  wrote:
> In your egg's file, where you would previously use this idiom:
>
> (require-library regex)      ; or (use regex) for the lazy & sloppy
> (import irregex)
>
> you can now replace it with this block (you can delete emulation of
> procedures you are sure you aren't using):
>
> (cond-expand
>  (total-irregex
>   (use irregex))
>  (else
>   (require-library regex)
>   (import (rename irregex
>                   (irregex-match-start irregex-match-start-index)
>                   (irregex-match-end irregex-match-end-index)))
>   (define irregex-num-submatches irregex-submatches)
>   (define irregex-match-num-submatches irregex-submatches)
>   (define (irregex-match-valid-index? m i)
>     (and (irregex-match-start-index m i) #t))
>   (define (maybe-string->sre obj)
>     (if (string? obj) (string->sre obj) obj))))

Does this mean for every egg that uses the irregex API directly, I
need to insert this blob of code?

There is some inconsistency in the docs:

irregex-match-num-submatches: Returns the number of numbered
submatches that are defined in the
irregex or match object.
irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
named submatch or index is defined in the {{match}} object.

But below, *-valid-index? says undefined when *-num-submatches says defined:

#;1> (define m (irregex-search (irregex "(abc)|(def)|(ghi)") "ghi"))
#;2> (irregex-match-num-submatches m)
3
#;3> (irregex-match-valid-index? m 2)
#f

The valid-index? predicate does not return a boolean #t value:

#;9> (irregex-match-valid-index? m 3)
0
#;9> (irregex-match-substring m 3)
"ghi"

Failure behavior for match-start-index and match-substring is
unspecified in the docs.  The former throws an error and the latter
returns #f:

#;3> (irregex-match-start-index m 2)
Error: (irregex-match-start-index) not a valid index
#
2

#;6> (irregex-match-substring m 2)
#f

I prefer the old behavior for consistency because if irregex tells me
that 3 submatches exist, I expect to be able to access them without an
exception being thrown.



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-08 Thread Peter Bex
On Thu, Oct 07, 2010 at 08:37:59PM -0500, Jim Ursetto wrote:
> Does this mean for every egg that uses the irregex API directly, I
> need to insert this [cond-expand] blob of code?

You have three options:
- Add a dependency on the regex egg and keep doing
   (require-library regex)(import irregex) like before
- Insert this blob of code to ensure it works with old and new Chickens
- Drop the blob if you don't care about older Chickens.

> There is some inconsistency in the docs:
> 
> irregex-match-num-submatches: Returns the number of numbered
> submatches that are defined in the
> irregex or match object.
> irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
> named submatch or index is defined in the {{match}} object.
> 
> But below, *-valid-index? says undefined when *-num-submatches says defined:

Hm, I'll have to take this up with Alex, it looks like a bug indeed.

Cheers,
Peter



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-08 Thread Peter Bex
On Fri, Oct 08, 2010 at 09:05:10AM +0200, Peter Bex wrote:
> On Thu, Oct 07, 2010 at 08:37:59PM -0500, Jim Ursetto wrote:
> > Does this mean for every egg that uses the irregex API directly, I
> > need to insert this [cond-expand] blob of code?
> 
> You have three options:
> - Add a dependency on the regex egg and keep doing
>(require-library regex)(import irregex) like before

That's not quite true; some of the compatibility code is still necessary
to make up for the changes in the API.

Cheers,
Peter



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-08 Thread Jim Ursetto
On Fri, Oct 8, 2010 at 02:09, Peter Bex  wrote:
> That's not quite true; some of the compatibility code is still necessary
> to make up for the changes in the API.

If that's the case, it means that eggs compiled with 4.6.0 aren't
compatible with those compiled with 4.6.2, because that compatibility
code is selected at compile-time.  It's looking to me more and more
that the binversion should be bumped from 5 to 6 (as much as I dislike
this).

Why can't the compatibility code be included in the new irregex unit?
In other words, the old procedure names and behavior could be
deprecated but left in so that 1) we don't have to add a blob of
compatibility code to every egg, and 2) eggs using the old irregex API
would be compatible with all Chicken versions without rebuilding.
It's not very nice to the end-user to just remove procedures without
going through a deprecation phase.

Thoughts?
Jim



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-08 Thread Jim Ursetto
Eh, let me clarify #2.  Eggs built with 4.6.0 need to be recompiled
with 4.6.2 regardless due to the C_regex_toplevel linking issues.
However, once they are, they would also work with 4.6.0 again, as long
as they stuck to the old irregex API.  I think.  This is pretty
confusing.  Maybe we should bump binversion to 6 after all. :(

On Fri, Oct 8, 2010 at 16:00, Jim Ursetto  wrote:
> 2) eggs using the old irregex API would be compatible with all Chicken 
> versions without rebuilding.
