Re: [Chicken-users] Using irregex safely & responsibly
From: Jim Ursetto
Subject: Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]
Date: Fri, 8 Oct 2010 16:00:05 -0500

> Why can't the compatibility code be included in the new irregex unit?
> In other words, the old procedure names and behavior could be
> deprecated but left in so that 1) we don't have to add a blob of
> compatibility code to every egg, and 2) eggs using the old irregex API
> would be compatible with all Chicken versions without rebuilding.
> It's not very nice to the end-user to just remove procedures without
> going through a deprecation phase.

That's a good idea. This code could be added to irregex.scm
without touching irregex-core.scm (the upstream code).

cheers,
felix

___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Using irregex safely & responsibly
Jim Ursetto writes:

> There is some inconsistency in the docs:
>
> irregex-match-num-submatches: Returns the number of numbered
> submatches that are defined in the
> irregex or match object.
> irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
> named submatch or index is defined in the {{match}} object.
>
> But below, *-valid-index? says undefined when *-num-submatches says defined:

Not quite, *-valid-index? says "invalid", not "undefined".

*-num-submatches just tells you the total number of submatches
that are defined in the regexp, regardless of what has been
matched, and irregex-match-num-submatches on a match result
will always return the same result as irregex-num-submatches
on the corresponding regexp.

> The valid-index? predicate does not return a boolean #t value:
>
> #;9> (irregex-match-valid-index? m 3)
> 0

It returns #t for this in the upstream irregex.

> I prefer the old behavior for consistency because if irregex tells me
> that 3 submatches exist, I expect to be able to access them without an
> exception being thrown.

*-valid-index? just states whether the submatch _may_ exist.

We could add a utility irregex-match-matched-index? to test
if a specific index was successfully matched.

An index which could never be a valid submatch should
arguably always throw an error.

An index which is valid, but failed to match, could either
throw an error or return #f. The -index and -substring
operations are inconsistent in this respect, so we should
fix that.

It may be good to provide both sets, with a /default version
analogous to SRFI-69 hash-table-ref and
hash-table-ref/default:

  (irregex-match-substring ) => error
  (irregex-match-substring ) => error

  (irregex-match-substring/default #f) => error
  (irregex-match-substring/default #f) => #f

Thoughts?

--
Alex
Re: [Chicken-users] Using irregex safely & responsibly
On Mon, Oct 11, 2010 at 01:17:49PM +0900, Alex Shinn wrote:
> > The valid-index? predicate does not return a boolean #t value:
> >
> > #;9> (irregex-match-valid-index? m 3)
> > 0
>
> It returns #t for this in the upstream irregex.

I'll look into that. It's probably a bug introduced by a
Chicken-specific optimization.

> *-valid-index? just states whether the submatch _may_ exist.
>
> We could add a utility irregex-match-matched-index? to test
> if a specific index was successfully matched.

That's a horrible name. I think we shouldn't need this if the
procedures just returned #f in case of no match.

> An index which could never be a valid submatch should
> arguably always throw an error.

Agreed.

> An index which is valid, but failed to match, could either
> throw an error or return #f. The -index and -substring
> operations are inconsistent in this respect, so we should
> fix that.

IMHO they all should behave like -substring; return #f if there was
no match.

> It may be good to provide both sets, with a /default version
> analogous to SRFI-69 hash-table-ref and
> hash-table-ref/default:
>
> (irregex-match-substring ) => error
> (irregex-match-substring ) => error
>
> (irregex-match-substring/default #f) => error
> (irregex-match-substring/default #f) => #f
>
> Thoughts?

I think this is pointless. The hash table has a way to specify a default
value because it's possible to have #f as a value in your hash table,
which makes returning #f ambiguous. That's why there's a way to specify
the default.

However, in case of substring and index operations, the result is
always an integer/a string. Returning #f is completely unambiguous
in those cases, so I don't see the need to add yet another procedure.

It would be preferable to have this behaviour:

  (irregex-match-substring ) => error
  (irregex-match-substring ) => #f

  (irregex-match-start-index ) => error
  (irregex-match-start-index ) => #f

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
	-- Donald Knuth
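
[Editor's sketch] The behaviour asked for above (error for an index that can never be a submatch, #f for a valid index that simply did not match) can be illustrated on a toy model. This is not irregex code: the vector representation and the name toy-match-substring are hypothetical, chosen only to show that #f cannot be mistaken for a real result when every real result is a string.

```scheme
;; Toy model, NOT the irregex implementation: a "match" is a vector whose
;; slot i holds the substring of submatch i, or #f if that group did not
;; take part in the match.
(define (toy-match-substring m i)
  (if (or (< i 0) (>= i (vector-length m)))
      (error "not a valid index" i)  ; index can never name a submatch
      (vector-ref m i)))             ; a string, or #f when unmatched

;; Roughly what matching "(abc)|(def)|(ghi)" against "ghi" would produce:
;; slot 0 is the whole match, slots 1 and 2 did not match, slot 3 did.
(define toy (vector "ghi" #f #f "ghi"))

(define matched   (toy-match-substring toy 3))  ; a string
(define unmatched (toy-match-substring toy 2))  ; #f, no exception
```

Because a matched group always yields a string, callers can write (or (toy-match-substring toy 2) "") without a separate /default procedure.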
Re: [Chicken-users] Using irregex safely & responsibly
On Mon, Oct 11, 2010 at 02:51, Peter Bex wrote:
> On Mon, Oct 11, 2010 at 01:17:49PM +0900, Alex Shinn wrote:
> However, in case of substring and index operations, the result is
> always an integer/a string. Returning #f is completely unambiguous
> in those cases, so I don't see the need to add yet another procedure.
>
> It would be preferable to have this behaviour:
>
> (irregex-match-substring ) => error
> (irregex-match-substring ) => #f
>
> (irregex-match-start-index ) => error
> (irregex-match-start-index ) => #f

I agree with Peter, the /default procedures seem like a needless
abstraction as a totally unambiguous #f is common practice. For
example, srfi-13 string-index. Unless this practice is going to be
deprecated somehow by R7RS.
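
[Editor's sketch] The srfi-13 precedent mentioned above looks roughly like this in practice. The (use srfi-13) line assumes Chicken 4 syntax, and text-after-colon is a hypothetical helper written only for this illustration.

```scheme
(use srfi-13)  ; assumption: Chicken 4.x; other Schemes load SRFI-13 differently

;; string-index returns an integer position, or #f when the character is
;; absent. Since a position is never #f, (and i ...) is enough to branch.
(define (text-after-colon s)
  (let ((i (string-index s #\:)))
    (and i (substring s (+ i 1) (string-length s)))))

(define found   (text-after-colon "key:value"))  ; text following the colon
(define missing (text-after-colon "novalue"))    ; #f, no colon present
```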
Re: [Chicken-users] Using irregex safely & responsibly
On Sun, Oct 10, 2010 at 23:17, Alex Shinn wrote:
> Jim Ursetto writes:
>
>> There is some inconsistency in the docs:
>>
>> irregex-match-num-submatches: Returns the number of numbered
>> submatches that are defined in the
>> irregex or match object.
>> irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
>> named submatch or index is defined in the {{match}} object.
>>
>> But below, *-valid-index? says undefined when *-num-submatches says defined:
>
> Not quite, *-valid-index? says "invalid", not "undefined".

Indeed, but it says "defined" in the Chicken docs reproduced above,
which is why I said there was an inconsistency in the docs. This was
more a note for the Chicken docs to be clarified (I couldn't find this
documented in the official irregex docs at all).

Jim
Re: [Chicken-users] Using irregex safely & responsibly
On Mon, Oct 11, 2010 at 09:51:15AM +0200, Peter Bex wrote:
> > > #;9> (irregex-match-valid-index? m 3)
> > > 0
> >
> > It returns #t for this in the upstream irregex.
>
> I'll look into that. It's probably a bug introduced by a
> Chicken-specific optimization.

Yeah, it was a small oversight in a manual merge of a failed patch hunk
for irregex upstream changeset 9c903144d459. It has been fixed in
experimental 0ea0570b4555c737e35288ba9f43e45b25539913.

Cheers,
Peter
Re: [Chicken-users] Using irregex safely & responsibly
Jim Ursetto writes:

> I agree with Peter, the /default procedures seem like a needless
> abstraction as a totally unambiguous #f is common practice. For
> example, srfi-13 string-index.

No, in retrospect I'm not sure why I didn't suggest that
to begin with - I think I've been working too much with
type inference lately, which makes such ambiguous return
types undesirable.

--
Alex
[Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]
Hello!

Since there are a few pitfalls to updating eggs to work with the new
Chicken 4.6.2, I've decided to draw up a quick list of the ones I noticed.

If you're just using the regex API or the regex API with a few irregex
things, usually all you need is to add (needs regex) to your egg's meta
file and you're done (but read on!).

If you are only using the irregex API and don't want to drag in another
dependency just to keep older Chickens happy, here's a way to depend on
just irregex in either old or new Chicken.

Add the following to your .setup file:

  (define regex-version
    (if (version>=? (chicken-version) "4.6.2")
        'total-irregex
        'irregex-through-regex))

and alter your compilation line to read something like this:

  (compile -s -D ,regex-version my-egg.scm -j my-egg)

In your egg's file, where you would previously use this idiom:

  (require-library regex) ; or (use regex) for the lazy & sloppy
  (import irregex)

you can now replace it with this block (you can delete emulation of
procedures you are sure you aren't using):

  (cond-expand
    (total-irregex
      ;; Chicken 4.6.2 and up: irregex is its own unit
      (use irregex))
    (else
      ;; Older Chickens: irregex lives inside regex, under the old names
      (require-library regex)
      (import (rename irregex
                      (irregex-match-start irregex-match-start-index)
                      (irregex-match-end irregex-match-end-index)))
      (define irregex-num-submatches irregex-submatches)
      (define irregex-match-num-submatches irregex-submatches)
      (define (irregex-match-valid-index? m i)
        (and (irregex-match-start-index m i) #t))
      (define (maybe-string->sre obj)
        (if (string? obj) (string->sre obj) obj))))

We need to do this because the irregex-match-{start,end} procedures
have been suffixed "-index". But be careful! They also have slightly
changed semantics. If you pass an index of a submatch which did not
get matched, irregex-match-start-index will throw an exception. You
first need to test whether it matched. That's what
irregex-match-valid-index? is for, which we simulate in older Chickens
by simply attempting to fetch the start index and checking whether it
succeeded.

We also define irregex-[match-]num-submatches as irregex-submatches.
The old procedure accepted both match objects and irregex objects, while
the new procedures are more consistent with the rest of the API and only
accept their corresponding types.

The maybe-string->sre is just copied from the irregex-core.scm file.

One last pitfall to take care of is the fact that the new irregex has
a few new procedures. For example, the old Chicken's irregex simply
didn't have the chunked matching API at all. Here's a list of things
that are unavailable in older Chickens:

  irregex-extract
  irregex-fold/chunked
  irregex-match?
  irregex-match-end-index (but see emulation code above)
  irregex-match-end-chunk
  irregex-match-names
  irregex-match-start-index (but see emulation code above)
  irregex-match-start-chunk
  irregex-match-subchunk
  irregex-match-valid-index? (but see emulation code above)
  irregex-match/chunked
  irregex-num-submatches (but see emulation code above)
  irregex-opt
  irregex-quote
  irregex-split
  make-irregex-chunker
  maybe-string->sre (but see emulation code above)

Cheers,
Peter
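
[Editor's sketch] The "test first, then fetch" advice in this message could be wrapped up along the following lines. It assumes Chicken 4.6.2 or newer, where (use irregex) works; submatch-start-or-false is a hypothetical helper name, not part of irregex. Treat it as a sketch of the idiom, not as the recommended API.

```scheme
(use irregex)  ; assumption: Chicken 4.6.2+, i.e. the total-irregex branch

;; Hypothetical helper: the start index of submatch i, or #f when
;; irregex-match-valid-index? rejects it, so that
;; irregex-match-start-index is never called on an index that would
;; make it throw.
(define (submatch-start-or-false m i)
  (and (irregex-match-valid-index? m i)
       (irregex-match-start-index m i)))

;; Only the third alternative takes part in this match.
(define m (irregex-search (irregex "(abc)|(def)|(ghi)") "ghi"))

(define start-of-third  (submatch-start-or-false m 3))
(define start-of-second (submatch-start-or-false m 2))
```

Note that a start index of 0 is still a true value in Scheme, so callers can test the result with a plain (if ...).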
Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]
On Thu, Oct 7, 2010 at 15:53, Peter Bex wrote:
> In your egg's file, where you would previously use this idiom:
>
>   (require-library regex) ; or (use regex) for the lazy & sloppy
>   (import irregex)
>
> you can now replace it with this block (you can delete emulation of
> procedures you are sure you aren't using):
>
>   (cond-expand
>     (total-irregex
>       (use irregex))
>     (else
>       (require-library regex)
>       (import (rename irregex
>                       (irregex-match-start irregex-match-start-index)
>                       (irregex-match-end irregex-match-end-index)))
>       (define irregex-num-submatches irregex-submatches)
>       (define irregex-match-num-submatches irregex-submatches)
>       (define (irregex-match-valid-index? m i)
>         (and (irregex-match-start-index m i) #t))
>       (define (maybe-string->sre obj)
>         (if (string? obj) (string->sre obj) obj))))

Does this mean for every egg that uses the irregex API directly, I
need to insert this blob of code?

There is some inconsistency in the docs:

irregex-match-num-submatches: Returns the number of numbered
submatches that are defined in the
irregex or match object.
irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
named submatch or index is defined in the {{match}} object.

But below, *-valid-index? says undefined when *-num-submatches says defined:

#;1> (define m (irregex-search (irregex "(abc)|(def)|(ghi)") "ghi"))
#;2> (irregex-match-num-submatches m)
3
#;3> (irregex-match-valid-index? m 2)
#f

The valid-index? predicate does not return a boolean #t value:

#;9> (irregex-match-valid-index? m 3)
0
#;9> (irregex-match-substring m 3)
"ghi"

Failure behavior for match-start-index and match-substring is
unspecified in the docs. The former throws an error and the latter
returns #f:

#;3> (irregex-match-start-index m 2)
Error: (irregex-match-start-index) not a valid index # 2
#;6> (irregex-match-substring m 2)
#f

I prefer the old behavior for consistency because if irregex tells me
that 3 submatches exist, I expect to be able to access them without an
exception being thrown.
Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]
On Thu, Oct 07, 2010 at 08:37:59PM -0500, Jim Ursetto wrote:
> Does this mean for every egg that uses the irregex API directly, I
> need to insert this [cond-expand] blob of code?

You have three options:
- Add a dependency on the regex egg and keep doing
  (require-library regex)(import irregex) like before
- Insert this blob of code to ensure it works with old and new Chickens
- Drop the blob if you don't care about older Chickens.

> There is some inconsistency in the docs:
>
> irregex-match-num-submatches: Returns the number of numbered
> submatches that are defined in the
> irregex or match object.
> irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
> named submatch or index is defined in the {{match}} object.
>
> But below, *-valid-index? says undefined when *-num-submatches says defined:

Hm, I'll have to take this up with Alex, it looks like a bug indeed.

Cheers,
Peter
Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]
On Fri, Oct 08, 2010 at 09:05:10AM +0200, Peter Bex wrote:
> On Thu, Oct 07, 2010 at 08:37:59PM -0500, Jim Ursetto wrote:
> > Does this mean for every egg that uses the irregex API directly, I
> > need to insert this [cond-expand] blob of code?
>
> You have three options:
> - Add a dependency on the regex egg and keep doing
>   (require-library regex)(import irregex) like before

That's not quite true; some of the compatibility code is still necessary
to make up for the changes in the API.

Cheers,
Peter
Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]
On Fri, Oct 8, 2010 at 02:09, Peter Bex wrote:
> That's not quite true; some of the compatibility code is still necessary
> to make up for the changes in the API.

If that's the case, it means that eggs compiled with 4.6.0 aren't
compatible with those compiled with 4.6.2, because that compatibility
code is selected at compile-time. It's looking to me more and more
that the binversion should be bumped from 5 to 6 (as much as I dislike
this).

Why can't the compatibility code be included in the new irregex unit?
In other words, the old procedure names and behavior could be
deprecated but left in so that 1) we don't have to add a blob of
compatibility code to every egg, and 2) eggs using the old irregex API
would be compatible with all Chicken versions without rebuilding.
It's not very nice to the end-user to just remove procedures without
going through a deprecation phase.

Thoughts?

Jim
Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]
Eh, let me clarify #2. Eggs built with 4.6.0 need to be recompiled
with 4.6.2 regardless due to the C_regex_toplevel linking issues.
However, once they are, they would also work with 4.6.0 again, as long
as they stuck to the old irregex API. I think. This is pretty
confusing. Maybe we should bump binversion to 6 after all. :(

On Fri, Oct 8, 2010 at 16:00, Jim Ursetto wrote:
> 2) eggs using the old irregex API would be compatible with all Chicken
> versions without rebuilding.