Re: [racket-dev] `string-split'
[Changed title to talk about each one separately.]

Two hours ago, Laurent wrote:
> One string function that I often find useful in various scripting
> languages is a `string-split' (explode in php). It can be done with
> `regexp-split', but having something more along the lines of a
> `string-split' should belong to a racket/string lib I think. Plus it
> would be symmetric with `string-join', which already is in
> racket/string (or at least a doc line pointing to regexp-split should
> be added there).

If you mean something like this:

  (define (string-split str)
    (regexp-match* #px"\\S+" str))

? If so, then I see a much weaker point for it -- unlike other small
utilities, this one doesn't even compose two function calls.

The very weak point here is if you want a default argument that
specifies the gaps to split on rather than the words:

  (define (string-split str [sep #px"\\s+"])
    (remove* '("") (regexp-split sep str)))

but that *does* use regexps, so I don't see the point, still...

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/
Maze is Life!

Racket Developers list: http://lists.racket-lang.org/dev
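For comparison, here is a minimal sketch of the two definitions quoted in this message. The names `split-words` and `split-on-gaps` are hypothetical, chosen so that nothing clashes with a library binding, and the sample input is an assumption made only for illustration.

```racket
#lang racket/base
;; Sketch only: the names are hypothetical; the bodies follow the two
;; definitions quoted in the message above.

;; Collect the words: every maximal run of non-whitespace characters.
(define (split-words str)
  (regexp-match* #px"\\S+" str))

;; Split on the gaps instead, then drop the empty strings that leading
;; or trailing whitespace leaves behind.
(define (split-on-gaps str [sep #px"\\s+"])
  (remove* '("") (regexp-split sep str)))

(split-words "  st  ring ")    ; => '("st" "ring")
(split-on-gaps "  st  ring ")  ; => '("st" "ring")
```

With the default separator the two agree; they only differ once a caller passes a separator of their own to the second one.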
Re: [racket-dev] `string-split'
On Thu, Apr 19, 2012 at 8:21 AM, Eli Barzilay e...@barzilay.org wrote:
> Two hours ago, Laurent wrote:
>> One string function that I often find useful in various scripting
>> languages is a `string-split' (explode in php). It can be done with
>> `regexp-split', but having something more along the lines of a
>> `string-split' should belong to a racket/string lib I think. Plus it
>> would be symmetric with `string-join', which already is in
>> racket/string (or at least a doc line pointing to regexp-split
>> should be added there).
>
> If you mean something like this:
>
>   (define (string-split str)
>     (regexp-match* #px"\\S+" str))
>
> ? If so, then I see a much weaker point for it -- unlike other small
> utilities, this one doesn't even compose two function calls.

It composes one function call (with an extremely complex API) with one
domain-specific language (that lots of people don't
know/understand/use) into one extremely simple but useful function.

> The very weak point here is if you want a default argument that
> specifies the gaps to split on rather than the words:
>
>   (define (string-split str [sep #px"\\s+"])
>     (remove* '("") (regexp-split sep str)))
>
> but that *does* use regexps, so I don't see the point, still...

Note that (string-split str ";") works given that implementation,
which I think makes it both easy-to-understand and useful.

--
sam th
sa...@ccs.neu.edu
Re: [racket-dev] `string-split'
> (define (string-split str [sep #px"\\s+"])
>   (remove* '("") (regexp-split sep str)))

Nearly, I meant something more like this:

  (define (string-split str [splitter " "])
    (regexp-split (regexp-quote splitter) str))

No regexp from the user POV, and much easier to use with little
knowledge.
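A small sketch of how a definition along these lines behaves, assuming the default splitter is a single space. The name `split-literal` and the sample strings are hypothetical.

```racket
#lang racket/base
;; Sketch only: hypothetical name; the default of a single space is an
;; assumption about the definition in the message above.
(define (split-literal str [splitter " "])
  ;; regexp-quote escapes regexp metacharacters, so the splitter is
  ;; matched literally rather than interpreted as a pattern.
  (regexp-split (regexp-quote splitter) str))

(split-literal "a b c")      ; => '("a" "b" "c")
(split-literal "a.b.c" ".")  ; => '("a" "b" "c")
(split-literal "a  b")       ; => '("a" "" "b")
```

The last call shows the consequence of not filtering: two adjacent splitters leave an empty string in the result.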
Re: [racket-dev] `string-split'
I agree with this: we should add `string-split', the one-argument case
should be as Eli wrote, and the two-argument case should be as Laurent
wrote. (Probably the optional second argument should be string-or-#f,
where #f means to use #px"\\s+".)

At Thu, 19 Apr 2012 14:30:31 +0200, Laurent wrote:
>> (define (string-split str [sep #px"\\s+"])
>>   (remove* '("") (regexp-split sep str)))
>
> Nearly, I meant something more like this:
>
>   (define (string-split str [splitter " "])
>     (regexp-split (regexp-quote splitter) str))
>
> No regexp from the user POV, and much easier to use with little
> knowledge.
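One way to read this proposal, as a hedged sketch rather than the eventual library code: #f (or no argument) selects the whitespace-words behaviour, and a string is split on literally. The name `split-words-or-on` is hypothetical.

```racket
#lang racket/base
;; Sketch only: a string-or-#f second argument, where #f means
;; "split into whitespace-separated words".
(define (split-words-or-on str [sep #f])
  (if sep
      ;; a string separator is quoted, so it is matched literally
      (regexp-split (regexp-quote sep) str)
      ;; #f: collect the runs of non-whitespace
      (regexp-match* #px"\\S+" str)))

(split-words-or-on " a  b ")      ; => '("a" "b")
(split-words-or-on " a  b " #f)   ; => '("a" "b")
(split-words-or-on "a,b,,c" ",")  ; => '("a" "b" "" "c")
```

Under this reading the one-argument result cannot be reproduced by passing any string, which is the special case discussed in the replies.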
Re: [racket-dev] `string-split'
I think Laurent pointed out in his initial message that beginners may
be intimidated by regexps. I agree. Plus someone who isn't fluent with
regexp may be more comfortable with string-split. Last but not least,
a program documents itself more clearly with string-split vs regexp.

On Apr 19, 2012, at 8:21 AM, Eli Barzilay wrote:
> [Changed title to talk about each one separately.]
>
> Two hours ago, Laurent wrote:
>> One string function that I often find useful in various scripting
>> languages is a `string-split' (explode in php). It can be done with
>> `regexp-split', but having something more along the lines of a
>> `string-split' should belong to a racket/string lib I think. Plus it
>> would be symmetric with `string-join', which already is in
>> racket/string (or at least a doc line pointing to regexp-split
>> should be added there).
>
> If you mean something like this:
>
>   (define (string-split str)
>     (regexp-match* #px"\\S+" str))
>
> ? If so, then I see a much weaker point for it -- unlike other small
> utilities, this one doesn't even compose two function calls.
>
> The very weak point here is if you want a default argument that
> specifies the gaps to split on rather than the words:
>
>   (define (string-split str [sep #px"\\s+"])
>     (remove* '("") (regexp-split sep str)))
>
> but that *does* use regexps, so I don't see the point, still...
>
> --
> ((lambda (x) (x x)) (lambda (x) (x x)))
> Eli Barzilay: http://barzilay.org/
> Maze is Life!
Re: [racket-dev] `string-split'
On Thu, Apr 19, 2012 at 14:33, Matthew Flatt mfl...@cs.utah.edu wrote:
> I agree with this: we should add `string-split', the one-argument
> case should be as Eli wrote,

About this I'm not sure, as one cannot reproduce this behavior by
providing an argument (or it could make the difference between
string-as-not-regexps and regexps? Wouldn't this be different from
other places?).
It would then appear somewhat magical.
To me the default splitter seems more intuitive.

Laurent

> and the two-argument case should be as Laurent wrote. (Probably the
> optional second argument should be string-or-#f, where #f means to
> use #px"\\s+".)
>
> At Thu, 19 Apr 2012 14:30:31 +0200, Laurent wrote:
>>> (define (string-split str [sep #px"\\s+"])
>>>   (remove* '("") (regexp-split sep str)))
>>
>> Nearly, I meant something more like this:
>>
>>   (define (string-split str [splitter " "])
>>     (regexp-split (regexp-quote splitter) str))
>>
>> No regexp from the user POV, and much easier to use with little
>> knowledge.
Re: [racket-dev] `string-split'
At Thu, 19 Apr 2012 14:43:44 +0200, Laurent wrote:
> On Thu, Apr 19, 2012 at 14:33, Matthew Flatt mfl...@cs.utah.edu wrote:
>> I agree with this: we should add `string-split', the one-argument
>> case should be as Eli wrote,
>
> About this I'm not sure, as one cannot reproduce this behavior by
> providing an argument (or it could make the difference between
> string-as-not-regexps and regexps? Wouldn't this be different from
> other places?).

I'm suggesting that supplying `#f' as the argument would be the same
as not supplying the argument.

It is a special case, though. I don't mind the specialness here,
because I see the job of `string-split' as making a couple of useful
special cases easy (as opposed to the generality of `regexp-split').

> It would then appear somewhat magical.
> To me the default splitter seems more intuitive.
>
> Laurent
>
>> and the two-argument case should be as Laurent wrote. (Probably the
>> optional second argument should be string-or-#f, where #f means to
>> use #px"\\s+".)
>>
>> At Thu, 19 Apr 2012 14:30:31 +0200, Laurent wrote:
>>>> (define (string-split str [sep #px"\\s+"])
>>>>   (remove* '("") (regexp-split sep str)))
>>>
>>> Nearly, I meant something more like this:
>>>
>>>   (define (string-split str [splitter " "])
>>>     (regexp-split (regexp-quote splitter) str))
>>>
>>> No regexp from the user POV, and much easier to use with little
>>> knowledge.
Re: [racket-dev] `string-split'
On Thu, Apr 19, 2012 at 14:53, Matthew Flatt mfl...@cs.utah.edu wrote:
> At Thu, 19 Apr 2012 14:43:44 +0200, Laurent wrote:
>> On Thu, Apr 19, 2012 at 14:33, Matthew Flatt mfl...@cs.utah.edu wrote:
>>> I agree with this: we should add `string-split', the one-argument
>>> case should be as Eli wrote,
>>
>> About this I'm not sure, as one cannot reproduce this behavior by
>> providing an argument (or it could make the difference between
>> string-as-not-regexps and regexps? Wouldn't this be different from
>> other places?).
>
> I'm suggesting that supplying `#f' as the argument would be the same
> as not supplying the argument.
>
> It is a special case, though. I don't mind the specialness here,
> because I see the job of `string-split' as making a couple of useful
> special cases easy (as opposed to the generality of `regexp-split').

Then instead of #f one idea is to go one step further and consider
different useful cases based on input symbols like 'whitespaces,
'non-alpha, etc.? Or even a list of string/symbols that can be used as
a splitter. That would make a more powerful function for sure.
(It's just that I'm troubled by the uniqueness of this magical default
argument)

Laurent

>> It would then appear somewhat magical.
>> To me the default splitter seems more intuitive.
>>
>> Laurent
>>
>>> and the two-argument case should be as Laurent wrote. (Probably the
>>> optional second argument should be string-or-#f, where #f means to
>>> use #px"\\s+".)
>>>
>>> At Thu, 19 Apr 2012 14:30:31 +0200, Laurent wrote:
>>>>> (define (string-split str [sep #px"\\s+"])
>>>>>   (remove* '("") (regexp-split sep str)))
>>>>
>>>> Nearly, I meant something more like this:
>>>>
>>>>   (define (string-split str [splitter " "])
>>>>     (regexp-split (regexp-quote splitter) str))
>>>>
>>>> No regexp from the user POV, and much easier to use with little
>>>> knowledge.
Re: [racket-dev] `string-split'
A few minutes ago, Laurent wrote:
> Then instead of #f one idea is to go one step further and consider
> different useful cases based on input symbols like 'whitespaces,
> 'non-alpha, etc.? Or even a list of string/symbols that can be used
> as a splitter. That would make a more powerful function for sure.
> (It's just that I'm troubled by the uniqueness of this magical
> default argument)

(This is something that I do object to... It leads to srfi-14 which
is one overkill way for that, and we already have regexps that do
that. So I think that simple is a major point.)

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/
Maze is Life!
Re: [racket-dev] `string-split'
[Meta-note: I'm not flatly objecting to these, just trying to clarify
the exact behavior and the possible effects on other functions.]

10 minutes ago, Laurent wrote:
>> (define (string-split str [sep #px"\\s+"])
>>   (remove* '("") (regexp-split sep str)))
>
> Nearly, I meant something more like this:
>
>   (define (string-split str [splitter " "])
>     (regexp-split (regexp-quote splitter) str))
>
> No regexp from the user POV, and much easier to use with little
> knowledge.

That doesn't seem right -- with this you get

  -> (string-split "  st  ring")
  '("" "" "st" "" "ring")

which is why I think that the above is a better definition in terms of
newbie-ness.

10 minutes ago, Matthew Flatt wrote:
> I agree with this: we should add `string-split', the one-argument
> case should be as Eli wrote, and the two-argument case should be as
> Laurent wrote. (Probably the optional second argument should be
> string-or-#f, where #f means to use #px"\\s+".)

Continuing with this line, it seems that a better definition is as
follows:

  (define (string-split str [sep " "])
    (remove* '("") (regexp-split (regexp-quote (or sep " ")) str)))

Except that the full definition could be a bit more efficient.

Three questions:

1. Laurent: Does this make more sense?

2. Matthew: Is there any reason to make the #f-as-default part of the
   interface? (Even with the new reply I don't see a necessity for
   this -- if the target is newbies, then I think that keeping it as a
   string is simpler...)

3. There's also the point of how this optional argument plays with
   other functions in `racket/string'. If it works as above, then
   `string-trim' and `string-normalize-spaces' should change
   accordingly so they take the same kind of input simplified regexp.

4. Related to Q3: what does "xy" as that argument mean exactly?
   a. #rx"[xy]"
   b. #rx"[xy]+"
   c. #rx"xy"
   d. #rx"(?:xy)+"

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/
Maze is Life!
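To make the four readings of "xy" in Q4 concrete, here is a sketch that runs each candidate pattern through `regexp-split`. The helper name `split-each` and both sample strings are assumptions made for illustration; the patterns are the four options listed in the message.

```racket
#lang racket/base
;; Sketch only: compare the four candidate meanings of an "xy"
;; separator by splitting the same input with each pattern.
(define candidates
  (list (cons 'a #rx"[xy]")      ; any single x or y
        (cons 'b #rx"[xy]+")     ; any run of x's and y's
        (cons 'c #rx"xy")        ; the literal string xy
        (cons 'd #rx"(?:xy)+"))) ; one or more repetitions of xy

;; Returns one (label . pieces) entry per candidate.
(define (split-each str)
  (for/list ([c (in-list candidates)])
    (cons (car c) (regexp-split (cdr c) str))))

(split-each "1xyxy2")
;; => '((a "1" "" "" "" "2") (b "1" "2") (c "1" "" "2") (d "1" "2"))

(split-each "1xyyx2")
;; => '((a "1" "" "" "" "2") (b "1" "2") (c "1" "yx2") (d "1" "yx2"))
```

The first input separates c from d, and the second separates b from d, so the choice among the options changes results even on short strings.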
Re: [racket-dev] `string-split'
> Continuing with this line, it seems that a better definition is as
> follows:
>
>   (define (string-split str [sep " "])
>     (remove* '("") (regexp-split (regexp-quote (or sep " ")) str)))
>
> Except that the full definition could be a bit more efficient.
>
> Three questions:
>
> 1. Laurent: Does this make more sense?

Yes, this definitely makes more sense to me.
It would then treat (string-split "aXXby" "X") just like the " " case.
Although if you want to find the columns of a latex line like
"x && y & z" you will have the wrong result.
Maybe use an optional argument to remove the empty strings? (not sure)

> 2. Matthew: Is there any reason to make the #f-as-default part of the
>    interface? (Even with the new reply I don't see a necessity for
>    this -- if the target is newbies, then I think that keeping it as
>    a string is simpler...)

There is probably no need for #f with the new spec.

> 4. Related to Q3: what does "xy" as that argument mean exactly?
>    a. #rx"[xy]"
>    b. #rx"[xy]+"
>    c. #rx"xy"
>    d. #rx"(?:xy)+"

Good question. d. would be the simplest case for newbies, but b. might
be more useful.
I think several other languages avoid this issue by using only one
character as the separator.

Laurent
Re: [racket-dev] `string-split'
>> 4. Related to Q3: what does "xy" as that argument mean exactly?
>>    a. #rx"[xy]"
>>    b. #rx"[xy]+"
>>    c. #rx"xy"
>>    d. #rx"(?:xy)+"
>
> Good question. d. would be the simplest case for newbies, but b.
> might be more useful.

It would make more sense that a string really is a string, not a set
of characters.
Without going as far as srfi-14, a set could be a list of strings or
characters, but maybe this is not needed.

Laurent
Re: [racket-dev] `string-split'
Just now, Laurent wrote:
>> 1. Laurent: Does this make more sense?
>
> Yes, this definitely makes more sense to me.
> It would then treat (string-split "aXXby" "X") just like the " " case.
> Although if you want to find the columns of a latex line like
> "x && y & z" you will have the wrong result.
> Maybe use an optional argument to remove the empty strings? (not sure)

(This complicates things...)

First, I don't think that there's a need to make it able to do stuff
like that -- either you go with regexps, or you use combinations like

  (map string-trim (string-split "x & y & z" "&"))

>> 4. Related to Q3: what does "xy" as that argument mean exactly?
>>    a. #rx"[xy]"
>>    b. #rx"[xy]+"
>>    c. #rx"xy"
>>    d. #rx"(?:xy)+"
>
> Good question. d. would be the simplest case for newbies, but b.
> might be more useful.
> I think several other languages avoid this issue by using only one
> character as the separator.

The complication is that with " " or "\t" it seems that you'd want b,
and with "&" you'd want c. (Maybe even make "&" equivalent to
#rx" *& *" -- that looks like it's too much guessing.)

And you're also making a point for:

   e. Throw an error, must be a single-character string.

BTW, this question is important because it affects other functions, so
I'd like to resolve it before doing anything.

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/
Maze is Life!
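A sketch of the "split, then trim" combination mentioned in this message. It assumes `string-trim` from `racket/string`, a hypothetical literal splitter named `split-literal` that keeps empty fields, and "&" as the column separator; the sample rows are illustrative only.

```racket
#lang racket/base
(require (only-in racket/string string-trim))

;; Sketch only: hypothetical helper that splits on a literal separator
;; and keeps empty fields, so that empty columns stay visible.
(define (split-literal str sep)
  (regexp-split (regexp-quote sep) str))

;; Split into columns first, then trim the whitespace around each one.
(map string-trim (split-literal "x & y & z" "&"))
;; => '("x" "y" "z")

(map string-trim (split-literal "x && y & z" "&"))
;; => '("x" "" "y" "z")
```

Keeping the empty field in the second call is what a column-oriented caller would want; a version that drops empty strings would return three columns for both rows.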
Re: [racket-dev] `string-split'
An hour and a half ago, Ryan Culpepper wrote:
> Instead of trying to design a 'string-split' that is both
> miraculously intuitive and profoundly flexible, why not design it
> like a Model-T

Invalid analogy: the issue is not flexibility, it's making something
that is simple (first) and useful (second) in most cases.

An hour and a half ago, Michael W wrote:
> (TL;DR: I'd suggest two functions: one (string-words str) function
> that does Eli's way, and one (string-split str sep) that does it
> Laurent's way).

I don't think that we argued on what it should do, rather it looks
like we're both looking for whatever option looks best...

>>   -> (string-split "  st  ring")
>>   '("" "" "st" "" "ring")
>>
>> which is why I think that the above is a better definition in terms
>> of newbie-ness.
>
> No, every other language I've worked with does that. [...]

The examples you're quoting are the equivalents of our `regexp-split',
which works in a similar way and is not going to change. We're talking
about some watered-down version that is easier to use.

Just now, Laurent wrote:
>> (TL;DR: I'd suggest two functions: one (string-words str) function
>> that does Eli's way, and one (string-split str sep) that does it
>> Laurent's way).
>
> That would be a good option to me, considering that my way is with
> remaining ""s in the output list.
> The question remains if a string can be accepted for sep, in which
> case the empty string must be considered, as pointed out in the Lua
> discussion. Though a single char should be sufficient for nearly all
> simple cases.

I think that I have a good conclusion here, I'll post on a new thread.

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/
Maze is Life!