Re: Proposed Change to str-utils

David Nolen Mon, 23 Mar 2009 18:27:18 -0700

Looks interesting and maybe even very useful. Why not put your code on
GitHub or some other public repo of your liking? It's much nicer than
pasting all this code ;)


On Mon, Mar 23, 2009 at 9:18 PM, Sean <francoisdev...@gmail.com> wrote:

>
> Hello Everyone,
> I've been reviewing the str-utils package, and I'd like to propose a
> few changes to the library.  I've included the code at the bottom.
>
> USE MULTI-METHODS
>
> I'd like to propose re-writing the following methods to use
> multi-methods.  Every single method will take an input called input-string.
>
> *re-split[input-string & remaining-inputs](...)*
>
> The remaining inputs can be dispatched based on a regex pattern, a
> list of patterns, or a map.
>
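A minimal sketch of that dispatch, for illustration only. The names
input-kind and describe-split are hypothetical. Classifying the argument
with predicates, rather than by concrete class as the posted code does, is
an assumption made here so that any map or sequential collection is accepted.

```clojure
(defn input-kind
  "Hypothetical helper: classify the argument that follows the input string."
  [x]
  (cond
    (instance? java.util.regex.Pattern x) :pattern
    (map? x)                              :options
    (sequential? x)                       :patterns
    :else                                 :unknown))

;; Dispatch on the kind of the first argument after the input string.
(defmulti describe-split
  (fn [_input-string & more] (input-kind (first more))))

(defmethod describe-split :pattern  [_ _] "split once by a regex")
(defmethod describe-split :patterns [_ _] "split recursively by each pattern")
(defmethod describe-split :options  [_ _] "split using :pattern, :limit and :marshal-fn")

(describe-split "1 2 3" #"\s+")                 ;; => "split once by a regex"
(describe-split "1 2 3" (list #"\n" #"\s+"))    ;; => "split recursively by each pattern"
(describe-split "1 2 3" {:pattern #"\s+"})      ;; => "split using :pattern, :limit and :marshal-fn"
```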
> regex pattern - splits a string into a list, like it does now.
> e.g. (re-split "1 2 3\n4 5 6" #"\n") => ("1 2 3" "4 5 6")
>
> list - this splits each element either like a map or a regex.  The map
> operator is applied recursively to each element
> e.g. (re-split "1 2 3\n4 5 6" (list #"\n" #"\s+")) => (("1" "2" "3")
> ("4" "5" "6"))
>
> map - this splits each element based on the inputs of the map.  It is
> how options are passed to the method.
> e.g. (re-split "1 2 3" {:pattern #"\s+" :limit 2
> :marshal-fn #(java.lang.Double/parseDouble %)}) => (1.0 2.0)
> The :pattern and :limit options are relatively straightforward.
> The :marshal-fn is mapped after the string is split.
>
> These items can be chained together, as the following example shows
> e.g. (re-split "1 2 3\n4 5 6" (list #"\n" {:pattern #"\s+" :limit
> 2 :marshal-fn #(java.lang.Double/parseDouble %)})) => ((1.0 2.0) (4.0
> 5.0))
>
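The recursive list behaviour described above can be sketched without the
multi-method machinery. This is only an illustration: split-nested is a
hypothetical name, it handles plain regex patterns only (no option maps), and
it calls java.util.regex.Pattern's split method directly.

```clojure
(defn split-nested
  "Hypothetical helper: split s by the first pattern, then split every
  resulting piece by the remaining patterns."
  [s patterns]
  (if (empty? patterns)
    s
    (map #(split-nested % (rest patterns))
         (seq (.split ^java.util.regex.Pattern (first patterns) s)))))

(split-nested "1 2 3\n4 5 6" [#"\n" #"\s+"])
;; => (("1" "2" "3") ("4" "5" "6"))

;; Marshalling the innermost strings afterwards, as :marshal-fn would:
(map (fn [row] (map #(Double/parseDouble %) row))
     (split-nested "1 2 3\n4 5 6" [#"\n" #"\s+"]))
;; => ((1.0 2.0 3.0) (4.0 5.0 6.0))
```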
> In my opinion, the :marshal-fn is best used at the end of the list.
> It could be used earlier in the list, but an exception will most
> likely be thrown, because later patterns would then be applied to
> non-string values.
>
>
> *re-partition[input-string & remaining-inputs]
>
> This method behaves like the original re-partition method, with the
> remaining-inputs being either a list or a pattern.  I don't see a
> need to change the behavior of this method at the moment.
>
> *re-gsub[input-string & remaining-inputs]
>
> This method can take a list or two atoms as the remaining inputs.
>
> Two atoms -
> e.g. (re-gsub "1 2 3 4 5 6" #"\s" "") => "123456"
>
> A paired list
> e.g (re-gsub "1 2 3 4 5 6" '((#"\s" "") (#"\d" "D"))) => "DDDDDD"
>
> *re-sub[input-string & remaining-inputs]
>
> Again, this method can take a list or two atoms as the remaining
> inputs.
>
> Two atoms
> e.g. (re-sub "1 2 3 4 5 6" #"\d" "D") => "D 2 3 4 5 6"
>
> A paired list
> e.g (re-sub "1 2 3 4 5 6" '((#"\d" "D") (#"\d" "E"))) => "D E 3 4 5 6"
>
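The paired-list form amounts to folding the string through each
pattern/replacement pair in order. A rough sketch, assuming the
clojure.string namespace that ships with later Clojure releases; gsub-pairs
and sub-pairs are hypothetical names, not the proposed functions.

```clojure
(require '[clojure.string :as str])

(defn gsub-pairs
  "Hypothetical helper: apply each [regex replacement] pair in order,
  replacing every match."
  [s pairs]
  (reduce (fn [acc [re replacement]] (str/replace acc re replacement))
          s
          pairs))

(defn sub-pairs
  "Hypothetical helper: apply each [regex replacement] pair in order,
  replacing only the first match each time."
  [s pairs]
  (reduce (fn [acc [re replacement]] (str/replace-first acc re replacement))
          s
          pairs))

(gsub-pairs "1 2 3 4 5 6" [[#"\s" ""] [#"\d" "D"]])  ;; => "DDDDDD"
(sub-pairs  "1 2 3 4 5 6" [[#"\d" "D"] [#"\d" "E"]]) ;; => "D E 3 4 5 6"
```

Note that each pair sees the output of the previous one, which is why the
second \d in the sub-pairs example lands on the "2".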
> NEW PARSING HELPERS
> I've created four methods: str-before, str-before-inc, str-after, and
> str-after-inc.  They are designed to help strip off the parts of a
> string before or after a regex match.
>
> (str-before "Clojure Is Awesome" #"\s") => "Clojure"
> (str-before-inc "Clojure Is Awesome" #"\s") => "Clojure "
> (str-after "Clojure Is Awesome" #"\s") => "Is Awesome"
> (str-after-inc "Clojure Is Awesome" #"\s") => " Is Awesome"
>
> These methods can be used to help parse strings
>
> (str-before (str-after "<h4 ... >" #"<h4") #">") => ;the stuff in the
> middle
>
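For comparison, the before/after helpers can be sketched directly on a
java.util.regex.Matcher. The names before-match and after-match are
hypothetical, and returning the whole string (or "") when nothing matches is
an assumption of this sketch, not something the proposal specifies.

```clojure
(defn before-match
  "Hypothetical helper: text before the first match of re,
  or all of s when there is no match."
  [s re]
  (let [m (re-matcher re s)]
    (if (.find m)
      (subs s 0 (.start m))
      s)))

(defn after-match
  "Hypothetical helper: text after the first match of re,
  or the empty string when there is no match."
  [s re]
  (let [m (re-matcher re s)]
    (if (.find m)
      (subs s (.end m))
      "")))

(before-match "Clojure Is Awesome" #"\s") ;; => "Clojure"
(after-match  "Clojure Is Awesome" #"\s") ;; => "Is Awesome"

;; Pulling out the attributes of a made-up tag:
(before-match (after-match "<h4 class=\"x\">" #"<h4") #">")
;; => " class=\"x\""
```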
> NEW INFLECTORS
> I've added a few inflectors that I am familiar with from Rails.  My
> apologies if their origin is another language.  I'd be interested in
> knowing where the methods originated.
>
> str-reverse
> This method reverses a string
> e.g. (str-reverse "Clojure") => "erujolC"
>
> trim
> This is a convenience wrapper for the trim method java supplies
> e.g. (trim "  Clojure  ") => "Clojure"
>
> strip
> This is an alias for trim.  I accidentally switch between *trim* and
> *strip* all the time.
> e.g. (strip "  Clojure  ") => "Clojure"
>
> ltrim
> This method removes the leading whitespace
> e.g. (ltrim "  Clojure  ") => "Clojure  "
>
> rtrim
> This method removes the trailing whitespace
> e.g. (rtrim "  Clojure  ") => "  Clojure"
>
> downcase
> This is a convenience wrapper for the toLowerCase method java supplies
> e.g. (downcase "Clojure") => "clojure"
>
> upcase
> This is a convenience wrapper for the toUpperCase method java supplies
> e.g. (upcase "Clojure") => "CLOJURE"
>
> capitalize
> This method capitalizes a string
> e.g (capitalize "clojure") => "Clojure"
>
> titleize, camelize, dasherize, underscore
> These methods manipulate "sentences", producing a consistent output.
> Check the unit tests for more examples
> (titleize "clojure iS Awesome") => "Clojure Is Awesome"
> (camelize "clojure iS Awesome") => "clojureIsAwesome"
> (dasherize "clojure iS Awesome") => "clojure-is-awesome"
> (underscore "clojure iS Awesome") => "clojure_is_awesome"
>
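The four "sentence" inflectors share one shape: split into words, normalise
the case of each word, join with a separator. A short sketch of that shape,
assuming the clojure.string namespace from later Clojure releases. The names
words, titleize*, camelize* and dasherize* are hypothetical and are not the
functions in the code below.

```clojure
(require '[clojure.string :as str])

(defn words
  "Hypothetical helper: split on runs of whitespace, underscores and dashes."
  [s]
  (str/split (str/trim s) #"[\s_-]+"))

(defn titleize* [s]
  (str/join " " (map str/capitalize (words s))))

(defn camelize* [s]
  (let [[w & more] (words s)]
    ;; First word lower-cased, every later word capitalized, no separator.
    (apply str (str/lower-case w) (map str/capitalize more))))

(defn dasherize* [s]
  (str/join "-" (map str/lower-case (words s))))

(titleize*  "clojure iS Awesome") ;; => "Clojure Is Awesome"
(camelize*  "clojure iS Awesome") ;; => "clojureIsAwesome"
(dasherize* "clojure iS Awesome") ;; => "clojure-is-awesome"
```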
> *FINAL THOUGHTS*
> There are three more methods, str-join, chop, and chomp, that were
> already in str-utils.  I changed the implementation of the methods, but
> the behavior should be the same.
>
> There is a big catch with my proposed change.  The signature of re-
> split, re-partition, re-gsub and re-sub changes.  They will not be
> backwards compatible, and will break code.  However, I think the
> flexibility is worth it.
>
> *TO-DOs*
> There are a few more things I'd like to add, but that could be done at a
> later date.
>
> *Add more inflectors
>
> The following additions become pretty easy if the proposed re-gsub is
> included:
>
> *Add HTML-escape function (like Rails' h method)
> *Add Javascript-escape function (like Rails' javascript-escape method)
> *Add SQL-escape function
>
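As an illustration of why a paired-list substitution would make these
escapers short, here is a rough HTML-escape sketch. It is an assumption-laden
example: html-escape-sketch is a hypothetical name, it uses
clojure.string/replace from later Clojure releases instead of the proposed
re-gsub, and it covers only four characters, so it is not a complete or
security-reviewed escaper.

```clojure
(require '[clojure.string :as str])

(def html-escape-pairs
  ;; "&" must come first, otherwise the entities added by later pairs
  ;; would themselves be escaped again.
  [[#"&"  "&amp;"]
   [#"<"  "&lt;"]
   [#">"  "&gt;"]
   [#"\"" "&quot;"]])

(defn html-escape-sketch
  "Hypothetical helper: apply each [regex replacement] pair in order."
  [s]
  (reduce (fn [acc [re replacement]] (str/replace acc re replacement))
          s
          html-escape-pairs))

(html-escape-sketch "<a href=\"x\">Tom & Jerry</a>")
;; => "&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;"
```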
> Okay, that's everything I can think of for now.  I'd like to thank
> Stuart Sierra and all of the contributors to this library.  This is
> possible because I'm standing on their shoulders.
>
> Oh, and I apologize for not putting this up on github, especially
> after I asked someone else to do the same yesterday.  I'll try not to
> be so hypocritical going forward.
>
> *CODE*
>
> (ns devlinsf.str-utils)
>
> ;;; String Merging & Slicing
>
> (defn str-join
>  "Returns a string of all elements in 'sequence', separated by
>  'separator'.  Like Perl's 'join'."
>  [separator sequence]
>  (apply str (interpose separator sequence)))
>
>
> (defmulti re-split (fn[input-string & remaining-inputs] (class (first
> remaining-inputs))))
>
> (defmethod re-split java.util.regex.Pattern
>  ([string #^java.util.regex.Pattern pattern] (seq (. pattern (split
> string)))))
>
> (defmethod re-split clojure.lang.PersistentList
>  [input-string patterns]
>  (let [reversed (reverse patterns)
>        pattern (first reversed)
>        remaining (rest reversed)]
>    (if (empty? remaining)
>      (re-split input-string pattern)
>      (map #(re-split % pattern) (re-split input-string (reverse
> remaining))))))
>
> (defmethod re-split clojure.lang.PersistentArrayMap
>  [input-string map-options]
>  (cond (:limit map-options) (take (:limit map-options) (re-split
> input-string (dissoc map-options :limit)))
>        (:marshal-fn map-options) (map (:marshal-fn map-options) (re-split
> input-string (dissoc map-options :marshal-fn)))
>        'true (re-split input-string (:pattern map-options))))
>
> (defmulti re-partition (fn[input-string & remaining-inputs] (class
> (first remaining-inputs))))
>
> (defmethod re-partition java.util.regex.Pattern
>  [string #^java.util.regex.Pattern re]
>  (let [m (re-matcher re string)]
>    ((fn step [prevend]
>       (lazy-seq
>        (if (.find m)
>          (cons (.subSequence string prevend (.start m))
>                (cons (re-groups m)
>                      (step (+ (.start m) (count (.group m))))))
>          (when (< prevend (.length string))
>            (list (.subSequence string prevend (.length string)))))))
>     0)))
>
> (defmethod re-partition clojure.lang.PersistentList
>  [input-string patterns]
>  (let [reversed (reverse patterns)
>        pattern (first reversed)
>        remaining (rest reversed)]
>    (if (empty? remaining)
>      (re-partition input-string pattern)
>      (map #(re-partition % pattern) (re-partition input-string
> (reverse remaining))))))
>
> (defmulti re-gsub (fn[input-string & remaining-inputs] (class (first
> remaining-inputs))))
>
> (defmethod re-gsub java.util.regex.Pattern
>  [#^String string #^java.util.regex.Pattern regex replacement]
>  (if (ifn? replacement)
>    (let [parts (vec (re-partition string regex))]
>      (apply str
>             (reduce (fn [parts match-idx]
>                       (update-in parts [match-idx] replacement))
>                     parts (range 1 (count parts) 2))))
>    (.. regex (matcher string) (replaceAll replacement))))
>
> (defmethod re-gsub clojure.lang.PersistentList
>  [input-string regex-pattern-pairs]
>  (let [reversed (reverse regex-pattern-pairs)
>        pair (first reversed)
>        remaining (rest reversed)]
>    (if (empty? remaining)
>      (re-gsub input-string (first pair) (second pair))
>      (re-gsub (re-gsub input-string (reverse remaining)) (first pair)
> (second pair)))))
>
>
> (defmulti re-sub (fn[input-string & remaining-inputs] (class (first
> remaining-inputs))))
>
> (defmethod re-sub java.util.regex.Pattern
>  [#^String string #^java.util.regex.Pattern regex replacement ]
>  (if (ifn? replacement)
>    (let [m (re-matcher regex string)]
>      (if (.find m)
>        (str (.subSequence string 0 (.start m))
>             (replacement (re-groups m))
>             (.subSequence string (.end m) (.length string)))
>        string))
>    (.. regex (matcher string) (replaceFirst replacement))))
>
> (defmethod re-sub clojure.lang.PersistentList
>  [input-string regex-pattern-pairs]
>  (let [reversed (reverse regex-pattern-pairs)
>        pair (first reversed)
>        remaining (rest reversed)]
>    (if (empty? remaining)
>      (re-sub input-string (first pair) (second pair))
>      (re-sub (re-sub input-string (reverse remaining)) (first pair)
> (second pair)))))
>
> ;;; Parsing Helpers
> (defn str-before [input-string regex]
>  (let [matches (re-partition input-string regex)]
>    (first matches)))
>
> (defn str-before-inc [input-string regex]
>  (let [matches (re-partition input-string regex)]
>    (str (first matches) (second matches))))
>
> (defn str-after [input-string regex]
>  (let [matches (re-partition input-string regex)]
>    (str-join "" (rest (rest matches)))))
>
> (defn str-after-inc [input-string regex]
>  (let [matches (re-partition input-string regex)]
>    (str-join "" (rest matches))))
>
>
> ;;; Inflectors
> ;;; These methods only take the input string.
> (defn str-reverse
>  "This method accepts a string and returns the reversed string."
>  [input-string]
>  (apply str (reverse input-string)))
>
>
> (defn upcase
>  "Converts the entire string to upper case"
>  [input-string]
>  (. input-string toUpperCase))
>
> (defn downcase
>  "Converts the entire string to lower case"
>  [input-string]
>  (. input-string toLowerCase))
>
> (defn trim
>  "Shortcut for String.trim"
>  [input-string]
>  (. input-string trim))
>
> (defn strip
>  "Alias for trim, like Ruby."
>  [input-string]
>  (trim input-string))
>
> (defn ltrim
>  "This method chops all of the leading whitespace."
>  [input-string]
>  ;; Anchored at the start, so a string without leading whitespace is
>  ;; returned unchanged.
>  (re-sub input-string #"^\s+" ""))
>
> (defn rtrim
>  "This method chops all of the trailing whitespace."
>  [input-string]
>  ;; Anchored at the end, so a string without trailing whitespace is
>  ;; returned unchanged.
>  (re-sub input-string #"\s+$" ""))
>
> (defn chop
>  "Removes the last character of string."
>  [input-string]
>  (subs input-string 0 (dec (count input-string))))
>
> (defn chomp
>  "Removes all trailing newline \\n or return \\r characters from
>  string.  Note: String.trim() is similar and faster."
>  [input-string]
>  ;; Anchored at the end, so only trailing line terminators are removed.
>  (re-sub input-string #"[\r\n]+$" ""))
>
> (defn capitalize
>  "This method turns a string into a capitalized version, Xxxx"
>  [input-string]
>  (str-join "" (list
>                (upcase (str (first input-string)))
>                (downcase (apply str (rest input-string))))))
>
> (defn titleize
>  "This method takes an input string, splits it across whitespace,
> dashes, and underscores.  Each word is capitalized, and the result is
> joined with \" \"."
>  [input-string]
>  (let [words (re-split input-string #"[\s_-]+")]
>    (str-join " " (map capitalize words))))
>
> (defn camelize
>  "This method takes an input string, splits it across whitespace,
> dashes, and underscores.  The first word is downcased, the rest are
> capitalized, and the result is joined with \"\"."
>  [input-string]
>  (let [words (re-split input-string #"[\s_-]+")]
>    (str-join "" (cons (downcase (first words)) (map capitalize (rest
> words))))))
>
> (defn dasherize
>  "This method takes an input string, splits it across whitespace,
> dashes, and underscores.  Each word is downcased, and the result is
> joined with \"-\"."
>  [input-string]
>  (let [words (re-split input-string #"[\s_-]+")]
>    (str-join "-" (map downcase words))))
>
> (defn underscore
>  "This method takes an input string, splits it across whitespace,
> dashes, and underscores.  Each word is downcased, and the result is
> joined with \"_\"."
>  [input-string]
>  (let [words (re-split input-string #"[\s_-]+")]
>    (str-join "_" (map downcase words))))
>
> ;;; Escapees
>
> ;TO-DO
>
> ;(defn sql-escape[x])
> ;(defn html-escape[x])
> ;(defn javascript-escape[x])
> ;(defn pdf-escape)
>
>
> *UNIT TESTS*
> (ns devlinsf.test-contrib.str-utils
>    (:use clojure.contrib.test-is
>          devlinsf.str-utils))
>
> (deftest test-str-reverse
>  (is (= (str-reverse "Clojure") "erujolC")))
>
> (deftest test-downcase
>  (is (= (downcase "Clojure") "clojure")))
>
> (deftest test-upcase
>  (is (= (upcase "Clojure") "CLOJURE")))
>
> (deftest test-trim
>  (is (= (trim "  Clojure  ") "Clojure")))
>
> (deftest test-strip
>  (is (= (strip "  Clojure  ") "Clojure")))
>
> (deftest test-ltrim
>  (is (= (ltrim "  Clojure  ") "Clojure  ")))
>
> (deftest test-rtrim
>  (is (= (rtrim "  Clojure  ") "  Clojure")))
>
> (deftest test-chop
>  (is (= (chop "Clojure") "Clojur")))
>
> (deftest test-chomp
>  (is (= (chomp "Clojure \n") "Clojure "))
>  (is (= (chomp "Clojure \r") "Clojure "))
>  (is (= (chomp "Clojure \n\r") "Clojure ")))
>
> (deftest test-capitalize
>  (is (= (capitalize "clojure") "Clojure")))
>
> (deftest test-titleize
>  (let [expected-string "Clojure Is Awesome"]
>    (is (= (titleize "clojure is awesome") expected-string))
>    (is (= (titleize "clojure   is  awesome") expected-string))
>    (is (= (titleize "CLOJURE IS AWESOME") expected-string))
>    (is (= (titleize "clojure-is-awesome") expected-string))
>    (is (= (titleize "clojure- _ is---awesome") expected-string))
>    (is (= (titleize "clojure_is_awesome") expected-string))))
>
> (deftest test-camelize
>  (let [expected-string "clojureIsAwesome"]
>    (is (= (camelize "clojure is awesome") expected-string))
>    (is (= (camelize "clojure   is  awesome") expected-string))
>    (is (= (camelize "CLOJURE IS AWESOME") expected-string))
>    (is (= (camelize "clojure-is-awesome") expected-string))
>    (is (= (camelize "clojure- _ is---awesome") expected-string))
>    (is (= (camelize "clojure_is_awesome") expected-string))))
>
> (deftest test-underscore
>  (let [expected-string "clojure_is_awesome"]
>    (is (= (underscore "clojure is awesome") expected-string))
>    (is (= (underscore "clojure   is  awesome") expected-string))
>    (is (= (underscore "CLOJURE IS AWESOME") expected-string))
>    (is (= (underscore "clojure-is-awesome") expected-string))
>    (is (= (underscore "clojure- _ is---awesome") expected-string))
>    (is (= (underscore "clojure_is_awesome") expected-string))))
>
> (deftest test-dasherize
>  (let [expected-string "clojure-is-awesome"]
>    (is (= (dasherize "clojure is awesome") expected-string))
>    (is (= (dasherize "clojure   is  awesome") expected-string))
>    (is (= (dasherize "CLOJURE IS AWESOME") expected-string))
>    (is (= (dasherize "clojure-is-awesome") expected-string))
>    (is (= (dasherize "clojure- _ is---awesome") expected-string))
>    (is (= (dasherize "clojure_is_awesome") expected-string))))
>
> (deftest test-str-before
>  (is (= (str-before "Clojure Is Awesome" #"Is") "Clojure ")))
>
> (deftest test-str-before-inc
>  (is (= (str-before-inc "Clojure Is Awesome" #"Is") "Clojure Is")))
>
> (deftest test-str-after
>  (is (= (str-after "Clojure Is Awesome" #"Is") " Awesome")))
>
> (deftest test-str-after-inc
>  (is (= (str-after-inc "Clojure Is Awesome" #"Is") "Is Awesome")))
>
> (deftest test-str-join
>  (is (= (str-join " " '("A" "B")) "A B")))
>
> (deftest test-re-split-single-regex
>  (let [source-string "1\t2\t3\n4\t5\t6"]
>    (is (= (re-split source-string #"\n") '("1\t2\t3" "4\t5\t6")))))
>
> (deftest test-re-split-single-map
>  (let [source-string "1\t2\t3\n4\t5\t6"]
>    (is (= (re-split source-string {:pattern #"\n"}) '("1\t2\t3"
> "4\t5\t6")))
>    (is (= (re-split source-string {:pattern #"\n" :limit 1})
> '("1\t2\t3")))
>    (is (= (re-split source-string {:pattern #"\n" :marshal-fn #(str %
> "\ta")}) '("1\t2\t3\ta" "4\t5\t6\ta")))
>    (is (= (re-split source-string {:pattern #"\n" :limit 1
>                                    :marshal-fn #(str % "\ta")})
>           '("1\t2\t3\ta")))
>    ))
>
> (deftest test-re-split-single-element-list
>  (let [source-string "1\t2\t3\n4\t5\t6"]
>    (is (= (re-split source-string (list #"\n")) '("1\t2\t3"
> "4\t5\t6")))))
>
> (deftest test-re-split-pure-list
>  (let [source-string "1\t2\t3\n4\t5\t6"]
>    (is (= (re-split source-string (list #"\n" #"\t")) '(("1" "2" "3")
> ("4" "5" "6"))))))
>
> (deftest test-re-split-mixed-list
>  (let [source-string "1\t2\t3\n4\t5\t6"]
>    (is (= (re-split source-string (list {:pattern #"\n" :limit 1}
> #"\t")) '(("1" "2" "3"))))
>    (is (= (re-split source-string (list {:pattern #"\n" :limit 1}
> {:pattern #"\t" :limit 2})) '(("1" "2"))))
>    (is (= (re-split source-string (list
>                                    {:pattern #"\n" :limit 1}
>                                    {:pattern #"\t" :limit 2
>                                     :marshal-fn #(java.lang.Double/parseDouble %)}))
>           '((1.0 2.0))))
>    (is (= (re-split source-string (list
>                                    {:pattern #"\n"}
>                                    {:pattern #"\t"
>                                     :marshal-fn #(java.lang.Double/parseDouble %)}))
>           '((1.0 2.0 3.0) (4.0 5.0 6.0))))
>    (is (= (map #(reduce + %)
>                (re-split source-string (list
>                                         {:pattern #"\n"}
>                                         {:pattern #"\t"
>                                          :marshal-fn #(java.lang.Double/parseDouble %)})))
>           '(6.0 15.0)))
>    (is (= (reduce + (map #(reduce + %)
>                          (re-split source-string (list
>                                                   {:pattern #"\n"}
>                                                   {:pattern #"\t"
>                                                    :marshal-fn #(java.lang.Double/parseDouble %)}))))
>           21.0))
>    ))
>
> (deftest test-re-partition
>  (is (= (re-partition "Clojure Is Awesome" #"\s+") '("Clojure" " "
> "Is" " " "Awesome"))))
>
> (deftest test-re-gsub
>  (let [source-string "1\t2\t3\n4\t5\t6"]
>    (is (= (re-gsub source-string #"\s+" " ") "1 2 3 4 5 6"))
>    (is (= (re-gsub source-string '((#"\s+" " "))) "1 2 3 4 5 6"))
>    (is (= (re-gsub source-string '((#"\s+" " ") (#"\d" "D")))
>           "D D D D D D"))))
>
> (deftest test-re-sub
>  (let [source-string "1 2 3 4 5 6"]
>    (is (= (re-sub source-string #"\d" "D") "D 2 3 4 5 6"))
>    (is (= (re-sub source-string '((#"\d" "D") (#"\d" "E")))
>           "D E 3 4 5 6"))))
>
> >
>

