Looks interesting and maybe even very useful. Why not put your code on GitHub or some other public repo of your liking? It's much nicer than pasting all this code ;)
On Mon, Mar 23, 2009 at 9:18 PM, Sean <francoisdev...@gmail.com> wrote:
>
> Hello Everyone,
> I've been reviewing the str-utils package, and I'd like to propose a
> few changes to the library. I've included the code at the bottom.
>
> USE MULTI-METHODS
>
> I'd like to propose rewriting the following methods to use
> multi-methods. Every single method will take an input called input-string.
>
> *re-split[input-string & remaining-inputs](...)*
>
> The remaining inputs can be dispatched based on a regex pattern, a
> list of patterns, or a map.
>
> regex pattern - splits a string into a list, like it does now.
> e.g. (re-split "1 2 3\n4 5 6" #"\n") => ("1 2 3" "4 5 6")
>
> list - this splits each element either like a map or a regex. The map
> operator is applied recursively to each element
> e.g. (re-split "1 2 3\n4 5 6" (list #"\n" #"\s+")) => (("1" "2" "3")
> ("4" "5" "6"))
>
> map - this splits each element based on the inputs of the map. It is
> how options are passed to the method.
> e.g. (re-split "1 2 3" {:pattern #"\s+" :limit 2
> :marshal-fn #(java.lang.Double/parseDouble %)}) => (1.0 2.0)
> The :pattern and :limit options are relatively straightforward.
> The :marshal-fn is mapped after the string is split.
>
> These items can be chained together, as the following example shows
> e.g. (re-split "1 2 3\n4 5 6" (list #"\n" {:pattern #"\s+" :limit 2
> :marshal-fn #(java.lang.Double/parseDouble %)})) => ((1.0 2.0) (4.0 5.0))
>
> In my opinion, the :marshal-fn is best used at the end of the list.
> It could be used earlier in the list, but an exception will
> most likely be thrown.
>
>
> *re-partition[input-string & remaining-inputs]
>
> This method behaves like the original re-partition method, with the
> remaining-inputs being able to be a list or a pattern. I don't see a
> need to change the behavior of this method at the moment.
>
> *re-gsub[input-string & remaining-inputs]
>
> This method can take a list or two atoms as the remaining inputs.
>
> Two atoms -
> e.g. (re-gsub "1 2 3 4 5 6" #"\s" "") => "123456"
>
> A paired list
> e.g. (re-gsub "1 2 3 4 5 6" '((#"\s" "") (#"\d" "D"))) => "DDDDDD"
>
> *re-sub[input-string & remaining-inputs]
>
> Again, this method can take a list or two atoms as the remaining
> inputs.
>
> Two atoms
> e.g. (re-sub "1 2 3 4 5 6" #"\d" "D") => "D 2 3 4 5 6"
>
> A paired list
> e.g. (re-sub "1 2 3 4 5 6" '((#"\d" "D") (#"\d" "E"))) => "D E 3 4 5 6"
>
> NEW PARSING HELPERS
> I've created four methods: str-before, str-before-inc, str-after, and
> str-after-inc. They are designed to help strip off the parts of a
> string before or after a regex match.
>
> (str-before "Clojure Is Awesome" #"\s") => "Clojure"
> (str-before-inc "Clojure Is Awesome" #"\s") => "Clojure "
> (str-after "Clojure Is Awesome" #"\s") => "Is Awesome"
> (str-after-inc "Clojure Is Awesome" #"\s") => " Is Awesome"
>
> These methods can be used to help parse strings
>
> (str-before (str-after "<h4 ... >" #"<h4") #">") => ;the stuff in the
> middle
>
> NEW INFLECTORS
> I've added a few inflectors that I am familiar with from Rails. My
> apologies if their origin is another language. I'd be interested in
> knowing where the methods originated.
>
> str-reverse
> This method reverses a string
> e.g. (str-reverse "Clojure") => "erujolC"
>
> trim
> This is a convenience wrapper for the trim method java supplies
> e.g. (trim " Clojure ") => "Clojure"
>
> strip
> This is an alias for trim. I accidentally switch between *trim* and
> *strip* all the time.
> e.g. (strip " Clojure ") => "Clojure"
>
> ltrim
> This method removes the leading whitespace
> e.g. (ltrim " Clojure ") => "Clojure "
>
> rtrim
> This method removes the trailing whitespace
> e.g. (rtrim " Clojure ") => " Clojure"
>
> downcase
> This is a convenience wrapper for the toLowerCase method java supplies
> e.g. (downcase "Clojure") => "clojure"
>
> upcase
> This is a convenience wrapper for the toUpperCase method java supplies
> e.g. (upcase "Clojure") => "CLOJURE"
>
> capitalize
> This method capitalizes a string
> e.g. (capitalize "clojure") => "Clojure"
>
> titleize, camelize, dasherize, underscore
> These methods manipulate "sentences", producing a consistent output.
> Check the unit tests for more examples
> (titleize "clojure iS Awesome") => "Clojure Is Awesome"
> (camelize "clojure iS Awesome") => "clojureIsAwesome"
> (dasherize "clojure iS Awesome") => "clojure-is-awesome"
> (underscore "clojure iS Awesome") => "clojure_is_awesome"
>
> *FINAL THOUGHTS*
> There are three more methods, str-join, chop, and chomp, that were
> already in str-utils. I changed the implementation of the methods, but
> the behavior should be the same.
>
> There is a big catch with my proposed change. The signatures of
> re-split, re-partition, re-gsub and re-sub change. They will not be
> backwards compatible, and will break code. However, I think the
> flexibility is worth it.
>
> *TO-DOs*
> There are a few more things I'd like to add, but that could be done at a
> later date.
>
> *Add more inflectors
>
> The following additions become pretty easy if the proposed re-gsub is
> included:
>
> *Add HTML-escape function (like Rails' h method)
> *Add Javascript-escape function (like Rails' javascript-escape method)
> *Add SQL-escape function
>
> Okay, that's everything I can think of for now. I'd like to thank
> Stuart Sierra and all of the contributors to this library. This is
> possible because I'm standing on their shoulders.
>
> Oh, and I apologize for not putting this up on github, especially
> after I asked someone else to do the same yesterday. I'll try not to
> be so hypocritical going forward.
>
> *CODE*
>
> (ns devlinsf.str-utils)
>
> ;;; String Merging & Slicing
>
> (defn str-join
>   "Returns a string of all elements in 'sequence', separated by
>   'separator'. Like Perl's 'join'."
>   [separator sequence]
>   (apply str (interpose separator sequence)))
>
>
> ;; Each multimethod dispatches on the class of the first argument
> ;; after input-string: a Pattern, a list, or a map.
> (defmulti re-split
>   (fn [input-string & remaining-inputs] (class (first remaining-inputs))))
>
> (defmethod re-split java.util.regex.Pattern
>   ([string #^java.util.regex.Pattern pattern]
>    (seq (. pattern (split string)))))
>
> (defmethod re-split clojure.lang.PersistentList
>   [input-string patterns]
>   (let [reversed (reverse patterns)
>         pattern (first reversed)
>         remaining (rest reversed)]
>     (if (empty? remaining)
>       (re-split input-string pattern)
>       (map #(re-split % pattern)
>            (re-split input-string (reverse remaining))))))
>
> (defmethod re-split clojure.lang.PersistentArrayMap
>   [input-string map-options]
>   (cond (:limit map-options)
>         (take (:limit map-options)
>               (re-split input-string (dissoc map-options :limit)))
>         (:marshal-fn map-options)
>         (map (:marshal-fn map-options)
>              (re-split input-string (dissoc map-options :marshal-fn)))
>         :else
>         (re-split input-string (:pattern map-options))))
>
> (defmulti re-partition
>   (fn [input-string & remaining-inputs] (class (first remaining-inputs))))
>
> (defmethod re-partition java.util.regex.Pattern
>   [string #^java.util.regex.Pattern re]
>   (let [m (re-matcher re string)]
>     ((fn step [prevend]
>        (lazy-seq
>         (if (.find m)
>           (cons (.subSequence string prevend (.start m))
>                 (cons (re-groups m)
>                       (step (+ (.start m) (count (.group m))))))
>           (when (< prevend (.length string))
>             (list (.subSequence string prevend (.length string)))))))
>      0)))
>
> (defmethod re-partition clojure.lang.PersistentList
>   [input-string patterns]
>   (let [reversed (reverse patterns)
>         pattern (first reversed)
>         remaining (rest reversed)]
>     (if (empty? remaining)
>       (re-partition input-string pattern)
>       (map #(re-partition % pattern)
>            (re-partition input-string (reverse remaining))))))
>
> (defmulti re-gsub
>   (fn [input-string & remaining-inputs] (class (first remaining-inputs))))
>
> (defmethod re-gsub java.util.regex.Pattern
>   [#^String string #^java.util.regex.Pattern regex replacement]
>   (if (ifn? replacement)
>     ;; input string first, to match the new re-partition signature
>     (let [parts (vec (re-partition string regex))]
>       (apply str
>              (reduce (fn [parts match-idx]
>                        (update-in parts [match-idx] replacement))
>                      parts (range 1 (count parts) 2))))
>     (.. regex (matcher string) (replaceAll replacement))))
>
> (defmethod re-gsub clojure.lang.PersistentList
>   [input-string regex-pattern-pairs]
>   (let [reversed (reverse regex-pattern-pairs)
>         pair (first reversed)
>         remaining (rest reversed)]
>     (if (empty? remaining)
>       (re-gsub input-string (first pair) (second pair))
>       (re-gsub (re-gsub input-string (reverse remaining))
>                (first pair) (second pair)))))
>
>
> (defmulti re-sub
>   (fn [input-string & remaining-inputs] (class (first remaining-inputs))))
>
> (defmethod re-sub java.util.regex.Pattern
>   [#^String string #^java.util.regex.Pattern regex replacement]
>   (if (ifn? replacement)
>     (let [m (re-matcher regex string)]
>       (if (.find m)
>         (str (.subSequence string 0 (.start m))
>              (replacement (re-groups m))
>              (.subSequence string (.end m) (.length string)))
>         string))
>     (.. regex (matcher string) (replaceFirst replacement))))
>
> (defmethod re-sub clojure.lang.PersistentList
>   [input-string regex-pattern-pairs]
>   (let [reversed (reverse regex-pattern-pairs)
>         pair (first reversed)
>         remaining (rest reversed)]
>     (if (empty? remaining)
>       (re-sub input-string (first pair) (second pair))
>       (re-sub (re-sub input-string (reverse remaining))
>               (first pair) (second pair)))))
>
> ;;; Parsing Helpers
> (defn str-before [input-string regex]
>   (let [matches (re-partition input-string regex)]
>     (first matches)))
>
> (defn str-before-inc [input-string regex]
>   (let [matches (re-partition input-string regex)]
>     (str (first matches) (second matches))))
>
> (defn str-after [input-string regex]
>   (let [matches (re-partition input-string regex)]
>     (str-join "" (rest (rest matches)))))
>
> (defn str-after-inc [input-string regex]
>   (let [matches (re-partition input-string regex)]
>     (str-join "" (rest matches))))
>
>
> ;;; Inflectors
> ;;; These methods only take the input string.
> (defn str-reverse
>   "This method accepts a string and returns the reversed string as a
>   result"
>   [input-string]
>   (apply str (reverse input-string)))
>
>
> (defn upcase
>   "Converts the entire string to upper case"
>   [input-string]
>   (. input-string toUpperCase))
>
> (defn downcase
>   "Converts the entire string to lower case"
>   [input-string]
>   (. input-string toLowerCase))
>
> (defn trim
>   "Shortcut for String.trim"
>   [input-string]
>   (. input-string trim))
>
> (defn strip
>   "Alias for trim, like Ruby."
>   [input-string]
>   (trim input-string))
>
> (defn ltrim
>   "This method chops all of the leading whitespace."
>   [input-string]
>   ;; anchored, so a string without leading whitespace is left alone
>   (re-sub input-string #"^\s+" ""))
>
> (defn rtrim
>   "This method chops all of the trailing whitespace."
>   [input-string]
>   ;; anchored, so only whitespace at the very end is removed
>   (re-sub input-string #"\s+$" ""))
>
> (defn chop
>   "Removes the last character of string."
>   [input-string]
>   (subs input-string 0 (dec (count input-string))))
>
> (defn chomp
>   "Removes all trailing newline \\n or return \\r characters from
>   string. Note: String.trim() is similar and faster."
>   [input-string]
>   ;; anchored, so newlines in the middle of the string are kept
>   (re-sub input-string #"[\r\n]+$" ""))
>
> (defn capitalize
>   "This method turns a string into a capitalized version, Xxxx"
>   [input-string]
>   (str-join "" (list
>                 (upcase (str (first input-string)))
>                 (downcase (apply str (rest input-string))))))
>
> (defn titleize
>   "This method takes an input string, splits it across whitespace,
>   dashes, and underscores. Each word is capitalized, and the result is
>   joined with \" \"."
>   [input-string]
>   (let [words (re-split input-string #"[\s_-]+")]
>     (str-join " " (map capitalize words))))
>
> (defn camelize
>   "This method takes an input string, splits it across whitespace,
>   dashes, and underscores. The first word is downcased, the rest
>   are capitalized, and the result is joined with \"\"."
>   [input-string]
>   (let [words (re-split input-string #"[\s_-]+")]
>     (str-join "" (cons (downcase (first words))
>                        (map capitalize (rest words))))))
>
> (defn dasherize
>   "This method takes an input string, splits it across whitespace,
>   dashes, and underscores. Each word is downcased, and the result is
>   joined with \"-\"."
>   [input-string]
>   (let [words (re-split input-string #"[\s_-]+")]
>     (str-join "-" (map downcase words))))
>
> (defn underscore
>   "This method takes an input string, splits it across whitespace,
>   dashes, and underscores. Each word is downcased, and the result is
>   joined with \"_\"."
>   [input-string]
>   (let [words (re-split input-string #"[\s_-]+")]
>     (str-join "_" (map downcase words))))
>
> ;;; Escapees
>
> ;TO-DO
>
> ;(defn sql-escape[x])
> ;(defn html-escape[x])
> ;(defn javascript-escape[x])
> ;(defn pdf-escape)
>
>
> *UNIT TESTS*
> (ns devlinsf.test-contrib.str-utils
>   (:use clojure.contrib.test-is
>         devlinsf.str-utils))
>
> (deftest test-str-reverse
>   (is (= (str-reverse "Clojure") "erujolC")))
>
> (deftest test-downcase
>   (is (= (downcase "Clojure") "clojure")))
>
> (deftest test-upcase
>   (is (= (upcase "Clojure") "CLOJURE")))
>
> (deftest test-trim
>   (is (= (trim " Clojure ") "Clojure")))
>
> (deftest test-strip
>   (is (= (strip " Clojure ") "Clojure")))
>
> (deftest test-ltrim
>   (is (= (ltrim " Clojure ") "Clojure ")))
>
> (deftest test-rtrim
>   (is (= (rtrim " Clojure ") " Clojure")))
>
> (deftest test-chop
>   (is (= (chop "Clojure") "Clojur")))
>
> (deftest test-chomp
>   (is (= (chomp "Clojure \n") "Clojure "))
>   (is (= (chomp "Clojure \r") "Clojure "))
>   (is (= (chomp "Clojure \n\r") "Clojure ")))
>
> (deftest test-capitalize
>   (is (= (capitalize "clojure") "Clojure")))
>
> (deftest test-titleize
>   (let [expected-string "Clojure Is Awesome"]
>     (is (= (titleize "clojure is awesome") expected-string))
>     (is (= (titleize "clojure is awesome") expected-string))
>     (is (= (titleize "CLOJURE IS AWESOME") expected-string))
>     (is (= (titleize "clojure-is-awesome") expected-string))
>     (is (= (titleize "clojure- _ is---awesome") expected-string))
>     (is (= (titleize "clojure_is_awesome") expected-string))))
>
> (deftest test-camelize
>   (let [expected-string "clojureIsAwesome"]
>     (is (= (camelize "clojure is awesome") expected-string))
>     (is (= (camelize "clojure is awesome") expected-string))
>     (is (= (camelize "CLOJURE IS AWESOME") expected-string))
>     (is (= (camelize "clojure-is-awesome") expected-string))
>     (is (= (camelize "clojure- _ is---awesome") expected-string))
>     (is (= (camelize "clojure_is_awesome") expected-string))))
>
> (deftest test-underscore
>   (let [expected-string "clojure_is_awesome"]
>     (is (= (underscore "clojure is awesome") expected-string))
>     (is (= (underscore "clojure is awesome") expected-string))
>     (is (= (underscore "CLOJURE IS AWESOME") expected-string))
>     (is (= (underscore "clojure-is-awesome") expected-string))
>     (is (= (underscore "clojure- _ is---awesome") expected-string))
>     (is (= (underscore "clojure_is_awesome") expected-string))))
>
> (deftest test-dasherize
>   (let [expected-string "clojure-is-awesome"]
>     (is (= (dasherize "clojure is awesome") expected-string))
>     (is (= (dasherize "clojure is awesome") expected-string))
>     (is (= (dasherize "CLOJURE IS AWESOME") expected-string))
>     (is (= (dasherize "clojure-is-awesome") expected-string))
>     (is (= (dasherize "clojure- _ is---awesome") expected-string))
>     (is (= (dasherize "clojure_is_awesome") expected-string))))
>
> (deftest test-str-before
>   (is (= (str-before "Clojure Is Awesome" #"Is") "Clojure ")))
>
> (deftest test-str-before-inc
>   (is (= (str-before-inc "Clojure Is Awesome" #"Is") "Clojure Is")))
>
> (deftest test-str-after
>   (is (= (str-after "Clojure Is Awesome" #"Is") " Awesome")))
>
> (deftest test-str-after-inc
>   (is (= (str-after-inc "Clojure Is Awesome" #"Is") "Is Awesome")))
>
> (deftest test-str-join
>   (is (= (str-join " " '("A" "B")) "A B")))
>
> (deftest test-re-split-single-regex
>   (let [source-string "1\t2\t3\n4\t5\t6"]
>     (is (= (re-split source-string #"\n") '("1\t2\t3" "4\t5\t6")))))
>
> (deftest test-re-split-single-map
>   (let [source-string "1\t2\t3\n4\t5\t6"]
>     (is (= (re-split source-string {:pattern #"\n"})
>            '("1\t2\t3" "4\t5\t6")))
>     (is (= (re-split source-string {:pattern #"\n" :limit 1})
>            '("1\t2\t3")))
>     (is (= (re-split source-string
>                      {:pattern #"\n" :marshal-fn #(str % "\ta")})
>            '("1\t2\t3\ta" "4\t5\t6\ta")))
>     (is (= (re-split source-string
>                      {:pattern #"\n" :limit 1 :marshal-fn #(str % "\ta")})
>            '("1\t2\t3\ta")))))
>
> (deftest test-re-split-single-element-list
>   (let [source-string "1\t2\t3\n4\t5\t6"]
>     (is (= (re-split source-string (list #"\n"))
>            '("1\t2\t3" "4\t5\t6")))))
>
> (deftest test-re-split-pure-list
>   (let [source-string "1\t2\t3\n4\t5\t6"]
>     (is (= (re-split source-string (list #"\n" #"\t"))
>            '(("1" "2" "3") ("4" "5" "6"))))))
>
> (deftest test-re-split-mixed-list
>   (let [source-string "1\t2\t3\n4\t5\t6"]
>     (is (= (re-split source-string
>                      (list {:pattern #"\n" :limit 1} #"\t"))
>            '(("1" "2" "3"))))
>     (is (= (re-split source-string
>                      (list {:pattern #"\n" :limit 1}
>                            {:pattern #"\t" :limit 2}))
>            '(("1" "2"))))
>     (is (= (re-split source-string
>                      (list {:pattern #"\n" :limit 1}
>                            {:pattern #"\t" :limit 2
>                             :marshal-fn #(java.lang.Double/parseDouble %)}))
>            '((1.0 2.0))))
>     (is (= (re-split source-string
>                      (list {:pattern #"\n"}
>                            {:pattern #"\t"
>                             :marshal-fn #(java.lang.Double/parseDouble %)}))
>            '((1.0 2.0 3.0) (4.0 5.0 6.0))))
>     (is (= (map #(reduce + %)
>                 (re-split source-string
>                           (list {:pattern #"\n"}
>                                 {:pattern #"\t"
>                                  :marshal-fn #(java.lang.Double/parseDouble %)})))
>            '(6.0 15.0)))
>     (is (= (reduce + (map #(reduce + %)
>                           (re-split source-string
>                                     (list {:pattern #"\n"}
>                                           {:pattern #"\t"
>                                            :marshal-fn #(java.lang.Double/parseDouble %)}))))
>            21.0))))
>
> (deftest test-re-partition
>   (is (= (re-partition "Clojure Is Awesome" #"\s+")
>          '("Clojure" " " "Is" " " "Awesome"))))
>
> (deftest test-re-gsub
>   (let [source-string "1\t2\t3\n4\t5\t6"]
>     (is (= (re-gsub source-string #"\s+" " ") "1 2 3 4 5 6"))
>     (is (= (re-gsub source-string '((#"\s+" " "))) "1 2 3 4 5 6"))
>     (is (= (re-gsub source-string '((#"\s+" " ") (#"\d" "D")))
>            "D D D D D D"))))
>
> (deftest test-re-sub
>   (let [source-string "1 2 3 4 5 6"]
>     (is (= (re-sub source-string #"\d" "D") "D 2 3 4 5 6"))
>     (is (= (re-sub source-string '((#"\d" "D") (#"\d" "E")))
>            "D E 3 4 5 6"))))
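
One thought on the dispatch in the proposal: it keys on the concrete class of the first extra argument (PersistentList, PersistentArrayMap), so a vector of patterns, or a map that is not an array map, would not find a method. Below is a minimal sketch of dispatching on a predicate-derived keyword instead. It is not the code from the proposal: the names split-by and spec-kind are made up for illustration, it assumes a Clojure version that ships clojure.string, and unlike the quoted re-split it recurses through chains of any length.

```clojure
(ns example.split-sketch
  (:require [clojure.string :as str])
  (:import (java.util.regex Pattern)))

;; Hypothetical helper: classify a split spec by what it is,
;; not by which concrete collection class happens to hold it.
(defn- spec-kind [spec]
  (cond
    (instance? Pattern spec) :pattern
    (map? spec)              :options
    (sequential? spec)       :chain
    :else                    :unknown))

;; Hypothetical stand-in for the proposed re-split.
(defmulti split-by
  (fn [_input spec] (spec-kind spec)))

;; A bare regex splits the string into a seq of pieces.
(defmethod split-by :pattern
  [input pattern]
  (seq (str/split input pattern)))

;; A map carries :pattern plus the optional :limit and :marshal-fn.
(defmethod split-by :options
  [input {:keys [pattern limit marshal-fn]}]
  (cond->> (split-by input pattern)
    limit      (take limit)
    marshal-fn (map marshal-fn)))

;; A list or vector applies the first spec, then the rest to each piece.
(defmethod split-by :chain
  [input specs]
  (if-let [[spec & more] (seq specs)]
    (if more
      (map #(split-by % more) (split-by input spec))
      (split-by input spec))
    (list input)))
```

With this, (split-by "1 2 3\n4 5 6" [#"\n" #"\s+"]) and the same call with (list ...) both dispatch to the :chain method, which the class-based version would not do for the vector.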