Proposed Change to str-utils

Sean Mon, 23 Mar 2009 18:18:33 -0700

Hello Everyone,
I've been reviewing the str-utils package, and I'd like to propose a
few changes to the library.  I've included the code at the bottom.


USE MULTI-METHODS

I'd like to propose rewriting the following methods to use
multimethods.  Every method will take an input called input-string.

*re-split[input-string & remaining-inputs](...)*

The remaining inputs can be dispatched based on a regex pattern, a
list of patterns, or a map.

regex pattern - splits a string into a list, like it does now.
e.g. (re-split "1 2 3\n4 5 6" #"\n") => ("1 2 3" "4 5 6")

list - this applies each element of the list in turn, treating it as
either a map or a regex.  Each later split is mapped over the pieces
produced by the previous one.
e.g. (re-split "1 2 3\n4 5 6" (list #"\n" #"\s+")) => (("1" "2" "3")
("4" "5" "6"))

map - this splits each element based on the inputs of the map.  It is
how options are passed to the method.
e.g. (re-split "1 2 3" {:pattern #"\s+" :limit 2
                        :marshal-fn #(java.lang.Double/parseDouble %)})
     => (1.0 2.0)
The :pattern and :limit options are relatively straightforward.
The :marshal-fn is mapped after the string is split.

These items can be chained together, as the following example shows:
e.g. (re-split "1 2 3\n4 5 6"
               (list #"\n"
                     {:pattern #"\s+" :limit 2
                      :marshal-fn #(java.lang.Double/parseDouble %)}))
     => ((1.0 2.0) (4.0 5.0))

In my opinion, the :marshal-fn is best used at the end of the list.
It could be used earlier in the list, but an exception will most
likely be thrown, since the later splits expect strings.
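
One thing worth noting about dispatching on the concrete class of the
first remaining input: a vector of patterns or a larger (hash) map would
not match clojure.lang.PersistentList or clojure.lang.PersistentArrayMap.
Below is a minimal sketch of dispatching on the kind of input instead.
The names input-kind and split-sketch are hypothetical and are not part
of the proposal; this is an illustration under those assumptions, not
the proposed implementation.

```clojure
;; Hypothetical helper (not in the proposal): classify the first
;; remaining input by kind rather than by concrete class.
(defn input-kind [x]
  (cond
    (instance? java.util.regex.Pattern x) :pattern
    (map? x)                              :map
    (sequential? x)                       :list
    :else                                 :unknown))

(defmulti split-sketch
  (fn [input-string & remaining-inputs]
    (input-kind (first remaining-inputs))))

;; A single pattern splits the string into a seq of pieces.
(defmethod split-sketch :pattern
  [input-string ^java.util.regex.Pattern pattern]
  (seq (.split pattern input-string)))

;; A map carries the options: :pattern, :limit and :marshal-fn.
(defmethod split-sketch :map
  [input-string {:keys [pattern limit marshal-fn]}]
  (cond->> (split-sketch input-string pattern)
    marshal-fn (map marshal-fn)
    limit      (take limit)))

;; A list or vector applies the first element, then maps the remaining
;; elements over each resulting piece.
(defmethod split-sketch :list
  [input-string specs]
  (if (empty? (rest specs))
    (split-sketch input-string (first specs))
    (map #(split-sketch % (rest specs))
         (split-sketch input-string (first specs)))))
```

With this shape, (split-sketch "1 2 3\n4 5 6" [#"\n" #"\s+"]) accepts a
vector just as it would a list.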


*re-partition[input-string & remaining-inputs]

This method behaves like the original re-partition method, with the
remaining-inputs being either a list or a pattern.  I don't see a
need to change the behavior of this method at the moment.

*re-gsub[input-string & remaining-inputs]

This method can take a list or two atoms as the remaining inputs.

Two atoms -
e.g. (re-gsub "1 2 3 4 5 6" #"\s" "") => "123456"

A paired list
e.g. (re-gsub "1 2 3 4 5 6" '((#"\s" "") (#"\d" "D"))) => "DDDDDD"

*re-sub[input-string & remaining-inputs]

Again, this method can take a list or two atoms as the remaining
inputs.

Two atoms
e.g. (re-sub "1 2 3 4 5 6" #"\d" "D") => "D 2 3 4 5 6"

A paired list
e.g (re-sub "1 2 3 4 5 6" '((#"\d" "D") (#"\d" "E"))) => "D E 3 4 5 6"

NEW PARSING HELPERS
I've created four methods: str-before, str-before-inc, str-after, and
str-after-inc.  They are designed to help strip off the parts of a
string before or after a regex match.

(str-before "Clojure Is Awesome" #"\s") => "Clojure"
(str-before-inc "Clojure Is Awesome" #"\s") => "Clojure "
(str-after "Clojure Is Awesome" #"\s") => "Is Awesome"
(str-after-inc "Clojure Is Awesome" #"\s") => " Is Awesome"

These methods can be used to help parse strings:

(str-before (str-after "<h4 ... >" #"<h4") #">") => ;the stuff in the middle

NEW INFLECTORS
I've added a few inflectors that I am familiar with from Rails.  My
apologies if their origin is another language.  I'd be interested in
knowing where these methods originated.

str-reverse
This method reverses a string
e.g. (str-reverse "Clojure") => "erujolC"

trim
This is a convenience wrapper for the trim method java supplies
e.g. (trim "  Clojure  ") => "Clojure"

strip
This is an alias for trim.  I accidentally switch between *trim* and
*strip* all the time.
e.g. (strip "  Clojure  ") => "Clojure"

ltrim
This method removes the leading whitespace
e.g. (ltrim "  Clojure  ") => "Clojure  "

rtrim
This method removes the trailing whitespace
e.g. (rtrim "  Clojure  ") => "  Clojure"

downcase
This is a convenience wrapper for the toLowerCase method java supplies
e.g. (downcase "Clojure") => "clojure"

upcase
This is a convenience wrapper for the toUpperCase method java supplies
e.g. (upcase "Clojure") => "CLOJURE"

capitalize
This method capitalizes a string
e.g. (capitalize "clojure") => "Clojure"

titleize, camelize, dasherize, underscore
These methods manipulate "sentences", producing a consistent output.
Check the unit tests for more examples
(titleize "clojure iS Awesome") => "Clojure Is Awesome"
(camelize "clojure iS Awesome") => "clojureIsAwesome"
(dasherize "clojure iS Awesome") => "clojure-is-awesome"
(underscore "clojure iS Awesome") => "clojure_is_awesome"

*FINAL THOUGHTS*
There are three more methods, str-join, chop, and chomp, that were
already in str-utils.  I changed the implementation of the methods, but
the behavior should be the same.

There is a big catch with my proposed change.  The signatures of
re-split, re-partition, re-gsub, and re-sub change.  They will not be
backwards compatible, and will break code.  However, I think the
flexibility is worth it.
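
To make the breakage concrete, here is a minimal sketch of a shim that
keeps an old-style call working.  It assumes the existing functions take
the pattern first, as in (re-split pattern string).  The names
re-split-new and re-split-legacy are hypothetical stand-ins and are not
part of the proposal.

```clojure
;; Stand-in for the proposed string-first re-split (pattern case only).
(defn re-split-new
  [input-string ^java.util.regex.Pattern pattern]
  (seq (.split pattern input-string)))

;; Hypothetical shim: accepts the assumed old pattern-first argument
;; order and delegates to the string-first version.
(defn re-split-legacy
  [pattern input-string]
  (re-split-new input-string pattern))
```

Callers written against the pattern-first order could be pointed at a
wrapper like this while they migrate.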

*TO-DOs*
There are a few more things I'd like to add, but that could be done at a
later date.

*Add more inflectors

The following additions become pretty easy if the proposed re-gsub is
included:

*Add HTML-escape function (like Rails' h method)
*Add Javascript-escape function (like Rails' javascript-escape method)
*Add SQL-escape function
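
As a rough illustration of why the paired-list form helps here, the
sketch below builds an HTML escape from an ordered list of replacement
pairs.  It is self-contained and uses clojure.string/replace rather than
the proposed re-gsub.  The names gsub-pairs and html-escape-sketch are
hypothetical, and the set of escaped characters is an assumption, not a
complete escaping policy.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: apply [pattern replacement] pairs left to right,
;; mirroring the paired-list form of the proposed re-gsub.
(defn gsub-pairs
  [input-string pairs]
  (reduce (fn [s [pattern replacement]]
            (string/replace s pattern replacement))
          input-string
          pairs))

;; The ampersand pair must come first, otherwise the ampersands
;; introduced by the later pairs would be escaped a second time.
(defn html-escape-sketch
  [input-string]
  (gsub-pairs input-string
              [[#"&"  "&amp;"]
               [#"<"  "&lt;"]
               [#">"  "&gt;"]
               [#"\"" "&quot;"]]))
```

The ordering of the pairs is the main design point: because each pair
runs over the output of the previous one, the order of the list is part
of the behavior.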

Okay, that's everything I can think of for now.  I'd like to thank
Stuart Sierra and all of the contributors to this library.  This is
possible because I'm standing on their shoulders.

Oh, and I apologize for not putting this up on github, especially
after I asked someone else to do the same yesterday.  I'll try not to
be so hypocritical going forward.

*CODE*

(ns devlinsf.str-utils)

;;; String Merging & Slicing

(defn str-join
  "Returns a string of all elements in 'sequence', separated by
  'separator'.  Like Perl's 'join'."
  [separator sequence]
  (apply str (interpose separator sequence)))


(defmulti re-split
  (fn [input-string & remaining-inputs]
    (class (first remaining-inputs))))

(defmethod re-split java.util.regex.Pattern
  [string #^java.util.regex.Pattern pattern]
  (seq (. pattern (split string))))

(defmethod re-split clojure.lang.PersistentList
  [input-string patterns]
  (let [reversed (reverse patterns)
        pattern (first reversed)
        remaining (rest reversed)]
    (if (empty? remaining)
      (re-split input-string pattern)
      (map #(re-split % pattern)
           (re-split input-string (reverse remaining))))))

(defmethod re-split clojure.lang.PersistentArrayMap
  [input-string map-options]
  (cond (:limit map-options)
        (take (:limit map-options)
              (re-split input-string (dissoc map-options :limit)))
        (:marshal-fn map-options)
        (map (:marshal-fn map-options)
             (re-split input-string (dissoc map-options :marshal-fn)))
        :else
        (re-split input-string (:pattern map-options))))

(defmulti re-partition
  (fn [input-string & remaining-inputs]
    (class (first remaining-inputs))))

(defmethod re-partition java.util.regex.Pattern
  [string #^java.util.regex.Pattern re]
  (let [m (re-matcher re string)]
    ((fn step [prevend]
       (lazy-seq
        (if (.find m)
          (cons (.subSequence string prevend (.start m))
                (cons (re-groups m)
                      (step (+ (.start m) (count (.group m))))))
          (when (< prevend (.length string))
            (list (.subSequence string prevend (.length string)))))))
     0)))

(defmethod re-partition clojure.lang.PersistentList
  [input-string patterns]
  (let [reversed (reverse patterns)
        pattern (first reversed)
        remaining (rest reversed)]
    (if (empty? remaining)
      (re-partition input-string pattern)
      (map #(re-partition % pattern)
           (re-partition input-string (reverse remaining))))))

(defmulti re-gsub
  (fn [input-string & remaining-inputs]
    (class (first remaining-inputs))))

(defmethod re-gsub java.util.regex.Pattern
  [#^String string #^java.util.regex.Pattern regex replacement]
  (if (ifn? replacement)
    ;; re-partition now takes the string first, then the pattern.
    (let [parts (vec (re-partition string regex))]
      (apply str
             (reduce (fn [parts match-idx]
                       (update-in parts [match-idx] replacement))
                     parts (range 1 (count parts) 2))))
    (.. regex (matcher string) (replaceAll replacement))))

(defmethod re-gsub clojure.lang.PersistentList
  [input-string regex-pattern-pairs]
  (let [reversed (reverse regex-pattern-pairs)
        pair (first reversed)
        remaining (rest reversed)]
    (if (empty? remaining)
      (re-gsub input-string (first pair) (second pair))
      (re-gsub (re-gsub input-string (reverse remaining))
               (first pair)
               (second pair)))))


(defmulti re-sub
  (fn [input-string & remaining-inputs]
    (class (first remaining-inputs))))

(defmethod re-sub java.util.regex.Pattern
  [#^String string #^java.util.regex.Pattern regex replacement ]
  (if (ifn? replacement)
    (let [m (re-matcher regex string)]
      (if (.find m)
        (str (.subSequence string 0 (.start m))
             (replacement (re-groups m))
             (.subSequence string (.end m) (.length string)))
        string))
    (.. regex (matcher string) (replaceFirst replacement))))

(defmethod re-sub clojure.lang.PersistentList
  [input-string regex-pattern-pairs]
  (let [reversed (reverse regex-pattern-pairs)
        pair (first reversed)
        remaining (rest reversed)]
    (if (empty? remaining)
      (re-sub input-string (first pair) (second pair))
      (re-sub (re-sub input-string (reverse remaining))
              (first pair)
              (second pair)))))

;;; Parsing Helpers
(defn str-before [input-string regex]
  (let [matches (re-partition input-string regex)]
    (first matches)))

(defn str-before-inc [input-string regex]
  (let [matches (re-partition input-string regex)]
    (str (first matches) (second matches))))

(defn str-after [input-string regex]
  (let [matches (re-partition input-string regex)]
    (str-join "" (rest (rest matches)))))

(defn str-after-inc [input-string regex]
  (let [matches (re-partition input-string regex)]
    (str-join "" (rest matches))))


;;; Inflectors
;;; These methods only take the input string.
(defn str-reverse
  "This method accepts a string and returns the reversed string as the
result."
  [input-string]
  (apply str (reverse input-string)))


(defn upcase
  "Converts the entire string to upper case"
  [input-string]
  (. input-string toUpperCase))

(defn downcase
  "Converts the entire string to lower case"
  [input-string]
  (. input-string toLowerCase))

(defn trim
  "Shortcut for String.trim"
  [input-string]
  (. input-string trim))

(defn strip
  "Alias for trim, like Ruby."
  [input-string]
  (trim input-string))

(defn ltrim
  "This method chops all of the leading whitespace."
  [input-string]
  ;; Anchored so strings without leading whitespace are returned unchanged.
  (re-sub input-string #"^\s+" ""))

(defn rtrim
  "This method chops all of the trailing whitespace."
  [input-string]
  ;; Anchored so only whitespace at the end of the string is removed.
  (re-sub input-string #"\s+$" ""))

(defn chop
  "Removes the last character of string."
  [input-string]
  (subs input-string 0 (dec (count input-string))))

(defn chomp
  "Removes all trailing newline \\n or return \\r characters from
  string.  Note: String.trim() is similar and faster."
  [input-string]
  ;; Anchored so newlines in the middle of the string are preserved.
  (re-sub input-string #"[\r\n]+$" ""))

(defn capitalize
  "This method turns a string into a capitalized version, Xxxx"
  [input-string]
  (str-join "" (list
                (upcase (str (first input-string)))
                (downcase (apply str (rest input-string))))))

(defn titleize
  "This method takes an input string, splits it across whitespace,
dashes, and underscores.  Each word is capitalized, and the result is
joined with \" \"."
  [input-string]
  (let [words (re-split input-string #"[\s_-]+")]
    (str-join " " (map capitalize words))))

(defn camelize
  "This method takes an input string, splits it across whitespace,
dashes, and underscores.  The first word is downcased, the rest are
capitalized, and the result is joined with \"\"."
  [input-string]
  (let [words (re-split input-string #"[\s_-]+")]
    (str-join "" (cons (downcase (first words))
                       (map capitalize (rest words))))))

(defn dasherize
  "This method takes an input string, splits it across whitespace,
dashes, and underscores.  Each word is downcased, and the result is
joined with \"-\"."
  [input-string]
  (let [words (re-split input-string #"[\s_-]+")]
    (str-join "-" (map downcase words))))

(defn underscore
  "This method takes an input string, splits it across whitespace,
dashes, and underscores.  Each word is downcased, and the result is
joined with \"_\"."
  [input-string]
  (let [words (re-split input-string #"[\s_-]+")]
    (str-join "_" (map downcase words))))

;;; Escapees

;TO-DO

;(defn sql-escape[x])
;(defn html-escape[x])
;(defn javascript-escape[x])
;(defn pdf-escape)


*UNIT TESTS*
(ns devlinsf.test-contrib.str-utils
    (:use clojure.contrib.test-is
          devlinsf.str-utils))

(deftest test-str-reverse
  (is (= (str-reverse "Clojure") "erujolC")))

(deftest test-downcase
  (is (= (downcase "Clojure") "clojure")))

(deftest test-upcase
  (is (= (upcase "Clojure") "CLOJURE")))

(deftest test-trim
  (is (= (trim "  Clojure  ") "Clojure")))

(deftest test-strip
  (is (= (strip "  Clojure  ") "Clojure")))

(deftest test-ltrim
  (is (= (ltrim "  Clojure  ") "Clojure  ")))

(deftest test-rtrim
  (is (= (rtrim "  Clojure  ") "  Clojure")))

(deftest test-chop
  (is (= (chop "Clojure") "Clojur")))

(deftest test-chomp
  (is (= (chomp "Clojure \n") "Clojure "))
  (is (= (chomp "Clojure \r") "Clojure "))
  (is (= (chomp "Clojure \n\r") "Clojure ")))

(deftest test-capitalize
  (is (= (capitalize "clojure") "Clojure")))

(deftest test-titleize
  (let [expected-string "Clojure Is Awesome"]
    (is (= (titleize "clojure is awesome") expected-string))
    (is (= (titleize "clojure   is  awesome") expected-string))
    (is (= (titleize "CLOJURE IS AWESOME") expected-string))
    (is (= (titleize "clojure-is-awesome") expected-string))
    (is (= (titleize "clojure- _ is---awesome") expected-string))
    (is (= (titleize "clojure_is_awesome") expected-string))))

(deftest test-camelize
  (let [expected-string "clojureIsAwesome"]
    (is (= (camelize "clojure is awesome") expected-string))
    (is (= (camelize "clojure   is  awesome") expected-string))
    (is (= (camelize "CLOJURE IS AWESOME") expected-string))
    (is (= (camelize "clojure-is-awesome") expected-string))
    (is (= (camelize "clojure- _ is---awesome") expected-string))
    (is (= (camelize "clojure_is_awesome") expected-string))))

(deftest test-underscore
  (let [expected-string "clojure_is_awesome"]
    (is (= (underscore "clojure is awesome") expected-string))
    (is (= (underscore "clojure   is  awesome") expected-string))
    (is (= (underscore "CLOJURE IS AWESOME") expected-string))
    (is (= (underscore "clojure-is-awesome") expected-string))
    (is (= (underscore "clojure- _ is---awesome") expected-string))
    (is (= (underscore "clojure_is_awesome") expected-string))))

(deftest test-dasherize
  (let [expected-string "clojure-is-awesome"]
    (is (= (dasherize "clojure is awesome") expected-string))
    (is (= (dasherize "clojure   is  awesome") expected-string))
    (is (= (dasherize "CLOJURE IS AWESOME") expected-string))
    (is (= (dasherize "clojure-is-awesome") expected-string))
    (is (= (dasherize "clojure- _ is---awesome") expected-string))
    (is (= (dasherize "clojure_is_awesome") expected-string))))

(deftest test-str-before
  (is (= (str-before "Clojure Is Awesome" #"Is") "Clojure ")))

(deftest test-str-before-inc
  (is (= (str-before-inc "Clojure Is Awesome" #"Is") "Clojure Is")))

(deftest test-str-after
  (is (= (str-after "Clojure Is Awesome" #"Is") " Awesome")))

(deftest test-str-after-inc
  (is (= (str-after-inc "Clojure Is Awesome" #"Is") "Is Awesome")))

(deftest test-str-join
  (is (= (str-join " " '("A" "B")) "A B")))

(deftest test-re-split-single-regex
  (let [source-string "1\t2\t3\n4\t5\t6"]
    (is (= (re-split source-string #"\n") '("1\t2\t3" "4\t5\t6")))))

(deftest test-re-split-single-map
  (let [source-string "1\t2\t3\n4\t5\t6"]
    (is (= (re-split source-string {:pattern #"\n"})
           '("1\t2\t3" "4\t5\t6")))
    (is (= (re-split source-string {:pattern #"\n" :limit 1})
           '("1\t2\t3")))
    (is (= (re-split source-string {:pattern #"\n"
                                    :marshal-fn #(str % "\ta")})
           '("1\t2\t3\ta" "4\t5\t6\ta")))
    (is (= (re-split source-string {:pattern #"\n" :limit 1
                                    :marshal-fn #(str % "\ta")})
           '("1\t2\t3\ta")))))

(deftest test-re-split-single-element-list
  (let [source-string "1\t2\t3\n4\t5\t6"]
    (is (= (re-split source-string (list #"\n"))
           '("1\t2\t3" "4\t5\t6")))))

(deftest test-re-split-pure-list
  (let [source-string "1\t2\t3\n4\t5\t6"]
    (is (= (re-split source-string (list #"\n" #"\t"))
           '(("1" "2" "3") ("4" "5" "6"))))))

(deftest test-re-split-mixed-list
  (let [source-string "1\t2\t3\n4\t5\t6"]
    (is (= (re-split source-string
                     (list {:pattern #"\n" :limit 1} #"\t"))
           '(("1" "2" "3"))))
    (is (= (re-split source-string
                     (list {:pattern #"\n" :limit 1}
                           {:pattern #"\t" :limit 2}))
           '(("1" "2"))))
    (is (= (re-split source-string
                     (list {:pattern #"\n" :limit 1}
                           {:pattern #"\t" :limit 2
                            :marshal-fn #(java.lang.Double/parseDouble %)}))
           '((1.0 2.0))))
    (is (= (re-split source-string
                     (list {:pattern #"\n"}
                           {:pattern #"\t"
                            :marshal-fn #(java.lang.Double/parseDouble %)}))
           '((1.0 2.0 3.0) (4.0 5.0 6.0))))
    (is (= (map #(reduce + %)
                (re-split source-string
                          (list {:pattern #"\n"}
                                {:pattern #"\t"
                                 :marshal-fn #(java.lang.Double/parseDouble %)})))
           '(6.0 15.0)))
    (is (= (reduce +
                   (map #(reduce + %)
                        (re-split
                         source-string
                         (list {:pattern #"\n"}
                               {:pattern #"\t"
                                :marshal-fn #(java.lang.Double/parseDouble %)}))))
           21.0))))

(deftest test-re-partition
  (is (= (re-partition "Clojure Is Awesome" #"\s+")
         '("Clojure" " " "Is" " " "Awesome"))))

(deftest test-re-gsub
  (let [source-string "1\t2\t3\n4\t5\t6"]
    (is (= (re-gsub source-string #"\s+" " ") "1 2 3 4 5 6"))
    (is (= (re-gsub source-string '((#"\s+" " "))) "1 2 3 4 5 6"))
    (is (= (re-gsub source-string '((#"\s+" " ") (#"\d" "D")))
           "D D D D D D"))))

(deftest test-re-sub
  (let [source-string "1 2 3 4 5 6"]
    (is (= (re-sub source-string #"\d" "D") "D 2 3 4 5 6"))
    (is (= (re-sub source-string '((#"\d" "D") (#"\d" "E")))
           "D E 3 4 5 6"))))
