Ok, I know we've been over this before, but nothing was actually done.

For the record:
http://groups.google.com/group/clojure/browse_thread/thread/81b361a4e82602b7/0313c224a480a161

So here is my attempt to formalize a simple proposal.

The reader should take the literal contents of #"..." and pass them to
Pattern.compile as a raw string, making no changes to the contents.
That means all backslashes (\) and double quotes (") would be passed
right in.  The only other thing the reader needs to concern itself with
is that when it sees a \" it should not treat that double quote as the
end of the pattern. Instead it should keep reading until it sees a
double quote that is not escaped by a backslash. A backslash always
consumes the character after it, so in #"\\" the final quote still ends
the pattern.  Even so, the reader would pass both the quoting \ and the
following character to Pattern.compile.
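To make the rule concrete, here is a minimal Java sketch of the scanning step on its own. It assumes the text handed to it starts just after the opening #". It follows the same logic as the attached LispReader patch, but it works on a String and throws IllegalArgumentException. The names RawRegexReader and readRawRegex are made up for illustration.

```java
import java.util.regex.Pattern;

public class RawRegexReader {
    // Returns the body of a #"..." literal. `text` is assumed to start just
    // after the opening #" and to contain the closing quote somewhere later.
    // Each backslash is copied together with the character after it, so \"
    // does not end the literal and reaches Pattern.compile unchanged.
    static String readRawRegex(String text) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (true) {
            if (i >= text.length()) {
                throw new IllegalArgumentException("EOF while reading regex");
            }
            char ch = text.charAt(i++);
            if (ch == '"') {
                return sb.toString(); // unescaped quote: end of the literal
            }
            sb.append(ch);
            if (ch == '\\') {
                if (i >= text.length()) {
                    throw new IllegalArgumentException("EOF while reading regex");
                }
                sb.append(text.charAt(i++)); // keep the escaped character verbatim
            }
        }
    }

    public static void main(String[] args) {
        // The source text after #" for example 9, #"\"", plus trailing input.
        String src = readRawRegex("\\\"\" rest");
        System.out.println(src); // prints \"
        System.out.println(Pattern.compile(src).matcher("\"").matches()); // prints true
    }
}
```

Because a backslash always swallows the next character, `\\"` ends the literal after the two backslashes, which is what example 10 needs.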

That's it. Simple. It works because Java's Pattern itself understands
backslash quoting, including literal chars like backslash and double
quote, hex and octal patterns, as well as other regex patterns.

Some examples:

1. Simple text
(re-find #"foo" "foo") --> "foo"

2. Pre-defined character class
(re-find #"\w*" "[EMAIL PROTECTED]") --> "foo"

3. Special character (regex and string)
(re-find #"\t" "\t") --> "\t"

4. Scary special character (regex only)
Note that the escape sequences available inside #"" are Java Pattern
escape sequences, and therefore by definition different from Clojure
String escape sequences.  Of course this is what you need for \w and
such to work:
(re-find #"\a" "\u0007") --> beep ""

5. Special character (string only)
The reverse of the previous example -- Clojure strings understand "\b"
as (str \backspace), but Java patterns do not (to them \b is a word
boundary, as in example 8), so this example uses hex instead:
(re-find #"\x08" "\b") --> "\b"

6. Hex
(re-find #"\x31" "1") --> "1"

7. Octal
(re-find #"\061" "1") --> "1"

8. Word boundary:
(re-find #"\bfoo" "foo") --> "foo"

9. Quoting fun -- double quote, a single character:
(re-find #"\"" "\"") --> "\""

10. Quoting fun -- backslash, a single character:
(re-find #"\\" "\\") --> "\\"

11. Open paren
(re-find #"\(" "(") --> "("
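The examples above can be checked against java.util.regex.Pattern directly. This sketch assumes that the proposed reader passes the raw text between #" and " to Pattern.compile unchanged. The reFind helper is a hypothetical stand-in for re-find. Example 2 is left out because its input string was redacted in the archive. The raw texts have to be written as Java string literals here, so every backslash appears doubled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProposedExamples {
    // Rough stand-in for Clojure's re-find: first match of regex in input, or null.
    static String reFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Each first argument is the raw text between #" and " in the proposal,
        // written as a Java string literal (so each \ is doubled here).
        System.out.println(reFind("foo", "foo"));                      // 1. foo
        System.out.println(reFind("\\t", "\t").equals("\t"));          // 3. true
        System.out.println(reFind("\\a", "\u0007").equals("\u0007"));  // 4. true (bell)
        System.out.println(reFind("\\x08", "\b").equals("\b"));        // 5. true
        System.out.println(reFind("\\x31", "1"));                      // 6. 1
        System.out.println(reFind("\\061", "1"));                      // 7. 1 (octal \0 + 61)
        System.out.println(reFind("\\bfoo", "foo"));                   // 8. foo
        System.out.println(reFind("\\\"", "\""));                      // 9. "
        System.out.println(reFind("\\\\", "\\"));                      // 10. \
        System.out.println(reFind("\\(", "("));                        // 11. (
    }
}
```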

I think this demonstrates you can create any pattern you might need.
For reference, here are the above patterns expressed in the current
(not the proposed) reader syntax:

1. #"foo"
2. #"\\w*"
3. #"\t" or #"\\t"
4. #"\\a" (but #"\a" makes the reader blow up)
5. #"\\x08"
6. #"\\x31"
7. #"\061" or #"\\061"
8. #"\\bfoo" (note #"\bfoo" is legal, but doesn't do what you want)
9. #"\"" or #"\\\"" (but #"\\"" blows up the reader)
10. #"\\\\" (but #"\\" is illegal)
11. #"\\(" (but #"\(" is illegal)
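Java string literals have much the same two-layer problem as the current syntax. The compiler unescapes the string first, and Pattern.compile only sees the result. The following sketch reproduces items 8, 10 and 11 that way. The class and helper names are illustrative only.

```java
import java.util.regex.Pattern;

public class TwoLayerEscaping {
    // True if regex matches somewhere in input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The Java compiler unescapes these literals before Pattern.compile
        // sees them, much like the current #"..." reader does.
        System.out.println(finds("\\bfoo", "foo")); // true: regex sees \bfoo (word boundary)
        System.out.println(finds("\bfoo", "foo"));  // false: regex sees a backspace char, then foo
        System.out.println(finds("\\\\", "\\"));    // true: regex sees \\ (one literal backslash)
        System.out.println(finds("\\(", "("));      // true: regex sees \( (literal paren)
        // "\(" would not even compile as Java source, much as #"\(" is illegal today.
    }
}
```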

Somehow I'm not sure that communicates how much I dislike the current
syntax.  Oh well, maybe others can chime in on that point. I
implemented this to provide the examples above, not because I think
this is a done deal or anything.  Please comment!

Here is a new print method to match the attached patch to LispReader:

(defmethod print-method java.util.regex.Pattern [p w]
  (.write w "#\"")        ; opening #"
  (.write w (.pattern p)) ; pattern source, written out unchanged
  (.write w "\""))        ; closing "

That print method will need more work to properly quote Patterns
created by means other than the Clojure literal. For example, the
source of (re-pattern "a\"b") contains a bare double quote, which would
end the printed literal early.
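The following is a rough sketch of the extra quoting the print method would need. It uses a hypothetical Java helper, printRegexLiteral, that copies existing backslash escapes through and puts a backslash in front of any bare double quote. Pattern reads \" as a literal quote, so the meaning is unchanged. The sketch assumes the pattern has no \Q...\E sections, because inside those \" would mean a backslash followed by a quote. It also ignores compile flags such as Pattern.LITERAL, which the #"..." syntax cannot express anyway.

```java
import java.util.regex.Pattern;

public class RegexPrinter {
    // Hypothetical helper: renders p as a #"..." literal for the proposed reader.
    // Assumes no \Q...\E quoting and no compile flags (see the note above).
    static String printRegexLiteral(Pattern p) {
        String src = p.pattern();
        StringBuilder sb = new StringBuilder("#\"");
        for (int i = 0; i < src.length(); i++) {
            char ch = src.charAt(i);
            if (ch == '\\' && i + 1 < src.length()) {
                sb.append(ch).append(src.charAt(++i)); // existing escape: keep both chars
            } else if (ch == '"') {
                sb.append("\\\"");                     // bare quote would end the literal
            } else {
                sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Built from a string, not a #"..." literal: the source is a"b
        System.out.println(printRegexLiteral(Pattern.compile("a\"b"))); // #"a\"b"
        // Source is already \" : printed unchanged
        System.out.println(printRegexLiteral(Pattern.compile("\\\""))); // #"\""
    }
}
```

Escaping only the bare quotes keeps the common case, a Pattern that came from a #"..." literal, printing exactly as it was typed.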

--Chouser

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

diff --git a/src/jvm/clojure/lang/LispReader.java b/src/jvm/clojure/lang/LispReader.java
index e3ef9fc..ce51df1 100644
--- a/src/jvm/clojure/lang/LispReader.java
+++ b/src/jvm/clojure/lang/LispReader.java
@@ -358,8 +358,22 @@ static class RegexReader extends AFn{
 	static StringReader stringrdr = new StringReader();
 
 	public Object invoke(Object reader, Object doublequote) throws Exception{
-		String str = (String) stringrdr.invoke(reader, doublequote);
-		return Pattern.compile(str);
+		StringBuilder sb = new StringBuilder();
+		Reader r = (Reader) reader;
+		for(int ch = r.read(); ch != '"'; ch = r.read())
+			{
+			if(ch == -1)
+				throw new Exception("EOF while reading regex");
+			sb.append( (char) ch );
+			if(ch == '\\')	//escape
+				{
+				ch = r.read();
+				if(ch == -1)
+					throw new Exception("EOF while reading regex");
+				sb.append( (char) ch ) ;
+				}
+			}
+		return Pattern.compile(sb.toString());
 	}
 }
 
