I am afraid that this String Escape Method subject is drifting a little
off-topic, but I would like to add my idea and test results, as I think
they are relevant to the discussion.
IMHO, if speed rather than readability is what matters, all
processing would be faster if two static arrays of chars were used, with a
String created from the result only at the end, if needed, 'a la' C.
Method invocation (with stack operations, class checks, etc.) will probably
cost more than the work done inside the method, so I would also try to avoid
calling methods inside the escapeString method (append, charAt, etc.).
I have implemented one method using these principles. The results
of calling the method 1000000 (1x10^6) times on my humble computer
were (in seconds):
               String with no \ or "    String with \ and "
My proposal             26                      42
The "other"             34                     288
Here is the main() I used to test it:
public class exper
{
    public static String escapeString(String str)
    {
        //different implementations...
    }
    public static void main(String a[])
    {
        String result=null;
        for(int times=0;times<1000000;times++)
        {
            result = exper.escapeString(a[0]);
        }
        System.out.println(result);
    }
}
This test has a serious drawback: the String to be escaped is always
the same. However, in my proposal, as you will see, once the arrays have
been allocated for a big string they are never reallocated, so the test
remains realistic and valid. For the "other" proposal it makes no difference.
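One way to soften the "always the same String" drawback is to cycle through several inputs and time the loop with System.nanoTime() instead of `date`. The following is only a sketch under assumptions: the class name EscapeBench, the sample strings and the stand-in escapeString are hypothetical and are not the code that produced the figures above.

```java
// Hypothetical harness: class name, inputs and the stand-in escapeString are illustrative only.
class EscapeBench {
    // Stand-in implementation so the sketch is self-contained; swap in the method under test.
    static String escapeString(String str) {
        if (str == null) return null;
        StringBuilder buf = new StringBuilder(str.length() + 8);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\\' || c == '"') buf.append('\\');
            buf.append(c);
        }
        return buf.toString();
    }

    // Runs `iterations` calls, cycling through the inputs, and returns elapsed nanoseconds.
    static long time(String[] inputs, int iterations) {
        int sink = 0; // accumulate result lengths so the calls are not dead code
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += escapeString(inputs[i % inputs.length]).length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) System.out.println(sink); // keeps `sink` live
        return elapsed;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "no escapes here",
            "one \\ backslash",
            "a \"quoted\" word",
            ""
        };
        time(inputs, 100000);            // warm-up pass, result ignored
        long ns = time(inputs, 1000000); // measured pass
        System.out.println("elapsed ms: " + ns / 1000000);
    }
}
```

The absolute numbers from such a loop still depend heavily on the JVM and the machine, so they are best read as a relative comparison between implementations.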
Here is the output of the tests:
[smacedo@test133 tmp]$ date ; java exper "This is a message that has no strings to
escape and should be long enough to test the classes." ; date
Thu Mar 16 16:05:42 GMT 2000
This is a message that has no strings to escape and should be long enough to test the
classes.
Thu Mar 16 16:06:08 GMT 2000
[26 seconds]
[smacedo@test133 tmp]$ date ; java exper2 "This is a message that has no strings to
escape and should be long enough to test the classes." ; date
Thu Mar 16 16:06:19 GMT 2000
This is a message that has no strings to escape and should be long enough to test the
classes.
Thu Mar 16 16:06:53 GMT 2000
[34 seconds]
[smacedo@test133 tmp]$ date ; java exper "This is a message that has \\ some \"
strings to escape and should be long enough to test the classes."; date
Thu Mar 16 16:07:45 GMT 2000
This is a message that has \\ some \" strings to escape and should be long enough to
test the classes.
Thu Mar 16 16:08:27 GMT 2000
[42 seconds]
[smacedo@test133 tmp]$ date ; java exper2 "This is a message that has \\ some \"
strings to escape and should be long enough to test the classes." ; date
Thu Mar 16 16:08:48 GMT 2000
This is a message that has \\ some \" strings to escape and should be long enough to
test the classes.
Thu Mar 16 16:13:36 GMT 2000
[4 minutes and 48 seconds!!]
Finally, here is the code of my proposal for escapeString:
public class exper
{
    // Shared scratch buffers and counters; being static, they make
    // this method unsafe to call from several threads at once.
    private static char a1[];
    private static char a2[];
    private static int n;
    private static int m;
    private static int len;
    private static boolean ready = false;
    public static String escapeString(String str)
    {
        // n will be used to scan the string
        n=0;
        // m will keep the number of added chars
        m=0;
        // Check if str is null
        if(str==null) return str;
        // Get its length and check if it is empty
        len = str.length();
        if(len==0) return str;
        // If the static arrays haven't been allocated yet
        // or if we need a bigger one
        if( !ready || len>a1.length)
        {
            a1= new char[len];
            // worst case: every char needs an escape
            a2= new char[len*2];
            ready = true;
        }
        // get str to array
        // It would be great to be able to do
        // System.arraycopy of the private var "value"
        // without further checking...
        str.getChars(0,len,a1,0);
        // do usual switch
        while(n<len)
        {
            switch(a1[n])
            {
                case '\\':
                    // write the escape at n+m, then the char itself at n+m+1;
                    // m grows by exactly one per escaped char
                    a2[n+(m++)]='\\';
                    a2[n+m]='\\';
                    break;
                case '"':
                    a2[n+(m++)]='\\';
                    a2[n+m]='"';
                    break;
                default:
                    a2[n+m]=a1[n];
            }
            n++;
        }
        // avoid new String if no escapes were needed
        if(m==0)
            return str;
        else
            return new String(a2,0,len+m);
    }
    ...
}
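Since the timing runs only print the final result, a small equality check is a cheap way to confirm the n/m index bookkeeping. The sketch below uses the same single-pass idea but with local buffers, trading the saved allocations for thread safety. EscapeCheck and escapeLocal are hypothetical names introduced for illustration; this is not the posted class.

```java
// Illustrative only: EscapeCheck and escapeLocal are hypothetical names.
class EscapeCheck {
    // Single pass with the same bookkeeping: n scans the input,
    // m counts the backslashes added so far, so the next write
    // position in the output is always n + m.
    static String escapeLocal(String str) {
        if (str == null || str.length() == 0) return str;
        int len = str.length();
        char[] in = new char[len];
        char[] out = new char[len * 2]; // worst case: every char is escaped
        str.getChars(0, len, in, 0);
        int m = 0;
        for (int n = 0; n < len; n++) {
            char c = in[n];
            if (c == '\\' || c == '"') {
                out[n + m] = '\\'; // the added escape character
                m++;
            }
            out[n + m] = c; // the original character
        }
        // Return the argument itself when nothing was escaped.
        return m == 0 ? str : new String(out, 0, len + m);
    }

    public static void main(String[] args) {
        String[] samples = { "plain", "a\\b", "say \"hi\"", "\\\"" };
        for (String s : samples) {
            System.out.println(s + " -> " + escapeLocal(s));
        }
    }
}
```

Comparing the returned String with equals() rather than by eye matters here, because a stray NUL char in the output would not show up on a terminal.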
Both implementations are attached below.
I hope this contributes to the JDE effort.
Regards, Silvio
On Thu, 16 Mar 2000, Mark Gibson wrote:
>
> This is exactly what I did in my original implementation, but the only
> real way of testing the efficiency would be to profile the method for
> each implementation. Has anyone tried this?
>
> On Thu, 16 Mar 2000, Brad Giaccio wrote:
>
> [snip]
> > But to avoid all this the code as presented calculates the number of
> > escapes needed
> > char [] array = new char[length + escapeCount];
> > int index;
> > for (int i = 0; i < length; i++) {
> > // do switch here and just do insertion into array here
> > }
> [snip]
>
>
>
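The count-first approach quoted above (size the array as length + escapeCount, then fill it in one loop) might look roughly like this. It is a sketch only: the class name CountThenFill and the method name escape are assumptions, and the quoted fragment does not show how escapeCount was computed, so the first loop here is one plausible way to do it.

```java
// Hypothetical sketch of the quoted count-then-fill idea; names are illustrative.
class CountThenFill {
    static String escape(String str) {
        if (str == null) return null;
        int length = str.length();
        // First pass: count the chars that need a backslash prefix.
        int escapeCount = 0;
        for (int i = 0; i < length; i++) {
            char c = str.charAt(i);
            if (c == '\\' || c == '"') escapeCount++;
        }
        if (escapeCount == 0) return str; // nothing to escape, reuse the argument
        // Second pass: fill an exactly sized array.
        char[] array = new char[length + escapeCount];
        int index = 0;
        for (int i = 0; i < length; i++) {
            char c = str.charAt(i);
            if (c == '\\' || c == '"') array[index++] = '\\';
            array[index++] = c;
        }
        return new String(array);
    }

    public static void main(String[] args) {
        System.out.println(escape("has \\ some \" chars"));
    }
}
```

Compared with a single pass into a buffer of twice the input length, this reads the input twice but never allocates more than the result needs.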
``````````` Silvio Emanuel Nunes Barbosa de Macedo (PhD Std) '''''''''''''
[EMAIL PROTECTED] [EMAIL PROTECTED]
Intelligent and Interactive Systems Telecom. and Multimedia
Imperial College, University of London INESC Porto
Exhibition Road, Pc da Republica, 93
London SW7 2AZ, England 4050-497 Porto Portugal
Tel:+44 171 5946323 Tel:+351 22 2094220
public class exper
{
    // Shared scratch buffers and counters; being static, they make
    // this method unsafe to call from several threads at once.
    private static char a1[];
    private static char a2[];
    private static int n;
    private static int m;
    private static int len;
    private static boolean ready = false;
    public static String escapeString(String str)
    {
        // n will be used to scan the string
        n=0;
        // m will keep the number of added chars
        m=0;
        // Check if str is null
        if(str==null) return str;
        // Get its length and check if it is empty
        len = str.length();
        if(len==0) return str;
        // If the static arrays haven't been allocated yet
        // or if we need a bigger one
        if( !ready || len>a1.length)
        {
            a1= new char[len];
            // worst case: every char needs an escape
            a2= new char[len*2];
            ready = true;
        }
        // get str to array
        // It would be great to be able to do
        // System.arraycopy of the private var "value"
        // without further checking...
        str.getChars(0,len,a1,0);
        // do usual switch
        while(n<len)
        {
            switch(a1[n])
            {
                case '\\':
                    // write the escape at n+m, then the char itself at n+m+1;
                    // m grows by exactly one per escaped char
                    a2[n+(m++)]='\\';
                    a2[n+m]='\\';
                    break;
                case '"':
                    a2[n+(m++)]='\\';
                    a2[n+m]='"';
                    break;
                default:
                    a2[n+m]=a1[n];
            }
            n++;
        }
        // avoid new String if no escapes were needed
        if(m==0)
            return str;
        else
            return new String(a2,0,len+m);
    }
    public static void main(String a[])
    {
        String result=null;
        for(int times=0;times<1000000;times++)
        {
            result = exper.escapeString(a[0]);
        }
        System.out.println(result);
    }
}
public class exper2
{
    /**
     * Prefix \ escapes to all \ and " characters in a string so that
     * the quoted string can be printed rereadably. For efficiency,
     * if no such characters are found, the argument String itself
     * is returned.
     *
     * @param str String to be prefixed.
     * @return A String.
     *
     * @author David Hay
     * @author Mark Gibson
     * @author Steve Haflich
     * @author Charles Hart
     */
    public static String escapeString (String str) {
        int escCount = 0;
        int len = str.length();
        char c;
        // Count number of chars that need escaping.
        for (int i=0; i<len; i++) {
            c = str.charAt(i);
            if (c == '\\' || c == '\"')
                escCount++;
        }
        if (escCount > 0) {
            StringBuffer buf = new StringBuffer(str.length() + escCount);
            for ( int idx = 0; idx < str.length(); idx++ ) {
                char ch = str.charAt( idx );
                switch ( ch ) {
                    case '"': buf.append( "\\\"" ); break;
                    case '\\': buf.append( "\\\\" ); break;
                    default: buf.append( ch ); break;
                }
            }
            return buf.toString();
        }
        else
            return str;
    }
    public static void main(String a[])
    {
        String result=null;
        for(int times=0;times<1000000;times++)
        {
            result=exper2.escapeString(a[0]);
        }
        System.out.println(result);
    }
}