XMLWriter escaping issue

2006-04-21 Thread Erik Hatcher
I encountered an escaping issue with XMLWriter.  Locally I've added  
the following test to BasicFunctionalityTest to demonstrate:


  public void testXMLWriter() throws Exception {

    SolrQueryResponse rsp = new SolrQueryResponse();
    rsp.add("\"quoted\"", "value");

    StringWriter writer = new StringWriter(32000);
    XMLWriter.writeResponse(writer, req("foo"), rsp);

    System.out.println("writer.toString() = " + writer.toString());
    DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();

    builder.parse(new ByteArrayInputStream(
        writer.toString().getBytes("UTF-8")));
  }


Quotes within XML attributes cause invalid XML to be generated.
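
To make the failure concrete outside of Solr, here is a minimal, self-contained sketch. The `<str name="...">` element shape and the class and method names are assumptions for illustration only, and the naive `replace` call stands in for real escaping (it does not handle `&` or `<`). It only shows that a raw double quote inside a double-quoted attribute value does not parse, while the `&quot;` form does:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class QuoteInAttributeDemo {

  /** Builds a hypothetical element whose name attribute holds the given value. */
  static String element(String name, boolean escapeQuotes) {
    // Illustration only: a real writer must also escape '&' and '<'.
    String attr = escapeQuotes ? name.replace("\"", "&quot;") : name;
    return "<str name=\"" + attr + "\">value</str>";
  }

  /** Returns true if the string parses as well-formed XML. */
  static boolean isWellFormed(String xml) {
    try {
      DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      return true;
    } catch (Exception e) {
      // The parser rejects the document (typically a SAXParseException).
      return false;
    }
  }

  public static void main(String[] args) {
    String name = "\"quoted\"";
    System.out.println(element(name, false) + " well-formed? "
        + isWellFormed(element(name, false)));
    System.out.println(element(name, true) + " well-formed? "
        + isWellFormed(element(name, true)));
  }
}
```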

I've corrected this in my local copy with this patch adding the
escaping to attribute names and the &quot; to XML.chardata_escapes.
The question is, is it appropriate to escape quotes everywhere, or
should it just be done when writing attribute values?  It should be
fine to do it across the board for attribute values and element text,
but I wanted to verify that with solr-dev before committing it.


Comments?

Erik



Index: src/java/org/apache/solr/request/XMLWriter.java
===================================================================
--- src/java/org/apache/solr/request/XMLWriter.java (revision 395873)
+++ src/java/org/apache/solr/request/XMLWriter.java (working copy)
@@ -178,7 +178,7 @@
     writer.write(tag);
     if (name!=null) {
       writer.write(" name=\"");
-      writer.write(name);
+      XML.escapeCharData(name, writer);
       if (closeTag) {
         writer.write("\"/>");
       } else {
Index: src/java/org/apache/solr/util/XML.java
===================================================================
--- src/java/org/apache/solr/util/XML.java  (revision 395873)
+++ src/java/org/apache/solr/util/XML.java  (working copy)
@@ -32,7 +32,7 @@
   // many chars less than 0x20 are *not* valid XML, even when escaped!
   // for example, foo#0;foo is invalid XML.
   private static final String[] chardata_escapes=
-  {"#0;","#1;","#2;","#3;","#4;","#5;","#6;","#7;","#8;",null,null,"#11;","#12;",null,"#14;","#15;","#16;","#17;","#18;","#19;","#20;","#21;","#22;","#23;","#24;","#25;","#26;","#27;","#28;","#29;","#30;","#31;",null,null,null,null,null,null,"&amp;",null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,"&lt;"};
+  {"#0;","#1;","#2;","#3;","#4;","#5;","#6;","#7;","#8;",null,null,"#11;","#12;",null,"#14;","#15;","#16;","#17;","#18;","#19;","#20;","#21;","#22;","#23;","#24;","#25;","#26;","#27;","#28;","#29;","#30;","#31;",null,null,"&quot;",null,null,null,"&amp;",null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,"&lt;"};




Re: XMLWriter escaping issue

2006-04-21 Thread Yonik Seeley
On 4/21/06, Erik Hatcher [EMAIL PROTECTED] wrote:
> I've corrected this in my local copy with this patch adding the
> escaping to attribute names and the &quot; to XML.chardata_escapes.
> The question is, is it appropriate to escape quotes everywhere, or
> should it just be done when writing attribute values?

I'd prefer just escaping quotes in attribute values as it makes things
like debugging output that contains query strings easier to read, and
easier to paste back into the query box for debugging from someone
else's output.

The attribute values definitely need to be XML escaped though.

-Yonik
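
One way to picture the split suggested in the reply is two entry points that share the `&` and `<` handling but differ on the double quote. The sketch below is an assumption about how that could look, not the code that was committed: the class name and the `escapeAttributeValue` helper are hypothetical, the methods return strings rather than writing to a `Writer` as the patch does, and control characters are left untouched for brevity.

```java
public class EscapeSketch {

  /** Shared loop; quotes are only replaced when writing an attribute value. */
  private static String escape(String s, boolean inAttribute) {
    StringBuilder out = new StringBuilder(s.length() + 16);
    for (int i = 0; i < s.length(); i++) {
      char ch = s.charAt(i);
      switch (ch) {
        case '&':
          out.append("&amp;");
          break;
        case '<':
          out.append("&lt;");
          break;
        case '"':
          // Element text may contain a literal quote; attribute values may not.
          out.append(inAttribute ? "&quot;" : "\"");
          break;
        default:
          out.append(ch);
      }
    }
    return out.toString();
  }

  /** Escaping for element text: quotes stay readable. */
  static String escapeCharData(String s) {
    return escape(s, false);
  }

  /** Escaping for double-quoted attribute values (hypothetical helper name). */
  static String escapeAttributeValue(String s) {
    return escape(s, true);
  }

  public static void main(String[] args) {
    String query = "title:\"solr & lucene\"";
    System.out.println(escapeCharData(query));
    System.out.println(escapeAttributeValue(query));
  }
}
```

With this split, a query string echoed as element text keeps its literal quotes and can be pasted back into a query box, while the same string used as an attribute value is written with `&quot;`.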