[MarkLogic Dev General] Error loading a mix of UTF-8 and ISO-8859-1 using xdmp:document-load

Tim Meagher Wed, 30 Jun 2010 08:26:44 -0700

Hi Folks,


I wrote an XQuery module (http-load.xqy) that accepts an HTTP POST whose body
is an XML record with sub-elements for the XML content and for the repair,
location, URI, permissions, collection, and document-property settings needed
to execute xdmp:document-load. The module is served by an HTTP app server on
the MarkLogic server. I invoke http-load.xqy from a C#.NET application using
the .NET HttpWebRequest class, in which I explicitly set the content type as
follows:

                wb.ContentType = "text/xml;charset=\"utf-8\"";

This worked very well until one of our vendors started sending content that
explicitly declares the encoding as ISO-8859-1 in the XML declaration (none of
their earlier content included an encoding declaration):

<?xml version="1.0" encoding="ISO-8859-1"?>

As a result, the loader now fails with the following error:

XDMP-DOCUTF8SEQ: Invalid UTF-8 escape sequence at
http://[server]/[path]/doc.xml line 44 -- document is not UTF-8 encoded .
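This error is consistent with a mismatch between the bytes and the declared charset. The request declares UTF-8, but ISO-8859-1 stores each character from U+0080 to U+00FF (such as é) as a single byte in the range 0x80 to 0xFF. Such a byte, when it is not followed by the continuation bytes UTF-8 expects, is not a valid UTF-8 sequence. The following small illustration uses a hypothetical sample string:

```python
# Hypothetical sample string containing one non-ASCII character.
text = "Café au lait"

# ISO-8859-1 stores "é" (U+00E9) as the single byte 0xE9.
latin1_bytes = text.encode("iso-8859-1")
print(latin1_bytes)  # b'Caf\xe9 au lait'

# In UTF-8, 0xE9 is a lead byte that must be followed by two continuation
# bytes (0x80-0xBF). Here it is followed by a space, so strict decoding
# fails, analogous to the XDMP-DOCUTF8SEQ error above.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason, "at byte offset", err.start)

# Encoded as UTF-8, "é" takes two bytes (0xC3 0xA9) and round-trips cleanly.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'Caf\xc3\xa9 au lait'
assert utf8_bytes.decode("utf-8") == text
```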

I could change the C#.NET code to drop the charset parameter from the
content type, as follows:

                wb.ContentType = "text/xml";

(which is what I'm inclined to try, just to see what happens), but I need to
support both UTF-8 and ISO-8859-1, and I don't want to have to determine the
encoding before loading into MarkLogic. Content comes from a variety of
vendors, so it would be nice to let MarkLogic figure out the encoding. The
encoding can be specified explicitly as either UTF-8 or ISO-8859-1 in the
xdmp:document-load options, but can MarkLogic discover the encoding
automatically? And if not, does it assume UTF-8 unless the encoding is set
explicitly in the XML declaration?
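For context on what automatic discovery involves: Appendix F of the XML 1.0 specification describes how a parser can detect the encoding from a document's first bytes. It looks for a byte-order mark, and otherwise reads the encoding pseudo-attribute of the XML declaration. When neither is present, it defaults to UTF-8. Whether xdmp:document-load applies this to content posted through http-load.xqy is the open question here. If the client ends up having to normalize content itself, a minimal Python sketch of the idea might look like the following. It is for illustration only. sniff_xml_encoding and to_utf8 are hypothetical helper names. The sketch also assumes ASCII-compatible encodings such as UTF-8 and ISO-8859-1, and omits UTF-16/UTF-32 detection.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Hypothetical helper: guess an XML document's encoding from its first bytes.

    Assumes an ASCII-compatible encoding; UTF-16/UTF-32 detection is omitted.
    """
    if data.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte-order mark
        return "utf-8"
    match = _DECL_ENCODING.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when there is no BOM and no declaration


def to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: re-encode a document as UTF-8 so its bytes agree
    with a charset="utf-8" request, rewriting the declared encoding to match."""
    text = data.decode(sniff_xml_encoding(data))
    if text.startswith("\ufeff"):
        text = text[1:]  # drop the BOM; the UTF-8 copy does not need it
    # Rewrite only the declaration at the start of the document.
    text = re.sub(
        r'^(<\?xml[^>]*?\sencoding\s*=\s*["\'])[^"\']*(["\'])',
        r"\1UTF-8\2",
        text,
        count=1,
    )
    return text.encode("utf-8")
```

With this approach, a document declared as ISO-8859-1 is decoded with that codec and re-encoded as UTF-8, with its declaration rewritten to encoding="UTF-8". Its bytes then agree with the charset="utf-8" that the C# client already sends. The regular-expression rewrite is a shortcut for illustration, not a substitute for a real XML parser.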

Thanks in advance for any help!

Tim Meagher

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
