Hi Danny,
I tried setting the repair option to none. The error message was more descriptive. It was:

<error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd" xmlns:error="http://marklogic.com/xdmp/error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <error:code>XDMP-DOCENTITYREF</error:code>
  <error:name/>
  <error:xquery-version>1.0-ml</error:xquery-version>
  <error:message>Invalid entity reference</error:message>
  <error:format-string>XDMP-DOCENTITYREF: xdmp:document-get("http://......", <options xmlns="xdmp:document-get"><repair>none</repair><encoding>iso-8859-1</encoding></options>) -- Invalid entity reference "atilde" at http://....xml line 31</error:format-string>
  <error:retryable>false</error:retryable>
  <error:expr>xdmp:document-get("http://......", <options xmlns="xdmp:document-get"><repair>none</repair><encoding>iso-8859-1</encoding></options>)</error:expr>
  <error:data>
    <error:datum>"atilde"</error:datum>

Looks like it first complained about the atilde entity reference. I don't understand why that is a problem.

BTW, I've been using the full repair option because occasionally I'll get a document that is not well-formed. Most of the docs I obtain are UTF-8, but occasionally an ISO-8859-1 doc arrives. I'm just trying to open the sieve pretty wide.

Are the xdmp:tidy options the same in both 4.1 and 4.2?

Thank you!
Tim

From: [email protected] [mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Monday, July 11, 2011 8:09 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Problems reading ISO-8859-1 character í

I would set repair to none instead of full. Repair is really not made for that kind of cleanup. If you need to clean up the content, use xdmp:tidy instead.
The last example in the tidy doc shows how to do that: http://docs.marklogic.com/4.2doc/docapp.xqy#search.xqy?start=1&cat=all&query=xdmp:tidy&button=search

-Danny

From: [email protected] [mailto:[email protected]] On Behalf Of Tim Meagher
Sent: Monday, July 11, 2011 4:53 PM
To: 'General MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Problems reading ISO-8859-1 character í

Upon further review, there appears to be some kind of flaky issue going on. I can delete parts of the XML document that contain no character entities, and xdmp:document-get() then recognizes the í without error. I cannot pinpoint the problem, even though I'm using a hex editor to try to identify problematic characters. Has anyone experienced anything like this?

Thx again,
Tim

From: [email protected] [mailto:[email protected]] On Behalf Of Tim Meagher
Sent: Monday, July 11, 2011 6:42 PM
To: 'General MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Problems reading ISO-8859-1 character í

Hi Folks,

I am trying to use MarkLogic to read an XML file from a web page using xdmp:document-get().
The document is ISO-8859-1 encoded, so my invocation looks like this:

let $url := "http://blah/blah/blah/doc.xml"
let $options :=
  <options xmlns="xdmp:document-get">
    <repair>full</repair>
    <encoding>iso-8859-1</encoding>
  </options>
let $err-message := ""
let $error := false()
let $node :=
  try {
    xdmp:document-get($url, $options)
  } catch ($e) {(
    (: remember the error, flag it, and log it :)
    xdmp:set($err-message, $e),
    xdmp:set($error, true()),
    xdmp:log(concat("Error getting ", $url, ": ", xdmp:quote($e)))
  )}
return if ($error) then $err-message else $node

The following error is returned:

<error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd" xmlns:error="http://marklogic.com/xdmp/error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <error:code>XDMP-DOCUNEOF</error:code>
  <error:name/>
  <error:xquery-version>1.0-ml</error:xquery-version>
  <error:message>XDMP-DOCUNEOF</error:message>
  <error:format-string/>
  <error:retryable>false</error:retryable>

I have traced the problem to the ISO-8859-1 character í, and I get the error even if I replace it with its numeric equivalent &#237;. Removing the í lets the document be read without error, while another ISO-8859-1 character, ã, is handled without error.

I'm using MarkLogic 4.1-7.1. Can anyone tell me what's up with this? From what I can tell, í is a valid ISO-8859-1 character entity.

Thank you!
Tim Meagher
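A minimal sketch of Danny's suggestion, written against Tim's placeholder URL and assuming the MarkLogic 1.0-ml dialect with try/catch, the <format>text</format> option of xdmp:document-get, and xdmp:tidy returning a tidy-status element followed by the tidied document. It tries a strict parse with repair set to none first, and only if that fails fetches the raw text and passes it through xdmp:tidy. The "atilde" failure is consistent with &atilde; being an HTML entity: XML itself predefines only amp, lt, gt, quot and apos, so a parser with no DTD declaring atilde rejects the reference.

```xquery
xquery version "1.0-ml";

(: Placeholder URL from the thread; substitute the real location. :)
let $url := "http://blah/blah/blah/doc.xml"
let $encoding := "iso-8859-1"
return
  try {
    (: Strict parse first: repair set to none, so problems are reported, not patched. :)
    xdmp:document-get($url,
      <options xmlns="xdmp:document-get">
        <repair>none</repair>
        <encoding>{$encoding}</encoding>
      </options>)
  } catch ($e) {
    (: Log the failure (e.g. XDMP-DOCENTITYREF for an HTML-only entity),
       then fetch the same resource as plain text so no XML parsing happens. :)
    xdmp:log(concat("Strict parse failed for ", $url, ": ", xdmp:quote($e))),
    let $text := xdmp:document-get($url,
      <options xmlns="xdmp:document-get">
        <format>text</format>
        <encoding>{$encoding}</encoding>
      </options>)
    (: Assumption: xdmp:tidy returns a tidy-status element followed by
       the tidied (XHTML) document node, so keep the second item. :)
    return xdmp:tidy(string($text))[2]
  }
```

Keeping the strict parse as the first attempt means well-formed documents never go through tidy. Because xdmp:tidy targets HTML and produces XHTML, the tidied result for a non-HTML vocabulary is worth inspecting before it is stored.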
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
