On further review, the problem appears to be intermittent.  If I delete
parts of the XML document that contain no character entities,
xdmp:document-get() reads the í without error.  I cannot pinpoint the
cause, even though I am using a hex editor to look for problematic
characters.  Has anyone experienced anything like this?
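
One way to cross-check the hex editor is a small script run against a
locally saved copy of the file.  The sketch below is only an illustration,
not part of MarkLogic: the function name scan_suspects is made up, and it
assumes the document has already been downloaded as raw bytes.  It lists
every raw non-ASCII byte and every entity or character reference with its
byte offset.

```python
import re

# Matches numeric character references (&#237; or &#xED;) and named
# entities (&iacute;). Predefined XML entities such as &amp; match too.
ENTITY_RE = re.compile(rb"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def scan_suspects(data: bytes):
    """Return sorted (offset, kind, detail) tuples for raw non-ASCII bytes
    and for entity / character references found in the given bytes."""
    hits = []
    # Any byte above 0x7F is outside ASCII and depends on the encoding.
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            hits.append((offset, "raw-byte", f"0x{byte:02X}"))
    # Entity references are plain ASCII, so they are searched separately.
    for match in ENTITY_RE.finditer(data):
        hits.append((match.start(), "entity", match.group(0).decode("ascii")))
    return sorted(hits)
```

For a saved copy named doc.xml (a hypothetical file name), calling
scan_suspects(open("doc.xml", "rb").read()) returns the offsets to look up
in the hex editor.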

 

Thx again,

 

Tim

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Tim Meagher
Sent: Monday, July 11, 2011 6:42 PM
To: 'General MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Problems reading ISO-8859-1 character
í 

 

Hi Folks,

 

I am trying to use MarkLogic to read an XML file from a web page using
xdmp:document-get().  The document is ISO-8859-1 encoded, so my invocation
looks like this:

 

let $url :=
    "http://blah/blah/blah/doc.xml"
let $options :=
    <options xmlns="xdmp:document-get">
        <repair>full</repair>
        <encoding>iso-8859-1</encoding>
    </options>

let $err-message := ""
let $error := false()

let $node :=
    try {
        xdmp:document-get($url, $options)
    }
    catch($e) {(
        xdmp:set($err-message, $e),
        xdmp:set($error, true()),
        xdmp:log(concat("Error getting ", $url, ": ", xdmp:quote($e)))
    )}

return
    if ($error) then $err-message
    else $node

 

The following error is returned:

 

<error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd"
xmlns:error="http://marklogic.com/xdmp/error"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <error:code>XDMP-DOCUNEOF</error:code>
  <error:name/>
  <error:xquery-version>1.0-ml</error:xquery-version>
  <error:message>XDMP-DOCUNEOF</error:message>
  <error:format-string/>
  <error:retryable>false</error:retryable>

 

I have traced the problem to the ISO-8859-1 character entity &iacute;,
and I get the error even if I replace it with its numeric equivalent
&#237;.  Removing the entity lets the document be read without error,
even though another ISO-8859-1 character entity, &atilde;, is handled
without error.

 

I'm using MarkLogic 4.1-7.1.

 

Can anyone tell me what's up with this?  From what I can tell &iacute; is a
valid ISO-8859-1 character entity.

Thank you!

 

Tim Meagher

 

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
