I would set the repair option to none instead of full.  Repair is not really
designed for that kind of cleanup.
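
As a rough sketch, these are the options from the original query with only the repair level changed (the URL is the placeholder from Tim's message):

```xquery
xquery version "1.0-ml";

(: Same options as in the original query, but with repair turned off. :)
let $options :=
    <options xmlns="xdmp:document-get">
        <repair>none</repair>
        <encoding>iso-8859-1</encoding>
    </options>
return xdmp:document-get("http://blah/blah/blah/doc.xml", $options)
```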

If you need to clean up the content, use xdmp:tidy instead.

The last example in the tidy doc shows how to do that:

http://docs.marklogic.com/4.2doc/docapp.xqy#search.xqy?start=1&cat=all&query=xdmp:tidy&button=search
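
A minimal sketch of that approach, assuming the page can be fetched as text first. The URL is the placeholder from the original message, and the `[2]` relies on xdmp:tidy returning a status node followed by the tidied document:

```xquery
xquery version "1.0-ml";

(: Fetch the raw content as text so no XML parsing or repair happens on load. :)
let $url := "http://blah/blah/blah/doc.xml"
let $text :=
    xdmp:document-get($url,
        <options xmlns="xdmp:document-get">
            <format>text</format>
            <encoding>iso-8859-1</encoding>
        </options>)
(: xdmp:tidy returns two nodes: a status report, then the cleaned document. :)
return xdmp:tidy($text)[2]
```

As far as I know, tidy treats its input as HTML unless told otherwise and returns XHTML, so the elements in the result may end up in the XHTML namespace.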

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of Tim Meagher
Sent: Monday, July 11, 2011 4:53 PM
To: 'General MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Problems reading ISO-8859-1 character í

Upon further review there appears to be some kind of flaky issue going on.  I 
can delete parts of the XML document that contain no character entities and 
xdmp:document-get() recognizes the í without error.  I cannot pinpoint 
the problem even though I'm using a hex editor to try to identify problematic 
characters.  Has anyone experienced anything like this?

Thx again,

Tim

From: [email protected] 
[mailto:[email protected]] On Behalf Of Tim Meagher
Sent: Monday, July 11, 2011 6:42 PM
To: 'General MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Problems reading ISO-8859-1 character í

Hi Folks,

I am trying to use MarkLogic to read an XML file from a web page using 
xdmp:document-get().  The document is ISO-8859-1 encoded, so my invocation 
looks like this:

let $url :=
    "http://blah/blah/blah/doc.xml"
let $options :=
    <options xmlns="xdmp:document-get">
        <repair>full</repair>
        <encoding>iso-8859-1</encoding>
    </options>

let $err-message := ""
let $error := false()

let $node :=
    try {
        xdmp:document-get($url, $options)
    }
    catch($e) {(
        xdmp:set($err-message, $e),
        xdmp:set($error, true()),
        xdmp:log(concat("Error getting ", $url, ": ", xdmp:quote($e)))
    )}

return
    if ($error) then $err-message
    else $node

The following error is returned:

<error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd"
xmlns:error="http://marklogic.com/xdmp/error"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <error:code>XDMP-DOCUNEOF</error:code>
  <error:name/>
  <error:xquery-version>1.0-ml</error:xquery-version>
  <error:message>XDMP-DOCUNEOF</error:message>
  <error:format-string/>
  <error:retryable>false</error:retryable>

I have traced the problem to the use of the ISO-8859-1 character entity 
&iacute;, and I get the error even if I replace it with its numeric equivalent 
&#237;. Removing the entity causes the document to be read without error, even 
though another ISO-8859-1 character entity, &atilde;, is handled without 
error.

I'm using MarkLogic 4.1-7.1.

Can anyone tell me what's up with this?  From what I can tell &iacute; is a 
valid ISO-8859-1 character entity.
Thank you!

Tim Meagher

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
