I think the issue is more likely to be about the encoding for serialization, not the encoding for input. Everything in MarkLogic is stored in UTF-8. The entity escapes you see are not what is stored: we store the actual characters.
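As a rough illustration of that distinction, here is a small sketch in plain Python rather than MarkLogic code. The sample string and variable names are made up for the example. It shows one stored character being written out under two output encodings: with UTF-8 the character comes out as raw bytes, while with ASCII the serializer has to fall back to a numeric character reference. Parsing the escaped form gives back the same character, so nothing is lost either way.

```python
import html

# Hypothetical sample text containing one non-ASCII character (U+00E9).
text = "caf\u00e9"

# Serializing as UTF-8 writes the character itself as two bytes.
as_utf8 = text.encode("utf-8")

# Serializing as ASCII cannot represent U+00E9 directly, so it is
# written as an XML numeric character reference instead.
as_ascii = text.encode("ascii", "xmlcharrefreplace")

# Reading the escaped form back yields the original character again.
round_trip = html.unescape(as_ascii.decode("ascii"))

print(as_utf8)
print(as_ascii)
print(round_trip == text)
```

If the development server is serializing with an ASCII output encoding, the escapes in the console output would be an artifact of that step and not a sign that the document was stored incorrectly.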
You can force the issue with the output settings on the appserver. Look at the output-encoding setting in particular.

On Sat, 14 Oct 2017 03:48:12 -0700, Zakiya Tamimi <[email protected]> wrote:

> I have posted my question at stackoverflow
> https://stackoverflow.com/questions/46722188/marklogic-encoding-xdmpdocument-load
>
> Here's the text of the question:
>
> I have noticed that utf-8 xml documents loaded (xdmp:document-get() +
> xdmp:document-insert()) into our development marklogic server (7.0-6.8)
> have ascii encoding. Meanwhile back on production server (7.0-5.1), there
> is no problem; utf-8 is loaded as utf-8. I traced the problem and found it
> to be caused by xdmp:document-get().
>
> So I wrote the following code snippet and ran it on both server consoles
> and got incorrect encoding on the development server and correct encoding
> on production.
>
> let $options := <options xmlns="xdmp:document-get">
>                   <repair>full</repair>
>                   <encoding>UTF-8</encoding>
>                   <format>xml</format>
>                 </options>
> let $url := "http://******/ref_batches/electronic/20170801_e31_004/201731780-004.xml"
> return xdmp:document-get($url, $options)
>
> My initial guess: different version numbers may have caused this. So I
> tested on a local server (7.0-6-12) and got correct utf-8 encoding. Later
> we upgraded our development server to (7.0-6-12) and re-tested to get
> incorrect encoding (ascii)
>
> Is there some marklogic configurations that are responsible for this
> trans-coding?
>
> Thanks
