Hi and thanks for the reply, [reposting it to link it to the correct discussion]
In my first post I talked about using xdmp:document-insert, but it should have been xdmp:document-load. Sorry for the confusion. Below are working examples of both cases:

--- XXE EXAMPLE ---

File : xxe-example.xml

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE foo [
      <!ELEMENT foo ANY >
      <!ENTITY xxe SYSTEM "file:///d:/xxe-inject-me.xml" >
    ]>
    <foo>&xxe;</foo>

File : xxe-inject-me.xml

    <?xml version="1.0"?>
    <lolz>hallo</lolz>

Code :

    xdmp:document-load("d:\xxe-example.xml", map:map() => map:with("uri", "/test/xxe-result.xml"))

Result:

    <?xml version="1.0" encoding="UTF-8"?>
    <foo>
    <lolz>hallo
    </lolz>
    </foo>

So the file contents get inserted just as requested. This is something you would want to prevent. Note that xdmp:document-get("d:\xxe-example.xml") produces the same output and behaviour.

As the file location embedded in the XML could also point to an external (http) location, this is a potential risk when loading XML files. I think this should be addressed in both functions by adding something like an 'ignore-dtd' option.

Using document-insert as in your example reveals something interesting: the unquote function appears to ignore or disable the DTD, at least in this case. document-insert itself expects a node, so the XML has already been processed somewhere before it gets there.
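Until something like the 'ignore-dtd' option exists, one blunt stopgap would be to refuse any file that carries a DTD before it ever reaches the parser. The following is only a minimal sketch under that assumption: local:load-without-dtd is a hypothetical helper (not a MarkLogic built-in), the error name DTD-REJECTED is made up for illustration, and the check is a plain text match rather than a parser-level guarantee. It also rejects legitimate documents that happen to have a DOCTYPE.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of MarkLogic: read the file as text,
   reject it if it declares a DOCTYPE or an ENTITY, otherwise parse
   and insert it. :)
declare function local:load-without-dtd(
  $path as xs:string,
  $uri as xs:string
) as empty-sequence()
{
  (: xdmp:filesystem-file returns the raw text, so nothing is parsed yet :)
  let $text := xdmp:filesystem-file($path)
  return
    if (fn:matches($text, "<!(DOCTYPE|ENTITY)"))
    then fn:error(xs:QName("DTD-REJECTED"),
                  fn:concat("DTD or entity declaration found in ", $path))
    else xdmp:document-insert($uri, xdmp:unquote($text))
};

(: With the xxe-example.xml above, this call should raise DTD-REJECTED
   instead of pulling in xxe-inject-me.xml :)
local:load-without-dtd("d:\xxe-example.xml", "/test/xxe-result.xml")
```

This trades convenience for safety: it goes through xdmp:filesystem-file and xdmp:unquote instead of xdmp:document-load, so the text can be inspected before any entity is resolved.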
--- XML BOMB ---

File : xmlbomb.xml

    <!DOCTYPE lolz [
      <!ENTITY lol "lol">
      <!ELEMENT lolz (#PCDATA)>
      <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
      <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
      <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
      <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
      <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
      <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
      <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
      <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
      <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
    ]>
    <lolz>&lol9;</lolz>

Code:

    xdmp:document-load("d:\xmlbomb.xml", map:map() => map:with("uri", "/test/xmlbomb.xml"))

Result: The code is executed and keeps running.

Interestingly, reading the content and passing it to the unquote function also causes the process to keep loading the file. So unquote does not really ignore all DTD definitions, as it appeared to do when loading the XXE example.

My conclusion thus far:

document-load and document-get are vulnerable to both exploits, with no option to turn this off.
document-insert is not affected, as it expects a node, at which point the original document has already been processed.
The unquote function sometimes prevents the exploits from executing, and sometimes does not.

Any thoughts on the matter would be appreciated!

Thanks,
Marcel

Date: Wed, 14 Mar 2018 17:28:34 +0000
From: Keith Breinholt <breinhol...@ldschurch.org>
Subject: Re: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <sn1pr04mb190429cfb3c02923235f767fb8...@sn1pr04mb1904.namprd04.prod.outlook.com>
Content-Type: text/plain; charset="us-ascii"

Here is the closest I have been able to come to inserting the "document".

    xquery version "1.0-ml";

    let $doc := xdmp:unquote( xdmp:filesystem-file("C:/xxeInjection.xml") )
    return ( $doc, xdmp:document-insert( "/xxeInjection.xml", $doc) )

The contents of the xxeInjection.xml file are exactly as you specify below. However, when the file is loaded from the file system it is text and must be unquoted ... xdmp:unquote() strips the invalid HTML DOCTYPE and we get:

    <?xml version="1.0" encoding="UTF-8"?>
    <foo>;</foo>

Could you please show us the code you used to insert the xxe injection "document" unmodified?

-Keith

From: Keith Breinholt
Sent: Wednesday, March 14, 2018 11:07 AM
To: general@developer.marklogic.com
Subject: RE: Marklogic XXE and XML Bomb prevention

Perhaps you could show the code that you used to insert the document into the database.

I, personally, cannot get your code to work for a number of reasons.

1) Having both an xml processing statement and an HTML doctype is invalid.
2) Trying to assign the "document" to a variable throws an error because of #1.
3) If I try to put the "document" below into a file on the file system and load it, I cannot use xdmp:document-insert() to insert the "document" into the database because there isn't a valid node.

There may be something I have overlooked, so please share the code you used to insert this document into a database.

-Keith

From: general-boun...@developer.marklogic.com On Behalf Of Marcel de Kleine
Sent: Wednesday, March 14, 2018 6:43 AM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention

Hello,

We have noticed MarkLogic is vulnerable to XXE (external entity) and XML bomb attacks.
When loading a malicious document using xdmp:document-insert, it won't catch these, which causes either loading of unwanted external documents (XXE) or a lockup of the system (XML bomb).

For example, if I load this document :

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE foo [
      <!ELEMENT foo ANY >
      <!ENTITY xxe SYSTEM "file:///c:/text.xml" >
    ]>
    <foo>&xxe;</foo>

The file test.xml gets nicely added to the xml document.

See OWASP and others for examples.

This is clearly an XML processing issue, so the question is: can we disable this? And if so, at what levels would this be possible? Ideally it would be system-wide. (And if you cannot disable this, I think this is something ML should address immediately.)

Thank you in advance,
Marcel de Kleine, EPAM