Hi Karl,
You're not parsing content_text as XML or HTML, so lxml treats it as
plain text that merely looks like XML, and therefore escapes it when it is
included within XML.
The following:
import lxml.etree as etree

content_text = '<p>line one</p><p>line two</p>'
# Parse the fragment as XML so the <p> elements become real child nodes
# instead of escaped text.
en_note_el = etree.XML(f'<en-note>{content_text}</en-note>')
en_note_doctype = ('<!DOCTYPE en-note SYSTEM '
                   '"http://xml.evernote.com/pub/enml2.dtd">')
en_note_str = etree.tostring(en_note_el, encoding='UTF-8', method="xml",
                             xml_declaration=True,
                             pretty_print=False, standalone=False,
                             doctype=en_note_doctype)
content_el = etree.Element('content')
# Wrap the serialised document in CDATA so it is embedded verbatim.
content_el.text = etree.CDATA(en_note_str)
print(etree.tostring(content_el).decode('utf8'))
Produces the output:
<content><![CDATA[<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note><p>line one</p><p>line two</p></en-note>]]></content>
Which I would expect is what you're after?
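
One caveat: etree.XML() only works if the content is well-formed XML. If the
HTML you read from the file might contain things like an unclosed <br>, a
possible alternative is to parse it with lxml.html.fragments_fromstring() and
append the resulting elements to an en-note element. This is only a sketch;
the helper name build_en_note is my own invention, and I'm assuming the input
is an HTML fragment without <html> or <body> wrappers, as you described:

```python
import lxml.etree as etree
import lxml.html


def build_en_note(content_text):
    """Wrap an HTML fragment in <en-note> (hypothetical helper, not lxml API)."""
    en_note_el = etree.Element('en-note')
    # fragments_fromstring() returns a list of elements; if the fragment
    # starts with bare text, that text comes back as a leading string.
    fragments = lxml.html.fragments_fromstring(content_text)
    if fragments and isinstance(fragments[0], str):
        en_note_el.text = fragments.pop(0)
    for fragment in fragments:
        en_note_el.append(fragment)
    return en_note_el


en_note_el = build_en_note('<p>line one</p><p>line two<br></p>')
print(etree.tostring(en_note_el, encoding='unicode'))
```

The resulting en_note_el can be passed to etree.tostring() with the doctype
and wrapped in CDATA exactly as in the code above.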
Cheers,
aid
> On 18 Aug 2022, at 15:57, [email protected] wrote:
>
> Hello, I need to add some HTML inside XML. The result should look like this:
>
> <content>
> <![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <!DOCTYPE en-note SYSTEM
> "http://xml.evernote.com/pub/enml2.dtd"><en-note><p>line one</p><p>line
> two</p></en-note>]]>
> </content>
>
> the code i'm using is this:
> # read html from file - result is :
> content_text = '<p>line one</p><p>line two</p>'
>
> en_note_el = etree.Element('en-note')
> en_note_el.text = content_text
> en_note_doctype = '<!DOCTYPE en-note SYSTEM
> "http://xml.evernote.com/pub/enml2.dtd">'
> en_note_str = etree.tostring(en_note_el, encoding='UTF-8', method="xml",
> xml_declaration=True,
> pretty_print=False, standalone=False,
> doctype=en_note_doctype)
>
> content_el = etree.SubElement(note_el, 'content')
> content_el.text = etree.CDATA(en_note_str)
> ==
>
> This works, except the included HTML in the text element of en-note is
> escaped. Can you help me figure how to not have it be escaped? The contents
> inside the <en-note> tags are supposed to be valid HTML, but without any
> <html> or <body> sections, and there isn't really a root element.
> _______________________________________________
> lxml - The Python XML Toolkit mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/lxml.python.org/
> Member address: [email protected]