On 26 August 2016 at 16:10, Frank Millman <fr...@chagford.com> wrote:
> "Joonas Liik"  wrote in message
> news:cab1gnpqnjdenaa-gzgt0tbcvwjakngd3yroixgyy+mim7fw...@mail.gmail.com...
>
>> On 26 August 2016 at 08:22, Frank Millman <fr...@chagford.com> wrote:
>> >
>> > So this is my conversion routine -
>> >
>> > lines = string.split('"')  # split on attributes
>> > for pos, line in enumerate(lines):
>> >    if pos%2:  # every 2nd line is an attribute
>> >        lines[pos] = line.replace('&lt;', '<').replace('&gt;', '>')
>> > return '"'.join(lines)
>> >
>>
>> or.. you could just escape all & as &amp; before escaping the > and <,
>> and do the reverse on decode
>>
>
> Thanks, Joonas, but I have not quite grasped that.
>
> Would you mind explaining how it would work?
>
> Just to confirm that we are talking about the same thing -
>
> This is not allowed - '<root><fld name="<new>"/></root>'  [A]
>
>>>> import xml.etree.ElementTree as etree
>>>> x = '<root><fld name="<new>"/></root>'
>>>> y = etree.fromstring(x)
>
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File
> "C:\Users\User\AppData\Local\Programs\Python\Python35\lib\xml\etree\ElementTree.py",
> line 1320, in XML
>    parser.feed(text)
> xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1,
> column 17
>
> You have to escape it like this - '<root><fld name="&lt;new&gt;"/></root>'
> [B]
>
>>>> x = '<root><fld name="&lt;new&gt;"/></root>'
>>>> y = etree.fromstring(x)
>>>> y.find('fld').get('name')
>
> '<new>'
>>>>
>>>>
>
> I want to convert the string from [B] to [A] for editing, and then back to
> [B] before saving.
>
> Thanks
>
> Frank
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list

something like.. (untested)

def escape(untrusted_string):
    ''' Use on the user provided strings to render them inert for storage
      escaping & ensures that the user cant type sth like '&gt;' in
source and have it magically decode as '>'
    '''
    return untrusted_string.replace("&","&amp;").replace("<",
"&lt;").replace(">", "&gt;")

def unescape(escaped_string):
    '''Once the user string is retreived from storage use this
function to restore it to its original form'''
    return escaped_string.replace("&lt;","<").replace("&gt;",
">").replace("&amp;", "&")

i should note tho that this example is very ad-hoc, i'm no xml expert
just know a bit about xml entities.
if you decide to go this route there are probably some much better
tested functions out there to escape text for storage in xml
documents.
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to