[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-04-14 Thread Tomalak

New submission from Tomalak:

Current behavior upon toxml() is:



Upon reading the document again, the new line is normalized and
collapsed into a space (according to the XML spec, section 3.3.3), which
means that it is lost.
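
The normalization described above can be observed directly with the parser. This is a minimal sketch, not part of the original report: the element name `foo` and the attribute name `attribute` are hypothetical. A literal newline inside an attribute value is read back as a space, whereas a newline written as the character reference `&#10;` survives parsing.

```python
from xml.dom import minidom

# Hypothetical sample documents; element and attribute names are illustrative.
literal = '<foo attribute="multiline\nvalue"/>'      # raw newline in the value
escaped = '<foo attribute="multiline&#10;value"/>'   # newline as a character reference


def read_attribute(xml_text):
    """Parse xml_text and return the value of the 'attribute' attribute."""
    doc = minidom.parseString(xml_text)
    return doc.documentElement.getAttribute("attribute")


literal_value = read_attribute(literal)
escaped_value = read_attribute(escaped)

print(repr(literal_value))
print(repr(escaped_value))
```

Only the second form gives back the newline, which is why the serializer would need to emit a character reference inside attribute values.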

Better behavior would be something like this (within attribute values only):



--
components: XML
messages: 85964
nosy: Tomalak
severity: normal
status: open
title: xml.dom.minidom does not handle newline characters in attribute values
versions: Python 2.4, Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-04-14 Thread Tomalak

Changes by Tomalak:


--
type:  -> behavior




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-04-21 Thread Daniel Diniz

Changes by Daniel Diniz:


--
keywords: +easy
stage:  -> test needed
versions:  -Python 2.4, Python 2.5, Python 2.7, Python 3.0




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-02 Thread Francesco Sechi

Changes by Francesco Sechi:


--
nosy: +sechi_francesco




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-02 Thread Francesco Sechi

Changes by Francesco Sechi:


Added file: http://bugs.python.org/file13837/test_toxml.py




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-04 Thread Francesco Sechi

Francesco Sechi added the comment:

Ok, I've tried to solve this problem, but I think the keyword 'easy' is
not suitable for this kind of task, because it requires modifying the
expat module, which is very complex.

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-06 Thread Tomalak

Tomalak added the comment:

@Francesco Sechi: Would it not just require a minimal change to the
_write_data() method? Something along the lines of (sorry, no Python
expert, maybe I am way off):

def _write_data(writer, data, is_attrib=False):
    "Writes datachars to writer."
    if is_attrib:
        data = data.replace("\r", "&#13;").replace("\n", "&#10;")
    data = data.replace("&", "&amp;").replace("<", "&lt;")
    data = data.replace("\"", "&quot;").replace(">", "&gt;")
    writer.write(data)

and in Element.writexml():

#[...]
for a_name in a_names:
    writer.write(" %s=\"" % a_name)
    _write_data(writer, attrs[a_name].value, True)
#[...]

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-06 Thread Tomalak

Tomalak added the comment:

Of course it should be:

def _write_data(writer, data, is_attrib=False):
    "Writes datachars to writer."
    data = data.replace("&", "&amp;").replace("<", "&lt;")
    data = data.replace("\"", "&quot;").replace(">", "&gt;")
    if is_attrib:
        data = data.replace("\r", "&#13;").replace("\n", "&#10;")
    writer.write(data)
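
The ordering matters here: if line breaks were turned into character references first, the later "&" replacement would escape those references a second time. The following self-contained sketch illustrates the corrected order. The function name `write_attr_data` is hypothetical and is not minidom's own function, and the decimal reference form (`&#13;`, `&#10;`) is an assumption; a hexadecimal form would work equally well.

```python
import io


def write_attr_data(writer, data):
    """Escape data for a double-quoted attribute value and write it."""
    # Escape markup characters first, so the "&" of the character
    # references added below is not escaped a second time.
    data = data.replace("&", "&amp;").replace("<", "&lt;")
    data = data.replace('"', "&quot;").replace(">", "&gt;")
    # Line breaks become character references, which a parser does not
    # normalize into spaces when reading the attribute back.
    data = data.replace("\r", "&#13;").replace("\n", "&#10;")
    writer.write(data)


buffer = io.StringIO()
write_attr_data(buffer, 'a & b\r\n"c"')
escaped = buffer.getvalue()
print(escaped)
```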

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-06 Thread Francesco Sechi

Francesco Sechi added the comment:

Don't worry, I'm a newcomer too.
No, your solution does not work, because the method you refer to
(_write_data) is called by the toxml() function, but the newline is
replaced with a space by the parseString() function. The parseString()
function, as I already said, relies on the 'expat' module, which is
very complex (for me).

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Tomalak added the comment:

Hmm... I thought toxml() is the part that needs to be fixed, not the
parsing/reading. I mentioned the reading only to outline the data loss
that occurs eventually.

My point is: toxml() (i.e. _write_data) *actually writes* the literal
newline to the output. And within attribute values, it just shouldn't.

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Tomalak added the comment:

Attaching a patch that fixes the problem.

--
keywords: +patch
Added file: http://bugs.python.org/file13919/minidom.patch




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Tomalak added the comment:

Attaching a test file that outlines the problem. Output on my system
(Windows / Python 3.0) is:

Without the patch:
C:\Python30>python.exe c:\minidom_test.py
False
1 -->"multiline
value"
2 -->"multiline value"

With the patch:
C:\Python30>python.exe c:\minidom_test.py
True
1 -->"multiline
value"
2 -->"multiline
value"

--
Added file: http://bugs.python.org/file13920/toxml_test.py




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Changes by Tomalak:


Removed file: http://bugs.python.org/file13920/toxml_test.py




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Changes by Tomalak:


Added file: http://bugs.python.org/file13921/minidom_test.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-10 Thread Francesco Sechi

Francesco Sechi added the comment:

I think the question is whether xmldoc1 contains the newline or not. If
it doesn't, your patch works only in the particular case where you pass
a toxml() return value to parseString(). If I pass an XML string with
newlines, it doesn't work. So your solution is not generic and cannot be
considered a patch for the issue you reported.

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-10 Thread Francesco Sechi

Francesco Sechi added the comment:

Let me try to explain my opinion better:
- If you add the attribute using setAttribute(), the newlines are kept
and toxml() works well.
- If you add the attribute using parseString(), passing it an XML
string, the newlines are replaced.

- Your patch works only if you act on a well-constructed (i.e. with
newlines kept in internal data structures) xml.dom.minidom.Document
object, so...
- If you execute your patched toxml() method on an
xml.dom.minidom.Document constructed using parseString(), passing it a
string with a newline, it does not work.

So your patch works only in a specific case: you are trying to fix a
problem in parseString() by acting on its actual parameter.

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-10 Thread Tomalak

Tomalak added the comment:

Francesco, I think you are missing the point. :-) The problem has two sides.

If I create an XML document using the DOM (not by parsing it from a
string!), then I can put newline characters into an attribute value. This
is allowed and conforms to the XML spec. 

However, *literal* newlines in an attribute value (i.e. when the
document is parsed from a string) have no meaning. The parser treats
them as if they were insignificant whitespace -- they are converted to a
single space. This is also valid and conforms to the XML spec.

The catch: This leads to an actual data loss if I *wanted* to store
newline characters in an attribute -- unless the newline characters are
properly encoded. Encoding the newline characters is also valid and
conforms to the spec, so the DOM implementation should do it. 

In other words - the parsing process you refer to is actually working
fine. If an attribute contains a literal newline, it is indeed okay to
collapse it into a space. It's only the document serializing that is broken.

Minidom is clearly missing functionality here, and it does not conform
to the XML spec. If I store a string of data in an XML document, it must
be ensured that upon reading the document again, I get the *same* data
back. This is what I check with my test script.
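
The attached test script is not reproduced in this thread, so the following is only a sketch of that kind of round-trip check. The element name `root` and attribute name `attr` are hypothetical. The document is built through the DOM, serialized with toxml(), parsed again, and the attribute is compared with the original string. Whether the comparison prints True or False depends on whether the minidom in use escapes line breaks when writing attribute values.

```python
from xml.dom import minidom

original = "multiline\nvalue"

# Build the document through the DOM rather than by parsing a string.
doc = minidom.Document()
root = doc.createElement("root")      # hypothetical element name
root.setAttribute("attr", original)   # hypothetical attribute name
doc.appendChild(root)

# The in-memory DOM keeps the newline untouched.
in_memory = doc.documentElement.getAttribute("attr")

# Serialize, then read the serialized text back in.
serialized = doc.toxml()
reparsed = minidom.parseString(serialized).documentElement.getAttribute("attr")

print(reparsed == original)
print("1 -->" + repr(in_memory))
print("2 -->" + repr(reparsed))
```

If the serializer writes the newline literally, the second value comes back with a space in place of the line break, which is the data loss described above.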

--
