Re: Python script to optimize XML text
Gabriel Genellina wrote:
> On Mon, 24 Sep 2007 17:36:05 -0300, Robert Dailey [EMAIL PROTECTED] wrote:
>> I'm currently seeking a Python script that provides a way of optimizing
>> out useless characters in an XML document to provide the optimal size
>> for the file. For example, assume the following XML script:
>>
>> <root> <Test></Test> <!-- <CommentedOutElement/> --> <!-- Do Something Else --> </root>
>>
>> By running this through an XML optimizer, the file would appear as:
>>
>> <root><Test/></root>
>
> ElementTree does that almost for free.

As the OP is currently using lxml.etree (and as this was a cross-post to c.l.py and lxml-dev), I already answered on the lxml list.

This is just to mention that the XMLParser of lxml.etree accepts keyword options to ignore plain whitespace content, comments and processing instructions, and that you can provide a DTD to tell it which whitespace-only content really is useless for your specific application.

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
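As a rough illustration of the parser options Stefan mentions, here is a minimal sketch. It assumes lxml is installed and uses the `remove_blank_text`, `remove_comments` and `remove_pis` keyword arguments of `lxml.etree.XMLParser`. The sample document is the one from the original post; the variable names are only illustrative, and the DTD-based variant is not shown.

```python
from lxml import etree

# Sample document from the original post.
source = b"<root> <Test></Test> <!-- <CommentedOutElement/> --> <!-- Do Something Else --> </root>"

# Ask the parser to drop whitespace-only text, comments and
# processing instructions while it builds the tree.
parser = etree.XMLParser(
    remove_blank_text=True,
    remove_comments=True,
    remove_pis=True,
)
root = etree.fromstring(source, parser)

# Serialize back to a string; empty elements come out in short form.
minimized = etree.tostring(root, encoding="unicode")
print(minimized)
```

Without a DTD, the parser has to guess which whitespace is ignorable, so documents with mixed content deserve a check before the result is trusted.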
Re: Python script to optimize XML text
Hey guys,

Thanks for everyone's input. I wanted to learn regular expressions; however, I'm finding them to be quite evil. I think I've learned that it's always a good idea to make regex a very LAST resort. That's the opinion I'm developing, anyway.

In any case, I like the ideas mentioned here about using the XML parser to do the job for me. Thanks again everyone, I think I'll be going with the XML parser to do what I need.

Have a good day everyone.
Re: Python script to optimize XML text
On Tue, 25 Sep 2007 11:43:03 -0300, Robert Dailey [EMAIL PROTECTED] wrote:
> Thanks for everyone's input. I wanted to learn regular expressions;
> however, I'm finding them to be quite evil. I think I've learned that
> it's always a good idea to make regex a very LAST resort. That's the
> opinion I'm developing, anyway. In any case, I like the ideas mentioned
> here about using the XML parser to do the job for me. Thanks again
> everyone, I think I'll be going with the XML parser to do what I need.

So you have arrived by yourself at the truth of this famous quote:

    Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. (Jamie Zawinski)

--
Gabriel Genellina
Python script to optimize XML text
Hi,

I'm currently seeking a Python script that provides a way of optimizing out useless characters in an XML document to provide the optimal size for the file. For example, assume the following XML script:

<root> <Test></Test> <!-- <CommentedOutElement/> --> <!-- Do Something Else --> </root>

By running this through an XML optimizer, the file would appear as:

<root><Test/></root>

Note that the following were changed:
- All comments were stripped from the XML
- All spaces, tabs, carriage returns, and other forms of unimportant whitespace were removed
- Elements that contain no text or children and are in the form of <Empty></Empty> use the short-hand method for ending an element body: <Empty/>

Does anyone know of a tool or Python script that can perform optimizations like those explained above? I realize I could probably do this with regular expressions in Python; however, I was hoping someone had already done this work.
Re: [lxml-dev] Python script to optimize XML text
If your XML is well-formed, an XSLT transform is probably your best choice. I believe even the most trivial 'pass through' example might produce the output you expect here.

--
Sidnei da Silva
Enfold Systems
http://enfoldsystems.com
Fax +1 832 201 8856
Office +1 713 942 2377 Ext 214
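For reference, a sketch of what such a 'pass through' stylesheet might look like when run from Python. It assumes lxml is installed and uses its `etree.XSLT` class. As far as I can tell, a bare identity template would also copy comments and whitespace-only text, so this version adds `xsl:strip-space` and an empty template for comments and processing instructions. Those two additions are my assumption about what the suggestion needs, not something stated in the message above.

```python
from lxml import etree

# Identity ("pass through") stylesheet with two additions:
# strip whitespace-only text nodes, and drop comments and PIs.
STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="comment()|processing-instruction()"/>
</xsl:stylesheet>
"""

# Sample document from the original post.
source = b"<root> <Test></Test> <!-- <CommentedOutElement/> --> <!-- Do Something Else --> </root>"

transform = etree.XSLT(etree.fromstring(STYLESHEET))
result = transform(etree.fromstring(source))

# The serialized result may end with a newline, so strip it before use.
minimized = str(result).strip()
print(minimized)
```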
Re: Python script to optimize XML text
On Mon, 24 Sep 2007 17:36:05 -0300, Robert Dailey [EMAIL PROTECTED] wrote:
> I'm currently seeking a Python script that provides a way of optimizing
> out useless characters in an XML document to provide the optimal size
> for the file. For example, assume the following XML script:
>
> <root> <Test></Test> <!-- <CommentedOutElement/> --> <!-- Do Something Else --> </root>
>
> By running this through an XML optimizer, the file would appear as:
>
> <root><Test/></root>

ElementTree does that almost for free. I've just posted an example.

    source = """<root> <Test></Test> <!-- <CommentedOutElement/> --> <!-- Do Something Else --> </root>"""

    import xml.etree.ElementTree as ET
    tree = ET.XML(source)  # the default parser already discards comments
    # encoding="unicode" makes tostring() return text rather than bytes
    print(ET.tostring(tree, encoding="unicode"))

Output:

    <root> <Test /> </root>

If you still want to remove all whitespace:

    def stripws(node):
        # Trim whitespace inside (text) and after (tail) each element
        if node.text:
            node.text = node.text.strip()
        if node.tail:
            node.tail = node.tail.strip()
        for child in node:  # recurse into child elements
            stripws(child)

    stripws(tree)
    print(ET.tostring(tree, encoding="unicode"))

Output:

    <root><Test /></root>

--
Gabriel Genellina
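One caveat with `stripws`: `strip()` also trims leading and trailing whitespace from text that has real content, which may matter for mixed content. A more conservative sketch is below. It uses a hypothetical helper named `drop_blank_text` and an extra `<Note>` element added to the sample purely for illustration, and it clears only the text and tail values that consist entirely of whitespace.

```python
import xml.etree.ElementTree as ET


def drop_blank_text(node):
    """Clear text/tail values that are whitespace only; keep all other text."""
    if node.text is not None and not node.text.strip():
        node.text = None
    if node.tail is not None and not node.tail.strip():
        node.tail = None
    for child in node:
        drop_blank_text(child)


# Hypothetical sample: <Note> carries text whose inner spacing should survive.
source = "<root> <Test></Test> <Note> keep  this </Note> <!-- comment --> </root>"

tree = ET.XML(source)  # the default parser discards the comment
drop_blank_text(tree)
minimized = ET.tostring(tree, encoding="unicode")
print(minimized)
```

Whether the whitespace around "keep  this" is significant depends on the application, which is the same point made earlier in the thread about using a DTD to decide.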