Re: Python script to optimize XML text

2007-09-25 Thread Stefan Behnel
Gabriel Genellina wrote:
 On Mon, 24 Sep 2007 17:36:05 -0300, Robert Dailey [EMAIL PROTECTED]
 wrote:
 
 I'm currently seeking a python script that provides a way of optimizing out
 useless characters in an XML document to provide the optimal size for the
 file. For example, assume the following XML script:

 <root>
 <Test></Test>
 <!-- <CommentedOutElement/> -->

 <!-- Do Something Else -->
 </root>

 By running this through an XML optimizer, the file would appear as:

 <root><Test/></root>
 
 ElementTree does that almost for free.

As the OP is currently using lxml.etree (and as this was a cross-post to
c.l.py and lxml-dev), I already answered on the lxml list.

This is just to mention that the XMLParser of lxml.etree accepts keyword
options to ignore plain whitespace content, comments and processing
instructions, and that you can provide a DTD to tell it which whitespace-only
content is really useless in the sense of your specific application.
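
A minimal sketch of those parser options, assuming lxml is installed. The
keyword names remove_blank_text, remove_comments and remove_pis are the
lxml.etree.XMLParser options I take this to refer to; the sample document is
the one from the original post, and the DTD part is left out.

```python
from lxml import etree

SOURCE = b"""<root>
<Test></Test>
<!-- <CommentedOutElement/> -->

<!-- Do Something Else -->
</root>"""

# Ask the parser to skip ignorable whitespace, comments and
# processing instructions while it builds the tree.
parser = etree.XMLParser(
    remove_blank_text=True,
    remove_comments=True,
    remove_pis=True,
)

root = etree.fromstring(SOURCE, parser)
compact = etree.tostring(root)  # serialized document, as bytes
print(compact.decode("ascii"))
```

Without a DTD, remove_blank_text relies on a heuristic for what counts as
ignorable whitespace, so documents with mixed content deserve a closer look.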

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Python script to optimize XML text

2007-09-25 Thread Robert Dailey
Hey guys,

Thanks for everyone's input. I wanted to learn regular expressions, however
I'm finding them to be quite evil. I think I've learned that it's always a
good idea to make regex a very LAST resort. This is my opinion I'm
developing on. In any case, I like the ideas mentioned here concerning using
the XML parser to do the job for me. Thanks again everyone, I think I'll be
going with the XML parser to do what I need.

Have a good day everyone.


Re: Python script to optimize XML text

2007-09-25 Thread Gabriel Genellina
On Tue, 25 Sep 2007 11:43:03 -0300, Robert Dailey [EMAIL PROTECTED]
wrote:

 Thanks for everyone's input. I wanted to learn regular expressions, however
 I'm finding them to be quite evil. I think I've learned that it's always a
 good idea to make regex a very LAST resort. This is my opinion I'm
 developing on. In any case, I like the ideas mentioned here concerning using
 the XML parser to do the job for me. Thanks again everyone, I think I'll be
 going with the XML parser to do what I need.

So you have arrived on your own at the truth of this famous quote:

Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems.
(Jamie Zawinski)

-- 
Gabriel Genellina


Python script to optimize XML text

2007-09-24 Thread Robert Dailey
Hi,

I'm currently seeking a python script that provides a way of optimizing out
useless characters in an XML document to provide the optimal size for the
file. For example, assume the following XML script:

<root>
<Test></Test>
<!-- <CommentedOutElement/> -->

<!-- Do Something Else -->
</root>

By running this through an XML optimizer, the file would appear as:

<root><Test/></root>

Note that the following changes were made:
- All comments were stripped from the XML
- All spaces, tabs, carriage returns, and other forms of unimportant
whitespace were removed
- Elements that contain no text or children and are written in the form
<Empty></Empty> use the short-hand method for ending an element body:
<Empty/>

Does anyone know of a tool or python script that can perform optimizations
like those described above? I realize I could probably do this with regular
expressions in python, however I was hoping someone had already done this work.

Re: [lxml-dev] Python script to optimize XML text

2007-09-24 Thread Sidnei da Silva
If your XML is well-formed, an XSLT stylesheet is probably your best choice. I
believe even the most trivial 'pass through' example might produce the
output you expect here.
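
A rough sketch of that idea, run from Python through lxml's XSLT support
(assuming lxml is installed). A bare identity transform copies comments and
whitespace along with everything else, so this version adds xsl:strip-space
and an empty template for comments and processing instructions. The
stylesheet is an illustration written for this thread, not an existing tool.

```python
from lxml import etree

STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>
  <!-- Discard whitespace-only text nodes in every element. -->
  <xsl:strip-space elements="*"/>
  <!-- Identity template: copy attributes and nodes unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Higher-priority empty template: drop comments and PIs. -->
  <xsl:template match="comment()|processing-instruction()" priority="1"/>
</xsl:stylesheet>"""

SOURCE = b"""<root>
<Test></Test>
<!-- <CommentedOutElement/> -->

<!-- Do Something Else -->
</root>"""

transform = etree.XSLT(etree.fromstring(STYLESHEET))
result = transform(etree.fromstring(SOURCE))
compact = str(result).strip()  # serialized according to xsl:output
print(compact)
```

The explicit priority on the empty template avoids relying on how the
processor resolves two matching templates of equal default priority.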

-- 
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214


Re: Python script to optimize XML text

2007-09-24 Thread Gabriel Genellina
On Mon, 24 Sep 2007 17:36:05 -0300, Robert Dailey [EMAIL PROTECTED]
wrote:

 I'm currently seeking a python script that provides a way of optimizing out
 useless characters in an XML document to provide the optimal size for the
 file. For example, assume the following XML script:

 <root>
 <Test></Test>
 <!-- <CommentedOutElement/> -->

 <!-- Do Something Else -->
 </root>

 By running this through an XML optimizer, the file would appear as:

 <root><Test/></root>

ElementTree does that almost for free. I've just posted an example.

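import xml.etree.ElementTree as ET

source = """<root>
  <Test></Test>
  <!-- <CommentedOutElement/> -->

  <!-- Do Something Else -->
</root>"""

# The default parser drops comments while building the tree.
tree = ET.XML(source)
print(ET.tostring(tree, encoding="unicode"))

Output:
<root>
  <Test />



</root>

If you still want to remove all whitespace:

def stripws(node):
    # Trim leading and trailing whitespace from the element's text and tail.
    if node.text:
        node.text = node.text.strip()
    if node.tail:
        node.tail = node.tail.strip()
    for child in node:  # getchildren() is gone in current Python versions
        stripws(child)

stripws(tree)
print(ET.tostring(tree, encoding="unicode"))

Output:
<root><Test /></root>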
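
One caveat with calling strip() on every text and tail: it also trims
whitespace around real text content. A possible variation, sketched here
under the assumption that only whitespace-only text is "useless", clears just
those nodes and leaves everything else alone. The helper name
strip_blank_text and the <Name> element are made up for the illustration.

```python
import xml.etree.ElementTree as ET

def strip_blank_text(node):
    """Drop whitespace-only text and tail; leave other text untouched."""
    if node.text is not None and not node.text.strip():
        node.text = None
    if node.tail is not None and not node.tail.strip():
        node.tail = None
    for child in node:
        strip_blank_text(child)

source = """<root>
  <Test></Test>
  <!-- <CommentedOutElement/> -->
  <Name> Ada Lovelace </Name>
</root>"""

tree = ET.XML(source)  # the default parser discards comments
strip_blank_text(tree)
compact = ET.tostring(tree, encoding="unicode")
print(compact)
# -> <root><Test /><Name> Ada Lovelace </Name></root>
```

The spaces inside <Name> survive, which stripws() above would have removed.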

-- 
Gabriel Genellina
