from:"uche . ogbuji"

Re: lxml/ElementTree and .tail

2006-11-19 Thread Uche Ogbuji

Paul McGuire wrote:
> Thankfully, I'm largely on the periphery of that universe (except for being
> a sometimes victim).  But it is certainly frustrating to see many of the OMG
> concepts of the 90's reimplemented in Java services, and then again in
> XML/SOAP, with no detectable awareness that these messaging and
> serialization problems have been considered before, and much more
> thoroughly.

You'll be surprised at how many XMLers agree that Web services are a
pretty inept reinvention of CORBA.  I was pretty much slain by this
take:

http://wanderingbarque.com/nonintersecting/2006/11/15/the-s-stands-for-simple

I think Duncan Grisby of OmniORB put it most succinctly when he pointed
out that SOAP and friends are more complex, more bloated, and less
interoperable than CORBA ever was.  But they use XML so they get the
teacher's pet treatment.

> I liked XML when I could read it and hack it out in Notepad.

You still can, and don't let anyone tell you otherwise.  I've always
argued that XML doesn't work unless it's Notepad-hackable.  I do
usually allow an exception for SVG.

> I like
> attributes, which puts me on the outs with most XML zealots who forswear the
> use of attributes on purely academic grounds (they defeat the future
> possible expansion of an attribute's value into more complex substructure).

Really?  Do you have any references for this?  I haven't seen much
criticism of attributes since the very early days, and almost all XML
technologies make heavy use of attributes.  Here's my take:

http://www.ibm.com/developerworks/xml/library/x-eleatt.html

As you can see, elements and attributes get equal billing.

> I dislike namespaces, especially the default xmlns kind, as they make me
> take extra steps when retrieving nodes via Xpaths; and everyone seems to
> think their application needs namespaces, when there is no threat that these
> tags will ever get mixed up with anyone else's.

Namespaces are possibly the worst thing to have ever happened to XML.
Again, my take:

http://www.ibm.com/developerworks/xml/library/x-namcar.html

And yes, default namespaces are about 50% of the problem with
namespaces.  QNames in content (which are of course an abuse of
namespaces) are almost all of the other 50%.  I call them "hidden
namespaces":

http://copia.ogbuji.net/blog/2006-08-14/Some_thoug
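
To make the "extra steps" complaint concrete, here is a minimal sketch
using the standard library's ElementTree rather than any tool named in
this thread.  The document, the namespace URI and the "o" prefix are all
made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that declares a default namespace.
doc = ET.fromstring(
    '<order xmlns="http://example.com/orders"><item>widget</item></order>'
)

# The obvious path finds nothing, because "item" names an element in
# no namespace at all.
plain = doc.find("item")

# The extra step: bind a prefix of your own choosing to the namespace
# and use it in the path.
prefixes = {"o": "http://example.com/orders"}
qualified = doc.find("o:item", prefixes)

print(plain)
print(qualified.text)
```

The prefix used in the query need not match anything in the document;
only the namespace URI has to agree.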

--
Uche Ogbuji   Fourthought, Inc.
http://uche.ogbuji.net    http://fourthought.com
http://copia.ogbuji.net   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: lxml/ElementTree and .tail

2006-11-19 Thread Uche Ogbuji

Fredrik Lundh wrote:
> Uche Ogbuji wrote:
>
> > I certainly have never liked the aspects of the ElementTree API under
> > present discussion.  But that's not as important as the fact that I
> > think the above statement is misleading.  There has always been a
> > battle in XML between the people who think the serialization is
> > preeminent, and those who believe some data model is preeminent, but
> > the reality is that XML 1.0 (and 1.1) is a spec *defined* by its
> > serialization.
>
> sure, the computing world is and has always been full of people who want
> the simplest thing to look a lot harder than it actually is.  after all,
> *they* spent lots of time reading all the specifications, they've bought
> all the books, and went to all the seminars, so it's simply not fair
> when others are cheating.

You sound bitter about something.  Don't worry, it's really not all
that serious.

> in reality, *all* interchange formats are easier to understand and use
> if you focus on a (complete or intentionally simplified) data model of
> the things being interchanged, and treat various artifacts of the
> byte-stream used by the wire format as artifacts, historical accidents
> based on what specification happened to be written before the other, or
> what some guy did or did not do in the seventies, as accidents, and
> esoteric arcana disseminated on limited-distribution mailing lists as
> about as relevant for your customer as last week's episode of American Idol.

The fact that the XML Infoset is hardly used outside W3C XML Schema,
and that the XPath data model is far more common, and that focus on the
serialization is even more common than that is a matter of everyday
practicality.

And oh by the way, this thread is all about *your* customer's
complaining.  And your response is to give them your philosophical take
on XML.  Doesn't that contradict what you're saying above?

Oh never mind.  You posted something misleading, and I posted another
point of view.  I know you're incapable of any disagreement that
doesn't devolve into a full-scale flame-war.  Sometimes I have time for
that sort of thing.  This is not one of those times, so this is
probably where I get off.


Re: newbie: minidom

2006-11-17 Thread Uche Ogbuji

Paul Watson wrote:
> Explicit [XML declaration] is better than implicit.

Yes indeed.

"Always use an XML declaration"
http://www-128.ibm.com/developerworks/xml/library/x-tipdecl.html
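
As a small illustration (a sketch with the standard library's
ElementTree, not something taken from the article above), asking the
serializer for the declaration explicitly looks like this.  The element
name and text are invented.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-element document.
root = ET.Element("greeting")
root.text = "Hello, world"

# Request the XML declaration explicitly instead of relying on the
# serializer's default behavior.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
serialized = buffer.getvalue()

print(serialized.decode("utf-8"))
```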


Re: lxml/ElementTree and .tail

2006-11-17 Thread Uche Ogbuji

Fredrik Lundh wrote:
> Chas Emerick wrote:
> > If I'm wrong, just chalk it up to the fact that this is the first
> > time I've ever looked at the Infoset spec, and I'm simply confused.
>
> the Infoset spec *is* the essence of XML; if you don't realize that an
> XML document is just a serialization of a very simple data model, you're
> bound to be fighting with XML all the time.

I certainly have never liked the aspects of the ElementTree API under
present discussion.  But that's not as important as the fact that I
think the above statement is misleading.  There has always been a
battle in XML between the people who think the serialization is
preeminent, and those who believe some data model is preeminent, but
the reality is that XML 1.0 (and 1.1) is a spec *defined* by its
serialization.  Infoset is a secondary and optional spec.  In fact, I
think it's clear that Infoset is not even the preeminent *data model*
of the XML world.  That distinction goes to the XPath data model, which
is quite different from the Infoset.


Re: WSGI - How Does It Affect Me?

2006-10-10 Thread uche . ogbuji

goon wrote:
> > Trying to research this on the web now
>
> Lots of articles now appearing summarising WSGI ...
>
> For definitive reference:
>
> <http://www.python.org/dev/peps/pep-0333/> [0]
>
> Overview:
>
> <http://www.xml.com/lpt/a/1674>  [1] and
> <http://www.xml.com/lpt/a/1675>  [2]

And also the following article, by me, focusing on middleware:

http://www.ibm.com/developerworks/library/wa-wsgi/
(cover Weblog entry: http://copia.ogbuji.net/blog/2006-08-23/_Mix_and_m
)


Re: Trying to find a elements Xpath and store it as a attribute

2006-10-06 Thread uche . ogbuji

provowallis wrote:
> Hi all,
>
> I've been struggling with this for a while so I'm hoping that someone
> could point me in the right direction. Here's my problem: I'm trying to
> get the XPath for a given node in my document and then store that XPath
> as an attribute of the element itself. If anyone has a recommendation
> I'd be happy to hear it.


Sorry.  I only check c.l.py once a week or so...

> For instance, I would take this XML
>
> ###before
>
> 
> 
> An XSLT Programmer
> Hello, World!
> 
>
> ###after
>
> 
> 
> An XSLT Programmer
> Hello, World!
> 
>
> ###
>
> import sets
> import amara
> from amara import binderytools
>
> doc = amara.parse('hello.xml')
> elems = {}
>
> for e in doc.xml_xpath('//*'):
>
>  paths = elems.setdefault((e.namespaceURI, e.localName),
> sets.Set())
>  path = u'/'.join([n.nodeName for n in
> e.xml_xpath(u'ancestor::*')])
>  paths.add(u'/' + path)
>
> for name in elems:
>
>  doc.name.km = elems[name]

It's a tougher problem than you may think :-)

Luckily it's a problem I've worked on.  For discussion see:

http://www.xml.com/pub/a/2004/11/24/py-xml.html

For an updated solution see abs_path in Amara domtools.  In most cases
you can safely call that on an Amara bindery node.
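
To give a feel for the basic idea, here is a rough sketch using only
the standard library's minidom.  It is not Amara's abs_path, and it
sidesteps the hard parts (namespaces in particular) by using raw node
names.  The sample document and the "xpath" attribute name are made up.

```python
from xml.dom import minidom

def abs_path(elem):
    """Build a positional XPath such as /memo[1]/para[2] for a DOM element."""
    steps = []
    node = elem
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        # Count preceding siblings with the same name to get the
        # 1-based position of this element among them.
        position = 1
        sibling = node.previousSibling
        while sibling is not None:
            if (sibling.nodeType == node.ELEMENT_NODE
                    and sibling.nodeName == node.nodeName):
                position += 1
            sibling = sibling.previousSibling
        steps.append("%s[%d]" % (node.nodeName, position))
        node = node.parentNode
    return "/" + "/".join(reversed(steps))

# Hypothetical input document.
doc = minidom.parseString(
    "<memo><title>Hello, World!</title><para>one</para><para>two</para></memo>"
)

# Store each element's path on the element itself.
for elem in doc.getElementsByTagName("*"):
    elem.setAttribute("xpath", abs_path(elem))

print(doc.documentElement.toxml())
```

Because it uses prefixed node names, the paths only evaluate correctly
where the same prefixes are in scope, which is part of why the general
problem is tougher than it looks.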



Re: File I/O

2006-10-06 Thread uche . ogbuji

Ant wrote:
> Kirt wrote:
> ...
> > i dont wanna parse the xml file..
> >
> > Just open the file as:
> >
> > f=open('test.xml','a')
> >
> > and append a line "abc" before  tag 
>
> The other guys are right - you should look at something like
> ElementTree which makes this sort of thing pretty easy, and is robust.
> But if you are sure that:
>
> 1)  is going to be on its own line (rather than something like
> )
> 2) the ending tag will *definitely* have no extraneous whitespace (e.g.
> < / Top >)
>
> then the following will probably be the simplest solution:
>
> f=open('test.xml')
> out = []
> for line in f:
> if "" in line:
> out.append("abc")
> out.append(line")
>
> f.close()
>
> f_out = open("test.xml", "w")
> f.write("".join(out))
> f_out.close()

And the most dangerous solution.  Start with the line
"out.append(line")"

And have a look at the many failure possibilities I detail here:

http://www.xml.com/pub/a/2002/11/13/py-xml.html

Then add to that the fact that "</Top>" can legitimately appear in an
XML comment, so that logic is even more brittle.

The following code does this *safely* with Amara:

import amara
doc = amara.parse('test.xml')
top = doc.xml_xpath('//Top')[0]
top.xml_parent.xml_insert_before(top, doc.xml_create_element(u'Body2',
content=u'abc'))
top.xml(stream=open('test.xml', 'w'))

Amara: http://uche.ogbuji.net/tech/4suite/amara/
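
For comparison, a sketch of the same parse-modify-serialize idea using
only the standard library's ElementTree.  This is not the Amara code
above: it appends the new element as the last child of Top, so it ends
up just before the closing tag.  The sample document and the Body1
element are made up, and note that ElementTree's default parser drops
comments.

```python
import xml.etree.ElementTree as ET

def append_child(xml_text, parent_tag, child_tag, text):
    """Parse xml_text, append <child_tag>text</child_tag> to the first
    parent_tag element, and return the serialized result."""
    root = ET.fromstring(xml_text)
    parent = root if root.tag == parent_tag else root.find(".//" + parent_tag)
    if parent is None:
        raise ValueError("no <%s> element found" % parent_tag)
    child = ET.SubElement(parent, child_tag)
    child.text = text
    return ET.tostring(root, encoding="unicode")

# Hypothetical input; the comment would fool the line-matching approach.
sample = "<Top><!-- </Top> inside a comment --><Body1>xyz</Body1></Top>"
result = append_child(sample, "Top", "Body2", "abc")
print(result)
```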



Re: PyXML not supported, what to use next?

2006-10-06 Thread uche . ogbuji

Paul Watson wrote:
> It would appear that xml.dom.minidom or xml.sax.* might be the best
> thing to use since PyXML is going without support.  Best of all it is
> included in the base Python distribution, so no addition hunting required.

FWIW, easy_install [1] is making it so that installing third-party
packages is less and less of an additional burden.  I'll admit that I've
hardly found easy_install to be problem-free, but since it seems to be
the wave of the future (and a welcome wave at that) I've pushed for
support in recent versions of the XML tools I co-develop: 4Suite [2]
and Amara [3].  For many people these are now very easy to install.
This is the case for some other third-party XML tools as well.

[1] http://peak.telecommunity.com/DevCenter/EasyInstall
[2] http://4suite.org/
[3] http://uche.ogbuji.net/tech/4suite/amara/


Re: XSLT speed comparisons

2006-09-29 Thread uche . ogbuji

[EMAIL PROTECTED] wrote:
> For what it's worth I just developed, and switched to WSGI middleware
> that only does the transform on the server side if the client doesn't
> understand XSLT.  It's called applyxslt and is part of wsgi.xml [1].
> That reduces server load, and with caching (via Myghty), there's really
> no issue for me.  For more on WSGI middleware see [2].
>
> [1] http://uche.ogbuji.net/tech/4suite/wsgixml/
> [2] http://www.ibm.com/developerworks/library/wa-wsgi/

I just wanted to clarify that not only does the applyxslt middleware
approach reduce server load, but in the case of clients running IE6 or
IE7, the XSLT *does* end up being executed in MSXML after all: MSXML on
the client's browser, rather than on the server.  In the case of
Mozilla it's Transformiix, which is between MSXML and 4Suite in
performance.  Not sure what's the XSLT processor in the case of Safari
(only the most recent versions of Safari).  But regardless, with that
coverage you can write apps using XSLT, support the entire spectrum of
browsers (and mobile apps, spiders, etc.) and yet rarely ever require
XSLT applied on the server side.
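
The general shape of that idea, as a rough sketch only: this is not
the actual applyxslt code from wsgi.xml.  The User-Agent substrings are
a crude assumption, and the transform is a caller-supplied function
standing in for a real XSLT processor.

```python
def xslt_fallback(app, transform, capable_agents=("MSIE", "Gecko")):
    """Wrap a WSGI app that serves XML carrying a stylesheet reference.

    transform: hypothetical bytes -> bytes function that applies the
    stylesheet on the server.  capable_agents: assumed User-Agent
    substrings for browsers that can run the XSLT themselves.
    """
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if any(token in agent for token in capable_agents):
            # The browser applies the stylesheet; pass the XML through.
            return app(environ, start_response)

        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold on to the status and headers until the body is known.
            # (The WSGI write() callable is not supported in this sketch.)
            captured["status"] = status
            captured["headers"] = headers

        body = b"".join(app(environ, capture))
        lowered = {k.lower(): v for k, v in captured["headers"]}
        content_type = lowered.get("content-type", "")
        if "xml" in content_type:
            body = transform(body)
            content_type = "text/html"
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() not in ("content-type", "content-length")]
        headers += [("Content-Type", content_type),
                    ("Content-Length", str(len(body)))]
        start_response(captured["status"], headers)
        return [body]

    return middleware


# Stand-ins for a real application and a real XSLT processor.
def xml_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/xml")])
    return [b"<doc/>"]

def fake_transform(xml_bytes):
    return b"<html><body>transformed</body></html>"

wrapped = xslt_fallback(xml_app, fake_transform)
```

A real deployment would also cache the transformed output, as
mentioned in the quoted message.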


Re: XSLT speed comparisons

2006-09-29 Thread uche . ogbuji

Ross Ridge wrote:
> Damian wrote:
> It could just be that 4suite is slower than MSXML.  If so, you can use
> MSXML in Python if you want.  You'll need to install the Python for
> Windows extensions.  Something like this:
>
>   from os import environ
>   import win32com.client
>
>   def buildPage():

[SNIP]

Added to:

http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/python-xslt


Re: XSLT speed comparisons

2006-09-29 Thread uche . ogbuji

Damian wrote:
> Hi, I'm from an ASP.NET background and am considering making the switch
> to Python. I decided to develop my next project in tandem to test the
> waters and everything is working well, loving the language, etc.
>
> What I've got is:
> two websites, one in ASP.NET v2 and one in Python 2.5 (using 4suite for
> XML/XSLT)
> both on the same box (Windows Server 2003)
> both using the same XML, XSLT, CSS
>
> The problem is, the Python version is (at a guess) about three times
> slower than the ASP one. I'm very new to the language and it's likely

The ASP one being MSXML, right?  In that case that result doesn't
surprise me.

> that I'm doing something wrong here:

Not wrong, but we can definitely simplify your use of the API:

> from os import environ
> from Ft.Lib.Uri import OsPathToUri
> from Ft.Xml import InputSource
> from Ft.Xml.Xslt import Processor
>
> def buildPage():
>     try:
>         xsluri = OsPathToUri('xsl/plainpage.xsl')
>         xmluri = OsPathToUri('website.xml')
>
>         xsl = InputSource.DefaultFactory.fromUri(xsluri)
>         xml = InputSource.DefaultFactory.fromUri(xmluri)
>
>         proc = Processor.Processor()
>         proc.appendStylesheet(xsl)
>
>         params = {"url":environ['QUERY_STRING'].split("=")[1]}
>         for i, v in enumerate(environ['QUERY_STRING'].split("/")[1:]):
>             params["selected_section%s" % (i + 1)] = "/" + v
>
>         return proc.run(xml, topLevelParams=params)
>     except:
>         return "Error blah blah"
>
> print "Content-Type: text/html\n\n"
> print buildPage()

This should work:

from os import environ
from Ft.Xml.Xslt import Transform

def buildPage():
    try:
        params = {"url":environ['QUERY_STRING'].split("=")[1]}
        for i, v in enumerate(environ['QUERY_STRING'].split("/")[1:]):
            params["selected_section%s" % (i + 1)] = "/" + v

        return Transform('website.xml', 'xsl/plainpage.xsl',
                         topLevelParams=params)
    except:
        return "Error blah blah"

print "Content-Type: text/html\n\n"
print buildPage()

-- % --

For what it's worth I just developed, and switched to WSGI middleware
that only does the transform on the server side if the client doesn't
understand XSLT.  It's called applyxslt and is part of wsgi.xml [1].
That reduces server load, and with caching (via Myghty), there's really
no issue for me.  For more on WSGI middleware see [2].

[1] http://uche.ogbuji.net/tech/4suite/wsgixml/
[2] http://www.ibm.com/developerworks/library/wa-wsgi/


Re: XML parsing and writing

2006-08-28 Thread uche . ogbuji

c00i90wn wrote:
> Nice package ElementTree is but sadly it doesn't have a pretty print,
> well, guess I'll have to do it myself, if you have one already can you
> please give it to me? thanks :)

FWIW Amara and plain old 4Suite both support pretty-print, canonical
XML print and more such options.

http://uche.ogbuji.net/tech/4suite/amara/
http://4Suite.org


Re: Need help in xml

2006-07-10 Thread uche . ogbuji

Kirt wrote:
> i have two xml documemts of type
>
> 
>  test
>  2006-12-12
>  12:12:12
>  
>   /home/
>   
> test2
> 12:12:12
>
>   
>  
>   /home/test
>
> test3
> 12:12:12
>
> 
>  
>
> i have to compare 2 similar xml document and get the add, changed and
> deleted files.and write it into acd.xml file.
> can u help me with the python code for this. I am using SAX.

Use the right tool and such problems tend to become much simpler.

http://www.logilab.org/projects/xmldiff


Re: Amara: Where's my attribute?

2006-07-07 Thread uche . ogbuji

AdSR wrote:
> [EMAIL PROTECTED] wrote:
> > What is the actual problem you're trying to solve?  If you just want to
> > force a namespace declaration in output (this is usually to support
> > QNames in content) the most well-known XML hack is to create a dummy
> > attribute with the needed prefix and namespace.  But this does not work
> > when you're trying to force a default namespace declaration.  Then
> > again, you generally can't use QNames in content with a default
> > namespace declaration.  So my guess is that you somehow got way off the
> > rails in your problem-solving, and you'll need to provide more
> > background if you want help.
>
> I wanted to remove documentation elements from some XML Schema files.
> The problem showed when I tried to use the stripped schemas, because
> the namespace declaration for user-defined types was missing. Of
> course, since these types are named and referred to in attribute
> *values*, Amara had no way to know that the namespace declaration was
> still needed (didn't matter if default or non-default). This is more a
> problem of how XML Schema is defined against XML namespace rules, since
> XML Schema uses namespaces in a context of which XML parsers aren't
> normally aware.

Yeah.  Just so you know.  This is one of those things about XML that
make sane people want to dye their eyeballs red.

Unfortunately there isn't much recourse but to switch to the
namespace-qualified form for your QNames and add dummy attributes so
the namespace is recognized.  Let me know if you need an example.
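
A minimal sketch of the dummy-attribute hack, shown with the standard
library's ElementTree rather than Amara, so the exact calls differ
from what you'd write with the bindery.  The "element" name, the
"userType" QName and the "dummy" attribute are made up; the namespace
URI is the one from the earlier example in this thread.

```python
import xml.etree.ElementTree as ET

PQ_NS = "http://pq.com/ns2"
# Ask the serializer to use the "pq" prefix for this namespace.
ET.register_namespace("pq", PQ_NS)

# A QName hidden in an attribute *value*: the serializer cannot know
# that the "pq" prefix is in use here.
elem = ET.Element("element", {"type": "pq:userType"})
without_hack = ET.tostring(elem, encoding="unicode")

# The hack: a throwaway attribute in the needed namespace forces the
# serializer to emit the xmlns:pq declaration.
elem.set("{%s}dummy" % PQ_NS, "")
with_hack = ET.tostring(elem, encoding="unicode")

print(without_hack)
print(with_hack)
```

As noted earlier in the thread, this trick cannot force a default
namespace declaration, since attributes never take the default
namespace.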


> > BTW, I recommend upgrading to Amara 1.1.7.  That branch will soon be
> > 1.2, and I consider it more mature than 1.0 at this point.  The API's
> > also easier:
>
> I know, especially the insert-before/after feature :) But I ran into a
> problem that I describe below and you advertised 1.0 as "stable
> version", so I switched immediately.
>
> The problem can be reproduced like this:
>
> >>> import amara
> >>> amara.parse('http://www.w3.org/2001/XMLSchema.xsd')
> START DTD xs:schema -//W3C//DTD XMLSCHEMA 200102//EN XMLSchema.dtd
> http://www.w3.org/2001/datatypes.dtd:99:23: Attribute 'id' already
> declared
> http://www.w3.org/2001/datatypes.dtd:122:23: Attribute 'id' already
> declared
> http://www.w3.org/2001/datatypes.dtd:130:27: Attribute 'id' already
> declared
> ...some 40 more lines like this and then Python crashes (Windows shows
> the bug-reporting dialog)


I don't get a crash on my system (Ubuntu), but I do get a legitimate
error message because that DTD is broken.  The W3C seems to like
disseminating broken DTDs.  Just yesterday I was helping someone around
the infamous broken XHTML 1.1 DTDs.

I do want to know why you're getting a crash rather than just the error
message.  What version of Python is that?  Any chance you can try with
current CVS Amara (you can use easy_install)?  This part of the
discussion should perhaps move to the 4Suite mailing list.  I only
check this NG once a week.


Re: Amara: Where's my attribute?

2006-07-02 Thread uche . ogbuji

AdSR wrote:
> Hi,
>
> I'm having a problem with the Amara toolkit. Try this:
>
> >>> from amara import binderytools
> >>> raw = '<test xmlns="http://example.com/namespace" xmlns:pq="http://pq.com/ns2"/>'
> >>> rwd = binderytools.bind_string(raw)
> >>> print rwd.xml()
> 
> <test xmlns:pq="http://pq.com/ns2"/>
>
> What happened to the xmlns attribute? Does anyone know a solution to
> this? The only workaround I found is to:
>
> >>> rwd.test.xml_set_attribute(u'xmlns', u'http://example.com/namespace')
> u'xmlns'
> >>> print rwd.xml()
> 
> <test xmlns:pq="http://pq.com/ns2" xmlns="http://example.com/namespace"/>
>
> but it only helps if you know what to patch.
>
> My setup:
>
> Python 2.4.3
> 4Suite 1.0b3
> Amara 1.0
>
> I see that people have reported similar problems with other XML
> toolkits, so I guess this is a general namespace ugliness.

What is the actual problem you're trying to solve?  If you just want to
force a namespace declaration in output (this is usually to support
QNames in content) the most well-known XML hack is to create a dummy
attribute with the needed prefix and namespace.  But this does not work
when you're trying to force a default namespace declaration.  Then
again, you generally can't use QNames in content with a default
namespace declaration.  So my guess is that you somehow got way off the
rails in your problem-solving, and you'll need to provide more
background if you want help.

BTW, I recommend upgrading to Amara 1.1.7.  That branch will soon be
1.2, and I consider it more mature than 1.0 at this point.  The API's
also easier:

>>> import amara
>>> rwd = amara.parse('<test xmlns="http://example.com/namespace" xmlns:pq="http://pq.com/ns2"/>')


Re: beautifulsoup .vs tidy

2006-07-02 Thread uche . ogbuji

bruce wrote:
> hi paddy...
>
> that's exactly what i'm trying to accomplish... i've used tidy, but it seems
> to still generate warnings...
>
>  initFile -> tidy ->cleanFile -> perl app (using xpath/libxml)
>
> the xpath/libxml functions in the perl app complain regarding the file. my
> thought is that tidy isn't cleaning enough, or that the perl xpath/libxml
> functions are too strict!
>
> which is why i decided to see if anyone on the python side has
> experienced/solved this problem..

FWIW here's my usual approach:

http://copia.ogbuji.net/blog/2005-07-22/Beyond_HTM

Personally, I avoid Tidy.  I've too often seen it crash or hang on
really bad HTML.  TagSoup seems to be built like a tank.  I've also
never seen BeautifulSoup choke, but I don't use it as much as TagSoup.


Re: xpath question

2006-07-02 Thread uche . ogbuji

bruce wrote:
> is there anyone with XPath expertise here? i'm trying to figure out if
> there's a way to use regex expressions with an xpath query? i've seen
> references to the ability to use regex and xpath/xml, but i'm not sure how
> to do it...
>
> i have a situation where i have something like:
>  /html/table//[EMAIL PROTECTED]'foo']
>
> is it possible to do soomething like [EMAIL PROTECTED]/fo/] so i'd match the 
> class
> attribute with fo
>
> i'm trying to parse HTML/Web docs...

4Suite [1] supports regex in XPath using the EXSLT community standard's
regex module [2].  It would be something like:

[re:match(@class, 'fo.*')]

With the re prefix set as required by the EXSLT module.

[1] http://4Suite.org
[2] http://www.exslt.org/regexp/
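
If no EXSLT-capable processor is at hand, a rough stand-in (not
regex-in-XPath, and not 4Suite) is to select candidate elements with
the standard library's ElementTree and filter them with the re module.
It only works on well-formed markup; the sample document and the
find_by_class helper are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

def find_by_class(root, tag, pattern):
    """Return the tag elements under root whose class attribute
    matches the regular expression from its start."""
    regex = re.compile(pattern)
    return [el for el in root.iter(tag) if regex.match(el.get("class", ""))]

# Hypothetical, well-formed input.
page = ET.fromstring(
    "<html><table><tr>"
    "<td class='foo'>a</td><td class='bar'>b</td><td class='fox'>c</td>"
    "</tr></table></html>"
)

matches = find_by_class(page, "td", r"fo")
print([el.text for el in matches])
```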


Re: 10GB XML Blows out Memory, Suggestions?

2006-06-11 Thread uche . ogbuji

K.S.Sreeram wrote:
> Fredrik Lundh wrote:
> > both ElementTree and cElementTree support "sax-style" event generation
> > (through XMLTreeBuilder/XMLParser) and incremental parsing (through
> > iterparse).  the cElementTree versions of these are even faster than
> > pyexpat.
> >
> > the iterparse interface is described here:
> >
> >  http://effbot.org/zone/element-iterparse.htm
> >
> Thats cool! Thanks for the info!
>
> For a multi-gigabyte file, I would still recommend C/C++, because the
> processing code which sits on top of the XML library needs to be Python,
> and that could turn out to be a significant overhead in such extreme cases.
>
> Of course, the exact strategy to follow would depend on the specifics of
> the case, and all this speculation may not really apply! :)

Honestly, I think that legitimate use-cases for multi-gigabyte XML are
very rare.  Many people abuse XML as some sort of DBMS replacement.
This abuse is part of the reason why so many developers are hostile to
XML.  XML is best for documents, and documents can get to the
multi-gigabyte range, but rarely do.  Usually, when they do, there is a
logical way to decompose them, process them, and re-compose them,
whereas with XML used as a DBMS replacement, relations and datatyping
complicate such natural divide-and-conquer techniques.

I always say that if you're dealing with gigabyte XML, it's well worth
considering whether you're not using a hammer to screw in a bolt.

If monster XML is inevitable, then I'll extend Fredrik's earlier mention
of Amara to say that Pushdom allows you to pre-declare the chunks of
XML you're interested in, and then it processes the XML in streaming
mode, only instantiating the chunks of interest one at a time.  This
allows for handling of huge files with a very simple programming idiom.

http://uche.ogbuji.net/tech/4suite/amara/
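
The same one-chunk-at-a-time idiom can be sketched with the standard
library's iterparse, which Fredrik mentioned in the quoted text.  This
is not Pushdom and does not use Amara's API; the iter_chunks helper and
the log/entry document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

def iter_chunks(source, tag):
    """Yield each complete tag element in turn, then discard its content
    so memory use stays roughly bounded by the largest chunk."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Runs once the consumer asks for the next chunk.  The
            # emptied elements still hang off the root; a sketch for
            # truly huge inputs would drop them as well.
            elem.clear()

# Hypothetical input; in practice source would be a file name or an
# open file for the multi-gigabyte document.
data = io.BytesIO(
    b"<log><entry id='1'>first</entry><entry id='2'>second</entry></log>"
)

ids = [entry.get("id") for entry in iter_chunks(data, "entry")]
print(ids)
```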


Re: how to print newline in xml?

2006-06-04 Thread uche . ogbuji

[EMAIL PROTECTED] wrote:
> I use Python/XML packages are xml.dom.minidom and xml.dom.ext (second
> just for PrettyPrint)

You don't need xml.dom.ext for prettyprint.  You can use
doc.toprettyxml()

I gather you want to tweak the prettyprinter to not add the newline
before the comment.  The only way to do this is to write your own
printing logic, which is really not that hard, if you just start by
copying the code from .writexml (used by .toprettyxml).

But there's an even easier (if slower) way: pretty print the document,
then parse it in again, remove the text node between the element in
question and the following comment, and then use .writexml() to
serialize it again.
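
A sketch of that second approach with minidom.  The sample document
and the helper names are made up, and the helper treats every comment
that directly follows an element the same way, which may be broader
than what you need.

```python
import io
from xml.dom import minidom

def walk(node):
    """Yield node and all of its descendants in document order."""
    yield node
    for child in list(node.childNodes):
        for descendant in walk(child):
            yield descendant

def pretty_with_inline_comments(xml_text):
    # Step 1: pretty print, then parse the result back in.
    pretty = minidom.parseString(xml_text).toprettyxml(indent="  ")
    doc = minidom.parseString(pretty)
    # Step 2: remove the whitespace-only text node sitting between an
    # element and the comment that follows it.
    comments = [n for n in walk(doc) if n.nodeType == n.COMMENT_NODE]
    for comment in comments:
        gap = comment.previousSibling
        if (gap is not None and gap.nodeType == gap.TEXT_NODE
                and not gap.data.strip()
                and gap.previousSibling is not None
                and gap.previousSibling.nodeType == gap.ELEMENT_NODE):
            gap.parentNode.removeChild(gap)
    # Step 3: serialize again with .writexml(), adding no new indentation.
    out = io.StringIO()
    doc.writexml(out)
    return out.getvalue()

# Hypothetical input document.
result = pretty_with_inline_comments(
    "<root><item>1</item><!-- note --><item>2</item></root>"
)
print(result)
```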

A few general notes:

* You cannot set the order of attributes in most XML tools, whether
Python or not.  This is unfortunate for people who would like to
preserve such details for usability reasons, but that's just the way
XML is.  The closest you can get is by using canonicalization [1],
which is available in PyXML as xml.dom.ext.c14n.  It just so happens
that canonical XML leaves the attributes in the order you want.  You
won't always be so lucky.

* You can always create text nodes by using doc.createTextNode.

* You can always remove text nodes (or any other kind) by using
.removeChild

* It's much easier to navigate if you use XPath.  PyXML has an
xml.xpath module you can use.

Good luck.

[1] http://www-128.ibm.com/developerworks/xml/library/x-c14n/


Re: Announcing atomfeed.py, xmlelements.py, and feedutils.py

2006-03-17 Thread uche . ogbuji


Steve R. Hastings wrote:
> I have written some Python library modules to help with creating Atom
> syndication feeds.  Originally, I had a single module called "PyAtom"; now
> I have split it up into three modules: xmlelements.py, atomfeed.py, and
> feedutils.py.

FWIW, see also Sylvain Hellegouarch's atomixlib [1].  It's used in
production to generate and manage PlanetAtom [2][3].

[1] http://trac.defuze.org/browser/oss/atomixlib
[2] http://planetatom.net/
[3] http://copia.ogbuji.net/blog/2006-01-25/Planet_Ato


Re: How to search XML? Are there special libs?

2006-03-17 Thread uche . ogbuji

Ravi Teja wrote:
> Yes! XPath is a good bet.
> You can also try some Pythonic XML libraries like Amara. You need not
> learn any special language even.
>
> There are good database approaches to XML too, especially if you are
> going to query a document collection as a whole rather than file by
> file. You can try XQuery. I think 4Suite can do this (But I am too
> sleepy to confirm :-) ). You also use eXist (Java but you can use
> XMLRPC or SOAP to interface with it from Python). Not optimal like
> parent said, but if it is XML that have to live with ...

4Suite does not support XQuery.  It does support full XPath plus EXSLT
and enough other extensions to come close to the power of XQuery.
Amara [1] makes it really easy to get XQuery-like power from right
within Python, as I've blogged before (e.g. [2][3]).

I don't know whether full-text indexing of XML is something the OP
needs as well.  If so, see [4].

[1] http://uche.ogbuji.net/tech/4Suite/amara/
[2] http://copia.ogbuji.net/blog/2005-06-12/Amara_equi
[3] http://copia.ogbuji.net/blog/2005/Sep/20
[4] http://www.xml.com/pub/a/2004/12/08/py-xml.html


Re: Python version of XMLUnit?

2006-03-07 Thread uche . ogbuji

Kent Johnson wrote:
> I have found XMLUnit to be very helpful for testing Java and Jython code
> that generates XML. At its heart XMLUnit is an XML-aware diff - it
> parses expected and actual XML and pinpoints any differences. It is
> smart enough to ignore things like attribute order, different quoting
> and escaping styles, and insignificant whitespace.
>
> Now I am working on a CPython project and have a similar need. Is there
> any comparable tool for Python? Basically I'm looking for a tool to
> compare XML and show diffs in an intelligible fashion that is usable
> from Python unit tests (using py.test, if it matters).

One possible approach is to use c14n to in effect normalize the XML so
that you can use regular text compare.  This is not as sophisticated as
a full XML diff, but it's definitely a viable approach for testing.

For those who might be interested in that approach, learn more about
c14n here:

http://www.ibm.com/developerworks/xml/library/x-c14n/

It includes a brief example using the c14n module in PyXML:

http://pyxml.sourceforge.net/

I also recently checked in c14n capability for 4Suite.  It offers the
same level of coverage as PyXML's, but operates in streaming, rather
than DOM mode.

http://4suite.org/

4Suite also contains in its test suite routines (TreeCompare) for
comparing XML and HTML while ignoring non-significant syntactic
variations.

Certainly full xmldiff is very useful.  One nice thing about Logilab's
app is that it can output XUpdate, which could be used with, say,
4Suite's 4XUpdate to apply a patch to another document.
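
For readers on a current Python 3 (3.8 or later), here is a rough sketch
of the c14n-then-compare idea using only the standard library.  It is an
illustration under assumptions, not the PyXML or 4Suite API: same_xml is
a hypothetical helper name, and strip_text=True treats whitespace around
text content as insignificant, which may not suit every vocabulary.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(expected, actual):
    """Compare two XML strings after C14N 2.0 canonicalization."""
    # strip_text=True drops whitespace around text content (an assumption
    # about what counts as insignificant for the documents under test).
    return (canonicalize(expected, strip_text=True)
            == canonicalize(actual, strip_text=True))


# Same document with different attribute order, quoting, escaping, indentation.
expected = '<doc b="2" a=\'1\'><item>x &amp; y</item></doc>'
actual = '<doc a="1" b="2">\n  <item>x &#38; y</item>\n</doc>'
# A genuinely different document (attribute value changed).
changed = '<doc a="1" b="3"><item>x &amp; y</item></doc>'

print(same_xml(expected, actual))
print(same_xml(expected, changed))
```

In a unit test the two canonical strings can also be handed to an
ordinary text diff so that a failure shows where the documents diverge.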

Re: XSLT and gettext?

2006-03-03 Thread uche . ogbuji

KW wrote:
> I'm looking for a nice way to do i18n with XSLT, preferably using the
> gettext framework. Currently I'm using 4Suite for XSLT processing. Do
> you know of any solutions to this problem?
>
> If no solutions currently exist, I'll try to write something myself. Any
> ideas on how to do this properly? Any existing python code to start with?
>
> I was thinking about wrappingg the text in a new XML tag, say  and
> processing this to generate an XSL for alle languages, but it will also
> require printf like substitution to do this properly.

4Suite has some friendly gettext-based i18n extensions.  See:

http://copia.ogbuji.net/blog/2005-06-14/i18n_for_X

Re: only a simple xml reader value

2006-02-11 Thread uche . ogbuji

[EMAIL PROTECTED] wrote:
> The only thing I must read is the response I get from a EPP server.
> A response like this:
>
> 
> http://www.eurid.eu/xml/epp/epp-1.0"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns:contact="http://www.eurid.eu/xml/epp/contact-1.0"
> xmlns:domain="http://www.eurid.eu/xml/epp/domain-1.0"
> xmlns:eurid="http://www.eurid.eu/xml/epp/eurid-1.0"
> xmlns:nsgroup="http://www.eurid.eu/xml/epp/nsgroup-1.0"
> xsi:schemaLocation="http://www.eurid.eu/xml/epp/epp-1.0 epp-1.0.xsd
> http://www.eurid.eu/xml/epp/contact-1.0 contact-1.0.xsd
> http://www.eurid.eu/xml/epp/domain-1.0 domain-1.0.xsd
> http://www.eurid.eu/xml/epp/eurid-1.0 eurid-1.0.xsd
> http://www.eurid.eu/xml/epp/nsgroup-1.0 nsgroup-1.0.xsd">
> 
> 
> Command completed successfully; ending session
> 
> 
> 
> c-and-a.eu
>  c-and-a_1
> 25651602
> 2005-11-08T14:51:08.929Z
> 
> 
> 
> 
> 

So to get the msg, you can do:

print doc.getElementsByTagName('msg')[0].toxml()

But to get the domain:name you have to use the declared namespace:

print doc.getElementsByTagNameNS(
    'http://www.eurid.eu/xml/epp/domain-1.0', 'name')[0].toxml()
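
The same two lookups can be sketched in Python 3 as follows.  The SAMPLE
document is a cut-down, hypothetical stand-in for the EPP response (the
tags of the original did not survive in the post above), so the element
layout is an assumption; only the domain namespace URI is taken from the
quoted message.

```python
from xml.dom import minidom

DOMAIN_NS = "http://www.eurid.eu/xml/epp/domain-1.0"

# Hypothetical, much reduced response; the real structure may differ.
SAMPLE = """<epp xmlns="http://www.eurid.eu/xml/epp/epp-1.0"
     xmlns:domain="http://www.eurid.eu/xml/epp/domain-1.0">
  <response>
    <result code="1000"><msg>Command completed successfully</msg></result>
    <resData><domain:name>example.eu</domain:name></resData>
  </response>
</epp>"""

doc = minidom.parseString(SAMPLE)

# Unprefixed element: a plain tag-name lookup is enough.
msg = doc.getElementsByTagName("msg")[0]
msg_text = msg.firstChild.data

# Prefixed element: look it up by namespace URI plus local name, so the
# code keeps working even if the server picks a different prefix.
name = doc.getElementsByTagNameNS(DOMAIN_NS, "name")[0]
name_text = name.firstChild.data

print(msg_text)
print(name_text)
```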

Or you can make life a bit easier with Amara [1]:

import amara
doc = amara.parse(theXML)
print doc.response.result.msg #to get the text content
print doc.response.result.msg.xml() #to get the XML source for that element
print doc.response.resData.appData.name
print doc.response.resData.appData.name.xml()

[1] http://uche.ogbuji.net/tech/4Suite/amara/

Re: only a simple xml reader value

2006-02-08 Thread uche . ogbuji

[EMAIL PROTECTED] wrote:
> H!,
>
> Is it possible to get a value value ?
>
> When I do this:
> -
> theXML = """
> The Fascist Menace
> """
> import xml.dom.minidom as dom
> doc = dom.parseString(theXML)
> print doc.getElementsByTagName('title')[0].toxml()
>
> I get : The Fascist Menace thats oke for me
> -
>
> But the xmlfile I must read have other tags:
> theXML = """
> The Fascist Menace
> bla la etc
> """
>
> how to get that values ?
> I try things like:
> print doc.getElementsByTagName('title:id')[0].toxml() <--error

Addressing your general question, unfortunately you're a bit stuck.
Minidom is rather confused about whether or not it's a namespace aware
library.  Addressing your specific example, I strongly advise you not
to use documents that are not well-formed according to Namespaces 1.0.
Your second example is a well-formed XML 1.0 external parsed entity,
but not a well-formed XML 1.0 document entity, because it has multiple
elements at document level.  It's also not well-formed according to
XMLNS 1.0 unless you declare the "title" prefix.  You will not be able
to use a non XMLNS 1.0 document with most XML technologies, including
XSLT, WXS, RELAX NG, etc.

If you have indeed declared a namespace and are just giving us a very
bad example, use:

print doc.getElementsByTagNameNS(title_namespace, 'id')
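
To make the well-formedness point concrete, here is a small Python 3
sketch.  TITLE_NS is a made-up namespace URI and both documents are
hypothetical; the point is only that minidom's namespace-aware parser
rejects an undeclared "title" prefix, while the declared version can be
queried with getElementsByTagNameNS.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

TITLE_NS = "http://example.com/ns/title"  # hypothetical namespace URI

# Single root element in both cases; only the prefix declaration differs.
undeclared = "<doc><title:id>42</title:id></doc>"
declared = '<doc xmlns:title="%s"><title:id>42</title:id></doc>' % TITLE_NS

try:
    minidom.parseString(undeclared)
    undeclared_ok = True
except ExpatError:
    # The parser reports the unbound "title" prefix.
    undeclared_ok = False

doc = minidom.parseString(declared)
ids = doc.getElementsByTagNameNS(TITLE_NS, "id")
id_text = ids[0].firstChild.data

print(undeclared_ok)
print(id_text)
```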

Re: Fromatting an xml file

2006-02-07 Thread uche . ogbuji

sir_alex wrote:
> Hi! I have a little problem writing xml files formatted in a way like
> the following:
>
> 
>  bla
>  bla
> 
>
> Every new node element should have a tabulation before it, but when I
> use xml.dom.minidom I use writexml, which considers as a new node also
> the text (in my little example, "bla" phrases), so the best result I
> achieved has been the following
>
> 
> 
>  bla
> 
> 
>
> but I don't want the text to be written on newlines... is there a good
> solution? Thanks!

That minidom behavior is fairly unsafe.  4Suite's PrettyPrinter is much
safer:

>>> from Ft.Xml import Parse
>>> from Ft.Xml.Domlette import PrettyPrint
>>> XML = "blabla"
>>> doc = Parse(XML)
>>> PrettyPrint(doc)


  bla
  bla

>>>

http://4Suite.org
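
For anyone without 4Suite, a comparable result can be sketched with the
standard library in Python 3.9 or later.  This is not the 4Suite API;
the element names (root, a, b) are invented because the tags in the
example above were lost, and ElementTree's indent() simply rewrites
whitespace between elements, so it is only appropriate where that
whitespace is not significant.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names standing in for the original example.
root = ET.fromstring("<root><a>bla</a><b>bla</b></root>")

# indent() adds newlines and two-space indentation between elements but
# leaves the text inside <a> and <b> on the same line as their tags.
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")

print(pretty)
```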

Re: XML Writer in wxPython

2006-02-07 Thread uche . ogbuji

Tim Roberts wrote re using print to generate XML:
> def PrintAddress( last, first, address, city, state, zip ):
> print "  "
> print "%s" % last
> print "%s" % first
> print "%s" % address
> print "%s" % city
> print "%s" % state
> print "%s" % zip
> print " "
>
> print ""
> for row in addressDatabase:
> PrintAddress( row.last, row.first,
>row.address, row.city, row.state, row.zip )
> print ""

Just be sure you're well aware of all the issues:

http://www.xml.com/pub/a/2002/11/13/py-xml.html

See also:

http://www.ibm.com/developerworks/xml/library/x-think35.html
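
One of those issues is escaping.  As a minimal Python 3 sketch (the
address_xml helper and its element names are hypothetical, not taken
from Tim's code), passing every value through xml.sax.saxutils.escape
keeps "&" and "<" in the data from breaking well-formedness:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def address_xml(last, first, city):
    """Build one <address> record, escaping markup characters in the data."""
    return ("<address><last>%s</last><first>%s</first><city>%s</city></address>"
            % (escape(last), escape(first), escape(city)))


record = address_xml("O'Brien & Sons", "A<B", "Boulder")

# Round-trip through a parser to confirm the output is well-formed and
# that the original values come back unchanged.
parsed = ET.fromstring(record)

print(record)
print(parsed.find("last").text)
```

Escaping is only part of the story; character encoding and characters
that are illegal in XML need separate care, as the articles discuss.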

Re: Large XML Document Processing

2006-02-07 Thread uche . ogbuji

Albert Leibbrandt wrote:
> Hi
>
> Just want to check which xml parser you guys have found to be the
> quickest. I have xml documents with 250 000 records or more and the
> processing of these documents are taking way to long. The validation is
> the main problem. Any module names, non validating would be find to,
> would help a lot.

It would help us help you if you posted samples of the target docs.
XML processing strategy often depends on the structure of the XML, just
as relational query optimization strategy often depends on the schema.
In general SAX or iterative tree-callback methods will give you the
best speed.  Fredrik already mentioned ElementTree's iterparse.
Amara's pushbind and pushdom and 4Suite's Saxlette (which has some neat
callback features) are other options.

http://uche.ogbuji.net/tech/4suite/amara/
http://4suite.org/docs/CoreManual.xml#saxlette
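
As a rough illustration of the iterative approach in Python 3, here is a
sketch with the standard library's iterparse.  The record/records
element names and the count_records helper are assumptions made up for
the example, since no sample of the real documents was posted.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Stream through `source`, clearing each <tag> element after use."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Real per-record processing would go here.
            elem.clear()  # release the record's children and text
    return count


# Hypothetical input: 1000 small records under one root element.
sample = io.BytesIO(
    b"<records>"
    + b"".join(b"<record><id>%d</id></record>" % i for i in range(1000))
    + b"</records>"
)
total = count_records(sample)

print(total)
```

Cleared elements stay attached to the root as empty shells, so for very
large inputs it may also be worth removing them from the parent.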

Re: XML SAX parser bug?

2006-02-07 Thread uche . ogbuji

[EMAIL PROTECTED] wrote:
> Fredrik Lundh schreef:
> > [EMAIL PROTECTED] wrote:
> > > I think I ran into a bug in the XML SAX parser.
> > >
> > > part of my program consist of reading a rather large XML file (about
> > > 10Mb) containing a few thousand elements.
> > > I have the following problem. Sometimes that SAX parses misreads a
> > > line.
> >
> > it's not a bug; the parser is free to split up character runs (due
> > to buffering, entities or character references, etc).  it's up to
> > you to merge character runs into strings.
>
> but how do I detect that the parser has split up the characters? I gues
> I need to detect it in order to reconstruct the complete string

Here's a recipe:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/265881

Using this filter you can then write SAX code that assumes normalized
text events.  Also, 4Suite's SAX implementation, Saxlette,
automatically does this text event merging for you at C speed:

http://4suite.org/docs/CoreManual.xml#saxlette
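
If you would rather do the merging by hand, the usual pattern is to
buffer characters() calls and join them in endElement().  The following
Python 3 sketch is not the recipe above; TextCollector is a hypothetical
handler that only deals with leaf elements.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collect (element name, full text) pairs for leaf elements."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.texts = []

    def startElement(self, name, attrs):
        # A new element starts: discard text buffered for the parent.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text; just buffer.
        self._chunks.append(content)

    def endElement(self, name):
        text = "".join(self._chunks).strip()
        if text:
            self.texts.append((name, text))
        self._chunks = []


handler = TextCollector()
xml.sax.parseString(b"<a><b>one &amp; two</b><b>three</b></a>", handler)

print(handler.texts)
```

You never need to detect a split: treat every characters() call as a
fragment and only use the joined string once the element ends.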

Re: Rss/xml namespaces sgmllib, sax, minidom

2006-01-01 Thread uche . ogbuji

Sakcee wrote:
> I want to build a simple validator for rss2 feeds, that checks basic
> structure and reports channels , items , and their attributes etc.
>
> I have been reading Mark Pilgrims articles on xml.com, diveintopython
> and someother stuff on sgmllib, sax.handlers and content handlers,
> xml.dom.minidom
>
> why is all of this necessary, what is the difference between all these
> libraries, it seems to me that I can parse the rss2 feed with any of
> these libraries.!! ?
>
> what is the difference between namespaces and non-namspaces functions
> in sax.handlers.contenthandler , is the namespace defined like domain
> names on some website?

Based on this question, I tend to think you might want to leave the XML
processing to someone else's code.  How about using Pilgrim's
feedparser?

http://feedparser.org/

Re: Amara (XML) problem on Solaris

2006-01-01 Thread uche . ogbuji

Doru-Catalin Togea wrote:
> import amara
>
> doc = amara.create_document()
> doc.xml_append(doc.xml_create_element(u"units"))
>
> print "OK"
>
> On Windows XP Pro it runs like this:
>
> C:\owera\test\xaps2-test>python amara-test1.py
> OK
>
> C:\owera\test\xaps2-test>
>
> On Solaris it runs like this:
>
> bash-2.03$ python amara-test1.py
> Traceback (most recent call last):
>File "amara-test1.py", line 3, in ?
>  doc = amara.create_document()
> AttributeError: 'module' object has no attribute 'create_document'
> bash-2.03$

This came up when I was on vacation and incommunicado.  What version of
Amara are you using on both platforms?  How did you install them?

Thanks.

Re: Help designing reading/writing a xml-fileformat

2005-12-13 Thread uche . ogbuji

Jacob Kroon wrote:
> I'm writing a block-diagram editor, and could use some tips about
> writing/reading
> diagrams to/from an xml file format. The basic layout of my code :
>
> class Diagram {
> Blocks blocks[]
> }
>
> class Block {
> int x, y
> }
>
> class Square(Block) {
> int width, height
> }
>
> class Circle(Block) {
> int radius
> }
>
> I'd like to be able to output something similar to this:
>
> 
> 
>

Re: Using XML w/ Python...

2005-12-12 Thread uche . ogbuji

Rick, thanks.  Based on your clue I checked, and it seems those Amara
packages are not being built rightly.  I'll look to get those packages
fixed and updated tomorrow.

Re: Using XML w/ Python...

2005-12-12 Thread uche . ogbuji

"""
But anyway, i get this...
>>> import amara
>>>from amara import domtools
>>> print domtools.py

Traceback (most recent call last):
  File "", line 1, in ?
NameError: name 'domtools' is not defined
"""

Sheesh!  That right after waking up.  And it shows :-)

Should have been "print domtools.__file__"

Re: Using XML w/ Python...

2005-12-12 Thread uche . ogbuji

"""
Not wanting to hijack this thread, but it got me interested in
installing amara. I downloaded Amara-allinone-1.0.win32-py2.4.exe
and ran it. It professed that the installation directory was to be
D:\Python24\Lib\site-packages\ ... but it placed FT and amara in
D:\Python24\Python24\Lib\site-packages . Possibly the installer is
part of the problem here?
"""

That's really good to know.  Someone else builds the Windows installer
package for Amara (I'm a near Windows illiterate), but I definitely
want to help be sure the installer works properly.  In fact, your
message rings a bell that this specifically came up before:

http://lists.fourthought.com/pipermail/4suite/2005-November/007610.html

I'll have to ask some of the Windows gurus on the 4Suite list whether
they know why this might be.  Do you mind if I cc you on those
messages, so that you can perhaps try out any solutions we come up
with?

Thanks.

Re: Using XML w/ Python...

2005-12-12 Thread uche . ogbuji

"""
>>> import amara
>>> print dir(amara)

['__builtins__', '__doc__', '__file__', '__name__', '__path__',
'__version__', 'binderytools', 'os', 'parse']
"""

So it's not able to load domtools.  What do you get trying

from amara import domtools
print domtools.py

Re: Using XML w/ Python...

2005-12-11 Thread uche . ogbuji

"""
Spoke too soon, i get this error when running amara in ActivePython

>>> import amara
>>> amara.parse("http://www.digg.com/rss/index.xml")

Traceback (most recent call last):
  File "", line 1, in ?
  File "C:\Python23\Lib\site-packages\amara\__init__.py", line 50, in
parse
if IsXml(source):
NameError: global name 'IsXml' is not defined

So im guessing theres an error with one of the files...
"""

IsXml is imported conditionally, so this is an indicator that something
about your module setup is still not agreeing with ActivePython.  What
do you see as the output of:

python -c "import amara; print dir(amara)"

?  I get:

['InputSource', 'IsXml', 'Uri', 'Uuid', '__builtins__', '__doc__',
'__file__', '__name__', '__path__', '__version__', 'bindery',
'binderytools', 'binderyxpath', 'create_document', 'dateutil_standins',
'domtools', 'os', 'parse', 'pushbind', 'pushdom', 'pyxml_standins',
'saxtools']

Re: Using XML w/ Python...

2005-12-11 Thread uche . ogbuji

"""
No, when i said
 "As far as it should work since their both transparent, umm, well its
not."

I meant that only mine isnt, maybe urs is but for some reason it isnt.
And you said amara works fine for you, ok, then could you tell me what
package to install...

I have installed Amara 1.1.6 for Python 2.4 and it works on python 2.4
only.
Now, which package should i download for it to work on any python
prompt:
  Allinone
  Standalone
  Or something else
"""

I've never used ActivePython.  I don't know of any special gotchas for
it.  But Amara works in Python 2.3 or 2.4.  The only differences
between the Allinone and standalone packages is that Allinone includes
4Suite.  Do get at least version 1.1.6.

If you're still having trouble with the ActivePython setup, the first
thing I'd ask is how you installed Amara.  Did you run a Windows
installer?  Next I'd check the library path for ActivePython.  What is
the output of

python -c "import sys; print sys.path"

Where you replace "python" above with whatever way you invoke
ActivePython.
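
The same checks can be bundled into a small script.  This is a generic
Python 3 sketch, not anything specific to Amara or ActivePython;
describe_module is a hypothetical helper, and it is shown against the
standard json module so that it runs anywhere.

```python
import importlib
import sys


def describe_module(name):
    """Report where a module was loaded from and what it exposes."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return {"name": name, "error": str(exc)}
    return {
        "name": name,
        # The file path shows which installation was actually imported.
        "file": getattr(module, "__file__", None),
        "public": sorted(n for n in dir(module) if not n.startswith("_")),
    }


info = describe_module("json")
missing = describe_module("no_such_module_xyz")

print(sys.path)
print(info["file"])
print(missing)
```

Run it with each interpreter you have installed, substituting the module
in question, and compare the reported file paths.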


Re: Using XML w/ Python...

2005-12-11 Thread uche . ogbuji

"""
 Ok, i am now understanding some of parseing and how to use it and
nodes, things like that. But say i wanted to take the title of
http://www.digg.com/rss/index.xml

and XMLTramp seemed the most simple to understand.

would the path be something like this?

import xmltramp
rssDigg = xmltramp.load("http://www.digg.com/rss/index.xml")
print note.rss.channel.item.title

I think thats wat im having the most confusion on now, is how to direct
to the path that i want...
"""

I suggest you read at least the front page information for the tools
you are using.  It's quite clear from the xmltramp Web site (
http://www.aaronsw.com/2002/xmltramp/ ) that you want something like
(untested: the least homework you can do is to refine the example
yourself):

print rssDigg[rss.channel][item][title]

BTW, in Amara, the API is pretty much exactly what you guessed:

>>> import amara
>>> rssDigg = amara.parse("http://www.digg.com/rss/index.xml")
>>> print rssDigg.rss.channel.item.title
Video: Conan O'Brien iPod Ad Parody


Re: Using XML w/ Python...

2005-12-11 Thread uche . ogbuji

Jay:
"""
K, I have this  XML doc, i dont know much about XML, but what i want
to do is take certain parts of the XML doc, such as  blah
 and take just that and put onto a text doc. Then same thing
doe the  part. Thats about it, i checked out some of the xml
modules but dont understand how to use them. Dont get parsing, so if
you could please explain working with XML and python to me.
"""

Someone already mentioned

http://www.oreillynet.com/pub/wlg/6225

I do want to update the Amara API shown there.  As of recent releases
it's as simple as

import amara
doc = amara.parse("foo.opml")
for url in doc.xpath("//@xmlUrl"):
    print url.value

Besides the XPath option, Amara [1] provides Python API options for
unknown elements, such as

node.xml_child_elements
node.xml_attributes

This is all covered with plenty of examples in the manual [2]
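
For comparison only, here is roughly the same extraction in Python 3
with the standard library's ElementTree, whose XPath subset cannot
select attribute nodes directly, so the sketch selects the elements that
carry the attribute instead.  The OPML content is made up for the
example.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML document with one nested outline.
OPML = """<opml version="1.1">
  <body>
    <outline text="Feed A" xmlUrl="http://example.com/a.xml"/>
    <outline text="Group">
      <outline text="Feed B" xmlUrl="http://example.com/b.xml"/>
    </outline>
  </body>
</opml>"""

root = ET.fromstring(OPML)

# ".//*[@xmlUrl]" matches every descendant element that has the attribute.
urls = [node.get("xmlUrl") for node in root.findall(".//*[@xmlUrl]")]

for url in urls:
    print(url)
```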

[1] http://uche.ogbuji.net/tech/4suite/amara/
[2] http://uche.ogbuji.net/uche.ogbuji.net/tech/4suite/amara/manual-dev

Re: Creating referenceable objects from XML

2005-12-11 Thread uche . ogbuji

Michael Williams wrote:
> Hi All,

> I'm looking for a quality Python XML implementation.  All of the DOM
> and SAX implementations I've come across so far are rather
> convoluted.  Are there any quality implementations that will (after
> parsing the XML) return an object that is accessible by name? Such as
> the following:

> xml = """
> 
>MyBook
>the author
> 
> """

> And after parsing the XML allow me to access it as so:

> book.title

> I need it to somehow convert my XML to intuitively referenceable
> object.  Any ideas?  I could even do it myself if I knew the
> mechanism by which python classes do this (create variables on the fly).

Looks as if Michael is working with Amara now, but I did want to note
for the record that APIs that allow one to access a node in the
"book.title" fashion are what I call Python data bindings.

Python data bindings I usually point out are:

Amara Bindery: http://www.xml.com/pub/a/2005/01/19/amara.html
Gnosis: http://www.xml.com/pub/a/2003/07/02/py-xml.html
generateDS: http://www.xml.com/pub/a/2003/06/11/py-xml.html

Based on updates to EaseXML in response to my article another entry
might be:

EaseXML: http://www.xml.com/pub/a/2005/07/27/py-xml.html

ElementTree ( http://www.xml.com/pub/a/2003/02/12/py-xml.html ) is a
Python InfoSet rather than a Python data binding.  You access nodes
using generic names related to the node type rather than the node name.
Whether data bindings or Infosets are your preference is a matter of
taste, but it's a useful distinction to make between the approaches.
It looks as if Gerald Flanagan has constructed a little specialized
binding tool on top of ElementTree, and that's one possible hybrid
approach.

xmltramp ( http://www.aaronsw.com/2002/xmltramp/ ) is another
interesting hybrid.
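
To illustrate the distinction, and the mechanism Michael asked about,
here is a toy Python 3 sketch.  Bound is a hypothetical class, far
simpler than Amara or any of the tools listed; it layers a binding-style
"book.title" access on top of ElementTree's Infoset-style API by using
__getattr__.

```python
import xml.etree.ElementTree as ET


class Bound:
    """Toy data binding: child elements become attributes by name."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        return Bound(child)

    def __str__(self):
        return (self._element.text or "").strip()


root = ET.fromstring(
    "<book><title>MyBook</title><author>the author</author></book>")

# Infoset style: generic accessors, node name passed as data.
infoset_title = root.find("title").text

# Binding style: the node name is the Python attribute name.
book = Bound(root)
binding_title = str(book.title)

print(infoset_title, binding_title)
```

A real binding also has to cope with repeated children, attributes,
namespaces and names that are not valid Python identifiers.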

Re: XML and namespaces

2005-12-05 Thread uche . ogbuji

Wilfredo Sánchez Vega:
"""
   I'm having some issues around namespace handling with XML:

>>> document = xml.dom.minidom.Document()
>>> element = document.createElementNS("DAV:", "href")
>>> document.appendChild(element)

>>> document.toxml()
'<?xml version="1.0" ?>\n<href/>'

   Note that the namespace wasn't emitted.  If I have PyXML,
xml.dom.ext.Print does emit the namespace:

>>> xml.dom.ext.Print(document)


   Is that a limitation in toxml(), or is there an option to make it
include namespaces?
"""

Getting back to the OP:

PyXML's xml.dom.ext.Print does get things right, and based on
discussion in this thread, the only way you can serialize correctly is
to use that add-on with minidom, or to use a third party, properly
Namespaces-aware tool such as 4Suite (there are others as well).
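
For anyone who wants to see the problem for themselves, here is a short
Python 3 sketch.  It assumes minidom's serializer still behaves as
described in this thread, and it shows the manual alternative argued
over elsewhere in the thread: asserting the declaration as an explicit
attribute before serializing.

```python
import xml.dom
from xml.dom import minidom

document = minidom.Document()
element = document.createElementNS("DAV:", "href")
document.appendChild(element)

# The namespace is recorded on the node...
namespace_on_node = element.namespaceURI

# ...but serialization only writes the tag name and explicit attributes.
before = document.toxml()

# Manual workaround: add the namespace declaration as an attribute.
element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, "xmlns", "DAV:")
after = document.toxml()

print(namespace_on_node)
print(before)
print(after)
```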

Good luck.

Re: XML and namespaces

2005-12-05 Thread uche . ogbuji

Alan Kennedy
"""
Although I am sympathetic to your bewilderment: xml namespaces can be
overly complex when it comes to the nitty, gritty details.
"""

You're the one who doesn't seem to clearly understand XML namespaces.
It's your position that is bewildering, not XML namespaces (well, they
are confusing, but I have a good handle on all the nuances by now).

Again, no skin off my back here: I write and use tools that are XML
namespaces compliant.  It doesn't hurt me that Minidom is not.  I was
hoping to help, but again I don't have time for this argument.

Re: XML and namespaces

2005-12-05 Thread uche . ogbuji

I wrote:
"""
The reality is that once the poor user has done:

element = document.createElementNS("DAV:", "href")

They are following DOM specification that they have created an element
in a namespace, and you seem to be arguing that they cannot usefully
have completed their work until they also do:

element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, None, "DAV:")

I'd love to hear how many actual minidom users would agree with you.
"""

Of course (FWIW) I meant

element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, "xmlns", "DAV:")

Re: XML and namespaces

2005-12-05 Thread uche . ogbuji

Alan Kennedy:
"""
These namespace declaration nodes, i.e. attribute nodes in the
xml.dom.XMLNS_NAMESPACE namespace, are a pre-requisite for any
namespaced DOM document to be well-formed, and thus naively
serializable.

The argument could be made that application authors should be protected
from themselves by having the underlying DOM library automatically
create the relevant namespace nodes.

But to me that's not pythonic: it's implicit, not explicit.

My vote is that the existing xml.dom.minidom behaviour wrt namespace
nodes is correct and should not be changed.
"""

Andrew Clover also suggested an overly-legalistic argument that current
minidom behavior is not a bug.

It's a very strange attitude that because a behavior is not
specifically proscribed in a spec, that it is not a bug.  Let me try a
reductio ad absurdum, which I think in this case is a very fair
stratagem.  If the code in question produced this:

>>> document = xml.dom.minidom.Document()
>>> element = document.createElementNS("DAV:", "href")
>>> document.appendChild(element)

>>> document.toxml()
'<?xml version="1.0" ?>\n<ferh/>'

(i.e. "ferh" rather than "href"), would you not consider that a minidom
bug?

Now consider that DOM Level 2 does not proscribe such mangling.

Do you still think that's a useful way to determine what is a bug?

The current, erroneous behavior, which you advocate, is the same kind of
bug.  Minidom is an XML Namespaces aware API.  In XML Namespaces, the
namespace URI is *part of* the name.  No question about it.  In Clark
notation the element name that is specified in

element = document.createElementNS("DAV:", "href")

is "{DAV:}href".  In Clark notation the element name of the document
element in the created document is "href".  That is not the name the
user specified.  It is a mangled version of it.  The mangling is no
better than my reductio of reversing the qname.  This is a bug.  Simple
as that.  With this behavior, minidom is not an API that is correct with
respect to XML Namespaces.
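
Clark notation is easy to try out in Python 3, because ElementTree uses
it for tag names.  This sketch is only an illustration of the notation
and of a serializer that keeps the full name; the generated prefix in
the output is whatever ElementTree chooses.

```python
import xml.etree.ElementTree as ET

# "{DAV:}href" is the namespace URI plus the local name, in Clark notation.
element = ET.Element("{DAV:}href")

# Serialization emits a namespace declaration for the URI automatically.
serialized = ET.tostring(element, encoding="unicode")

# Parsing the output back yields the same full name.
reparsed = ET.fromstring(serialized)

print(serialized)
print(reparsed.tag)
```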

So you try the tack of invoking "pythonicness".  Well I have one for
ya:

"In the face of ambiguity, refuse the temptation to guess."

You're guessing that explicit XMLNS attributes are the only way the
user means to express namespace information, even though DOM allows
this to be provided through such attributes *or* through namespace
properties.  I could easily argue that since these are core properties
in the DOM, that DOM should ignore explicit XMLNS attributes and only
use namespace properties in determining output namespace.  You are
guessing that XMLNS attributes (and only those) represent what the user
really means.  I would be arguing the same of namespace properties.

The reality is that once the poor user has done:

element = document.createElementNS("DAV:", "href")

They are following DOM specification that they have created an element
in a namespace, and you seem to be arguing that they cannot usefully
have completed their work until they also do:

element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, None, "DAV:")

I'd love to hear how many actual minidom users would agree with you.

It's currently a bug.  It needs to be fixed.  However, I have no time
for this bewildering fight.  If the consensus is to leave minidom the
way it is, I'll just wash my hands of the matter, but I'll be sure to
emphasize heavily to users that minidom is broken with respect to
Namespaces and serialization, and that they abandon it in favor of
third-party tools.

Re: XML and namespaces

2005-12-02 Thread uche . ogbuji

Alan Kennedy:
"""
> Oh no.  That only means that namespace declaration attributes are not
> created in the DOM data structure.  However, output has to fix up
> namespaces in .namespaceURI properties as well as directly asserted
> "xmlns" attributes.  It would be silly for DOM to produce malformed
> XML+XMLNS, and of course it is not meant to.  The minidom behavior
> needs fixing, badly.

My interpretation of namespace nodes is that the application is
responsible for creating whatever namespace declaration attribute nodes
are required, on the DOM tree.

DOM should not have to imply any attributes on output.
"""

I'm sorry but you're wrong on this.  First of all, DOM L2 (the level
minidom targets) does not have the concept of "namespace nodes".
That's XPath.  DOM supports two ways of expressing namespace
information.  The first way is through the node properties
.namespaceURI, .prefix (for the QName) and .localName.  It *also*
supports literal namespace declaration attributes (the NSDecl
attributes themselves must have a namespace of
"http://www.w3.org/2000/xmlns/").  As if this is not confusing enough
the Level 1 property .nodeName must provide the QName, redundantly.

As a result, you have to perform fix-up to merge properties with
explicit NSDecl attributes in order to serialize.  If a serializer does not do
so, it is losing all the information in namespace properties, and the
resulting output is not the same document that is represented in the
DOM.
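
A deliberately simplified Python 3 sketch of such fix-up on a minidom
tree follows.  It is not the patch discussed here and not how 4Suite
does it: add_missing_nsdecls is a hypothetical helper that declares each
element's namespace on the element itself, so it emits redundant
declarations and ignores namespaced attributes and elements that need
xmlns="" to leave a default namespace.

```python
import xml.dom
from xml.dom import minidom


def add_missing_nsdecls(node):
    """Add an xmlns declaration to every namespaced element lacking one."""
    if node.nodeType == node.ELEMENT_NODE and node.namespaceURI:
        # "xmlns:prefix" for prefixed names, plain "xmlns" otherwise.
        attr_name = "xmlns:" + node.prefix if node.prefix else "xmlns"
        if not node.hasAttribute(attr_name):
            node.setAttributeNS(
                xml.dom.XMLNS_NAMESPACE, attr_name, node.namespaceURI)
    for child in node.childNodes:
        add_missing_nsdecls(child)


document = minidom.Document()
root = document.createElementNS("DAV:", "href")
document.appendChild(root)
# "urn:example:other" and the "x" prefix are made up for the example.
root.appendChild(document.createElementNS("urn:example:other", "x:note"))

add_missing_nsdecls(document)
serialized = document.toxml()

# Reparse to check that the namespace information survived the round trip.
reparsed = minidom.parseString(serialized)

print(serialized)
```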

Believe me, I've spent many weary hours with all these issues, and
implemented code to deal with the mess multiple times, and I know it
all too painfully well.  I wrote Amara largely because I got
irrecoverably sick of DOM's idiosyncrasies.

Andrew, for this reason I'll probably take the initiative to work up a
patch for the issue.  I'll do what I can to get to it tomorrow.  If you
help me with code review and maybe writing some tests, that would be a
huge help.


Re: XML and namespaces

2005-12-02 Thread uche . ogbuji

Quoting Andrew Kuchling:
"""
> >>> element = document.createElementNS("DAV:", "href")

This call is incorrect; the signature is createElementNS(namespaceURI,
qualifiedName).
"""

Not at all, Andrew.  "href" is a valid qname, as is "foo:href".  The
prefix is optional in a QName.  Here is the correct behavior, taken
from a non-broken DOM library (4Suite's Domlette):

>>> from Ft.Xml import Domlette
>>> document = Domlette.implementation.createDocument(None, None, None)
>>> element = document.createElementNS("DAV:", "href")
>>> document.appendChild(element)

>>> Domlette.Print(document)
<?xml version="1.0" encoding="UTF-8"?>
<href xmlns="DAV:"/>
>>>

"""
If you call .createElementNS('whatever', 'DAV:href'),
the output is the expected:
<DAV:href/>
"""

Oh, no.  That is not at all expected.  The output should be:

<DAV:href xmlns:DAV="whatever"/>

"""
It doesn't look like there's any code in minidom that will
automatically create an 'xmlns:DAV="whatever"' attribute for you.  Is
this automatic creation an expected behaviour?
"""

Of course.  Minidom implements level 2 (thus the "NS" at the end of the
method name), which means that its APIs should all be namespace aware.
The bug is that writexml() and thus toxml() are not so.
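
To illustrate the point that a QName may or may not carry a prefix,
here is a small sketch with stock minidom.  The "D" prefix is just an
example choice.

```python
from xml.dom import minidom

doc = minidom.Document()
plain = doc.createElementNS("DAV:", "href")       # QName without a prefix
prefixed = doc.createElementNS("DAV:", "D:href")  # QName with a prefix

# Both elements are in the same namespace and share a local name;
# only .prefix and the redundant Level 1 .nodeName differ.
for el in (plain, prefixed):
    print(repr(el.nodeName), repr(el.prefix),
          repr(el.localName), repr(el.namespaceURI))
```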

"""
(I assume not.  Section 1.3.3 of the DOM Level 3 says "Similarly,
creating a node with a namespace prefix and namespace URI, or changing
the namespace prefix of a node, does not result in any addition,
removal, or modification of any special attributes for declaring the
appropriate XML namespaces."  So the DOM can create XML documents that
aren't well-formed w.r.t. namespaces, I think.)
"""

Oh no.  That only means that namespace declaration attributes are not
created in the DOM data structure.  However, output has to fix up
namespaces in .namespaceURI properties as well as directly asserted
"xmlns" attributes.  It would be silly for DOM to produce malformed
XML+XMLNS, and of course it is not meant to.  The minidom behavior
needs fixing, badly.


Re: XML and namespaces

2005-11-30 Thread uche . ogbuji

Wilfredo Sánchez Vega:
"""
I'm having some issues around namespace handling with XML:

>>> document = xml.dom.minidom.Document()
>>> element = document.createElementNS("DAV:", "href")
>>> document.appendChild(element)

>>> document.toxml()
'<?xml version="1.0" ?>\n<href/>'
"""

I haven't worked with minidom in just about forever, but from what I
can tell this is a serious bug (or at least an appalling missing
feature).  I can't find anything in the Element.writexml() method that
deals with namespaces.  But I'm just baffled.  Is there really any way
such a bug could have gone so long unnoticed in Python and PyXML?  I
searched both trackers, and the closest thing I could find was this
from 2002:

http://sourceforge.net/tracker/index.php?func=detail&aid=637355&group_id=6473&atid=106473

Different symptom, but also looks like a case of namespace ignorant
code.

Can anyone who's worked on minidom more recently let me know if I'm
just blind to something?


Re: xpath support in python 2.4

2005-11-29 Thread uche . ogbuji

"And80": "Is [the xml.xpath module] still part of the standard
library?"
Alan Kennedy: "No, it's not. Not sure if it ever was. "

It never was.


Re: XMLSchema Parsing

2005-11-29 Thread uche . ogbuji

km wrote:
> i'd like to know if there are any good XMLSchema (.xsd files) parsing modules 
> in python.
> regards,

Parse and do what?  You can parse WXS (a.k.a. XSD) with any XML parser
out there.

Anyway, off the top of my head, Python tools that handle WXS, to some extent:

xsv
libxml2/Python
lxml
generateDS.py

Good luck.
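
Since a WXS file is itself just XML, plain parsing needs nothing
special.  A minimal sketch with the standard library's ElementTree,
using a made-up two-element schema, that lists the declared elements:

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema, inlined so the example is self-contained.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="quote" type="xs:string"/>
  <xs:element name="price" type="xs:decimal"/>
</xs:schema>"""

root = ET.fromstring(SCHEMA)
# ElementTree spells namespaced names as {uri}local.
declared = [(e.get("name"), e.get("type"))
            for e in root.iter("{%s}element" % XS)]
print(declared)
```

This only reads the schema as a document.  Validating instances
against it is a separate job for the tools listed above.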


Re: Move xml.sax.saxutils.*?

2005-11-29 Thread uche . ogbuji

"""
It seems like functions such as xml.sax.saxutils.escape and unescape
are generally useful, and not at all tied to the xml.sax module.  Would
it make sense to move them somewhere else, like to xml?
"""

It would be useful to allow

from xml import escape, unescape

But as an alias, rather than a replacement for the current import.
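
For reference, a short sketch of the functions in question as they are
imported today (the "from xml import ..." alias above is a proposal,
not something the standard library provides):

```python
from xml.sax.saxutils import escape, unescape

raw = "if a < b & b > c"
escaped = escape(raw)        # replaces &, < and > with entity references
restored = unescape(escaped)

print(escaped)
print(restored)
```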


Re: BisonGen parser generator. Newbie question

2005-11-29 Thread uche . ogbuji

"""
I'm trying to run the calculator example included with the "BisonGen"
parser generator, but I've been unable to put it to work.

When I compile the xml file "simple.bgen" with the script
"BisonGen.bat", the only parser I get is a C file. I've heard BisonGen
generates also a python file, which is, I believe, the one used
imported by "test.py" to run the testing.
"""

Apologies for the late reply.  Holidays and all that...

Anyway, this is strange.  You should get both C and .py files (and .java
files if you're using a recent CVS version).  Here is what I get:

$ BisonGen simple.bgen
Generate parser simple.c
Generate parser simple.java
Generate constants simpleConstants.java
Generate handler simpleHandler.java
Generate handler DefaultsimpleHandler.java

What do you get for output?  BTW, if you want to try a recent CVS
version, grab the snapshot:

ftp://ftp.fourthought.com/pub/cvs-snapshots/BisonGen-CVS.tar.gz (.zip
also available).

Also, you might want to ask BGen questions on the 4Suite mailing list,
where other BGen developers hang out.

http://lists.fourthought.com/pipermail/4suite/


Re: Writing big XML files where beginning depends on end.

2005-11-28 Thread uche . ogbuji

"""
 Then we need something that allows parts of the XML file
to be written to file and purged from RAM to avoid the
memory problem.

Suggestions for solutions are appreciated.
"""

Multiple XML files is not an option, but what about general entities or
XInclude?  That way you don't need to change your parsing code.

Using 4Suite's MarkupWriter [1] you could write the outer shell and
inner subtrees to separate streams, only filling in values for the
outer stream when the inner stream is complete, and your computations
are ready.  You can then use the writer.xmlFragment method to stitch
the inner subtrees to the outer shell.  MarkupWriter operates in
streaming mode, so you would not be holding much XML in memory at all.

[1] http://www.xml.com/pub/a/2005/04/20/py-xml.html
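
The two-stream idea can be sketched with the standard library alone.
This is not MarkupWriter's API; write_report and the report/item
element names are invented for the illustration.  The inner subtree is
streamed to a temporary file on disk, and the outer shell is written
only once the value that depends on the end of the data is known.

```python
import io
import shutil
import tempfile
from xml.sax.saxutils import escape, quoteattr


def write_report(values, out):
    """Write <report total="..."> plus one <item> per value to a binary stream."""
    total = 0
    with tempfile.TemporaryFile() as inner:      # binary file, kept on disk
        for v in values:
            # Stream each item out immediately; nothing accumulates in RAM.
            inner.write(("<item>%s</item>" % escape(str(v))).encode("utf-8"))
            total += v
        inner.seek(0)
        # The outer start tag needs the total, so it is written last.
        out.write(("<report total=%s>" % quoteattr(str(total))).encode("utf-8"))
        shutil.copyfileobj(inner, out)           # stitch the subtree in, chunk by chunk
        out.write(b"</report>")


buf = io.BytesIO()
write_report(iter([1, 2, 3]), buf)
result = buf.getvalue().decode("utf-8")
print(result)
```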


Re: Recommendation please: forming an XML toolkit

2005-11-11 Thread uche . ogbuji

*SNIP Long list of possible criteria for choosing an XML library*

Even with all your personal considerations, there is no one "correct"
answer for you.  I can think of four or five packages that would meet
all your criteria.

You said something quite apt:

"This question is a bit like the ones pertaining to 'Which web
framework to use?', there is a lot of good stuff out there, and often it
boils down to personal preference, mind-fitting interface and such..."

I use this comparison myself.  People are used to the incredible
diversity of Web application needs, but for some reason their
imagination tends to flag a bit when it comes to acknowledging the
similar diversity of XML processing needs.  It's a big domain, and you
won't find a universal, one-size-fits-all solution.  That's why I
surprise people by saying I don't have a problem with the fact that
Python bundles at least 4 XML processing libraries, and that there are
at least 30 viable third-party options.

Anyway you go on to say:

"BUT... to make it more precise I will give more context on the future
projects involved... "

I appreciate your effort, but I don't think you succeeded.  With
respect to Web frameworks, it's easy to come up with a list of even 20
criteria for Python Web frameworks and still wind up with 4-5 fitting
options.  Same thing for XML processing.

You seem to have done a bit of homework with the packages.  I'm sure
you have initial impressions based on that.  If you have specific
outstanding questions, do ask.  If not, I would just take a chance on
whatever your present leaning may be.



Re: Add attribute using pyxml

2005-11-04 Thread uche . ogbuji

"How do I add a new attribute to the existing xml Document tree??? "

What do you mean by "using pyxml"?  There are several pyxml modules.
Do you mean minidom?  If so that comes with stock Python as well (hint:
element_node.setAttributeNS(ns, qname, value)).
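
A minimal sketch of the hint with stock minidom.  The element and
attribute names, and the use of the XLink namespace, are only examples.

```python
from xml.dom import minidom

XLINK = "http://www.w3.org/1999/xlink"

doc = minidom.parseString("<item/>")
el = doc.documentElement

# Attribute in no namespace.
el.setAttribute("id", "42")
# Namespaced attribute: namespace URI, qualified name, then the value.
el.setAttributeNS(XLINK, "xlink:href", "http://example.org/")

print(el.getAttribute("id"))
print(el.getAttributeNS(XLINK, "href"))
```

Note that minidom will not add the matching xmlns:xlink declaration on
output, so you would have to set that attribute yourself as well.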


Re: XML DOM: XML/XHTML inside a text node

2005-11-04 Thread uche . ogbuji

"""
In my program, I get input from the user and insert it into an XHTML
document.  Sometimes, this input will contain XHTML, but since I'm
inserting it as a text node, xml.dom.minidom escapes the angle brackets
('<' becomes '&lt;', '>' becomes '&gt;').  I want to be able to
override this behavior cleanly.  I know I could pipe the input through
a SAX parser and create nodes to insert into the tree, but that seems
kind of messy.  Is there a better way?
"""

Amara 1.1.6 supports inserting an XML fragment into a document or
element object.  Many short examples here:

http://copia.ogbuji.net/blog/2005-09-21/Dare_s_XLI

excerpt:

Adding a  element as a child of the  element'

contacts.xml_append_fragment('%s'%'206-555-0168'

http://uche.ogbuji.net/tech/4suite/amara
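
For those staying with minidom, roughly the same effect can be had by
parsing the fragment separately and importing its nodes.  This is a
sketch, not Amara's implementation; append_fragment and the wrapper
element are made up for the example, and the fragment must be
well-formed.

```python
from xml.dom import minidom


def append_fragment(parent, fragment):
    """Parse an XML fragment and append its nodes as children of parent."""
    # Wrap the fragment so mixed text and elements parse as one document.
    wrapper = minidom.parseString("<wrapper>%s</wrapper>" % fragment)
    doc = parent.ownerDocument
    for child in list(wrapper.documentElement.childNodes):
        # importNode makes a deep copy owned by the target document.
        parent.appendChild(doc.importNode(child, True))


doc = minidom.parseString("<body><p id='target'/></body>")
target = doc.getElementsByTagName("p")[0]
append_fragment(target, "user said <em>hello</em>")
result = doc.documentElement.toxml()
print(result)
```

For real XHTML the wrapper would also need to declare the XHTML
namespace so that the imported elements end up in it.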


Re: XML Tree Discovery (script, tool, __?)

2005-10-30 Thread uche . ogbuji

"Neat, though non-trivial XSLT makes my head spin."

Well, you don't have to know XSLT at all to use the Examplotron
transform, although I can understand wanting to understand and hack
what you're using.

"Just for kicks, I
rewrote in python Michael Kay's DTDGenerator
(http://saxon.sourceforge.net/dtdgen.html), though as the original it
has several limitations on the accuracy of the inferred DTD. "

Ah.  Cool.  Got a link?


Re: XML Tree Discovery (script, tool, __?)

2005-10-28 Thread uche . ogbuji

"""
I was looking for something similar (XML to DTD inference) but I didn't
find anything related in python. Trang
(http://www.thaiopensource.com/relaxng/trang-manual.html#introduction),
on the other hand seems impressive after a few non-trivial tests. It
would be neat to have it ported in python, at least the inference part.

"""

If you're OK with RELAX NG rather than DTD as the schema output
(probably a good idea if you're using namespaces), consider
Examplotron, which I've used on many such production tasks.

http://www-128.ibm.com/developerworks/xml/library/x-xmptron/

It's XSLT rather than Python, but the good news is that XSLT is easy to
invoke from Python using tools such as 4Suite.

http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/python-xslt


Re: XML Tree Discovery (script, tool, __?)

2005-10-27 Thread uche . ogbuji

> - don't use SAX unless your document is huge
> - don't use DOM unless someone is putting a gun to your head

What I say is: use what works for you.  I think SAX would be fine for
this task, but, hey, I personally would use Amara (
http://uche.ogbuji.net/tech/4suite/amara/ ), of course.  The following
does the trick:

import sets
import amara
from amara import binderytools

#element_skeleton_rule suppresses char data from the resulting binding
#tree.  If you have a large document and only care about element/attr
#structure and not text, this saves a lot of memory
rules = [binderytools.element_skeleton_rule()]
#XML can be a file path, URI, string, or even an open-file-like object
doc = amara.parse(XML, rules=rules)
elems = {}
for e in doc.xml_xpath('//*'):
    paths = elems.setdefault((e.namespaceURI, e.localName), sets.Set())
    path = u'/'.join([n.nodeName for n in e.xml_xpath(u'ancestor::*')])
    paths.add(u'/' + path)

#Pretty-print output
for name in elems:
    print name, '\n\t\t\t', '\n\t\t\t'.join(elems[name])



Re: XML Tree Discovery (script, tool, __?)

2005-10-26 Thread uche . ogbuji

"""
The output I was contemplating was a DOM "DNA" - that is the DOM
without the instances of the elements or their data, a bare tree, a
prototype tree based on what is in the document (rather than what is
legal to include in the document).

Just enough data that for an arbitrary element I would know:

1) whether the element was in a document
2) where to find it (the chain of parents)
"""

This is easy to do in SAX.  For some hints, see page 2 of my article:

http://www.xml.com/pub/a/2004/11/24/py-xml.html
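
A minimal sketch of what such a SAX handler might look like (this is
not the code from the article; SkeletonHandler is a name invented
here).  It keeps a stack of open elements and records, for each
element name, every chain of parents it was found under.

```python
import xml.sax
from collections import defaultdict


class SkeletonHandler(xml.sax.ContentHandler):
    """Record where each element name occurs, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.stack = []                    # names of the currently open elements
        self.parents = defaultdict(set)    # element name -> set of parent paths

    def startElement(self, name, attrs):
        self.parents[name].add("/" + "/".join(self.stack))
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


handler = SkeletonHandler()
xml.sax.parseString(b"<a><b><c/></b><c/></a>", handler)
for name, paths in sorted(handler.parents.items()):
    print(name, sorted(paths))
```

Whether an element occurs at all is then a dictionary lookup, and the
set of paths says where to find it.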


Re: XML Tree Discovery (script, tool, __?)

2005-10-24 Thread uche . ogbuji

"Finally diving into XML programmatically.  Does anyone have a best
practice recommendation for programmatically discovering the structure
of an arbitrary XML document via Python?"

You can do this with DOM or SAX, or any of the many more friendly XML
processing libraries out there.  You might want to be more specific.
What sort of output do you want from this discovery?


Re: xml2schema

2005-09-30 Thread uche . ogbuji

"""
Er, do you mean to generate a Relax NG (or possibly a DTD in fact) from
some XML file??

If you do mean this then just think of that how you could generate
grammar from some paragraphs of English text...  Sorta non-trivial, if
possible at all, isn't it? :-)
"""

Very well put.  However, for RELAX NG there is a tool that might work
for the OP: Examplotron.  See:

http://www-128.ibm.com/developerworks/xml/library/x-xmptron/

As I show in that article, you can use Examplotron from any XSLT
processor, including one invoked through a Python API.

-- 
Uche
http://copia.ogbuji.net


Re: xml2schema

2005-09-23 Thread uche . ogbuji

"""
Er, do you mean to generate a Relax NG (or possibly a DTD in fact) from
some XML file??

If you do mean this then just think of that how you could generate
grammar from some paragraphs of English text...  Sorta non-trivial, if
possible at all, isn't it? :-)
"""

Very well put.  However, for RELAX NG there is a tool that might work
for the OP: Examplotron.  See:

http://www-128.ibm.com/developerworks/xml/library/x-xmptron/

As I show in that article, you can use Examplotron from any XSLT
processor, including one invoked through a Python API.


Re: What XML lib to use?

2005-09-13 Thread uche . ogbuji

"""
I'm confused, I want to read/write XML files but I don't really
understand what library to use.

I've used DOM-based libraries in other languages, is PyXML the library
to use?
"""

There are many options (some say too many):

http://www.xml.com/pub/a/2004/10/13/py-xml.html

Try out Amara Bindery, if you like:

http://uche.ogbuji.net/tech/4suite/amara/

Browsing the manual should let you know whether you like the API:

http://uche.ogbuji.net/tech/4suite/amara/manual

BTW, lots on Python/XML processing covered in my column, including
other options besides Amara:

http://www.xml.com/pub/at/24


Re: Limited XML tidy

2005-08-25 Thread uche . ogbuji

> The problem is that when the sax handler raises an exception,
> I can't see how to find out why. What I want to do is for
> DodgyErrorHandler to do something different depending on
> where we are in the course of parsing. Is there anyway
> to get that information back from xml.sax (or indeed from
> any other sax handler?)

You can get raw location information, yes.  See:

http://www.xml.com/pub/a/2004/11/24/py-xml.html

But I don't think this is enough for you.  You also need recovery,
which you're implementing in crude form.

I tend to agree with Magnus that using an SGML parser might be your
best bet.  You might even be able to turn that SGML into XML using a
tool such as James Clark's SX:

http://www.jclark.com/sp/sx.htm
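
As a sketch of the raw location information mentioned above (not the
article's code; LocatingErrorHandler is an invented name), an error
handler can read the position off the exception it is given:

```python
import xml.sax


class LocatingErrorHandler(xml.sax.ErrorHandler):
    """Collect (line, column, message) for each fatal error, then re-raise."""

    def __init__(self):
        self.problems = []

    def fatalError(self, exception):
        self.problems.append((exception.getLineNumber(),
                              exception.getColumnNumber(),
                              exception.getMessage()))
        raise exception


handler_errors = LocatingErrorHandler()
try:
    # Deliberately malformed: <b> is never closed.
    xml.sax.parseString(b"<a>\n<b></a>", xml.sax.ContentHandler(), handler_errors)
except xml.sax.SAXParseException:
    pass

for line, column, message in handler_errors.problems:
    print("line %d, column %d: %s" % (line, column, message))
```

This tells you where the parser gave up, but not how to carry on,
which is why it is not a substitute for real recovery.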


Re: PyXML and xml.dom

2005-08-17 Thread uche . ogbuji

> Is PyXML now part of the Python distribution, or is it still an add-on?

Parts of PyXML have been migrated into Python core since Python 2.0,
but there is still also a standalone PyXML package.

See:

http://www.xml.com/pub/a/2002/09/25/py.html


Re: [XML-SIG] xml-object mapping

2005-07-30 Thread Uche Ogbuji

On Thu, 2005-07-28 at 12:21 -0400, Tamas Hegedus wrote:
> Hi!
> 
> I am looking for an xml-object mapping tool ('XML Data Binding-design 
> time product') where I can define the mapping rules in 'binding files' 
> and the parser is generated automatically.
> 
> Similar to the solution of Dave Kuhlman 
> (http://www.rexx.com/~dkuhlman/generateDS.html) where the mapping is 
> defined in an xml file (if I am understand well).
> 
> But I already have the target object. The xml-tags should not be used as 
> a property/member name, but should be mapped to an existing object.
> 
> (There are existing tools, but written in Java (I would prefer Python; I 
> am biologist not using Java for 5 years), like JiBX 
> (http://jibx.sourceforge.net), Castor (http://www.castor.org; "XML-based 
> mapping file to specify bindings for existing object models"))

Answered:

http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/a63d0ad3fd23cb37/6ad0223c5b8f9946?lnk=st&q=python+xml&rnum=3&hl=en


-- 
Uche Ogbuji   Fourthought, Inc.
http://uche.ogbuji.net    http://fourthought.com
http://copia.ogbuji.net   http://4Suite.org
Use CSS to display XML, part 2 - 
http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.html
Use XSLT to prepare XML for import into OpenOffice Calc - 
http://www.ibm.com/developerworks/xml/library/x-oocalc/
Schema standardization for top-down semantic transparency - 
http://www-128.ibm.com/developerworks/xml/library/x-think31.html


Re: xml-object mapping

2005-07-29 Thread uche . ogbuji

"I am looking for an xml-object mapping tool ('XML Data Binding-design
time product') where I can define the mapping rules in 'binding files'
and the parser is generated automatically.

Similar to the solution of Dave Kuhlman
(http://www.rexx.com/~dkuhlman/generateDS.html) where the mapping is
defined in an xml file (if I am understand well).

But I already have the target object. The xml-tags should not be used
as a property/member name, but should be mapped to an existing object. "

In generateDS the mapping is not just defined in any old sort of XML
file: it's defined in a W3C XML Schema file, which makes good sense
(except that in my case I dislike WXS).

Amara does not use a mapping specification: it maps automatically, and
it allows you to specify your own classes for the mapping.  This is
discussed in the manual.

http://uche.ogbuji.net/tech/4Suite/amara/


Re: Suggestions for Python XML library which can search and insert

2005-07-29 Thread uche . ogbuji

"I'm looking for a library that can search through an XML document
tree, locate an element by attribute (ideally this can be done through
XPath), and insert an element (as its child).

Simple? Yes? ...but the code I've seen so far which does this uses
'nested for loops' for trees which are relatively shallow compared to
mine. "

Amara can easily do this using XPath (complete with predicates,
functions, etc.), without nested for loops:

http://uche.ogbuji.net/tech/4Suite/amara/
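
To show the general shape of the task without nested loops, here is a
sketch using the standard library's ElementTree rather than Amara.
ElementTree only supports a limited XPath subset, and the document and
element names are made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<catalog><section><item id='a'/><item id='b'/></section></catalog>")

# Locate the element by attribute value, at any depth.
target = root.find(".//item[@id='b']")

# Insert a new child element under it.
ET.SubElement(target, "note").text = "inserted"

print(ET.tostring(root, encoding="unicode"))
```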


Re: Differences between RDFlib - 4RDF and Redfoot - 4Suite?

2005-07-15 Thread uche . ogbuji

"I was wondering about the differences with the referred libs and
servers.  Since the documentation isn't so thorough (and a bit because of my
laziness), I thought I'd make request for usage accounts etc. stating
the pros and cons of the aforementioned. Any notes would be
appreciated."

RDFLib is a thinner layer, more of the raw API.  4RDF adds in Versa
query, a graph visualization tool, and multiple back ends.  However,
for the longest time the idea has been to merge the strengths of the
two packages (big example: rdflib's parser is up to the latest round of
specs.  4RDF's is not).  As part of a client project I've actually
begun the process of replacing 4RDF's parser with rdflib's in 4Suite (a
separate add-on until the 4Suite 1.1 branch emerges).

I'd say for now if you just need quick RDF parsing, and you're not also
using plain XML, and things like the Versa RDF query language aren't
important to you, you'll get along just fine with rdflib.


Re: formatted xml output from ElementTree inconsistency

2005-06-26 Thread uche . ogbuji

Patrick Maupin wrote:
"""
Dennis Bieber wrote:
> Off hand, I'd consider the non-binary nature to be because the
> internet protocols are mostly designed for text, not binary.

A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
XML".

One of the listed goals is "XML documents should be human-legible and
reasonably clear".
"""

Yes.  Thanks for mentioning this, because people too often forget it.

minidom, 4Suite's Domlette and Amara all provide good pretty-print
output functions.  The latter two use rules from the XSLT spec, which
is designed by people who have the above design goal well in their
blood.
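
For minidom, the pretty-printer is toprettyxml().  A small sketch (the
sample document is made up):

```python
from xml.dom import minidom

doc = minidom.parseString("<a><b>text</b><c/></a>")

# indent is the string added per nesting level; newl defaults to "\n".
pretty = doc.documentElement.toprettyxml(indent="  ")
print(pretty)
```

Be aware that this inserts whitespace text into the output, which
matters if the document has mixed content.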


Re: ElementTree Namespace Prefixes

2005-06-17 Thread uche . ogbuji

Chris Spencer:
"""
Fredrik Lundh wrote:
> Chris Spencer wrote:

> > If an XML parser reads in and then writes out a document without having
> > altered it, then the new document should be the same as the original.

> says who?

Good question. There is no One True Answer even within the XML
standards.

It all boils down to how you define "the same". Which parts of the XML
document are meaningful content that needs to be preserved and which
ones are mere encoding variations that may be omitted from the internal
representation?
"""

One can point out the XML namespaces spec all one wants, but it doesn't
matter.  The fact is that regardless of what that spec says, as you
say, Chris, there are too many XML technologies that require prefix
retention.  As a simple example, XPath and XSLT, W3C specs just like
XMLNS, use QNames in context, which requires prefix retention.
Besides all that, prefix retention is generally more user friendly in
round-trip or partial round-trip scenarios.

That's why cDomlette, part of 4Suite [1] and Amara [2], a more Pythonic
API for this, both support prefix retention by default.

[1] http://4suite.org
[2] http://uche.ogbuji.net/tech/4Suite/amara/
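
The difference is easy to see with two standard library parsers.  This
sketch round-trips the same made-up document through ElementTree,
which by default does not keep the original prefix, and through
minidom, which does:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

SRC = '<x:a xmlns:x="urn:example"><x:b/></x:a>'

# ElementTree stores names as {uri}local and invents prefixes on output.
et_out = ET.tostring(ET.fromstring(SRC), encoding="unicode")

# minidom keeps the QName and the literal xmlns attribute from the source.
dom_out = minidom.parseString(SRC).documentElement.toxml()

print(et_out)
print(dom_out)
```

ElementTree can be told which prefix to use for a known namespace via
ET.register_namespace(), but that is configuration, not retention of
whatever the source document happened to use.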


Re: Getting a DOM element's children by type (STUPID)

2005-06-10 Thread uche . ogbuji

"""
If i get myself a DOM tree using xml.dom.minidom (or full-fat xml.dom,
i don't mind)
"""

Don't do that.  Stick to minidom.  The "full" xml.dom from PyXML is
ancient and slow.  Of course, there are other, better libraries
available now, too.

"""
is there an easy way to ask a element for its child elements
of a particular type? By 'type' i mean 'having a certain tag'.
"""

You can use list comprehensions[1].  You could use XPath, if you're
willing to use a library that supports XPath.

In Amara[2], this task is trivial.  To get all the images in an XHTML
div, you'd simply do:

for img in div.img:
    process_img(img)

You access child elements directly as objects, by their element type
name.

[1] see, e.g., http://www.xml.com/pub/a/2003/01/08/py-xml.html
[2] see http://www.xml.com/pub/a/2005/01/19/amara.html
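
The list comprehension approach with plain minidom might look like
this sketch (the XHTML-like sample is made up):

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    "<div><img src='a.png'/><p><img src='nested.png'/></p><img src='b.png'/></div>")
div = doc.documentElement

# Direct children only: filter childNodes by node type and tag name.
imgs = [n for n in div.childNodes
        if n.nodeType == Node.ELEMENT_NODE and n.tagName == "img"]
sources = [img.getAttribute("src") for img in imgs]
print(sources)
```

div.getElementsByTagName("img") would also return the img nested
inside the p, which is why the comprehension filters childNodes
instead.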


Re: Elementtree and CDATA handling

2005-06-04 Thread uche . ogbuji

"If, instead, you want to keep track of where the CDATA sections are,
and output them again without change, you'll need to use an
XML-handling interface that supports this feature. Typically, DOM
implementations do - the default Python minidom does, as does pxdom.
DOM is a more comprehensive but less friendly/Python-like interface for
XML processing. "

Amara in CVS makes it easy to perform the output part of this:

text="""
Document



//<![CDATA[
function matchwo(a,b)
{
if (a < b && a > 0) then
   {
   return 1
   }
}

//]]>



"""

from amara.binderytools import bind_string
doc = bind_string(text)
print doc.xml(cdataSectionElements=[u'script'])

Output:



Document


<![CDATA[
//
function matchwo(a,b)
{
if (a < b && a > 0) then
   {
   return 1
   }
}

//
]]>



Unfortunately, in cooking up this example I did find a bug in the Amara
1.0b1 release that requires a workaround.  I should be releasing 1.0b2
this weekend, which fixes this bug (among other fixes and
improvements).

"If you're generating output for legacy browsers, you might want to
just use a 'real' HTML serialiser. "

Amara does provide for this, e.g.:

from amara.binderytools import bind_string
doc = bind_string(text)
print doc.xml(method=u"html")

Which automatically and transparently brings to bear the full power of
the XSLT HTML output method.


Re: Newbie Python & XML

2005-06-04 Thread uche . ogbuji

"I have a situation at work.  Will be receiving XML file which contains
quote information for car insurance.  I need to translate this file
into a flat comma delimited file which will be imported into a software
package.  Each XML file I receive will contain information on one quote
only.  I have documentation on layout of flat file and examples of XML
file (lot of fields but only container tags and field tags no
DTD's,look easy enough).  I am just starting to learn python and have
never had to work with XML files before.  Working in MS Windows
environment.  I have Python 2.4 with win32 extensions. "

Sounds like the sort of thing I and others have done very easily with
Amara:

http://www.xml.com/pub/a/2005/01/19/amara.html

Overall, if you're new to Python and XML, here are some resources:

http://www.xml.com/pub/at/24
http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/general-section


Re: Bug in Elementtree/Expat

2005-05-20 Thread uche . ogbuji

"""
> Most examples in the book do not include such a declaration and yet are
> properly rendered by Internet Explorer.
> Is it mandatory and why is it that Expat crashes on it?

It's not mandatory but it's probably good practice to make the document
self-contained. The xlink prefix is defined in the DTD but Expat, as a
nonvalidating parser, won't fetch it.
"""

Important clarification:

The decision whether or not to read the external DTD subset is separate
from the decision whether or not to validate.  Expat does not validate,
but it does read the external subset, if you tell it to.  There are
other uses for reading the external subset, such as entity resolution.
And you can have validation constructs in the internal DTD subset (IOW
right in the XML source file itself), and expat will not do anything
with them because it does not validate.
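
The distinction can be seen with Python's own Expat binding.  What
follows is only a sketch: the document and DTD are made-up in-memory
samples.  The parser is told to read the external subset, so the
entity declared there gets expanded, yet the document's violation of
the declared content model goes unreported because nothing validates.

```python
from xml.parsers import expat

# Hypothetical document and external subset, held in memory for the example.
DOC = b'<!DOCTYPE doc SYSTEM "doc.dtd">\n<doc>&greeting;</doc>'
# The DTD declares <doc> as EMPTY, which the document above violates.
DTD = b'<!ELEMENT doc EMPTY>\n<!ENTITY greeting "hello">'


def parse_with_external_subset(doc_bytes, dtd_bytes):
    """Parse doc_bytes, feeding dtd_bytes as the external DTD subset."""
    requested = []   # system IDs Expat asked us to load
    text = []        # character data reported for the document

    parser = expat.ParserCreate()
    # Ask Expat to read the external subset (it does not do so by default).
    parser.SetParamEntityParsing(
        expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)

    def external_entity_ref(context, base, system_id, public_id):
        requested.append(system_id)
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(dtd_bytes, True)
        return 1  # tell Expat the entity was handled

    parser.ExternalEntityRefHandler = external_entity_ref
    parser.CharacterDataHandler = text.append
    parser.Parse(doc_bytes, True)
    return requested, "".join(text)


requested, text = parse_with_external_subset(DOC, DTD)
print(requested, text)
```

No error is raised for the non-empty doc element: the external subset
was read for its entity declaration, not for validation.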

This may seem a subtle distinction, but it lies behind a lot of user
confusion in practice.  The XML WG really should have simplified such
matters (IIRC SGML compatibility was a big obstruction to doing so).

-- 
Uche
http://uche.ogbuji.net

Re: XML file parsing with SAX

2005-04-23 Thread Uche Ogbuji

On 4/23/05, Willem Ligtenberg <[EMAIL PROTECTED]> wrote:
> so that will be sax.handler.feature_external_ges = "false"

Yes.

> And it will work?

Honestly, I'm not sure.  It should, but I've found these edge cases a
bit hard to predict in the Python built-in libs :-(

> But what about using a catalog? I am very new to python and XML...

Catalogs allow you to rewrite the IDs for entities and such.  So if
you had an XML file with an entity at a URL, but you were working
offline, you could use a catalog to "redirect" the entity to a copy on
your local filesystem.

Problem, now that I think of it, is that I'm not sure you can specify
a catalog in PySAX.  You might instead have to override the
resolveEntity method (from the EntityResolver interface) in your
handler (and be sure to ).  See the example in
listing 1 and the discussion here:

http://www.xml.com/pub/a/2005/03/02/pyxml.html
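
As a rough illustration of the resolver approach (not the listing from
the article above), the sketch below uses the standard xml.sax module
on a current Python.  The URL, the local copy of the DTD and the class
names are all made up for the example; a real resolver would more
likely map system IDs to files on disk.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical remote system ID mapped to a local copy (kept in memory here;
# in practice this would more likely be a path on the local filesystem).
LOCAL_COPIES = {
    "http://example.invalid/dtds/doc.dtd": b"<!ELEMENT doc (#PCDATA)>",
}


class LocalCopyResolver(EntityResolver):
    """Redirect known system IDs to local copies, catalog-style."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        if systemId in LOCAL_COPIES:
            source = InputSource(systemId)
            source.setByteStream(io.BytesIO(LOCAL_COPIES[systemId]))
            return source
        return systemId  # unknown IDs are left for the parser to open


class ElementNames(ContentHandler):
    """Record the name of every start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


DOC = (b'<!DOCTYPE doc SYSTEM "http://example.invalid/dtds/doc.dtd">'
       b"<doc>hello</doc>")

resolver = LocalCopyResolver()
handler = ElementNames()
parser = make_parser()
parser.setFeature(feature_external_ges, True)  # off by default since Python 3.7.1
parser.setEntityResolver(resolver)
parser.setContentHandler(handler)
parser.parse(io.BytesIO(DOC))
print(resolver.requested, handler.names)
```

The resolver has to be registered with setEntityResolver(), otherwise
the parser keeps using its default resolution.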

Good luck.

Re: XML file parsing with SAX

2005-04-23 Thread Uche Ogbuji

On Sat, 2005-04-23 at 15:20 +0200, Willem Ligtenberg wrote:
> I decided to use SAX to parse my xml file.
> But the parser crashes on:
>   File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, 
> in fatalError
> raise exception
> xml.sax._exceptions.SAXParseException: NCBI_Entrezgene.dtd:8:0: error in 
> processing external entity reference
> 
> This is caused by:
>  "NCBI_Entrezgene.dtd">
> 
> If I remove it, it parses normally.
> I've created my parser like this:
> import sys
> from xml.sax import make_parser
> from handler import EntrezGeneHandler
> 
> fopen = open("mouse2.xml", "r")
> ch = EntrezGeneHandler()
> saxparser = make_parser()
> saxparser.setContentHandler(ch)
> saxparser.parse(fopen)
> 
> And the handler is:
> from xml.sax import ContentHandler
> 
> class EntrezGeneHandler(ContentHandler):
>     """
>     A handler to deal with EntrezGene in XML
>     """
>
>     def startElement(self, name, attrs):
>         print "Start element:", name
> 
> So it doesn't do much yet. And still it crashes...
> How can I tell the parser not to look at the DOCTYPE declaration.
> On a website:
> http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/1/
> it states that the SAX parsers are not validating, so this error shouldn't
> even occur?

Just because it's not validating doesn't mean that the parser won't try
to read the external entity.

Maybe you're looking for 

"""
feature_external_ges
Value: "http://xml.org/sax/features/external-general-entities"
true: Include all external general (text) entities. 
false: Do not include external general entities. 
access: (parsing) read-only; (not parsing) read/write
"""

Quote from:

http://docs.python.org/lib/module-xml.sax.handler.html
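
A minimal sketch of how that feature is applied, written for a current
Python (where, as far as I know, it is already off by default since
3.7.1).  The document and the handler class are made up; the point is
that the feature goes through setFeature() on the parser, before
parsing starts.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementNames(ContentHandler):
    """Record the name of every start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# Hypothetical document whose DTD is deliberately not available anywhere.
DOC = b'<!DOCTYPE doc SYSTEM "no-such-file.dtd"><doc><item/></doc>'

handler = ElementNames()
parser = make_parser()
# Features are set through setFeature(), while the parser is not parsing.
parser.setFeature(feature_external_ges, False)
parser.setContentHandler(handler)
parser.parse(io.BytesIO(DOC))
print(handler.names)
```

With the feature switched off, the missing DTD is never opened and the
document parses normally.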

But you're on pretty shaky ground in any XML 1.x toolkit using a bogus
DTDecl in this way.  Why go through the hassle?  Why not use a catalog,
or remove the DTDecl?


Re: python modules in home dir

2005-04-23 Thread Uche Ogbuji

On Sat, 2005-04-16 at 08:12 -0600, Uche Ogbuji wrote:
> On Sat, 2005-04-09 at 14:09 -0700, dzieciou wrote:
> 
> > I'm new-comer in Python.
> > I want to install few Python modules (4Suite, RDFLib, Twisted and Racoon)
> > in my home directory, since Python installation is already installed in the
> > system
> > and I'm NOT its admin.
> > I cannot install pyvm (portable binary python machine) - have no such big
> > quota.
> > Any idea how can I solve it?
> 
> To install 4Suite in the home dir, use an incantation such as:
> 
> ./setup.py config --prefix=$HOME/lib
> ./setup.py install
> 
> Note: I expect you also installed Python in your home dir?

BTW, I expanded on this suggestion at:

http://copia.ogbuji.net/blog/2005-04-16/Installing


Re: python modules in home dir

2005-04-16 Thread Uche Ogbuji

On Sat, 2005-04-09 at 14:09 -0700, dzieciou wrote:

> I'm new-comer in Python.
> I want to install few Python modules (4Suite, RDFLib, Twisted and Racoon)
> in my home directory, since Python installation is already installed in the
> system
> and I'm NOT its admin.
> I cannot install pyvm (portable binary python machine) - have no such big
> quota.
> Any idea how can I solve it?

To install 4Suite in the home dir, use an incantation such as:

./setup.py config --prefix=$HOME/lib
./setup.py install

Note: I expect you also installed Python in your home dir?


Re: how to use structured markup tools

2005-03-22 Thread Uche Ogbuji

On Sat, 2005-03-19 at 00:14 -0800, Sean McIlroy wrote: 
> I'm dealing with XML files in which there are lots of tags of the
> following form: <a><b>x</b><c>y</c></a> (all of these letters are being
> used as 'metalinguistic variables') Not all of the tags in the file are
> of that form, but that's the only type of tag I'm interested in. (For
> the insatiably curious, I'm talking about a conversation log from MSN
> Messenger.) What I need to do is to pull out all the x's and y's in a
> form I can use. In other words, from...
> 
> .
> .
> <a><b>x1</b><c>y1</c></a>
> .
> .
> <a><b>x2</b><c>y2</c></a>
> .
> .
> <a><b>x3</b><c>y3</c></a>
> .
> .
> 
> ...I would like to produce, for example,...
> 
> [ (x1,y1), (x2,y2), (x3,y3) ]
> 
> Now, I'm aware that there are extensive libraries for dealing with
> marked-up text, but here's the thing: I think I have a reasonable
> understanding of python, but I use it in a lisplike way, and in
> particular I only know the rudiments of how classes work. So here's
> what I'm asking for:
> 
> Can anybody give me a rough idea how to come to grips with the problem
> described above? Or even (dare to dream) example code? Any help will be
> very much appreciated.

There are many tools you can use to get this done in Python.  Here's a
recipe using Amara ( http://www.xml.com/pub/a/2005/01/19/amara.html )

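# The wrapper element name below is a placeholder; only the a/b/c rows matter.
DOC = """\
<doc>
<a><b>x1</b><c>y1</c></a>
<a><b>x2</b><c>y2</c></a>
<a><b>x3</b><c>y3</c></a>
</doc>
"""

from amara import binderytools

matrix = []
for row in binderytools.pushbind(u'a', string=DOC):
    matrix.append((unicode(row.b), unicode(row.c)))

print matrix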

Which outputs:

[(u'x1', u'y1'), (u'x2', u'y2'), (u'x3', u'y3')]

If your matrix actually has a variable or previously unknown number of
columns (e.g. x1, y1 and z1 in each row), the following
version of the for loop is a more general solution:

for row in binderytools.pushbind(u'a', string=DOC):
    matrix.append(tuple([ unicode(e) for e in row.xml_xpath(u'*') ]))

Same output, of course.  I even tested it for you in Amara 0.9.4.  And
what the heck, while I was there, I added it to the demos.

You can make things even more obfuscated^H^H^H^H^H^H^H^H^H^Hterse using
further lambda or list comp tricks, but I leave that as an exercise for
the perverse ;-)
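
For anyone without Amara to hand, roughly the same extraction can be
sketched with the standard library's ElementTree on a current Python.
This is an alternative technique, not the Amara recipe; the wrapper
element name is an assumption, and the rows use the a/b/c element
names from the recipe above.

```python
import xml.etree.ElementTree as ET

# The wrapper element name ("doc") is an assumption; the rows use the
# a/b/c element names from the Amara recipe.
DOC = """\
<doc>
<a><b>x1</b><c>y1</c></a>
<a><b>x2</b><c>y2</c></a>
<a><b>x3</b><c>y3</c></a>
</doc>
"""

root = ET.fromstring(DOC)
# One tuple per <a> row, one item per child element, whatever the column count.
matrix = [tuple(cell.text for cell in row) for row in root.iter("a")]
print(matrix)
```

Unlike pushbind, this loads the whole document into memory first, which
should be fine for a conversation log of modest size.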


Re: Simple XML-to-Python conversion

2005-03-22 Thread Uche Ogbuji

On Fri, 2005-03-18 at 11:04 -0800, [EMAIL PROTECTED] wrote:
> Since I've exhausted every option except for Amara, I've decided to
> give it a try.  However, this will only work if I can compile Amara and
> 4suite along with my application.  I doubt 4suite will be able to be
> compiled, but I'll try it anyway.

Actually, as I mentioned in my last message, we do have some success
reports re: 4Suite + py2exe.  See the March archives of the 4Suite list.
I think it took some work from those of the 4Suite developers who are
Windows-savvy, but it did the job.


Re: Simple XML-to-Python conversion

2005-03-22 Thread Uche Ogbuji

On Sat, 2005-03-19 at 15:38 -0800, [EMAIL PROTECTED] wrote:
> Thanks Lutz!
> 
> I should have looked into Amara's binderytools module earlier.  This is
> just the type of tool I was looking for.  When I tried testing its
> compatibility with py2exe, I was _almost_ able to compile...  Does
> anyone know where the following libraries exist?  I thought Amara would
> have these included, but it looks like I need to install another
> module.

We're currently on the 4Suite mailing list chasing down all the magic
required for py2exe.  I'm largely a Windows illiterate, but this looks
like what I remember:

http://lists.fourthought.com/pipermail/4suite/2005-March/013450.html

I do want to be sure Amara can be packaged with py2exe, so please let me
know if this helps.  You might want to consider continuing the
discussion on the 4Suite list (which I use for Amara discussion as
well).  I follow that list far more diligently than c.l.py.


Re: SAX parsing problem

2005-03-22 Thread Uche Ogbuji

On Wed, 2005-03-16 at 00:14 -0800, gh wrote:
> The characters handler routine is fired 3 times for a
> single text block.  Why does it do this?  Is there a way to prevent
> doing this? 

Continuing in the vein of closing matters cross-posted to XML-SIG:

http://mail.python.org/pipermail/xml-sig/2005-March/011013.html
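
The linked reply has the details.  As general SAX background, a parser
is allowed to deliver one run of text in several characters() calls, so
the usual approach is to accumulate the chunks and join them when the
element ends.  A minimal sketch for a current Python follows; the class
name is made up, and it only handles elements that contain text alone.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Join the chunks passed to characters() into one string per element."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.texts = []   # (element name, complete text) pairs

    def startElement(self, name, attrs):
        self._chunks = []

    def characters(self, content):
        # May be called several times for what looks like one text block.
        self._chunks.append(content)

    def endElement(self, name):
        self.texts.append((name, "".join(self._chunks)))
        self._chunks = []


handler = TextCollector()
xml.sax.parseString(b"<msg>fish &amp; chips</msg>", handler)
print(handler.texts)
```

However many pieces the parser chooses to report, the joined text comes
out the same.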

Re: SAX: Help on processing qualified attribute values

2005-03-22 Thread Uche Ogbuji

On Thu, 2005-03-10 at 15:22 +0100, Markus Doering wrote:
> Hey,
> 
> I am trying to process XML schema documents using namespace aware SAX 
> handlers. Currently I am using the default python 2.3 parser:
> 
> parser = xml.sax.make_parser()
> parser.setFeature(xml.sax.handler.feature_namespaces, 1)
> 
> 
> At some point I need to parse xml attributes which contain namespace 
> prefixes as their value. For example:
> 
> 
> 
> The default SAX parser does a good job on dealing with qualified names 
> as xml tags, but is there a way I can access the internal sax mapping 
> between prefixes and full namespaces to be able to parse "qualified 
> attribute values"? A simple private dictionary prefix2namespace would be 
> sufficient.

Just for others, this was answered here:

http://mail.python.org/pipermail/xml-sig/2005-March/010989.html

I also provide a useful mix-in class for this purpose in Amara's
saxtools:

http://www.xml.com/pub/a/2005/01/19/amara.html
http://cvs.4suite.org/viewcvs/Amara/lib/saxtools.py?rev=1.9&view=markup

In the latter link, see class namespace_mixin, which you should be
able to copy to your code if you don't want to install Amara.
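
To give an idea of the technique (this is not Amara's namespace_mixin,
just a sketch with the standard xml.sax module on a current Python): a
handler can keep its own prefix-to-namespace table from the
startPrefixMapping and endPrefixMapping events and use it to resolve
prefixed attribute values.  The class name and the choice of the
"type" attribute are assumptions for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class QNameValueHandler(ContentHandler):
    """Resolve prefixed attribute values such as type="xs:string"."""

    def __init__(self):
        super().__init__()
        self._prefixes = {}   # prefix -> stack of namespace URIs in scope
        self.resolved = []    # (namespace URI, local name) pairs

    def startPrefixMapping(self, prefix, uri):
        self._prefixes.setdefault(prefix, []).append(uri)

    def endPrefixMapping(self, prefix):
        self._prefixes[prefix].pop()

    def resolve(self, qname):
        prefix, sep, local = qname.partition(":")
        if not sep:
            # No colon: treat the value as using the default namespace, if any.
            prefix, local = None, qname
        stack = self._prefixes.get(prefix)
        return (stack[-1] if stack else None, local)

    def startElementNS(self, name, qname, attrs):
        # "type" is just the attribute used in this example.
        value = attrs.get((None, "type"))
        if value is not None:
            self.resolved.append(self.resolve(value))


DOC = (b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
       b'<xs:element name="title" type="xs:string"/>'
       b"</xs:schema>")

handler = QNameValueHandler()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
parser.setContentHandler(handler)
parser.parse(io.BytesIO(DOC))
print(handler.resolved)
```

Keeping a stack per prefix means a prefix that is redeclared on an
inner element goes back to its outer meaning once that element ends.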


Re: best XSLT processor?

2005-03-04 Thread uche . ogbuji

Steve Holden:

"I don't know what news reader you are using, but I wonder if I could
ask you to retain just a little more context in your posts. If they were
personal emails I would probably be able to follow the thread, but in a
newsgroup it's always helpful when I see a comment such as your above
if I know what the heck you are talking about ;-)."

I'm using Google Groups.  I'd assumed it maintains quoting, but I guess
not.  Looks as if I'll have to ditch it, which makes things awkward
because I don't have time to follow this NG in its entirety: I prefer
to just search weekly for "Python XML".

--Uche

Re: get textual content of a Xml element using 4DOM

2005-03-04 Thread uche . ogbuji

I suggest using minidom or pxdom [1] rather than 4DOM.  If you insist
on using 4DOM, xml.dom.ext.Print(node) or xml.dom.ext.PrettyPrint(node)
does what you want.

[1] http://www.doxdesk.com/software/py/pxdom.html
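
With minidom on a current Python, the two readings of "textual
content" can be sketched as below: the concatenated text of an element,
and its serialized markup (toxml() being the rough minidom counterpart
of the Print call above).  The helper name and the sample document are
made up.

```python
from xml.dom import minidom


def text_content(node):
    """Concatenate the text of all descendant text and CDATA nodes."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_content(child) for child in node.childNodes)


doc = minidom.parseString("<p>Hello <b>XML</b> world</p>")
content = text_content(doc.documentElement)   # text only, markup dropped
markup = doc.documentElement.toxml()          # the element serialized as XML
print(content)
print(markup)
```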

--Uche

Re: best XSLT processor?

2005-02-28 Thread uche . ogbuji

Actually, most of the compliance problems I can remember offhand with
respect to Xalan have been regarding EXSLT 1.0, not base XSLT 1.0.
Sorry for any misconstruction.

--Uche

Re: best XSLT processor?

2005-02-28 Thread uche . ogbuji

This is a good way to kick off a tussle among interested parties, but
honestly, at this point, most packages work fine.  In my opinion your
trade-off right now is raw speed (e.g. libxslt) versus flexibility (e.g.
4Suite).  All are bug-free enough that you'd have to be doing something
*very* exotic to run into trouble.

Just pick one or two and try them.

http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/python-xslt

--Uche

Re: best XSLT processor?

2005-02-28 Thread uche . ogbuji

Xalan is certainly faster, but it is almost certainly not more
compliant than 4Suite.  Xalan actually has a bit of a reputation among
XSLT processors for its carelessness with compliance.  But I suppose in
order to settle these counter-claims, one of us will have to come up
with specific compliance examples.  You fired the first shot.  Can you
back it up?

--Uche

Re: best XSLT processor?

2005-02-28 Thread uche . ogbuji

Who says 4Suite is buggy?  Do they have any evidence to back that up?
We have a huge test suite, and though 4Suite is by no means the fastest
option, it's quite reliable for XSLT.

The XSLT processor in PyXML is just a very old version of 4XSLT.

--Uche

Re: forms, xslt and python

2005-02-04 Thread Uche Ogbuji

Firstly, that isn't an XML file.  You're missing quotes around
attribute values.

Secondly, your question is very unclear.  Are you looking for an XSLT
way to correlate the correct_answer attribute to the alternative
element in corresponding order?  Are you looking for a Python means to
do this?


Re: parsing WSDL

2005-01-27 Thread Uche Ogbuji

Just for completeness I wanted to mention that yes, you can use 4Suite
to parse WSDL and get method signature information, but I do agree that
it's better to do this at a higher level, if you can.  Why reinvent
that wheel?

SOAPpy has a decent WSDL class.

Re: 4suite XSLT thread safe ?

2005-01-27 Thread Uche Ogbuji

Sorry I'm late to the whole thread.  Diez B. Roggisch is pretty much
right on the money in all his comments.  4XSLT *is* thread safe, but
each individual processor instance is not thread safe.  Yes, this is
typical OO style: you encapsulate state in an instance so that as long
as each thread has its own instance, there are no state clashes.

Therefore, you should be creating at least one processor object per
thread.

Note: the 4Suite server is a multi-threaded architecture that uses
4XSLT heavily using processor-per-thread.
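
The processor-per-thread pattern itself is easy to sketch in plain
Python.  The Processor class below is a hypothetical stand-in, not the
4XSLT class; the point is only that each thread lazily gets, and then
reuses, its own instance.

```python
import threading


class Processor:
    """Hypothetical stand-in for a stateful, non-thread-safe XSLT processor."""

    def __init__(self):
        self.owner = threading.current_thread().name


_local = threading.local()


def get_processor():
    """Return the calling thread's own processor, creating it on first use."""
    if not hasattr(_local, "processor"):
        _local.processor = Processor()
    return _local.processor


def worker(results, index):
    first = get_processor()
    second = get_processor()   # same thread, so the same instance comes back
    results[index] = (first, first is second)


results = [None] * 3
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([proc.owner for proc, _ in results])
```

Because the instance lives in thread-local storage, no locking is
needed around its use.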

Re: Clarification on XML parsing & namespaces (xml.dom.minidom)

2005-01-27 Thread Uche Ogbuji

Greg Wogan-Browne wrote:
> I am having some trouble figuring out what is going on here - is this a
> bug, or correct behaviour? Basically, when I create an XML document with
> a namespace using xml.dom.minidom.parse() or parseString(), the
> namespace exists as an xmlns attribute in the DOM (fair enough, as it's
> in the original source document). However, if I use the DOM
> implementation to create an identical document with a namespace, the
> xmlns attribute is not present.
>
> This mainly affects me when I go to print out the document again using
> Document.toxml(), as the xmlns attribute is not printed for documents I
> create dynamically, and therefore XSLT does not kick in (I'm using an
> external processor).
>
> Any thoughts on this would be appreciated. Should I file a bug on pyxml?

It's odd behavior, but I think it's a stretch to call it a bug.  Your
problem is that you're mixing namespaced documents with the
non-namespace DOM API.  That means trouble and such odd quirks every
time.

Use getAttributeNS, createElementNS, setAttributeNS, etc. rather than
getAttribute, createElement, setAttribute, etc.
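
A small sketch of the namespace-aware calls with minidom on a current
Python.  The namespace URI and element names are made up.  As far as I
know, minidom does not write namespace declarations by itself on
output, even for nodes built with the NS methods, so the sketch also
sets the xmlns attribute explicitly.

```python
from xml.dom import minidom

NS = "http://example.org/ns"                 # hypothetical namespace URI
XMLNS_NS = "http://www.w3.org/2000/xmlns/"   # namespace of xmlns attributes

impl = minidom.getDOMImplementation()
doc = impl.createDocument(NS, "doc", None)
root = doc.documentElement
# Declare the namespace explicitly so that it shows up when serialized.
root.setAttributeNS(XMLNS_NS, "xmlns", NS)

item = doc.createElementNS(NS, "item")
item.setAttributeNS(None, "id", "1")
root.appendChild(item)

xml_text = doc.toxml()
print(xml_text)
```

The nodes carry the right namespaceURI either way; the explicit
declaration only matters once the document is serialized for another
tool, such as an external XSLT processor.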

Re: XPath and XQuery in Python?

2005-01-14 Thread Uche Ogbuji

Interesting discussion.  My own thoughts:

http://www.oreillynet.com/pub/wlg/6224
http://www.oreillynet.com/pub/wlg/6225

Meanwhile, please don't make the mistake of bothering with XQuery.
It's despicable crap.  And a huge impedance mismatch with Python.
--Uche

Re: Any Python XML Data Binding Utilities Avaiable?

2005-01-01 Thread Uche Ogbuji

Sounds like generateDS is closest to what you want:

http://www.rexx.com/~dkuhlman/generateDS.html

If you can bind from instances only and don't need schema, see Amara
Bindery:

http://uche.ogbuji.net/tech/4Suite/amara/

Also consider Gnosis Utilities and ElementTree.

Re: editing XML via DOM

2004-12-24 Thread Uche Ogbuji


jaco wrote:
> Hi,
>
> I'm new to Python and XML but still I want to create something that
> includes creating and editing XML using Python.
>
> Now I'm looking for a little example program that does (some of) this to
> set me on my way.
>
> Is there something like this available or can somebody give me some
> example lines that creates and saves some XML data?

Start with:

http://www.xml.com/pub/a/2002/11/13/py-xml.html

then see:

http://www.xml.com/pub/a/2003/10/15/py-xml.html

Overall, there is a lot on DOM throughout the series:

http://www.xml.com/pub/at/24

