Re: lxml/ElementTree and .tail
Paul McGuire wrote:
> Thankfully, I'm largely on the periphery of that universe (except for being a sometimes victim). But it is certainly frustrating to see many of the OMG concepts of the 90's reimplemented in Java services, and then again in XML/SOAP, with no detectable awareness that these messaging and serialization problems have been considered before, and much more thoroughly.

You'll be surprised at how many XMLers agree that Web services are a pretty inept reinvention of CORBA. I was pretty much slain by this take:

http://wanderingbarque.com/nonintersecting/2006/11/15/the-s-stands-for-simple

I think Duncan Grisby of OmniORB put it most succinctly when he pointed out that SOAP and friends are more complex, more bloated, and less interoperable than CORBA ever was. But they use XML so they get the teacher's pet treatment.

> I liked XML when I could read it and hack it out in Notepad.

You still can, and don't let anyone tell you otherwise. I've always argued that XML doesn't work unless it's Notepad-hackable. I do usually allow an exception for SVG.

> I like attributes, which puts me on the outs with most XML zealots who forswear the use of attributes on purely academic grounds (they defeat the future possible expansion of an attribute's value into more complex substructure).

Really? Do you have any references for this? I haven't seen much criticism of attributes since the very early days, and almost all XML technologies make heavy use of attributes. Here's my take:

http://www.ibm.com/developerworks/xml/library/x-eleatt.html

As you can see, elements and attributes get equal billing.

> I dislike namespaces, especially the default xmlns kind, as they make me take extra steps when retrieving nodes via Xpaths; and everyone seems to think their application needs namespaces, when there is no threat that these tags will ever get mixed up with anyone else's.

Namespaces are possibly the worst thing to have ever happened to XML. Again, my take:

http://www.ibm.com/developerworks/xml/library/x-namcar.html

And yes, default namespaces are about 50% of the problem with namespaces. QNames in content (which are of course an abuse of namespaces) are almost all of the other 50%. I call them "hidden namespaces":

http://copia.ogbuji.net/blog/2006-08-14/Some_thoug

--
Uche Ogbuji                       Fourthought, Inc.
http://uche.ogbuji.net            http://fourthought.com
http://copia.ogbuji.net           http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/
--
http://mail.python.org/mailman/listinfo/python-list
Re: lxml/ElementTree and .tail
Fredrik Lundh wrote:
> Uche Ogbuji wrote:
>
> > I certainly have never liked the aspects of the ElementTree API under present discussion. But that's not as important as the fact that I think the above statement is misleading. There has always been a battle in XML between the people who think the serialization is preeminent, and those who believe some data model is preeminent, but the reality is that XML 1.0 (and 1.1) is a spec *defined* by its serialization.
>
> sure, the computing world is and has always been full of people who want the simplest thing to look a lot harder than it actually is. after all, *they* spent lots of time reading all the specifications, they've bought all the books, and went to all the seminars, so it's simply not fair when others are cheating.

You sound bitter about something. Don't worry, it's really not all that serious.

> in reality, *all* interchange formats are easier to understand and use if you focus on a (complete or intentionally simplified) data model of the things being interchanged, and treat various artifacts of the byte-stream used by the wire format as artifacts, historical accidents based on what specification happened to be written before the other, or what some guy did or did not do in the seventies, as accidents, and esoteric arcana disseminated on limited-distribution mailing lists as about as relevant for your customer as last week's episode of American Idol.

The fact that the XML Infoset is hardly used outside W3C XML Schema, that the XPath data model is far more common, and that focus on the serialization is even more common than that, is a matter of everyday practicality.

And oh by the way, this thread is all about *your* customer's complaining. And your response is to give them your philosophical take on XML. Doesn't that contradict what you're saying above?

Oh never mind. You posted something misleading, and I posted another point of view. I know you're incapable of any disagreement that doesn't devolve into a full-scale flame-war. Sometimes I have time for that sort of thing. This is not one of those times, so this is probably where I get off.
Re: newbie: minidom
Paul Watson wrote:
> Explicit [XML declaration] is better than implicit.

Yes indeed.

"Always use an XML declaration"
http://www-128.ibm.com/developerworks/xml/library/x-tipdecl.html
Re: lxml/ElementTree and .tail
Fredrik Lundh wrote:
> Chas Emerick wrote:
>
> > If I'm wrong, just chalk it up to the fact that this is the first time I've ever looked at the Infoset spec, and I'm simply confused.
>
> the Infoset spec *is* the essence of XML; if you don't realize that an XML document is just a serialization of a very simple data model, you're bound to be fighting with XML all the time.

I certainly have never liked the aspects of the ElementTree API under present discussion. But that's not as important as the fact that I think the above statement is misleading. There has always been a battle in XML between the people who think the serialization is preeminent, and those who believe some data model is preeminent, but the reality is that XML 1.0 (and 1.1) is a spec *defined* by its serialization. Infoset is a secondary and optional spec.

In fact, I think it's clear that Infoset is not even the preeminent *data model* of the XML world. That distinction goes to the XPath data model, which is quite different from the Infoset.
Re: WSGI - How Does It Affect Me?
goon wrote:
> > Trying to research this on the web now
>
> Lots of articles now appearing summarising WSGI ...
>
> For definitive reference:
>
> <http://www.python.org/dev/peps/pep-0333/> [0]
>
> Overview:
>
> <http://www.xml.com/lpt/a/1674> [1] and
> <http://www.xml.com/lpt/a/1675> [2]

And also the following article, by me, focusing on middleware:

http://www.ibm.com/developerworks/library/wa-wsgi/

(cover Weblog entry: http://copia.ogbuji.net/blog/2006-08-23/_Mix_and_m )
Re: Trying to find a elements Xpath and store it as a attribute
provowallis wrote:
> Hi all,
>
> I've been struggling with this for a while so I'm hoping that someone could point me in the right direction. Here's my problem: I'm trying to get the XPath for a given node in my document and then store that XPath as an attribute of the element itself. If anyone has a recommendation I'd be happy to hear it.

Sorry. I only check c.l.py once a week or so...

> For instance, I would take this XML
>
> ###before
>
> An XSLT Programmer
> Hello, World!
>
> ###after
>
> An XSLT Programmer
> Hello, World!
>
> ###
>
> import sets
> import amara
> from amara import binderytools
>
> doc = amara.parse('hello.xml')
> elems = {}
>
> for e in doc.xml_xpath('//*'):
>     paths = elems.setdefault((e.namespaceURI, e.localName), sets.Set())
>     path = u'/'.join([n.nodeName for n in e.xml_xpath(u'ancestor::*')])
>     paths.add(u'/' + path)
>
> for name in elems:
>     doc.name.km = elems[name]

It's a tougher problem than you may think :-) Luckily it's a problem I've worked on. For discussion see:

http://www.xml.com/pub/a/2004/11/24/py-xml.html

For an updated solution see abs_path in Amara domtools. In most cases you can safely call that on an Amara bindery node.
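For readers without Amara at hand, here is a minimal sketch of the same idea using only the standard library's ElementTree. It is not the abs_path implementation from Amara domtools. The attribute name "xpath" and the sample document are assumptions made up for illustration, and the sketch ignores namespaces entirely (namespaced tags would come out in ElementTree's {uri}local form, which is not valid XPath). A positional predicate is added only where siblings share a name, which is part of what makes the general problem trickier than it looks.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def annotate(elem, path, attr="xpath"):
    """Store `path` on elem under `attr`, then recurse into the children."""
    elem.set(attr, path)
    totals = Counter(child.tag for child in elem)   # how many siblings per name
    seen = Counter()
    for child in elem:
        seen[child.tag] += 1
        step = child.tag
        if totals[child.tag] > 1:
            # Disambiguate same-named siblings with a 1-based position.
            step += "[%d]" % seen[child.tag]
        annotate(child, path + "/" + step, attr)

# Hypothetical sample document, not the one from the original post.
SAMPLE = ("<doc><author>An XSLT Programmer</author>"
          "<para>Hello, World!</para><para>Again</para></doc>")
root = ET.fromstring(SAMPLE)
annotate(root, "/" + root.tag)
annotated = ET.tostring(root, encoding="unicode")
```

Each element ends up carrying its own absolute path, so the serialized result can be queried later without recomputing anything.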
Re: File I/O
Ant wrote:
> Kirt wrote:
> ...
> > i dont wanna parse the xml file..
> >
> > Just open the file as:
> >
> > f=open('test.xml','a')
> >
> > and append a line "abc" before tag
>
> The other guys are right - you should look at something like ElementTree which makes this sort of thing pretty easy, and is robust. But if you are sure that:
>
> 1) is going to be on its own line (rather than something like )
> 2) the ending tag will *definitely* have no extraneous whitespace (e.g. < / Top >)
>
> then the following will probably be the simplest solution:
>
> f=open('test.xml')
> out = []
> for line in f:
>     if "" in line:
>         out.append("abc")
>     out.append(line")
>
> f.close()
>
> f_out = open("test.xml", "w")
> f.write("".join(out))
> f_out.close()

And the most dangerous solution. Start with the line

"out.append(line")"

And have a look at the many failure possibilities I detail here:

http://www.xml.com/pub/a/2002/11/13/py-xml.html

Then add to that the fact that "" can legitimately appear in an XML comment, so that logic is even more brittle.

The following code does this *safely* with Amara:

import amara
doc = amara.parse('test.xml')
top = doc.xml_xpath('//Top')[0]
top.xml_parent.xml_insert_before(top, doc.xml_create_element(u'Body2', content=u'abc'))
top.xml(stream=open('test.xml', 'w'))

Amara: http://uche.ogbuji.net/tech/4suite/amara/
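The same parse-then-insert approach can also be sketched with the standard library's ElementTree, for anyone who cannot install Amara. This is only an illustration under assumptions: the element names Top and Body2 are taken from the Amara snippet above, while the Root wrapper and the sample string are invented. Because a real parser does the work, a tag name that merely appears inside a comment is never mistaken for the element.

```python
import xml.etree.ElementTree as ET

def insert_before(root, target_tag, new_elem):
    """Insert new_elem as the sibling just before the first target_tag element.

    Returns True on success, False if no such element has a parent
    (for example when the target is the root itself).
    """
    for parent in root.iter():
        for index, child in enumerate(list(parent)):
            if child.tag == target_tag:
                parent.insert(index, new_elem)
                return True
    return False

# Hypothetical input; the comment shows why line matching is fragile.
SAMPLE = "<Root><!-- <Top> inside a comment is not an element --><Top>data</Top></Root>"
root = ET.fromstring(SAMPLE)   # the default parser drops comments

body2 = ET.Element("Body2")
body2.text = "abc"
inserted = insert_before(root, "Top", body2)
result = ET.tostring(root, encoding="unicode")
```

To write the result back to disk, wrapping the root in ET.ElementTree(root) and calling its write() method is the usual route. Note that this sketch loses the comment, since ElementTree's default parser discards comments.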
Re: PyXML not supported, what to use next?
Paul Watson wrote:
> It would appear that xml.dom.minidom or xml.sax.* might be the best thing to use since PyXML is going without support. Best of all it is included in the base Python distribution, so no addition hunting required.

FWIW, easy_install [1] is making it so that, more and more, installing extra packages is not much of an additional burden. I'll admit that I've hardly found easy_install to be problem-free, but since it seems to be the wave of the future (and a welcome wave at that) I've pushed for support in recent versions of the XML tools I co-develop: 4Suite [2] and Amara [3]. For many people these are now very easy to install. This is the case for some other third-party XML tools as well.

[1] http://peak.telecommunity.com/DevCenter/EasyInstall
[2] http://4suite.org/
[3] http://uche.ogbuji.net/tech/4suite/amara/
Re: XSLT speed comparisons
[EMAIL PROTECTED] wrote:
> For what it's worth I just developed, and switched to WSGI middleware that only does the transform on the server side if the client doesn't understand XSLT. It's called applyxslt and is part of wsgi.xml [1]. That reduces server load, and with caching (via Myghty), there's really no issue for me. For more on WSGI middleware see [2].
>
> [1] http://uche.ogbuji.net/tech/4suite/wsgixml/
> [2] http://www.ibm.com/developerworks/library/wa-wsgi/

I just wanted to clarify that not only does the applyxslt middleware approach reduce server load, but in the case of clients running IE6 or IE7, the XSLT *does* end up being executed in MSXML after all: MSXML on the client's browser, rather than on the server. In the case of Mozilla it's Transformiix, which is between MSXML and 4Suite in performance. I'm not sure what the XSLT processor is in the case of Safari (only the most recent versions of Safari). But regardless, with that coverage you can write apps using XSLT, support the entire spectrum of browsers (and mobile apps, spiders, etc.) and yet rarely ever require XSLT applied on the server side.
Re: XSLT speed comparisons
Ross Ridge wrote:
> Damian wrote:
> It could just be that 4suite is slower than MSXML. If so, you can use MSXML in Python if you want. You'll need to install the Python for Windows extensions. Something like this:
>
> from os import environ
> import win32com.client
>
> def buildPage():

[SNIP]

Added to:

http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/python-xslt
Re: XSLT speed comparisons
Damian wrote:
> Hi, I'm from an ASP.NET background and am considering making the switch to Python. I decided to develop my next project in tandem to test the waters and everything is working well, loving the language, etc.
>
> What I've got is:
> two websites, one in ASP.NET v2 and one in Python 2.5 (using 4suite for XML/XSLT)
> both on the same box (Windows Server 2003)
> both using the same XML, XSLT, CSS
>
> The problem is, the Python version is (at a guess) about three times slower than the ASP one. I'm very new to the language and it's likely

The ASP one being MSXML, right? In that case that result doesn't surprise me.

> that I'm doing something wrong here:

Not wrong, but we can definitely simplify your API use.

> from os import environ
> from Ft.Lib.Uri import OsPathToUri
> from Ft.Xml import InputSource
> from Ft.Xml.Xslt import Processor
>
> def buildPage():
>     try:
>         xsluri = OsPathToUri('xsl/plainpage.xsl')
>         xmluri = OsPathToUri('website.xml')
>
>         xsl = InputSource.DefaultFactory.fromUri(xsluri)
>         xml = InputSource.DefaultFactory.fromUri(xmluri)
>
>         proc = Processor.Processor()
>         proc.appendStylesheet(xsl)
>
>         params = {"url":environ['QUERY_STRING'].split("=")[1]}
>         for i, v in enumerate(environ['QUERY_STRING'].split("/")[1:]):
>             params["selected_section%s" % (i + 1)] = "/" + v
>
>         return proc.run(xml, topLevelParams=params)
>     except:
>         return "Error blah blah"
>
> print "Content-Type: text/html\n\n"
> print buildPage()

This should work:

from os import environ
from Ft.Xml.Xslt import Transform

def buildPage():
    try:
        params = {"url":environ['QUERY_STRING'].split("=")[1]}
        for i, v in enumerate(environ['QUERY_STRING'].split("/")[1:]):
            params["selected_section%s" % (i + 1)] = "/" + v

        return Transform('website.xml', 'xsl/plainpage.xsl', topLevelParams=params)
    except:
        return "Error blah blah"

print "Content-Type: text/html\n\n"
print buildPage()

-- % --

For what it's worth I just developed, and switched to WSGI middleware that only does the transform on the server side if the client doesn't understand XSLT. It's called applyxslt and is part of wsgi.xml [1]. That reduces server load, and with caching (via Myghty), there's really no issue for me. For more on WSGI middleware see [2].

[1] http://uche.ogbuji.net/tech/4suite/wsgixml/
[2] http://www.ibm.com/developerworks/library/wa-wsgi/
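To make the middleware idea above concrete, here is a rough sketch of the general pattern in plain WSGI. It is not the applyxslt code from wsgi.xml, and every name in it is hypothetical: xslt_fallback, the transform callable (XML bytes in, HTML bytes out), and the capable_agents substrings used to guess whether a browser can run XSLT itself. Real User-Agent detection is messier than a substring check.

```python
def xslt_fallback(app, transform, capable_agents):
    """Wrap a WSGI app so XML is transformed on the server only when needed.

    transform:      hypothetical callable, XML bytes -> HTML bytes
    capable_agents: hypothetical User-Agent substrings for XSLT-capable clients
    """
    def middleware(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the response line back until we know which body we send.
            # The legacy write() callable is not supported in this sketch.
            captured["status"] = status
            captured["headers"] = list(headers)

        result = app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        headers = captured["headers"]

        agent = environ.get("HTTP_USER_AGENT", "")
        is_xml = any(name.lower() == "content-type" and "xml" in value
                     for name, value in headers)
        client_can_transform = any(token in agent for token in capable_agents)

        if is_xml and not client_can_transform:
            # Fall back to a server-side transform and relabel the response.
            body = transform(body)
            headers = [(name, value) for name, value in headers
                       if name.lower() not in ("content-type", "content-length")]
            headers.append(("Content-Type", "text/html"))
            headers.append(("Content-Length", str(len(body))))

        start_response(captured["status"], headers)
        return [body]

    return middleware
```

The wrapped application keeps emitting XML (with its stylesheet reference) and never needs to know which clients can handle it. Buffering the whole body is a simplification that suits small pages; a cache in front of the transform would cover the remaining server-side cost.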
Re: XML parsing and writing
c00i90wn wrote:
> Nice package ElementTree is but sadly it doesn't have a pretty print, well, guess I'll have to do it myself, if you have one already can you please give it to me? thanks :)

FWIW Amara and plain old 4Suite both support pretty-print, canonical XML print and more such options.

http://uche.ogbuji.net/tech/4suite/amara/
http://4Suite.org
Re: Need help in xml
Kirt wrote:
> i have two xml documents of type
>
> test
> 2006-12-12
> 12:12:12
>
> /home/
>
> test2
> 12:12:12
>
> /home/test
>
> test3
> 12:12:12
>
> i have to compare 2 similar xml document and get the add, changed and deleted files.and write it into acd.xml file.
> can u help me with the python code for this. I am using SAX.

Use the right tool and such problems tend to become much simpler.

http://www.logilab.org/projects/xmldiff
Re: Amara: Where's my attribute?
AdSR wrote:
> [EMAIL PROTECTED] wrote:
> > What is the actual problem you're trying to solve? If you just want to force a namespace declaration in output (this is usually to support QNames in content) the most well-known XML hack is to create a dummy attribute with the needed prefix and namespace. But this does not work when you're trying to force a default namespace declaration. Then again, you generally can't use QNames in content with a default namespace declaration. So my guess is that you somehow got way off the rails in your problem-solving, and you'll need to provide more background if you want help.
>
> I wanted to remove documentation elements from some XML Schema files. The problem showed when I tried to use the stripped schemas, because the namespace declaration for user-defined types was missing. Of course, since these types are named and referred to in attribute *values*, Amara had no way to know that the namespace declaration was still needed (didn't matter if default or non-default). This is more a problem of how XML Schema is defined against XML namespace rules, since XML Schema uses namespaces in a context of which XML parsers aren't normally aware.

Yeah. Just so you know, this is one of those things about XML that make sane people want to dye their eyeballs red. Unfortunately there isn't much recourse but to switch to namespace-qualified form for your QNames and add dummy attributes so the namespace is recognized. Let me know if you need an example.

> > BTW, I recommend upgrading to Amara 1.1.7. That branch will soon be 1.2, and I consider it more mature than 1.0 at this point. The API's also easier:
>
> I know, especially the insert-before/after feature :) But I ran into a problem that I describe below and you advertised 1.0 as "stable version", so I switched immediately.
>
> The problem can be reproduced like this:
>
> >>> import amara
> >>> amara.parse('http://www.w3.org/2001/XMLSchema.xsd')
> START DTD xs:schema -//W3C//DTD XMLSCHEMA 200102//EN XMLSchema.dtd
> http://www.w3.org/2001/datatypes.dtd:99:23: Attribute 'id' already declared
> http://www.w3.org/2001/datatypes.dtd:122:23: Attribute 'id' already declared
> http://www.w3.org/2001/datatypes.dtd:130:27: Attribute 'id' already declared
> ...some 40 more lines like this and then Python crashes (Windows shows the bug-reporting dialog)

I don't get a crash on my system (Ubuntu), but I do get a legitimate error message because that DTD is broken. The W3C seems to like disseminating broken DTDs. Just yesterday I was helping someone around the infamous broken XHTML 1.1 DTDs.

I do want to know why you're getting a crash rather than just the error message. What version of Python is that? Any chance you can try with current CVS Amara (you can use easy_install)? This part of the discussion should perhaps move to the 4Suite mailing list. I only check this NG once a week.
Re: Amara: Where's my attribute?
AdSR wrote:
> Hi,
>
> I'm having a problem with the Amara toolkit. Try this:
>
> >>> from amara import binderytools
> >>> raw = 'http://example.com/namespace"
> >>> xmlns:pq="http://pq.com/ns2"/>'
> >>> rwd = binderytools.bind_string(raw)
> >>> print rwd.xml()
>
> http://pq.com/ns2"/>
>
> What happened to the xmlns attribute? Does anyone know a solution to this? The only workaround I found is to:
>
> >>> rwd.test.xml_set_attribute(u'xmlns', u'http://example.com/namespace')
> u'xmlns'
> >>> print rwd.xml()
>
> http://pq.com/ns2"
> xmlns="http://example.com/namespace"/>
>
> but it only helps if you know what to patch.
>
> My setup:
>
> Python 2.4.3
> 4Suite 1.0b3
> Amara 1.0
>
> I see that people have reported similar problems with other XML toolkits, so I guess this is a general namespace ugliness.

What is the actual problem you're trying to solve? If you just want to force a namespace declaration in output (this is usually to support QNames in content) the most well-known XML hack is to create a dummy attribute with the needed prefix and namespace. But this does not work when you're trying to force a default namespace declaration. Then again, you generally can't use QNames in content with a default namespace declaration. So my guess is that you somehow got way off the rails in your problem-solving, and you'll need to provide more background if you want help.

BTW, I recommend upgrading to Amara 1.1.7. That branch will soon be 1.2, and I consider it more mature than 1.0 at this point. The API's also easier:

>>> import amara
>>> rwd = amara.parse('http://example.com/namespace"
>>> xmlns:pq="http://pq.com/ns2"/>')
Re: beautifulsoup .vs tidy
bruce wrote:
> hi paddy...
>
> that's exactly what i'm trying to accomplish... i've used tidy, but it seems to still generate warnings...
>
> initFile -> tidy -> cleanFile -> perl app (using xpath/libxml)
>
> the xpath/libxml functions in the perl app complain regarding the file. my thought is that tidy isn't cleaning enough, or that the perl xpath/libxml functions are too strict!
>
> which is why i decided to see if anyone on the python side has experienced/solved this problem..

FWIW here's my usual approach:

http://copia.ogbuji.net/blog/2005-07-22/Beyond_HTM

Personally, I avoid Tidy. I've too often seen it crash or hang on really bad HTML. TagSoup seems to be built like a tank. I've also never seen BeautifulSoup choke, but I don't use it as much as TagSoup.
Re: xpath question
bruce wrote:
> is there anyone with XPath expertise here? i'm trying to figure out if there's a way to use regex expressions with an xpath query? i've seen references to the ability to use regex and xpath/xml, but i'm not sure how to do it...
>
> i have a situation where i have something like:
> /html/table//[EMAIL PROTECTED]'foo']
>
> is it possible to do something like [EMAIL PROTECTED]/fo/] so i'd match the class attribute with fo
>
> i'm trying to parse HTML/Web docs...

4Suite [1] supports regex in XPath using the EXSLT community standard's regex module [2]. It would be something like:

[re:match(@class, 'fo.*')]

With the re prefix bound to the namespace required by the EXSLT module.

[1] http://4Suite.org
[2] http://www.exslt.org/regexp/
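Where no EXSLT-capable XPath engine is available, a different and plainer technique gets a similar result: let a simple path expression select the candidates, then filter them with Python's own re module. The sketch below uses only the standard library, so it is a substitute for the EXSLT predicate above, not an implementation of it. The td element name and the sample markup are assumptions, since the element name in the original query did not survive the archive.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed markup (real-world HTML needs cleaning first).
DOC = """<html><table>
<tr><td class="foo">a</td><td class="bar">b</td></tr>
<tr><td class="fog">c</td><td>d</td></tr>
</table></html>"""

root = ET.fromstring(DOC)
pattern = re.compile(r"fo")

# Step 1: the path picks every descendant of <table> that has a class attribute.
candidates = root.iterfind("./table//*[@class]")

# Step 2: the regex decides which of those to keep (match() anchors at the start).
matches = [el for el in candidates if pattern.match(el.get("class"))]
texts = [el.text for el in matches]
```

Splitting the work this way keeps the path expression within the small XPath subset that ElementTree supports, at the cost of doing the final filtering outside the query.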
Re: 10GB XML Blows out Memory, Suggestions?
K.S.Sreeram wrote:
> Fredrik Lundh wrote:
> > both ElementTree and cElementTree support "sax-style" event generation (through XMLTreeBuilder/XMLParser) and incremental parsing (through iterparse). the cElementTree versions of these are even faster than pyexpat.
> >
> > the iterparse interface is described here:
> >
> > http://effbot.org/zone/element-iterparse.htm
>
> That's cool! Thanks for the info!
>
> For a multi-gigabyte file, I would still recommend C/C++, because the processing code which sits on top of the XML library needs to be Python, and that could turn out to be a significant overhead in such extreme cases.
>
> Of course, the exact strategy to follow would depend on the specifics of the case, and all this speculation may not really apply! :)

Honestly, I think that legitimate use-cases for multi-gigabyte XML are very rare. Many people abuse XML as some sort of DBMS replacement. This abuse is part of the reason why so many developers are hostile to XML. XML is best for documents, and documents can get to the multi-gigabyte range, but rarely do. Usually, when they do, there is a logical way to decompose them, process them, and re-compose them, whereas with XML used as a DBMS replacement, relations and datatyping complicate such natural divide-and-conquer techniques. I always say that if you're dealing with gigabyte XML, it's well worth considering whether you're not using a hammer to screw in a bolt.

If monster XML is inevitable, then I'll extend Fredrik's earlier mention of Amara to say that Pushdom allows you to pre-declare the chunks of XML you're interested in, and then it processes the XML in streaming mode, only instantiating the chunks of interest one at a time. This allows for handling of huge files with a very simple programming idiom.

http://uche.ogbuji.net/tech/4suite/amara/
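The chunk-at-a-time idiom discussed in this thread can be sketched with the standard library's iterparse, the interface Fredrik mentions. This is not Amara's Pushdom; it only illustrates the same streaming principle. The helper name iter_records, the element name entry, and the in-memory sample are assumptions, and a real multi-gigabyte input would be a file path or file object rather than a BytesIO.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, tag):
    """Yield each completed element named `tag`, then release what was parsed."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)            # the first event is the root's start tag
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem                 # the caller works on one chunk at a time
            root.clear()               # drop finished children so memory stays flat

# Hypothetical small input standing in for a huge file.
SAMPLE = b"<log><entry id='1'>a</entry><entry id='2'>b</entry><other/></log>"
ids = [entry.get("id") for entry in iter_records(io.BytesIO(SAMPLE), "entry")]
```

Clearing the root after each chunk is what keeps memory use roughly constant. It also means each element must be fully processed before the next one is requested.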
Re: how to print newline in xml?
[EMAIL PROTECTED] wrote:
> I use Python/XML packages are xml.dom.minidom and xml.dom.ext (second just for PrettyPrint)

You don't need xml.dom.ext for prettyprint. You can use doc.toprettyxml()

I gather you want to tweak the prettyprinter to not add the newline before the comment. The only way to do this is to write your own printing logic, which is really not that hard, if you just start by copying the code from .writexml (used by .toprettyxml).

But there's an even easier (if slower) way: pretty print the document, then parse it in again, remove the text node between the element in question and the following comment, and then use .writexml() to serialize it again.

A few general notes:

* You cannot set the order of attributes in most XML tools, whether Python or not. This is unfortunate for people who would like to preserve such details for usability reasons, but that's just the way XML is. The closest you can get is by using canonicalization [1], which is available in PyXML as xml.dom.ext.c14n. It just so happens that canonical XML leaves the attributes in the order you want. You won't always be so lucky.

* You can always create text nodes by using doc.createTextNode.

* You can always remove text nodes (or any other kind) by using .removeChild

* It's much easier to navigate if you use XPath. PyXML has an xml.xpath module you can use.

Good luck.

[1] http://www-128.ibm.com/developerworks/xml/library/x-c14n/
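The "easier (if slower) way" described above might look roughly like this with minidom alone. The sample document is invented, since the original poster's XML is not shown, and the exact whitespace produced by toprettyxml differs between Python versions, so treat this as a sketch of the steps and not a guaranteed layout.

```python
from xml.dom import minidom

# Hypothetical document: an element followed directly by a comment.
doc = minidom.parseString("<root><item>text</item><!-- note --></root>")

# Step 1: pretty print, which puts the comment on its own line.
pretty = doc.toprettyxml(indent="  ")

# Step 2: parse the pretty-printed text back in.
reparsed = minidom.parseString(pretty)
top = reparsed.documentElement

# Step 3: remove whitespace-only text nodes sitting between an element
# and the comment that follows it.
for node in list(top.childNodes):
    before, after = node.previousSibling, node.nextSibling
    if (node.nodeType == node.TEXT_NODE and not node.data.strip()
            and before is not None and before.nodeType == before.ELEMENT_NODE
            and after is not None and after.nodeType == after.COMMENT_NODE):
        top.removeChild(node)

# Step 4: serialize again; toxml() is built on the same writexml() machinery.
output = top.toxml()
```

After this the comment sits directly after the closing tag while the rest of the indentation is left as the pretty printer produced it.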
Re: Announcing atomfeed.py, xmlelements.py, and feedutils.py
Steve R. Hastings wrote:
> I have written some Python library modules to help with creating Atom syndication feeds. Originally, I had a single module called "PyAtom"; now I have split it up into three modules: xmlelements.py, atomfeed.py, and feedutils.py.

FWIW, see also Sylvain Hellegouarch's atomixlib [1]. It's used in production to generate and manage PlanetAtom [2][3].

[1] http://trac.defuze.org/browser/oss/atomixlib
[2] http://planetatom.net/
[3] http://copia.ogbuji.net/blog/2006-01-25/Planet_Ato
Re: How to search XML? Are there special libs?
Ravi Teja wrote:
> Yes! XPath is a good bet.
> You can also try some Pythonic XML libraries like Amara. You need not learn any special language even.
>
> There are good database approaches to XML too, especially if you are going to query a document collection as a whole rather than file by file. You can try XQuery. I think 4Suite can do this (But I am too sleepy to confirm :-) ). You also use eXist (Java but you can use XMLRPC or SOAP to interface with it from Python). Not optimal like parent said, but if it is XML that have to live with ...

4Suite does not support XQuery. It does support full XPath plus EXSLT and enough other extensions to come close to the power of XQuery. Amara [1] makes it really easy to get XQuery-like power from right within Python, as I've blogged before (e.g. [2][3]).

I don't know whether full-text indexing of XML is something the OP needs as well. If so, see [4].

[1] http://uche.ogbuji.net/tech/4Suite/amara/
[2] http://copia.ogbuji.net/blog/2005-06-12/Amara_equi
[3] http://copia.ogbuji.net/blog/2005/Sep/20
[4] http://www.xml.com/pub/a/2004/12/08/py-xml.html
Re: Python version of XMLUnit?
Kent Johnson wrote:
> I have found XMLUnit to be very helpful for testing Java and Jython code that generates XML. At its heart XMLUnit is an XML-aware diff - it parses expected and actual XML and pinpoints any differences. It is smart enough to ignore things like attribute order, different quoting and escaping styles, and insignificant whitespace.
>
> Now I am working on a CPython project and have a similar need. Is there any comparable tool for Python? Basically I'm looking for a tool to compare XML and show diffs in an intelligible fashion that is usable from Python unit tests (using py.test, if it matters).

One possible approach is to use c14n to in effect normalize the XML so that you can use regular text compare. This is not as sophisticated as a full XML diff, but it's definitely a viable approach for testing. For those who might be interested in that approach, learn more about c14n here:

http://www.ibm.com/developerworks/xml/library/x-c14n/

It includes a brief example using the c14n module in PyXML

http://pyxml.sourceforge.net/

I also recently checked in c14n capability for 4Suite. It offers the same level of coverage as PyXML's, but operates in streaming, rather than DOM mode.

http://4suite.org/

4Suite also contains in its test suite routines (TreeCompare) for comparing XML and HTML while ignoring non-significant syntactic variations.

Certainly full xmldiff is very useful. One nice thing about Logilab's app is that it can output XUpdate, which could be used with, say, 4Suite's 4XUpdate to apply a patch to another document.
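As a present-day illustration of the normalize-then-compare approach, the standard library has offered xml.etree.ElementTree.canonicalize since Python 3.8. This is neither the PyXML nor the 4Suite c14n code discussed above. The helper name same_xml and the sample strings are made up, and the strip_text option is an ElementTree convenience that goes beyond what canonicalization itself defines, used here on the assumption that surrounding whitespace is insignificant for the test.

```python
import xml.etree.ElementTree as ET

def same_xml(expected, actual):
    """Compare two XML strings after canonicalizing both (Python 3.8+)."""
    canon_expected = ET.canonicalize(expected, strip_text=True)
    canon_actual = ET.canonicalize(actual, strip_text=True)
    return canon_expected == canon_actual

# Hypothetical test inputs: different attribute order, quoting, escaping
# style, and indentation, but the same content.
EXPECTED = '<doc b="2" a="1"><p>hi &amp; bye</p></doc>'
ACTUAL = "<doc a='1'   b='2'>\n  <p>hi &#38; bye</p>\n</doc>"
DIFFERENT = '<doc a="1" b="3"><p>hi &amp; bye</p></doc>'

equal = same_xml(EXPECTED, ACTUAL)
unequal = same_xml(EXPECTED, DIFFERENT)
```

In a unit test this reduces to a single assert on same_xml(...). When it fails, diffing the two canonical strings with difflib gives a readable, if not XML-aware, report.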
Re: XSLT and gettext?
KW wrote:
> I'm looking for a nice way to do i18n with XSLT, preferably using the
> gettext framework. Currently I'm using 4Suite for XSLT processing. Do
> you know of any solutions to this problem?
>
> If no solutions currently exist, I'll try to write something myself. Any
> ideas on how to do this properly? Any existing python code to start with?
>
> I was thinking about wrappingg the text in a new XML tag, say and
> processing this to generate an XSL for alle languages, but it will also
> require printf like substitution to do this properly.

4Suite has some friendly gettext-based i18n extensions. See:

http://copia.ogbuji.net/blog/2005-06-14/i18n_for_X
Re: only a simple xml reader value
[EMAIL PROTECTED] wrote:
> The only thing I must read is the response I get from a EPP server.
> A response like this:
>
>
> http://www.eurid.eu/xml/epp/epp-1.0"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns:contact="http://www.eurid.eu/xml/epp/contact-1.0"
> xmlns:domain="http://www.eurid.eu/xml/epp/domain-1.0"
> xmlns:eurid="http://www.eurid.eu/xml/epp/eurid-1.0"
> xmlns:nsgroup="http://www.eurid.eu/xml/epp/nsgroup-1.0"
> xsi:schemaLocation="http://www.eurid.eu/xml/epp/epp-1.0 epp-1.0.xsd
> http://www.eurid.eu/xml/epp/contact-1.0 contact-1.0.xsd
> http://www.eurid.eu/xml/epp/domain-1.0 domain-1.0.xsd
> http://www.eurid.eu/xml/epp/eurid-1.0 eurid-1.0.xsd
> http://www.eurid.eu/xml/epp/nsgroup-1.0 nsgroup-1.0.xsd">
>
>
> Command completed successfully; ending session
>
>
>
> c-and-a.eu
> c-and-a_1
> 25651602
> 2005-11-08T14:51:08.929Z
>
>
>
>

So to get the msg, you can do:

    print doc.getElementsByTagName('msg')[0].toxml()

But to get the domain:name you have to use the declared namespace:

    print doc.getElementsByTagNameNS('http://www.eurid.eu/xml/epp/domain-1.0', 'name')[0].toxml()

Or you can make life a bit easier with Amara [1]:

    import amara
    doc = amara.parse(theXML)
    print doc.response.result.msg  #to get the text content
    print doc.response.result.msg.xml()  #to get the XML source for that element
    print doc.response.resData.appData.name
    print doc.response.resData.appData.name.xml()

[1] http://uche.ogbuji.net/tech/4Suite/amara/
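The two minidom lookups above can be tried on a cut-down document. This is only a sketch: the XML below is a hypothetical stand-in, not the real EPP response (whose markup did not survive in the quoted message), and text_of is a helper invented for the example. Only the domain namespace URI is taken from the post.

```python
from xml.dom import minidom

# Namespace URI as declared in the quoted response.
DOMAIN_NS = "http://www.eurid.eu/xml/epp/domain-1.0"

# Hypothetical, heavily simplified stand-in for the server response.
XML = (
    '<response xmlns:domain="%s">'
    "<result><msg>Command completed successfully</msg></result>"
    "<domain:name>c-and-a.eu</domain:name>"
    "</response>" % DOMAIN_NS
)


def text_of(node):
    """Concatenate the direct text children of a DOM node."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


doc = minidom.parseString(XML)

# Element with no namespace prefix: a plain tag-name lookup is enough.
msg = doc.getElementsByTagName("msg")[0]

# Prefixed element: look it up by namespace URI plus local name, so the
# code keeps working whatever prefix the server happens to choose.
name = doc.getElementsByTagNameNS(DOMAIN_NS, "name")[0]

print(text_of(msg))
print(text_of(name))
```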
Re: only a simple xml reader value
[EMAIL PROTECTED] wrote:
> H!,
>
> Is it possible to get a value value ?
>
> When I do this:
> -
> theXML = """
> The Fascist Menace
> """
> import xml.dom.minidom as dom
> doc = dom.parseString(theXML)
> print doc.getElementsByTagName('title')[0].toxml()
>
> I get : The Fascist Menace thats oke for me
> -
>
> But the xmlfile I must read have other tags:
> theXML = """
> The Fascist Menace
> bla la etc
> """
>
> how to get that values ?
> I try things like:
> print doc.getElementsByTagName('title:id')[0].toxml() <--error

Addressing your general question, unfortunately you're a bit stuck. Minidom is rather confused about whether or not it's a namespace-aware library.

Addressing your specific example, I strongly advise you not to use documents that are not well-formed according to Namespaces 1.0. Your second example is a well-formed XML 1.0 external parsed entity, but not a well-formed XML 1.0 document entity, because it has multiple elements at document level. It's also not well-formed according to XMLNS 1.0 unless you declare the "title" prefix. You will not be able to use a non-XMLNS 1.0 document with most XML technologies, including XSLT, WXS, RELAX NG, etc.

If you have indeed declared a namespace and are just giving us a very bad example, use:

    print doc.getElementsByTagNameNS(title_namespace, 'id')
Re: Fromatting an xml file
sir_alex wrote:
> Hi! I have a little problem writing xml files formatted in a way like
> the following:
>
>
> bla
> bla
>
>
> Every new node element should have a tabulation before it, but when I
> use xml.dom.minidom I use writexml, which considers as a new node also
> the text (in my little example, "bla" phrases), so the best result I
> achieved has been the following
>
>
>
> bla
>
>
>
> but I don't want the text to be written on newlines... is there a good
> solution? Thanks!

That minidom behavior is fairly unsafe. 4Suite's PrettyPrinter is much safer:

>>> from Ft.Xml import Parse
>>> from Ft.Xml.Domlette import PrettyPrint
>>> XML = "blabla"
>>> doc = Parse(XML)
>>> PrettyPrint(doc)
bla bla
>>>

http://4Suite.org
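As a point of comparison only, here is a sketch of the same goal (indent element nodes, keep text on the same line as its tags) using the standard library's xml.etree.ElementTree.indent, which exists from Python 3.9. This is not the 4Suite PrettyPrint shown above, and the sample markup is an assumption, since the tags in the original post were lost.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the poster's document: a parent with two text-bearing children.
root = ET.fromstring("<a><b>bla</b><b>bla</b></a>")

# indent() only inserts whitespace between elements; it leaves existing
# text content such as "bla" untouched, so it stays inline.
ET.indent(root, space="\t")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because the text nodes are left alone, this avoids the problem described in the question, where the serializer pushes "bla" onto its own line and so changes the document's character data.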
Re: XML Writer in wxPython
Tim Roberts wrote re using print to generate XML:
> def PrintAddress( last, first, address, city, state, zip ):
>     print " "
>     print "%s" % last
>     print "%s" % first
>     print "%s" % address
>     print "%s" % city
>     print "%s" % state
>     print "%s" % zip
>     print " "
>
> print ""
> for row in addressDatabase:
>     PrintAddress( row.last, row.first,
>         row.address, row.city, row.state, row.zip )
> print ""

Just be sure you're well aware of all the issues:

http://www.xml.com/pub/a/2002/11/13/py-xml.html

See also:

http://www.ibm.com/developerworks/xml/library/x-think35.html
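One of the issues with print-style generation is escaping of markup characters in the data. The following is a small sketch of that single point, using xml.sax.saxutils from the standard library; the element and attribute names are hypothetical, since the tags in the quoted code were lost, and the other issues discussed in the linked articles (such as encodings) are not covered here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def address_xml(last, first, city, kind="home"):
    """Build one address record as text, escaping every data value.

    Element and attribute names here are made up for the example.
    """
    return (
        "<address kind=%s>\n"
        "  <last>%s</last>\n"
        "  <first>%s</first>\n"
        "  <city>%s</city>\n"
        "</address>"
    ) % (quoteattr(kind), escape(last), escape(first), escape(city))


# Data containing characters that would break naive "%s" interpolation.
record = address_xml("O'Brien & Sons", "A < B", "Boulder")
print(record)

# Round-trip through a parser as a basic well-formedness check.
parsed = ET.fromstring(record)
print(parsed.find("last").text)
```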
Re: Large XML Document Processing
Albert Leibbrandt wrote:
> Hi
>
> Just want to check which xml parser you guys have found to be the
> quickest. I have xml documents with 250 000 records or more and the
> processing of these documents are taking way to long. The validation is
> the main problem. Any module names, non validating would be find to,
> would help a lot.

It would help us help you if you posted samples of the target docs. XML processing strategy often depends on the structure of the XML, just as relational query optimization strategy often depends on the schema.

In general SAX or iterative tree-callback methods will give you the best speed. Fredrik already mentioned ElementTree's IterParse. Amara's pushbind and pushdom and 4Suite's Saxlette (which has some neat callback features) are other options.

http://uche.ogbuji.net/tech/4suite/amara/
http://4suite.org/docs/CoreManual.xml#saxlette
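A minimal sketch of the iterative approach with ElementTree's iterparse, as shipped in today's standard library. The element name "record" and the tiny in-memory sample are assumptions; the real document structure was never posted.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Stream through an XML source, handling one record at a time."""
    count = 0
    # "end" events fire once an element and all its children are complete.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1      # real code would process the record here
            elem.clear()    # drop the record's children to keep memory low
    return count


# Small stand-in for a file with hundreds of thousands of records.
sample = io.BytesIO(
    b"<records>" + b"<record><id>1</id></record>" * 3 + b"</records>"
)
total = count_records(sample)
print(total)
```

The point of the design is that only the current record needs to be in memory in full, so processing time and memory grow with record size rather than with document size.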
Re: XML SAX parser bug?
[EMAIL PROTECTED] wrote:
> Fredrik Lundh schreef:
> > [EMAIL PROTECTED] wrote:
> > > I think I ran into a bug in the XML SAX parser.
> > >
> > > part of my program consist of reading a rather large XML file (about
> > > 10Mb) containing a few thousand elements.
> > > I have the following problem. Sometimes that SAX parses misreads a
> > > line.
> >
> > it's not a bug; the parser is free to split up character runs (due to
> > buffering, entities or character references, etc). it's up to you to
> > merge character runs into strings.
>
> but how do I detect that the parser has split up the characters? I gues
> I need to detect it in order to reconstruct the complete string

Here's a recipe:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/265881

Using this filter you can then write SAX code that assumes normalized text events. Also, 4Suite's SAX implementation, Saxlette, automatically does this text event merging for you at C speed:

http://4suite.org/docs/CoreManual.xml#saxlette
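The merging can also be done directly in a handler, without a filter. This is a sketch with the standard xml.sax module, not the linked recipe; the handler class and the element name "title" are made up for the example. You never need to detect a split: just buffer every characters() call and join at the end tag.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collect the full text of each <title>, however the parser chunks it."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


handler = TitleHandler()
# The entity reference is a typical place where parsers split character data.
xml.sax.parseString(b"<doc><title>Tom &amp; Jerry</title></doc>", handler)
print(handler.titles)
```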
Re: Rss/xml namespaces sgmllib, sax, minidom
Sakcee wrote:
> I want to build a simple validator for rss2 feeds, that checks basic
> structure and reports channels , items , and their attributes etc.
>
> I have been reading Mark Pilgrims articles on xml.com, diveintopython
> and someother stuff on sgmllib, sax.handlers and content handlers,
> xml.dom.minidom
>
> why is all of this necessary, what is the difference between all these
> libraries, it seems to me that I can parse the rss2 feed with any of
> these libraries.!! ?
>
> what is the difference between namespaces and non-namspaces functions
> in sax.handlers.contenthandler , is the namespace defined like domain
> names on some website?

Based on this question, I tend to think you might want to leave the XML processing to someone else's code. How about using Pilgrim's feedparser?

http://feedparser.org/
Re: Amara (XML) problem on Solaris
Doru-Catalin Togea wrote:
> import amara
>
> doc = amara.create_document()
> doc.xml_append(doc.xml_create_element(u"units"))
>
> print "OK"
>
> On Windows XP Pro it runs like this:
>
> C:\owera\test\xaps2-test>python amara-test1.py
> OK
>
> C:\owera\test\xaps2-test>
>
> On Solaris it runs like this:
>
> bash-2.03$ python amara-test1.py
> Traceback (most recent call last):
>   File "amara-test1.py", line 3, in ?
>     doc = amara.create_document()
> AttributeError: 'module' object has no attribute 'create_document'
> bash-2.03$

This came up when I was on vacation and incommunicado. What version of Amara are you using on both platforms? How did you install them?

Thanks.
Re: Help designing reading/writing a xml-fileformat
Jacob Kroon wrote:
> I'm writing a block-diagram editor, and could use some tips about
> writing/reading
> diagrams to/from an xml file format. The basic layout of my code :
>
> class Diagram {
>     Blocks blocks[]
> }
>
> class Block {
>     int x, y
> }
>
> class Square(Block) {
>     int width, height
> }
>
> class Circle(Block) {
>     int radius
> }
>
> I'd like to be able to output something similar to this:
>
>
>
Re: Using XML w/ Python...
Rick, thanks. Based on your clue I checked, and it seems those Amara packages are not being built correctly. I'll look to get those packages fixed and updated tomorrow.
Re: Using XML w/ Python...
"""
But anyway, i get this...

>>> import amara
>>>from amara import domtools
>>> print domtools.py
Traceback (most recent call last):
  File "", line 1, in ?
NameError: name 'domtools' is not defined
"""

Sheesh! That right after waking up. And it shows :-)

Should have been "print domtools.__file__"
Re: Using XML w/ Python...
"""
Not wanting to hijack this thread, but it got me interested in installing amara. I downloaded Amara-allinone-1.0.win32-py2.4.exe and ran it. It professed that the installation directory was to be D:\Python24\Lib\site-packages\ ... but it placed FT and amara in D:\Python24\Python24\Lib\site-packages . Possibly the installer is part of the problem here?
"""

That's really good to know. Someone else builds the Windows installer package for Amara (I'm a near Windows illiterate), but I definitely want to help be sure the installer works properly. In fact, your message rings a bell that this specifically came up before:

http://lists.fourthought.com/pipermail/4suite/2005-November/007610.html

I'll have to ask some of the Windows gurus on the 4Suite list whether they know why this might be. Do you mind if I cc you on those messages, so that you can perhaps try out any solutions we come up with?

Thanks.
Re: Using XML w/ Python...
"""
>>> import amara
>>> print dir(amara)
['__builtins__', '__doc__', '__file__', '__name__', '__path__', '__version__', 'binderytools', 'os', 'parse']
"""

So it's not able to load domtools. What do you get trying:

    from amara import domtools
    print domtools.py
Re: Using XML w/ Python...
"""
Spoke too soon, i get this error when running amara in ActivePython

>>> import amara
>>> amara.parse("http://www.digg.com/rss/index.xml")
Traceback (most recent call last):
  File "", line 1, in ?
  File "C:\Python23\Lib\site-packages\amara\__init__.py", line 50, in parse
    if IsXml(source):
NameError: global name 'IsXml' is not defined

So im guessing theres an error with one of the files...
"""

IsXml is imported conditionally, so this is an indicator that something about your module setup is still not agreeing with ActivePython. What do you see as the output of:

    python -c "import amara; print dir(amara)"

? I get:

['InputSource', 'IsXml', 'Uri', 'Uuid', '__builtins__', '__doc__', '__file__', '__name__', '__path__', '__version__', 'bindery', 'binderytools', 'binderyxpath', 'create_document', 'dateutil_standins', 'domtools', 'os', 'parse', 'pushbind', 'pushdom', 'pyxml_standins', 'saxtools']
Re: Using XML w/ Python...
"""
No, when i said "As far as it should work since their both transparent, umm, well its not." I meant that only mine isnt, maybe urs is but for some reason it isnt. And you said amara works fine for you, ok, then could you tell me what package to install... I have installed Amara 1.1.6 for Python 2.4 and it works on python 2.4 only. Now, which package should i download for it to work on any python prompt:

Allinone
Standalone
Or something else
"""

I've never used ActivePython. I don't know of any special gotchas for it. But Amara works in Python 2.3 or 2.4. The only difference between the Allinone and standalone packages is that Allinone includes 4Suite. Do get at least version 1.1.6.

If you're still having trouble with the ActivePython setup, the first thing I'd ask is how you installed Amara. Did you run a Windows installer? Next I'd check the library path for ActivePython. What is the output of:

    python -c "import sys; print sys.path"

where you replace "python" above with whatever way you invoke ActivePython.
Re: Using XML w/ Python...
"""
Ok, i am now understanding some of parseing and how to use it and nodes, things like that. But say i wanted to take the title of http://www.digg.com/rss/index.xml and XMLTramp seemed the most simple to understand. would the path be something like this?

import xmltramp
rssDigg = xmltramp.load("http://www.digg.com/rss/index.xml")
print note.rss.channel.item.title

I think thats wat im having the most confusion on now, is how to direct to the path that i want...
"""

I suggest you read at least the front page information for the tools you are using. It's quite clear from the xmltramp Web site ( http://www.aaronsw.com/2002/xmltramp/ ) that you want something like (untested: the least homework you can do is to refine the example yourself):

    print rssDigg[rss.channel][item][title]

BTW, in Amara, the API is pretty much exactly what you guessed:

>>> import amara
>>> rssDigg = amara.parse("http://www.digg.com/rss/index.xml")
>>> print rssDigg.rss.channel.item.title
Video: Conan O'Brien iPod Ad Parody
Re: Using XML w/ Python...
Jay:
"""
K, I have this XML doc, i dont know much about XML, but what i want to do is take certain parts of the XML doc, such as blah and take just that and put onto a text doc. Then same thing doe the part. Thats about it, i checked out some of the xml modules but dont understand how to use them. Dont get parsing, so if you could please explain working with XML and python to me.
"""

Someone already mentioned

http://www.oreillynet.com/pub/wlg/6225

I do want to update that Amara API. As of recent releases it's as simple as:

    import amara
    doc = amara.parse("foo.opml")
    for url in doc.xpath("//@xmlUrl"):
        print url.value

Besides the XPath option, Amara [1] provides Python API options for unknown elements, such as

    node.xml_child_elements
    node.xml_attributes

This is all covered with plenty of examples in the manual [2].

[1] http://uche.ogbuji.net/tech/4suite/amara/
[2] http://uche.ogbuji.net/uche.ogbuji.net/tech/4suite/amara/manual-dev
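For anyone without Amara installed, roughly the same extraction can be sketched with the standard library's ElementTree. This is a different API from the Amara one shown above, and the OPML sample (including the example.org URLs) is invented for illustration. ElementTree's limited XPath cannot select attribute nodes directly, so the sketch selects the elements that carry the attribute and then reads it.

```python
import xml.etree.ElementTree as ET

# Invented OPML fragment with one nested outline group.
OPML = """\
<opml version="1.1">
  <body>
    <outline text="A" xmlUrl="http://example.org/a.xml"/>
    <outline text="Group">
      <outline text="B" xmlUrl="http://example.org/b.xml"/>
    </outline>
  </body>
</opml>
"""

root = ET.fromstring(OPML)

# ".//*[@xmlUrl]" matches every descendant element that has an xmlUrl attribute.
urls = [elem.get("xmlUrl") for elem in root.findall(".//*[@xmlUrl]")]

for url in urls:
    print(url)
```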
Re: Creating referenceable objects from XML
Michael Williams wrote:
> Hi All,
> I'm looking for a quality Python XML implementation. All of the DOM
> and SAX implementations I've come across so far are rather
> convoluted. Are there any quality implementations that will (after
> parsing the XML) return an object that is accessible by name? Such as
> the following:
> xml = """
>
> MyBook
> the author
>
> """
> And after parsing the XML allow me to access it as so:
> book.title
> I need it to somehow convert my XML to intuitively referenceable
> object. Any ideas? I could even do it myself if I knew the
> mechanism by which python classes do this (create variables on the fly).

Looks as if Michael is working with Amara now, but I did want to note for the record that APIs that allow one to access a node in the "book.title" fashion are what I call Python data bindings.

Python data bindings I usually point out are:

Amara Bindery: http://www.xml.com/pub/a/2005/01/19/amara.html
Gnosis: http://www.xml.com/pub/a/2003/07/02/py-xml.html
generateDS: http://www.xml.com/pub/a/2003/06/11/py-xml.html

Based on updates to EaseXML in response to my article another entry might be:

EaseXML: http://www.xml.com/pub/a/2005/07/27/py-xml.html

ElementTree ( http://www.xml.com/pub/a/2003/02/12/py-xml.html ) is a Python InfoSet rather than a Python data binding. You access nodes using generic names related to the node type rather than the node name. Whether data bindings or Infosets are your preference is a matter of taste, but it's a useful distinction to make between the approaches.

It looks as if Gerald Flanagan has constructed a little specialized binding tool on top of ElementTree, and that's one possible hybrid approach.

xmltramp ( http://www.aaronsw.com/2002/xmltramp/ ) is another interesting hybrid.
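To make the Infoset versus data binding distinction concrete, here is a toy sketch. The Bound class is hypothetical and far simpler than Amara Bindery or the other tools listed; the sample document is a guess at the poster's XML, whose tags were lost. It also shows the mechanism the question asks about: __getattr__ is how a Python class can resolve attribute names on the fly.

```python
import xml.etree.ElementTree as ET


class Bound:
    """Toy data binding: child elements are reachable as attributes
    named after their tags (first match only)."""

    def __init__(self, elem):
        self._elem = elem

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        child = self._elem.find(name)
        if child is None:
            raise AttributeError(name)
        return Bound(child)

    def __str__(self):
        return self._elem.text or ""


# Assumed shape of the document in the question.
root = ET.fromstring(
    "<book><title>MyBook</title><author>the author</author></book>"
)

# Infoset style: generic methods, with the node name passed as data.
infoset_title = root.find("title").text

# Data binding style: the node name becomes part of the Python API.
book = Bound(root)
binding_title = str(book.title)

print(infoset_title, binding_title)
```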
Re: XML and namespaces
Wilfredo Sánchez Vega:
"""
I'm having some issues around namespace handling with XML:

>>> document = xml.dom.minidom.Document()
>>> element = document.createElementNS("DAV:", "href")
>>> document.appendChild(element)
>>> document.toxml()
'<?xml version="1.0" ?>\n<href/>'

Note that the namespace wasn't emitted. If I have PyXML, xml.dom.ext.Print does emit the namespace:

>>> xml.dom.ext.Print(document)

Is that a limitation in toxml(), or is there an option to make it include namespaces?
"""

Getting back to the OP: PyXML's xml.dom.ext.Print does get things right, and based on discussion in this thread, the only way you can serialize correctly is to use that add-on with minidom, or to use a third-party, properly Namespaces-aware tool such as 4Suite (there are others as well).

Good luck.
Re: XML and namespaces
Alan Kennedy:
"""
Although I am sympathetic to your bewilderment: xml namespaces can be overly complex when it comes to the nitty, gritty details.
"""

You're the one who doesn't seem to clearly understand XML namespaces. It's your position that is bewildering, not XML namespaces (well, they are confusing, but I have a good handle on all the nuances by now).

Again, no skin off my back here: I write and use tools that are XML namespaces compliant. It doesn't hurt me that Minidom is not. I was hoping to help, but again I don't have time for this argument.
Re: XML and namespaces
I wrote:
"""
The reality is that once the poor user has done:

    element = document.createElementNS("DAV:", "href")

They are following DOM specification that they have created an element in a namespace, and you seem to be arguing that they cannot usefully have completed their work until they also do:

    element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, None, "DAV:")

I'd love to hear how many actual minidom users would agree with you.
"""

Of course (FWIW) I meant:

    element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, "xmlns", "DAV:")
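The behavior under discussion, and the explicit-declaration workaround with the corrected setAttributeNS call, can be reproduced with a short sketch. It is written against the minidom in current Python 3 releases, on the assumption that minidom still does no namespace fix-up on output; the function name serialize_href is made up for the example.

```python
import xml.dom
from xml.dom import minidom


def serialize_href(declare_namespace):
    """Create {DAV:}href with minidom and return the serialized element."""
    document = minidom.Document()
    element = document.createElementNS("DAV:", "href")
    if declare_namespace:
        # The explicit declaration attribute, as in the corrected call above.
        element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, "xmlns", "DAV:")
    document.appendChild(element)
    return document.documentElement.toxml()


bare = serialize_href(declare_namespace=False)
declared = serialize_href(declare_namespace=True)

print(bare)      # namespace property alone: no xmlns in the output
print(declared)  # with the explicit attribute: xmlns="DAV:" is written
```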
Re: XML and namespaces
Alan Kennedy:
"""
These namespace declaration nodes, i.e. attribute nodes in the xml.dom.XMLNS_NAMESPACE namespace, are a pre-requisite for any namespaced DOM document to be well-formed, and thus naively serializable.

The argument could be made that application authors should be protected from themselves by having the underlying DOM library automatically create the relevant namespace nodes.

But to me that's not pythonic: it's implicit, not explicit.

My vote is that the existing xml.dom.minidom behaviour wrt namespace nodes is correct and should not be changed.
"""

Andrew Clover also suggested an overly-legalistic argument that current minidom behavior is not a bug.

It's a very strange attitude that because a behavior is not specifically proscribed in a spec, it is not a bug. Let me try a reductio ad absurdum, which I think in this case is a very fair stratagem. If the code in question:

>>> document = xml.dom.minidom.Document()
>>> element = document.createElementNS("DAV:", "href")
>>> document.appendChild(element)
>>> document.toxml()
'<?xml version="1.0" ?>\n<ferh/>'

(i.e. "ferh" rather than "href"), would you not consider that a minidom bug?

Now consider that DOM Level 2 does not proscribe such mangling.

Do you still think that's a useful way to determine what is a bug?

The current, erroneous behavior, which you advocate, is the same kind of bug. Minidom is an XML Namespaces aware API. In XML Namespaces, the namespace URI is *part of* the name. No question about it. In Clark notation the element name that is specified in

    element = document.createElementNS("DAV:", "href")

is "{DAV:}href". In Clark notation the element name of the document element in the created document is "href". That is not the name the user specified. It is a mangled version of it. The mangling is no better than my reductio of reversing the qname. This is a bug. Simple as that. With this behavior, minidom is not an API correct with respect to XML Namespaces.

So you try the tack of invoking "pythonicness". Well I have one for ya:

"In the face of ambiguity, refuse the temptation to guess."

You're guessing that explicit XMLNS attributes are the only way the user means to express namespace information, even though DOM allows this to be provided through such attributes *or* through namespace properties. I could easily argue that since these are core properties in the DOM, DOM should ignore explicit XMLNS attributes and only use namespace properties in determining output namespace. You are guessing that XMLNS attributes (and only those) represent what the user really means. I would be arguing the same of namespace properties.

The reality is that once the poor user has done:

    element = document.createElementNS("DAV:", "href")

They are following DOM specification that they have created an element in a namespace, and you seem to be arguing that they cannot usefully have completed their work until they also do:

    element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, None, "DAV:")

I'd love to hear how many actual minidom users would agree with you.

It's currently a bug. It needs to be fixed. However, I have no time for this bewildering fight. If the consensus is to leave minidom the way it is, I'll just wash my hands of the matter, but I'll be sure to emphasize heavily to users that minidom is broken with respect to Namespaces and serialization, and recommend that they abandon it in favor of third-party tools.
Re: XML and namespaces
Alan Kennedy:
"""
> Oh no. That only means that namespace declaration attributes are not
> created in the DOM data structure. However, output has to fix up
> namespaces in .namespaceURI properties as well as directly asserted
> "xmlns" attributes. It would be silly for DOM to produce malformed
> XML+XMLNS, and of course it is not meant to. The minidom behavior
> needs fixing, badly.

My interpretation of namespace nodes is that the application is responsible for creating whatever namespace declaration attribute nodes are required, on the DOM tree.

DOM should not have to imply any attributes on output.
"""

I'm sorry but you're wrong on this. First of all, DOM L2 (the level minidom targets) does not have the concept of "namespace nodes". That's XPath. DOM supports two ways of expressing namespace information. The first way is through the node properties .namespaceURI, .prefix (for the QName) and .localName. It *also* supports literal namespace declaration attributes (the NSDecl attributes themselves must have a namespace of "http://www.w3.org/2000/xmlns/"). As if this is not confusing enough, the Level 1 property .nodeName must provide the QName, redundantly.

As a result, a serializer has to perform fix-up to merge properties with explicit NSDecl attributes in order to serialize. If it does not do so, it is losing all the information in namespace properties, and the resulting output is not the same document that is represented in the DOM.

Believe me, I've spent many weary hours with all these issues, and implemented code to deal with the mess multiple times, and I know it all too painfully well. I wrote Amara largely because I got irrecoverably sick of DOM's idiosyncrasies.

Andrew, for this reason I'll probably take the initiative to work up a patch for the issue. I'll do what I can to get to it tomorrow. If you help me with code review and maybe writing some tests, that would be a huge help.
Re: XML and namespaces
Quoting Andrew Kuchling:
"""
> >>> element = document.createElementNS("DAV:", "href")

This call is incorrect; the signature is createElementNS(namespaceURI, qualifiedName).
"""

Not at all, Andrew. "href" is a valid qname, as is "foo:href". The prefix is optional in a QName. Here is the correct behavior, taken from a non-broken DOM library (4Suite's Domlette):

>>> from Ft.Xml import Domlette
>>> document = Domlette.implementation.createDocument(None, None, None)
>>> element = document.createElementNS("DAV:", "href")
>>> document.appendChild(element)
>>> Domlette.Print(document)
>>>

"""
If you call .createElementNS('whatever', 'DAV:href'), the output is the expected:
"""

Oh, no. That is not at all expected. The output should be:

"""
It doesn't look like there's any code in minidom that will automatically create an 'xmlns:DAV="whatever"' attribute for you. Is this automatic creation an expected behaviour?
"""

Of course. Minidom implements level 2 (thus the "NS" at the end of the method name), which means that its APIs should all be namespace aware. The bug is that writexml() and thus toxml() are not so.

"""
(I assume not. Section 1.3.3 of the DOM Level 3 says "Similarly, creating a node with a namespace prefix and namespace URI, or changing the namespace prefix of a node, does not result in any addition, removal, or modification of any special attributes for declaring the appropriate XML namespaces." So the DOM can create XML documents that aren't well-formed w.r.t. namespaces, I think.)
"""

Oh no. That only means that namespace declaration attributes are not created in the DOM data structure. However, output has to fix up namespaces in .namespaceURI properties as well as directly asserted "xmlns" attributes. It would be silly for DOM to produce malformed XML+XMLNS, and of course it is not meant to. The minidom behavior needs fixing, badly.
Re: XML and namespaces
Wilfredo Sánchez Vega:
"""
I'm having some issues around namespace handling with XML:

>>> document = xml.dom.minidom.Document()
>>> element = document.createElementNS("DAV:", "href")
>>> document.appendChild(element)
>>> document.toxml()
'<?xml version="1.0" ?>\n<href/>'
"""

I haven't worked with minidom in just about forever, but from what I can tell this is a serious bug (or at least an appalling missing feature). I can't find anything in the Element.writexml() method that deals with namespaces. But I'm just baffled. Is there really any way such a bug could have gone so long unnoticed in Python and PyXML? I searched both trackers, and the closest thing I could find was this from 2002:

http://sourceforge.net/tracker/index.php?func=detail&aid=637355&group_id=6473&atid=106473

Different symptom, but also looks like a case of namespace-ignorant code.

Can anyone who's worked on minidom more recently let me know if I'm just blind to something?
Re: xpath support in python 2.4
"And80": "Is [the xml.xpath module] still part of the standard library?" Alan Kennedy: "No, it's not. Not sure if it ever was. " It never was. -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.nethttp://fourthought.com http://copia.ogbuji.net http://4Suite.org Articles: http://uche.ogbuji.net/tech/publications/ -- http://mail.python.org/mailman/listinfo/python-list
Re: XMLSchema Parsing
km wrote:
> i'd like to know if there are any good XMLSchema (.xsd files) parsing modules
> in python.
> regards,

Parse and do what? You can parse WXS (a.k.a. XSD) with any XML parser out there. Anyway, off the top of my head, Python tools that handle WXS to some extent:

xsv
libxml2/Python
lxml
generateDS.py

Good luck.

-- Uche Ogbuji Fourthought, Inc.
Re: Move xml.sax.saxutils.*?
""" It seems like functions such as xml.sax.saxutils.escape and unescape are generally useful, and not at all tied to the xml.sax module. Would it make sense to move them somewhere else, like to xml? """ It would be useful to allow from xml import escape, unescape But as an alias, rather than a replacement for the current import. -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.nethttp://fourthought.com http://copia.ogbuji.net http://4Suite.org Articles: http://uche.ogbuji.net/tech/publications/ -- http://mail.python.org/mailman/listinfo/python-list
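For reference, a short sketch of the helpers under discussion, imported from their current home in xml.sax.saxutils. The sample strings are made up for illustration; quoteattr is a third helper from the same module, shown because it is just as independent of SAX.

```python
from xml.sax.saxutils import escape, unescape, quoteattr

raw = 'if a < b && "c" > 0'

# escape() handles &, < and > by default; quotes are left alone.
escaped = escape(raw)

# unescape() reverses exactly those three replacements.
roundtrip = unescape(escaped)

# quoteattr() escapes the value and wraps it in suitable quotes.
attr = quoteattr("Tom & Jerry")

print(escaped)
print(attr)
```

None of these calls touch a parser, which is the point being made about where they live.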
Re: BisonGen parser generator. Newbie question
"""
I'm trying to run the calculator example included with the "BisonGen" parser generator, but I've been unable to put it to work. When I compile the xml file "simple.bgen" with the script "BisonGen.bat", the only parser I get is a C file. I've heard BisonGen generates also a python file, which is, I believe, the one used imported by "test.py" to run the testing.
"""

Apologies for the late reply. Holidays and all that...

Anyway, this is strange. You should get both a C file and a .py file (and .java files if you're using a recent CVS version). Here is what I get:

$ BisonGen simple.bgen
Generate parser simple.c
Generate parser simple.java
Generate constants simpleConstants.java
Generate handler simpleHandler.java
Generate handler DefaultsimpleHandler.java

What do you get for output?

BTW, if you want to try a recent CVS version, grab the snapshot:

ftp://ftp.fourthought.com/pub/cvs-snapshots/BisonGen-CVS.tar.gz (.zip also available).

Also, you might want to ask BGen questions on the 4Suite mailing list, where other BGen developers hang out.

http://lists.fourthought.com/pipermail/4suite/

-- Uche Ogbuji Fourthought, Inc.
Re: Writing big XML files where beginning depends on end.
""" Then we need something that allows parts of the XML file to be written to file and purged from RAM to avoid the memory problem. Suggestions for solutions are appreciated. """ Multiple XML files is not an option, but what about general entities or XInclude? That way you don't need to change your parsing code. Using 4Suite's MarkupWriter [1] you could write the outer shell and inner subtrees to separate streams, only filling in values for the outer stream when the inner stream is complete, and your computations are ready. You can then use the writer.xmlFragment method to stitch the inner subtrees to the outer shell. MarkupWriter operates in streaming mode, so you would not be holding much XML in memory at all. http://www.xml.com/pub/a/2005/04/20/py-xml.html -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.nethttp://fourthought.com http://copia.ogbuji.net http://4Suite.org Articles: http://uche.ogbuji.net/tech/publications/ -- http://mail.python.org/mailman/listinfo/python-list
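The same shell-and-subtrees idea can be sketched with only the standard library, without MarkupWriter. This is an illustration under stated assumptions, not the 4Suite API: the function name write_report, the element names report and record, and the count attribute are all hypothetical. The inner records go to a temporary file as they are produced, and the outer start tag, which depends on the total, is written only once the inner stream is complete.

```python
import io
import shutil
import tempfile
from xml.sax.saxutils import escape, quoteattr


def write_report(records, out):
    """Write <report count="N"> around records streamed via a temp file."""
    count = 0
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as inner:
        # Inner subtrees are written out immediately, so they never
        # accumulate in memory.
        for text in records:
            inner.write("<record>%s</record>" % escape(text))
            count += 1
        # Only now is the value needed by the outer shell known.
        inner.seek(0)
        out.write("<report count=%s>" % quoteattr(str(count)))
        shutil.copyfileobj(inner, out)  # stitch the inner stream in, chunk by chunk
        out.write("</report>")


buf = io.StringIO()
write_report(iter(["a < b", "c & d"]), buf)
result = buf.getvalue()
print(result)
```

In real use the output would be an open file rather than a StringIO, and the records would come from a generator, so that neither side holds the whole document.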
Re: Recommendation please: forming an XML toolkit
*SNIP Long list of possible criteria for choosing an XML library*

Even with all your personal considerations, there is no one "correct" answer for you. I can think of four or five packages that would meet all your criteria. You said something quite apt:

"This question is a bit like the ones pertaining to 'Which web framework to use?', there is a lot of good stuff out there, and often it boils down to personnal preference, mind-fitting interface and such..."

I use this comparison myself. People are used to the incredible diversity of Web application needs, but for some reason their imagination tends to flag a bit when it comes to acknowledging the similar diversity of XML processing needs. It's a big domain, and you won't find a universal, one-size-fits-all solution. That's why I surprise people by saying I don't have a problem with the fact that Python bundles at least 4 XML processing libraries, and that there are at least 30 viable third-party options. Anyway you go on to say:

"BUT... to make it more precise I will give more context on the future projects involved... "

I appreciate your effort, but I don't think you succeeded. With respect to Web frameworks, it's easy to come up with a list of even 20 criteria for Python Web frameworks and still wind up with 4-5 fitting options. Same thing for XML processing.

You seem to have done a bit of homework with the packages. I'm sure you have initial impressions based on that. If you have specific outstanding questions, do ask. If not, I would just take a chance on whatever your present leaning may be.

-- Uche Ogbuji Fourthought, Inc.
Re: Add attribute using pyxml
"How do I add a new attribute to the existing xml Document tree??? "

What do you mean by "using pyxml"? There are several pyxml modules. Do you mean minidom? If so, that comes with stock Python as well (hint: element_node.setAttributeNS(ns, qname, value)).

-- Uche Ogbuji Fourthought, Inc.
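To spell the hint out, here is a small sketch using the minidom that ships with Python. The document, the namespace URI http://example.com/ns and the attribute names are invented for the example.

```python
from xml.dom.minidom import parseString

doc = parseString('<root xmlns:x="http://example.com/ns"><item/></root>')
item = doc.getElementsByTagName("item")[0]

# Plain attribute, no namespace involved.
item.setAttribute("id", "42")

# Namespaced attribute: namespace URI, qualified name, then the value.
item.setAttributeNS("http://example.com/ns", "x:kind", "demo")

xml_text = doc.documentElement.toxml()
print(xml_text)
```

Reading the value back goes through the namespace URI and the local name, e.g. item.getAttributeNS("http://example.com/ns", "kind").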
Re: XML DOM: XML/XHTML inside a text node
"""
In my program, I get input from the user and insert it into an XHTML document. Sometimes, this input will contain XHTML, but since I'm inserting it as a text node, xml.dom.minidom escapes the angle brackets ('<' becomes '&lt;', '>' becomes '&gt;'). I want to be able to override this behavior cleanly. I know I could pipe the input through a SAX parser and create nodes to insert into the tree, but that seems kind of messy. Is there a better way?
"""

Amara 1.1.6 supports inserting an XML fragment into a document or element object. Many short examples here:

http://copia.ogbuji.net/blog/2005-09-21/Dare_s_XLI

excerpt: Adding a element as a child of the element' contacts.xml_append_fragment('%s'%'206-555-0168'

http://uche.ogbuji.net/tech/4suite/amara

-- Uche Ogbuji Fourthought, Inc.
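If you need to stay with minidom, one possible approach (a stdlib sketch, not the Amara method mentioned above) is to parse the user's input inside a throwaway wrapper element and import the resulting nodes. The wrapper element name and the sample document are assumptions made for the example, and input that is not well-formed will raise an ExpatError from parseString.

```python
from xml.dom.minidom import parseString

doc = parseString("<html><body><p id='target'/></body></html>")
target = doc.getElementsByTagName("p")[0]

user_input = "Some <b>bold</b> text"

# Wrap the input so that mixed content parses as a single document.
fragment = parseString("<wrapper>%s</wrapper>" % user_input)

# Copy each child into the destination document, markup intact.
for child in list(fragment.documentElement.childNodes):
    target.appendChild(doc.importNode(child, True))

result = target.toxml()
print(result)
```

Because the input is parsed rather than inserted as a text node, the angle brackets survive as real elements instead of being escaped.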
Re: XML Tree Discovery (script, tool, __?)
"Neat, though non-trivial XSLT makes my head spin." Well, you don't have to know XSLT at all to use the Examplotron transform, although I can understand wanting to understand and hack what you're using. "Just for kicks, I rewrote in python Michael Kay's DTDGenerator (http://saxon.sourceforge.net/dtdgen.html), though as the original it has several limitations on the accuracy of the inferred DTD. " Ah. Cool. Got a link? -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.nethttp://fourthought.com http://copia.ogbuji.net http://4Suite.org Articles: http://uche.ogbuji.net/tech/publications/ -- http://mail.python.org/mailman/listinfo/python-list
Re: XML Tree Discovery (script, tool, __?)
""" I was looking for something similar (XML to DTD inference) but I didn't find anything related in python. Trang (http://www.thaiopensource.com/relaxng/trang-manual.html#introduction), on the other hand seems impressive after a few non-trivial tests. It would be neat to have it ported in python, at least the inference part. """ If you're OK with RELAX NG rather than DTD as the schema output (probably a good idea if you're using namespaces), consider Examplotron, which I've used on many such production tasks. http://www-128.ibm.com/developerworks/xml/library/x-xmptron/ It's XSLT rather than Python, but the good news is that XSLT is easy to invoke from Python using tools such as 4Suite. http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/python-xslt -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.nethttp://fourthought.com http://copia.ogbuji.net http://4Suite.org Articles: http://uche.ogbuji.net/tech/publications/ -- http://mail.python.org/mailman/listinfo/python-list
Re: XML Tree Discovery (script, tool, __?)
> - don't use SAX unless your document is huge
> - don't use DOM unless someone is putting a gun to your head

What I say is: use what works for you. I think SAX would be fine for this task, but, hey, I personally would use Amara ( http://uche.ogbuji.net/tech/4suite/amara/ ), of course. The following does the trick:

import sets
import amara
from amara import binderytools

#element_skeleton_rule suppresses char data from the resulting binding
#tree. If you have a large document and only care about element/attr
#structure and not text, this saves a lot of memory
rules = [binderytools.element_skeleton_rule()]
#XML can be a file path, URI, string, or even an open-file-like object
doc = amara.parse(XML, rules=rules)
elems = {}
for e in doc.xml_xpath('//*'):
    paths = elems.setdefault((e.namespaceURI, e.localName), sets.Set())
    path = u'/'.join([n.nodeName for n in e.xml_xpath(u'ancestor::*')])
    paths.add(u'/' + path)

#Pretty-print output
for name in elems:
    print name, '\n\t\t\t', '\n\t\t\t'.join(elems[name])

-- Uche Ogbuji Fourthought, Inc.
Re: XML Tree Discovery (script, tool, __?)
""" The output I was contemplating was a DOM "DNA" - that is the DOM without the instances of the elements or their data, a bare tree, a prototype tree based on what is in the document (rather than what is legal to include in the document). Just enough data that for an arbitrary element I would know: 1) whether the element was in a document 2) where to find it (the chain of parents) """ This is easy to do in SAX. For some hints, see page 2 of my article: http://www.xml.com/pub/a/2004/11/24/py-xml.html -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.nethttp://fourthought.com http://copia.ogbuji.net http://4Suite.org Articles: http://uche.ogbuji.net/tech/publications/ -- http://mail.python.org/mailman/listinfo/python-list
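A rough sketch of the SAX approach, in case it helps: keep a stack of open elements and record, for each element name, the chain of parents it was seen under. The class name SkeletonHandler and the sample document are made up for the example, and the handler ignores namespaces for brevity.

```python
import xml.sax


class SkeletonHandler(xml.sax.ContentHandler):
    """Record, per element name, every parent path it occurs under."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the currently open elements
        self.paths = {}   # element name -> set of parent paths

    def startElement(self, name, attrs):
        parent_path = "/" + "/".join(self.stack)
        self.paths.setdefault(name, set()).add(parent_path)
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


handler = SkeletonHandler()
xml.sax.parseString(b"<a><b><c/></b><c/></a>", handler)

for name in sorted(handler.paths):
    print(name, sorted(handler.paths[name]))
```

Since character data is never stored, memory use depends on the number of distinct element names and paths, not on the size of the document.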
Re: XML Tree Discovery (script, tool, __?)
"Finally diving into XML programmatically. Does anyone have a best practice recommendation for programmatically discovering the structure of an arbitrary XML document via Python?"

You can do this with DOM or SAX, or any of the many more friendly XML processing libraries out there. You might want to be more specific. What sort of output do you want from this discovery?

-- Uche Ogbuji Fourthought, Inc.
Re: xml2schema
"""
Er, do you mean to generate a Relax NG (or possibly a DTD in fact) from some XML file?? If you do mean this then just think of that how you could generate grammar from some paragraphs of English text... Sorta non-trivial, if possible at all, isn't it? :-)
"""

Very well put. However, for RELAX NG there is a tool that might work for the OP: Examplotron. See:

http://www-128.ibm.com/developerworks/xml/library/x-xmptron/

As I show in that article, you can use Examplotron from any XSLT processor, including one invoked through a Python API.

-- Uche http://copia.ogbuji.net
Re: What XML lib to use?
""" I'm confused, I want to read/write XML files but I don't really understand what library to use. I've used DOM-based libraries in other languages, is PyXML the library to use? """ There are many options (some say too many): http://www.xml.com/pub/a/2004/10/13/py-xml.html Try out Amara Bindery, if you like: http://uche.ogbuji.net/tech/4suite/amara/ Browsing the manual should let you know whether you like the API: http://uche.ogbuji.net/tech/4suite/amara/manual BTW, lots on Python/XML processing covered in my column, including other options besides Amara: http://www.xml.com/pub/at/24 -- Uche http://copia.ogbuji.net -- http://mail.python.org/mailman/listinfo/python-list
Re: Limited XML tidy
> The problem is that when the sax handler raises an exception, I can't see how to find out why. What I want to do is for DodgyErrorHandler to do something different depending on where we are in the course of parsing. Is there anyway to get that information back from xml.sax (or indeed from any other sax handler?) You can get raw location information, yes. See: http://www.xml.com/pub/a/2004/11/24/py-xml.html But I don't think this is enough for you. You also need recovery, which you're implementing in crude form. I tend to agree with Magnus that using an SGML parser might be your best bet. You might even be able to turn that SGML into XML using a tool such as James Clark's SX: http://www.jclark.com/sp/sx.htm -- Uche http://copia.ogbuji.net -- http://mail.python.org/mailman/listinfo/python-list
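For the raw location part, a minimal sketch with the standard xml.sax: the SAXParseException raised on a fatal error carries the line and column, which a custom error handler could equally read from the exception it is handed. The broken sample document is invented for the example.

```python
import xml.sax

# The end tag on line 3 does not match the open <item> element.
broken = b"<doc>\n  <item>text\n</doc>"

location = None
try:
    xml.sax.parseString(broken, xml.sax.ContentHandler())
except xml.sax.SAXParseException as exc:
    # Line numbers are 1-based; the column is reported by the parser.
    location = (exc.getLineNumber(), exc.getColumnNumber())
    message = exc.getMessage()

print(location)
```

This tells you where the parser gave up, but not how to resume, which is why recovery needs something beyond plain SAX.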
Re: PyXML and xml.dom
> Is PyXML now part of the Python distribution, or is it still an add-on?

Parts of PyXML have been migrated into Python core since Python 2.0, but there is still also a standalone PyXML package. See:

http://www.xml.com/pub/a/2002/09/25/py.html

-- Uche http://copia.ogbuji.net
Re: [XML-SIG] xml-object mapping
On Thu, 2005-07-28 at 12:21 -0400, Tamas Hegedus wrote: > Hi! > > I am looking for an xml-object mapping tool ('XML Data Binding-design > time product') where I can define the mapping rules in 'binding files' > and the parser is generated automatically. > > Similar to the solution of Dave Kuhlman > (http://www.rexx.com/~dkuhlman/generateDS.html) where the mapping is > defined in an xml file (if I am understand well). > > But I already have the target object. The xml-tags should not be used as > a property/member name, but should be mapped to an existing object. > > (There are existing tools, but written in Java (I would prefer Python; I > am biologist not using Java for 5 years), like JiBX > (http://jibx.sourceforge.net), Castor (http://www.castor.org; "XML-based > mapping file to specify bindings for existing object models")) Answered: http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/a63d0ad3fd23cb37/6ad0223c5b8f9946?lnk=st&q=python+xml&rnum=3&hl=en -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.nethttp://fourthought.com http://copia.ogbuji.net http://4Suite.org Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.html Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/ Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html -- http://mail.python.org/mailman/listinfo/python-list
Re: xml-object mapping
"I am looking for an xml-object mapping tool ('XML Data Binding-design time product') where I can define the mapping rules in 'binding files' and the parser is generated automatically. Similar to the solution of Dave Kuhlman (http://www.rexx.com/~dkuhlman/generateDS.html) where the mapping is defined in an xml file (if I am understand well). But I already have the target object. The xml-tags should not be used as a property/member name, but should be mapped to an existing object. "

In generateDS the mapping is not just defined in any old sort of XML file: it's defined in a W3C XML Schema file, which makes good sense (except that in my case I dislike WXS). Amara does not use a mapping specification: it maps automatically, and it allows you to specify your own classes for the mapping. This is discussed in the manual.

http://uche.ogbuji.net/tech/4Suite/amara/

-- Uche http://uche.ogbuji.net
Re: Suggestions for Python XML library which can search and insert
"I'm looking for a library that can search through an XML document tree, locate an element by attribute (ideally this can be done through XPath), and insert an element (as its child). Simple? Yes? ...but the code I've seen so far which does this uses 'nested for loops' for trees which are relatively shallow compared to mine. " Amara can easily do this using XPath (complete with predicates, functions, etc.), without nested for loops: http://uche.ogbuji.net/tech/4Suite/amara/ -- Uche http://uche.ogbuji.net -- http://mail.python.org/mailman/listinfo/python-list
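For comparison, the same search-and-insert can be sketched with the standard library's ElementTree, which supports only a limited XPath subset (attribute predicates such as the one below are included). This is not the Amara API; the document and element names are invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<catalog><section><item id='a'/><item id='b'/></section></catalog>"
)

# Locate the element by attribute at any depth, with no nested loops.
target = root.find(".//item[@id='b']")

if target is not None:
    # Insert a new child under the match.
    note = ET.SubElement(target, "note")
    note.text = "inserted"

result = ET.tostring(root, encoding="unicode")
print(result)
```

For full XPath (functions, axes, arbitrary predicates) you would still want a library that implements the whole spec.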
Re: Differences between RDFlib - 4RDF and Redfoot - 4Suite?
"I was wondering about the differences with the referred libs and servers. Since the documentation isn't so thorough(and a bit because of my laziness), I thought I'd make request for usage accounts etc. stating the pros and cons of the aforementioned. Any notes would be appreciated."

RDFLib is a thinner layer, more of the raw API. 4RDF adds in Versa query, a graph visualization tool, and multiple back ends. However, for the longest time the idea has been to merge the strengths of the two packages (big example: rdflib's parser is up to the latest round of specs; 4RDF's is not). As part of a client project I've actually begun the process of replacing 4RDF's parser with rdflib's in 4Suite (a separate add-on until the 4Suite 1.1 branch emerges).

I'd say for now if you just need quick RDF parsing, you're not also using plain XML, and stuff like the Versa RDF query language isn't important to you, you'll get along just fine with rdflib.

-- Uche http://copia.ogbuji.net
Re: formatted xml output from ElementTree inconsistency
Patrick Maupin wrote: """ Dennis Bieber wrote: > Off hand, I'd consider the non-binary nature to be because the > internet protocols are mostly designed for text, not binary. A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for XML". One of the listed goals is "XML documents should be human-legible and reasonably clear". """ Yes. Thanks for mentioning this, because people too often forget it. minidom, 4Suite's Domlette and Amara all provide good pretty-print output functions. The latter two use rules from the XSLT spec, which is designed by people who have the above design goal well in their blood. -- Uche http://copia.ogbuji.net -- http://mail.python.org/mailman/listinfo/python-list
Re: ElementTree Namespace Prefixes
Chris Spencer:

"""
Fredrik Lundh wrote:
> Chris Spencer wrote:
> > If an XML parser reads in and then writes out a document without having
> > altered it, then the new document should be the same as the original.
> says who?

Good question. There is no One True Answer even within the XML standards. It all boils down to how you define "the same". Which parts of the XML document are meaningful content that needs to be preserved and which ones are mere encoding variations that may be omitted from the internal representation?
"""

One can point to the XML namespaces spec all one wants, but it doesn't matter. The fact is that regardless of what that spec says, as you say, Chris, there are too many XML technologies that require prefix retention. As a simple example, XPath and XSLT, W3C specs just like XMLNS, use qnames in context, which requires prefix retention.

Besides all that, prefix retention is generally more user friendly in round-trip or partial round-trip scenarios.

That's why cDomlette, part of 4Suite [1], and Amara [2], a more Pythonic API for this, both support prefix retention by default.

[1] http://4suite.org
[2] http://uche.ogbuji.net/tech/4Suite/amara/

-- Uche http://copia.ogbuji.net
Re: Getting a DOM element's children by type (STUPID)
"""
If i get myself a DOM tree using xml.dom.minidom (or full-fat xml.dom, i don't mind)
"""

Don't do that. Stick to minidom. The "full" xml.dom from PyXML is ancient and slow. Of course, there are other, better libraries available now, too.

"""
is there an easy way to ask a element for its child elements of a particular type? By 'type' i mean 'having a certain tag'.
"""

You can use list comprehensions[1]. You could use XPath, if you're willing to use a library that supports XPath. In Amara[2], this task is trivial. To get all the images in an XHTML div, you'd simply do:

for img in div.img:
    process_img(img)

You access names directly as objects according to their element type name.

[1] see, e.g., http://www.xml.com/pub/a/2003/01/08/py-xml.html
[2] see http://www.xml.com/pub/a/2005/01/19/amara.html

-- Uche Ogbuji Fourthought, Inc.
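The list comprehension route with plain minidom might look like the sketch below. The sample markup is invented; the point is that only direct children of the div are returned, unlike getElementsByTagName, which would also find the nested image.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString(
    "<div><img src='a.png'/><p><img src='nested.png'/></p><img src='b.png'/></div>"
)
div = doc.documentElement

# Keep only element children whose tag is 'img'; text nodes and other
# elements are filtered out, and descendants are never visited.
imgs = [
    node for node in div.childNodes
    if node.nodeType == Node.ELEMENT_NODE and node.tagName == "img"
]

sources = [img.getAttribute("src") for img in imgs]
print(sources)
```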
Re: Elementtree and CDATA handling
"If, instead, you want to keep track of where the CDATA sections are, and output them again without change, you'll need to use an XML-handling interface that supports this feature. Typically, DOM implementations do - the default Python minidom does, as does pxdom. DOM is a more comprehensive but less friendly/Python-like interface for XML processing. "

Amara in CVS makes it easy to perform the output part of this:

text=""" Document //<![CDATA[ function matchwo(a,b) { if (a < b && a > 0) then { return 1 } } //]]> """
from amara.binderytools import bind_string
doc = bind_string(text)
print doc.xml(cdataSectionElements=[u'script'])

Output:

Document <![CDATA[ // function matchwo(a,b) { if (a < b && a > 0) then { return 1 } } // ]]>

Unfortunately, in cooking up this example I did find a bug in the Amara 1.0b1 release that requires a workaround. I should be releasing 1.0b2 this weekend, which fixes this bug (among other fixes and improvements).

"If you're generating output for legacy browsers, you might want to just use a 'real' HTML serialiser. "

Amara does provide for this, e.g.:

from amara.binderytools import bind_string
doc = bind_string(text)
print doc.xml(method=u"html")

Which automatically and transparently brings to bear the full power of the XSLT HTML output method.

-- Uche Ogbuji Fourthought, Inc.
Re: Newbie Python & XML
"I have a situation at work. Will be receiving XML file which contains quote information for car insurance. I need to translate this file into a flat comma delimited file which will be imported into a software package. Each XML file I receive will contain information on one quote only. I have documentation on layout of flat file and examples of XML file (lot of fields but only container tags and field tags no DTD's,look easy enough). I am just starting to learn python and have never had to work with XML files before. Working in MS Windows environment. I have Python 2.4 with win32 extensions. " Sounds like the sort of thing I and others have done very easily with Amara: http://www.xml.com/pub/a/2005/01/19/amara.html Overall, if you're new to Python and XML, here are some resources: http://www.xml.com/pub/at/24 http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/general-section -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.nethttp://fourthought.com http://copia.ogbuji.net http://4Suite.org Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.htmlUse XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/ Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html -- http://mail.python.org/mailman/listinfo/python-list
Re: Bug in Elementtree/Expat
"""
> Most examples in the book do not include such a declaration and yet are
> properly rendered by Internet Explorer.
> Is it mandatory and why is it that Expat crashes on it?

It's not mandatory but it's probably good practice to make the document self-contained. The xlink prefix is defined in the DTD but Expat, as a nonvalidating parser, won't fetch it.
"""

Important clarification: The decision whether or not to read the external DTD subset is separate from the decision whether or not to validate. Expat does not validate, but it does read the external subset, if you tell it to. There are other uses for reading the external subset, such as entity resolution. And you can have validation constructs in the internal DTD subset (IOW right in the XML source file itself), and expat will not do anything with them because it does not validate.

This may seem a subtle distinction, but it lies behind a lot of user confusion in practice. The XML WG really should have simplified such matters (IIRC SGML compatibility was a big obstruction to doing so).

-- Uche http://uche.ogbuji.net
Re: XML file parsing with SAX
On 4/23/05, Willem Ligtenberg <[EMAIL PROTECTED]> wrote:
> so that will be sax.handler.feature_external_ges = "false"

Yes.

> And it will work?

Honestly, I'm not sure. It should, but I've found these edge cases a bit hard to predict in the Python built-in libs :-(

> But what about using a catalog? I am very new to python and XML...

Catalogs allow you to rewrite the IDs for entities and such. So if you had an XML file with an entity at a URL, but you were working offline, you could use a catalog to "redirect" the entity to a copy on your local filesystem. Problem, now that I think of it, is that I'm not sure you can specify a catalog in PySAX. You might instead have to override the method entityResolver in your handler (and be sure to ). See the example in listing 1 and the discussion here:

http://www.xml.com/pub/a/2005/03/02/pyxml.html

Good luck.

-- Uche Ogbuji Fourthought, Inc.
Re: XML file parsing with SAX
On Sat, 2005-04-23 at 15:20 +0200, Willem Ligtenberg wrote: > I decided to use SAX to parse my xml file. > But the parser crashes on: > File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, > in fatalError > raise exception > xml.sax._exceptions.SAXParseException: NCBI_Entrezgene.dtd:8:0: error in > processing external entity reference > > This is caused by: > "NCBI_Entrezgene.dtd"> > > If I remove it, it parses normally. > I've created my parser like this: > import sys > from xml.sax import make_parser > from handler import EntrezGeneHandler > > fopen = open("mouse2.xml", "r") > ch = EntrezGeneHandler() > saxparser = make_parser() > saxparser.setContentHandler(ch) > saxparser.parse(fopen) > > And the handler is: > from xml.sax import ContentHandler > > class EntrezGeneHandler(ContentHandler): > """ > A handler to deal with EntrezGene in XML > """ > > def startElement(self, name, attrs): > print "Start element:", name > > So it doesn't do much yet. And still it crashes... > How can I tell the parser not to look at the DOCTYPE declaration. > On a website: > http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/1/ > it states that the SAX parsers are not validating, so this error shouldn't > even occur? Just because it's not validating doesn't mean that the parser won't try to read the external entity. Maybe you're looking for """ feature_external_ges Value: "http://xml.org/sax/features/external-general-entities"; true: Include all external general (text) entities. false: Do not include external general entities. access: (parsing) read-only; (not parsing) read/write """ Quote from: http://docs.python.org/lib/module-xml.sax.handler.html But you're on pretty shaky ground in any XML 1.x toolkit using a bogus DTDecl in this way. Why go through the hassle? Why not use a catalog, or remove the DTDecl? -- Uche Ogbuji Fourthought, Inc. 
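A sketch of how the feature is switched off in practice, assuming the expat-based reader that xml.sax.make_parser() normally returns: the feature is set on the parser object with setFeature() before parsing starts, not assigned on the handler module. The class name NameCollector and the sample document, whose DOCTYPE points at a DTD file that does not exist, are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class NameCollector(ContentHandler):
    """Collect element names so we can see that parsing completed."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# The DTD referenced here is deliberately missing.
data = b'<!DOCTYPE root SYSTEM "no-such-file.dtd"><root><child/></root>'

parser = xml.sax.make_parser()
# Must be called before parse(); the feature is read-only while parsing.
parser.setFeature(feature_external_ges, False)

collector = NameCollector()
parser.setContentHandler(collector)
parser.parse(io.BytesIO(data))

print(collector.names)
```

With the feature off, the reader skips the external reference instead of trying to open it. Any entities that were declared only in the skipped DTD will of course be unavailable to the document.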
Re: python modules in home dir
On Sat, 2005-04-16 at 08:12 -0600, Uche Ogbuji wrote: > On Sat, 2005-04-09 at 14:09 -0700, dzieciou wrote: > > > I'm new-comer in Python. > > I want to install few Python modules (4Suite, RDFLib, Twisted and Racoon) > > in my home directory, since Python installation is already installed in the > > system > > and I'm NOT its admin. > > I cannot install pyvm (portable binary python machine) - have no such big > > quota. > > Any idea how can I solve it? > > To install 4Suite in the home dir, use an incantation such as: > > ./setup.py config --prefix=$HOME/lib > ./setup.py install > > Note: I expect you also installed Python in your home dir? BTW, I expanded on this suggestion at: http://copia.ogbuji.net/blog/2005-04-16/Installing -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.nethttp://fourthought.com http://copia.ogbuji.net http://4Suite.org Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html XML Output with 4Suite & AMara - http://www.xml.com/pub/a/2005/04/20/py-xml.html Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/ Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html -- http://mail.python.org/mailman/listinfo/python-list
Re: python modules in home dir
On Sat, 2005-04-09 at 14:09 -0700, dzieciou wrote:
> I'm new-comer in Python.
> I want to install few Python modules (4Suite, RDFLib, Twisted and Racoon)
> in my home directory, since Python installation is already installed in the
> system
> and I'm NOT its admin.
> I cannot install pyvm (portable binary python machine) - have no such big
> quota.
> Any idea how can I solve it?

To install 4Suite in the home dir, use an incantation such as:

./setup.py config --prefix=$HOME/lib
./setup.py install

Note: I expect you also installed Python in your home dir?

--
Uche Ogbuji
Fourthought, Inc.
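Once packages live under your home directory, the interpreter still has to be told where to look for them. A rough sketch of one way to do that from inside a script, using the standard `site` module, follows. The temporary directory merely stands in for wherever the install actually put the files; that location depends on the package and the prefix used, so it is an assumption here, as are the helper and module names.

```python
import importlib
import os
import site
import sys
import tempfile


def add_private_site_dir(path):
    """Make packages installed under `path` importable in this process."""
    # addsitedir appends the directory to sys.path and also processes
    # any .pth files it finds there.
    site.addsitedir(path)
    return os.path.abspath(path) in sys.path


# Throwaway directory standing in for a home-directory install location
# (hypothetical; substitute the real directory from your own install).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "homedir_demo_module.py"), "w") as fh:
    fh.write("ANSWER = 42\n")

added = add_private_site_dir(demo_dir)
importlib.invalidate_caches()
demo = importlib.import_module("homedir_demo_module")
print(added, demo.ANSWER)
```

Setting the `PYTHONPATH` environment variable to the same directory achieves much the same thing without touching the script.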
Re: how to use structured markup tools
On Sat, 2005-03-19 at 00:14 -0800, Sean McIlroy wrote:
> I'm dealing with XML files in which there are lots of tags of the
> following form: xy (all of these letters are being
> used as 'metalinguistic variables') Not all of the tags in the file are
> of that form, but that's the only type of tag I'm interested in. (For
> the insatiably curious, I'm talking about a conversation log from MSN
> Messenger.) What I need to do is to pull out all the x's and y's in a
> form I can use. In other words, from...
>
> .
> .
> x1y1
> .
> .
> x2y2
> .
> .
> x3y3
> .
> .
>
> ...I would like to produce, for example,...
>
> [ (x1,y1), (x2,y2), (x3,y3) ]
>
> Now, I'm aware that there are extensive libraries for dealing with
> marked-up text, but here's the thing: I think I have a reasonable
> understanding of python, but I use it in a lisplike way, and in
> particular I only know the rudiments of how classes work. So here's
> what I'm asking for:
>
> Can anybody give me a rough idea how to come to grips with the problem
> described above? Or even (dare to dream) example code? Any help will be
> very much appreciated.

There are many tools you can use to get this done in Python. Here's a recipe using Amara ( http://www.xml.com/pub/a/2005/01/19/amara.html )

DOC = """\
x1y1
x2y2
x3y3
"""

from amara import binderytools
matrix = []
for row in binderytools.pushbind(u'a', string=DOC):
    matrix.append((unicode(row.b), unicode(row.c)))
print matrix

Which outputs:

[(u'x1', u'y1'), (u'x2', u'y2'), (u'x3', u'y3')]

If your matrix actually has a variable or previously unknown number of columns (e.g. x1y1z1 ), the following version of the for loop is a more general solution:

for row in binderytools.pushbind(u'a', string=DOC):
    matrix.append(tuple([ unicode(e) for e in row.xml_xpath(u'*') ]))

Same output, of course. I even tested it for you in Amara 0.9.4. And what the heck, while I was there, I added it to the demos.

You can make things even more obfuscated^H^H^H^H^H^H^H^H^H^Hterse using further lambda or list comp tricks, but I leave that as an exercise for the perverse ;-)

--
Uche Ogbuji
Fourthought, Inc.
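If installing Amara is not an option, roughly the same result can be had from the standard library. The sketch below is not the Amara recipe; it uses `xml.etree.ElementTree` on Python 3. The markup is an assumption: the archive has eaten the tags, and the element names `a`, `b`, `c` are inferred from the `pushbind(u'a')`, `row.b` and `row.c` calls above, while the `log` wrapper and the `other` element are invented so that the sample is well-formed.

```python
import xml.etree.ElementTree as ET

# Assumed markup: a/b/c are inferred from the recipe above; <log> and
# <other> are invented purely to make a well-formed sample.
DOC = """\
<log>
  <a><b>x1</b><c>y1</c></a>
  <other>ignored</other>
  <a><b>x2</b><c>y2</c></a>
  <a><b>x3</b><c>y3</c></a>
</log>
"""


def rows_as_tuples(xml_text, row_tag="a"):
    """Return one tuple of child-element texts per <row_tag> element."""
    root = ET.fromstring(xml_text)
    # Iterating an element yields its direct children, so this copes with
    # any number of columns per row.
    return [tuple(child.text for child in row) for row in root.iter(row_tag)]


print(rows_as_tuples(DOC))
```

Because it walks whatever children each row has, it covers the variable-column case as well.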
Re: Simple XML-to-Python conversion
On Fri, 2005-03-18 at 11:04 -0800, [EMAIL PROTECTED] wrote:
> Since I've exhausted every option except for Amara, I've decided to
> give it a try. However, this will only work if I can compile Amara and
> 4suite along with my application. I doubt 4suite will be able to be
> compiled, but I'll try it anyway.

Actually, as I mentioned in my last message, we do have some success reports re: 4Suite + py2exe. See the March archives of the 4Suite list. I think it took some work from those of the 4Suite developers who are Windows-savvy, but it did the job.

--
Uche Ogbuji
Fourthought, Inc.
Re: Simple XML-to-Python conversion
On Sat, 2005-03-19 at 15:38 -0800, [EMAIL PROTECTED] wrote:
> Thanks Lutz!
>
> I should have looked into Amara's binderytools module earlier. This is
> just the type of tool I was looking for. When I tried testing its
> compatibility with py2exe, I was _almost_ able to compile... Does
> anyone know where the following libraries exist? I thought Amara would
> have these included, but it looks like I need to install another
> module.

We're currently on the 4Suite mailing list chasing down all the magic required for py2exe. I'm largely a Windows illiterate, but this looks like what I remember:

http://lists.fourthought.com/pipermail/4suite/2005-March/013450.html

I do want to be sure Amara can be packaged with py2exe, so please let me know if this helps. You might want to consider continuing the discussion on the 4Suite list (which I use for Amara discussion as well). I follow that list far more diligently than c.l.py.

--
Uche Ogbuji
Fourthought, Inc.
Re: SAX parsing problem
On Wed, 2005-03-16 at 00:14 -0800, gh wrote:
> The characters handler routine is fired 3 times for a
> single text block. Why does it do this? Is there a way to prevent
> doing this?

Continuing in the vein of closing matters cross-posted to XML-SIG:

http://mail.python.org/pipermail/xml-sig/2005-March/011013.html

--
Uche Ogbuji
Fourthought, Inc.
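For readers who cannot reach the linked thread: a SAX parser is allowed to deliver one run of text in several `characters()` calls, so the usual workaround is to buffer the chunks and join them when the element ends. The following is a minimal sketch of that pattern with the Python 3 standard library; the class name, helper and sample document are invented for illustration, and it does not handle an element nested inside another element of the same name.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Collect the complete text of every <tag> element."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.texts = []
        self._chunks = None  # None means "not inside the element"

    def startElement(self, name, attrs):
        if name == self.tag:
            self._chunks = []

    def characters(self, content):
        # May be called several times for one text block; just buffer.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self.tag and self._chunks is not None:
            self.texts.append("".join(self._chunks))
            self._chunks = None


def collect_text(xml_bytes, tag):
    handler = TextCollector(tag)
    xml.sax.parseString(xml_bytes, handler)
    return handler.texts


# Hypothetical sample; the entity and the newline are typical places
# where a parser splits the text into separate chunks.
SAMPLE = b"<doc><msg>one &amp; two\nthree</msg><msg>four</msg></doc>"
print(collect_text(SAMPLE, "msg"))
```

How many chunks arrive is up to the parser, which is why the handler never relies on the count.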
Re: SAX: Help on processing qualified attribute values
On Thu, 2005-03-10 at 15:22 +0100, Markus Doering wrote:
> Hey,
>
> I am trying to process XML schema documents using namespace aware SAX
> handlers. Currently I am using the default python 2.3 parser:
>
> parser = xml.sax.make_parser()
> parser.setFeature(xml.sax.handler.feature_namespaces, 1)
>
>
> At some point I need to parse xml attributes which contain namespace
> prefixes as their value. For example:
>
>
>
> The default SAX parser does a good job on dealing with qualified names
> as xml tags, but is there a way I can access the internal sax mapping
> between prefixes and full namespaces to be able to parse "qualified
> attribute values"? A simple private dictionary prefix2namespace would be
> sufficient.

Just for others, this was answered here:

http://mail.python.org/pipermail/xml-sig/2005-March/010989.html

I also provide a useful mix-in class for this purpose in Amara's saxtools:

http://www.xml.com/pub/a/2005/01/19/amara.html
http://cvs.4suite.org/viewcvs/Amara/lib/saxtools.py?rev=1.9&view=markup

In the latter link see class namespace_mixin (which you should be able to copy to your code if you don't want to install Amara).

--
Uche Ogbuji
Fourthought, Inc.
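The general idea can be sketched with nothing but the standard library: a namespace-aware SAX parser reports prefix declarations through `startPrefixMapping` and `endPrefixMapping`, so a handler can keep its own prefix table and use it to resolve QNames that appear as attribute values. This is not the code of Amara's `namespace_mixin`; the class below, the `type` attribute and the sample schema fragment are assumptions made up for illustration (Python 3).

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class QNameValueResolver(ContentHandler):
    """Track in-scope prefixes and resolve one attribute's QName value."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.prefixes = {}   # prefix -> stack of namespace URIs
        self.resolved = []   # (namespace URI, local name) pairs

    def startPrefixMapping(self, prefix, uri):
        # The default namespace is reported with prefix None.
        self.prefixes.setdefault(prefix, []).append(uri)

    def endPrefixMapping(self, prefix):
        self.prefixes[prefix].pop()

    def startElementNS(self, name, qname, attrs):
        # Un-namespaced attributes are keyed as (None, local name).
        value = attrs.get((None, self.attr_name))
        if value is None:
            return
        prefix, _, local = value.rpartition(":")
        stack = self.prefixes.get(prefix or None)
        self.resolved.append((stack[-1] if stack else None, local))


def resolve_attribute_qnames(xml_bytes, attr_name):
    handler = QNameValueResolver(attr_name)
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.resolved


# Hypothetical schema fragment standing in for the stripped example.
SAMPLE = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
</xs:schema>"""

print(resolve_attribute_qnames(SAMPLE, "type"))
```

Keeping a stack per prefix means a prefix that is redeclared on an inner element falls back to the outer binding once that element closes.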
Re: best XSLT processor?
Steve Holden: "I don't know what news reader you are using, but I wonder if I could ask you to retain just a little more context in your posts. If they were personal emails I would probably be able to follow the thread, but in a newsgroup it's always helpful when I see a comment such as your above if I know what the heck you are talking about ;-)."

I'm using Google Groups. I'd assumed it maintains quoting, but I guess not. Looks as if I'll have to ditch it, which makes things awkward because I don't have time to follow this NG in its entirety: I prefer to just search weekly for "Python XML".

--Uche
Re: get textual content of a Xml element using 4DOM
I suggest using minidom or pxdom [1] rather than 4DOM. If you insist on using 4DOM, xml.dom.ext.Print(node) or xml.dom.ext.PrettyPrint(node) does what you want.

[1] http://www.doxdesk.com/software/py/pxdom.html

--Uche
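If what you are after is the text inside an element rather than its serialized markup, minidom has no single call for it that I know of, but a short recursive helper does the job. A minimal sketch, assuming Python 3 and the standard-library `xml.dom.minidom`; the helper name and the sample document are made up.

```python
from xml.dom import minidom


def text_content(node):
    """Concatenate the data of all text and CDATA nodes below `node`."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            # Descend into child elements; comments and PIs are skipped.
            parts.append(text_content(child))
    return "".join(parts)


doc = minidom.parseString("<p>Hello <b>big</b> world</p>")
print(text_content(doc.documentElement))
```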
Re: best XSLT processor?
Actually, most of the compliance problems I can remember offhand with respect to Xalan have been regarding EXSLT 1.0, not base XSLT 1.0. Sorry for any misconstruction.

--Uche
Re: best XSLT processor?
This is a good way to kick off a tussle among interested parties, but honestly, at this point, most packages work fine. In my opinion your trade-off right now is raw speed (e.g. libxslt) versus flexibility (e.g. 4Suite). All are bug-free enough that you'd have to be doing something *very* exotic to run into trouble. Just pick one or two and try them.

http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/python-xslt

--Uche
Re: best XSLT processor?
Xalan is certainly faster, but it is almost certainly not more compliant than 4Suite. Xalan actually has a bit of a reputation among XSLT processors for its carelessness with compliance. But I suppose in order to settle these counter-claims, one of us will have to come up with specific compliance examples. You fired the first shot. Can you back it up?

--Uche
Re: best XSLT processor?
Who says 4Suite is buggy? Do they have any evidence to back that up? We have a huge test suite, and though 4Suite is by no means the fastest option, it's quite reliable for XSLT. The XSLT processor in PyXML is just a very old version of 4XSLT.

--Uche
Re: forms, xslt and python
Firstly, that isn't an XML file. You're missing quotes around attribute values. Secondly, your question is very unclear. Are you looking for an XSLT way to correlate the correct_answer attribute to the alternative element in corresponding order? Are you looking for a Python means to do this?

--
Uche Ogbuji
Fourthought, Inc.
Re: parsing WSDL
Just for completeness I wanted to mention that yes, you can use 4Suite to parse WSDL and get method signature information, but I do agree that it's better to do this at a higher level, if you can. Why reinvent that wheel? SOAPpy has a decent WSDL class.

--
Uche Ogbuji
Fourthought, Inc.
Re: 4suite XSLT thread safe ?
Sorry I'm late to the whole thread. Diez B. Roggisch is pretty much right on the money in all his comments. 4XSLT *is* thread safe, but each individual processor instance is not thread safe. Yes, this is typical OO style: you encapsulate state in an instance so that as long as each thread has its own instance, there are no state clashes. Therefore, you should be creating at least one processor object per thread.

Note: the 4Suite server is a multi-threaded architecture that uses 4XSLT heavily, with one processor per thread.

--
Uche Ogbuji
Fourthought, Inc.
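The processor-per-thread pattern itself is easy to sketch with `threading.local`. The code below deliberately does not use the 4XSLT API: `StubProcessor` is a hypothetical stand-in for any stateful object that must not be shared between threads, and its `run` method just upper-cases a string so that the example runs anywhere (Python 3).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubProcessor:
    """Hypothetical stand-in for a stateful, non-thread-safe processor."""

    def __init__(self):
        self.owner = threading.get_ident()
        self.runs = 0  # per-instance state that must not be shared

    def run(self, source):
        # Fails loudly if an instance is ever used from another thread.
        assert self.owner == threading.get_ident(), "instance crossed threads"
        self.runs += 1
        return source.upper()


_local = threading.local()


def get_processor():
    """Return this thread's own processor, creating it on first use."""
    if not hasattr(_local, "processor"):
        _local.processor = StubProcessor()
    return _local.processor


def transform(source):
    return get_processor().run(source)


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(transform, ["a", "b", "c", "d"]))
print(results)
```

Each worker thread lazily builds its own instance, so no locking is needed around the processor state.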
Re: Clarification on XML parsing & namespaces (xml.dom.minidom)
Greg Wogan-Browne wrote:
> I am having some trouble figuring out what is going on here - is this a
> bug, or correct behaviour? Basically, when I create an XML document with
> a namespace using xml.dom.minidom.parse() or parseString(), the
> namespace exists as an xmlns attribute in the DOM (fair enough, as it's
> in the original source document). However, if I use the DOM
> implementation to create an identical document with a namespace, the
> xmlns attribute is not present.
>
> This mainly affects me when I go to print out the document again using
> Document.toxml(), as the xmlns attribute is not printed for documents I
> create dynamically, and therefore XSLT does not kick in (I'm using an
> external processor).
>
> Any thoughts on this would be appreciated. Should I file a bug on pyxml?

It's odd behavior, but I think it's a stretch to call it a bug. Your problem is that you're mixing namespaced documents with the non-namespace DOM API. That means trouble and such odd quirks every time. Use getAttributeNS, createElementNS, setAttributeNS, etc. rather than getAttribute, createElement, setAttribute, etc.

--
Uche Ogbuji
Fourthought, Inc.
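To make the namespace-aware calls concrete, here is a small sketch against the Python 3 standard-library minidom. As far as I can tell, minidom's serializer does not invent namespace declarations for you, so the sketch also sets the `xmlns` attribute explicitly through `setAttributeNS`; treat that as my reading of minidom's behaviour rather than a statement about every DOM. The namespace URI and the element names are made up.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

NS = "http://example.org/ns"  # hypothetical namespace URI

impl = minidom.getDOMImplementation()
doc = impl.createDocument(NS, "doc", None)
root = doc.documentElement

# Declare the default namespace explicitly so that toxml() emits it.
root.setAttributeNS(XMLNS_NAMESPACE, "xmlns", NS)

item = doc.createElementNS(NS, "item")
item.appendChild(doc.createTextNode("hello"))
root.appendChild(item)

serialized = doc.toxml()
reparsed = minidom.parseString(serialized)
print(serialized)
print(reparsed.documentElement.namespaceURI == NS)
```

Reparsing the output is a cheap way to check that the document an external processor sees carries the namespace you intended.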
Re: XPath and XQuery in Python?
Interesting discussion. My own thoughts:

http://www.oreillynet.com/pub/wlg/6224
http://www.oreillynet.com/pub/wlg/6225

Meanwhile, please don't make the mistake of bothering with XQuery. It's despicable crap. And a huge impedance mismatch with Python.

--Uche
Re: Any Python XML Data Binding Utilities Avaiable?
Sounds like generateDS is closest to what you want:

http://www.rexx.com/~dkuhlman/generateDS.html

If you can bind from instances only and don't need schema, see Amara Bindery:

http://uche.ogbuji.net/tech/4Suite/amara/

Also consider Gnosis Utilities and ElementTree.

--
Uche Ogbuji
Fourthought, Inc.
Re: editing XML via DOM
jaco wrote:
> Hi,
>
> I'm new to Python and XML but still I want to create something that
> includes creating and editing XML using Python.
>
> Now I'm looking for a little example program that does (some of) this to
> set me on my way.
>
> Is there something like this available or can somebody give me some
> example lines that creates and saves some XML data?

Start with:

http://www.xml.com/pub/a/2002/11/13/py-xml.html

then see:

http://www.xml.com/pub/a/2003/10/15/py-xml.html

Overall, there is a lot on DOM throughout the series:

http://www.xml.com/pub/at/24

--
Uche Ogbuji
Fourthought, Inc.
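In the meantime, here are a few example lines of the kind asked for: a minimal sketch that creates a document, saves it, then loads and edits it again, using only the Python 3 standard-library minidom. The element names, the `id` attribute and the file name are invented for illustration; the articles above cover the DOM API in far more depth.

```python
import os
import tempfile
from xml.dom import minidom


def build_document():
    """Create a tiny document: <addressbook><entry id="1">Jaco</entry>..."""
    doc = minidom.Document()
    root = doc.createElement("addressbook")
    doc.appendChild(root)
    entry = doc.createElement("entry")
    entry.setAttribute("id", "1")
    entry.appendChild(doc.createTextNode("Jaco"))
    root.appendChild(entry)
    return doc


def save(doc, path):
    """Serialize the document to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(doc.toxml())


def rename_entry(path, entry_id, new_text):
    """Load the file, change the text of one entry, and save it back."""
    doc = minidom.parse(path)
    for entry in doc.getElementsByTagName("entry"):
        if entry.getAttribute("id") == entry_id:
            entry.firstChild.data = new_text
    save(doc, path)


path = os.path.join(tempfile.mkdtemp(), "addressbook.xml")
save(build_document(), path)
rename_entry(path, "1", "Jaco (edited)")
with open(path, encoding="utf-8") as fh:
    print(fh.read())
```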