Re: CGI Tutorial
Clodoaldo Pinto Neto wrote:

> print 'The submited name was "' + name + '"'

Bzzt! Script injection security hole. See cgi.escape and use it (or a similar function) for *all* text -> HTML output.

> open('files/' + fileitem.filename, 'w')

BZZT. Filesystem overwriting security hole, possibly escalatable to code execution. Clue: fileitem.filename= '../../something.py'

> sid = cookie['sid'].value
> session = shelve.open('/tmp/.session/sess_' + sid

Bad filename use allows choice of non-session files, and opening with shelve allows all sorts of pickle weirdnesses. Just use strings.

> p = sub.Popen(str_command,

o.O

Sure, this stuff may not matter for Hello World on a test server, but if you're writing a tutorial you should ensure newbies know the Right Way to do it from the start.

The proliferation of security-oblivious PHP tutorials is directly responsible for the disastrous number of script-injection- and SQL-injection-vulnerable webapps out there - let's not have the same for Python.

--
And Clover
mailto:[EMAIL PROTECTED]
http://www.doxdesk.com/
--
http://mail.python.org/mailman/listinfo/python-list
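To illustrate the first point in the message above, here is a minimal sketch of escaping text on its way into HTML. It assumes a current Python 3, where cgi.escape no longer exists and html.escape is the standard-library replacement; the function name render_name is made up for the example.

```python
import html


def render_name(name):
    # Escape the user-supplied value before it is placed in HTML output.
    # html.escape() converts &, <, > and (by default) both quote characters.
    return 'The submitted name was "%s"' % html.escape(name)


print(render_name('<script>alert(1)</script>'))
```

The same call should wrap every piece of text that is written into a page, not only the values that look dangerous.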
Re: A critique of cgi.escape
Jon Ribbens wrote:

> I'm sorry, that's not good enough. How, precisely, would it break
> "existing code"?

('owdo Mr. Ribbens!)

It's possible there could be software that relies on ' not being escaped, for example:

    # Auto-markup links to O'Reilly, everyone's favourite
    # example name with an apostrophe in it
    #
    URI= 'http://www.oreilly.com/'
    html= cgi.escape(text)
    html= html.replace('O\'Reilly', '<a href="%s">O\'Reilly</a>' % URI)

Sure, this may be rare, but the current behaviour is what the documentation says, and changing it may not only fix things but also subtly break things in ways that are hard to detect.

A similar change to str.encode('unicode-escape') in Python 2.5 caused a number of similar subtle problems. (In this case the old documentation was a bit woolly so didn't prescribe the exact older behaviour.)

I'm not saying that the cgi.escape interface is *good*, just that it's too late to change it.

I personally think the entire function should be deprecated, firstly because it's insufficient in some corner cases (apostrophes as you pointed out, and XHTML CDATA), and secondly because it's in the wrong place: HTML-escaping is nothing to do with the CGI interface. A good template library should deal with escaping more smoothly and correctly than cgi.escape. (It may be able to deal with escape-or-not-bother and character encoding issues automatically, for example.)

--
And Clover
mailto:[EMAIL PROTECTED]
http://www.doxdesk.com/
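The breakage described above can be reproduced with a short sketch. It uses html.escape from Python 3 as a stand-in for cgi.escape (an assumption, since cgi.escape is gone from current Pythons); its quote argument switches apostrophe escaping on and off, which is exactly the behaviour change under discussion. The markup function and LINK constant are illustrative names only.

```python
import html

URI = 'http://www.oreilly.com/'
LINK = '<a href="%s">O\'Reilly</a>' % URI


def markup(text, escape_quotes):
    # Escape first, then auto-link the literal string O'Reilly.
    escaped = html.escape(text, quote=escape_quotes)
    return escaped.replace("O'Reilly", LINK)


# With apostrophes left alone the replace() finds its target...
print(markup("Books by O'Reilly", escape_quotes=False))
# ...but once the apostrophe is escaped, the search string silently stops matching.
print(markup("Books by O'Reilly", escape_quotes=True))
```

No exception is raised in the second case; the link simply disappears, which is why this kind of change is hard to detect.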
Re: getting POST vars from BaseHTTPRequestHandler
Christopher J. Bottaro wrote: > When I make a post, it just hangs (in self.rfile.read()). I don't know about BaseHTTPRequestHandler in particular, but in general you don't want to call an unlimited read() on an HTTP request - it will try to read the entire incoming stream, up until the stream is ended by the client dropping the connection (by which point it's too late to send a response). Instead you'll normally want to read the request's Content-Length header (int(os.environ['CONTENT_LENGTH']) under CGI) and read(that many) bytes. -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
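A minimal sketch of the Content-Length approach described above, written against Python 3's http.server (the module that now hosts BaseHTTPRequestHandler). The helper read_request_body and the EchoHandler class are hypothetical names for illustration, and real code would also want to cap the accepted length.

```python
import io
from http.server import BaseHTTPRequestHandler


def read_request_body(headers, rfile):
    # Read exactly as many bytes as the client announced, never an
    # unbounded read() that waits for the connection to close.
    length = int(headers.get('Content-Length', 0))
    return rfile.read(length) if length > 0 else b''


class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = read_request_body(self.headers, self.rfile)
        self.send_response(200)
        self.send_header('Content-Type', 'application/octet-stream')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# The helper can be exercised without a socket, using a dict and a BytesIO:
print(read_request_body({'Content-Length': '5'}, io.BytesIO(b'hello world')))
```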
Re: Having problems with strings in HTML
Sion Arrowsmith wrote:

> I've never encountred a browser getting tripped up by it. I suppose you
> might need it if you've got parameters called quot or nbsp

There are many more entities than you can comfortably remember, and browsers can interpret anything starting with an entity name as being an entity reference, hence all the problems with parameters like 'section' (&sect -> §). Plus of course there's nothing stopping future browsers supporting more entities, breaking your apps.

Just write &amp;. There's no reason not to (except ignorance). The fact that so much of the web is written with broken HTML is not an argument for doing the same.

--
And Clover
mailto:[EMAIL PROTECTED]
http://www.doxdesk.com/
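As a sketch of the advice above, the following builds a link whose query string contains a 'section' parameter and escapes the whole URL when it is written into an attribute. It assumes Python 3's html and urllib.parse modules; the link function and the '/view' path are invented for the example.

```python
import html
from urllib.parse import urlencode


def link(base, params, label):
    # urlencode() joins parameters with a bare '&'; html.escape() then turns
    # each of those into '&amp;' so the browser cannot mistake '&section'
    # for the start of an entity reference.
    url = base + '?' + urlencode(params)
    return '<a href="%s">%s</a>' % (html.escape(url), html.escape(label))


print(link('/view', [('chapter', '2'), ('section', '3')], 'Next'))
```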
Re: new python icons for windows
Istvan Albert wrote: > But these new icons are too large, too blocky and too pastel. Hooray! Glad to see *someone* doesn't like 'em, I'll expect a few more when b1 hits. :-) Although I can't really see 'large', 'blocky' or 'pastel'... they're the same size and shape as other Windows document icons, and I personally find the Python logo colours quite striking. If it's the new-fangled shadey gradienty kind of nonsense you don't like, you could also try the low-colour versions. eg. ICOs compiled with only 16-colour and 16/32 sizes: http://doxdesk.com/file/software/py/pyicons-tiny.zip > For example it resembles the icon for text files. This is intentional: to make it obvious that .py files are the readable, editable scripts, contrasting with .pyc's binary gunk - something that wasn't 100% clear before. With the obviousness of the Python-plus and the strong difference between the white and black base document icons, squinting shouldn't really be necessary IMO. > can someone point me to a page/link that contains the old icons? Sure, http://svn.python.org/view/python/branches/release24-maint/PC/py.ico http://svn.python.org/view/python/branches/release24-maint/PC/pyc.ico http://svn.python.org/view/python/branches/release24-maint/PC/pycon.ico -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: Dr. Dobb's Python-URL! - weekly Python news and links (Jun 12)
John Salerno wrote: > I love the new 'folder' icon, but how can I access it as an icon? I've just given these a proper home, so here: http://doxdesk.com/software/py/pyicons.html cheers! -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/
Re: CGI redirection: let us discuss it further
Sullivan WxPyQtKinter wrote: > 1. Are there any method (in python of course) to redirect to a web page > without causing a "Back" button trap... rather than the redirection page > with a "Location: url" head What's wrong with the redirection page? If there's really a necessary reason for not using an HTTP redirect (for example, needing to set a cookie, which doesn't work cross-browser on redirects), the best bet is a page containing a plain link and
Re: Uploading files from IE
AB wrote: > I tried the following with the same result: > myName = ulImage.filename > newFile = file (os.path.join(upload_dir, os.path.basename(myName)), 'wb') os.path is different on your system to the uploader's system. You are using Unix pathnames, with a '/' separator - they are using Windows ones, with '\', so os.path.basename won't recognise them as separators. Old-school-Macintosh and RISC OS machines have different path separators again. The Content-Disposition filename parameter can be set by the user-agent to *anything at all*. Using it without some serious sanitising beforehand is a recipe for security holes. In your original code an attacker could have arbitrarily written to any file the web user had access to. The code with os.path.basename is better but could still be confused by things like an empty string, '.', '..' or invalid characters. It's best not to use any user-submitted data as the basis for filenames. If you absolutely *must* use Content-Disposition as a local filename you must send it through some strict checking first, whether the browser sends full paths to you or not. -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
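One possible shape for the "strict checking" mentioned above is sketched below. It is an assumption-laden example, not a complete defence: the function name, the character whitelist and the fallback name are all invented here, and generating your own filenames on the server remains the safer option.

```python
import re


def safe_upload_name(submitted, default='upload.bin'):
    # Split on every separator a client OS might use ('/', '\' and the
    # old Mac ':'), regardless of what os.path thinks on the server.
    leaf = re.split(r'[/\\:]', submitted)[-1]
    # Replace anything outside a conservative whitelist.
    leaf = re.sub(r'[^A-Za-z0-9._-]', '_', leaf)
    # Drop leading dots so '.', '..' and hidden-file names cannot survive.
    leaf = leaf.lstrip('.')
    return leaf or default


for name in (r'C:\Documents\photo.jpg', '../../something.py', '..', ''):
    print(repr(name), '->', safe_upload_name(name))
```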
Re: New-style Python icons
Fredrik Lundh wrote: > could you perhaps add an SVG version ? Yes. I'll look at converting when I've used them a bit and am happy with them. I think some of the higher-level Xara effects may not convert easily to SVG but I'm sure there'll be workarounds of some sort. -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: New-style Python icons
Luis M. González wrote: > This is strange... I've been trying to access this site since > yesterday, but I couldn't Might it be possible you have malware installed? Since I do a bunch of anti-spyware work, there are a few different bits of malware that try to block doxdesk.com, usually using a Hosts file hijack. Try it with the IP address 64.251.25.168 instead - if that works you should probably investigate your Hosts file and/or look at spyware removers. -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: New-style Python icons
Michael Tobis wrote: > Besides the pleasant colors what do you like about it? I like that whilst being solid and easily recognisable, it isn't clever-clever. I had personally been idly doodling some kind of swooshy thing before, with a snake's head forming a P and its forked tongue a Y coming out of it, but in retrospect it was just trying too hard. The plus-tadpoles' simplicity appeals to me. -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/
Re: why isn't Unicode the default encoding?
John Salerno wrote: > So as it turns out, Unicode and UTF-8 are not the same thing? Well yes. UTF-8 is one scheme in which the whole Unicode character repertoire can be represented as bytes. Confusion arises because Windows uses the name 'Unicode' in character encoding lists, to mean UTF-16_LE, which is another encoding that can store the whole Unicode character repertoire as bytes. However UTF-16_LE is not any more definitively 'Unicode' than UTF-8 is. Further confusion arises because the encoding 'UTF-16' can actually mean two things that are deceptively different: - Unicode characters stored natively in 16-bit units (using two UTF-16 characters to represent characters outside of the Basic Multilingual Plane) - Either of the 8-bit encodings UTF-16_LE and UTF-16_BE, detected automatically using a Byte Order Mark when loaded, or chosen arbitrarily when saving Yet more confusion arises because UTF-32 (which can reference any Unicode character directly) has the same problem. And though wide-unicode builds of Python understand the first meaning (unicode() strings are stored natively as UTF-32), they don't support the 8-bit encodings UTF-32_LE and UTF-32_BE. Phew! To summarise: confusion. > Am I right to say that UTF-8 stores the first 128 Unicode code points > in a single byte, and then stores higher code points in however many > bytes they may need? That is correct. To answer the original question, we're always going to need byte strings. They're a fundamental part of computing and the need to process them isn't going to go away. However as Unicode text manipulation becomes a more common event than byte string processing, it makes sense to change the default kind of string you get when you type a literal. Personally I would like to see byte strings available under an easy syntax like b'...' and UTF-32 strings available as w'...', or something like that - currently having u'...' 
mean either UTF-16 or UTF-32 depending on compile-time options is very very annoying to the few kinds of programs that really do need to know the difference. But whatever is chosen, it's all tasty Python 3000 future-soup and not worth worrying about for the moment. -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
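The difference between the character repertoire and its byte encodings can be seen with a small sketch. It assumes Python 3, where every str is Unicode text and the 'utf-16-le' and 'utf-32-le' codecs are available; encoded_lengths is an illustrative helper name.

```python
def encoded_lengths(ch):
    # Byte length of one character in three encodings of the same repertoire.
    return tuple(len(ch.encode(enc)) for enc in ('utf-8', 'utf-16-le', 'utf-32-le'))


# ASCII, Latin-1 range, rest of the BMP, and a character outside the BMP.
for ch in ('A', '\u00fc', '\u20ac', '\U0001d11e'):
    print('U+%04X' % ord(ch), encoded_lengths(ch))

# The unsuffixed 'utf-16' codec picks a byte order and prepends a Byte Order Mark.
with_bom = 'A'.encode('utf-16')
print(with_bom)
```

UTF-8 grows from one to four bytes, UTF-16 needs a surrogate pair (four bytes) only outside the Basic Multilingual Plane, and UTF-32 is always four bytes.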
Re: New-style Python icons
Scott David Daniels wrote: > Maybe you could change the ink color to better distinguish > the pycon and pyc icons. Yeah, might do that... I'm thinking I might flip the pycon icon so that the Windows shortcut badge doesn't obscure the Python logo, too. Maybe. I'll let them stew on my desktop for a bit first though... -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
New-style Python icons
Personally, I *like* the new website look, and I'm glad to see Python having a proper logo at last! I've taken the opportunity to knock up some icons using it, finally banishing the poor old standard-VGA-palette snake from my desktop. If you like, you can grab them from: http://www.doxdesk.com/img/software/py/icons.zip in .ICO format for Windows - containing all resolutions/depths up to and including Windows Vista's crazy new enormo-icons. Also contains the vector graphics source file in Xara format. You can also see a preview here: http://www.doxdesk.com/img/software/py/icons.png -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: Pure python implementation of string-like class
Akihiro KAYAMA wrote: > As the character set is wider than UTF-16(U+10), I can't use > Python's native unicode string class. Have you tried using Python compiled in Wide Unicode mode (--enable-unicode=ucs4)? You get native UTF-32/UCS-4 strings then, which should be enough for most purposes. -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: Retrieve a GIF's palette entries using Python Imaging Library (PIL)
Stuart wrote:

> I see that the 'Image' class has a 'palette' attribute which returns an
> object of type 'ImagePalette'. However, the documentation is a bit
> lacking regarding how to maniuplate the ImagePalette class to retrieve
> the palette entries' RGB values.

ImagePalette.getdata() should do it.

There seems to be some kind of bug, however, where Images lose their ImagePalettes after being convert()ed to paletted images (eg. using Image.ADAPTIVE). For this reason I personally use the getpalette() method from the wrapped image object, which seems to contain the proper raw palette data. For example, to get a list of [r,g,b] colour lists:

    def chunk(seq, size):
        return [seq[i:i+size] for i in range(0, len(seq), size)]

    palette= image.im.getpalette()
    colours= [map(ord, bytes) for bytes in chunk(palette, 3)]

--
And Clover
mailto:[EMAIL PROTECTED]
http://www.doxdesk.com/
Re: XML and namespaces
Uche Ogbuji <[EMAIL PROTECTED]> wrote: > Andrew Clover also suggested an overly-legalistic argument that current > minidom behavior is not a bug. I stick by my language-law interpretation of spec. DOM 2 Core specifically disclaims any responsibility for namespace fixup and advises the application writer to do it themselves if they want to be sure of the right output. W3C knew they weren't going to get all that standardised by Level 2 so they left it open for future work - if minidom claimed to support DOM 3 LS it would be a different matter. > '\n' > (i.e. "ferh" rather than "href"), would you not consider that a minidom > bug? It's not a *spec* bug, as no spec that minidom claims to conform to says anything about serialisation. It's a *minidom* bug in that it fails to conform to the minimal documentation of the method toxml() which claims to "Return the XML that the DOM represents as a string" - the DOM does not represent that XML. However that doc for toxml() says nothing about being namespace-aware. XML and XML-with-namespaces both still exist, and for the former class of document the minidom behaviour is correct. > The reality is that once the poor user has done: > element = document.createElementNS("DAV:", "href") > They are following DOM specification that they have created an element > in a namespace It's possible that a namespaced node could also be imported/parsed into a non-namespace document and then serialised; it's particularly likely this could happen for scripts processing XHTML. We shouldn't change the existing behaviour for toxml/writexml because people may be relying on it. One of the reasons I ended up writing a replacement was that the behaviour of minidom was not only wrong, but kept changing under my feet with each version. However, adding the ability to do fixup on serialisation would indeed be very welcome - toxmlns() maybe, or toxml(namespaces= True)? 
> I'll be sure to emphasize heavily to users that minidom is broken > with respect to Namespaces and serialization, and that they > abandon it in favor of third-party tools. Well yes... there are in any case more fundamental bugs than just serialisation problems. Frederik wrote: > can anyone perhaps dig up a DOM L2 implementation that's not written > by anyone involved in this thread -- And Clover mailto:[EMAIL PROTECTED] http://doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: XML and namespaces
Uche <[EMAIL PROTECTED]> wrote:

> Of course. Minidom implements level 2 (thus the "NS" at the end of the
> method name), which means that its APIs should all be namespace aware.
> The bug is that writexml() and thus toxml() are not so.

Not exactly a bug - DOM Level 2 Core 1.1.8p2 explicitly leaves namespace fixup at the mercy of the application. It's only standardised as a DOM feature in Level 3, which minidom does not yet claim to support. It would be a nice feature to add, but it's not entirely trivial to implement, especially when you can serialize a partial DOM tree.

Additionally, it might have some compatibility problems with apps that don't expect namespace declarations to automagically appear. For example, perhaps, an app dealing with HTML that doesn't want spare xmlns="http://www.w3.org/1999/xhtml" declarations appearing in every snippet of serialized output.

So it should probably be optional. In DOM Level 3 (and pxdom) there's a DOMConfiguration parameter 'namespaces' to control it; perhaps for minidom an argument to toxml() might be best?

--
And Clover
mailto:[EMAIL PROTECTED]
http://www.doxdesk.com/
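Doing the fixup at application level, as DOM Level 2 leaves it, might look like the sketch below. It assumes the standard-library xml.dom.minidom in Python 3; the point is only that the application adds the xmlns declaration itself before serialising, rather than relying on toxml() to do it.

```python
from xml.dom.minidom import getDOMImplementation

DAV_NS = 'DAV:'
XMLNS_NS = 'http://www.w3.org/2000/xmlns/'  # namespace reserved for xmlns attributes

# Create a document whose root element <href> lives in the DAV: namespace.
doc = getDOMImplementation().createDocument(DAV_NS, 'href', None)
root = doc.documentElement

# Manual namespace fixup: declare the default namespace as an attribute.
root.setAttributeNS(XMLNS_NS, 'xmlns', DAV_NS)

fixed = root.toxml()
print(fixed)
```

For a whole tree, the same step would have to be repeated wherever a new namespace first appears, which is the non-trivial part mentioned above.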
Re: PIL: retreive image resolution (dpi)
[EMAIL PROTECTED] wrote: > I looked at the PIL Image class but cannot see a posibility to retreive > the image resolution dots per inch (or pixels per inch) Not all formats provide a DPI value; since PIL doesn't do anything with DPI it's not part of the main interface. For PNG and JPEG at least the value may be retrievable from the extra info dictionary (image.info['dpi']) when loaded from a file that sets it. Expect an (x, y) tuple (not necessarily square-pixel). -- And Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: Importing User-defined Modules
Walter Brunswick <[EMAIL PROTECTED]> wrote: > I need to import modules with user-defined file extensions > that differ from '.py', and also (if possible) redirect the > bytecode output of the file to a file of a user-defined > extension. You shouldn't really need a PEP for that; you can take control of the compile and import processes manually. See the py_compile and imp modules. -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
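In current Python 3 the imp module named above has been removed, so the sketch below uses importlib and py_compile to show the same idea: load source from a file with a non-.py extension and write bytecode to a file with a chosen extension. The extensions '.myext' and '.mybc', the module name 'plugin' and the load_source helper are all hypothetical.

```python
import importlib.util
import os
import py_compile
import tempfile
from importlib.machinery import SourceFileLoader


def load_source(name, path):
    # SourceFileLoader does not care about the file extension.
    loader = SourceFileLoader(name, path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


# Create a throwaway source file with a custom extension.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, 'plugin.myext')
with open(source_path, 'w') as f:
    f.write('ANSWER = 42\n')

plugin = load_source('plugin', source_path)
print(plugin.ANSWER)

# Compile the same file to a bytecode file with a custom extension.
bytecode_path = os.path.join(workdir, 'plugin.mybc')
py_compile.compile(source_path, cfile=bytecode_path)
```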
Re: Yet Another Python Web Programming Question
Daniel Bickett wrote: > Python using CGI, for example, was enough for him until he started > getting 500 errors that he wasn't sure how to fix. Every time you mention web applications on this list, there will necessarily be a flood of My Favourite Framework Is X posts. But you* sound like you don't want a framework to take over the architecture of your app and tell you what to do. And, indeed, you don't need to do that. There are plenty of standalone modules you can use - even ones that are masquerading as part of a framework. I personally use my own input-stage and templating modules, along with many others, over standard CGI, and only bother moving to a faster server interface which can support DB connection pooling (such as mod_python) if it's actually necessary - which is, surprisingly, not that often. Hopefully if WSGI catches on we will have a better interface available as standard in the future. Not quite sure what 500 Errors you're getting, but usually 500s are caused by unhandled exceptions, which Apache doesn't display the traceback from (for security reasons). Bang the cgitb module in there and you should be able to diagnose problems more easily. > He is also interested in some opinions on the best/most carefree way > of interfacing with MySQL databases. MySQLdb works fine for me: http://sourceforge.net/projects/mysql-python/ (* - er, I mean, Hypothetical. But Hypothetical is a girl's name!) -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: why UnboundLocalError?
Alex Gittens wrote: > I'm getting an UnboundLocalError > def fieldprint(widths,align,fields): [...] > def cutbits(): [...] > fields = fields[widths[i]:] There's your problem. You are assigning 'fields' a completely new value. Python doesn't allow you to rebind a variable from an outer scope in an inner scope (except for the special case where you explicitly use the 'global' directive, which is no use for the nested scopes you are using here). So when you assign an identifier in a function Python assumes that you want that identifier to be a completely new local variable, *not* a reference to the variable in the outer scope. By writing 'fields= ...' in cutbits you are telling Python that fields is now a local variable to cutbits. So when the function is entered, fields is a new variable with no value yet, and when you first try to read it without writing to it first you'll get an error. What you probably want to do is keep 'fields' pointing to the same list, but just change the contents of the list. So replace the assign operation with a slicing one: del fields[:widths[i]] -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
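The behaviour explained above can be reproduced in a few lines. The sketch below is not the original poster's code; the function names are invented. It shows the failing rebinding, the in-place fix suggested in the message, and the nonlocal statement that Python 3 later added for rebinding names in an enclosing function scope.

```python
def cut_broken(fields, width):
    def cutbits():
        # 'fields' is local to cutbits because it is assigned below,
        # so this read raises UnboundLocalError.
        piece = fields[:width]
        fields = fields[width:]
        return piece
    return cutbits()


def cut_in_place(fields, width):
    def cutbits():
        piece = fields[:width]
        del fields[:width]      # mutate the outer list, no rebinding
        return piece
    return cutbits(), fields


def cut_nonlocal(fields, width):
    def cutbits():
        nonlocal fields         # Python 3 only: rebind the enclosing variable
        piece = fields[:width]
        fields = fields[width:]
        return piece
    return cutbits(), fields


try:
    cut_broken(['a', 'b', 'c'], 2)
except UnboundLocalError as exc:
    print('broken version:', exc)

print(cut_in_place(['a', 'b', 'c'], 2))
print(cut_nonlocal(['a', 'b', 'c'], 2))
```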
Re: Problem with sha.new
Florian Lindner wrote: > sha = sha.new(f.read()) > this generates a traceback when sha.new() is called for the second time You have reassigned the variable 'sha'. First time around, sha is the sha module object as obtained by 'import sha'. Second time around, sha is the SHA hashing object you used the first time around. This does not have a 'new' method. Python does not have separate namespaces for packages and variables. Modules are stored in variables just like any other object. -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
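A sketch of the fix implied above: keep the module name and the hash object under different names. It uses hashlib, which replaced the old sha module in later Python versions; digest_all and hasher are illustrative names.

```python
import hashlib


def digest_all(chunks):
    results = []
    for data in chunks:
        # Bind the hash object to its own name; writing "hashlib = ..."
        # here would shadow the module on the next iteration.
        hasher = hashlib.sha1(data)
        results.append(hasher.hexdigest())
    return results


print(digest_all([b'', b'abc']))
```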
Re: Python as CGI on IIS and Windows 2003 Server
Lothat <[EMAIL PROTECTED]> wrote: > No test with or without any " let the IIS execute python scrits as cgi. > Http Error code is 404 (but i'm sure that the file exists in the > requested path). Have you checked the security restrictions? IIS6 has a new feature whereby script mappings are disabled by default even if they are listed in the configuration list. To turn CGI on, go to the IIS Manager snap-in and select the 'Web Service Extensions' folder. Select 'All Unknown CGI Extensions' and click 'Allow'. Incidentally, the string I am using is: "C:\Program Files\Python\2.4\python.exe" -u "%s" "%s" -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: Elementtree and CDATA handling
Alain <[EMAIL PROTECTED]> wrote: > I would expect a piece of XML to be read, parsed and written back > without corruption [...]. It isn't however the case when it comes > to CDATA handling. This is not corruption, exactly. For most intents and purposes, CDATA sections should behave identically to normal character data. In a real XML-based browser (such as Mozilla in application/xhtml+xml mode), this line of script would actually work fine: > if (a < b && a > 0) { The problem is you're (presumably) producing output that you want to be understood by things that are not XML parsers, namely legacy-HTML web browsers, which have special exceptions-to-the-rule like "
Re: Python 2.4.1 install broke RedHat 9 printconf-backend
BrianS wrote: > File "/usr/share/printconf/util/printconf_conf.py", line 83, in ? > from xml.utils import qp_xml > ImportError: No module named utils > It seems that the xml package have been changed. Not exactly. xml.utils is part of the XML processing package PyXML - you don't get it in the cut-down XML stuff available in the standard library. You could try downloading and installing from http://pyxml.sf.net/. Though I can't guarantee there won't be other problems as RedHat can be very annoying like this. You might have to keep Python 2.2 around in addition to 2.4 for RH's benefit; in any case trying to remove 2.2 will probably lead you into an RPM dependency nightmare. -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: File Uploads
Doug Helm wrote: > form = cgi.FieldStorage() > if lobjUp.Save('filename', 'SomeFile.jpg'): > class BLOB(staticobject.StaticObject): > def Save(self, pstrFormFieldName, pstrFilePathAndName): > form = cgi.FieldStorage() You are instantiating cgi.FieldStorage twice. This won't work for POST requests, because instantiating a FieldStorage reads the form data from the standard input stream (the HTTP request). Try to create a second one and cgi will try to read all the form data again; this will hang, waiting for the socket to send it a load more data which will not be forthcoming. When using CGI, parse the input only once, then pass the results (a FieldStorage object if you are using the cgi module) in to any other functions that need to read it. -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
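The parse-once rule above follows from the request body being a stream that can only be consumed once. The sketch below simulates that with a BytesIO standing in for standard input and urllib.parse.parse_qs standing in for cgi.FieldStorage (both substitutions are assumptions made so the example runs on any current Python); parse_form and save_upload are invented names.

```python
import io
from urllib.parse import parse_qs


def parse_form(stream, length):
    # Consume the request body exactly once.
    return parse_qs(stream.read(length).decode('ascii'))


def save_upload(form, field):
    # Work from the already-parsed form instead of re-reading the stream.
    return form[field][0]


body = b'filename=SomeFile.jpg'
stdin = io.BytesIO(body)

form = parse_form(stdin, len(body))
print(save_upload(form, 'filename'))

# Nothing is left for a second parser; on a real socket this read would block.
leftover = stdin.read()
print(leftover)
```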
Re: injecting "set" into 2.3's builtins?
Skip Montanaro wrote: > I use sets a lot in my Python 2.3 code at work and have been using > this hideous import to make the future move to 2.4's set type > transparent: > try: > x = set (Surely just 'set' on its own is sufficient? This avoids the ugly else clause.) > __builtin__.set = sets.Set > I'm wondering if others have tried it. If so, did it cause any > problems? I don't know of any specific case where it would cause problems but I'd be very wary of this; certainly doing the same with True and False has caused problems in the past. A module might sniff for 'set' and assume it is running on 2.4 if it sees it, with unpredictable results if it relies on any other 2.4 behaviour. I'd personally put this at the top of local scripts: from siteglobals import * Then put compatibility hacks like set and bool in siteglobals.py. Then any modules or other non-site scripts could continue without the polluted builtin scope. -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: function with a state
Xah Lee <[EMAIL PROTECTED]> wrote:

> is it possible in Python to create a function that maintains a
> variable value?

Yes. There's no concept of a 'static' function variable as such, but there are many other ways to achieve the same thing.

> globe=0;
> def myFun():
>     globe=globe+1
>     return globe

This would work except that you have to tell it explicitly that you're working with a global, otherwise Python sees the "globe=" and decides you want 'globe' to be a local variable.

    globe= 0
    def myFun():
        global globe
        globe= globe+1
        return globe

Alternatively, wrap the value in a mutable type so you don't have to do an assignment (and can use it in nested scopes):

    globe= [ 0 ]
    def myFun():
        globe[0]+= 1
        return globe[0]

A hack you can use to hide statics from code outside the function is to abuse the fact that default parameters are calculated at define-time:

    def myFun(globe= [ 0 ]):
        globe[0]+= 1
        return globe[0]

For more complicated cases, it might be better to be explicit and use objects:

    class Counter:
        def __init__(self):
            self.globe= 0
        def count(self):
            self.globe+= 1
            return self.globe

    myFun= Counter().count

--
Andrew Clover
mailto:[EMAIL PROTECTED]
http://www.doxdesk.com/
Re: convert gb18030 to utf16
Xah Lee <[EMAIL PROTECTED]> wrote:

> i have a bunch of files encoded in GB18030. Is there a way to convert
> them to utf16 with python?

You will need CJKCodecs (http://cjkpython.i18n.org/), or Python 2.4, which has them built in. Then just use them like any other codec. eg.

    f= open(path, 'rb')
    content= unicode(f.read(), 'gb18030')
    f.close()

    f= open(path, 'wb')
    f.write(content.encode('utf-16'))
    f.close()

--
Andrew Clover
mailto:[EMAIL PROTECTED]
http://www.doxdesk.com/
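For readers on Python 3, where the unicode() builtin no longer exists, the same conversion can be sketched with the encoding argument of open(). The gb18030 codec ships with the standard library; the function name and the temporary file names below are made up for the example.

```python
import os
import tempfile


def convert_gb18030_to_utf16(src, dst):
    # Decode on read, encode on write; newline='' leaves line endings untouched.
    with open(src, 'r', encoding='gb18030', newline='') as f:
        content = f.read()
    with open(dst, 'w', encoding='utf-16', newline='') as f:
        f.write(content)


# Small round-trip demonstration with a throwaway file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'in.txt')
dst = os.path.join(workdir, 'out.txt')
with open(src, 'wb') as f:
    f.write('\u4f60\u597d\n'.encode('gb18030'))

convert_gb18030_to_utf16(src, dst)
with open(dst, 'rb') as f:
    converted = f.read()
```

Unlike the snippet in the message, this writes to a separate destination file, so a failure part-way through does not destroy the original.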
Re: get textual content of a Xml element using 4DOM
Frank Abel Cancio Bello <[EMAIL PROTECTED]> wrote:

> PrettyPrint or Print return the value to the console, and i need
> keep this value in a string variable to work with it, how can i
> do this?

The second parameter to either of these functions can be a stream object, so you can use a StringIO to get string output:

    from StringIO import StringIO
    from xml.dom.ext import Print

    buf= StringIO()
    Print(doc, buf)
    xml= buf.getvalue()

--
Andrew Clover
http://www.doxdesk.com/
mailto:[EMAIL PROTECTED]
Re: Problem with minidom and special chars in HTML
Horst Gutmann wrote:

> I currently have quite a big problem with minidom and special chars
> (for example &uuml;) in HTML.

Yes. Ignoring the issue of the wrong doctype, minidom is a pure XML parser and knows nothing of XHTML and its doctype's entities 'uuml' and the like. Only the built-in entities (&amp; etc.) will work.

Unfortunately the parser minidom uses won't read external entities - including the external subset of the DTD (which is where all the stuff about what "&uuml;" means is stored). And because minidom does not support EntityReference nodes, the information that there was an entity reference there at all gets thrown away as it is replaced with the empty string. Which is kind of bad.

Possible workarounds:

1. Pass minidom a different parser to use, one which supports external entities and which will parse all the DTD stuff. I don't know if there is anything suitable available, though...

2. Use a DOM implementation with the option to support external entities. For example, with pxdom, one can use DOM Level 3 LS methods, or pxdom.parse(f, {'pxdom-external-entities': True}).

However, note that reading and parsing an external entity will introduce significant slowdown, especially in the case of the rather complex multi-file XHTML DTD.

Other possibilities:

3. Hack the content on the way into the parser to replace the DOCTYPE declaration with one including entity definitions in the internal subset:

       ... ]> ...

4. Hack the content on the way into the parser to replace entity references with character references, eg. &uuml; -> &#252;. This is 'safe' for simple documents without an internal subset; charrefs and entrefs can be used in the same places with the same meaning, except for some issues in the internal subset.

5. Use a DOM implementation that supports EntityReference nodes, such as pxdom. Entity references with no replacement text (or all entity references if the DOM Level 3 LS parameter 'entities' is set) will exist as EntityReference DOM objects instead of being flattened to text. They can safely be reserialized as &uuml; without the implementation having to know what text they represent.

Entities are a big source of complication and confusion, which I wish had not made it into XML!

--
Andrew Clover
mailto:[EMAIL PROTECTED]
http://www.doxdesk.com/
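Workaround 4 above (entity references rewritten as character references before parsing) could be sketched as follows. It assumes Python 3, where the HTML entity table lives in html.entities.name2codepoint; the function name and the regular expression are illustrative, and the sketch ignores the internal-subset caveats mentioned in the message.

```python
import re
from html.entities import name2codepoint
from xml.dom.minidom import parseString

XML_BUILTINS = {'amp', 'lt', 'gt', 'quot', 'apos'}


def entrefs_to_charrefs(markup):
    def replace(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names untouched.
        if name in XML_BUILTINS or name not in name2codepoint:
            return match.group(0)
        return '&#%d;' % name2codepoint[name]
    return re.sub(r'&([A-Za-z][A-Za-z0-9]*);', replace, markup)


source = '<p>M&uuml;nchen &amp; K&ouml;ln</p>'
doc = parseString(entrefs_to_charrefs(source))

# Join all text children in case the parser delivered the text in pieces.
text = ''.join(node.data for node in doc.documentElement.childNodes)
print(text)
```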
Re: CGI POST problem was: How to read POSTed data
Dan Perl wrote: > how is a multipart POST request parsed by CGIHTTPServer? It isn't; the input stream containing the multipart/form-data content is passed to the CGI script, which can choose to parse it or not using any code it has to hand - which could be the 'cgi' module, but not necessarily. > Where is the parsing done for the POST data following the header? If you are using the 'cgi' module, then cgi.parse_multipart. > As a side note, I found other old reports of problems with cgi > handling POST requests, reports that don't seem to have had a > resolution. (in particular?) FWIW, for interface-style and multipart-POST-file-upload-speed reasons I wrote an alternative to cgi, form.py (http://www.doxdesk.com/software/py/form.html). But I haven't found cgi's POST reading to be buggy in general. > There is even a bug reported just a few days ago (1112856) that is > exactly about multipart post requests. If I understand the bug > report correctly though, it is only on the latest version in CVS > and it states that what is in the 2.4 release works. That's correct. > All this tells me that it could be a "fragile" part in the standard > library. I don't really think so; it's really an old stable part of the library that is a bit crufty in places due to age. The patch that caused 1112856 was an attempt to rip out and replace the parser stuff, which as a big change to old code is bound to cause trouble. But that's what the dev cycle is for. CGIHTTPServer, on the other hand, I have never really trusted. I would suspect that fella. -- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: Octal notation: severe deprecation
John Machin wrote:

> I regard continued usage of octal as a pox and a pestilence.

Quite agree. I was disappointed that it ever made it into Python.

Octal's only use is:

  a) umasks
  b) confusing the hell out of normal non-programmers for whom a leading zero is in no way magic

(a) does not outweigh (b).

In Mythical Future Python I would like to be able to use any base in integer literals, which would be better. Example random syntax:

  flags= 2x00011010101001
  umask= 8x664
  answer= 10x42
  addr= 16x0E84 # 16x == 0x
  gunk= 36x8H6Z9A0X

But either way, I want rid of 0->octal.

> Is it not regretted?

Maybe the problem just doesn't occur to people who have used C too long.

OT: Also, if Google doesn't stop lstrip()ing my posts I may have to get a proper news feed. What use is that on a Python newsgroup? Grr.

-- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/
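The "any base" literal syntax in the post above is only a wish, not real Python, but the same values can already be obtained at run time with the built-in int(digits, base), which accepts bases 2 to 36. The sketch below parses the proposed '<base>x<digits>' form; parse_based_literal is a hypothetical helper invented for this illustration. (For what it's worth, Python 3 did later drop the bare leading-zero octal form in favour of 0o664.)

```python
def parse_based_literal(text):
    """Parse a hypothetical '<base>x<digits>' literal such as '8x664'."""
    # Split on the first lowercase 'x'; everything before it is the base
    # in decimal, everything after it is the digit string.
    base_text, separator, digits = text.partition("x")
    if not separator or not base_text.isdigit():
        raise ValueError("expected '<base>x<digits>', got %r" % text)
    base = int(base_text)
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36, got %d" % base)
    # int() does the real work, including rejecting invalid digits.
    return int(digits, base)


umask = parse_based_literal("8x664")
answer = parse_based_literal("10x42")
addr = parse_based_literal("16x0E84")
```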
Re: Looking for source preservation features in XML libs
Grzegorz Adam Hankiewicz <[EMAIL PROTECTED]> wrote:

> I have looked at xml.minidom, elementtree and gnosis and haven't
> found any such features. Are there libs providing these?

pxdom (http://www.doxdesk.com/software/py/pxdom.html) has some of this, but I think it's still way off what you're envisaging.

> One is to be able to tell which source file line a tag starts
> and ends.

You can get the file and line/column where a node begins in pxdom using the non-standard property Node.pxdomLocation, which returns a DOM Level 3 DOMLocator object, eg.:

  uri= node.pxdomLocation.uri
  line= node.pxdomLocation.lineNumber
  col= node.pxdomLocation.columnNumber

There is no way to get the location of an Element's end-tag, however. Except guessing by looking at the positions of adjacent nodes, which is kind of cheating and probably not reliable.

SAX processors can in theory use Locator information too, but AFAIK (?) this isn't currently implemented.

> Another feature is to be able to save the processed XML code in a way
> that unmodified tags preserve the original identation.

Do you mean whitespace *inside* the start-tag? I don't know of any XML processor that will do anything but ignore whitespace here; in XML terms it is utterly insignificant and there is no place to store the information in the infoset or DOM properties.

pxdom will preserve the *order* of the attributes, but even that is not required by any XML standard.

> Or in the worst case, all identation is lost, but I can control to
> some degree the outlook of the final XML output.

The DOM Level 3 LS feature format-pretty-print (and PyXML's PrettyPrint) influence whitespace in content.

However if you do want control of whitespace inside the tags themselves I don't know of any XML tools that will do it. You might have to write your own serializer, or hack it into a DOM implementation of your choice.

-- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/
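As a rough illustration of the "which line does a tag start and end on" question above, here is a sketch that does not use pxdom at all. It swaps in the standard library's xml.parsers.expat, whose parser object exposes CurrentLineNumber and CurrentColumnNumber while a handler is running, so both start-tags and end-tags can be located. The function name element_locations and the sample document are assumptions for the example.

```python
import xml.parsers.expat


def element_locations(xml_text):
    """Return (event, tag, line, column) tuples for every start- and end-tag.

    Hypothetical helper: positions come from expat, not from a DOM tree,
    so nothing here preserves or rebuilds the document.
    """
    parser = xml.parsers.expat.ParserCreate()
    events = []

    def on_start(name, attrs):
        # Position of the markup that triggered this callback.
        events.append(("start", name,
                       parser.CurrentLineNumber, parser.CurrentColumnNumber))

    def on_end(name):
        events.append(("end", name,
                       parser.CurrentLineNumber, parser.CurrentColumnNumber))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events


SAMPLE = "<root>\n  <child>text</child>\n</root>\n"
locations = element_locations(SAMPLE)
```

This only reports positions; keeping them attached to nodes of a tree, or using them to reproduce the original layout on output, would still need extra bookkeeping of the kind discussed in the post.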
Re: Web forum (made by python)
Choe, Cheng-Dae wrote: > example site is http://bbs.pythonworld.net:9080/pybbs.py Since this seems quite happy to accept posted
Re: regex syntax
Andreas Volz <[EMAIL PROTECTED]> wrote:

> I had already thought about simply comparing the last four characters
> of the string against the character sequence "by hand" and so
> avoiding regex. But I will have to use it at some point anyway

"Have to"? I disagree! Regexps are a good solution for limited purposes, but they are not a fundamental part of programming.

For this example:

  >>> filename.endswith('.jpg')

would be much better than the equivalent regexp:

  >>> re.match('.*\.jpg$', filename)

-- Andrew Clover mailto:[EMAIL PROTECTED] http://www.doxdesk.com/
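To make the comparison in the post above concrete, here is a small sketch of both approaches side by side. The helper names is_jpeg_name and is_jpeg_name_re are made up for the example, and the case-insensitive matching and the extra '.jpeg' suffix are assumptions that go slightly beyond the original one-liner.

```python
import re


def is_jpeg_name(filename):
    """Suffix test without a regexp (hypothetical helper)."""
    # str.endswith accepts a tuple of suffixes; lower() makes the
    # comparison case-insensitive.
    return filename.lower().endswith((".jpg", ".jpeg"))


def is_jpeg_name_re(filename):
    """Roughly equivalent regexp version, shown only for comparison."""
    return re.search(r"\.jpe?g$", filename, re.IGNORECASE) is not None


plain = is_jpeg_name("holiday.JPG")
not_image = is_jpeg_name("holiday.jpg.txt")
```

One subtle difference worth knowing: in a regexp '$' also matches just before a trailing newline, so the regexp version accepts 'holiday.jpg\n' while endswith does not. That is one more reason to prefer the plain string method for a job this simple.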