Re: codecs in a chroot / without fs access
On 1/9/12 16:41, Philipp Hagemeister wrote:
> I want to forbid my application to access the filesystem. The easiest
> way seems to be chrooting and dropping privileges. However,
> surprisingly, Python loads the codecs from the filesystem on demand,
> which makes my program crash:
>
> >>> import os
> >>> os.getuid()
> 0
> >>> os.chroot('/tmp')
> >>> ''.decode('raw-unicode-escape')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>
> (Interestingly, Python goes looking for the literal file "<stdin>" in
> sys.path. Wonder what happens if I touch
> /usr/lib/python2.7/dist-packages/<stdin>.)
>
> Is there a neat way to solve this problem, i.e. have access to all
> codecs in a chroot?

The traditional solution is to copy the data you want to make available
into the subdirectory tree that will be used as the target of the
chroot.

> If not, I'd love to have a function codecs.preload_all() that does
> what my workaround does:
>
>     import codecs, glob, os.path
>     encs = [os.path.splitext(os.path.basename(f))[0]
>             for f in glob.glob('/usr/lib/python*/encodings/*.py')]
>     for e in encs:
>         try:
>             codecs.lookup(e)
>         except LookupError:
>             pass  # __init__.py or something
>
> i.e. enumerate /usr/lib/python*/encodings/*.py and call codecs.lookup
> on os.path.splitext(os.path.basename(filename))[0] for each file.
>
> Do you see any problem with this design?

Only the timing. If you're using the shell-level chroot(1) program then
you're already chroot'd before this can execute. If you're using
os.chroot, then:

a) you're unix specific
b) your program must initially run as root
c) you have to drop privilege yourself rather than letting something
   like chroot(1) handle it.

As alternatives, you might consider building a root file system in a
file and mounting it separately on a read-only basis. You can chroot
into that without much worry of how it will affect your regular file
system. With btrfs as root, you can create snapshots and chroot into
those. You can even mount them separately, read-only if you like,
before chrooting.
The advantage of this approach is that the chroot target is built
"automatically" in the sense that it's a direct clone of your
underlying root file system, without allowing anything in the
underlying root file system to be altered. Files can be changed, but
since btrfs is copy-on-write, only the files in the snapshot will be
changed.

--rich
--
http://mail.python.org/mailman/listinfo/python-list
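The preload workaround quoted above can be packaged as a small helper.
A sketch (the function name codecs.preload_all is the poster's wish,
so the name here is hypothetical; the encodings package is located via
its own __file__ instead of a hard-coded /usr/lib path):

```python
import codecs
import encodings
import glob
import os.path

def preload_all_codecs():
    # Import every codec module shipped in the standard encodings
    # package, so that later codecs.lookup() calls succeed even after
    # os.chroot() has cut off filesystem access. Files that are not
    # real codecs (__init__.py, aliases.py, platform-specific modules
    # that fail to import) simply raise LookupError and are skipped.
    enc_dir = os.path.dirname(encodings.__file__)
    for path in glob.glob(os.path.join(enc_dir, '*.py')):
        name = os.path.splitext(os.path.basename(path))[0]
        try:
            codecs.lookup(name)
        except LookupError:
            pass
```

Calling preload_all_codecs() before os.chroot() is the whole trick:
the codec modules are then cached in sys.modules and never touched on
disk again.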
Re: codecs in a chroot / without fs access
Another option is to copy the data to a location under the new chroot
and register a new lookup function
(http://docs.python.org/library/codecs.html#codecs.register). This way
you can save some memory.
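A minimal sketch of that idea (the names here are invented for
illustration): resolve a whitelist of codecs before entering the
chroot, then register a lookup function that serves only those, so
that no later lookup triggers a filesystem import:

```python
import codecs

# Codecs resolved *before* the chroot; keys are stored with hyphens
# normalized to underscores, matching how lookup names usually arrive.
_preloaded = {name.replace('-', '_'): codecs.lookup(name)
              for name in ('utf-8', 'ascii', 'latin-1')}

def restricted_lookup(name):
    # Serve only the preloaded codecs. Returning None lets any other
    # registered lookup function (including the default one) try the
    # name instead.
    return _preloaded.get(name.replace('-', '_'))

codecs.register(restricted_lookup)
```

Only the whitelisted codecs stay in memory, which is where the memory
saving over "preload everything" comes from.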
Re: codecs, csv issues
On Aug 22, 11:52 pm, George Sakkis <[EMAIL PROTECTED]> wrote:
> I'm trying to use codecs.open() and I see two issues when I pass
> encoding='utf8':
>
> 1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
> platform-specific byte(s).
>
> import codecs
> f = codecs.open('tmp.txt', 'w', encoding='utf8')
> s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
> print >> f, s
> print >> f, s
> f.close()

This is documented behaviour:

"""
Note: Files are always opened in binary mode, even if no binary mode
was specified. This is done to avoid data loss due to encodings using
8-bit values. This means that no automatic conversion of '\n' is done
on reading and writing.
"""
Re: codecs, csv issues
George Sakkis wrote:
> I'm trying to use codecs.open() and I see two issues when I pass
> encoding='utf8':
>
> 1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
> platform-specific byte(s).
>
> import codecs
> f = codecs.open('tmp.txt', 'w', encoding='utf8')
> s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
> print >> f, s
> print >> f, s
> f.close()
>
> This doesn't happen for the default encoding (=None).
>
> 2) csv.writer doesn't seem to work as expected when being passed a
> codecs object; it treats it as if encoding is ascii:
>
> import codecs, csv
> f = codecs.open('tmp.txt', 'w', encoding='utf8')
> s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
> # this works fine
> print >> f, s
> # this doesn't
> csv.writer(f).writerow([s])
> f.close()
>
> Traceback (most recent call last):
>   ...
>   csv.writer(f).writerow([s])
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
> position 0: ordinal not in range(128)
>
> Is this the expected behavior or are these bugs?

Looking into the documentation:

"""
Note: This version of the csv module doesn't support Unicode input.
Also, there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should be UTF-8 or printable ASCII to be safe;
see the examples in section 9.1.5. These restrictions will be removed
in the future.
"""

and into the source code:

    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'

I'd be willing to say that both are implementation limitations.

Peter
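The usual workaround is to keep the csv module away from the codec
wrapper entirely: let csv format the rows as text in memory, then do
the UTF-8 encoding yourself in one explicit step. A sketch of that
pattern (the helper name is invented; shown in io-module style):

```python
import csv
import io

def write_utf8_csv(path, rows):
    # csv writes into a plain in-memory text buffer, so it never sees
    # an encoding-aware file object; the whole result is then encoded
    # to UTF-8 once and written out in binary mode.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    with open(path, 'wb') as f:
        f.write(buf.getvalue().encode('utf-8'))
```

Separating "format the rows" from "encode the bytes" sidesteps the
ASCII assumption inside csv completely.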
Re: codecs / subprocess interaction: utf help requested
On Jun 10, 6:10 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Jun 11, 7:17 am, smitty1e <[EMAIL PROTECTED]> wrote:
> > The first print statement does what you'd expect.
> > The second print statement has rather a lot of rat in it.
> > The goal here is to write a function that will return the man page
> > for some command (mktemp used as a short example here) as text to
> > client code, where the groff markup will be chopped to extract all
> > of the command options. Those options will eventually be used
> > within an emacs mode, all things going swimmingly.
> > I don't know what's going on with the piping in the second version.
> > It looks like the output of p0 gets converted to unicode at some
> > point,
>
> Whatever gave you that idea?
>
> > but I might be misunderstanding what's going on. The 4.8
> > codecs module documentation doesn't really offer much
> > enlightenment, nor google. About the only other place I can think
> > to look would be the unit test cases shipped with python.
>
> Get your head out of the red herring factory; unicode, "utf" (which
> one?) and codecs have nothing to do with your problem. Think about
> looking at your own code and at the bzip2 documentation.
>
> > Sort of hoping one of the guru-level pythonistas can point to
> > illumination, or write something to help out the next chap. This
> > might be one of those catalytic questions, the answer to which
> > tackles five other questions you didn't really know you had.
> > Thanks,
> > Chris
> > ---
> > #!/usr/bin/python
> > import subprocess
> >
> > p = subprocess.Popen(["bzip2", "-c", "-d",
> >                       "/usr/share/man/man1/mktemp.1.bz2"],
> >                      stdout=subprocess.PIPE)
> > stdout, stderr = p.communicate()
> > print stdout
> >
> > p0 = subprocess.Popen(["cat", "/usr/share/man/man1/mktemp.1.bz2"],
> >                       stdout=subprocess.PIPE)
> > p1 = subprocess.Popen(["bzip2"], stdin=p0.stdout,
> >                       stdout=subprocess.PIPE)
> > stdout, stderr = p1.communicate()
> > print stdout
> > ---
>
> You left out the command-line options for bzip2. The "rat" that you
> saw was the result of compressing the already-compressed man page.
> Read this: http://www.bzip.org/docs.html
> which is a bit obscure. The --help output from my copy of an antique
> (2001, v1.02) bzip2 Windows port explains it plainly:
> """
>    If invoked as `bzip2', default action is to compress.
>               as `bunzip2', default action is to decompress.
>               as `bzcat', default action is to decompress to stdout.
>
>    If no file names are given, bzip2 compresses or decompresses
>    from standard input to standard output.
> """
>
> HTH,
> John

Don't I feel like the biggest dork on the planet. I had started with

    cat /usr/share/man/man1/paludis.1.bz2 | bunzip2

then proceeded right to a self-foot-shoot when I went to python.
*sigh* Thanks for the calibration, sir.
Rm C
Re: codecs / subprocess interaction: utf help requested
On Jun 11, 7:17 am, smitty1e <[EMAIL PROTECTED]> wrote:
> The first print statement does what you'd expect.
> The second print statement has rather a lot of rat in it.
> The goal here is to write a function that will return the man page
> for some command (mktemp used as a short example here) as text to
> client code, where the groff markup will be chopped to extract all of
> the command options. Those options will eventually be used within an
> emacs mode, all things going swimmingly.
> I don't know what's going on with the piping in the second version.
> It looks like the output of p0 gets converted to unicode at some
> point,

Whatever gave you that idea?

> but I might be misunderstanding what's going on. The 4.8
> codecs module documentation doesn't really offer much enlightenment,
> nor google. About the only other place I can think to look would be
> the unit test cases shipped with python.

Get your head out of the red herring factory; unicode, "utf" (which
one?) and codecs have nothing to do with your problem. Think about
looking at your own code and at the bzip2 documentation.

> Sort of hoping one of the guru-level pythonistas can point to
> illumination, or write something to help out the next chap. This
> might be one of those catalytic questions, the answer to which
> tackles five other questions you didn't really know you had.
> Thanks,
> Chris
> ---
> #!/usr/bin/python
> import subprocess
>
> p = subprocess.Popen(["bzip2", "-c", "-d",
>                       "/usr/share/man/man1/mktemp.1.bz2"],
>                      stdout=subprocess.PIPE)
> stdout, stderr = p.communicate()
> print stdout
>
> p0 = subprocess.Popen(["cat", "/usr/share/man/man1/mktemp.1.bz2"],
>                       stdout=subprocess.PIPE)
> p1 = subprocess.Popen(["bzip2"], stdin=p0.stdout,
>                       stdout=subprocess.PIPE)
> stdout, stderr = p1.communicate()
> print stdout
> ---

You left out the command-line options for bzip2. The "rat" that you
saw was the result of compressing the already-compressed man page.
Read this: http://www.bzip.org/docs.html
which is a bit obscure. The --help output from my copy of an antique
(2001, v1.02) bzip2 Windows port explains it plainly:
"""
   If invoked as `bzip2', default action is to compress.
              as `bunzip2', default action is to decompress.
              as `bzcat', default action is to decompress to stdout.

   If no file names are given, bzip2 compresses or decompresses
   from standard input to standard output.
"""

HTH,
John
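For this particular task the subprocess pipeline can be avoided
altogether: the standard bz2 module decompresses in-process, so there
are no bzip2 flags to get wrong. A minimal sketch (the helper name is
invented; the man-page path would be whatever the caller supplies):

```python
import bz2

def read_man_page(path):
    # Decompress a .bz2 man page in-process: no external bzip2 binary,
    # no pipe plumbing, and no compress-vs-decompress flag mix-ups.
    with bz2.BZ2File(path) as f:
        return f.read()
```

The returned bytes are the raw groff source, ready for the option
extraction the poster describes.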
Re: codecs - where are those on windows?
Fredrik Lundh schrieb:
>> If your installation directory is C:\Python25, then look in
>>
>> C:\Python25\lib\encodings
>
> that's only the glue code. the actual data sets are provided by a
> bunch of built-in modules:
>
> >>> import sys
> >>> sys.builtin_module_names
> ('__builtin__', '__main__', '_ast', '_bisect', '_codecs',
> '_codecs_cn', '_codecs_hk', '_codecs_iso2022', '_codecs_jp',
> '_codecs_kr', '_codecs_tw', ...

So, it should be possible to do a custom build of python24.dll /
python25.dll without some of those codecs, resulting in a smaller
python24.dll? It will be some time until my apps must support Chinese
and Japanese...

Harald
Re: codecs - where are those on windows?
Paul Watson wrote:
>> So, my question is: on Windows, where are those CJK codecs? Are they
>> by any chance included in the 1.867.776 bytes of python24.dll?
>
> If your installation directory is C:\Python25, then look in
>
> C:\Python25\lib\encodings

that's only the glue code. the actual data sets are provided by a
bunch of built-in modules:

>>> import sys
>>> sys.builtin_module_names
('__builtin__', '__main__', '_ast', '_bisect', '_codecs',
'_codecs_cn', '_codecs_hk', '_codecs_iso2022', '_codecs_jp',
'_codecs_kr', '_codecs_tw', ...
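This is easy to check from Python itself; a quick sketch (note that
which codec modules are compiled in varies by platform and build: on
the Windows pythonXY.dll they are linked statically, while many Unix
builds ship the CJK codecs as separate shared modules):

```python
import sys

# List the codec modules compiled directly into the interpreter
# binary. The core _codecs module is always built in; the CJK data
# modules (_codecs_cn, _codecs_jp, ...) appear here only on builds
# that link them statically.
builtin_codecs = sorted(name for name in sys.builtin_module_names
                        if name.startswith('_codecs'))
print(builtin_codecs)
```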
Re: codecs - where are those on windows?
GHUM wrote:
> I stumbled upon a paragraph in python-dev about "reducing the size of
> Python" for an embedded device:
>
> """
> In my experience, the biggest gain can be obtained by dropping the
> rarely-used CJK codecs (for Asian languages). That should sum up to
> almost 800K (uncompressed), IIRC.
> """
>
> So, my question is: on Windows, where are those CJK codecs? Are they
> by any chance included in the 1.867.776 bytes of python24.dll?
>
> Best wishes,
>
> Harald

If your installation directory is C:\Python25, then look in

C:\Python25\lib\encodings
Re: codecs
* TK <[EMAIL PROTECTED]> wrote:
> sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)
>
> What does this line mean?

"Wrap stdout with a UTF-8 stream writer". See the codecs module
documentation for details.

nd
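Concretely, codecs.lookup() returns a codec tuple whose last element
is the StreamWriter class, so [-1] picks that class and calling it
wraps a byte stream. A small sketch against an in-memory stream
instead of sys.stdout:

```python
import codecs
import io

raw = io.BytesIO()                        # stands in for the byte-level stdout
writer = codecs.lookup('utf-8')[-1](raw)  # [-1] is the StreamWriter class
writer.write(u'\u0391\u03b8\u03ae\u03bd\u03b1')  # accepts text...
print(raw.getvalue())                     # ...the stream receives UTF-8 bytes
```

After `sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)`, every
print statement goes through exactly this encoding step.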
Re: Codecs
Ivan Van Laningham wrote:
>
> It seems to me that if I want to try to read an unknown file
> using an exhaustive list of possible encodings ...

Supposing such a list existed: What do you mean by "unknown file"?
That the encoding is unknown?

Possibility 1: You are going to try to decode the file from "legacy"
to Unicode -- until the first 'success' (defined how?)? But the file
could be decoded by *several* codecs into Unicode without an exception
being raised. Just a simple example: the encodings
['iso-8859-' + x for x in '12459'] define *all* possible 256
characters. There are various language-guessing algorithms based on
e.g. frequency of ngrams ... try Google.

Possibility 2: You "know" the file is in a Unicode encoding e.g.
utf-8, have successfully decoded it to Unicode, and are going to try
to encode the file in a "legacy" encoding but you don't know which one
is appropriate? Sorry, same "But".
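That ambiguity is easy to demonstrate: the same byte string decodes
without error under several of those encodings, it just means
different characters in each. A quick sketch:

```python
# Every decode below succeeds, because these ISO-8859 variants assign
# a character to all 256 byte values; several of them disagree about
# which characters the bytes mean, so "it decoded cleanly" proves
# nothing about which encoding is the right one.
data = b'\xe4\xff'
results = {enc: data.decode(enc)
           for enc in ('iso-8859-1', 'iso-8859-2',
                       'iso-8859-5', 'iso-8859-9')}
for enc, text in sorted(results.items()):
    print(enc, text)
```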
Re: Codecs
Ivan Van Laningham wrote:
> Hi All--
> As far as I can tell, after looking only at the documentation (and
> not searching peps etc.), you cannot query the codecs to give you a
> list of registered codecs, or a list of possible codecs it could
> retrieve for you if you knew enough to ask for them by name.
>
> Why not?

There are several answers to that question. Which of them is true, I
don't know. In order of likelihood:

1. When the API was designed, that functionality was forgotten. It was
   not possible to add it later on (because of 2).

2. Registration builds on the notion of lookup functions. The lookup
   function gets a codec name, and either succeeds in finding the
   codec, or raises an exception. Now, a lookup function, in
   principle, might not "know" in advance what codecs it supports, or
   the number of encodings it supports might not be finite. So asking
   such a lookup function for the complete list of codecs might not be
   implementable.

   As an example of a lookup function that doesn't know what encodings
   it supports, look at my iconv module. This module provides all
   codecs that iconv_open(3) supports, yet there is no standard way to
   query the iconv library in advance for a list of all supported
   codecs.

   As an example of a lookup function that supports an infinite number
   of codecs, consider the (theoretical) encrypt/password encoding,
   which encrypts a string with a password, and the password is part
   of the codec name. Each password defines a new encoding, and there
   is an infinite number of them.

Now, if 1) had been considered, it might have been possible to design
the API in a way that didn't support all cases that the current API
supports. Alas, somebody must have misplaced the time machine.

Regards,
Martin
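The "infinite family" case is easy to sketch with codecs.register: a
lookup function that accepts any codec name of the form rot_N (a
made-up codec family, purely for illustration; the encoding itself is
a byte-rotation over Latin-1):

```python
import codecs

def rot_lookup(name):
    # Accept any codec named rot_1, rot_2, ...: an unbounded family,
    # so no finite list of "supported codecs" can ever be produced.
    name = name.replace('-', '_')
    if not name.startswith('rot_'):
        return None
    try:
        n = int(name[4:])
    except ValueError:
        return None

    def encode(text, errors='strict'):
        data = text.encode('latin-1')
        return bytes((b + n) % 256 for b in data), len(text)

    def decode(data, errors='strict'):
        text = bytes((b - n) % 256 for b in bytes(data)).decode('latin-1')
        return text, len(data)

    return codecs.CodecInfo(encode, decode, name=name)

codecs.register(rot_lookup)
```

After registration, codecs.encode('abc', 'rot_1') yields b'bcd', and
every other N works the same way: exactly the situation in which an
enumeration API could not list all supported names.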