Re: codecs in a chroot / without fs access

2012-01-10 Thread K Richard Pixley

On 1/9/12 16:41 , Philipp Hagemeister wrote:

I want to forbid my application to access the filesystem. The easiest
way seems to be chrooting and dropping privileges. However, surprisingly,
python loads the codecs from the filesystem on-demand, which makes my
program crash:


import os
os.getuid()

0

os.chroot('/tmp')
''.decode('raw-unicode-escape')

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>

(Interestingly, Python goes looking for the literal file "<stdin>" in
sys.path. Wonder what happens if I touch
/usr/lib/python2.7/dist-packages/<stdin>).

Is there a neat way to solve this problem, i.e. have access to all
codecs in a chroot?


The traditional solution is to copy the data you want to make available 
into the subdirectory tree that will be used as the target of the chroot.
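For the codecs specifically, that means copying the stdlib encodings
package into the chroot target at the same relative path, so imports
still resolve after the chroot. A minimal sketch (a scratch directory
stands in for the real chroot target):

```python
import encodings
import os
import shutil
import tempfile

def copy_encodings_into(chroot_dir):
    # Copy the stdlib 'encodings' package into chroot_dir, preserving
    # its path relative to /, so that the import machinery finds it
    # after os.chroot(chroot_dir).
    src = os.path.dirname(encodings.__file__)
    dst = os.path.join(chroot_dir, src.lstrip(os.sep))
    shutil.copytree(src, dst)
    return dst

# A scratch directory stands in for the real chroot target here.
target = tempfile.mkdtemp()
copied = copy_encodings_into(target)
```

Note that on builds where the _codecs_* data modules are shared
libraries rather than compiled in, those files would need the same
treatment.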



If not, I'd love to have a function codecs.preload_all() that does what
my workaround does:

import codecs, glob, os.path

encs = [os.path.splitext(os.path.basename(f))[0]
        for f in glob.glob('/usr/lib/python*/encodings/*.py')]
for e in encs:
    try:
        codecs.lookup(e)
    except LookupError:
        pass  # __init__.py or something


enumerate /usr/lib/python.*/encodings/*.py and call codecs.lookup for
every os.path.splitext(os.path.basename(filename))[0]

Do you see any problem with this design?


Only the timing.  If you're using the shell level chroot(1) program then 
you're already chroot'd before this can execute.  If you're using 
os.chroot, then:


a) you're unix specific
b) your program must initially run as root
c) you have to drop privilege yourself rather than letting something 
like chroot(1) handle it.
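The ordering constraint can be made explicit in code: do all codec
lookups while the filesystem is still readable, then chroot, then drop
privileges. A sketch (the encoding list, jail path, and uid/gid are
illustrative; the jail step needs root, so it is left commented out):

```python
import codecs
import os

def preload_codecs(names):
    # Force each codec module to be imported and cached while the
    # filesystem is still readable.
    for name in names:
        codecs.lookup(name)

def enter_jail(new_root, uid, gid):
    # Order matters: chroot needs root, so it must happen before
    # setgid/setuid; after that, neither step can be repeated.
    os.chroot(new_root)
    os.chdir('/')
    os.setgid(gid)
    os.setuid(uid)

preload_codecs(['utf-8', 'latin-1', 'raw-unicode-escape'])
# enter_jail('/var/empty', 65534, 65534)   # requires root
```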


As alternatives, you might consider building a root file system in a 
file and mounting it separately on a read-only basis.  You can chroot 
into that without much worry of how it will affect your regular file system.


With btrfs as root, you can create snapshots and chroot into those.  You 
can even mount them separately, read-only if you like, before chrooting.
The advantage of this approach is that the chroot target is built 
"automatically" in the sense that it's a direct clone of your underlying 
root file system, without allowing anything in the underlying root file 
system to be altered.  Files can be changed, but since btrfs is 
copy-on-write, only the files in the snapshot will be changed.


--rich
--
http://mail.python.org/mailman/listinfo/python-list


Re: codecs in a chroot / without fs access

2012-01-10 Thread Miki Tebeka
Another option is to copy the data to a location under the new chroot and 
register a new lookup function 
(http://docs.python.org/library/codecs.html#codecs.register). This way you can 
save some memory.
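Presumably something along these lines: build the CodecInfo objects up
front and serve them from an in-memory table. The 'jailutf8' alias is
invented for illustration (and note the interpreter already caches any
codec once it has been looked up under its real name):

```python
import codecs

# Build CodecInfo objects while the filesystem is still readable and
# serve them under an alias from an in-memory table.
_table = {'jailutf8': codecs.lookup('utf-8')}

def jail_lookup(name):
    # A registered lookup function returns a CodecInfo, or None to
    # let the next registered function have a try.
    return _table.get(name)

codecs.register(jail_lookup)

info = codecs.lookup('jailutf8')   # resolves without touching the fs
```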
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: codecs, csv issues

2008-08-22 Thread John Machin
On Aug 22, 11:52 pm, George Sakkis <[EMAIL PROTECTED]> wrote:
> I'm trying to use codecs.open() and I see two issues when I pass
> encoding='utf8':
>
> 1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
> platform-specific byte(s).
>
> import codecs
> f = codecs.open('tmp.txt', 'w', encoding='utf8')
> s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
> print >> f, s
> print >> f, s
> f.close()

This is documented behaviour:
"""
Note
Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using
8-bit values. This means that no automatic conversion of '\n' is done on
reading and writing.
"""
--
http://mail.python.org/mailman/listinfo/python-list


Re: codecs, csv issues

2008-08-22 Thread Peter Otten
George Sakkis wrote:

> I'm trying to use codecs.open() and I see two issues when I pass
> encoding='utf8':
> 
> 1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
> platform-specific byte(s).
> 
> import codecs
> f = codecs.open('tmp.txt', 'w', encoding='utf8')
> s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
> print >> f, s
> print >> f, s
> f.close()
> 
> This doesn't happen for the default encoding (=None).
> 
> 2) csv.writer doesn't seem to work as expected when being passed a
> codecs object; it treats it as if encoding is ascii:
> 
> import codecs, csv
> f = codecs.open('tmp.txt', 'w', encoding='utf8')
> s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
> # this works fine
> print >> f, s
> # this doesn't
> csv.writer(f).writerow([s])
> f.close()
> 
> Traceback (most recent call last):
> ...
> csv.writer(f).writerow([s])
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
> position 0: ordinal not in range(128)
> 
> Is this the expected behavior or are these bugs ?

Looking into the documentation

"""
Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should be UTF-8 or printable ASCII to be safe; see
the examples in section 9.1.5. These restrictions will be removed in the
future. 
"""

and into the source code

if encoding is not None and \
   'b' not in mode:
    # Force opening of the file in binary mode
    mode = mode + 'b'

I'd be willing to say that both are implementation limitations.
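For what it's worth, both limitations went away in Python 3, where csv
works on str rows and the encoding is given to open() directly; a
sketch of the equivalent there (temp file path is illustrative):

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'tmp.csv')

# Python 3: pass the encoding to open() and let csv handle str rows.
# newline='' is what the csv module's documentation asks for.
with open(path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerow([u'\u0391\u03b8\u03ae\u03bd\u03b1'])

with open(path, encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))
```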

Peter
--
http://mail.python.org/mailman/listinfo/python-list


Re: codecs / subprocess interaction: utf help requested

2007-06-10 Thread smitty1e
On Jun 10, 6:10 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Jun 11, 7:17 am, smitty1e <[EMAIL PROTECTED]> wrote:
>
> > The first print statement does what you'd expect.
> > The second print statement has rather a lot of rat in it.
> > The goal here is to write a function that will return the man page for
> > some command (mktemp used as a short example here) as text to client
> > code, where the groff markup will be chopped to extract all of the
> > command options.  Those options will eventually be used within an
> > emacs mode, all things going swimmingly.
> > I don't know what's going on with the piping in the second version.
> > It looks like the output of p0 gets converted to unicode at some
> > point,
>
> Whatever gave you that idea?
>
> > but I might be misunderstanding what's going on.  The 4.8
> > codecs  module documentation doesn't really offer much enlightment,
> > nor google.  About the only other place I can think to look would be
> > the unit test cases shipped with python.
>
> Get your head out of the red herring factory; unicode, "utf" (which
> one?) and codecs have nothing to do with your problem. Think about
> looking at your own code and at the bzip2 documentation.
>
>
>
> > Sort of hoping one of the guru-level pythonistas can point to
> > illumination, or write something to help out the next chap.  This
> > might be one of those catalytic questions, the answer to which tackles
> > five other questions you didn't really know you had.
> > Thanks,
> > Chris
> > ---
> > #!/usr/bin/python
> > import subprocess
>
> > p = subprocess.Popen(["bzip2", "-c", "-d", "/usr/share/man/man1/mktemp.
> > 1.bz2"]
> > , stdout=subprocess.PIPE)
> > stdout, stderr = p.communicate()
> > print stdout
>
> > p0 = subprocess.Popen(["cat","/usr/share/man/man1/mktemp.1.bz2"],
> > stdout=subprocess.PIPE)
> > p1 = subprocess.Popen(["bzip2"], stdin=p0.stdout,
> > stdout=subprocess.PIPE)
> > stdout, stderr = p1.communicate()
> > print stdout
> > ---
>
> You left out the command-line options for bzip2. The "rat" that you
> saw was the result of compressing the already-compressed man page.
> Read this:http://www.bzip.org/docs.html
> which is a bit obscure. The --help output from my copy of an antique
> (2001, v1.02) bzip2 Windows port explains it plainly:
> """
>If invoked as `bzip2', default action is to compress.
>   as `bunzip2',  default action is to decompress.
>   as `bzcat', default action is to decompress to stdout.
>
>If no file names are given, bzip2 compresses or decompresses
>from standard input to standard output.
> """
>
> HTH,
> John

Don't I feel like the biggest dork on the planet.
I had started with
>cat /usr/share/man/man1/paludis.1.bz2 | bunzip2
then proceeded right to a self-foot-shoot when I went to python.
*sigh*
Thanks for the calibration, sir.
Rm
C

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: codecs / subprocess interaction: utf help requested

2007-06-10 Thread John Machin
On Jun 11, 7:17 am, smitty1e <[EMAIL PROTECTED]> wrote:
> The first print statement does what you'd expect.
> The second print statement has rather a lot of rat in it.
> The goal here is to write a function that will return the man page for
> some command (mktemp used as a short example here) as text to client
> code, where the groff markup will be chopped to extract all of the
> command options.  Those options will eventually be used within an
> emacs mode, all things going swimmingly.
> I don't know what's going on with the piping in the second version.
> It looks like the output of p0 gets converted to unicode at some
> point,

Whatever gave you that idea?

> but I might be misunderstanding what's going on.  The 4.8
> codecs  module documentation doesn't really offer much enlightment,
> nor google.  About the only other place I can think to look would be
> the unit test cases shipped with python.

Get your head out of the red herring factory; unicode, "utf" (which
one?) and codecs have nothing to do with your problem. Think about
looking at your own code and at the bzip2 documentation.

> Sort of hoping one of the guru-level pythonistas can point to
> illumination, or write something to help out the next chap.  This
> might be one of those catalytic questions, the answer to which tackles
> five other questions you didn't really know you had.
> Thanks,
> Chris
> ---
> #!/usr/bin/python
> import subprocess
>
> p = subprocess.Popen(["bzip2", "-c", "-d", "/usr/share/man/man1/mktemp.
> 1.bz2"]
> , stdout=subprocess.PIPE)
> stdout, stderr = p.communicate()
> print stdout
>
> p0 = subprocess.Popen(["cat","/usr/share/man/man1/mktemp.1.bz2"],
> stdout=subprocess.PIPE)
> p1 = subprocess.Popen(["bzip2"], stdin=p0.stdout,
> stdout=subprocess.PIPE)
> stdout, stderr = p1.communicate()
> print stdout
> ---

You left out the command-line options for bzip2. The "rat" that you
saw was the result of compressing the already-compressed man page.
Read this:
http://www.bzip.org/docs.html
which is a bit obscure. The --help output from my copy of an antique
(2001, v1.02) bzip2 Windows port explains it plainly:
"""
   If invoked as `bzip2', default action is to compress.
  as `bunzip2',  default action is to decompress.
  as `bzcat', default action is to decompress to stdout.

   If no file names are given, bzip2 compresses or decompresses
   from standard input to standard output.
"""

HTH,
John
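For the original goal (reading a bzip2'd man page as text), the
subprocess dance can be skipped entirely: the stdlib bz2 module
decompresses in-process. A sketch (a stand-in file is written first,
since the real man page path varies by system):

```python
import bz2
import os
import tempfile

# A stand-in for /usr/share/man/man1/mktemp.1.bz2.
path = os.path.join(tempfile.mkdtemp(), 'mktemp.1.bz2')
with bz2.open(path, 'wt') as f:
    f.write('.TH MKTEMP 1\n')

# 'rt' both decompresses and decodes; no bzip2 binary involved.
with bz2.open(path, 'rt') as f:
    text = f.read()
```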

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: codecs - where are those on windows?

2006-11-04 Thread GHUM

Fredrik Lundh schrieb:

> > If your installation directory is C:\Python25, then look in
> > C:\Python25\lib\encodings
>
> that's only the glue code.  the actual data sets are provided by a bunch
> of built-in modules:
>  >>> import sys
>  >>> sys.builtin_module_names
> ('__builtin__', '__main__', '_ast', '_bisect', '_codecs',
> '_codecs_cn', '_codecs_hk', '_codecs_iso2022', '_codecs_jp',
> '_codecs_kr', '_codecs_tw', ...

So, it should be possible to do a custom build of python24.dll /
python25.dll without some of those codecs, resulting in a smaller
python24.dll ?

It will be some time until my apps must support Chinese and
Japanese...

Harald

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: codecs - where are those on windows?

2006-10-30 Thread Fredrik Lundh
Paul Watson wrote:

>> So, my question is: on Windows. where are those CJK codecs? Are they by
>> any chance included in the 1.867.776 bytes of python24.dll ?
> 
> If your installation directory is C:\Python25, then look in
> 
> C:\Python25\lib\encodings

that's only the glue code.  the actual data sets are provided by a bunch 
of built-in modules:

 >>> import sys
 >>> sys.builtin_module_names
('__builtin__', '__main__', '_ast', '_bisect', '_codecs',
'_codecs_cn', '_codecs_hk', '_codecs_iso2022', '_codecs_jp',
'_codecs_kr', '_codecs_tw', ...
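Which of those are compiled in can be checked directly; on builds that
link the CJK codecs statically (such as the Windows DLL), they show up
in this list too:

```python
import sys

# '_codecs' (the core machinery) is always built in; the CJK data
# modules ('_codecs_cn' etc.) appear only on builds that compile
# them statically rather than as shared libraries.
cjk = [m for m in sys.builtin_module_names if m.startswith('_codecs_')]
core_present = '_codecs' in sys.builtin_module_names
```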



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: codecs - where are those on windows?

2006-10-30 Thread Paul Watson
GHUM wrote:
> I stumbled upon a paragraph in python-dev about "reducing the size of
> Python" for an embedded device:
> 
> """
> In my experience, the biggest gain can be obtained by dropping the
> rarely-used
> CJK codecs (for Asian languages). That should sum up to almost 800K
> (uncompressed), IIRC.
> """
> 
> So, my question is: on Windows. where are those CJK codecs? Are they by
> any chance included in the 1.867.776 bytes of python24.dll ?
> 
> Best wishes,
> 
> Harald

If your installation directory is C:\Python25, then look in

C:\Python25\lib\encodings
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: codecs

2005-11-15 Thread André Malo
* TK <[EMAIL PROTECTED]> wrote:

> sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)
> What does this line mean?

"Wrap stdout with an UTF-8 stream writer".
See the codecs module documentation for details.
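Unpacked a little: codecs.lookup() returns a CodecInfo tuple whose
last element is the StreamWriter class, so indexing with [-1] and
calling it wraps a byte stream. The same trick against an in-memory
stream instead of sys.stdout:

```python
import codecs
import io

buf = io.BytesIO()
# codecs.lookup(...)[-1] is the StreamWriter class, i.e. the same
# object as codecs.lookup('utf-8').streamwriter.
writer = codecs.lookup('utf-8')[-1](buf)
writer.write(u'\u00c4')   # one 'Ä' becomes two UTF-8 bytes
```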

nd
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Codecs

2005-07-10 Thread John Machin
Ivan Van Laningham wrote:
> 
> It seems to me that if I want to try to read an unknown file
> using an exhaustive list of possible encodings ...


Supposing such a list existed:

What do you mean by "unknown file"? That the encoding is unknown?

Possibility 1:
You are going to try to decode the file from "legacy" to Unicode -- 
until the first 'success' (defined how?)? But the file could be decoded 
by *several* codecs into Unicode without an exception being raised. Just 
a simple example: the encodings ['iso-8859-' + x for x in '12459'] 
each map *all* 256 possible byte values to a character.

There are various language-guessing algorithms based on e.g. frequency 
of ngrams ... try Google.
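The ambiguity is easy to demonstrate: the same byte decodes
"successfully" under each of those codecs, just to different
characters:

```python
# One byte, several 'successful' decodes: these latin codecs assign a
# character to every byte value, so decoding can never fail.
data = b'\xe9'
as_latin1 = data.decode('iso-8859-1')    # 'é'
as_cyrillic = data.decode('iso-8859-5')  # 'щ'
results = {enc: data.decode(enc)
           for enc in ('iso-8859-1', 'iso-8859-2',
                       'iso-8859-4', 'iso-8859-5')}
```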

Possibility 2:
You "know" the file is in a Unicode-encoding e.g. utf-8, have 
successfully decoded it to Unicode, and are going to try to encode the 
file in a "legacy" encoding but you don't know which one is appropriate?
Sorry, same "But".



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Codecs

2005-07-10 Thread Martin v. Löwis
Ivan Van Laningham wrote:
> Hi All--
> As far as I can tell, after looking only at the documentation (and not
> searching peps etc.), you cannot query the codecs to give you a list of
> registered codecs, or a list of possible codecs it could retrieve for
> you if you knew enough to ask for them by name.
> 
> Why not? 

There are several answers to that question. Which of them is true,
I don't know. In order of likelihood:
1. When the API was designed, that functionality was forgotten.
   It was not possible to add it later on (because of 2)
2. Registration builds on the notion of lookup functions. The
   lookup function gets a codec name, and either succeeds in
   finding the codec, or raises an exception.
   Now, a lookup function, in principle, might not "know" in
   advance what codecs it supports, or the number of encodings
   it supports might not be finite. So asking such a lookup
   function for the complete list of codecs might not be
   implementable.

   As an example of a lookup function that doesn't know what
   encodings it supports, look at my iconv module. This module
   provides all codecs that iconv_open(3) supports, yet there
   is no standard way to query the iconv library in advance
   for a list of all supported codecs.

   As an example for a lookup function that supports an infinite
   number of codecs, consider the (theoretical) encrypt/password
   encoding, which encrypts a string with a password, and the
   password is part of the codec name. Each password defines
   a new encoding, and there is an infinite number of them.

Now, if 1) would have been considered, it might have been possible
to design the API in a way that didn't support all cases that
the current API supports. Alas, somebody must have misplaced
the time machine.

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list