regular expressions and the LOCALE flag

2010-08-03 Thread Baz Walter
the python docs say that re.LOCALE makes certain character classes 
dependent on the current locale.


here's what i currently see on my system:

>>> import re, locale
>>> locale.getdefaultlocale()
('en_GB', 'UTF8')
>>> locale.getlocale()
(None, None)
>>> re.findall(r'\w', u'a b c \xe5 \xe6 \xe7', re.L)
[u'a', u'b', u'c']
>>> locale.setlocale(locale.LC_ALL, 'en_GB.ISO 8859-1')
'en_GB.ISO 8859-1'
>>> re.findall(r'\w', u'\xe5 \xe6 \xe7 a b c', re.L)
[u'\xe5', u'\xe6', u'\xe7', u'a', u'b', u'c']
>>> locale.setlocale(locale.LC_ALL, 'en_GB.UTF-8')
'en_GB.UTF-8'
>>> re.findall(r'\w', u'a b c \xe5 \xe6 \xe7', re.L)
[u'a', u'b', u'c']

it seems wrong to me that re.LOCALE fails to give the right result 
when the local encoding is utf8 - i think it should give the same result 
as re.UNICODE.


is this a bug, or does the documentation just need to be made clearer?
--
http://mail.python.org/mailman/listinfo/python-list


Re: regular expressions and the LOCALE flag

2010-08-03 Thread Baz Walter

On 03/08/10 19:40, MRAB wrote:

Baz Walter wrote:

the python docs say that re.LOCALE makes certain character classes
dependent on the current locale.


re.LOCALE just passes the character to the underlying C library. It
really only works on bytestrings which have 1 byte per character.


the re docs don't specify 8-bit encodings: they just refer to the 
'current locale'.



And, BTW, none of your examples pass a UTF-8 bytestring to re.findall:
all those string literals starting with the 'u' prefix are Unicode
strings!


not sure what you mean by this: if the string was encoded as utf8, '\w' 
still wouldn't match any of the non-ascii characters.



Locale encodings are more trouble than they're worth. Unicode is better.
:-)


yes, i'm really just trying to decide whether i should offer 'locale' as 
an option in my program. given the unintuitive way re.LOCALE works, i'm 
not sure that i should.


are you saying that it only really makes sense for *bytestrings* to be 
used with re.LOCALE?


if so, the re docs certainly don't make that clear.


Re: regular expressions and the LOCALE flag

2010-08-03 Thread Baz Walter

On 03/08/10 21:24, MRAB wrote:

And, BTW, none of your examples pass a UTF-8 bytestring to re.findall:
all those string literals starting with the 'u' prefix are Unicode
strings!


not sure what you mean by this: if the string was encoded as utf8,
'\w' still wouldn't match any of the non-ascii characters.


Strings with the 'u' prefix are Unicode strings, not bytestrings. They
don't have an encoding.


well, they do if they are given one, as i suggested!

to be explicit, if the local encoding is 'utf8', none of the following 
will get a hit:


(1) re.findall(r'\w', '\xe5 \xe6 \xe7', re.L)
(2) re.findall(r'\w', u'\xe5 \xe6 \xe7'.encode('utf8'), re.L)
(3) re.findall(r'\w', u'\xe5 \xe6 \xe7', re.L)

so i still don't know what you meant about passing a 'UTF-8 bytestring' 
in your first comment :)


only (3) could feasibly get a hit - and then only if the re module was 
smart enough to fall back to re.UNICODE for utf8 (and any other 
encodings of unicode it might know about).
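A minimal Python 3 sketch of the fallback proposed for case (3). `findall_with_locale_fallback` is a hypothetical helper, not something the re module provides: it decodes a bytestring using the encoding returned by `locale.getpreferredencoding(False)` (or an explicit one) and then relies on the Unicode semantics that Python 3 str patterns have by default.

```python
import locale
import re


def findall_with_locale_fallback(pattern, data, encoding=None):
    """Hypothetical helper, not part of the re module: decode a bytestring
    with the locale's encoding, then match with Unicode semantics."""
    if encoding is None:
        # The encoding the current locale prefers for text data.
        encoding = locale.getpreferredencoding(False)
    text = data.decode(encoding)
    # Python 3 str patterns are Unicode-aware by default (like re.UNICODE).
    return re.findall(pattern, text)


utf8_bytes = '\xe5 \xe6 \xe7'.encode('utf-8')
print(findall_with_locale_fallback(r'\w', utf8_bytes, encoding='utf-8'))
# ['å', 'æ', 'ç']
```

Passing the encoding explicitly keeps the example independent of the machine's locale settings.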



2. LOCALE: bytestring with characters in the current locale (but only 1
byte per character). Characters are categorised according to the
underlying C library; for example, 'a' is a letter if isalpha('a')
returns true.
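The "1 byte per character" restriction can be illustrated in Python 3 (assumed 3.6 or later, where re.LOCALE is accepted only for bytes patterns). 'å' takes one byte in Latin-1 but two in UTF-8, and a per-byte classifier sees those two bytes separately.

```python
import re

# 'å' (U+00E5) is one byte in Latin-1 but two bytes in UTF-8.
latin1 = '\xe5'.encode('latin-1')
utf8 = '\xe5'.encode('utf-8')
print(len(latin1), len(utf8))  # 1 2

# With re.LOCALE, \w is decided byte by byte using the C library's
# classification, so the two UTF-8 bytes of 'å' are examined one at a time.
print(re.findall(rb'\w', b'abc', re.LOCALE))  # [b'a', b'b', b'c']

# Since Python 3.6, re.LOCALE is rejected for str patterns.
try:
    re.compile(r'\w', re.LOCALE)
    str_pattern_rejected = False
except ValueError:
    str_pattern_rejected = True
print(str_pattern_rejected)  # True
```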


this is actually what my question was about. i suspected something like 
this might be the case, but i can't actually see it stated anywhere in 
the docs. maybe it's just me, but 'current locale' doesn't naturally 
imply 'only 8-bit encodings'. i would have thought it implied 'whatever 
encoding is discovered on the local system' - and these days, that's 
very commonly utf8.


is there actually a use case for it working the way it currently does? 
it seems just broken to have it depending so heavily on implementation 
details.



3. UNICODE (default in Python 3): Unicode string.


i've just read the python3 re docs, and they do now make an explicit 
distinction between matching bytes (with the new re.ASCII flag) and 
matching textual characters (i.e. unicode, the default). the re.LOCALE 
flag is still there, and there are now warnings about its unreliability 
- but it still doesn't state that it can only work properly if the local 
encoding is 8-bit.
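A short Python 3 sketch of that distinction, reusing the string from the first post. No locale is involved: str patterns match Unicode word characters by default, and re.ASCII restricts `\w` to `[a-zA-Z0-9_]`.

```python
import re

text = 'a b c \xe5 \xe6 \xe7'

# Default for str patterns in Python 3: Unicode word characters.
unicode_matches = re.findall(r'\w', text)
# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_matches = re.findall(r'\w', text, re.ASCII)

print(unicode_matches)  # ['a', 'b', 'c', 'å', 'æ', 'ç']
print(ascii_matches)    # ['a', 'b', 'c']
```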



Re: strange interaction between open and cwd

2010-05-05 Thread Baz Walter

On 05/05/10 07:24, Nobody wrote:

On Wed, 05 May 2010 02:41:09 +0100, Baz Walter wrote:


i think the algorithm also can't guarantee the intended result when
crossing filesystem boundaries. IIUC, a stat() call on the root directory
of a mounted filesystem will give the same inode number as its parent.


Nope; it will have the same dev/inode pair as if it wasn't mounted, i.e.
the device will refer to the mounted device, not the device it's mounted
on, and the inode will be the mounted filesystem's root inode (typically
#2 for Linux ext2/ext3 filesystems).

And stat()ing the appropriate entry in the parent directory will return
the same information, i.e. the root inode of the mounted device, not the
subdirectory of the parent device (as you would see if the filesystem was
unmounted).


yes, that's actually what i meant (but probably put badly as usual).


IOW, if stat(foo) reports a different device to stat(.), foo
is a mount point, while if stat(..) reports a different device to
stat(.), the current directory is the root of a mounted filesystem.
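The device comparison just described can be sketched in Python as follows. `is_fs_root` is a hypothetical name; the standard `os.path.ismount` performs essentially the same check, although it uses `lstat` and treats symlinks as never being mount points.

```python
import os


def is_fs_root(directory):
    """Hypothetical helper: True if `directory` is the root of a mounted
    filesystem, decided by comparing it with its '..' entry."""
    here = os.stat(directory)
    parent = os.stat(os.path.join(directory, '..'))
    # A different device means another filesystem is mounted here;
    # an identical (dev, ino) pair means '..' loops back, i.e. '/'.
    return (here.st_dev != parent.st_dev
            or here.st_ino == parent.st_ino)


print(is_fs_root('/'))                          # True
print(is_fs_root('/') == os.path.ismount('/'))  # True
```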


so
if several filesystems are mounted in the same parent directory, there is
no way to tell which of them is the right one.


The only case which would cause a problem here is if you mount the same
device on two different subdirectories of a common directory. But in that
case, it doesn't really matter which answer you get, as they're both
equivalent in any sense that matters.


nope! just to be clear:

here's what i get on my system, where '/dev/sda1' and '/dev/sda6' are 
mounted at '/boot' and '/home' respectively:


>>> os.stat('/').st_ino
2L
>>> os.stat('/usr').st_ino
212993L
>>> os.stat('/boot').st_ino
2L
>>> os.stat('/home').st_ino
2L


if the algorithm is climbing up from '/home/baz/tmp/xxx', what does it 
do when it searches os.listdir('../../../..')? how can it tell whether 
'boot' or 'home' is the correct next parent if it only checks the inode 
number? i think the algorithm would at least need to take account of 
changes in the current device id. not sure whether that would be enough 
to cover all cases, though.
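Inode numbers alone collide across filesystems, as the listing shows, but a `(st_dev, st_ino)` pair identifies a directory uniquely on a running system. A minimal sketch follows. `same_directory` is a hypothetical name, and `os.path.samestat` and `os.path.samefile` perform the same comparison.

```python
import os


def same_directory(a, b):
    """True if both paths name the same directory: compare (st_dev, st_ino)
    pairs, since st_ino alone repeats across mounted filesystems."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


# Show the pair for each of these paths that exists on this machine.
for path in ('/', '/boot', '/home'):
    if os.path.isdir(path):
        st = os.stat(path)
        print(path, (st.st_dev, st.st_ino))

print(same_directory('/', '/..'))  # True
```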



Re: strange interaction between open and cwd

2010-05-04 Thread Baz Walter

On 04/05/10 02:12, Ben Finney wrote:

Baz Walter <baz...@ftml.net> writes:


On 03/05/10 18:41, Grant Edwards wrote:

Firstly, a file may have any number of paths (including 0).


yes, of course. i forgot about hard links


Rather, you forgot that *every* entry that references a file is a hard
link.


i'm not a frequent poster on this list, but i'm aware of its reputation 
for pointless pedantry ;-)


but what the heck, when in rome...

note that i said hard links (plural) - i think a more generous reader 
would assume i was referring to additional hard links.




Re: strange interaction between open and cwd

2010-05-04 Thread Baz Walter

On 04/05/10 03:19, Grant Edwards wrote:

On 2010-05-03, Baz Walter <baz...@ftml.net> wrote:

On 03/05/10 19:12, Grant Edwards wrote:

Even though the user provided a legal and openable path?


that sounds like an operational definition to me: what's the
difference between legal and openable?


Legal as in meets the syntactic requirements for a path (not sure if
there really are any requirements other than it being a
null-terminated string).  Openable meaning that it denotes a path file
that exists and for which the caller has read permissions on the file
and execute permissions on the directories within the path.


openable is not the same as accessible. a file can still be openable, even 
though a user may not have permission to access it.


a better definition of legal path might be whether any useful 
information can be gained from a stat() call on it.



Re: strange interaction between open and cwd

2010-05-04 Thread Baz Walter

On 04/05/10 09:23, Gregory Ewing wrote:

Grant Edwards wrote:


In your example, it's simply not possible to determine the file's
absolute path within the filesystem given the relative path you
provided.


Actually, I think it *is* theoretically possible to find an
absolute path for the file in this case.

I suspect that what realpath() is doing for a relative path is
something like:

1. Use getcwd() to find an absolute path for the current
directory.
2. Chop off a trailing pathname component for each ..
on the front of the original path.
3. Tack the filename on the end of what's left.

Step 1 fails because the current directory no longer has
an absolute pathname -- specifically, it has no name in
what used to be its parent directory.
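The three steps guessed at above can be sketched like this. `naive_realpath` is a hypothetical name, and this is an illustration of the described procedure, not CPython's or libc's actual realpath. Step 1 is where it breaks: once the current directory has been removed, `os.getcwd()` raises (FileNotFoundError in Python 3 on Linux) even though `..` is still reachable.

```python
import os


def naive_realpath(relpath):
    """Sketch of the three steps described above (hypothetical, not the
    real realpath): getcwd(), drop one component per leading '..',
    then append whatever is left."""
    cwd = os.getcwd()                # step 1: fails if cwd was removed
    parts = relpath.split('/')
    while parts and parts[0] == '..':
        cwd = os.path.dirname(cwd)   # step 2: chop one trailing component
        parts.pop(0)
    return os.path.join(cwd, *parts)  # step 3: tack the rest on the end


print(naive_realpath('../abc.txt'))
```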

What realpath() is failing to realise is that it doesn't
actually need to know the full path of the current directory,
only of its parent directory, which is still reachable via
.. (if it weren't, the file wouldn't be reachable either,
and we wouldn't be having this discussion).

A smarter version of realpath() wouldn't try to find the
path of the *current* directory, but would follow the
.. links until it got to a directory that it did need to
know an absolute path for, and start with that.

Unfortunately, there is no C stdlib routine that does the
equivalent of getcwd() for an arbitrary directory, so
this would require realpath() to duplicate much of
getcwd()'s functionality, which is probably why it's
done the way it is.


actually, this part of the problem can be achieved using pure python. 
given the basename of a file, all you have to do is use os.stat and 
os.listdir to recursively climb up the tree and build a dirpath for it. 
start by doing os.stat(basename) to make sure you have a legal file in 
the current directory; then use os.stat('..') to get the parent 
directory inode, and stat each of the items in os.listdir('../..') to 
find a name matching that inode etc. (note that the possibility of 
hard-linked directories doesn't really spoil this - for relative paths, 
we don't care exactly which absolute path is found).


this will work so long as the file is in a part of the filesystem that 
can be traversed from the current directory to the root. what i'm not 
sure about is whether it's possible to cross filesystem boundaries using 
this kind of technique.
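A minimal Python 3 sketch of the stat/listdir climb described above, under stated assumptions: both function names are hypothetical, matching uses `(st_dev, st_ino)` pairs rather than bare inode numbers so that several filesystems mounted under one parent can be told apart, it needs read permission on every ancestor, and it assumes nothing is renamed or remounted while it runs. Following the point made about realpath, a file path is resolved starting from its *parent* directory, so a removed current directory does not matter for paths like `'../abc.txt'`.

```python
import os


def dir_path_by_climbing(rel_dir):
    """Build an absolute path for the directory reachable as `rel_dir`
    by following '..' links, without calling os.getcwd()."""
    names = []
    cur = rel_dir
    while True:
        here = os.stat(cur)
        parent = os.path.join(cur, '..')
        up = os.stat(parent)
        if (here.st_dev, here.st_ino) == (up.st_dev, up.st_ino):
            break  # '..' of the root directory is the root itself
        for name in os.listdir(parent):
            try:
                st = os.lstat(os.path.join(parent, name))
            except OSError:
                continue  # entry vanished or is inaccessible: skip it
            # Compare (device, inode), not the inode alone.
            if (st.st_dev, st.st_ino) == (here.st_dev, here.st_ino):
                names.append(name)
                break
        else:
            raise FileNotFoundError(f'no entry for {cur!r} in its parent')
        cur = parent
    return '/' + '/'.join(reversed(names))


def path_by_climbing(path):
    """Resolve a file path by climbing from the file's parent directory."""
    head, tail = os.path.split(path)
    return os.path.join(dir_path_by_climbing(head or '.'), tail)
```

Because `os.lstat` does not follow symlinks, the result is a physical path, which is in the spirit of what realpath returns. With hard-linked directories, the first matching entry wins.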




Re: strange interaction between open and cwd

2010-05-04 Thread Baz Walter

On 04/05/10 09:08, Gregory Ewing wrote:

Grant Edwards wrote:


except that Python objects can form a generalized graph, and Unix
filesystems are constrained to be a tree.


Actually I believe that root is allowed to create arbitrary
hard links to directories in Unix, so it's possible to turn
the file system into a general graph. It's highly
unrecommended, though, because it confuses the heck out of
programs that recursively traverse directories (which is
why only root is allowed to do it).


i think there are versions of mac osx that use hard-linked directories 
in their backup systems.



Re: [OT] strange interaction between open and cwd

2010-05-04 Thread Baz Walter

On 04/05/10 03:25, Grant Edwards wrote:

On 2010-05-04, Charles <c.sand...@deletethis.bom.gov.au> wrote:


I don't see how it's inelegant at all.  Perhaps it's
counter-intuitive if you don't understand how a Unix filesystem
works, but the underlying filesystem model is very simple, regular,
and elegant.


but probably makes some bit of the OS's job slightly easier and is
usually good enough in practice. Pragmatism is a bitch sometimes. :-)




I agree that the Unix file system is quite elegant, but can be
counter-intuitive for people who are used to the one file, one name
paradigm.


I guess I've been using Unix for too long (almost 30 years).  I don't
think I was consciously aware of a one file, one name paradigm.  Is
that a characteristic of Dos, Windows or Mac filesystems?


[...]



In the OP's case, references to the directory have been removed from
the file system, but his process still has the current working
directory reference to it, so it has not actually been deleted. When
he opens ../abc.txt, the OS searches the current directory for ..
and finds the inode for /home/baz/tmp, then searches that directory
(/home/baz/tmp) for abc.txt and finds it.


Exactly.  I probably should have taken the time to explain that as
well as you did.  One forgets that there are a lot of new Unix users
who've never been taught how the filesystem works.


actually, what i failed to grok is that whatever '..' refers to is all 
that is needed to find the file 'abc.txt' *even if the current directory 
has been deleted*. an absolute path just isn't needed. this has really 
got nothing to do with how unix filesystems work per se or one file, 
one name, but more to do with simple reference counting. when i 
mentioned in my original post how windows handles attempts to delete the 
cwd differently, i should have put two and two together and figured this 
all out for myself. but i don't think it's immediately obvious what the 
consequences of (re)moving the cwd are, even if you've been using unix 
filesystems for a while. i know for a fact that i have used several 
linux programs in the past that didn't handle this possibility 
gracefully, so it's not just new users that can be caught out by this.



Re: strange interaction between open and cwd

2010-05-04 Thread Baz Walter

On 05/05/10 00:44, Nobody wrote:

On Tue, 04 May 2010 14:36:06 +0100, Baz Walter wrote:


this will work so long as the file is in a part of the filesystem that can
be traversed from the current directory to the root. what i'm not sure
about is whether it's possible to cross filesystem boundaries using this
kind of technique.


At least on Linux, the kernel fixes the links at mount points, i.e.
within the root directory of a mounted filesystem, .. refers to
the directory containing the mount point on the parent filesystem, while
the mount point refers to the root directory of the mounted filesystem.

This also appears to work correctly for bind mounts (mounting an arbitrary
directory to another directory, which results in a directory hierarchy
appearing at multiple locations within the filesystem), i.e. .. refers
to the appropriate directory for each instance.

OTOH, the algorithm can fail if a directory is moved (whether by rename()
or remounting) between the stat(..) and the listdir().


i think the algorithm also can't guarantee the intended result when 
crossing filesystem boundaries. IIUC, a stat() call on the root 
directory of a mounted filesystem will give the same inode number as its 
parent. so if several filesystems are mounted in the same parent 
directory, there is no way to tell which of them is the right one.



strange interaction between open and cwd

2010-05-03 Thread Baz Walter

Python 2.6.4 (r264:75706, Mar  7 2010, 02:18:40)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.mkdir('/home/baz/tmp/xxx')
>>> f = open('/home/baz/tmp/abc.txt', 'w')
>>> f.write('abc')
>>> f.close()
>>> os.chdir('/home/baz/tmp/xxx')
>>> os.getcwd()
'/home/baz/tmp/xxx'
>>> os.rmdir(os.getcwd())
>>> os.getcwd()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 2] No such file or directory
>>> open('../abc.txt').read()
'abc'


can anybody explain how python is able to read the file at the end of 
this session? i'm guessing it's a platform specific thing as i'm pretty 
sure the above sequence of commands wouldn't work on windows (i.e. 
attempting to remove the cwd would produce an error). but how can python 
determine the parent directory of a directory that no longer exists?


this actually caused a bug for me. i was trying to ensure that my 
program always resolved any file-names given on the command line by 
using os.path.realpath(). i had assumed that if realpath failed, then 
open would also fail - but not so!
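A Python 3 reconstruction of the session above, assuming Linux behaviour and using a temporary directory instead of /home/baz/tmp. In Python 3 the ENOENT from `os.getcwd()` surfaces as FileNotFoundError, a subclass of OSError. The function name is hypothetical.

```python
import os
import shutil
import tempfile


def open_after_removing_cwd():
    """Recreate the session above in a temporary directory and return
    (getcwd_failed, contents). Assumes Linux-like behaviour."""
    start = os.getcwd()
    base = tempfile.mkdtemp()
    sub = os.path.join(base, 'xxx')
    os.mkdir(sub)
    with open(os.path.join(base, 'abc.txt'), 'w') as f:
        f.write('abc')
    os.chdir(sub)
    try:
        os.rmdir(sub)  # remove the current working directory
        try:
            os.getcwd()
            getcwd_failed = False
        except FileNotFoundError:  # ENOENT, as in the traceback above
            getcwd_failed = True
        # The removed directory's '..' link is still followed, so the
        # relative open succeeds even though getcwd() cannot.
        with open('../abc.txt') as f:
            contents = f.read()
    finally:
        os.chdir(start)
        shutil.rmtree(base)
    return getcwd_failed, contents


print(open_after_removing_cwd())  # on Linux: (True, 'abc')
```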



Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 14:18, Chris Rebert wrote:

Whether or not /home/baz/tmp/xxx/ exists, we know from the very
structure and properties of directory paths that its parent directory
is, *by definition*, /home/baz/tmp/ (just chop off everything after
the second-to-last slash). I would assume this is what happens
internally.
How exactly this interacts with, say, moving the directory to a new
location rather than deleting it, I don't know; again, it would quite
likely be platform-specific.


but how does '..' get resolved in the relative path '../abc.txt'? i'm 
assuming python must initially use getcwd() internally to do this, and 
then if that fails it falls back on something else. but what is that 
something else? is it something that is reproducible in pure python?



Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 14:46, Peter Otten wrote:

Baz Walter wrote:


attempting to remove the cwd would produce an error). but how can python
determine the parent directory of a directory that no longer exists?


My tentative explanation would be that the directory, namely the inode,
still exists -- only the entry for it in its parent directory is gone.

So one level up from here is still a valid operation, but there is no
longer a path in the file system associated with here.


so here must always be available somehow, even if getcwd() fails 
(something like the environment variable $PWD). shame that 
os.getenv('PWD') isn't reliable, as it would solve my issue :(
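$PWD is only what the shell last set, and `os.chdir` does not update it. On Linux specifically, the kernel's own record of the current directory can be read from `/proc/self/cwd`. A minimal, Linux-only sketch follows. `cwd_from_proc` is a hypothetical name, and for a removed directory the kernel appends ' (deleted)' to the old path.

```python
import os
import tempfile


def cwd_from_proc():
    """Linux-only: the kernel's record of the current directory. A removed
    directory is reported with ' (deleted)' appended."""
    return os.readlink('/proc/self/cwd')


start = os.getcwd()
removed = tempfile.mkdtemp()
os.chdir(removed)
try:
    os.rmdir(removed)
    reported = cwd_from_proc()  # os.getcwd() would raise here instead
finally:
    os.chdir(start)

print(reported)
```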








Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 14:56, Chris Rebert wrote:

but how does '..' get resolved in the relative path '../abc.txt'? i'm
assuming python must initially use getcwd() internally to do this, and then
if that fails it falls back on something else. but what is that something
else? is it something that is reproducible in pure python?


I would think that the OS system call, not Python itself, does the
relative-absolute conversion.


so there is a discrepancy between some of the std library path functions 
(like realpath, getcwd, abspath) and the built-in open function. there 
are files which can be opened for which it is impossible to resolve 
their full paths (on some platforms).



Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 15:56, Grant Edwards wrote:

On 2010-05-03, Baz Walter <baz...@ftml.net> wrote:

On 03/05/10 14:46, Peter Otten wrote:

Baz Walter wrote:


attempting to remove the cwd would produce an error). but how can python
determine the parent directory of a directory that no longer exists?


My tentative explanation would be that the directory, namely the inode,
still exists -- only the entry for it in its parent directory is gone.

So one level up from here is still a valid operation, but there is no
longer a path in the file system associated with here.


so here must always be available somehow,


Yes.


even if getcwd() fails


If the current working directory doesn't _have_ a path within a
filesystem, what do you expect it to do?


well, i expect it to fail, like i said :)


(something like the environment variable $PWD). shame that
os.getenv('PWD') isn't reliable, as it would solve my issue :(


I don't understand what you mean by that.


i'm trying to understand how the path of the cwd can be known if there 
is no entry for it in the filesystem - but this is starting to get a 
little OT, so i won't pursue it here any longer.






Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 15:55, Grant Edwards wrote:

On 2010-05-03, Baz Walter <baz...@ftml.net> wrote:

On 03/05/10 14:56, Chris Rebert wrote:

but how does '..' get resolved in the relative path '../abc.txt'? i'm
assuming python must initially use getcwd() internally to do this, and then
if that fails it falls back on something else. but what is that something
else? is it something that is reproducible in pure python?


I would think that the OS system call, not Python itself, does the
relative-absolute conversion.


so there is a discrepancy between some of the std library path functions
(like realpath, getcwd, abspath) and the built-in open function.


Not really.  There is a discrepancy between your perception and
expectations and the way the Unix filesystem works.


it's a fact that realpath/abspath etc can fail for paths that 
don't fail when used with os.stat or the builtin open function. i think it's 
reasonable to expect that a path that can be used to successfully open a 
file won't then produce "No such file or directory" errors when used with 
an os.path function like realpath. it shouldn't be necessary to have 
detailed knowledge of the underlying filesystem to be able to use os.path 
- it's supposed to be generic.



there are files which can be opened for which it is impossible to
resolve their full paths (on some platforms).


Sort of.  The file in question _has_ a full path, you just can't tell
what it is based on the path you used to open it.


yes, that's exactly what i was trying to demonstrate in my OP. i can use 
python to open a file; but under certain circumstances, there seems to 
be no guarantee that i can then use python to locate that file in the 
filesystem.



Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 15:24, Grant Edwards wrote:

On 2010-05-03, Baz Walter <baz...@ftml.net> wrote:

On 03/05/10 14:18, Chris Rebert wrote:

Whether or not /home/baz/tmp/xxx/ exists, we know from the very
structure and properties of directory paths that its parent directory
is, *by definition*, /home/baz/tmp/ (just chop off everything after
the second-to-last slash). I would assume this is what happens
internally.
How exactly this interacts with, say, moving the directory to a new
location rather than deleting it, I don't know; again, it would quite
likely be platform-specific.


but how does '..' get resolved in the relative path '../abc.txt'?


The current directory has an entry named '..' that points to the
parent directory.


i'm assuming python must initially use getcwd() internally to do
this,


Nope.  Python just passes the string '../abc.txt' to libc's open()
function, and that in turn passes it on to the Unix/Linux open()
syscall, which follows the link in the current working directory named
'..'.


and then if that fails it falls back on something else. but what is
that something else? is it something that is reproducible in pure
python?


None of this has anything at all to do with Python.


i think what i'm asking for is a python function that, given, say, a 
valid file descriptor, can return the file's full path. would such a 
thing even be possible?
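There is no portable answer, but on Linux specifically `/proc/self/fd/<n>` is a symlink to the path the kernel associates with an open descriptor. A minimal, Linux-only sketch follows. `path_from_fd` is a hypothetical name, and for a file with several hard links the result is typically just the name it was opened through.

```python
import os
import tempfile


def path_from_fd(fd):
    """Linux-only sketch: return the path /proc reports for an open file
    descriptor. With several hard links this is only one of the file's
    names, and a removed file gets ' (deleted)' appended."""
    return os.readlink(f'/proc/self/fd/{fd}')


with tempfile.NamedTemporaryFile() as tmp:
    same = path_from_fd(tmp.fileno()) == os.path.realpath(tmp.name)

print(same)  # True on Linux
```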



Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 18:12, Grant Edwards wrote:

On 2010-05-03, Baz Walter <baz...@ftml.net> wrote:

Sort of.  The file in question _has_ a full path, you just can't tell
what it is based on the path you used to open it.


yes, that's exactly what i was trying to demonstrate in my OP. i can
use python to open a file; but under certain circumstances, there
seems to be no guarantee that i can then use python to locate that
file in the filesystem.


Exactly.

In your example, it's simply not possible to determine the file's
absolute path within the filesystem given the relative path you
provided.

You requested something that wasn't possible.  It failed.  What do you
think should have happened?


path = '../abc.txt'

os.path.realpath(path) -> OSError: [Errno 2] No such file or directory

therefore:

open(path) -> IOError: [Errno 2] No such file or directory

i think that if the first of these seemingly impossible requests 
fails, it is reasonable to expect that the second one also fails. but 
the second one (sometimes) doesn't.


i think they should always either both succeed, or both fail.


Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 18:41, Grant Edwards wrote:

On 2010-05-03, Baz Walter <baz...@ftml.net> wrote:


i think what i'm asking for is a python function that, given, say, a
valid file descriptor, can return the file's full path.


Firstly, a file may have any number of paths (including 0).


yes, of course. i forgot about hard links - that pretty much kills that 
idea. oh well.



Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 19:05, Chris Rebert wrote:

On Mon, May 3, 2010 at 10:45 AM, Baz Walter <baz...@ftml.net> wrote:

On 03/05/10 18:12, Grant Edwards wrote:

On 2010-05-03, Baz Walter <baz...@ftml.net> wrote:

Sort of.  The file in question _has_ a full path, you just can't tell
what it is based on the path you used to open it.


yes, that's exactly what i was trying to demonstrate in my OP. i can
use python to open a file; but under certain circumstances, there
seems to be no guarantee that i can then use python to locate that
file in the filesystem.


Exactly.

In your example, it's simply not possible to determine the file's
absolute path within the filesystem given the relative path you
provided.

You requested something that wasn't possible.  It failed.  What do you
think should have happened?


path = '../abc.txt'

os.path.realpath(path) -> OSError: [Errno 2] No such file or directory

therefore:

open(path) -> IOError: [Errno 2] No such file or directory

i think that if the first of these seemingly impossible requests fails, it
is reasonable to expect that the second one also fails. but the second one
(sometimes) doesn't.

i think they should always either both succeed, or both fail.


Well, that's Unix and Worse-is-Better[1] for ya. Inelegant
theoretically, but probably makes some bit of the OS's job slightly
easier and is usually good enough in practice. Pragmatism is a bitch
sometimes. :-)


yeah, i probably should have added in an ideal world or something :)


Re: strange interaction between open and cwd

2010-05-03 Thread Baz Walter

On 03/05/10 19:12, Grant Edwards wrote:

On 2010-05-03, Baz Walter <baz...@ftml.net> wrote:


You requested something that wasn't possible.  It failed.  What do
you think should have happened?


path = '../abc.txt'

os.path.realpath(path) -> OSError: [Errno 2] No such file or directory

therefore:

open(path) -> IOError: [Errno 2] No such file or directory

i think that if the first of these seemingly impossible requests
fails, it is reasonable to expect that the second one also fails. but
the second one (sometimes) doesn't.


Because the second one isn't impossible in the case you posted.



i think they should always either both succeed, or both fail.


That's not how Unix filesystems work.

Are you saying that Python should add code to its open() builtin
which calls realpath() and then refuses to open files for which
realpath() fails?


my original question was really about *how* python does the seemingly 
impossible. i had hoped there might be a way to augment realpath so 
that it would always work for any path which is openable. but apparently 
that is not possible for unix filesystems.


 Even though the user provided a legal and openable path?

that sounds like an operational definition to me: what's the difference 
between legal and openable?







universal newlines and utf-16

2010-04-11 Thread Baz Walter
i am using python 2.6 on a linux box and i have some utf-16 encoded 
files with crlf line-endings which i would like to open with universal 
newlines.


so far, i have been unable to get this to work correctly.

for example:

>>> open('test.txt', 'w').write(u'a\r\nb\r\n'.encode('utf-16'))
>>> repr(open('test.txt', 'rbU').read().decode('utf-16'))
"u'a\\n\\nb\\n\\n'"
>>> import codecs
>>> repr(codecs.open('test.txt', 'rbU', 'utf-16').read())
"u'a\\n\\nb\\n\\n'"

of course, the output i want is:

u'a\\nb\\n'

i suppose it's not too surprising that the built-in open converts the 
line endings before decoding, but it surprised me that codecs.open does 
this as well.
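The doubled newlines follow from translating line endings at the byte level before decoding. In UTF-16 every '\r' and '\n' is paired with a zero byte, so the bytes `\r` and `\n` are never adjacent and each `\r` is treated as a lone carriage return. The sketch below is an assumed simulation of that byte-level translation, not Python 2's actual implementation.

```python
data = 'a\r\nb\r\n'.encode('utf-16')

# Simulated byte-level universal newlines: turn b'\r\n' and lone b'\r'
# into b'\n' *before* any decoding happens.
translated = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')

print(repr(translated.decode('utf-16')))  # 'a\n\nb\n\n'
```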


is there a way to get universal newlines to work properly with utf-16 files?

(nb: i'm not interested in other methods of converting line endings - 
just whether universal newlines can be made to work correctly).



Re: universal newlines and utf-16

2010-04-11 Thread Baz Walter

On 11/04/10 15:37, Stefan Behnel wrote:

The codecs module does not support universal newline parsing (see the
docs). You need to use the new io module instead.


thanks.

i'd completely overlooked the io module - i thought it was only in 
python 2.7/3.x.
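A minimal sketch of the io-based approach, written for Python 3 but using `io.open`, which also exists in Python 2.6. The io text layer decodes first and translates newlines afterwards, and `newline=None` (the default) turns on universal newlines mode. The temporary file path is only for illustration.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'wb') as f:
    f.write('a\r\nb\r\n'.encode('utf-16'))

# Decode first, then translate '\r\n' to '\n' (universal newlines).
with io.open(path, encoding='utf-16', newline=None) as f:
    text = f.read()

print(repr(text))  # 'a\nb\n'
```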




pyc files not automatically compiled on import

2009-07-26 Thread Baz Walter

hello

i thought that python automatically compiled pyc files after a module is 
successfully imported. what could prevent this happening?



Python 2.6.1 (r261:67515, Apr 12 2009, 03:51:25)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.mkdir('/home/baz/tmp/foo')
>>> os.chdir('/home/baz/tmp/foo')
>>> f = open('foo.py', 'w')
>>> f.write('print "hello world"\n')
>>> f.close()
>>> os.listdir('.')
['foo.py']
>>> import foo
hello world
>>> os.listdir('.') # why no pyc file?
['foo.py']
>>> import py_compile
>>> py_compile.compile('foo.py')
>>> os.listdir('.')
['foo.py', 'foo.pyc']



Re: pyc files not automatically compiled on import

2009-07-26 Thread Baz Walter

Peter Otten wrote:
You did not set the PYTHONDONTWRITEBYTECODE environment variable in a former 
life, or did you?


thanks peter

no i didn't, but i've just discovered a script in /etc/profile.d that 
did. now i'll have to try to find out how that script got in there :-|
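A quick way to confirm this kind of cause is to check `sys.dont_write_bytecode`, which is set from PYTHONDONTWRITEBYTECODE at startup. The sketch assumes Python 3.7 or later (for `capture_output`) and starts a child interpreter with the variable set, as a profile script would.

```python
import os
import subprocess
import sys

# Start a child interpreter with the variable set, as /etc/profile.d would.
env = dict(os.environ, PYTHONDONTWRITEBYTECODE='1')
result = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.dont_write_bytecode)'],
    env=env, capture_output=True, text=True, check=True)
print(result.stdout.strip())  # True

# In the current interpreter: is bytecode writing disabled, and why?
print(sys.dont_write_bytecode, os.environ.get('PYTHONDONTWRITEBYTECODE'))
```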




Testing the new version of Psyco

2009-04-08 Thread Baz Walter

hello

i recently tried out this new version of psyco...

http://www.voidspace.org.uk/python/weblog/arch_d7_2009_03_14.shtml#e1063

...because of the new support for generators. the above link says "To 
use and test generators, create preferences.py, following the 
instructions in setup.py" - except there's nothing obvious in setup.py 
that refers to generators.


anyway, i created a preferences.py file (with PSYCO_DEBUG = 1) and ran 
psycobench against python-2.5/psyco-1.6 and python-2.6/psyco-2.0 with 
the following results (output has been cropped slightly):


# start output

[benchmark]$ python2.5 psycobench.py -m time_generators time_anyall
Running new timings with original psyco

send call loop 1000        plain: 3.30  psyco: 3.29  ratio: 1.00
send and loop 1000         plain: 3.28  psyco: 3.27  ratio: 1.00
send just many times       plain: 1.28  psyco: 0.56  ratio: 2.29 *
iterate just many times    plain: 0.67  psyco: 0.57  ratio: 1.19 *
call next just many times  plain: 0.85  psyco: 0.63  ratio: 1.35 *
all_bool_genexp            plain: 2.03  psyco: 2.31  ratio: 0.88
all_bool_listcomp          plain: 2.94  psyco: 1.01  ratio: 2.91
all_genexp                 plain: 1.71  psyco: 1.97  ratio: 0.87
all_listcomp               plain: 2.70  psyco: 0.73  ratio: 3.71
all_loop                   plain: 1.09  psyco: 0.08  ratio: 13.15
any_bool_genexp            plain: 2.03  psyco: 2.45  ratio: 0.83
any_bool_listcomp          plain: 2.89  psyco: 0.97  ratio: 2.99
any_genexp                 plain: 1.74  psyco: 1.84  ratio: 0.95
any_listcomp               plain: 2.65  psyco: 0.72  ratio: 3.69
any_loop                   plain: 1.08  psyco: 0.08  ratio: 13.10

[benchmark]$ python2.6 psycobench.py -m time_generators time_anyall
Running new timings with python2.6/site-packages/psyco/_psyco.so

send call loop 1000        plain: 2.85  psyco: 0.04  ratio: 67.90
send and loop 1000         plain: 2.85  psyco: 0.04  ratio: 65.00
send just many times       plain: 1.17  psyco: 0.87  ratio: 1.35
iterate just many times    plain: 0.64  psyco: 0.86  ratio: 0.74
call next just many times  plain: 0.77  psyco: 0.87  ratio: 0.89
all_bool_genexp            plain: 1.87  psyco: 1.98  ratio: 0.95
all_bool_listcomp          plain: 2.54  psyco: 0.66  ratio: 3.84
all_genexp                 plain: 1.69  psyco: 1.77  ratio: 0.95
all_listcomp               plain: 2.38  psyco: 0.52  ratio: 4.55
all_loop                   plain: 1.07  psyco: 0.08  ratio: 14.24
any_bool_genexp            plain: 1.87  psyco: 1.99  ratio: 0.94
any_bool_listcomp          plain: 2.57  psyco: 0.68  ratio: 3.76
any_genexp                 plain: 1.74  psyco: 1.82  ratio: 0.95
any_listcomp               plain: 2.47  psyco: 0.53  ratio: 4.66
any_loop                   plain: 1.07  psyco: 0.07  ratio: 14.65

# end output

with the obvious exception of the first two tests for psyco v2, the 
results (for generators) seem a little underwhelming. in fact, psyco v1 
does significantly better on three of the tests (marked with an 
asterisk) and for the others there's not much difference.


this makes me wonder whether i enabled generators properly in psyco v2 
when i compiled it. has anybody else tried out this new version of psyco 
and got better overall results for generators? also, those first two 
results for v2 look really odd - what could explain the huge difference 
relative to the other generator tests for v2?


(p.s. i re-ran the tests and got very similar results)
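To separate the cost of the Python constructs themselves from whatever psyco adds, the plain-interpreter side of the all_* tests can be approximated with the standard `timeit` module. This is only a sketch: the function bodies below are hypothetical guesses at what psycobench's all_genexp, all_listcomp and all_loop tests do, and the data size and repeat counts are arbitrary.

```python
import timeit

DATA = list(range(1000))  # hypothetical input; every element is >= 0


def all_genexp(data=DATA):
    # generator expression consumed lazily by all()
    return all(x >= 0 for x in data)


def all_listcomp(data=DATA):
    # builds the full list of booleans first, then calls all()
    return all([x >= 0 for x in data])


def all_loop(data=DATA):
    # explicit loop with early exit and no intermediate objects
    for x in data:
        if not x >= 0:
            return False
    return True


def time_variants(number=1000):
    """Return {name: seconds}, taking the best of three runs per variant."""
    results = {}
    for func in (all_genexp, all_listcomp, all_loop):
        results[func.__name__] = min(timeit.repeat(func, number=number, repeat=3))
    return results


if __name__ == "__main__":
    for name, seconds in sorted(time_variants().items()):
        print("%-14s %.3f" % (name, seconds))
```

Timing the same functions with psyco enabled would need psyco to be bound to them first, which is not shown here.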

--
Baz Walter



Does Python cache the startup module?

2008-01-07 Thread Baz Walter
Hello

I remember reading somewhere (probably this list) that python may cache the 
module that starts a program (e.g. 'main.py'). I'm asking because I have found 
that this can sometimes cause problems when making small edits to the module. 
For instance, in my current module I changed the name of the main gui widget. 
When I ran the program, the program started to leak memory like a sieve. I then 
changed the name back again, and the problem went away. This looks very much 
like some sort of weird caching behaviour to me.

I've tried deleting the .pyc file and even rebooting, but I can't make the 
problem go away!

Can anyone confirm that this caching happens? And if so, is it documented 
anywhere?

TIA




Re: Does Python cache the startup module?

2008-01-07 Thread Baz Walter
Guilherme Polo ggpolo at gmail.com writes:
> Uhm.. this didn't make much sense. If you say the module is cached,
> then supposing you did a minor edit, and then supposing because it is
> cached your application wouldn't detect the change, then I don't see
> the connection with memory leak.
>
> Bring some concrete proof.

Thanks for your reply.

It's hard to supply an example for this, since it is local to the machine I am 
using. The startup module would look something like this:

#!/usr/local/bin/python

if __name__ == '__main__':

    import sys
    from qt import QApplication, QWidget

    application = QApplication(sys.argv)
    mainwindow = QWidget()
    application.setMainWidget(mainwindow)
    mainwindow.show()
    sys.exit(application.exec_loop())

If I change the name 'mainwindow' to 'mainwidget', the widget it refers to does 
not get destroyed; when I change it back again, it does get destroyed. 
Otherwise, the program runs completely normally.




Re: Does Python cache the startup module?

2008-01-07 Thread Baz Walter
Fredrik Lundh fredrik at pythonware.com writes:

 
> Baz Walter wrote:
>
> > It's hard to supply an example for this, since it is local to the machine I am 
> > using. The startup module would look something like this:
>
> would look, or does look?  if it doesn't look like this, what else does 
> it contain?

What I did was create a minimal version of the module which still exhibits the 
same behaviour (for me, that is). Basically, QWidget in the example below 
replaces my MainWindow class.
 
> > #!/usr/local/bin/python
> >
> > if __name__ == '__main__':
> >
> >     import sys
> >     from qt import QApplication, QWidget
> >
> >     application = QApplication(sys.argv)
> >     mainwindow = QWidget()
> >     application.setMainWidget(mainwindow)
> >     mainwindow.show()
> >     sys.exit(application.exec_loop())
> >
> > If I change the name 'mainwindow' to 'mainwidget', the widget it refers to does 
> > not get destroyed; when I change it back again, it does get destroyed. 
> > Otherwise, the program runs completely normally.
>
> I don't see any code in there that destroys the widget, and I also don't 
> see any code in there that creates more than one instance of the main 
> widget.

Qt will try to destroy all widgets that are linked together in the object 
hierarchy.
 
> what do you do to run the code, and how do you measure leakage?

I run it with:

python app.py -widgetcount

The widgetcount switch is a Qt facility which counts how many widgets are 
created and how many destroyed.

Before changing the name 'mainwindow' to 'mainwidget' it reports:

Widgets left: 0      Max widgets: 2
Widgets left: 0      Max widgets: 149 (full program)

Afterwards it reports:

Widgets left: 1      Max widgets: 2
Widgets left: 146    Max widgets: 149 (full program)

> is the name mainwidget used for some other purpose in your application?

No





Re: Does Python cache the startup module?

2008-01-07 Thread Baz Walter
Fredrik Lundh fredrik at pythonware.com writes:
> So what you're concerned about is the lack of cleanup during interpreter 
> shutdown, not a true leak (which would result in Max widgets growing 
> towards infinity).

Yes, sorry about my sloppy terminology.

> The problem here is that you're relying on Python's GC to remove things 
> in a given order during shutdown, something that may or may not happen
> depending on lots of things that you cannot control:
>
>     http://www.python.org/doc/essays/cleanup/
>
> (Note the repeated use of "In an order determined by the dictionary 
> hashing of the names" in that article.)
>
> The only way to get *predictable* shutdown behaviour in situations like 
> this is to clean things up yourself.  But in this case, you might as 
> well leave the cleanup to the OS.

I think you've found the real cause. I changed the name from 'mainwidget' 
to 'aaa' and the problem went away. I will have to investigate whether there is 
anything I can do from the Qt side to make sure things get cleaned up more 
predictably.
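One way to follow the "clean things up yourself" advice is to keep the objects in local variables of a function and release them explicitly, in the order you want, before the program exits, instead of leaving them in module globals for interpreter shutdown to clear. The sketch below uses a hypothetical `FakeWidget` class in place of real Qt widgets (it is not part of PyQt) and relies on CPython's reference counting, which runs `__del__` as soon as the last reference disappears. Whether PyQt3 destroys the underlying C++ widget at the same moment depends on its ownership rules, so treat this as an illustration of ordering only.

```python
destroyed = []  # records the order in which objects are torn down


class FakeWidget(object):
    """Hypothetical stand-in for a Qt object; not part of PyQt."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        destroyed.append(self.name)


def main():
    application = FakeWidget("application")
    mainwindow = FakeWidget("mainwindow")
    status = 0  # in the real program: status = application.exec_loop()
    # Release the widget before the application object, explicitly,
    # so the order no longer depends on how module globals are cleared.
    del mainwindow
    del application
    return status
```

In the real script, the code under the `if __name__ == '__main__':` guard would then reduce to `sys.exit(main())`.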

Many Thanks






Re: Does Python cache the startup module?

2008-01-07 Thread Baz Walter
John Machin sjmachin at lexicon.net writes:
> If you execute that stuff inside a function (see below) instead of in
> global scope, do you get the same effect?

Thanks for your reply.

No, running it inside a function means I can't rely on python to garbage 
collect, so there will be widgets left over whether I've changed names or not.

See Fredrik Lundh's last message to see what's really happening.




Re: How to enable bash mode at the interactive mode?

2005-11-28 Thread Baz Walter
Anthony Liu antonyliu2002 at yahoo.com writes: 
> Look what I have:
>
> $ python
> Python 2.4.2 (#1, Nov 20 2005, 13:03:38)
> [GCC 3.3.1 (Mandrake Linux 9.2 3.3.1-2mdk)] on linux2
>
> Yes, I realize that I don't have readline module
> available.
>
> The same Mandrake system has Python 2.3 as well, and
> it has the readline module.
>
> I don't know how to install the readline module.  I
> tried what was suggested from the newsgroup, but got
> an error at make:
>
> make: *** [Modules/readline.o] Error 1
>
> Thanks
 
You need to have the development stuff for readline installed before you 
compile Python 2.4.2 from source. For Mandrake 9.2, there should be an rpm 
package named something like libreadline*-devel available. Install that, 
recompile python, and you should be okay. 
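After rebuilding, a quick check that the new interpreter actually picked up the module might look like this (a sketch; `has_readline` is a hypothetical helper name):

```python
def has_readline():
    """Return True if the readline extension module can be imported."""
    try:
        import readline  # only built if the readline headers were found
    except ImportError:
        return False
    return True


print("readline available: %s" % has_readline())
```

Run it with the freshly built interpreter, not the system Python 2.3, since the latter already has readline.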
 
HTH 
-- 
Baz 
