Re: Inverse confusion about floating point precision

2005-05-09 Thread Tim Peters
[Dan] 
>Dan> The floating-point representation of 95.895 is exactly
>Dan> 6748010722917089 * 2**-46.

[Skip Montanaro]
> I seem to recall seeing some way to extract/calculate fp representation from
> Python but can't find it now.  I didn't see anything obvious in the
> distribution.

For Dan's example,

>>> import math
>>> math.frexp(95.895)
(0.74917968749999997, 7)
>>> int(math.ldexp(_[0], 53))
6748010722917089L
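
And to sanity-check it against Dan's value (on a typical IEEE-754 box --
the repr just shows the usual decimal approximation):

>>> math.ldexp(6748010722917089, -46)
95.894999999999996
>>> math.ldexp(6748010722917089, -46) == 95.895
True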
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: processing a Very Large file

2005-05-17 Thread Tim Peters
[DJTB]
> I'm trying to manually parse a dataset stored in a file. The data should be
> converted into Python objects.
>
> Here is an example of a single line of a (small) dataset:
> 
> 3 13 17 19 -626177023 -1688330994 -834622062 -409108332 297174549 955187488 
> 589884464 -1547848504 857311165 585616830 -749910209 194940864 -1102778558 
> -1282985276 -1220931512 792256075 -340699912 1496177106 1760327384 
> -1068195107 95705193 1286147818 -416474772 745439854 1932457456 -1266423822 
> -1150051085 1359928308 129778935 1235905400 532121853
> 
> The first integer specifies the length of a tuple object. In this case, the
> tuple has three element: (13, 17, 19)
> The other values (-626177023 to 532121853) are elements of a Set.
>
> I use the following code to process a file:
> 
> from time import time
> from sets import Set
> from string import split

Note that you don't use string.split later.

> file = 'pathtable_ht.dat'
> result = []
> start_time = time ()
> f=open(file,'r')
> for line in f:
>splitres = line.split()

Since they're all integers, may as well:

splitres = map(int, line.split())

here and skip repeated int() calls later.

>tuple_size = int(splitres[0])+1
>path_tuple = tuple(splitres[1:tuple_size])
>conflicts = Set(map(int,splitres[tuple_size:-1]))

Do you really mean to throw away the last value on the line?  That is,
why is the slice here [tuple_size:-1] rather than [tuple_size:]?

># do something with 'path_tuple' and 'conflicts'
># ... do some processing ...
>result.append(( path_tuple, conflicts))
>
> f.close()
> print time() - start_time
> 
> The elements (integer objects) in these Sets are being shared between the
> sets, in fact, there are as many distinct element as there are lines in the
> file (eg 1000 lines -> 1000 distinct set elements). AFAIK, the elements are
> stored only once and each Set contains a pointer to the actual object

Only "small" integers are stored uniquely; e.g., these aren't:

>>> 100 * 100 is 100 * 100
False
>>> int("12345") is int("12345")
False

You could manually do something akin to Python's "string interning" to
store ints uniquely, like:

int_table = {}
def uniqueint(i):
    return int_table.setdefault(i, i)

Then, e.g.,

>>> uniqueint(100 * 100) is uniqueint(100 * 100) 
True
>>> uniqueint(int("12345")) is uniqueint(int("12345"))
True

Doing Set(map(uniqueint, etc)) would then feed truly shared int
(and/or long) objects to the Set constructor.
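
For instance (an untested sketch, keeping your names, assuming the
map(int, ...) change above, and using [tuple_size:] rather than
[tuple_size:-1]):

    splitres = map(int, line.split())
    tuple_size = splitres[0] + 1
    path_tuple = tuple(splitres[1:tuple_size])
    conflicts = Set(map(uniqueint, splitres[tuple_size:]))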

> This works fine with relatively small datasets, but it doesn't work at all
> with large datasets (4500 lines, 45000 chars per line).

Well, chars/line doesn't mean anything to us.  Knowing # of set
elements/line might help.  Say there are 4500 per line.  Then you've
got about 20 million integers.  That will consume at least several 100
MB if you don't work to share duplicates.  But if you do so work, it
should cut the memory burden by a factor of thousands.

> After a few seconds of loading, all main memory is consumed by the Python
> process and the computer starts swapping. After a few more seconds, CPU
> usage drops from 99% to 1% and all swap memory is consumed:
>
> Mem:386540k total,   380848k used, 4692k free,  796k buffers
> Swap:   562232k total,   562232k used,0k free,27416k cached
>
> At this point, my computer becomes unusable.
>
> I'd like to know if I should buy some more memory (a few GB?) or if it is
> possible to make my code more memory efficient.

See above for the latter.  If you have a 32-bit processor, you won't
be able to _address_ more than a few GB anyway.  Still, 384MB of RAM
is on the light side these days.
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Largefile)

2005-05-22 Thread Tim Peters
[Jeremy Hylton]
> ...
> The ObjectInterning instance is another source of problem, because it's
> a dictionary that has an entry for every object you touch.

Some vital context was missing in this post.  Originally, on c.l.py, DJTB
wasn't using ZODB at all.  In effect, he had about 5000 lists each
containing about 5000 "not small" integers, so Python created about 5000**2
= 25 million integer objects to hold them all, consuming 100s of megabytes
of RAM.  However, due to the semantics of the application, there were only
about 5000 _distinct_ integers.  What became the `ObjectInterning` class
here started as a suggestion to keep a dict of the distinct integers,
effectively "intern"ing them.  That cut the memory use by a factor of
thousands.

This has all gotten generalized and micro-optimized to the point that I
can't follow the code anymore.  Regardless, the same basic trick won't work
with ZODB (or via any other way of storing the data to disk and reading it
up again later):  if we write the same "not small" integer object out
100 times, then read them all back in, Python will again create 100
distinct integer objects to hold them.  Object identity doesn't survive for
"second class" persistent objects, and interning needs to be applied again
_every_ time one is created.

[DJTB]
> ... The only thing I can't change is that ExtendedTuple inherits
> from tuple

Let me suggest that you may be jumping in at the deep ends of too many pools
at once here.

> class ExtendedTuple(tuple):
>
>def __init__(self, els):
>tuple.__init__(self,els)

That line doesn't accomplish anything:  tuples are immutable, and by the
time __init__ is called the tuple contents are already set forever.  You
should probably be overriding tuple.__new__ instead.
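
A bare-bones sketch of that pattern (just the shape, not your real class):

    class ExtendedTuple(tuple):
        def __new__(cls, els):
            # __new__ is where an immutable type gets to see its contents.
            return tuple.__new__(cls, els)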

> ...
>def __hash__(self):
>return hash(tuple(self))

This method isn't needed.  If you leave it out, the base class
tuple.__hash__ will get called directly, and will compute the same result.

-- 
http://mail.python.org/mailman/listinfo/python-list


RE: [ZODB-Dev] ZODB memory problems (was: processing a Very Largefile)

2005-05-24 Thread Tim Peters
[Jeremy Hylton]
> ...
> It looks like your application has a single persistent instance -- the
> root ExtendedTupleTable -- so there's no way for ZODB to manage the
> memory.  That object and everything reachable from it must be in memory
> at all times.

Indeed, I tried running this program under ZODB 3.4b1 on Windows, and
about 4% of the way done it dies during one of the subtransaction
commits, with a StackError:  the object is so sprawling that a
megabyte C stack is blown by recursion while trying to serialize it.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Challenge 10?

2005-05-26 Thread Tim Peters
[Greg Ewing]
> Can someone give me a hint for No. 10? My MindBlaster
> card must be acting up -- I can't seem to tune into
> the author's brain waves on this one.

There are hints on the site; for level 10,

http://www.pythonchallenge.com/forums/viewtopic.php?t=20

> I came up with what I thought was a perfectly good
> solution, but apparently it's wrong. :-(

The On-Line Encyclopedia of Integer Sequences should be better known
-- it's an amazing resource:

http://www.research.att.com/~njas/sequences/

It knows about this sequence, so don't use it unless you want the
answer given to you.  If it doesn't know about your sequence,
"perfectly good" is debatable .
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange KeyError using cPickle

2005-06-01 Thread Tim Peters
[Rune Strand]
> I'm experiencing strange errors both with pickle and cPickle in the
> below code:
>
>
> import cPickle as pickle
> #import pickle
> from string import ascii_uppercase
> from string import ascii_lowercase
> 
> def createData():
>d1 = list("Something's rotten")
>d2 = tuple('in the state of Denmark')
> 
>d3 = [('a', 'b'), ('c', 'd')]
>#d3 = [('s a', 's b'), ('s c', 's d')]
>#d3 = [('sa', 'sb'), ('sc', 'sd')]
>#d3 = [['s a', 's b'], ['s c', 's d']]
> 
>d4 = dict(zip(ascii_uppercase,ascii_lowercase))
>return [d1, d2, d3, d4]
> 
> def doPickle(data, pickleFile = 'pickleTest.p', proto = 2):
>f = XWwz(pickleFile, 'w')

What is "XWwz"?  Assuming it's a bizarre typo for "open", change the
'w' there to 'wb'.  Pickles are binary data, and files holding pickles
must be opened in binary mode, especially since:

> ...
> (on WinXP, CPython 2.4.1)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange KeyError using cPickle

2005-06-01 Thread Tim Peters
[Tim Peters]
>> What is "XWwz"?  Assuming it's a bizarre typo for "open", change the
>> 'w' there to 'wb'.  Pickles are binary data, and files holding pickles
>> must be opened in binary mode, especially since:
>>

>>> ...
>>> (on WinXP, CPython 2.4.1)

[Rune Strand]
> Thanks Tim. The bizarre 'typo' appears to be caused by ad-blocking
> software confusing python code with javascript (i think).
>
> I had the feeling this was a red facer.
>
> Setting the protocol to 0 (text) also makes it work.

Not really.  Repeat:  pickles are binary data, and files holding
pickles must be opened in binary mode.  The horribly named "text mode"
pickles (and this is one reason they're called "protocol 0" now
instead) are binary data too.  Pickles created with protocol 0 may
also fail if transported across platforms, if written to a file opened
in text mode.  Pickle files should always be opened in binary mode,
regardless of pickle protocol, and regardless of platform.
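
That is (a minimal sketch, reusing the names from your post):

    import cPickle as pickle

    f = open('pickleTest.p', 'wb')     # 'wb', never 'w'
    pickle.dump(createData(), f, 2)    # any protocol, 0 included
    f.close()

    f = open('pickleTest.p', 'rb')     # and 'rb' on the way back in
    data = pickle.load(f)
    f.close()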
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: dictionaries and threads

2005-06-01 Thread Tim Peters
[Gary Robinson]
> I know the Global Interpreter Lock ensures that  only one python thread
> has access to the interpreter at a time, which prevents a lot of
> situations where one thread might step on another's toes.

Not really.  The CPython implementation's C code relies on the GIL in
many ways to provide mutual exclusion, but there's no such effect at
the Python level: operations from multiple Python threads can get
interleaved in any order consistent with sequential execution within
each thread.  The guarantee that the builtin datatypes provide "not to
go insane" in the absence of application synchronization is really a
separate matter, _aided_ in CPython by the GIL, but not guaranteed by
the GIL alone.  There's still an enormous amount of delicate fiddling
in Python's C code to keep internals sane in the absence of
application locking.

> But I'd like to ask about a specific situation just to be sure  I
> understand things relative to some code I'm writing.
>
> I've got a dictionary which is accessed by several threads at the same
> time (that is, to the extent that the GIL allows). The thing is,
> however, no two threads will ever be accessing the same dictionary
> items at the same time. In fact the thread's ID from thread.get_ident()
> is the key to the dictionary; a thread only modifies items
> corresponding to its own thread ID. A thread will be adding an item
> with its ID when it's created, and deleting it before it exits, and
> modifying the item's value in the meantime.
>
> As far as I can tell, if the Python bytecodes that cause dictionary
> modifications are atomic, then there should be no problem. But I don't
> know that they  are because I haven't looked at the bytecodes.

It's unclear what kind of problem you're concerned about.  The kind of
dict you described is widely used in Python apps; for example, it's at
the heart of the implementation of threading.currentThread() in the
standard library.  That dicts promise not to go insane in the absence
of locking isn't a GIL issue, even if multiple threads do modify d[k]
simultaneously (in the absence of app locking, all such threads will
eventually change d[k], but in an undefined order; if no two threads
can muck with the same k, there's no problem at all).

> Any feedback on this would be appreciated. For various reasons, we're
> still using Python 2.3 for the time being.

Best practice in 2.3 is to subclass threading.Thread for all threads
you use.  Then instead of mucking with a dict, you can just
set/retrieve an attribute on the current Thread object.  You can get
the currently active Thread at any time via calling
threading.currentThread().
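
A sketch of that style (hypothetical names):

    import threading

    class Worker(threading.Thread):
        def run(self):
            # Per-thread state lives on the Thread object itself:
            # no shared dict, no locking worries.
            self.status = 'working'
            # ... do the real work ...
            self.status = 'done'

Code running in that thread can reach the same object at any time via
threading.currentThread().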
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Absolultely confused...

2005-10-06 Thread Tim Peters
[Jeremy Moles]
> ...
> I may be missing something critical here, but I don't exactly grok what
> you're saying; how is it even possible to have two instances of
> PyType_vector3d? It is (like all the examples show and all the extension
> modules I've done in the past) a static structure declared and assigned
> to all at once, only once.

The most common cause is inconsistent import statements, with one
place using package-relative import but another place explicitly
specifying the package.  Here's a simple pure Python example, with
this directory structure:

playground/
    package_dir/
        __init__.py
        source.py
        module.py

__init__.py is empty; it just serves to make `package_dir` a package.

source.py defines a single type, named Type:

class Type(object):
pass

module.py imports source.Type in two different ways, then prints various stuff:

from source import Type as one_way # package-relative import
from package_dir.source import Type as another_way

print one_way is another_way
print one_way.__name__, another_way.__name__
print repr(one_way), repr(another_way)

import sys
for k, v in sys.modules.items():
 if "source" in k:
 print k, v

Now _from_ playground, run module.py:

python playground/module.py

This is the output, with annotations; I ran this on Windows, so expect
to see backslashes:

False

That is, one_way is not the same object as another_way:  there are two
distinct instances of the Type object.

Type Type

Although they're distinct, they have the same __name__, "Type".

<class 'source.Type'> <class 'package_dir.source.Type'>

Their reprs differ, though, showing the path via which they were imported.

source <module 'source' from '...'>
package_dir.source <module 'package_dir.source' from '...'>

That's two lines of output from crawling over sys.modules:  there are
two distinct instances of the entire `source` module.  That's the real
cause of the multiple `Type` instances.

package-relative import is rarely a good idea.  Are you doing that anywhere?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Merging sorted lists/iterators/generators into one stream of values...

2005-10-08 Thread Tim Peters
[Alex Martelli]
>>> try it (and read the Timbot's article included in Python's sources, and the
>>> sources themselves)...

[Kay Schluehr]
>> Just a reading advise. The translated PyPy source
>> pypy/objectspace/listsort.py might be more accessible than the
>> corresponding C code.

[cfbolz]
> indeed. it is at
>
> http://codespeak.net/svn/pypy/dist/pypy/objspace/std/listsort.py

While the Python version is certainly easier to read, I believe Alex
had in mind the detailed English _explanation_ of the algorithm:

http://cvs.sf.net/viewcvs.py/python/python/dist/src/Objects/listsort.txt

It's a complex algorithm, dripping with subtleties that aren't
apparent from staring at an implementation.

Note that if a list has N elements, sorting it requires at least N-1
comparisons, because that's the minimum number of compares needed
simply to determine whether or not it's already sorted.  A heap-based
priority queue never requires more than O(log(N)) compares to push or
pop an element.  If N is small, it shouldn't make much difference.  As
N gets larger, the advantage of a heap grows without bound.
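
For concreteness, a bare-bones heap-based merge of sorted inputs (my
sketch here, not library code) looks like:

    import heapq

    def merge(*iterables):
        # Yield every item from the sorted inputs, smallest first.
        heap = []
        for it in map(iter, iterables):
            try:
                heap.append((it.next(), it))
            except StopIteration:
                pass
        heapq.heapify(heap)
        while heap:
            value, it = heap[0]
            yield value
            try:
                heapq.heapreplace(heap, (it.next(), it))
            except StopIteration:
                heapq.heappop(heap)

For example, list(merge([1, 4, 9], [2, 3, 10], [5])) yields
[1, 2, 3, 4, 5, 9, 10].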
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hidden string formatting bug

2005-10-13 Thread Tim Peters
[Echo]
> I have been trying to figure out the problem with this string formatting:

[monstrous statement snipped]

> when it executes, I get this error: "inv argument required".

That should be "int", not "inv".

> I have checked and rechecked both the string and the tuple. I cant figure
> out what the problem is.  After playing around with it, i found out if change
> the last line to: "%s,%i) %(... I get a different error. The error is "not all
> arguments converted during string formatting".
>
> So I am baffled and confused as of why this wont work. Is anyone able to
> shed some light on my hidden bug?

Yup, it's precedence.  Here's the same thing:

sql = "%s" + \
  "%i" % ("abc", 2)

Run that and you get "int argument required".  Change the last %i to
%s and you get "not all arguments converted during string formatting".
 Both error messages are accurate.

That's because % binds tighter than "+".  As a 1-liner, and inserting
redundant parentheses to make the precedence obvious, my simplified
example is

sql = "%s" + ("%i" % ("abc", 2))

What you intended _requires_ inserting more parentheses to force the
intended order (if you're going to stick to this confusing coding
style):

sql = ("%s" + "%i") % ("abc", 2)

Better is to not use "+" on string literals, and not use backslash
continuation either; try this instead:

sql_format = ('string1'
              'string2'
              'string3'
              ...
              'last string')

Build the format in a separate statement to reduce the chance of
errors (for example, your original mistake would have been impossible
to make this way).  Use parens instead of backslashes.  Don't use "+"
to catenate string literals:  the Python compiler automatically
catenates adjacent string literals for you, and at compile-time (so
it's even more efficient than using "+").
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue question

2005-10-17 Thread Tim Peters
[Alex Martelli]
> ...
> not_empty and not_full are not methods but rather instances of the
> threading.Condition class, which gets waited on and notified
> appropriately.  I'm not entirely sure exactly WHAT one is supposed to do
> with the Condition instances in question (I'm sure there is some design
> intent there, because their names indicate they're public); presumably,
> like for the Lock instance named 'mutex', they can be used in subclasses
> that do particularly fiendish things... but I keep planning not to cover
> them in the 2nd edition of the Nutshell (though there I _will_ cover the
> idea of subclassing Queue to implement queueing disciplines other than
> FIFO without needing to worry about synchronization, which I had skipped
> in the 1st edition).

Last time it was rewritten, I put as much thought into the names of
Queue instance variables as Guido put into them originally:  none. 
They have always "looked like" public names, but I'm at a loss to
think of anything sane a Queue client or extender could do with them. 
Of course that's why the docs don't mention them.  I suppose an
extender could make good use of `mutex` if they wanted to add an
entirely new method, say:

    def get_and_put_nowait(self, new_item):
        """Pop existing item, and push new item, atomically [blah blah blah]."""

I don't know of anyone who has done so, though.  I kept the name
`mutex` intact during the last rewrite, but didn't hesitate to get rid
of the former `esema` and `fsema` attributes.  Since no complaints
resulted, it's a pretty safe bet nobody was mucking with esema or
fsema before.
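
For concreteness, a sketch of that kind of extension (this assumes the
2.4-era internals -- mutex, _qsize, _get, _put -- and is not part of the
library):

    import Queue

    class SwapQueue(Queue.Queue):
        def get_and_put_nowait(self, new_item):
            # Atomically pop an existing item and push a new one.
            self.mutex.acquire()
            try:
                if not self._qsize():
                    raise Queue.Empty
                item = self._get()
                self._put(new_item)
                return item
            finally:
                self.mutex.release()

Since the queue's size doesn't change, there's nothing to notify.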
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: wierd threading behavior

2005-10-18 Thread Tim Peters
[Qun Cao]
>> import thread
>> def main():
>> thread.start_new(test, ())
>>
>> def test():
>> print 'hello'
>>
>> main()
>> "
>> this program doesn't print out 'hello' as it is supposed to do.
>> while if I change main()

[Neil Hodgson]
>The program has exited before the thread has managed to run. It is
> undefined behaviour whether secondary threads survive main thread
> termination but it looks like they don't on your system.

In fact, they don't on most systems.

>Use the threading module and call join to wait for all threads to
> complete before exiting the program.

That's a different story:  threads from the `thread` module have
entirely OS-specific behavior when Python shuts down.  Python knows a
lot more about threads from the newer `threading` module, & uses an
atexit() hook to ensure that the Python interpreter does _not_ go away
while a threading.Thread is still running(*).  IOW, Python does the
join() for you for threading.Thread threads -- there's no need to do
it yourself.

(*) Unless you explicitly mark it as a daemon thread.
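
So the original example works if recast along these lines (a sketch):

    import threading

    def test():
        print 'hello'

    def main():
        threading.Thread(target=test).start()

    main()
    # No explicit join() needed:  the interpreter waits for non-daemon
    # threading.Thread threads before shutting down.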
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Would there be support for a more general cmp/__cmp__

2005-10-20 Thread Tim Peters
[Toby Dickenson]
> ...
> ZODB's BTrees work in a similar way but use the regular python comparison
> function, and the lack of a guarantee of a total ordering can be a liability.
> Described here in 2002, but I think same is true today:
> http://mail.zope.org/pipermail/zodb-dev/2002-February/002304.html

There's a long discussion of this in the ZODB Programming Guide,
subsection "5.3.1 Total Ordering and Persistence" on this page:

http://www.zope.org/Wikis/ZODB/FrontPage/guide/node6.html

That talks about more than the referenced ZODB thread covered.

Persistence adds more twists, such as that the total ordering among
keys must remain the same across time (including across Python
releases; for example, how None compares to objects of other types has
changed across releases).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: migrate from ZODB 3.3.1 --- to where, and how?

2005-10-25 Thread Tim Peters
[Harald Armin Massa]
> I am using ZODB "standalone" in version 3.3.1 within some application.
>
> Now I learn that the 3.3.x branch of ZODB is "retired". No problem so
> far, everything is running fine.
>
> BUT... "retired" gives me the hint that nothing GREAT will be done to
> this branch anymore :)

More than that, there will never be another release of any kind in the
3.3 line.  ZODB work (I'm ZODB's primary maintainer, BTW) is driven by
Zope's needs.  When the last Zope that needs a particular ZODB is no
longer supported, the ZODB used by that Zope is also no longer
supported.  The ZODB 3.1 and 3.3 lines are dead now, corresponding
(respectively) to the Zope 2.6 line and an early experimental release
of Zope3.

> Now I am questioning myself: to which branch should I migrate? Is 3.5 a
> sure bet, or are uneven subversions a bad sign, making retirement
> likely?

Whether digits in a release number are even or odd has no meaning in
ZODB (or Zope) releases.

You should be able to move to the 3.4 or 3.5 lines without problems. 
ZODB 3.4 corresponds to the Zope 2.8 line, and ZODB 3.5 to the Zope
3.1 line.  By the end of this year, ZODB 3.6 will be released
(corresponding to Zopes 2.9 and 3.2, which will also be released by
the end of the year).  ZODB 3.7 (along with another batch of Zopes)
will be released mid-year 2006, etc.

> As much as my diggings showed me, the "special sign" of 3.3 was
> import ZODB
> from persistent import Persistent
> from persistent.list import PersistentList
> from persistent.mapping import PersistentMapping
>
> that PersistentList and PersistenMapping reside within
> persistent., while in the 3.2 branch they reside somewhere
> else in the namespace.

ZODBs at and before the 3.2 line differ in many ways from ZODBs at and
after the 3.3 line.  So far, the 3.3, 3.4, 3.5 and (not yet released)
3.6 lines are pretty much interchangeable (although, of course, later
releases add features that may not work under earlier releases, and
because the 3.3 line is dead now it doesn't even get critical bugfixes
anymore).

> I learned it the hard way that 3.3 filestores not get converted
> magically or easy to 3.2 :)

As above, many things changed starting with 3.3.

> So, my questions:
>  - where should I migrate to?

I'd jump to 3.5.1 now if I were you, and to 3.6.0 when it's released.

>  - how is migration done best (especially taking care of "old"
> filestores

Moving from 3.3.1 to 3.5.1 should "just work".

Note that ZODB has its own mailing list:

http://mail.zope.org/mailman/listinfo/zodb-dev

Like most Zope lists, you have to subscribe to it in order to post to
it, but anyone can read the archives.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: more than 100 capturing groups in a regex

2005-10-27 Thread Tim Peters
[DH]
>> It's a conflict between python's syntax for regex back references
>> and octal number literals.  Probably wasn't noticed until way too
>> ate, and now it will never change.

[EMAIL PROTECTED]
> I suspect it comes from Perl, since Python's regular expression engine tries
> pretty hard to be compatible with Perl's, at least for the basics.

"No" to all the above .  The limitation to 99 in backreference
notation was thoroughly discussed on the Python String-SIG at the
time, and it was deliberately not bug-compatible with the Perl of that
time.

In the Perl of that time (no idea what's true now), e.g., \123 in a
regexp was an octal escape if it appeared before or within the 123rd
capturing group, but was a backreference to the 123rd capturing group
if it appeared after the 123rd capturing group.  So, yes, two
different instances of "\123" in a single regexp could have different
meanings (meaning chr(83) in one place, and a backreference to group
123 in another, and there's no way to tell the difference without
counting the number of preceding capturing groups).

That's so horridly un-Pythonic that we drew the line there.  Nobody
had a sane use case for more than 99 backreferences, so "who cares?"
won.

Note that this isn't a reason for limiting the number of capturing
groups.  It only accounts for why we didn't care that you couldn't
write a _backreference_ to a capturing group higher than number 99
using "\nnn" notation.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is bytecode machine (in)dependent?

2005-10-28 Thread Tim Peters
[Robert McLay]
> I'm trying to understand bytecodes generated on different machines.
> I understand that the bytecodes can change between version.  But since
> I'm told that .pyc files are version dependent but not machine
> dependent, I'm wondering why the bytecodes are machine dependent.

They aren't -- at least not particularly.

> my friend and I created this simple example to explore the problem.
> The example code is:
>
>   #!/usr/bin/env python
>   #
>   import sys, os
>
>   def main():
> x = 1.234
> print x
>
>   if ( __name__ == '__main__'):
> main()
>
>
> Using sib.py from Vendorid 1.0 generates different bytecodes under linux
> and sgi.  At its core sib.py is using:
>
># Compile the code using prefix and marshall it.
>compiled_code = compile(source_code, prefix, 'exec')
>marshalled_code = marshal.dumps(compiled_code)
>
> to get the bytecodes.

Why do you believe that `prefix` had the same value in both runs?  The
output suggests it did not, but can't guess more than that from here
since I don't know where `prefix` came from.

> So why are the byte codes different?  Is it that the intel-linux is
> little endian and the SGI is big endian

No; marshal format has fixed endianness.

> and the numerical constant (1.234) is stored different depending on the
> endian-ness?

Python defers to the platform C library for string->float conversions,
and it's *possible* that different platforms could convert "1.234" to
a C double in slightly different ways.  There's no evidence of that
here, though.

> This was generated under intel-linux using python 2.4.2
>
>
>
>
> 
> /*==*/
> /* Frozen main script for test
>   */
> /* Generated from test.py 
>   */
> /* This is generated code; Do not modify it!  
>   */
> 
> /*--*/
> unsigned char M___main__[] =
> {
> 99,0,0,0,0,0,0,0,0,2,0,0,0,64,0,0,
> 0,115,55,0,0,0,100,0,0,107,0,0,90,0,0,100,
> 0,0,107,1,0,90,1,0,100,1,0,132,0,0,90,2,
> 0,101,3,0,100,2,0,106,2,0,111,11,0,1,101,2,
> 0,131,0,0,1,110,1,0,1,100,0,0,83,40,3,0,
> 0,0,78,99,0,0,0,0,1,0,0,0,1,0,0,0,
> 67,0,0,0,115,15,0,0,0,100,1,0,125,0,0,124,
> 0,0,71,72,100,0,0,83,40,2,0,0,0,78,102,5,
> 49,46,50,51,52,40,1,0,0,0,116,1,0,0,0,120,
> 40,1,0,0,0,82,0,0,0,0,40,0,0,0,0,40,
> 0,0,0,0,116,23,0,0,0,47,104,111,109,101,47,118, /* This 
> line */
> 112,97,114,114,47,115,105,98,47,116,101,115,116,46,112,121,
> 116,4,0,0,0,109,97,105,110,5,0,0,0,115,4,0,
> 0,0,0,1,6,1,116,8,0,0,0,95,95,109,97,105,
> 110,95,95,40,4,0,0,0,116,3,0,0,0,115,121,115,
> 116,2,0,0,0,111,115,82,2,0,0,0,116,8,0,0,
> 0,95,95,110,97,109,101,95,95,40,3,0,0,0,82,4,
> 0,0,0,82,2,0,0,0,82,5,0,0,0,40,0,0,
> 0,0,40,0,0,0,0,82,1,0,0,0,116,1,0,0,
> 0,63,3,0,0,0,115,6,0,0,0,18,2,9,4,13,
> 1,
> };  x
>
>  And under SGI (python 2.4.2) it created :
>
>
> /*==*/
>/* Frozen main script for test 
>  */
>/* Generated from test.py  
>  */
>/* This is generated code; Do not modify it!   
>  */
>
> /*--*/
>unsigned char M___main__[] =
>{
>99,0,0,0,0,0,0,0,0,2,0,0,0,64,0,0,
>0,115,55,0,0,0,100,0,0,107,0,0,90,0,0,100,
>0,0,107,1,0,90,1,0,100,1,0,132,0,0,90,2,
>0,101,3,0,100,2,0,106,2,0,111,11,0,1,101,2,
>0,131,0,0,1,110,1,0,1,100,0,0,83,40,3,0,
>0,0,78,99,0,0,0,0,1,0,0,0,1,0,0,0,
>67,0,0,0,115,15,0,0,0,100,1,0,125,0,0,124,
>0,0,71,72,100,0,0,83,40,2,0,0,0,78,102,5,
>49,46,50,51,52,40,1,0,0,0,116,1,0,0,0,120,
>40,1,0,0,0,82,0,0,0,0,40,0,0,0,0,40,
>0,0,0,0,116,23,0,0,0,47,87,111,114,107,47,118,  /* This 
> line */
>112,97,114,114,47,115,105,98,47,116,101,115,116,46,112,121,
>116,4,0,0,0,109,97,105,110,5,0,0,0,115,4,0,
>0,0,0,1,6,1,116,8,0,0,0,95,95,109,97,105,
>110,95,95,40,4,0,0,0,116,3,0,0,0,115,121,115,
>116,2,0,0,0,111,115,82,2,0,0,0,116,8,0,0,
>0,95,95,110,97,109,101,95,95,40,3,0,0,0,82,4,
>0,0,0,82,2,0,0,0,82,5,0,0,0,40,0,0,
>0,0,40,0,0,0,0,82,1,0,0,0,116,1,0,0,
>0,63,3,0,0,0,115,6,0,0,0,18,2,9,4,13,
>1,
>};
>
>
> The difference between the two is very slight:
>
> 18c18
> < 0,0,0,0,116,23,0,0,0,47,104,111,109,101,47,118,
> ---
> > 0,0,0,0,116,23,0,0,0,47,87,111,114,107,47,118,

Stare at those:  the only bytes that differ spell out the file name
embedded in the code object (/home/vparr/sib/test.py on the Linux box
versus /Work/vparr/sib/test.py on the SGI).  That's the value `prefix`
had -- the bytecode proper is identical.
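
If you want to convince yourself of that, force `prefix` to the same
string on both boxes and compare the marshalled results directly (a
sketch):

    import marshal
    src = open('test.py').read()
    code = compile(src, 'test.py', 'exec')   # same filename -> same `prefix`
    open('test.marshal', 'wb').write(marshal.dumps(code))

Then diff the two test.marshal files byte-for-byte.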

Re: Recursive generators and backtracking search

2005-10-31 Thread Tim Peters
[Talin]
> I've been using generators to implement backtracking search for a while
> now. Unfortunately, my code is large and complex enough (doing
> unification on math expressions) that its hard to post a simple
> example. So I decided to look for a simpler problem that could be used
> to demonstrate the technique that I am talking about.
>
> I noticed that PEP 255 (Simple Generators) refers to an implementation
> of the "8 Queens" problem in the lib/test directory. Looking at the
> code, I see that while it does use generators, it doesn't use them
> recursively.

In context, the N-Queens and MxN Knight's Tour solvers in
test_generators.py are exercising the conjoin() generators in that
file.  That's a different approach to backtracking search, with some
nice features too:  (1) it uses heap space instead of stack space;
and, (2) it's easy to run entirely different code at different levels
of the search.  #2 isn't well-illustrated by the N-Queens solver
because the problem is so symmetric, although it is used to give the
code for each row its own local table of the board lines used by the
squares in that row.  That in turn is a major efficiency win.  The
Knight's Tour solver makes more obvious use of #2, by, e.g., running
different code for "the first" square than for "the second" square
than for "the last" square than for "all the other" squares.  That
doesn't require any runtime test-and-branch'ing in the search code,
it's set up once at the start in the list of M*N generators passed to
conjoin() (each square gets its own generator, which can be customized
in arbitrary ways, in advance, for that square).

> As an alternative, I'd like to present the following implementation. If
> you compare this one with the one in lib/test/test_generator.py you
> will agree (I hope) that by using recursive generators to implement
> backtracking, the resulting code is a little more straightforward and
> intuitive:

Since "straightfoward and intuitive" weren't the goals of the
test_generators.py implementations, that's not too surprising ;-)

> ...
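
For reference, a stripped-down recursive-generator N-Queens in the spirit
the OP describes (a sketch, not the snipped code):

    def queens(n, row=0, cols=(), diag1=(), diag2=()):
        # Yield each solution as a tuple of column indexes, one per row.
        if row == n:
            yield cols
            return
        for c in range(n):
            if c not in cols and row + c not in diag1 and row - c not in diag2:
                for solution in queens(n, row + 1, cols + (c,),
                                       diag1 + (row + c,), diag2 + (row - c,)):
                    yield solution

len(list(queens(8))) is 92, as it should be.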
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python gc performance in large apps

2005-11-06 Thread Tim Peters
[Robby Dermody]
> ...
> -len(gc.get_objects()) will show linearly increasing counts over time
> with the director (that match the curve of the director's memory usage
> exactly), but with the harvester, the object count doesn't increase over
> time (although memory usage does). This might mean that we are dealing
> with two separate problems on each component

You meant "at least two" <0.5 wink>.

> uncollected object count growth on the director, and something else on
> the harvester. ...OR the harvester may have an object count growth
> problem as well, it might just be in a C module in a place not visible to
> gc.get_objects() ?

Note that gc.get_objects() only knows about container objects that
elect to participate in cyclic gc.  In particular, it doesn't know
anything about "scalar" types, like strings or floats (or any other
type that can't be involved in a cycle).  For example, this little
program grows about a megabyte per second on my box, but
len(gc.get_objects()) never changes:

"""
import gc
from random import choice

letters = "abcdefghijklmnop"

def build(n):
    return "".join(choice(letters) for dummy in range(n))

d = {}
i = 0
while 1:
    d[build(10)] = build(5)
    i += 1
    if i % 1000 == 0:
        print i, len(gc.get_objects())
"""

To learn about non-gc-container-object growth, use a debug build and
sys.getobjects() instead.  This is described in SpecialBuilds.txt. 
sys.getobjects() tries to keep track of _all_ objects (but exists only
in a debug build).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regexp non-greedy matching bug?

2005-12-03 Thread Tim Peters
[John Hazen]
> I want to match one or two instances of a pattern in a string.
>
> According to the docs for the 're' module
> ( http://python.org/doc/current/lib/re-syntax.html ) the '?' qualifier
> is greedy by default, and adding a '?' after a qualifier makes it
> non-greedy.

>> The "*", "+", and "?" qualifiers are all greedy...
>> Adding "?" after the qualifier makes it perform the match in
>> non-greedy or minimal fashion...

> In the following example, though my re is intended to allow for 1 or 2
> instinces of 'foo', there are 2 in the string I'm matching.  So, I would
> expect group(1) and group(3) to both be populated.  (When I remove the
> conditional match on the 2nd foo, the grouping is as I expect.)
>
> $ python2.4
> Python 2.4.1 (#2, Mar 31 2005, 00:05:10)
> [GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> foofoo = re.compile(r'^(foo)(.*?)(foo)?(.*?)$')
> >>> foofoo.match(s).group(0)
> 'foobarbazfoobar'
> >>> foofoo.match(s).group(1)
> 'foo'
> >>> foofoo.match(s).group(2)
> ''
> >>> foofoo.match(s).group(3)
> >>> foofoo.match(s).group(4)
> 'barbazfoobar'

Your problem isn't that

(foo)?

is not greedy (it is greedy), it's that your first

(.*?)

is not greedy.  Remember that regexps also work left to right.  When
you coded your first

(.*?)

you're asking (because of the '?') the regexp engine to chew up the
fewest possible number of characters at that point such that the
_rest_ of the regexp _can_ match.  By chewing up no characters at all,
the rest of the regexp can in fact match, so that's what the engine
did -- your second

   (foo)?

is optional, telling the engine you don't require that `foo` to match.
 The engine took you at your word about that ;-)

> >>> foofoo = re.compile(r'^(foo)(.*?)(foo)(.*?)$')

In this case your second `foo` is not optional.  The behavior wrt the first

(.*?)

really doesn't change:  the regexp engine again chews up the fewest
number of characters at that point such that the rest of the regexp
can match.  But because your second `foo` isn't optional in this case,
the engine can't get away with matching 0 characters in this case.  It
still matches the fewest number it can match there, though (consistent
with the rest of the pattern matching too).

> >>> foofoo.match(s).group(0)
> 'foobarbazfoobar'
> >>> foofoo.match(s).group(1)
> 'foo'
> >>> foofoo.match(s).group(2)
> 'barbaz'
> >>> foofoo.match(s).group(3)
> 'foo'
> >>> foofoo.match(s).group(4)
> 'bar'
> >>>
>
> So, is this a bug, or just a problem with my understanding?

The behavior is what I expected ;-)

> If it's my brain that's broken, what's the proper way to do this with regexps?

Sorry, I'm unclear on (exactly) what it is you're trying to
accomplish.  Maybe what you're looking for is

^P(.*P)?.*$

?

> And, if the above is expected behavior, should I submit a doc bug?  It's
> clear that the "?" qualifier (applied to the second foo group) is _not_
> greedy in this situation.

See above:  that's not only not clear, it's not true.  Consider a
related but much simpler example:

>>> m = re.match(r'a(b)?(b)?c', 'abc')
>>> m.groups()
('b', None)

Both instances of

(b)?

are "greedy" there, and that the second one didn't match "b" does not
mean that the second one is not greedy.  It _couldn't_ match without
violating that the _first_ is greedy, and the first "wins" because
regexps work left to right.  It may be harder to see that the same
principle is at work in your example, but it is:  your second (foo)?
couldn't match without violating that your first (.*?) asks for a
minimal match.  My

^P(.*P)?.*$

above asks the engine to match two instances of P if possible, but to
settle for one if that's all it can find.
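
Spelled out for the 'foo' case (same s as above):

>>> pat = re.compile(r'^(foo)(.*foo)?(.*)$')
>>> pat.match('foobarbazfoobar').groups()
('foo', 'barbazfoo', 'bar')
>>> pat.match('foobarbaz').groups()
('foo', None, 'barbaz')

The greedy (.*foo)? grabs through the second 'foo' when there is one, and
quietly matches nothing when there isn't.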
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hash()

2005-12-05 Thread Tim Peters
[John Marshall]
> For strings of > 1 character, what are the chances
> that hash(st) and hash(st[::-1]) would return the
> same value?

First, if `st` is a string, `st[::-1]` is a list.  Do you really mean
to compare string hashes with list hashes here?  I'm going to assume
not.

Second, what are your assumptions about (a) the universe of strings;
and, (b) the hash function?

Assuming a finite universe of strings (also finite ;-)), and a hash
function that returns each of its H possible results "at random"
(meaning that there's no algorithmic way to predict any bit of the
hash output short of running the hash function), then the probability
that two distinct strings have the same hash is 1/H.  It doesn't
matter to this outcome whether one input is the reversal of the other.

> My goal is to uniquely identify multicharacter strings,

Unclear what that means.  Obviously, if your string universe contains
more than H strings, it's impossible for any hash function with H
possible values to return a different hash value for each input.

> all of which begin with "/" and never end with "/".
> Therefore, st != st[::-1].

As at the start, I think you mean to say st != "".join(st[::-1]).  I
don't know why you might think that matters, though.  Is it simply
because this condition eliminates palindromes from your input
universe?

Anyway, to be concrete, using CPython's hash function on a 32-bit box,
H = 2**32-1.  Call a string `s` bad iff:

s[0] == "/" and s[-1] != "/" and hash(s) == hash("".join(reversed(s)))

Then there are no bad strings of length 1, 2, 3, or 4.  There are 4
bad strings of length 5:

'/\xde&\xf6C'
'/\xca\x0e\xfaC'
'/\xc4\x06\xfcC'
'/\xad\xd6\x01\xd6'

I didn't think about this -- I just wrote a little program to try all
~= 4 billion such strings.  So if your universe is the set of all
5-character 8-bit strings that start with '/' but don't end with '/',
and you pick inputs uniformly at random from that universe, the chance
of a hash match between a string and its reversal is

4 / (256**3 * 255)

or a little less than 1 in a billion.  For a truly random hash
function with a 32-bit output, the chance would be a little less than
1 in 4 billion.
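
The search itself is nothing deep -- something along these lines (a
sketch; it's painfully slow in pure Python, and the particular hits
depend on CPython's 32-bit string hash):

    def find_bad():
        # All 5-char strings starting with '/' and not ending with '/'.
        bad = []
        chars = map(chr, range(256))
        for a in chars:
            for b in chars:
                for c in chars:
                    for d in chars:
                        if d == '/':
                            continue
                        s = '/' + a + b + c + d
                        if hash(s) == hash("".join(reversed(s))):
                            bad.append(s)
        return bad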

It would be mildly surprising if those odds got worse as string length
increased.  The md5 and sha hashes have much larger H, and were
designed for (among other things) good collision resistance.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hash()

2005-12-05 Thread Tim Peters
[John Marshall]
>>> For strings of > 1 character, what are the chances
>>> that hash(st) and hash(st[::-1]) would return the
>>> same value?

[Tim Peters]
>> First, if `st` is a string, `st[::-1]` is a list.  Do you really mean
>> to compare string hashes with list hashes here?  I'm going to assume
>> not.

[Jeff Epler]
> It is?
>
> >>> st = "french frogs"
> >>> st[::-1]
> 'sgorf hcnerf'
>
> (Python 2.3)

Indeed that's right.  Python 2.4+ also.  My apologies!  Good thing it
doesn't matter to the rest of the exposition ;-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: UNIX timestamp from a datetime class

2005-12-06 Thread Tim Peters
[John Reese]
> >>> import time, calendar, datetime
> >>> n= 1133893540.874922
> >>> datetime.datetime.fromtimestamp(n)
> datetime.datetime(2005, 12, 6, 10, 25, 40, 874922)
> >>> lt= _
> >>> datetime.datetime.utcfromtimestamp(n)
> datetime.datetime(2005, 12, 6, 18, 25, 40, 874922)
> >>> gmt= _
>
> So it's easy to create datetime objects from so-called UNIX timestamps
> (i.e. seconds since Jan 1, 1970 UTC).  Is there any way to get a UNIX
> timestamp back from a datetime object besides the following
> circumlocutions?
>
> >>> float(lt.strftime('%s'))
> 1133893540.0
> >>> calendar.timegm(gmt.timetuple())
> 1133893540

Do

time.mktime(some_datetime_object.timetuple())

Note that datetime spans a much larger range than most "so-called UNIX
timestamp" implementations, so this conversion isn't actually possible
for most datetime values; e.g.,

>>> time.mktime(datetime(4000, 12, 12).timetuple())
Traceback (most recent call last):
  File "", line 1, in ?
OverflowError: mktime argument out of range

on a Windows box.  Note too that Python timestamps extend most UNIXy
ones by being floats with fractional seconds; time.mktime() doesn't
know anything about fractional seconds, and neither does
datetime.timetuple(); e.g.,

>>> time.mktime(datetime.utcfromtimestamp(1133893540.874922).timetuple())
1133911540.0

loses the fractional part.  You can add that back in if you like:

time.mktime(some_datetime_object.timetuple()) + \
    some_datetime_object.microsecond / 1e6
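
Or, wrapped up (a small helper, not a stdlib function):

    import time

    def to_unix_timestamp(dt):
        # Local-time datetime -> float seconds since the epoch,
        # keeping the fractional seconds.
        return time.mktime(dt.timetuple()) + dt.microsecond / 1e6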
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: binascii.crc32 results not matching

2005-12-09 Thread Tim Peters
[Larry Bates]
> I'm trying to get the results of binascii.crc32
> to match the results of another utility that produces
> 32 bit unsigned CRCs.  binascii.crc32 returns
> results in the range of -2**31-1 and 2**31-1. Has
> anyone ever worked out any "bit twiddling" code to
> get a proper unsigned 32 bit result from binascii.crc32?

Just "&" the result with 0x (a string of 32 1-bits).

> Output snip from test on three files:
>
> binascii.crc32=-1412119273, oldcrc32= 2221277246
> binascii.crc32=-647246320, oldcrc32=73793598
> binascii.crc32=-1391482316, oldcrc32=79075810

Doesn't look like these are using the same CRC algorithms (there are
many distinct ways to compute a 32-bit CRC).  For example,

>>> print -1412119273 & 0xffffffff
2882848023

and that's not equal to 2221277246.  Or you're on Windows, and forgot
to open the files in binary mode.  Or something ;-)
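
If it does turn out to be the same algorithm, the whole fix is a
one-liner (a trivial helper, not part of binascii):

    import binascii

    def crc32_unsigned(data):
        # Fold binascii.crc32's signed 32-bit result into unsigned range.
        return binascii.crc32(data) & 0xffffffff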
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: newby question: Splitting a string - separator

2005-12-09 Thread Tim Peters
[James Stroud]
>> The one I like best goes like this:
>>
>> py> data = "Guido van Rossum  Tim Peters Thomas Liesner"
>> py> names = [n for n in data.split() if n]
>> py> names
>> ['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']
>>
>> I think it is theoretically faster (and more pythonic) than using regexes.

[Kent Johnson]
> Unfortunately it gives the wrong result.

Still, it gets extra points for being such a pleasing example ;-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: binascii.crc32 results not matching

2005-12-10 Thread Tim Peters
[Raymond L. Buvel]
> Check out the unit test in the following.
>
> http://sourceforge.net/projects/crcmod/

Cool!

> I went to a lot of trouble to get the results to match the results of
> binascii.crc32.  As you will see, there are a couple of extra operations
> even after you get the polynomial and bit ordering correct.

Nevertheless, the purpose of binascii.crc32 is to compute exactly the
same result as most zip programs give.  All the details (including
what look to you like "extra operations" ;-)) were specified by RFC
1952 (the GZIP file format specification).  As a result,
binascii.crc32 matches, e.g., the CRCs reported by WinZip on Windows,
and gives the same results as zlib.crc32 (as implemented by the zlib
developers).
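
Easy to check:

>>> import binascii, zlib
>>> binascii.crc32("hello, world") == zlib.crc32("hello, world")
True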
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing dictionaries, is this valid Python?

2005-12-13 Thread Tim Peters
[François Pinard]
...
> Would someone know where I could find a confirmation that comparing
> dictionaries with `==' has the meaning one would expect (even this is
> debatable!), that is, same set of keys, and for each key, same values?

Yes, look here:  it has the meaning you expect, provided that
by "same" you mean "compare equal" (and not, e.g, "is").

See the Language Reference Manual, section 5.9 "Comparisons", for
more, and footnote 5.5 there for a bit of history.
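
For example:

>>> {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
True
>>> {'a': 1} == {'a': 1.0}   # "same value" meaning "compares equal"
True
>>> {'a': 1} == {'a': 1, 'b': 2}
False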
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why and how "there is only one way to do something"?

2005-12-15 Thread Tim Peters
[Steve Holden, to bonono]
> ...
> I believe I have also suggested that the phrases of the Zen aren't to be
> taken too literally.

Heretic.

> You seem to distinguish between "obvious" meaning "obvious to Steve
> but not necessarily to me" and "really obvious" meaning "obvious to both
> Steve and me". So where does the subjectivity creep in?

For those who have ears to hear, it's sufficient to note that:

There should be one-- and preferably only one --obvious way to do it.

is followed by:

Although that way may not be obvious at first unless you're Dutch.

> And are you going to spend the rest of your life arguing trivial semantics?

A more interesting question is how many will spend the rest of their
lives responding ;-)

perfect-dutchness-is-a-journey-not-a-destination-ly y'rs - tim
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Next floating point number

2005-12-17 Thread Tim Peters
[Steven D'Aprano]
> I'm looking for some way to get the next floating point number after any
> particular float.
...
> According to the IEEE standard, there should be a routine like next(x,y)
> which returns the next float starting from x in the direction of y.
>
> Unless I have missed something, Python doesn't appear to give an interface
> to the C library's next float function. I assume that most C floating
> point libraries will have such a function.

While the C99 standard defines such a function (several, actually),
the C89 standard does not, so Python can't rely on one being
available.  In general, Python's `math` module exposes only standard
C89 libm functions, plus a few extras it can reliably and portably
build itself on top of those.  It does not expose platform-specific
libm functions.  You can argue with that policy, but not successfully
unless you take over maintenance of mathmodule.c <0.5 wink>.

> So I came up with a pure Python implementation, and hoped that somebody
> who had experience with numerical programming in Python would comment.

If you're happy with what you wrote, who needs comments ;-)  Here's a
careful, "kinda portable" implementation in C:

http://www.netlib.org/toms/722

If you ignore all the end cases (NaNs, infinities, signaling underflow
and overflow, ...), the heart of it is just adding/subtracting 1
to/from the 64-bit double representation, viewing it as an 8-byte
integer.  That works fine for the IEEE-754 float representations (but
does not work for all float representations).

I've used this simple routine based on that observation, which ignores
all endcases, and only works if both input and result are >= 0:

"""
from struct import pack, unpack

def next(x, direction=+1):
    bits = unpack(">Q", pack(">d", x))[0]
    return unpack(">d", pack(">Q", bits + direction))[0]
"""

For example,

>>> next(0) # smallest denorm > 0
4.9406564584124654e-324
>>> next(_, -1) # should really signal underflow
0.0
>>> next(1)
1.0000000000000002
>>> next(1, -1)
0.99999999999999989
>>> next(1e100)
1.0000000000000002e+100
>>> next(1e100, -1)
9.9999999999999982e+099
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: doctest fails to NORMALIZE_WHITESPACE ?

2005-12-17 Thread Tim Peters
[David MacKay]
> Hello, I'm a python-list newbie. I've got a question about doctest; perhaps
> a bug report.

As things will turn out, it's just a question.  That's common for newbies :-)

> I really like doctest, but sometimes doctest gives a failure when the output
> looks absolutely fine to me -- indeed, even after I have gone to considerable
> effort to make my documentation match the output perfectly.
>
> http://www.aims.ac.za/~mackay/python/compression/huffman/Huffman3.py
>
> The above file is an example.
>
> It's self-contained, so you can plop it into emacs and hit C-cC-c to run the
> doctests. One of them fails.
> The piece of source code concerned is here:
>
> >>> c = []; \
> c.append(node(0.5,1,'a')); \
> c.append(node(0.25,2,'b')); \
> c.append(node(0.125,3,'c')); \
> c.append(node(0.125,4,'d')); \
> iterate(c) ; reportcode(c)   # doctest: 
> +NORMALIZE_WHITESPACE, +ELLIPSIS

I'd probably write that more like so:

>>> c = [node(0.5,1,'a'), node(0.25,2,'b'), node(0.125,3,'c'),
...      node(0.125,4,'d')]
>>> iterate(c)
>>> reportcode(c)  # doctest [etc]

When Python doesn't "look clean", it's not Python -- and backslash
continuation & semicolons often look like dirt to the experienced
Python's eye.

> #Symbol Count   Codeword
> a (0.5) 1
> b (0.25)01
> c (0.12)000
> d (0.12)001
> """
>
> And the output is:
>
> Failed example:
> c = []; c.append(node(0.5,1,'a')); 
> c.append(node(0.25,2,'b')); c.append(node(0.125,3,'c')); 
> c.append(node(0.125,4,'d')); iterate(c) ; reportcode(c)   
> # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
> Expected:
> #Symbol Count   Codeword
> a (0.5) 1
> b (0.25)01
> c (0.12)000
> d (0.12)001
> Got:
> <__main__.internalnode instance at 0xb7aee76c>

Well, there you go, right?  There was certainly no

<__main__.internalnode instance at 0xb7aee76c>

line in the expected output.  I don't know whether you _want_ to see
that line or not, but the doctest said you don't.  That's why the test
fails.  If you do want to see it, add, e.g.,

<__main__.internalnode instance at 0x...>

as the first line of the expected output.

> #Symbol Count   Codeword
> a   (0.5)   1
> b   (0.25)  01
> c   (0.12)  000
> d   (0.12)  001

You have another problem here, alas:  whether a "%2.2g" format rounds
0.125 to "0.12" or "0.13" varies across platforms.  For example, if
you had run this on Windows, you would have seen:

 c   (0.13)  000
 d   (0.14)  001

for the last two lines.

> I have tried numerous tweaks, and am at a loss. I am wondering whether there
> is some big in doctest involving the "#" character in the output.
> Or maybe I made some silly mistake.

I think we're ready to vote on that now ;-)

> Any advice appreciated!
>
> Many thanks again to the authors of doctest, it gives a great feeling
> to write code in the way that doctest encourages. :-)

You're welcome!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: doctest fails to NORMALIZE_WHITESPACE ?

2005-12-17 Thread Tim Peters
[David MacKay, having fun with doctest]
 ...
> I've got a follow-up question motivated by my ugly backslash continuations.

[Tim]
>> When Python doesn't "look clean", it's not Python -- and backslash
>> continuation & semicolons often look like dirt to the experienced
>> Python's eye.

> The reason I was making those ugly single-line monsters was I had somehow got
> the impression that each fresh line starting ">>>" was tested
> _completely_separately_ by doctest; so, to preserve state I thought I had to 
> write
>
> >>> a=[2, 5, 1];  a.sort();  print a
> [1, 2, 5]
>
> rather than
>
> >>> a=[2, 5, 1]
> >>> a.sort()
> >>> print a
> [1, 2, 5]
>
> But I see now I was wrong.

Yup, that would be pretty unusable ;-)

> Question:
> Does doctest always preserve state throughout the entire sequence of tests
> enclosed by `"""`, as if the tests were entered in a single interactive 
> session?

Pretty much.  doctest can take inputs from several places, like (as
you're doing) docstrings, but also from files, or from the values in a
__test__ dictionary.  I'll note parenthetically that doctests are
heavily used in Zope3 and ZODB development, and "tutorial doctests" in
files have proved to be pleasant & very effective.  For example,



That entire file is "a doctest", and contains as much expository prose
as code.  It's written in ReST format, and tools in Zope3 can present
such doctest files as nicely formatted documentation too.  Unlike most
docs, though, each time we run ZODB's or Zope3's test suite, we
automatically verify that the examples in _this_ documentation are
100% accurate.  Standard practice now is to write a tutorial doctest
for a new feature first, before writing any implementation code; this
folds in some aspects of test-driven development, but  with some care
leaves you with _readable_ testing code and a tutorial intro to the
feature too.  That doctest makes it a little easier to write prose
than to write code is a bias that's amazingly effective in getting
people to write down what they _think_ they're doing ;-)

Anyway, running any piece of Python code requires a global namespace,
and doctest uses a single (but mutable!) global namespace for each
_batch_ of tests it runs.  In the case of a doctest file, a single
global namespace is used across the entire file.  In your case,
letting doctest extract tests from module docstrings, one global
namespace is created per docstring.  In that case, the globals the
tests in a docstring use are initially a shallow copy of the module's
__dict__.  That way the tests can "see" all the top-level functions
and classes and imports (etc) defined in the module, but can't mutate
the module's __dict__ directly.  Assignments within a doctest alter
bindings in the same namespace object, so these bindings are also
visible to subsequent code in the same docstring.  That's a
long-winded way of expanding on section 5.2.3.3 ("What's the Execution
Context?") in the docs.

> Or is there a way to instruct doctest to forget its state, and start
>  the next `>>>` with a clean slate?

Goodness no -- and nobody would want that.  You would lose _all_
globals then, including losing the ability to refer to functions and
classes (etc) defined in the module.

If, for some reason, you want to destroy some particular binding
within a doctest, then you do that the same way you destroy a global
binding outside of doctest, with `del`; e.g.,

"""
>>> x = range(100)
>>> len(x)
100
>>> del x  # free the memory for the giant list
>>> x
Traceback (most recent call last):
   ...
NameError: name 'x' is not defined
"""

works fine as a doctest.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Next floating point number

2005-12-18 Thread Tim Peters
[Bengt Richter]
> I wonder if frexp is always available,

Yes, `frexp` is a standard C89 libm function.  Python's `math` doesn't
contain any platform-specific functions.

...

> The math module could also expose an efficient multiply by a power
> of two using FSCALE if the pentium FPU is there.

`ldexp` is also a standard C89 libm function, so Python also exposes
that.  Whether it uses FSCALE is up to the platform C libm.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Next floating point number

2005-12-18 Thread Tim Peters
[Steven D'Aprano]
> ...
> Will Python always use 64-bit floats?

A CPython "float" is whatever the platform C compiler means by
"double".  The C standard doesn't define the size of a double, so
neither does Python define the size of a float.

That said, I don't know of any Python platform to date where a C
double was not 64 bits.  It's even possible that all _current_ Python
platforms use exactly the same format for C double (the IEEE-754
double format), modulo endianness.

There's a subtlety related to that in my pack/unpack code, BTW:  using
a ">d" format forces `struct` to use a "standard" big-endian double
encoding, which is 64 bits, regardless of how the platform C stores
doubles and regardless of how many bytes a native C double occupies. 
So, oddly enough, the pack/unpack code would work even if the platform
C double used some, e.g., non-IEEE 4-byte VAX float encoding.  The
only real docs about this "standard" encoding are in Python's
pickletools module:

"""
float8 = ArgumentDescriptor(
 name='float8',
 n=8,
 reader=read_float8,
 doc="""An 8-byte binary representation of a float, big-endian.

 The format is unique to Python, and shared with the struct
 module (format string '>d') "in theory" (the struct and cPickle
 implementations don't share the code -- they should).  It's
 strongly related to the IEEE-754 double format, and, in normal
 cases, is in fact identical to the big-endian 754 double format.
 On other boxes the dynamic range is limited to that of a 754
 double, and "add a half and chop" rounding is used to reduce
 the precision to 53 bits.  However, even on a 754 box,
 infinities, NaNs, and minus zero may not be handled correctly
 (may not survive roundtrip pickling intact).
 """)
"""
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Guido at Google

2005-12-22 Thread Tim Peters
[EMAIL PROTECTED]
> ...
>  What about the copyright in CPython ? Can I someone take the codebase
> and make modifications then call it Sneak ?

Of course they _could_ do that, and even without making modifications
beyond the name change.  If you want to know whether it's legal,
that's a different question.  Take a copy of the Python license to
your lawyer and buy an opinion worth hearing ;-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Guido at Google

2005-12-22 Thread Tim Peters
[Greg Stein]
>>> Guido would acknowledge a query, but never announce it. That's not his
>>> style.

He's been very low-key about it, but did make an informal announcement
on the PSF-Members mailing list.

>>> This should have a positive impact on Python. His job description has a
>>> *very* significant portion of his time dedicated specifically to working on
>>> Python. (much more than his previous "one day a week" jobs have given
>>> him)

It's got to be better than getting one patch per year from him, trying
to fix threading on the ever-popular Open Source combination of HP-UX
on an Itanium chip .

[Jay Parlar]
>> Do you actually mean "working on Python", or did you mean "working WITH
>> Python"?

[Robert Kern]
> I'm pretty sure he means "working on Python."

While I'm not a professional Greg-channeller, in this case I can:  he
meant what he said.

> No one hires Guido and expects him not to work *with* Python most of the time.

Ask Guido how fond he is of Java these days ;-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to get started in GUI Programming?

2005-12-23 Thread Tim Peters
[D H]
> ...
> Doesn't the python community already have enough assholes as it is?

The Python Software Foundation may well wish to fund a study on that. 
Write a proposal!  My wild-ass guess is that, same as most other Open
Source communities, we average about one asshole per member.  I'd love
to be proven wrong, though.

at-my-age-you-need-all-the-evacuation-routes-you-can-get-ly y'rs  - tim
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sorting with expensive compares?

2005-12-23 Thread Tim Peters
[Steven D'Aprano]
> ...
> As others have pointed out, Python's sort never compares the same objects
> more than once.

Others have pointed it out, and it's getting repeated now, but it's
not true.  Since I wrote Python's sorting implementation, it's
conceivable that I'm not just making this up ;-)

Here's the shortest counter-example:  [10, 30, 20].

The first thing sort() does is compute the length of the longest
natural run (whether increasing or decreasing) starting at index 0. 
It compares 10 and 30, sees that the list starts with an increasing
run, then compares 30 and 20 to see whether this run can be extended
to length 3.  It can't.  Since the list has only 3 elements, it
decides to use a binary insertion sort to move the 3rd element into
place.  That requires 2 comparisons, the first of which _again_
compares 30 and 20.

This doesn't bother me, and it would cost more to avoid this than it
could possibly save in general.  It's true that the workhorse binary
insertion and merge sorts never compare the same elements more than
once, but there is some potential duplication between those and
comparisons done to identify natural runs.
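
If you want to watch it happen, something along these lines should do
(a sketch, not from the sort code itself; it assumes sort() drives
comparisons through __lt__, which is true in 2.3 and later):

pairs = []

class Tracked(object):
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        # record every comparison the sort asks for
        pairs.append((self.value, other.value))
        return self.value < other.value

data = [Tracked(10), Tracked(30), Tracked(20)]
data.sort()
for a, b in pairs:
    print a, b

You should see four comparisons logged, with the 20-vs-30 pair (in one
order or the other) showing up twice.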
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to signal "not implemented yet"?

2005-12-25 Thread Tim Peters
[Roy Smith]
> Is there some standard way to signal "not implemented yet" in
> unfinished code?

raise NotImplementedError

That's a builtin exception.
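
For example, a stub like this (made-up function name) fails loudly if
someone calls it before it's written:

def frobnicate(data):
    # real logic still to come
    raise NotImplementedError("frobnicate isn't implemented yet")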

...
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python coding contest

2005-12-25 Thread Tim Peters
Over at

http://spoj.sphere.pl/problems/SIZECON/

the task is to come up with the shortest program that solves a
different problem.  There's a twist in this one:

Score equals to size of source code of your program except symbols with
ASCII code <= 32.

So blanks, newlines and tabs aren't counted at all.  However, no
"control characters" of any kind are counted, and I found a convoluted
way to transform any Perl program so that only 7 "readable" characters
remain.  That's currently the shortest solution known (given the
stated metric -- my Perl source is actually over 400 bytes!  all but 7
of them have ord < 33, though).
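
In other words, the score is just a count of the "visible" bytes.  A
scorer for that metric might look like this (my reading of the rules,
nothing official):

def sizecon_score(source):
    # count only characters with ASCII code > 32; blanks, newlines,
    # tabs and other control characters are free
    return sum(1 for ch in source if ord(ch) > 32)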

There's probably still room for improvement in the "shortest" Python
program known for this task.  No deadlines, no prizes, and you don't
get to see anyone else's code, but as a form of programming
masturbation it's great ;-)

http://spoj.sphere.pl/problems/KAMIL/

is another program-size task, but this one counts all bytes.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: build curiosities of svn head (on WinXP)

2005-12-26 Thread Tim Peters
[David Murmann]
...
>> second, the build order in "pcbuild.sln" for elementtree seems to be
>> wrong, nant tried to build elementtree before pythoncore (which failed).
>> i fixed this by building elementtree separately.

[Steve Holden]
> Yes, the elementtree module is a new arrival for 2.5, so the build may
> not yet be perfectly specified. This is useful information.

I just checked in a fix for that.  Thanks!

...

>> ... and i could reproduce the expected failure (ATM) of the regression
>> test suite:
>>
>>   http://mail.python.org/pipermail/python-dev/2005-December/059033.html

Note that all tests pass on Windows as of Sunday (in release and debug
builds, with and without -O).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python coding contest

2005-12-28 Thread Tim Peters
[Bengt Richter]
> ...
> [23:28] C:\pywk\clp\seven\pycontest_01>wc -lc  seven_seg.py
>      2     136  seven_seg.py
>
> 2 lines, 136 chars including unix-style lineseps (is that cheating on 
> windows?)

Na.  Most native Windows apps (including native Windows Python) don't
care whether \n or \r\n in used as the line terminator in text files. 
This is because the platform C library changes \r\n to \n on input of
a text-mode file, and leaves \n alone:  the difference is literally
invisible to most apps.
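
A throwaway way to see it on a Windows box (the file name is arbitrary):

f = open("eol_demo.txt", "wb")               # binary mode: bytes written as-is
f.write("one\r\ntwo\r\n")
f.close()

print repr(open("eol_demo.txt", "r").read())   # text mode: '\r\n' comes back as '\n'
print repr(open("eol_demo.txt", "rb").read())  # binary mode shows what's really on disk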
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: using NamedTemporaryFile on windows

2005-12-29 Thread Tim Peters
[Peter Hansen]
>>> What I don't understand is why you _can't_ reopen the NamedTemporaryFile
>>> under Windows when you can reopen the file created by mkstemp (and the
>>> files created by TemporaryFile are created by mkstemp in the first place).

[Lee Harr]
>> Are you saying you tried it and you actually can do what it says
>> you can't do?

[Peter Hansen]
> I don't think so.  I think I was saying that I can do exactly what it
> says I can, and can't do what it says I can't do, but that I don't
> understand why there is a difference between the two approaches given
> what else it says... (I hope that's clearer than it looks to me. ;-)

Because NamedTemporaryFile on Windows passes the Microsoft-specific
O_TEMPORARY flag, and mkstemp doesn't.  One consequence is that you
have to delete a temp file obtained from mkstemp yourself, but a
NamedTemporaryFile goes away by magic when the last handle to it is
closed.  Microsoft's I/O libraries do that cleanup, not Python.

Another consequence is that a file opened with O_TEMPORARY can't be
opened again (at least not via C stdio), not even by the process that
opened it to begin with.  AFAICT Microsoft never documented this, but
that's how it works.  Because you can't delete an open file on
Windows, temp files on Windows always have names visible in the
filesystem, and that's a potential "security risk".  _Presumably_ the
inability to open an O_TEMPORARY file again was a partially-baked
approach to eliminating that risk.
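
For example (a sketch of the difference; the Windows-specific bits are
as described above):

import os, tempfile

# mkstemp:  no O_TEMPORARY, so the file can be reopened by name,
# but deleting it is the caller's job
fd, path = tempfile.mkstemp()
os.write(fd, "scratch data")
os.close(fd)
data = open(path, "rb").read()   # reopening by name works, even on Windows
os.remove(path)                  # manual cleanup

# NamedTemporaryFile:  opened with O_TEMPORARY on Windows, so it goes
# away when closed -- but open(f.name) will fail there while f is open
f = tempfile.NamedTemporaryFile()
f.write("scratch data")
f.close()                        # the file is gone now; nothing to clean up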
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: csrss.exe & Numeric

2005-12-30 Thread Tim Peters
[jelle]
> I have a function that uses the Numeric module. When I launch the
> function csrss.exe consumes 60 / 70 % cpu power rather than having
> python / Numeric run at full speed. Has anyone encountered this problem
> before? It seriously messes up my Numeric performance.
>
> I'm running 2.4.2 on xp.

Need more clues than that.  What does your function do?  What
facilities does your function make use of?

csrss.exe is the "user-mode side" of Windows, and consumes a lot of
CPU if, for example, you're starting/stopping many threads, or display
a lot of output to "a DOS box".  You should also be aware that several
kinds of malware install a program named "csrss.exe" or "CSRSS.EXE"
(search the web for more on that, and run a virus scan).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Thread priorities?

2005-06-14 Thread Tim Peters
[Gary Robinson]
> In the application we're writing (http://www.goombah.com) it would be
> helpful for us to give one thread a higher priority than the others. We
> tried the recipe here:
> http://groups-beta.google.com/group/comp.lang.python/msg/6f0e118227a5f5de
> and it didn't seem to work for us.

If you need to fiddle thread priorities, then you'll need to do such
platform-specific things, and fight with the OS documentation to
figure out when and why they don't seem to work.

> We don't need many priority levels. We just need one thread to
> *temporarily* have a higher priority than others.
>
> One thing that occurred to me: There wouldn't by any chance be some way
> a thread could grab the GIL and not let it go until it is ready to do
> so explicitly?

No, unless that thread is running in a C extension and never calls
back into Python until it's willing to yield.

> That would have the potential to solve our problem.
>
> Or maybe there's another way to temporarily let one thread have
> priority over all the others?

No way in Python, although specific platforms may claim to support
relevant gimmicks in their native threads (and Python threads
generally are native platform threads).

A portable approach needs to rely on thread features Python supports
on all platforms.  For example, maybe this crude approach is good
enough:

"""
import threading

class Influencer(object):
def __init__(self):
self.mut = threading.Lock()
self.proceed = threading.Event()
self.proceed.set()
self.active = None

def grab(self):
self.mut.acquire()
self.active = threading.currentThread()
self.proceed.clear()
self.mut.release()

def release(self):
self.mut.acquire()
self.active = None
self.proceed.set()
self.mut.release()

def maybe_wait(self):
if (self.proceed.isSet() or
  self.active is threading.currentThread()):
pass
else:
self.proceed.wait()
"""

The idea is that all threads see an (shared) instance of Influencer,
and call its .maybe_wait() method from time to time.  Normally that
doesn't do anything.  If a thread T gets it into its head that it's
more important than other threads, it calls .grab(), which returns at
once.  Other threads then block at their next call to .maybe_wait(),
until (if ever) some thread calls .release().  Season to taste; e.g.,
maybe other threads are willing to pause no more than 0.1 second; etc.
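
For instance, the cooperating threads might look something like this
(sketch only; the work and loop-condition functions are made up):

influ = Influencer()   # one shared instance, visible to all threads

def worker():
    while more_work_to_do():     # hypothetical
        do_one_chunk()           # hypothetical
        influ.maybe_wait()       # cheap no-op unless someone grabbed priority

def urgent_task():
    influ.grab()                 # others pause at their next maybe_wait()
    try:
        do_the_urgent_bit()      # hypothetical
    finally:
        influ.release()          # let everyone proceed again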
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: multi threading and win9x

2005-06-20 Thread Tim Peters
[Timothy Smith]
> i want to run my sql statements on a seperate thread to prevent my app
> from stop responding to input (atm is says "not responding" under
> windows until the sql is finished)
> but i'm hesitant because i have to still support win9x and i'm not sure
> how well this will play.

All versions of Windows >= Win95 use threads heavily, have very solid
thread support, and the Win32 API was thread-aware from the start.
Thread _scheduling_ is pretty bizarre <= WinME, but process scheduling
is too.  Don't try to use hundreds of threads <= WinME and you should
be fine.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pickle broken: can't handle NaN or Infinity under win32

2005-06-22 Thread Tim Peters
[with the start of US summer comes the start of 754 ranting season]

[Grant Edwards]
>>>> Negative 0 isn't a NaN, it's just negative 0.

[Scott David Daniels]
>>> Right, but it is hard to construct in standard C.

[Paul Rubin]
>> Huh?  It's just a hex constant.

[Scott David Daniels]
> Well, -0.0 doesn't work,

C89 doesn't define the result of that, but "most" C compilers these
days will create a negative 0.

> and (double)0x80000000 doesn't work,

In part because that's an integer , and in part because it's
only 32 bits.  It requires representation casting tricks (not
conversion casting tricks like the above), knowledge of the platform
endianness, and knowledge of the platform integer sizes.  Assuming the
platform uses 754 bit layout to begin with, of course.

> and I think you have to use quirks of a compiler to create
> it.

You at least need platform knowledge.  It's really not hard, if you
can assume enough about the platform.

>  And I don't know how to test for it either, x < 0.0 is
> not necessarily true for negative 0.

If it's a 754-conforming C compiler, that's necessarily false (+0 and
-0 compare equal in 754).  Picking the bits apart is again the closest
thing to a portable test.  Across platforms with a 754-conforming
libm, the most portable way is via using atan2(!):

>>> pz = 0.0
>>> mz = -pz
>>> from math import atan2
>>> atan2(pz, pz)
0.0
>>> atan2(mz, mz)
-3.1415926535897931

It's tempting to divide into 1, then check the sign of the infinity,
but Python stops you from doing that:

>>> 1/pz
Traceback (most recent call last):
  File "", line 1, in ?
ZeroDivisionError: float division

That can't be done at the C level either, because _some_ people run
Python with their 754 HW floating-point zero-division, overflow, and
invalid operation traps enabled, and then anything like division by 0
causes the interpreter to die.  The CPython implementation is
constrained that way.

Note that Python already has Py_IS_NAN and Py_IS_INFINITY macros in
pyport.h, and the Windows build maps them to appropriate
Microsoft-specific library functions.  I think it's stuck waiting on
others to care enough to supply them for other platforms.  If a
platform build doesn't #define them, a reasonable but cheap attempt is
made to supply "portable" code sequences for them, but, as the
pyport.h comments note, they're guaranteed to do wrong things in some
cases, and may not work at all on some platforms.  For example, the
default

#define Py_IS_NAN(X) ((X) != (X))

is guaranteed never to return true under MSVC 6.0.

> I am not trying to say there is no way to do this.  I am
> trying to say it takes thought and effort on every detail,
> in the definition, implementations, and unit tests.

It's par for the course -- everyone thinks "this must be easy" at
first, and everyone who persists eventually gives up.  Kudos to
Michael Hudson for persisting long enough to make major improvements
here in pickle, struct and marshal for Python 2.5!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pickle broken: can't handle NaN or Infinity under win32

2005-06-22 Thread Tim Peters
[Tim Peters]
...
>> Across platforms with a 754-conforming libm, the most portable way [to
>> distinguish +0.0 from -0.0 in standard C] is via using atan2(!):
>>
>> >>> pz = 0.0
>> >>> mz = -pz
>> >>> from math import atan2
>> >>> atan2(pz, pz)
>> 0.0
>> >>> atan2(mz, mz)
>> -3.1415926535897931

[Ivan Van Laningham]
> Never fails.  Tim, you gave me the best laugh of the day.

Well, I try, Ivan.  But lest the point be missed , 754 doesn't
_want_ +0 and -0 to act differently in "almost any" way.  The only
good rationale I've seen for why it makes the distinction at all is in
Kahan's paper "Branch Cuts for Complex
Elementary Functions, or Much Ado About Nothing's Sign Bit".  There
are examples in that where, when working with complex numbers, you can
easily stumble into getting real-world dead-wrong results if there's
only one flavor of 0.  And, of course, atan2 exists primarily to help
convert complex numbers from rectangular to polar form.

Odd bit o' trivia:  following "the rules" for signed zeroes in 754
makes exponentiation c**n ambiguous, where c is a complex number with
c.real == c.imag == 0.0 (but the zeroes may be signed), and n is a
positive integer.  The signs on the zeroes coming out can depend on
the exact order in which multiplications are performed, because the
underlying multiplication isn't associative despite that it's exact. 
I stumbled into this in the 80's when KSR's Fortran compiler failed a
federal conformance test, precisely because the test did atan2 on the
components of an all-zero complex raised to an integer power, and I
had written one of the few 754-conforming libms at the time.  They
wanted 0, while my atan2 dutifully returned -pi.  I haven't had much
personal love for 754 esoterica since then ...
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pickle broken: can't handle NaN or Infinity under win32

2005-06-23 Thread Tim Peters
[Tim Peters']
>> Well, I try, Ivan.  But lest the point be missed , 754 doesn't
>> _want_ +0 and -0 to act differently in "almost any" way.  The only
>> good rationale I've seen for why it makes the distinction at all is in
>> Kahan's paper "Branch Cuts for Complex
>> Elementary Functions, or Much Ado About Nothing's Sign Bit".  There
>> are examples in that where, when working with complex numbers, you can
>> easily stumble into getting real-world dead-wrong results if there's
>> only one flavor of 0.  And, of course, atan2 exists primarily to help
>> convert complex numbers from rectangular to polar form.

[Steven D'Aprano]
> It isn't necessary to look at complex numbers to see the difference
> between positive and negative zero. Just look at a graph of y=1/x. In
> particular, look at the behaviour of the graph around x=0. Now tell me
> that the sign of zero doesn't make a difference.

OK, I looked, and it made no difference to me.  Really.  If I had an
infinitely tall monitor, maybe I could see a difference, but I don't
-- the sign of 0 on the nose makes no difference to the behavior of
1/x for any x other than 0.  On my finite monitor, I see it looks like
the line x=0 is an asymptote, and the graph approaches minus infinity
on that line from the left and positive infinity from the right; the
value of 1/0 doesn't matter to that.

> Signed zeroes also preserve 1/(1/x) == x for all x,

No, signed zeros "preserve" that identity for exactly the set {+Inf,
-Inf}, and that's all.  That's worth something, but 1/(1/x) == x isn't
generally true in 754 anyway.  Most obviously, when x is subnormal,
1/x overflows to an infinity (the 754 exponent range isn't symmetric
around 0 -- subnormals make it "heavy" on the negative side), and then
1/(1/x) is a zero, not x.  1/(1/x) == x doesn't hold for a great many
normal x either (pick a pile at random and check -- you'll find
counterexamples quickly).

> admittedly at the cost of y==x iff 1/y == 1/x (which fails for y=-0 and x=+0).
>
> Technically, -0 and +0 are not the same (for some definition of 
> "technically"); but
> practicality beats purity and it is more useful to have -0==+0 than the 
> alternative.

Can just repeat that the only good rationale I've seen is in Kahan's
paper (previously referenced).

>> Odd bit o' trivia:  following "the rules" for signed zeroes in 754
>> makes exponentiation c**n ambiguous, where c is a complex number with
>> c.real == c.imag == 0.0 (but the zeroes may be signed), and n is a
>> positive integer.  The signs on the zeroes coming out can depend on
>> the exact order in which multiplications are performed, because the
>> underlying multiplication isn't associative despite that it's exact.

> That's an implementation failure. Mathematically, the sign of 0**n should
> depend only on whether n is odd or even. If c**n is ambiguous, then that's
> a bug in the implementation, not the standard.

As I said, these are complex zeroes, not real zeroes.  The 754
standard doesn't say anything about complex numbers.  In rectangular
form, a complex zero contains two real zeroes.  There are 4
possiblities for a complex zero if the components are 754
floats/doubles:

+0+0i
+0-0i
-0+0i
-0-0i

Implement Cartesian complex multiplication in the obvious way:

(a+bi)(c+di) = (ac-bd) + (ad+bc)i

Now use that to raise the four complex zeroes above to various integer
powers, trying different ways of grouping the multiplications.  For
example, x**4 can be computed as

  ((xx)x)x

or

  (xx)(xx)

or

  x((xx)x)

etc.  You'll discover that, in some cases, for fixed x and n, the
signs of the zeroes in the result depend how the multiplications were
grouped.  The 754 standard says nothing about any of this, _except_
for the results of multiplying and adding 754 zeroes.  Multiplication
of signed zeroes in 754 is associative.  The problem is that the
extension to Cartesian complex multiplication isn't associative under
these rules in some all-zero cases, mostly because the sum of two
signed zeroes is (under 3 of the rounding modes) +0 unless both
addends are -0.  Try examples and you'll discover this for yourself.
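
Here's one way to run the experiment (a sketch, assuming 754 hardware
with the default round-to-nearest mode; it borrows the atan2 trick from
earlier in the thread to make the sign of a zero visible):

from math import atan2

def cmul(x, y):
    # the "obvious" Cartesian multiply:  (a+bi)(c+di) = (ac-bd) + (ad+bc)i
    a, b = x
    c, d = y
    return (a*c - b*d, a*d + b*c)

def zsign(z):
    # atan2(+0, +0) is 0, atan2(-0, -0) is -pi
    if atan2(z, z) == 0.0:
        return "+0"
    return "-0"

x = (-0.0, 0.0)                       # the complex zero -0+0i
p1 = cmul(cmul(cmul(x, x), x), x)     # ((xx)x)x
p2 = cmul(cmul(x, x), cmul(x, x))     # (xx)(xx)
print [zsign(c) for c in p1], [zsign(c) for c in p2]

On my reading of the rules, those two groupings disagree about which
components of x**4 are +0 and which are -0.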

I was part of NCEG (the Numerical C Extension Group) at the time I
stumbled into this, and they didn't have any trouble following it
.  It was a surprise to everyone at the time that Cartesian
multiplication of complex zeroes lost associativity when applying 754
rules in the obvious way, and no resolution was reached at that time.

>> I stumbled into this in the 80's when KSR's Fortran compiler failed a
>> federal conformance test, precisely because the test did atan2 on the
>>

Re: Avoiding deadlocks in concurrent programming

2005-06-23 Thread Tim Peters
[Terry Hancock]
> ...
> I realize you've probably already made a decision on this, but this sounds
> like a classic argument for using an *object DBMS*, such as ZODB: It
> certainly does support transactions, and "abstracting the data into tables"
> is a non-issue as ZODB stores Python objects more or less directly (you
> only have to worry about ensuring that objects are of "persistent"  types
> -- meaning either immutable, or providing persistence support explicitly).

ZODB can store/retrieve anything that can be pickled, regardless of
whether it derives from Persistent.  There are various space and time
efficiencies that can be gained by deriving from Peristent, and ZODB
automatically notices when a Persistent object mutates, but that's
about it.  Andrew Kuchling's intro to ZODB is still a good read
(Andrew doesn't work on it anymore, but I take sporadic stabs at
updating it):

http://www.zope.org/Wikis/ZODB/FrontPage/guide/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Tracing down segfault

2005-06-24 Thread Tim Peters
[Tony Meyer]
> I have (unfortunately) a Python program that I can consistently (in a
> reproducible way) segfault.  However, I've got somewhat used to Python's
> very nice habit of protecting me from segfaults and raising exceptions
> instead, and am having trouble tracking down the problem.
>
> The problem that occurs looks something like this:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00a502aa in ?? ()
> (gdb) bt
> #0  0x00a502aa in ?? ()
> Cannot access memory at address 0x0
>
> Which looks something like accessing a NULL pointer to me.

Worse, if you can't get a stack trace out of gdb, it suggests that bad
C code has corrupted the C stack beyond intelligibility.  The original
SIGSEGV was _probably_ due to a NULL pointer dereference too (although
it could be due to any string of nonsense bits getting used as an
address).

The _best_ thing to do next is to rebuild Python, and as many other
packages as possible, in debug mode.  For ZODB/ZEO, you do that like
so:

 python setup.py build_ext -i --debug

It's especially useful to rebuild Python that way.  Many asserts are
enabled then, and all of Python's memory allocations go thru a special
debug allocator then with gimmicks to try and catch out-of-bounds
stores, double frees, and use of free()'d memory.

> The problem is finding the code that is causing this, so I can work around
> it (or fix it).  Unfortunately, the script uses ZEO, ZODB,
> threading.Threads, and wx (my code is pure Python, though),

You didn't mention which version of any of these you're using, or the
OS in use.  Playing historical odds, and assuming relatively recent
versions of all, wx is the best guess.

> and I'm having trouble creating a simple version that isolates the problem
> (I'm pretty sure it started happening when I switched from thread to
> threading, but I'm not sure why that would be causing a problem;

It's unlikely to be the true cause.  Apart from some new-in-2.4
thread-local storage gimmicks, all of the threading module is written
in Python too.  NULL pointers are a (depressingly common) C problem.

> I am join()ing all threads before this happens).

So only a single thread is running at the time the segfault occurs? 
Is Python also in the process of tearing itself down (i.e., is the
program trying to exit?).

One historical source of nasties is trying to get more than one thread
to play nicely with GUIs.

> Does anyone have any advice for tracking this down?

Nope, can't think of a thing -- upgrade to Windows .
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Favorite non-python language trick?

2005-06-27 Thread Tim Peters
[Terry Hancock]
> Probably the most pointless Python wart, I would think. The =/==
> distinction makes sense in C, but since Python doesn't allow assignments
> in expressions, I don't think there is any situation in which the distinction
> is needed.  Python could easily figure out whether you meant assignment
> or equality from the context, just like the programmer does.

That's what Python originally did, before release 0.9.6 (search
Misc/HISTORY for eqfix.py).  Even this is ambigous then:

a = b

Especially at an interactive prompt, it's wholly ambiguous then
whether you want to change a's binding, or want to know whether a and
b compare equal.

Just yesterday, I wrote this in a script:

lastinline = ci == ncs - 1

This:

lastinline = ci = ncs - 1

means something very different (or means something identical,
depending on exactly how it is Python "could easily figure out" what I
intended ).

Of course strange rules could have resolved this, like, say, "=" means
assignment, unless that would give a syntax error, and then "=" means
equality.  Then

lastinline = ci = ncs - 1

would have been chained assignment, and something like

lastinline = (ci = ncs - 1)

would have been needed to get the intent of the current

lastinline = ci == ncs - 1
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: %g and fpformat.sci()

2005-06-30 Thread Tim Peters
[Sivakumar Bhaskarapanditha]
> How can I control the number of digits after the decimal point using the %g
> format specifier.

You cannot.  See a C reference for details; in general, %g is required
to truncate trailing zeroes, and in %.<precision>g the <precision> is the maximum
number of significant digits displayed (the total number both before
and after the decimal point).

If you need to preserve trailing zeroes, then you need to write code
to do that yourself, or use the %e format code (or maybe even %f). 
For example,

>>> a = 1.234e-5
>>> print "%.6g" % a
1.234e-005
>>> print "%.60g" % a
1.234e-005
>>> print "%.6e" % a
1.234000e-005
>>> print "%.6f" % a
0.000012
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Determining actual elapsed (wall-clock) time

2005-07-02 Thread Tim Peters
[Peter Hansen]
> Hmmm... not only that, but at least under XP the return value of
> time.time() _is_ UTC.  At least, it's entirely unaffected by the
> daylight savings time change, or (apparently) by changes in time zone.

On all platforms, time.time() returns the number of seconds "since the
epoch".  All POSIX systems agree on when "the epoch" began, but that
doesn't really matter to your use case.  Number of seconds since the
epoch is insensitive to daylight time, time zone, leap seconds, etc. 
Users can nevertheless make it appear to jump (into the future or the
past) by changing their system clock.  If you need an absolute measure
of time immune to user whims, you need to connect to special hardware,
or to an external time source.
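
So for elapsed wall-clock time the usual pattern is just (sketch; the
work function is made up):

import time

start = time.time()              # seconds since the epoch, as a float
do_the_work()                    # hypothetical
elapsed = time.time() - start    # unaffected by timezone or DST changes
print "took %.3f seconds" % elapsed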
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-03 Thread Tim Peters
[Fredrik Johansson]
>>> I'd rather like to see a well implemented math.nthroot. 64**(1/3.0)
>>> gives 3.9996, and this error could be avoided.

[Steven D'Aprano]
>> >>> math.exp(math.log(64)/3.0)
>> 4.0
>>
>> Success!!!

[Tom Anderson]
> Eeenteresting. I have no idea why this works. Given that math.log is
> always going to be approximate for numbers which aren't rational powers of
> e (which, since e is transcendental, is all rational numbers, and
> therefore all python floats, isn't it?), i'd expect to get the same
> roundoff errors here as with exponentiation. Is it just that the errors
> are sufficiently smaller that it looks exact?

Writing exp(log(x)*y) rather than x**y is in _general_ a terrible
idea, but in the example it happens to avoid the most important
rounding error entirely:  1./3. is less than one-third, so 64**(1./3.)
is less than 64 to the one-third.  Dividing by 3 instead of
multiplying by 1./3. is where the advantage comes from here:

>>> 1./3.  # less than a third
0.33333333333333331
>>> 64**(1./3.)  # also too small
3.9999999999999996
>>> exp(log(64)/3)  # happens to be on the nose
4.0

If we feed the same roundoff error into the exp+log method in
computing 1./3., we get a worse result than pow got:

>>> exp(log(64) * (1./3.))  # worse than pow's
3.9999999999999991

None of this generalizes usefully -- these are example-driven
curiosities.  For example, let's try 2000 exact cubes, and count how
often "the right" answer is delivered:

from math import exp, log

c1 = c2 = 0
for i in range(1, 2001):
    p = i**3
    r1 = p ** (1./3.)
    r2 = exp(log(p)/3)
    c1 += r1 == i
    c2 += r2 == i
print c1, c2

On my box that prints

3 284

so "a wrong answer" is overwhelmingly more common either way.  Fredrik
is right that if you want a library routine that can guarantee to
compute exact n'th roots whenever possible, it needs to be written for
that purpose.

...

> YES! This is something that winds me up no end; as far as i can tell,
> there is no clean programmatic way to make an inf or a NaN;

All Python behavior in the presence of infinities, NaNs, and signed
zeroes is a platform-dependent accident, mostly inherited from that
all C89 behavior in the presence of infinities, NaNs, and signed
zeroes is a platform-dependent crapshoot.

> in code i write which cares about such things, i have to start:
>
> inf = 1e300 ** 1e300
> nan = inf - inf

That would be much more portable (== would do what you intended by
accident on many more platforms) if you used multiplication instead of
exponentiation in the first line.

...

> And then god forbid i should actually want to test if a number is NaN,
> since, bizarrely, (x == nan) is true for every x; instead, i have to
> write:
> 
> def isnan(x):
>    return (x == 0.0) and (x == 1.0)

The result of that is a platform-dependent accident too.  Python 2.4
(but not eariler than that) works hard to deliver _exactly_ the same
accident as the platform C compiler delivers, and at least NaN
comparisons work "as intended" (by IEEE 754) in 2.4 under gcc and MSVC
7.1 (because those C implementations treat NaN comparisons as intended
by IEEE 754; note that MSVC 6.0 did not):

>>> inf = 1e300 * 1e300
>>> nan = inf - inf
>>> nan == 1.0
False
>>> nan < 1.0
False
>>> nan > 1.0
False
>>> nan == nan
False
>>> nan < nan
False
>>> nan > nan
False
>>> nan != nan
True

So at the Python level you can do "x != x" to see whether x is a NaN
in 2.4+ (assuming that works in the C with which Python was compiled;
it does under gcc and MSVC 7.1).

> The IEEE spec actually says that (x == nan) should be *false* for every x,
> including nan. I'm not sure if this is more or less stupid than what
> python does!

Python did nothing "on purpose" here before Python 2.4.

> And while i'm ranting, how come these expressions aren't the same:
>
> 1e300 * 1e300
> 1e300 ** 2

Because all Python behavior in the presence of infinities, NaNs and
signed zeroes is a platform-dependent accident.

> And finally, does Guido know something about arithmetic that i don't,

Probably yes, but that's not really what you meant to ask .

> or is this expression:
>
> -1.0 ** 0.5
>
> Evaluated wrongly?

Read the manual for the precedence rules.  -x**y groups as -(x**y). 
-1.0 is the correct answer.  If you intended (-x)**y, then you need to
insert parentheses to force that order.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-03 Thread Tim Peters
[Steven D'Aprano]
...
> But this works:
> 
> py> inf = float("inf")
> py> inf
> inf

Another platform-dependent accident.  That does not work, for example,
on Windows.  In fact, the Microsoft C float<->string routines don't
support any way "to spell infinity" that works in the string->float
direction.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-03 Thread Tim Peters
...

[Tom Anderson]
> So, is there a way of generating and testing for infinities and NaNs
> that's portable across platforms and versions of python?

Not that I know of, and certainly no simple way.

> If not, could we perhaps have some constants in the math module for them?

See PEP 754 for this.

...

>> Read the manual for the precedence rules.  -x**y groups as -(x**y). -1.0
>> is the correct answer.  If you intended (-x)**y, then you need to insert
>> parentheses to force that order.

> So i see. Any idea why that precedence order was chosen? It goes against
> conventional mathematical notation, as well as established practice in
> other languages.

Eh?  For example, Fortran and Macsyma also give exponentiation higher
precedence than unary minus.  From my POV, Python's choice here was
thoroughly conventional.

> Also, would it be a good idea for (-1.0) ** 0.5 to evaluate to 1.0j? It
> seems a shame to have complex numbers in the language and then miss this
> opportunity to use them!

It's generally true in Python that complex numbers are output only if
complex numbers are input or you explicitly use a function from the
cmath module.  For example,

>>> import math, cmath
>>> math.sqrt(-1)
Traceback (most recent call last):
  File "", line 1, in ?
ValueError: math domain error
>>> cmath.sqrt(-1)
1j

The presumption is that a complex result is more likely the result of
program error than intent for most applications.  The relative handful
of programmers who expect complex results can get them easily, though.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-05 Thread Tim Peters
[Tim Peters]
>> All Python behavior in the presence of infinities, NaNs, and signed
>> zeroes is a platform-dependent accident, mostly inherited from that
>> all C89 behavior in the presence of infinities, NaNs, and signed
>> zeroes is a platform-dependent crapshoot.
 
[Michael Hudson]
> As you may have noticed by now, I'd kind of like to stop you saying
> this :) -- at least on platforms where doubles are good old-fashioned
> 754 8-byte values.

Nope, I hadn't noticed!  I'll stop saying it when it stops being true,
though .  Note that since there's not even an alpha out for 2.5
yet, none of the good stuff you did in CVS counts for users yet.

> But first, I'm going to whinge a bit, and lay out some stuff that Tim
> at least already knows (and maybe get some stuff wrong, we'll see).
>
> Floating point standards lay out a number of "conditions": Overflow
> (number too large in magnitude to represent), Underflow (non-zero
> number to small in magnitude to represent), Subnormal (non-zero number
> to small in magnitude to represent in a normalized way), ...

The 754 standard has five of them:  underflow, overflow, invalid
operation, inexact, and "divide by 0" (which should be understood more
generally as a singularity; e.g., divide-by-0 is also appropriate for
log(0)).

> For each condition, it should (at some level) is possible to trap each
> condition, or continue in some standard-mandated way (e.g. return 0
> for Underflow).

754 requires that, yes.

> While ignoring the issue of allowing the user to control this, I do
> wish sometimes that Python would make up it's mind about what it does
> for each condition.

Guido and I agreed long ago that Python "should", by default, raise an
exception on overflow, invalid operation, and divide by 0, and "should
not", by default, raise an exception on underflow or inexact.  Such
defaults favor non-expert use.  Experts may or may not be happy with
them, so Python "should" also allow changing the set.

> There are a bunch of conditions which we shouldn't and don't trap by
> default -- Underflow for example.  For the conditions that probably should
> result in an exception, there are inconsistencies galore:

> >>> inf = 1e300 * 1e300 # <- Overflow, no exception
> >>> nan = inf/inf # <- InvalidOperation, no exception

Meaning you're running on a 754 platform whose C runtime arranged to
disable the overflow and invalid operation traps.  You're seeing
native HW fp behavior then.

> >>> pow(1e100, 100) # <- Overflow, exception
> Traceback (most recent call last):
>  File "", line 1, in ?
> OverflowError: (34, 'Numerical result out of range')
> >>> math.sqrt(-1) # <- InvalidOperation, exception
> Traceback (most recent call last):
>  File "", line 1, in ?
> ValueError: math domain error

Unlike the first two examples, these call libm functions.  Then it's a
x-platform crapshoot whether and when the libm functions set errno to
ERANGE or EDOM, and somewhat of a mystery whether it's better to
reproduce what the native libm considers to be "an error", or try to
give the same results across platforms.  Python makes a weak attempt
at the latter.

> At least we're fairly consistent on DivisionByZero...

When it's a division by 0, yes.  It's cheap and easy to test for that.
However, many expert users strongly favor getting back an infinity
then instead, so it's not good that Python doesn't support a choice
about x/0.

> If we're going to trap Overflow consistently, we really need a way of
> getting the special values reliably -- which is what pep 754 is about,
> and its implementation may actually work more reliably in 2.5 since my
> recent work...

I don't know what you have in mind.  For example, checking the result
of x*y to see whether it's an infinity is not a reliable way to detect
overflow, and it fails in more than one way (e.g., one of the inputs
may have been an infinity (in which case OverflowError is
inappropriate), and overflow doesn't always result in an infinity
either (depends on the rounding mode in effect)).

> On the issue of platforms that start up processes with traps enabled,
> I think the correct solution is to find the incantation to turn them
> off again and use that in Py_Initialize(), though that might upset
> embedders.

Hard to know.  Python currently has a hack to disable traps on
FreeBSD, in python.c's main().
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-07 Thread Tim Peters
[Tim Peters]
>>>> All Python behavior in the presence of infinities, NaNs, and signed
>>>> zeroes is a platform-dependent accident, mostly inherited from that
>>>> all C89 behavior in the presence of infinities, NaNs, and signed
>>>> zeroes is a platform-dependent crapshoot.

[Michael Hudson]
>>> As you may have noticed by now, I'd kind of like to stop you saying
>>> this :) -- at least on platforms where doubles are good old-fashioned
>>> 754 8-byte values.

[Tim]
>> Nope, I hadn't noticed!  I'll stop saying it when it stops being true,
>> though .  Note that since there's not even an alpha out for 2.5
>> yet, none of the good stuff you did in CVS counts for users yet.

[Michael] 
> Well, obviously.  OTOH, there's nothing I CAN do that will be useful
> for users until 2.5 actually comes out.

Sure.  I was explaining why I keep saying what you say you don't want
me to say:  until 2.5 actually comes out, what purpose would it serve
to stop warning people that 754 special-value behavior is a x-platform
crapshoot?  Much of it (albeit less so) will remain a crapshoot after
2.5 comes out too.

>>> But first, I'm going to whinge a bit, and lay out some stuff that Tim
>>> at least already knows (and maybe get some stuff wrong, we'll see).
>>>
>>> Floating point standards lay out a number of "conditions": Overflow
>>> (number too large in magnitude to represent), Underflow (non-zero
>>> number to small in magnitude to represent), Subnormal (non-zero number
>>> to small in magnitude to represent in a normalized way), ...

>> The 754 standard has five of them:  underflow, overflow, invalid
>> operation, inexact, and "divide by 0" (which should be understood more
>> generally as a singularity; e.g., divide-by-0 is also appropriate for
>> log(0)).

> OK, the decimal standard has more, which confused me for a bit
> (presumably it has more because it doesn't normalize after each
> operation).

The "conditions" in IBM's decimal standard map, many-to-one, on to a
smaller collection of "signals" in that standard.  It has 8 signals: 
the 5 I named above from 754, plus "clamped", "rounded", and
"subnormal".  Distinctions are excruciatingly subtle; e.g., "rounded"
and "inexact" would be the same thing in 754, but, as you suggest, in
the decimal standard a result can be exact yet also rounded (if it
"rounds away" one or more trailing zeroes), due to the unnormalized
model.
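
The decimal module (new in 2.4) makes the rounded-but-exact case easy
to poke at.  A sketch, assuming I'm driving its flag machinery
correctly:

from decimal import Decimal, getcontext, Inexact, Rounded

ctx = getcontext()
ctx.prec = 4

ctx.clear_flags()
r = Decimal("1.23000") + Decimal("0")       # only trailing zeroes rounded away
print r, ctx.flags[Rounded], ctx.flags[Inexact]   # rounded, but still exact

ctx.clear_flags()
r = Decimal("2.5000") + Decimal("0.00001")  # a nonzero digit is lost
print r, ctx.flags[Rounded], ctx.flags[Inexact]   # rounded and inexact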

>>> For each condition, it should (at some level) is possible to trap each
>>> condition, or continue in some standard-mandated way (e.g. return 0
>>> for Underflow).

>> 754 requires that, yes.

>>> While ignoring the issue of allowing the user to control this, I do
>>> wish sometimes that Python would make up it's mind about what it does
>>> for each condition.

>> Guido and I agreed long ago that Python "should", by default, raise an
>> exception on overflow, invalid operation, and divide by 0, and "should
>> not", by default, raise an exception on underflow or inexact.

And, I'll add, "should not" on rounded, clamped and subnormal too.

> OK.

OK .

>> Such defaults favor non-expert use.  Experts may or may not be happy
>> with them, so Python "should" also allow changing the set.
 
> Later :)

That's a problem, though.  754 subsets are barely an improvement over
what Python does today:  the designers knew darned well that each
default is going to make some non-trivial group of users horridly
unhappy.  That's why such extensive machinery for detecting signals,
and for trapping or not trapping on signals, is mandated.  That's a
very important part of these standards.

> (In the mean time can we just kill fpectl, please?)

Has it been marked as deprecated yet (entered into the PEP for
deprecated modules, raises deprecation warnings, etc)?  I don't know. 
IMO it should become deprecated, but I don't have time to push that.

>>> There are a bunch of conditions which we shouldn't and don't trap by
>>> default -- Underflow for example.  For the conditions that probably should
>>> result in an exception, there are inconsistencies galore:
>>>
>>> >>> inf = 1e300 * 1e300 # <- Overflow, no exception
>>> >>> nan = inf/inf # <- InvalidOperation, no exception

>> Meaning you're running on a 754 platform whose C runtime arranged to
>> disable the overflow and invalid operation traps.
 
> Isn't that the standard-mandated start up environment?

Re: PPC floating equality vs. byte compilation

2005-07-09 Thread Tim Peters
[Donn Cave]
> I ran into a phenomenon that seemed odd to me, while testing a
> build of Python 2.4.1 on BeOS 5.04, on PowerPC 603e.
>
> test_builtin.py, for example, fails a couple of tests with errors
> claiming that apparently identical floating point values aren't equal.
> But it only does that when imported, and only when the .pyc file
> already exists.  Not if I execute it directly (python test_builtin.py),
> or if I delete the .pyc file before importing it and running test_main().

It would be most helpful to open a bug report, with the output from
failing tests.  Can't guess much from the above.  In general, this can
happen if the platform C string<->float routines are so poor that

eval(repr(x)) != x

for some float x, because .pyc files store repr(x) for floats in
2.4.1.  The 754 standard requires that eval(repr(x)) == x exactly for
all finite float x, and most platform C string<->float routines these
days meet that requirement.
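
A crude spot check for a suspect build (not the actual failing tests,
just the round-trip requirement):

# on a platform with decent string<->float routines this prints nothing
for x in [0.1, 1.0/3.0, 1e-300, 1e300, 9.995]:
    if eval(repr(x)) != x:
        print "repr round-trip failure for", repr(x)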

> For now, I'm going to just write this off as a flaky build.  I would
> be surprised if 5 people in the world care, and I'm certainly not one
> of them.  I just thought someone might find it interesting.

There are more than 5 numeric programmers even in the Python world
, but I'm not sure there are more than 5 such using BeOS 5.04 on
PowerPC 603e.

> The stalwart few who still use BeOS are mostly using Intel x86 hardware,
> as far as I know, but the first releases were for PowerPC, at first
> on their own hardware and then for PPC Macs until Apple got nervous
> and shut them out of the hardware internals.  They use a Metrowerks
> PPC compiler that of course hasn't seen much development in the last
> 6 years, probably a lot longer.

The ultimate cause is most likely in the platform C library's
string<->float routines (sprintf, strtod, that kind of thing).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-11 Thread Tim Peters
[Tim Peters]
>>>>>> All Python behavior in the presence of infinities, NaNs, and signed
>>>>>> zeroes is a platform-dependent accident, mostly inherited from that
>>>>>> all C89 behavior in the presence of infinities, NaNs, and signed
>>>>>> zeroes is a platform-dependent crapshoot.

[Michael Hudson]
>>>>> As you may have noticed by now, I'd kind of like to stop you saying
>>>>> this :) -- at least on platforms where doubles are good old-fashioned
>>>>> 754 8-byte values.

[Tim]
>>>> Nope, I hadn't noticed!  I'll stop saying it when it stops being true,
>>>> though .  Note that since there's not even an alpha out for 2.5
>>>> yet, none of the good stuff you did in CVS counts for users yet.

[Michael]
>>> Well, obviously.  OTOH, there's nothing I CAN do that will be useful
>>> for users until 2.5 actually comes out.

[Tim]
>> Sure.  I was explaining why I keep saying what you say you don't want
>> me to say:  until 2.5 actually comes out, what purpose would it serve
>> to stop warning people that 754 special-value behavior is a x-platform
>> crapshoot?  Much of it (albeit less so) will remain a crapshoot after
>> 2.5 comes out too.

[Michael] 
> Well, OK, I phrased my first post badly.  Let me try again:
>
> I want to make this situation better, as you may have noticed.

Yup, I did notice that!  I've even pointed it out.  In a positive light .

>>>>> But first, I'm going to whinge a bit, and lay out some stuff that Tim
>>>>> at least already knows (and maybe get some stuff wrong, we'll see).
>>>>>
>>>>> Floating point standards lay out a number of "conditions": Overflow
>>>>> (number too large in magnitude to represent), Underflow (non-zero
>>>>> number to small in magnitude to represent), Subnormal (non-zero
>>>>> number to small in magnitude to represent in a normalized way), ...

>>>> The 754 standard has five of them:  underflow, overflow, invalid
>>>> operation, inexact, and "divide by 0" (which should be understood more
>>>> generally as a singularity; e.g., divide-by-0 is also appropriate for
>>>> log(0)).

>>> OK, the decimal standard has more, which confused me for a bit
>>> (presumably it has more because it doesn't normalize after each
>>> operation).

>> The "conditions" in IBM's decimal standard map, many-to-one, on to a
>> smaller collection of "signals" in that standard.  It has 8 signals:
>> the 5 I named above from 754, plus "clamped", "rounded", and
>> "subnormal".  Distinctions are excruciatingly subtle; e.g., "rounded"
>> and "inexact" would be the same thing in 754, but, as you suggest, in
>> the decimal standard a result can be exact yet also rounded (if it
>> "rounds away" one or more trailing zeroes), due to the unnormalized
>> model.
 
> Right, yes, that last one confused me for a while.
> 
> Why doesn't 754 have subnormal?

Who cares <0.1 wink>.

>  Actually, I think I'm confused about when Underflow is signalled -- is it
> when a denormalized result is about to be returned or when a genuine
> zero is about to be returned?

Underflow in 754 is involved -- indeed, the definition is different
depending on whether the underflow trap is or is not enabled(!).  On
top of that, it's not entirely defined -- some parts are left to the
implementer's discretion.  See

 http://www2.hursley.ibm.com/decimal/854M8208.pdf

for what, from a brief glance, looks very much like the underflow text
in the final 754 standard.

Note that subnormals and underflow are much less a real concern with
754 doubles than with 754 floats, because the latter have such a small
dynamic range.

>>>>> For each condition, it should (at some level) is possible to trap each
>>>>> condition, or continue in some standard-mandated way (e.g. return 0
>>>>> for Underflow).

>>>> 754 requires that, yes.

>>>>> While ignoring the issue of allowing the user to control this, I do
>>>>> wish sometimes that Python would make up it's mind about what it
>>>>> does for each condition.

>>>> Guido and I agreed long ago that Python "should", by default, raise an
>>>> exception on overflow, invalid operation, and divide by 0, and "should
>>>> not", by default, raise an exception on underflow or inexact.

>> And, I'll add, "should not" on rounded, clamped and subnormal too.

Re: Tricky Dictionary Question from newbie

2005-07-12 Thread Tim Peters
[Peter Hansen]
...
> I suppose I shouldn't blame setdefault() itself for being poorly named,

No, you should blame Guido for that .

> but it's confusing to me each time I see it in the above, because the
> name doesn't emphasize that the value is being returned, and yet that
> fact is arguably more important than the fact that a default is set!
>
> I can't think of a better name, though, although I might find "foo" less
> confusing in the above context. :-)

I wanted to call it getorset() -- so much so that even now I sometimes
still type that instead!  The "get" part reminds me that it's fetching
a value, same as dict.get(key, default) -- "or set"'ing it too if
there's not already a value to "get".  If you have a fancy enough
editor, you can teach it to replace setdefault by getorset whenever
you type the former ;-)
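
For whatever it's worth, the "get, or set" reading is easiest to see in
the common grouping idiom:

>>> d = {}
>>> d.setdefault('spam', []).append(1)   # sets d['spam'] = [] first, then returns it
>>> d.setdefault('spam', []).append(2)   # 'spam' exists now, so the [] default is ignored
>>> d
{'spam': [1, 2]}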
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-12 Thread Tim Peters
[Michael Hudson]
> I doubt anyone else is reading this by now, so I've trimmed quotes
> fairly ruthlessly :)

Damn -- there goes my best hope at learning how large a message gmail
can handle before blowing up .  OK, I'll cut even more.

[Michael]
>>> Can't we use the stuff defined in Appendix F and header  of
>>> C99 to help here?  I know this stuff is somewhat optional, but it's
>>> available AFAICT on the platforms I actually use (doesn't mean it
>>> works, of course).

[Tim]
>> It's entirely optional part of C99.

> Hmm, is  optional?  I'm not finding those words.  I know
> Appendix F is.

fenv.h is required, but the standard is carefully worded so that
fenv.h may not be of any actual use.  For example, a conforming
implementation can define FE_ALL_EXCEPT as 0 (meaning it doesn't
define _any_ of the (optional!) signal-name macros:  FE_DIVBYZERO,
etc).  That in turn makes feclearexcept() (& so on) pretty much
useless -- you couldn't specify any flags.

If the implementation chooses to implement the optional Appendix F,
then there are stronger requirements on what fenv.h must define.

>> Python doesn't require C99.

> Sure.  But it would be possible to, say, detect C99 floating point
> facilities at ./configure time and use them if available.

Yes.

>> The most important example of a compiler that doesn't support any of
>> that stuff is Microsoft's, although they have their own MS-specific
>> ways to spell most of it.

> OK, *that's* a serious issue.
> 
> If you had to guess, do you think it likely that MS would ship fenv.h
> in the next interation of VC++?

Sadly not.  If they wanted to do that, they had plenty of time to do
so before VC 7.1 was released (C99 ain't exactly new anymore).  As it
says on

http://en.wikipedia.org/wiki/C_programming_language

MS and Borland (among others) appear to have no interest in C99.

In part I expect this is because C doesn't pay their bills nearly so
much as C++ does, and C99 isn't a standard from the C++ world.

>>> In what way does C99's fenv.h fail?  Is it just insufficiently
>>> available, or is there some conceptual lack?

>> Just that it's not universally supported.  Look at fpectlmodule.c for
>> a sample of the wildly different ways it _is_ spelled across some
>> platforms.

> C'mon, fpectlmodule.c is _old_.  Maybe I'm stupidly optimistic, but
> perhaps in the last near-decade things have got a little better here.

Ah, but as I've said before, virtually all C compilers on 754 boxes
support _some_ way to get at this stuff.  This includes gcc before C99
and fenv.h -- if the platforms represented in fpectlmodule.c were
happy to use gcc, they all could have used the older gcc spellings
(which are in fpectlmodule.c, BTW, under the __GLIBC__ #ifdef).  But
they didn't, so they're using "minority" compilers.  I used to write
compilers for a living, but I don't think this is an inside secret
anymore :  there are a lot fewer C compiler writers than there
used to be, and a lot fewer companies spending a lot less money on
developing C compilers than there used to be.

As with other parts of C99, I'd be in favor of following its lead, and
defining Py_ versions of the relevant macros and functions.  People on
non-C99 platforms who care enough can ugly-up pyport.h with whatever
their platform needs to implement the same functionality, and C99
platforms could make them simple lexical substitutions.  For example,
that's the path we took for Python's C99-workalike Py_uintptr_t and
Py_intptr_t types (although those are much easier to "fake" across
non-C99 platforms).

>> A maze of #ifdefs could work too, provided we defined a
>> PyWhatever_XYZ API to hide platform spelling details.

> Hopefully it wouldn't be that bad a maze; frankly GCC & MSVC++ covers
> more than all the cases I care about.

I'd be happy to settle for just those two at the start.  As with
threading too, Python has suffered from trying to support dozens of
unreasonable platforms, confined to the tiny subset of abilities
common to all of them.  If, e.g., HP-UX wants a good Python thread or
fp story, let HP contribute some work for a change.  I think we have
enough volunteers to work out good gcc and MSVC stories -- although I
expect libm to be an everlasting headache (+ - * are done in HW and
most boxes have fully-conforming 754 semantics for them now; but there
are no pressures like that working toward uniform libm behaviors;
division is still sometimes done in software, but the divide-by-0
check is already done by Python and is dead easy to do).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-13 Thread Tim Peters
[Steven D'Aprano]
> (All previous quoting ruthlessly snipped.)

And ruthlessly appreciated ;-)

> A question for Tim Peters, as I guess he'll have the most experience in
> this sort of thing.
>
> With all the cross-platform hassles due to the various C compilers not
> implementing the IEEE standard completely or correctly, I wonder how much
> work would be involved for some kind soul to implement their own maths
> library to do the lot, allowing Python to bypass the C libraries
> altogether.
>
> Are you falling over laughing Tim, or thinking what a great idea?

Neither, really.  Doing basic + - * / in SW is way too slow to sell to
programmers who take floats seriously in their work.  For libm, K-C Ng
at Sun was writing fdlibm at the same time Peter Tang & I were writing
a "spirit of 754" libm for Kendall Square Research (early 90's).  KSR
is long gone, and the code was proprietary anyway; fdlibm lives on,
with an MIT-like ("do whatever you want") license, although it doesn't
appear to have enjoyed maintenance work for years now:

http://www.netlib.org/fdlibm/

fdlibm is excellent (albeit largely inscrutable to non-specialists) work.

I believe that, at some point, glibc replaced its math functions with
fdlibm's, and went on to improve them.  Taking advantage of those
improvements may (or may not) raise licensing issues Python can't live
with.

There are at least two other potential issues with using it:

1. Speed again.  The libm I wrote for KSR was as accurate and relentlessly
   754-conforming as fdlibm, but approximately 10x faster.  There's
   an enormous amount of optimization you can do if you can exploit every
   quirk of the HW you're working on -- and the code I wrote was entirely
   unportable, non-standard C, which couldn't possibly run on any HW other
   than KSR's custom FPU.  C compiler vendors at least used to spend a lot
   of money similarly crafting libraries that exploited quirks of the HW they
   were targeting, and lots of platforms still have relatively fast libms as a
   result.  fdlibm aims to run on "almost any" 32-bit 754 box, and pays for
   that in comparative runtime sloth.  Since fdlibm was written at Sun over
   a decade ago, you can guess that it wasn't primarily aiming at the Pentium
   architecture.

2. Compatibility with the platform libm.  Some users will be unhappy unless
   the stuff they get from Python is quirk-for-quirk and bug-for-bug identical
   to the stuff they get from other languages on their platform.  There's really
   no way to do that unless Python uses the same libm.  For example,
   many people have no real idea what they're doing with libm functions, and
   value reproducibility over anything else -- "different outcomes means
   one of them must be in error" is the deepest analysis they can, or maybe
   just have time, to make.  Alas, for many uses of libm, that's a defensible
   (albeit appalling <0.6 wink>) attitude (e.g., someone slings sin() and cos()
   to plot a circle in a GUI -- when snapping pixels to the closest grid point,
   the tiniest possible rounding difference can make a pixel "jump" to a
   neighboring pixel, and then "it's a bug" if Python doesn't reproduce the
   same pixel plotting accidents as, e.g., the platform C or JavaScript;
   see the sketch below).
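
A minimal sketch of that "pixel jump" (the specific values here are made
up; any one-ulp difference that straddles a snapping boundary will do):

    a = 0.9999999999999999   # "sin(x)" as one libm might return it
    b = 1.0                  # the same "sin(x)" from another libm, 1 ulp away
    print int(a * 100), int(b * 100)   # -> 99 100: neighboring pixels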

> What sort of work is needed? Is it, say, as big a job as maintaining
> Python? Bigger? One weekend spent working solidly?

Half a full-time year if done from scratch by a numeric programming
expert.  Maybe a few weeks if building on fdlibm, which probably needs
patches to deal with modern compilers (e.g.,

http://www.netlib.org/fdlibm/readme

has no date on it, but lists as "NOT FIXED YET":

3. Compiler failure on non-standard code
Statements like
*(1+(int*)&t1) = 0;
are not standard C and cause some optimizing compilers (e.g.
GCC) to generate bad code under optimization.  These cases
are to be addressed in the next release.
).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why does python break IEEE 754 for 1.0/0.0 and 0.0/0.0?

2005-07-14 Thread Tim Peters
[Grant Edwards]
> I've read over and over that Python leaves floating point
> issues up to the underlying platform.
>
> This seems to be largely true, but not always.  My underlying
> platform (IA32 Linux) correctly handles 1.0/0.0 and 0.0/0.0
> according to the IEEE 754 standard, but Python goes out of its
> way to do the wrong thing.

Python does go out of its way to raise ZeroDivisionError when dividing by 0.

> 1/0 is defined by the standard as +Inf and 0/0 is NaN.
>
> That's what my platform does for programs written in C.

IOW, that's what your platform C does (the behavior of these cases is
left undefined by the C89 standard, so it's not the case that you can
write a _portable_ C89 program relying on these outcomes).  What does
your platform C return for the integer expression 42/0?  Is any other
outcome "wrong"?

> Python apparently checks for division by zero and throws and exception
> rather than returning the correct value calculated by the
> underlying platform.
>
> Is there any way to get Python to return the correct results
> for those operations rather than raising an exception?

No, except when using the decimal module.  The latter provides all the
facilities in IBM's proposed standard for decimal floating-point,
which intends to be a superset of IEEE 854:

http://www2.hursley.ibm.com/decimal/

It's relatively easy to do this in the decimal module because it
emulates, in software, all the gimmicks that most modern FPUs provide
in hardware.
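
For example, a quick sketch with the decimal module (Python 2.4
spelling), turning off the relevant traps so that these cases return
special values instead of raising:

    >>> from decimal import Decimal, getcontext, DivisionByZero, InvalidOperation
    >>> ctx = getcontext()
    >>> ctx.traps[DivisionByZero] = 0
    >>> ctx.traps[InvalidOperation] = 0
    >>> Decimal(1) / Decimal(0)
    Decimal("Infinity")
    >>> Decimal(0) / Decimal(0)
    Decimal("NaN")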

Note that support for 754 was rare on Python platforms at the time
Python was designed, and nobody mentioned 754 support as even a vague
desire in those days.  In the absence of user interest, and in the
absence of HW support for NaNs or infinities on most Python platforms,
the decision to raise an exception was quite sensible at the time. 
Python could not have implemented 754 semantics without
emulating fp arithmetic in SW on most platforms (as the decimal module
does today), and for much the same reasons you can't give a non-silly
answer to my earlier "what does your platform C return for the integer
expression 42/0?" question today.

> There's no way to "resume" from the exception and return a
> value from an exception handler, right?

Correct.

Note that there's a huge, current, informed discussion of these issues
already in the math.nroot thread.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why does python break IEEE 754 for 1.0/0.0 and 0.0/0.0?

2005-07-14 Thread Tim Peters
[Tim Peters]
...
>> What does your platform C return for the integer expression
>> 42/0?  Is any other outcome "wrong"?

[Grant Edwards] 
> I guess I though it was obvious from my reference to IEEE 754
> that I was referring to floating point operations.

Yes, that was obvious.  Since I thought my point would be equally
obvious, I won't spell it out <0.7 wink>.

...

>> Note that support for 754 was rare on Python platforms at the
>> time Python was designed, and nobody mentioned 754 support as
>> even a vague desire in those days.

> I often foget how old Python is.  Still, I've been using IEEE
> floating point in C programs (and depending on the proper
> production and handling of infinities and NaNs) for more than
> 20 years now.  I had thought that Python might have caught up.

It has not.  Please see the other thread I mentioned.

>> In the absence of user interest, and in the absence of HW
>> support for NaNs or infinities on most Python platforms,

> Really?

Yes, but looks like you didn't finish reading the sentence.  Here's
the rest, with emphasis added:

>> the decision to raise an exception was quite sensible AT THE TIME.

You may have forgotten how much richer the "plausible HW" landscape
was at the time too.  I was deeply involved in implementing Kendall
Square Research's HW and SW 754 story at the time, and it was all
quite novel, with little prior art to draw on to help resolve the
myriad language issues 754 didn't address (e.g., what should Fortran's
3-branch Arithmetic IF statement do if fed a NaN?  there were hundreds
of headaches like that, and no cooperation among compiler vendors
since the language standards ignored 754).  The C standards didn't
mention 754 until C99, and then left all support optional (up to the
compiler implementer whether to do it).  That didn't help much for a
bigger reason:  major C vendors (like Microsoft and Borland) are still
ignoring C99.  "Subset" HW implementations of 754 were also common,
like some that didn't support denorms at all, others that didn't
implement the non-default rounding modes, some that ignored signed
zeroes, and several that implemented 754 endcases by generating kernel
traps to deal with infinities and NaNs, making them so much slower
than normal cases that users avoided them like death.

If I had to bet at the time, I would have put my money on 754 dying
out due to near-universal lack of language support, and incompatible
HW implementations.  Most programming languages still have no sane 754
story, but the remarkable dominance of the Pentium architecture
changed everything on the HW side.

>  I would have guessed that most Python platforms are
> '586 or better IA32 machines running either Windows or Linux.

Today, yes, although there are still Python users on many other OSes
and architectures.  Most of the latter support 754 too now.

> They all have HW support for NaNs and Infinities.

Yes, Intel-based boxes certainly do (and have for a long time), and so
do most others now.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why does python break IEEE 754 for 1.0/0.0 and 0.0/0.0?

2005-07-14 Thread Tim Peters
[Grant Edwards]
>> 1/0 is defined by the standard as +Inf and 0/0 is NaN.
 
[Martin v. Löwis]
> I wonder why Tim hasn't protested here:

Partly because this thread (unlike the other current thread on the
topic) isn't moving toward making progress, and I have little time for
this.

But mostly because Python's fp design was in no way informed by 754,
so logic-chopping on the 754 standard wrt what Python actually does is
plainly perverse <0.5 wink>.

> I thought this was *not* the case. I thought IEEE 754 defined +Inf and NaN
> as only a possible outcome of these operations with other possible
> outcomes being exceptions... In that case, Python would comply to IEEE
> 754 in this respect (although in a different way than the C implementation on
> the same system).

Ya, and Unicode defines 16-bit little-endian characters.
Seriously, the 754 standard is quite involved, and there's just no
visible point I can see to trotting out its elaborate details here. 
If Python claimed to support 754, then details would be important. 
Short course wrt this specific point:  there's no reasonable way in
which Python's float arithmetic can be said to comply to IEEE 754 in
this case, neither in letter nor spirit.  The decimal module does,
though (mutatis mutandis wrt base).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-14 Thread Tim Peters
[Michael Hudson]
>>>>> In what way does C99's fenv.h fail?  Is it just insufficiently
>>>>> available, or is there some conceptual lack?

[Tim Peters]
>>>> Just that it's not universally supported.  Look at fpectlmodule.c for
>>>> a sample of the wildly different ways it _is_ spelled across some
>>>> platforms.

[Michael]
>>> C'mon, fpectlmodule.c is _old_.  Maybe I'm stupidly optimistic, but
>>> perhaps in the last near-decade things have got a little better here.

[Tim]
>> Ah, but as I've said before, virtually all C compilers on 754 boxes
>> support _some_ way to get at this stuff.  This includes gcc before C99
>> and fenv.h -- if the platforms represented in fpectlmodule.c were
>> happy to use gcc, they all could have used the older gcc spellings
>> (which are in fpectlmodule.c, BTW, under the __GLIBC__ #ifdef).

[Michael] 
> Um, well, no, not really.  The stuff under __GLIBC___ unsurprisingly
> applies to platforms using the GNU project's implementation of the C
> library, and GCC is used on many more platforms than just that
> (e.g. OS X, FreeBSD).

Good point taken:  pairings of C compilers and C runtime libraries are
somewhat fluid.

So if all the platforms represented in fpectlmodule.c were happy to
use glibc, they all could have used the older glibc spellings. 
Apparently the people who cared enough on those platforms to
contribute code to fpectlmodule.c did not want to use glibc, though. 
In the end, I still don't know why there would be a reason to hope
that an endless variety of other libms would standardize on the C99
spellings.  For backward compatibility, they have to continue
supporting their old spellings too, and then what's in it for them to
supply aliases?  Say I'm SGI, struggling as often as not just to stay
in business.  I'm unlikely to spend what little cash I have to make it
easier for customers to jump ship.

> ...
>  Even given that, the glibc section looks mighty Intel specific to me (I don't
> see why 0x1372 should have any x-architecture meaning).

Why not?  I don't know whether glibc ever did this, but Microsoft's
spelling of this stuff used to, on Alphas (when MS compilers still
supported Alphas), pick apart the bits and rearrange them into the
bits needed for the Alpha's FPU control registers.  Saying that bit
0x10 (whatever) is "the overflow flag" (whatever) is as much a
x-platform API as saying that the expansion of the macro FE_OVERFLOW
is "the overflow flag".  Fancy pants symbolic names are favored by
"computer science" types these days, but real numeric programmers have
always been delighted to wallow in raw bits.

...

> One thing GCC doesn't yet support, it turns out, is the "#pragma STDC
> FENV_ACCESS ON" gumpf, which means the optimiser is all too willing to
> reorder
> 
>feclearexcept(FE_ALL_EXCEPT);
>r = x * y;
>fe = fetestexcept(FE_ALL_EXCEPT);
>
> into
> 
>feclearexcept(FE_ALL_EXCEPT);
>fe = fetestexcept(FE_ALL_EXCEPT);
>r = x * y;
> 
> Argh!  Declaring r 'volatile' made it work.
 
Oh, sigh.  One of the lovely ironies in all this is that CPython
_could_ make for an excellent 754 environment, precisely because it
does such WYSIWYG code generation.  Optimizing-compiler writers hate
hidden side effects, and every fp operation in 754 is swimming in them
-- but Python couldn't care much less.

Anyway, you're rediscovering the primary reason you have to pass a
double lvalue to the PyFPE_END_PROTECT protect macro. 
PyFPE_END_PROTECT(v) expands to an expression including the
subexpression

PyFPE_dummy(&(v))

where PyFPE_dummy() is an extern that ignores its double* argument. 
The point is that this dance prevents C optimizers from moving the
code that computes v below the code generated for
PyFPE_END_PROTECT(v).  Since v is usually used soon after in the
routine, it also discourages the optimizer from moving code up above
the PyFPE_END_PROTECT(v) (unless the C compiler does cross-file analysis, it
has to assume that PyFPE_dummy(&(v)) may change the value of v). 
These tricks may be useful here too -- fighting C compilers to the
death is part of this game, alas.

PyFPE_END_PROTECT() incorporates an even stranger trick, and I wonder
how gcc deals with it.  The Pentium architecture made an agonizing
(for users who care) choice:  if you have a particular FP trap enabled
(let's say overflow), and you do an fp operation that overflows, the
trap doesn't actually fire until the _next_ fp operation (of any kind)
occurs.  You can honest-to-God have, e.g., an overflowing fp add on an
Intel box, and not learn about it until a billion cycles after it
happened (if you don't do more FP operations over the next billion
cycles).


Re: time.time() under load between two machines

2005-07-22 Thread Tim Peters
[EMAIL PROTECTED]
> I am seeing negative latencies of up to 1 second.  I am  using ntp to
> synchronize both machines at an interval of 2 seconds, so the clocks
> should be very much in sync (and are from what I have observed).  I
> agree that it is probably OS, perhaps I should hop over to a Microsoft
> newsgroup and pose the question, although I'm sure they will find a way
> to blame it on Python.

That won't be easy.  This is how Python computes time.time() on
Windows (it's C code, of course):

struct timeb t;
ftime(&t);
return (double)t.time + (double)t.millitm * (double)0.001;

`ftime()` there is from Microsoft's C library:



IOW, Python basically returns exactly what MS's ftime() returns, after
converting it to a double-precision float.
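
If you want to see what that means in practice, here's a rough probe (a
sketch only -- the count varies by box and OS version):

    import time
    # Count the distinct time.time() readings seen over ~0.1 seconds.  On
    # Windows the count reflects ftime()'s coarse update granularity
    # (often on the order of 10-16 ms).
    seen = set()
    t0 = time.time()
    while time.time() - t0 < 0.1:
        seen.add(time.time())
    print len(seen), "distinct clock readings in ~0.1 seconds"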
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Ten Essential Development Practices

2005-07-29 Thread Tim Peters
[Steve Holden]
>> If I canpoint out the obvious, the output from "import this" *is*
>> headed "The Zen of Python", so clearly it isn;t intended to be
>> universal in its applicability.

[Michael Hudson]
> It's also mistitled there, given that it was originally posted as '19
> Pythonic Theses' and nailed to, erm, something.

'Twas actually posted as "20 Pythonic Theses", although most times I
count them I find 19.  Nevertheless, that there are in fact 20 was
channeled directly from Guido's perfectly Pythonic mind, so 20 there
must be.  I suspect he withheld one -- although, as some argue, it's
possible he thinks in base 9.5, that just doesn't seem Pythonic to me.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Ten Essential Development Practices

2005-07-29 Thread Tim Peters
[Dan Sommers]
> Ok, not universal.  But as usual, Zen is not easily nailed to a tree.
> 
> Was Tim writing about developing Python itself, or about developing
> other programs with Python?

Tim was channeling Guido, and that's as far as our certain knowledge
can go.  It _seems_ reasonable to believe that since Guido's mind is,
by definition, perfectly Pythonic, any truth channeled from it
necessarily applies to all things Pythonic.

nevertheless-we-interpret-the-gods-at-our-peril-ly y'rs  - tim
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Occasional OSError: [Errno 13] Permission denied on Windows

2006-01-05 Thread Tim Peters
[Alec Wysoker]
> Using Python 2.3.5 on Windows XP, I occasionally get OSError: [Errno
> 13] Permission denied when calling os.remove().  This can occur with a
> file that is not used by any other process on the machine,

How do you know that?

> and is created by the python.exe invocation that is trying to delete it.  It
> can happen with various pieces of my system - almost anywhere I try to
> delete a file.
>
> I have assumed that the problem is that I was holding on to a handle to
> the file that I was trying to remove.  I have scoured my code and close
> any handle to the file that I can find.  The intermittent nature of
> this problem leads me to believe that I'm not explicitly holding onto a
> file object somewhere.
>
> My next theory was that there was some object holding onto a file
> handle for which there wasn't an extant reference, but which hadn't
> been garbage-collected.  So, I tried removing files like this:
>
> try:
> os.remove(strPath)
> except OSError:
> # Wild guess that garbage collection might clear errno 13
> gc.collect()
> os.remove(strPath)
>
> This does indeed reduce the frequency of the problem, but it doesn't
> make it go away completely.

Replace gc.collect() there with a short sleep (say, time.sleep(0.2)),
and see whether that does just as well at reducing the frequency of
the problem.  My bet is that it will.

Best guess is that some process you haven't thought about yet _is_
opening the file.  For example, what you're seeing is common if
Copernic Desktop Search is installed and its "Index new and modified
files on the fly" option is enabled.  Other searching/indexing apps,
"file deletion recovery" services, and even some virus scanners can
have similar effects.  That's why I asked at the start how you _know_
no other process is opening the file.  The symptoms you describe are
consistent with some background-level utility app/service briefly
opening files for its own purposes.

In that case, anything that burns some time and tries again will work
better.  Replacing gc.collect() with time.sleep() is an easy way to
test that hypothesis; because gc.collect() does an all-generations
collection, it can consume measurable time.
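
If some background scanner does turn out to be the culprit, a small
retry loop is usually all you need.  A sketch (the names and numbers
here are arbitrary):

    import os, time

    def remove_with_retry(path, attempts=5, delay=0.2):
        # Retry removal a few times, sleeping between tries, to ride out
        # another process briefly holding the file open.
        for i in range(attempts):
            try:
                os.remove(path)
                return
            except OSError:
                if i == attempts - 1:
                    raise
                time.sleep(delay)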
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Does Python allow access to some of the implementation details?

2006-01-06 Thread Tim Peters
[Claudio Grondi]
> Let's consider a test source code given at the very end of this posting.
>
> The question is if Python allows somehow access to the bytes of the
> representation of a long integer or integer in computers memory?

CPython does not expose its internal representation of longs at the
Python level.

> Or does Python hide such implementation details that deep, that there is
> no way to get down to them?

As above.

> The test code below shows, that extracting bits from an integer value n
> is faster when using n&0x01 than when using n%2 and I suppose it is
> because %2 tries to handle the entire integer,

It not only tries, it succeeds ;-)

>where &0x01 processes only the last two bytes of it

If x and y are positive longs, the time required to compute x&y in all
recent CPythons is essentially proportional to the number of bits in
min(x, y).
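
If you want to see that on your own box, something like this shows the
gap (no timings quoted here -- they vary, but the first statement should
be enormously cheaper):

    >>> import timeit
    >>> setup = "n = 10 ** 10000"   # a long with tens of thousands of bits
    >>> timeit.Timer("n & 1", setup).timeit(1000)   # cost ~ bits in min(n, 1)
    >>> timeit.Timer("n % 2", setup).timeit(1000)   # cost ~ bits in all of n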

> ...
> If it were possible to 'tell' the %2 operation to operate only on one short of
> the integer number representation there will be probably no difference in
> speed. Is there a way to do this efficiently in Python like it is possible in
> C when using pointers and recasting?

No.

> As I am on Python 2.4.2 and Microsoft Windows, I am interested in
> details related to this Python version (to limit the scope of the
> question).

Doesn't really matter:  same answers for all recent versions of
CPython on all platforms.  If you go back far enough, in older
versions of CPython the time to compute x&y was proportional to the
number of bits in max(x, y) (instead of min(x, y)).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Newline at EOF Removal

2006-01-09 Thread Tim Peters
[Bengt Richter]
> ...
> [1] BTW, I didn't see the 't' mode in 
> http://docs.python.org/lib/built-in-funcs.html
> description of open/file, but I have a nagging doubt about saying it's not 
> valid.
> Where did you see it?

't' is a Windows-specific extension to standard C's file modes. 
Python passes mode strings as-is on to the platform C library, so if
your platform C likes 't', you're free to use it.

You might think there's no point to passing 't', since text mode is
the default.  If so, you'd almost be right ;-).  The rub is that
Windows also supports another non-standard gimmick, to make binary
mode the global default instead (although very few know about this,
and I've never seen it used in real life).  If you've done that, then
't' is necessary to get text mode.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: testing units in a specific order?

2006-01-09 Thread Tim Peters
[Antoon Pardon]
> I  have  used  unit  tests now for a number of project. One thing
> that I dislike is it that the order in which the tests  are  done
> bears no relationship to the order they appear in the source.
>
> This  makes  using  unit tests somewhat cumbersome. Is there some
> way to force the tests being done in a particular order?

They're run in alphabetical order, sorting on the test methods' names.
 For that reason some people name test methods like 'test_001',
'test_002', ..., although unit tests really "shouldn't" care which
order they get run in.  Sometimes this is abused in a different way,
by naming a setup kind of method starting with AAA and its
corresponding teardown kind of method with zzz.

You could presumably change the sort order by subclassing TestLoader
and overriding its class-level .sortTestMethodsUsing attribute (which
is `cmp` in TestLoader).  Sounds painful and useless to me, though ;-)
 The source-code order isn't available in any case (unittest doesn't
analyze source code).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Do you have real-world use cases for map's None fill-in feature?

2006-01-10 Thread Tim Peters
[Raymond Hettinger]
> ...
> I scanned the docs for Haskell, SML, and Perl and found that the norm
> for map() and zip() is to truncate to the shortest input or raise an
> exception for unequal input lengths.
> ...
> Also, I'm curious as to whether someone has seen a zip fill-in feature
> employed to good effect in some other programming language, perhaps
> LISP or somesuch?

FYI, Common Lisp's `pairlis` function requires that its first two
arguments be lists of the same length.  It's a strain to compare to
Python's zip() though, as the _intended_ use of `pairlis` is to add
new pairs to a Lisp association list.  For that reason, `pairlis`
accepts an optional third argument; if present, this should be an
association list, and pairs from zipping the first two arguments are
prepended to it.  Also for this reason, the _order_ in which pairs are
taken from the first two arguments isn't defined(!).

http://www.lispworks.com/documentation/HyperSpec/Body/f_pairli.htm#pairlis

For its intended special-purpose use, it wouldn't make sense to allow
arguments of different lengths.
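
For contrast, the Python behaviors under discussion (the fill-in feature
belongs to map(), not zip()):

    >>> map(None, [1, 2, 3], 'ab')   # pads the shorter argument with None
    [(1, 'a'), (2, 'b'), (3, None)]
    >>> zip([1, 2, 3], 'ab')         # truncates to the shortest argument
    [(1, 'a'), (2, 'b')]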
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: testing units in a specific order?

2006-01-10 Thread Tim Peters
[Antoon Pardon]
> Well maybe unit tests shouldn't care (Thats what I think you meant),

Yup!

> I care. Some methods are vital for the functionality of other methods.
> So it the test for the first method fails it is very likely a number of
> other methods will fail too. However I'm not interrested in the results
> of those other tests in that case. Having to weed through all the test
> results in order to check first if the vital methods are working before
> checking other methods is cumbersome.
>
> Having the vital methods tested first and ignore the rest of the results
> if they fail is much easier.

So put the tests for the different kinds of methods into different
test classes, and run the corresponding test suites in the order you
want them to run.  This is easy.  Code like:

test_classes = [FileStorageConnectionTests,
FileStorageReconnectionTests,
FileStorageInvqTests,
FileStorageTimeoutTests,
MappingStorageConnectionTests,
MappingStorageTimeoutTests]

def test_suite():
suite = unittest.TestSuite()
for klass in test_classes:
suite.addTest(unittest.makeSuite(klass))
return suite

is common in large projects.  unittest runs tests added to a suite in
the order you add them (although  _within_ a test class, the test
methods are run in alphabetical order of method name -- when you want
ordering, that's the wrong level to try to force it; forcing order is
natural & easy at higher levels).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: atexit + threads = bug?

2006-01-12 Thread Tim Peters
[David Rushby]
> Consider the following program (underscores are used to force
> indentation):
> 
> import atexit, threading, time
>
> def atExitFunc():
> print 'atExitFunc called.'
>
> atexit.register(atExitFunc)
>
> class T(threading.Thread):
> def run(self):
> assert not self.isDaemon()
> print 'T before sleep.'
> time.sleep(1.0)
> print 'T after sleep.'
>
> T().start()
> print 'Main thread finished.'
> 
>
> I would expect the program to print 'atExitFunc called.' after 'T after
> sleep.',

Why?  I expect very little ;-)

> but instead, it prints (on Windows XP with Python 2.3.5 or
> 2.4.2):
> 
> T before sleep.
> Main thread finished.
> atExitFunc called.
> T after sleep.
> 

That's not what I saw just now on WinXP Pro SP2.  With 2.3.5 and 2.4.2
I saw this order instead:

Main thread finished
atExitFunc called.
T before sleep.
T after sleep.

The relative order of "Main thread finished." and "T before sleep" is
purely due to timing accidents; it's even possible for "T after
sleep." to appear before "Main thread finished.", although it's not
possible for "T after sleep." to appear before "T before sleep.".  In
fact, there are only two orderings you can count on here:

T before sleep < T after sleep
Main thread finished < atExitFunc called

If you need more than that, you need to add synchronization code.
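
For example, if you also want "T after sleep." to appear before "Main
thread finished.", keep a reference to the thread and join it (a sketch):

    t = T()
    t.start()
    t.join()                      # wait for "T after sleep." to be printed
    print 'Main thread finished.'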

> atExitFunc is called when the main thread terminates, rather than when
> the process exits.

Is there a difference between "main thread terminates" and "the
process exits" on Windows?  Not in C.  It so happens that Python's
threading module _also_ registers an atexit callback, which does a
join() on all the threads you created and didn't mark as daemon
threads.  Because threading.py's atexit callback was registered first,
it gets called last when Python is shutting down, and it doesn't
return until it joins all the non-daemon threads still sitting around.
 Your atexit callback runs first because it was registered last.  That
in turn makes it _likely_ that you'll see (as we both saw) "atExitFunc
called." before seeing "T after sleep.", but doesn't
guarantee that.

Don't be fooled by _printing_ "Main thread finished", BTW:  that's
just a sequence of characters ;-).  The main thread still does a lot
of work after that point, to tear down the interpreter in a sane
order.  Part of that work is threading.py waiting for your threads to
finish.

> The atexit documentation contains several warnings,
> but nothing about this.  Is this a bug?

It doesn't look like a bug to me, and I doubt Python wants to make
stronger promises than it does now about the exact order of assorted
exit gimmicks.

You can reliably get "atExitFunc called." printed last by delaying
your import of the threading module until after you register your
atExitFunc callback.  If you register that first, it's called last,
and threading.py's wait-for-threads-to-end callback gets called first
then.  That callback won't return before your worker thread finishes.
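
That is, a reordering along these lines (the rest of the program
unchanged):

    import atexit

    def atExitFunc():
        print 'atExitFunc called.'

    atexit.register(atExitFunc)   # registered before threading gets a chance

    import threading, time        # threading's own atexit hook registers now
    # ... class T and the rest exactly as before ...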

There's no promise that will continue to work forever, though.  This
is fuzzy stuff vaguely covered by the atexit doc's "In particular,
other core Python modules are free to use atexit without the
programmer's knowledge."  threading.py happens to be such a module
today, but maybe it won't be tomorrow.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: atexit + threads = bug?

2006-01-12 Thread Tim Peters
[David Rushby]
> ...
> I understand your explanation and can live with the consequences, but
> the atexit docs sure don't prepare the reader for this.

In fact, they don't mention threading.py at all.

> They say, "Functions thus registered are automatically executed upon
> normal interpreter termination."  It seems like sophistry to argue that
> "normal interpreter termination" has occurred when there are still
> threads other than the main thread running.

Well, since atexit callbacks are written in Python, it's absurd on the
face of it to imagine that they run after the interpreter has torn
itself down.  Clearly Python is still running at that point, or they
wouldn't get run at all.

It's also strained to imagine that threads have nothing to do with
shutdown, since the threading docs say "the entire Python program
exits when only daemon threads are left".  It's not magic that
prevents Python from exiting when non-daemon threads are still
running.  You happened to use the same non-magical hack that
threading.py uses to fulfill that promise, and you're seeing
consequences of their interaction.  In Python as well as in C, atexit
only works well when it's got exactly zero or one users <0.1 wink>.

You're welcome to suggest text you'd like better, but microscopic
examination of details most people will never care about makes for bad
docs in a different way.  To get a full picture of how CPython's
shutdown works, you need to explain all of Py_Finalize() in English,
and you need to get agreement on which details are accidents and which
are guaranteed.

Now it's probably a fact that you couldn't care less about 99.9% of
those finalization details:  you only care about the one that just bit
you.  How are you going to beef up the docs in such a way that you
would have _found_ the bit you cared about, among the vast bulk of new
detail you don't care about?

You aren't, so you could settle for suggesting new words that just
cover the bit you care about.  Give it a try!

> Suppose that today I promise to donate my body to science "upon my
> death", and tomorrow, I'm diagnosed with a gradual but inexorable
> illness that will kill me within ten years.  I wouldn't expect to be
> strapped down and dissected immediately after hearing the diagnosis, on
> the basis that the mere prophecy of my death is tantamount to the death
> itself.

Next time, quit while you're ahead ;-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why is there no post-pre increment operator in python

2006-01-12 Thread Tim Peters
[EMAIL PROTECTED]
> Anyone has any idea on why is there no post/pre increment operators in
> python ?

Maybe because Python doesn't aim at being a cryptic portable assembly
language?  That's my guess ;-)

> Although the statement:
> ++j
> works but does nothing

That depends on the type of j, and how it implements the __pos__()
method.  The builtin numeric types (integers, floats, complex)
implement __pos__ to return the base-class part of `self`.  That's not
the same as doing nothing.  There is no "++" operator in Python, BTW
-- that's two applications of the unary-plus operator.

>>> class MyFloat(float):
...     pass
>>> x = MyFloat(3.5)
>>> x
3.5
>>> type(x)
<class '__main__.MyFloat'>
>>> type(+x)  # "downcasts" to base `float` type
<type 'float'>
>>> type(x.__pos__())   # same thing, but wordier
<type 'float'>

If you want, you can implement __pos__ in your class so that

   +a_riteshtijoriwala_object

posts messages to comp.lang.c asking why C is so inflexible ;-).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Converting a string to an array?

2006-01-13 Thread Tim Peters
[Bryan Olson]
> ...
> For sorting, we had the procedure 'sort', then added the pure
> function 'sorted'. We had a 'reverse' procedure, and wisely
> added the 'reversed' function.
>
> Hmmm... what we could we possible do about 'shuffle'?

'permuted' is the obvious answer, but that would leave us open to more
charges of hifalutin elitism, so the user-friendly and slightly risque
'jiggled' it is.

sorry-it-can't-be-'shuffled'-we-ran-out-of-'f's-ly y'rs  - tim
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Just want to walk a single directory

2006-01-14 Thread Tim Peters
[EMAIL PROTECTED]
> I have a super-simple need to just walk the files in a single directory.
>
> I thought this would do it, but "permanentFilelist" ends up containing
> all folders in all subdirectories.

All folders everywhere, or all file (not directory) names in the top
two levels?  It looks like the latter to me.

> Could someone spot the problem? I've scoured some threads using XNews reg
> expressions involving os.walk, but couldn't extrapolate the answer for my
> need.
>
> ===
>
> thebasedir = "E:\\temp"
>
> permanentFilelist= []
>
> for thepath,thedirnames,thefilenames in os.walk(thebasedir):
>
> if thepath != thebasedir:

You wanted == instead of != there.  Think about it ;-)

> thedirnames[:] = []
>
> for names in thefilenames:
> permanentFilelist.append(names)

A simpler way (assuming I understand what you're after) is:

thebasedir = "C:\\tmpold"
for dummy, dummy, permanentFilelist in os.walk(thebasedir):
break

or the possibly more cryptic equivalent:

thebasedir = "C:\\tmpold"
permanentFilelist = os.walk(thebasedir).next()[-1]

or the wordier but transparent:

thebasedir = "C:\\tmpold"
permanentFilelist = [fn for fn in os.listdir(thebasedir)
 if os.path.isfile(os.path.join(thebasedir, fn))]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: More than you ever wanted to know about objects [was: Is everything a refrence or isn't it]

2006-01-15 Thread Tim Peters
[Alex Martelli]
...
>> In mathematics, 1 is not "the same" as 1.0 -- there exists a natural
>> morphism of integers into reals that _maps_ 1 to 1.0, but they're still
>> NOT "the same" thing.  And similarly for the real-vs-complex case.

[Xavier Morel]
> I disagree here, 1 and 1.0 are the same mathematical object e.g. 1 (and
> the same as "1+0i"), the difference due to notation only makes sense in
> computer science where integers, real and complex ensembles are disjoin.
> In mathematics, Z is included in IR which is included in C (note: this
> is not mathspeak, but I have no idea how to say it in english), and this
> notation -- at best -- merely determines the ensemble you're currently
> considering.
>
> There is no "natural morphism" of integers into reals because there is
> no mathematical difference between integers and reals, the real ensemble
> is merely a superset of the integers one.
>
> Or so it was last time i got a math course.

This all depends on which math course you last took ;-)  You have more
a physicist's view here.  The simplest case is real versus complex,
where even a physicist can accept that a complex number,
formally, is an ordered pair of real numbers.  From that view, it's
almost obviously not possible that a complex number could be "the same
object" as a real number.  For example, 1+0i is formally the ordered
pair <1.0, 0.0>, but the real 1.0 is just the real 1.0.  If you'll
grant that a real number is never itself an ordered pair of real
numbers, then the intersection between the complex and real numbers is
necessarily empty.

At lower levels of the "numeric tower" you have in mind, the formal
difference is more extreme, not less.  The natural numbers
("non-negative integers") are often defined in terms of von Neumann
ordinals, so that natural number N "is" the set of all natural numbers
less than N (0 "is" the empty set, 1 "is" the set containing the empty
set, 2 "is" the set containing the empty set and the set containing
the empty set, ...), while defining reals as either Dedekind cuts or
Cauchy sequences requires elaborate formal machinery.

Does it matter?  To foundational mathematicians, certainly.  Luckily,
in a computer all numerics suck, so who cares ;-).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: OT: excellent book on information theory

2006-01-16 Thread Tim Peters
[Paul Rubin]
...
>> David J.C. MacKay
>> Information Theory, Inference, and Learning Algorithms
>>
>> Full text online:
>> http://www.inference.phy.cam.ac.uk/mackay/itila/
...
>> The printed version is somewhat expensive, but according to the
>> following analysis it's a better bargain than "Harry Potter and the
>> Philosopher's Stone":
>>
>> http://www.inference.phy.cam.ac.uk/mackay/itila/Potter.html

[Grant Edwards]
> That made me smile on a Monday morning (not an insignificant
> accomplishment).  I noticed in the one footnote that the H.P.
> book had been "translated into American".  I've always wondered
> about that.  I noticed several spots in the H.P. books where
> the dialog seemed "wrong": the kids were using American rather
> than British English.  I thought it rather jarring.

You should enjoy:

   http://www.hp-lexicon.org/about/books/differences.html

and especially the links near the bottom to try-to-be-exhaustive
listings of all differences between the Bloomsbury (UK) and Scholastic
(US) editions.  More "Britishisms" are surviving in the Scholastic
editions as the series goes on, but as the list for Half-Blood Prince
shows the editors still make an amazing number of seemingly pointless
changes:

   http://www.hp-lexicon.org/about/books/hbp/differences-hbp.html

like:

   UK:Harry smiled vaguely back
   US:Harry smiled back vaguely

Non-English translations have real challenges, and because this series
is more popular than the Python Reference Manual these days, there's a
lot of fascinating info to be found.  For example, I think the
Japanese translator deserves a Major Award for their heroic attempt to
translate Ron's "Uranus" pun:

   http://www.cjvlang.com/Hpotter/wordplay/uranus.html



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Decimal ROUND_HALF_EVEN Default

2006-01-16 Thread Tim Peters
[3c273]
> I'm just curious as to why the default rounding in the decimal module is
> ROUND_HALF_EVEN instead of ROUND_HALF_UP.

Because it's the best (numerically "fairest") rounding method for most
people most of the time.

> All of the decimal arithmetic I do is rounded half up and I can't think of
> why one might use round half even.

Because you want better numeric results, or because your application
requires it.  "Half-even" is also called "banker's rounding" in the
United States, because it's required in many (but not all) banking
applications.
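
Concretely, in the decimal module's spelling:

    >>> from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP
    >>> Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
    Decimal("2")
    >>> Decimal("3.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
    Decimal("4")
    >>> Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    Decimal("3")

Half-even resolves ties by picking the even neighbor, so across many
values the roundings don't drift systematically upward.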
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Decimal ROUND_HALF_EVEN Default

2006-01-17 Thread Tim Peters
[LordLaraby]
> If 'bankers rounding' is HALF_ROUND_EVEN, what is HALF_ROUND_UP?

Not banker's rounding ;-).  Same answer if you had said ROUND_HALF_UP
instead (which  I assume you intended) -- most of these don't have
cute names.

> I confess to never having heard the terms.

ROUND_HALF_UP etc are symbolic constants in Python's `decimal` module;
see the docs.

> I usually do: Y = int(X + 0.5) scaled to proper # of decimal places.
> Which type of rounding is this? If either.

If you meant what you said, it's not "rounding" at all, because it's
insane for negative inputs.  For example, int(-2 + 0.5) = int(-1.5) =
-1, and no _rounding_ method changes an exact integer (like -2) to a
_different_ exact integer (like -1).

If you were assuming X >= 0.0, then int(X+0.5) coincides with
ROUND_HALF_UP on that domain.  For X < 0.0, ROUND_HALF_UP works like
int(X-0.5).
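
For example, on exact halves of both signs:

    >>> int(2.5 + 0.5), int(-2.5 - 0.5)   # ROUND_HALF_UP spelled by hand
    (3, -3)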
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: OT: excellent book on information theory

2006-01-18 Thread Tim Peters
[Paul Rubin]
>> I wouldn't have figured out that a "car park" was a parking lot.  I
>> might have thought it was a park where you go to look at scenery from
>> inside your car.  Sort of a cross between a normal park and a drive-in
>> movie.

[Grant Edwards[
> ;)
>
> That's a joke, right?

Probably not, if Paul's American.  For example, here in the states we
have Python Parks, where you go to look at scenery from inside your
python.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Decimal vs float

2006-01-19 Thread Tim Peters
[Kay Schluehr]
>> This is interesting. If we define
>>
>> def f():
>>print str(1.1)
>>
>> and disassemble the function, we get:
>>
> dis.dis(f)
>  2           0 LOAD_GLOBAL              0 (str)
>              3 LOAD_CONST               1 (1.1000000000000001)  # huh?

[Fredrik Lundh]
> huh huh?
>
> >>> str(1.1)
> '1.1'
> >>> repr(1.1)
> '1.1000000000000001'
> >>> "%.10g" % 1.1
> '1.1'

A more interesting one is:

"%.12g" % a_float

because that's closest to what str(a_float) produces in Python. 
repr(a_float) is closest to:

"%.17g" % a_float

> >>> "%.20g" % 1.1
> '1.1000000000000001000'
> >>> "%.30g" % 1.1
> '1.10000000000000010000000000000'
> >>> "%.10f" % 1.1
> '1.1000000000'
> >>> "%.20f" % 1.1
> '1.10000000000000010000'
> >>> "%.30f" % 1.1
> '1.100000000000000100000000000000'

The results of most of those (the ones asking for more than 17
significant digits) vary a lot across platforms.  The IEEE-754
standard doesn't wholly define output conversions, and explicitly
allows that a conforming implementation may produce any digits
whatsoever at and after the 18th significant digit when converting a
754 double to string.  In practice, all implementations I know of that
exploit that produce zeroes at and after the 18th digit -- but they
could produce 1s instead, or 9s, or digits from pi, or repetitions of
the gross national product of Finland in 1967.  You're using one of
those there, probably Windows.  glibc does conversions "as if to
infinite precision" instead, so here on a Linux box:

>>> "%.20g" % 1.1
'1.1000000000000000888'
>>> "%.30g" % 1.1
'1.10000000000000008881784197001'
>>> "%.50g" % 1.1
'1.1000000000000000888178419700125232338905334472656'
>>> "%.100g" % 1.1
'1.100000000000000088817841970012523233890533447265625'

The last one is in fact the exact decimal representation of the 754
double closest to the decimal 1.1.

> more here: http://docs.python.org/tut/node16.html

Still, there's always more ;-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange python behavior with modules on an emt64 box

2006-01-20 Thread Tim Peters
[Joshua Luben]
> I thought I would post this here first before seeking more experienced ears
> for this particular strangness.
>
> I have Python 2.4.2 installed from source on a dual processor dell server.
> These are x86_64 processors (verified by /bin/arch) (aka emt64 extensions).
>
> uname -a gives
> Linux eps-linuxserv3 2.6.5-7.244-smp #1 SMP Mon Dec 12 18:32:25 UTC 2005 
> x86_64 x86_64 x86_64 GNU/Linux
>
> The flavor of Linux is 64 bit SUSE SLES 9 with the latest updates.
>
>
> Now for the strangeness. I'm using DCOracle2 (can't use anything else, as
> this is the corporate standard) also compiled from source. When calling
> executemany() when any parameter is of type int, I get a OverflowError. I
> turned on debug traces in DCOracle2; this indicated that PyArg_ParseTuple()
> was returning sizeof(int) = 4 bytes.

Sounds right to me.  I don't know of any platform other than old Cray
Research boxes where sizeof(int) > 4.

> DCOracle2 is compiled such that sizeof(int) = 8 bytes.

Sounds wrong to me.

> Python itself gives,
>
> python -c "import sys; print sys.maxint"
> 9223372036854775807
>
> Therefore, indicating that the size of int is 8 bytes.

No, it does not.  A Python `int` is a C `long`, and sizeof(long) = 8
on most 64-bit boxes (Win64 is an exception).  The size of a platform
C long can be deduced from the value of Python's sys.maxint, but
nothing about the size of a platform C int.
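
A quick way to check both directly; on an x86_64 Linux box like yours I'd
expect this to show 8 and 4 (a 32-bit box would show 4 and 4):

    >>> import struct
    >>> struct.calcsize("l"), struct.calcsize("i")   # sizeof(long), sizeof(int)
    (8, 4)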

> So I'll go out on a limb here and assume that this is a python problem...but
> I don't know where to take it...

I'd start with this part, which sounds crazy:

DCOracle2 is compiled such that sizeof(int) = 8 bytes.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is there a maximum length of a regular expression in python?

2006-01-20 Thread Tim Peters
[Bryan Olson]
>> Does no one care about an internal error in the regular expression
>> engine?

[Steve Holden]
> Not one that requires parsing a 100 kilobyte re that should be replaced
> by something more sensible, no.

I care:  this is a case of not detecting information loss due to
unchecked downcasting in C, and it was pure luck that it resulted in
an internal re error rather than, say, a wrong result.  God only knows
what other pathologies the re engine could be tricked into exhibiting
this way.  Python 2.5 will raise an exception instead, during regexp
compilation (I just checked in code for this on the trunk; with some
luck, someone will backport that to 2.4 too).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Can a simple a==b 'hang' in and endless loop?

2006-01-20 Thread Tim Peters
[Claudio Grondi]
>> Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit
>> (Intel)] on win32 - IDLE 1.1.2
>>  >>> a=[]
>>  >>> a.append(a)
>>  >>> b=[]
>>  >>> b.append(b)
>>  >>> a==b
>>
>> Traceback (most recent call last):
>>File "", line 1, in -toplevel-
>>  a==b
>> RuntimeError: maximum recursion depth exceeded in cmp

[Steven D'Aprano]
> Works for me:

Under a different version of Python, though.

> Python 2.3.3 (#1, May  7 2004, 10:31:40)
> [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> a = []
> >>> a.append(a)
> >>> b = []
> >>> b.append(b)
> >>> a == b
> True
>
>
> Maybe IDLE is playing silly buggers, or perhaps Python 2.4.2 has a bug.

It's neither.  From the NEWS file for Python 2.4a1:

"""
- Python no longer tries to be smart about recursive comparisons.
  When comparing containers with cyclic references to themselves it
  will now just hit the recursion limit.  See SF patch 825639.
"""
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Decimal vs float

2006-01-21 Thread Tim Peters
[Kay Schluehr]
> I concur and I wonder why CAS like e.g. Maple that represent floating
> point numbers using two integers [1] are neither awkward to use nor
> inefficient.

My guess is that it's because you never timed the difference in Maple
-- or, perhaps, that you did, but misinterpreted the results.  You
don't give any data, so it's hard to guess which.

BTW, why do you think Maple's developers added the UseHardwareFloats option?

> According to the Python numeric experts one has to pay a
> high tradeoff between speed and accuracy.  But as it seems it just
> compares two Python implementations ( float / decimal ) and does not
> compare those to approaches in other scientific computing systems.

It's easy to find papers comparing the speed of HW and SW floating
point in Maple.  Have you done that, Kay?  For example, read:

"Symbolic and Numeric Scientific Computation in Maple"
K.O. Geddes, H.Q. Le
http://www.scg.uwaterloo.ca/~kogeddes/papers/ICAAA02.ps

Keith Geddes is a key figure in Maple's history and development, and
can hardly be accused of being a Python apologist ;-)  Note that
Example 1.5 there shows a _factor_ of 47 speed gain from using HW
instead of SW floats in Maple, when solving a reasonably large system
of linear equations.  So I'll ask again ;-):  why do you think Maple's
developers added the UseHardwareFloats option?

While that paper mentions the facility only briefly, Geddes and Zheng
give detailed analyses of the tradeoffs in Maple here:

"Exploiting Fast Hardware Floating Point in High Precision Computation"
http://www.scg.uwaterloo.ca/~kogeddes/papers/TR200241.ps

If you're uncomfortable reading technical papers, one bottom line is
that they show that the time required by Maple to do a floating-point
multiplication in software "is at least 1000 times larger" than doing
the same with UseHardwareFloats set to true (and Digits:=15 in both
cases).

> By the way one can also learn from Maple how accuracy can be adjusted
> practically. I never heard users complaining about that.

It's easy to change the number of digits of precision in Python's
decimal module.
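
For example:

    >>> from decimal import Decimal, getcontext
    >>> getcontext().prec = 50
    >>> Decimal(1) / Decimal(7)
    Decimal("0.14285714285714285714285714285714285714285714285714")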

> ...
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Possible memory leak?

2006-01-25 Thread Tim Peters
[Fredrik Lundh]
>> ...
>> for the OP's problem, a PIL-based solution would probably be ~100
>> times faster than the array solution, but that's another story.

[Tuvas]
> What do you mean by a PIL based solution? The reason I need to get the
> data into the string list is so I can pump it into PIL to give me my
> image... If PIL has a way to make it easier, I do not know it, but
> would like to know it.

If your data is in an array.array, you can pass that directly to PIL's
(1.1.4) Image.frombuffer() constructor.
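
A sketch of what that can look like (the mode and dimensions here are
made up -- adjust to your data):

    import array
    import Image   # PIL

    width, height = 640, 480
    pixels = array.array('B', [0] * (width * height))   # 8-bit grayscale data
    im = Image.frombuffer("L", (width, height), pixels)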
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-28 Thread Tim Peters
[Jens Theisen]
> ...
> Actually I'm not sure what this optimisation should give you anyway. The
> only circumstance under which files with only zeroes are meaningful is
> testing, and that's exactly when you don't want that optimisation.

In most cases, a guarantee that reading "uninitialized" file data will
return zeroes is a security promise, not an optimization.  C doesn't
require this behavior, but POSIX does.

On FAT/FAT32, if you create a file, seek to a "large" offset, write a
byte, then read the uninitialized data from offset 0 up to the byte
just written, you get back whatever happened to be sitting on disk at
the locations now reserved for the file.  That can include passwords,
other people's email, etc -- anything whatsoever that may have been
written to disk at some time in the disk's history.  Security weenies
get upset at stuff like that ;-)
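
A simple way to poke at that from Python (on a POSIX filesystem the final
line should print True; on FAT32 you may get back whatever junk was lying
around):

    f = open("sparse.bin", "wb")
    f.seek(10 * 1024 * 1024)      # seek far past end-of-file
    f.write("x")                  # the file is now ~10MB + 1 byte long
    f.close()

    f = open("sparse.bin", "rb")
    head = f.read(4096)           # "uninitialized" data near the start
    f.close()
    print head == "\0" * 4096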
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.walk() dirs and files

2006-02-08 Thread Tim Peters
[rtilley]
> When working with file and dir info recursively on Windows XP. I'm going
> about it like this:
>
> for root, dirs, files in os.walk(path):
>  for f in files:
>  ADD F to dictionary
>  for d in dirs:
>  ADD D to dictionary
>
> Is it possible to do something such as this:
>
> for root, dirs, files in os.walk(path):
>  for f,d in files, dirs:
>  ADD F|D to dictionary
>
> Just trying to save some lines of code and thought it wise to ask the
> gurus before trying it :)

As has been pointed out,

for name in dirs + files:

is simple and effective.  In a context where you don't want to endure
the memory burden of materializing a concatentated list, you can do

for name in itertools.chain(dirs, files):

intead.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how do you pronounce 'tuple'?

2006-02-14 Thread Tim Peters
[EMAIL PROTECTED]
> ...
> I work with Guido now and I'm conflicted.  I'm still conditioned to say
> tuhple.  Whenever he says toople, I just get a smile on my face.  I
> think most of the PythonLabs guys pronounce it toople.

"tuhple" is a girly-man affectation.  That's why Guido and I both say
the manly "toople".  Jeremy's still a baby, so he says "tuhple", and
for the same reasons other adolescent males pierce their nipples. 
Barry sucks up to whoever he's talking with at the moment.  Fred is a
doc guy, so nobody remembers what he says ;-)

the-acid-test-is-whether-you-say-"xor"-with-one-syllable-or-three-ly y'rs  - tim
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: processing limitation in Python

2006-02-14 Thread Tim Peters
[EMAIL PROTECTED]
> If I un-comment any line in this program below the line where I
> commented " all OK up to this point " This program locks up my
> computer.
>
> Windows task manager will show "Not Responding" for Python in the
> Applications tab and in the Performance tabe the CPU usage will be
> locked at %100.
>
> I've experienced the same problem on 2 different computers, one running
> 2000 pro. the other running XP home eddition.  both computers run
> Python 2.4.2
>
> I'm just wondering if any one else has noticed any problems with
> working with large numbers in Python ind if there is anything that can
> work around this issue.
>
> Thankd for reading
> David
>
> def factor(n):
> d = 2
> factors = [ ]
> while n > 1:
> if n % d == 0:
> factors.append(d)
> n = n/d
> else:
> d = d + 1
> print factors

Your primary problem is that this is a horridly inefficient way to
factor, taking time proportional to n's largest prime divisor (which
may be n).

> factor (12)
> factor (123)
> factor (1234)
> factor (12345)
> factor (123456)
> factor (1234567)
> factor (12345678)
> factor (123456789)
> factor (1234567898)
> factor (12345678987)
> factor (123456789876)
> factor (1234567898765)   # all OK up to this point
> #factor (12345678987654)# locks up computer if I run this line

It doesn't lock up for me, using Python 2.3.5 or 2.4.2 on Windows (XP
Pro SP2, but the specific flavor of Windows shouldn't matter).  I ran
it from a DOS box, and while it was plugging away on 12345678987654,
hitting Ctrl+C stopped it.

If you let it continue running, and you & your computer were immortal
(something worth shooting for :-)), it would eventually print the
factorization.  Since

12345678987654 = 2 * 3 * 2057613164609

the loop would have to go around over 2 trillion times to find the
final 2057613164609 prime factor.  A simple enormous improvement is to
get out of the loop when d*d > n.  Then n must be prime or 1.  That
would slash the worst-case runtime from being proportional to n to
being proportional to sqrt(n).
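
A minimal reworking of your function along those lines:

    def factor(n):
        d = 2
        factors = []
        while d * d <= n:
            if n % d == 0:
                factors.append(d)
                n //= d
            else:
                d += 1
        if n > 1:
            factors.append(n)   # whatever remains is a prime factor
        print factors

That takes the 12345678987654 case from a couple trillion trips around
the loop down to about a million and a half.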

> ...
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue.Queue()

2006-02-17 Thread Tim Peters
[john peter]
>  what happens behind the scenes when i create a Queue.Queue() without
> specifying a maxsize?  does a block of space gets allocated initially then
> dynamically "expanded" as needed?

Yes.

> if so, what is the default size of the initial space?

It's initially empty.

> is it always better to specify a big enough maxsize

No.

> initially for efficiency purposes,

The intent has nothing to do with efficiency:  bounded and unbounded
queues are _semantic_ variations.  They can behave differently. 
Whether you want a bounded or unbounded queue depends on the needs of
your app.  A bounded queue is typically used, e.g., to mediate between
producers and consumers with different processing rates.

> or does it matter much?

Shouldn't matter at all.  Even if you specify a max size, the
underlying container still starts its life empty, and grows and
shrinks as needed.  The semantic difference is in whether a .put()
attempt may block.
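
A tiny sketch of the difference:

    import Queue

    unbounded = Queue.Queue()         # grows as needed; put() never blocks
    bounded = Queue.Queue(maxsize=2)  # put() blocks (or raises Full) when full

    bounded.put(1)
    bounded.put(2)
    try:
        bounded.put(3, False)         # don't block; raise Queue.Full instead
    except Queue.Full:
        print "a third put() would have blocked"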
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: issues with doctest and threads

2005-08-08 Thread Tim Peters
[Michele Simionato]
> I am getting a strange error with this script:
>
> $ cat doctest-threads.py
> """
> >>> import time, threading
> >>> def example():
> ... thread.out = []
> ... while thread.running:
> ... time.sleep(.01)
> ... thread.out.append(".")
> >>> thread = threading.Thread(None, example)
> >>> thread.running = True; thread.start()
> >>> time.sleep(.1)
> >>> thread.running = False
> >>> print thread.out
> ['.', '.', '.', '.', '.', '.', '.', '.', '.']
> """
> 
> if __name__ == "__main__":
>     import doctest; doctest.testmod()
> 
> $ python doctest-threads.py
> Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
>     self.run()
>   File "/usr/lib/python2.4/threading.py", line 422, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "", line 5, in example
> NameError: global name 'thread' is not defined

It looks like pure thread-race accident to me.  The main program does
nothing to guarantee that the thread is finished before it prints
`thread.out`, neither anything to guarantee that Python doesn't exit
while the thread is still running.  Stuff, e.g., a time.sleep(5) after
"thread.running = False", and it's much more likely to work the way
you intended (but still not guaranteed).

A guarantee requires explicit synchronization; adding

>>> thread.join()

after "thread.running = False" should be sufficient.  That ensures two things:

1. The `example` thread is done before thread.out gets printed.
2. The *main* thread doesn't exit (and Python doesn't start tearing itself
   down) while the `example` thread is still running.

The exact output depends on OS scheduling accidents, but I expect
you'll see 10 dots most often.
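
Putting it together, the test would read like this (just a sketch -- the
exact number of dots still isn't guaranteed, for the sleep() reasons
below):

>>> import time, threading
>>> def example():
...     thread.out = []
...     while thread.running:
...         time.sleep(.01)
...         thread.out.append(".")
>>> thread = threading.Thread(None, example)
>>> thread.running = True; thread.start()
>>> time.sleep(.1)
>>> thread.running = False
>>> thread.join()
>>> print thread.out
['.', '.', '.', '.', '.', '.', '.', '.', '.', '.']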

BTW, trying to coordinate threads with sleep() calls is usually a Bad
Idea; you can't generally expect more from an OS than that it will
treat sleep's argument as a lower bound on the elapsed time the
sleeper actually yields the CPU.
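
For example (illustrative only; the number you see depends entirely on
your OS and its timer resolution):

import time

start = time.time()
time.sleep(0.01)
elapsed = time.time() - start
print elapsed   # typically at least 0.01, and often noticeably more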
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: issues with doctest and threads

2005-08-09 Thread Tim Peters
[Michele Simionato]
> Thank you for your replies Jeff & Tim. The snippet I submitted is
> unfortunate, since I was writing an example (for a Python course I am
> going to give in September) to show that you cannot reliably assume
> that you will get exactly 9 dots, because of the limitations of 'sleep'.
> Mindlessly, I have cut & pasted that snippet, but my real question
> was not "how many dots I get", it was:  "why the error message talks
> about 'thread' not being in the globals?"  It's true that I can avoid it with
> a thread.join() (which I had forgotten), but still I really cannot
> understand the reason for such a message.

Because the program is buggy:  synchronizing threads isn't a "do it if
you feel like it" thing; it's essential to correct threaded behavior.
If you're going to show students bad thread practice, they're going to
get mysteries a lot deeper and more damaging than this one <0.5 wink>.

Add some more prints:

"""
>>> import time, threading
>>> def example():
...     thread.out = []
...     while thread.running:
...         time.sleep(.01)
...         print [11]
...         thread.out.append(".")
...         print [12]
...     print [13]
>>> thread = threading.Thread(None, example)
>>> thread.running = True; thread.start()
>>> time.sleep(.1)
>>> thread.running = False
>>> print thread.out
['.', '.', '.', '.', '.', '.', '.', '.', '.']
"""

if __name__ == "__main__":
   import doctest
   doctest.testmod()
   print [2]

Here's a typical run on my box:

File "blah.py", line 13, in __main__
Failed example:
time.sleep(.1)
Expected nothing
Got:
[11]
[12]
[11]
[12]
[11]
[12]
[11]
[12]
[11]
[12]
[11]
[12]
[11]
[12]
[11]
[12]
[11]
[12]
**********************************************************************
1 items had failures:
   1 of   7 in __main__
***Test Failed*** 1 failures.
[2]
[11]
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Code\python\lib\threading.py", line 444, in __bootstrap
self.run()
  File "C:\Code\python\lib\threading.py", line 424, in run
self.__target(*self.__args, **self.__kwargs)
  File "", line 6, in example
NameError: global name 'thread' is not defined

Note that [2] is printed _while_ the spawned thread is still running
([13] is never printed):  the call to doctest.testmod() is completely
finished, but you're still letting a thread spawned _by_ doctest run. 
The environment doctest set up for that thread is gone too.  Although
it doesn't actually matter in this specific example, because the main
thread (not just doctest) is also entirely done, the Python
interpreter starts tearing itself down. Why that last doesn't matter
in this example would take some time to explain; I don't think it's
needed here, because the test case got into mortal trouble for an
earlier reason.

> Why it is so misleading?

Simply because bad thread programming has allowed a thread to keep
running after the resources it relies on have vanished.  It may sound
harsh, but this is tough love:  it's pilot error.

> Can something be done about it?

Properly synchronize the thread, to enforce what the code requires but
cannot hope to obtain by blind luck.  All it takes is the
thread.join() I suggested.  I don't understand why you're fighting
that, because it's basic proper thread practice -- it's not like I
suggested an obscure expert-level hack here.  If a student doesn't
know to join() a thread before they rely on that thread being done,
their thread career will be an endless nightmare.

All that said, this specific failure would _happen_ to go away too, if
in doctest's DocTestRunner.run(), the final "test.globs.clear()" were
removed.  If you feel it's necessary to let threads spawned by a
doctest run beyond the time doctest completes, you can arrange to
invoke DocTestRunner.run() with clear_globs=False.  That's not an
intended use case, but it will work.  The intended use case is
explained in run's docstring:

The examples are run in the namespace `test.globs`.  If
`clear_globs` is true (the default), then this namespace will
be cleared after the test runs, to help with garbage
collection.  If you would like to examine the namespace after
the test completes, then use `clear_globs=False`.
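
For example, a sketch of that unintended-but-working variation (the
classes and arguments are doctest's public API; the driver loop itself
is just my illustration):

import doctest, sys

# Run this module's doctests, but keep each test's globals alive
# afterward, so a thread spawned inside a test doesn't lose its
# environment when the test finishes.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(sys.modules["__main__"]):
    runner.run(test, clear_globs=False)
runner.summarize()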
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: String functions deprication

2005-08-16 Thread Tim Peters
[steve morin]
> http://www.python.org/doc/2.4.1/lib/node110.html
> 
> These methods are being deprecated.  What are they being replaced
> with?  Does anyone know?

As it says at the top of that page,

The following list of functions are also defined as methods of string and
Unicode objects; see ``String Methods'' (section 2.3.6) for more
information on those. You should consider these functions as deprecated,
although they will not be removed until Python 3.0.

The methods of string and Unicode objects are not deprecated.  It's
just the redundant _functions_ in the string module that are
deprecated.

Historically, the functions in the string module existed long before
strings and Unicode objects had any methods.  Now that string and
Unicode objects do have methods, the functions in the string module
are no longer needed.
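
For example (just an illustration of the redundancy):

import string

s = "Hello, World"
print string.lower(s)   # deprecated:  module-level function
print s.lower()         # preferred:  the equivalent string method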
-- 
http://mail.python.org/mailman/listinfo/python-list

