[Python-Dev] Numerical robustness, IEEE etc.

2006-06-18 Thread Nick Maclaren
As I have posted to comp.lang.python, I am not happy with Python's
numerical robustness - because it basically propagates the 'features'
of IEEE 754 and (worse) C99.  Yes, it's better, but I would like to
make it a LOT better.  I already have a more robust version of 2.4.2,
but there are some problems, technical and political.  I should
appreciate advice.

1) Should I start off by developing a testing version, to give people
a chance to scream at me, or write a PEP?  Because I am no expert in
Python development, the former would help educate me in its
conventions, technical and political.

2) Because some people are dearly attached to the current behaviour,
warts and all, and there is a genuine quandary of whether the 'right'
behaviour is trap-and-diagnose, propagate-NaN or whatever-IEEE-754R-
finally-specifies (let's ignore C99 and Java as beyond redemption),
there might well need to be options.  These can obviously be done by
a command-line option, an environment variable or a float method.
There are reasons to disfavour the last, but all are possible.  Which
is the most Pythonesque approach?

3) I am rather puzzled by the source control mechanism.  Are commit
privileges needed to start a project like this in the main tree?
Note that I am thinking of starting a test subtree only.

4) Is there a Python hacking document?  Specifically, if I want to
add a new method to a built-in type, is there any guide on where to
start?

5) I am NOT offering to write a full floating-point emulator, though
it would be easy enough and could provide repeatable, robust results.
"Easy" does not mean "quick" :-(  Maybe when I retire.  Incidentally,
experience from times of yore is that emulated floating-point would
be fast enough that few, if any, Python users would notice.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-18 Thread Nick Maclaren
Talin <[EMAIL PROTECTED]> wrote:
> 
> Ok, so in order to clear up the confusion here, I am going to take a 
> moment to try and explain Noam's proposal in clearer language.
> 
> Now, as to the specifics of Noam's problem: Apparently what he is trying 
> to do is what many other people have done, which is to use Python as a 
> base for some other high-level language, building on top of Python 
> syntax and using the various operator overloads to define the semantics 
> of the language.

No, that's too restrictive.  Back in the 1970s, Genstat (a statistical
language) and perhaps others introduced the concept of an array type
with an indefinite number of dimensions.  This is a requirement for
implementing such things as contingency tables, analysis of variance
etc., and was and is traditionally handled by some ghastly code.  It
always was easy to handle in LISP and, as far as this goes, Python is
a descendant of LISP rather than of Algol, CPL or Fortran.

Now, I thought of how conventional "3rd GL" languages (Algol 68,
Fortran, C etc.) could be extended to support those - it is very
simple, and is precisely what Noam is proposing.  An index becomes
a single-dimensional vector of integers, and all is hunky-dory.
When you look at it, you realise that you DO want to allow zero-length
index vectors, to avoid having to write separate code for the scalar
case.
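
As a concrete illustration, here is a minimal sketch (mine, not Noam's
actual code) of such an array type in current Python, where an index is
a tuple of integers and the zero-length tuple () denotes the scalar,
zero-dimensional case - as I understand it, the Pre-PEP would merely
allow t[] to be written for t[()]:

    class Table(object):
        def __init__(self, shape, fill=0):
            self.shape = tuple(shape)   # () means a zero-dimensional table
            self.data = {}
            self.fill = fill

        def _key(self, index):
            if not isinstance(index, tuple):   # t[3] arrives as 3, not (3,)
                index = (index,)
            if len(index) != len(self.shape):
                raise IndexError("expected %d indices" % len(self.shape))
            return index

        def __getitem__(self, index):
            return self.data.get(self._key(index), self.fill)

        def __setitem__(self, index, value):
            self.data[self._key(index)] = value

    t2 = Table((2, 3)); t2[1, 2] = 42   # two-dimensional case
    t0 = Table(());     t0[()] = 7      # scalar case, same code path
    print(t2[1, 2])                     # -> 42
    print(t0[()])                       # -> 7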

So it is not just a matter of mapping another language, but of
meeting a specific requirement that is largely language-independent.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-21 Thread Nick Maclaren
Brett Cannon's and Neal Norwitz's replies appreciated and noted, but
responses sent by mail.


Nick Coghlan <[EMAIL PROTECTED]> wrote:
>
> Python 2.4's decimal module is, in essence, a floating point emulator
> based on the General Decimal Arithmetic specification.

Grrk.  Format and all?  Because, in software, encoding, decoding and
dealing with the special cases account for the vast majority of the
time.  Using a format and specification designed for implementation
in software is a LOT faster (often 5-20 times).

> If you want floating point mathematics that doesn't have insane platform 
> dependent behaviour, the decimal module is the recommended approach. By the 
> time Python 2.6 rolls around, we will hopefully have an optimized version 
> implemented in C (that's being worked on already).

Yes.  There is no point in building a wheel that someone else is already building.
Please pass my name on to the people doing the optimisation, as I have
a lot of experience in this area and may be able to help.  But it is a
fairly straightforward (if tricky) task.

> That said, I'm not clear on exactly what changes you'd like to make to the 
> binary floating point type, so I don't know if I think they're a good idea or 
> not :)

Now, here it is worth posting a response :-)

The current behaviour follows C99 (sic) with some extra checking (e.g.
division by zero raises an exception).  However, this means that a LOT
of errors will give nonsense answers without comment, and there are a
lot of ways to 'lose' NaN values quietly - e.g. int(NaN).  That is NOT
good software engineering.  So:

Mode A:  follow IEEE 754R slavishly, if and when it ever gets into print.
There is no point in following C99, as it is too ill-defined, even if it
were felt desirable.  This should not be the default, because of the
flaws I mention above (see Kahan on Java).

Mode B:  all numerically ambiguous or invalid operations should raise
an exception - including pow(0,0), int(NaN) etc. etc.  There is a moot
point over whether overflow is such a case in an arithmetic that has
infinities, but let's skip over that one for now.

Mode C:  all numerically ambiguous or invalid operations should return
a NaN (or infinity, if appropriate).  Anything that would lose the error
indication would raise an exception.  The selection between modes B and
C could be done by a method on the class - with mode B being selected
if any argument had it set, and mode C otherwise.

Now, both modes B and C are traditional approaches to numerical safety,
and have the property that error indications can't be lost "by accident",
though they make no guarantees that the answers make sense.  I am
agnostic about which is better, though mode B is a LOT better from the
debugging point of view, as you discover an error closer to where it
was made.
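
The distinction can be sketched at the Python level.  CheckedFloat
below is hypothetical and NOT a proposal for the implementation; it
merely illustrates the selection rule (mode B wins if either operand
has it set) and the rule that losing an error indication raises:

    class CheckedFloat(object):
        def __init__(self, value, mode='C'):
            self.value = float(value)
            self.mode = mode

        def __repr__(self):
            return 'CheckedFloat(%r, mode=%r)' % (self.value, self.mode)

        def __truediv__(self, other):
            # Mode B is selected if either operand has it set.
            modes = (self.mode, getattr(other, 'mode', 'C'))
            mode = 'B' if 'B' in modes else 'C'
            o = float(getattr(other, 'value', other))
            if o == 0.0:
                if mode == 'B':              # mode B: raise at once
                    raise ZeroDivisionError("numerically invalid operation")
                return CheckedFloat(float('nan'), mode)  # mode C: NaN
            return CheckedFloat(self.value / o, mode)

        def __int__(self):
            # In both modes, an operation that would LOSE the error
            # indication (e.g. int(NaN) -> 0) raises instead.
            if self.value != self.value:     # NaN test
                raise ValueError("int() would discard a NaN")
            return int(self.value)

    print(CheckedFloat(1.0) / 0)             # mode C: propagates a NaN
    try:
        CheckedFloat(1.0, mode='B') / 0
    except ZeroDivisionError:
        print("mode B trapped the error where it was made")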

Heaven help us, there could be a mode D, which would be mode C but
with trace buffers.  They are another sadly neglected software
engineering technique, but let's not add every bell and whistle on
the shelf :-)


"tjreedy" <[EMAIL PROTECTED]> wrote:
> 
> > experience from times of yore is that emulated floating-point would
> > be fast enough that few, if any, Python users would notice.
> 
> Perhaps you should enquire on the Python numerical and scientific computing 
> lists to see how many feel differently.  I don't see how someone crunching 
> numbers hours per day could not notice a slowdown.

Oh, certainly, almost EVERYONE will "feel" differently!  But that is
not the point.  Those few of us remaining (and there are damn few) who
know how a fast emulated floating-point performs know that the common
belief that it is very slow is wrong.  I have both used and implemented
it :-)

The point is, as I mention above, you MUST use a software-friendly
format AND specification if you want performance.  IEEE 754 and IBM's
decimal pantechnicon are both extremely software-hostile.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-21 Thread Nick Maclaren
Michael Hudson <[EMAIL PROTECTED]> wrote:
> 
> > As I have posted to comp.lang.python, I am not happy with Python's
> > numerical robustness - because it basically propagates the 'features'
> > of IEEE 754 and (worse) C99. 
> 
> That's not really how I would describe the situation today.

It is certainly the case in 2.4.2, however you would describe it.

> > 2) Because some people are dearly attached to the current behaviour,
> > warts and all, and there is a genuine quandary of whether the 'right'
> > behaviour is trap-and-diagnose, propagate-NaN or whatever-IEEE-754R-
> > finally-specifies (let's ignore C99 and Java as beyond redemption),
> 
> Why?  Maybe it's clear to you, but it's not totally clear to me, and
> in any case the discussion would be better informed for not being too
> dismissive.

Why which?  There are several things that you might be puzzled over.
And where can I start?  Part of the problem is that I have spent a LOT
of time in these areas in the past decades, and have been involved
with many of the relevant standards, and I don't know what to assume.

> > there might well need to be options.  These can obviously be done by
> > a command-line option, an environment variable or a float method.
> > There are reasons to disfavour the last, but all are possible.  Which
> > is the most Pythonesque approach?
> 
> I have heard Tim say that there are people who would dearly like to be
> able to choose.  Environment variables and command line switches are
> not Pythonic.

All right, but what is?  Firstly, for something that needs to be
program-global?  Secondly, for things that don't need to be, which
brings up my point about adding methods to a built-in class.

> I'm interested in making Python's floating point story better, and
> have worked on a few things for Python 2.5 -- such as
> pickling/marshalling of special values -- but I'm not really a
> numerical programmer and don't like to guess what they need.

Ah.  I must get a snapshot, then.  That was one of the lesser things
on my list.  I have spent a lot of the past few decades in the numerical
programming arena, from many aspects.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-21 Thread Nick Maclaren
> >> have worked on a few things for Python 2.5 -- such as
> >> pickling/marshalling of special values -- but I'm not really a
> >> numerical programmer and don't like to guess what they need.
> >
> > Ah.  I must get a snapshot, then.  That was one of the lesser things
> > on my list.
> 
> It was fairly straightforward, and still caused portability problems...

Now, why did I predict that?  Did you, by any chance, include
System/390 and VAX support in your code :-)


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-22 Thread Nick Maclaren
Very interesting.  I need to investigate in more depth.

> The work-in-progress can be seen in Python's SVN sandbox:
>
> http://svn.python.org/view/sandbox/trunk/decimal-c/

beelzebub$ svn checkout http://svn.python.org/view/sandbox/trunk/decimal-c/
svn: PROPFIND request failed on '/view/sandbox/trunk/decimal-c'
svn: PROPFIND of '/view/sandbox/trunk/decimal-c': Could not read chunk size: 
connection was closed by server. (http://svn.python.org)


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-22 Thread Nick Maclaren
> What does a machine check interrupt have to do with anything?

Because a machine check is one of the classes of interrupt that you
POSITIVELY want the other cores stopped until you have worked out
whether it impacts just the interrupted core or the CPU as a whole.
Inter alia, the PowerPC architecture takes one when a core has just
gone AWOL - and there is NO WAY that the dead core can handle the
interrupt indicating its own demise!

> > Oh, that's the calm, moderate description.  The reality is worse.
> 
> Yes, but fortunately irrelevant...

Unfortunately, it isn't.  I wish that it were :-(

> Now, a more general reply: what are you actually trying to achieve
> with these posts?  I presume it's more than just make wild claims
> about how much more you know about numerical programming than anyone
> else...

Sigh.  What I am trying to get is floating-point support of the form
that, when a programmer makes a numerical error (see above), he gets
EITHER an exception value returned OR an exception raised.  I do, of
course, need to exclude the cases when the code is testing states
explicitly, twiddling bits and so on.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-24 Thread Nick Maclaren
"Neal Norwitz" <[EMAIL PROTECTED]> wrote:
>
> Seriously, there seems to be a fair amount of miscommunication in this
> thread.  ...

Actually, this isn't really a reply to you, but you have described
the issue pretty well.

> The best design doc that I know of is code. :-)
>
> It would be much easier to communicate using code snippets.
> I'd suggest pointing out places in the Python code that are lacking
> and how you would correct them.  That will make it easier for everyone
> to understand each other.

Yes.  That is easy.  However, what I already have part of, and was
proposing to do BEFORE going into details, was to generate a testing
version that shows how I think that it should be done.  Then people
could experiment with both the existing code and mine, to see the
differences.

But, in order to do that, I needed to find out the best way of going
about it ...

It wouldn't help with the red herrings, such as the reasons why it
is no longer possible to rely on hardware interrupts as a mechanism.
But they are only very indirectly relevant.

The REASON that I wanted to do that was precisely because I knew that
very few people would be deeply into arithmetic models, the details
of C89 and C99 (ESPECIALLY as the standard is incomplete :-( ), and
so having a sandbox before starting the debate would be a GREAT help.
It's much easier to believe things when you can try them yourself ...



"Facundo Batista" <[EMAIL PROTECTED]> wrote:
> 
> Well, so I'm completely lost... because, if all you want is to be able
> to chose a returned value or an exception raised, you actually can
> control that in Decimal.

Yes, but I have so far failed to get hold of a copy of the Decimal code!
I will have another go at subverting Subversion.  I should VERY much
like to get hold of those documents AND build a testing version of
the code - then I can go away, experiment, and come back with some more
directed comments (not mere generalities).



Aahz <[EMAIL PROTECTED]> wrote:
> 
> You can't expect us to do your legwork for you, and you can't expect
> that Tim Peters is the only person on the dev team who understands what
> you're getting at.

Well, see above for the former - I did post my intents in my first
message.  And, as for the latter, I have tried asking what I can
assume that people know - it is offensive and time-consuming and hence
counter-productive to start off assuming that your audience does not
have a basic background.

To repeat, it is precisely to address THAT issue that I wanted to build
a sandbox BEFORE going into details.  If people don't know the theory
in depth but are interested, they could experiment with the sandbox
and see what happens in practice.

> Incidentally, your posts will go directly to python-dev without
> moderation if you subscribe to the list, which is a Good Idea if you want
> to participate in discussion.

Er, you don't receive a mailing list at all if you don't subscribe!

If that is the intent, I will see if I can find how to subscribe in
the unmoderated fashion.  I didn't spot two methods on the Web pages
when I subscribed.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-24 Thread Nick Maclaren
To the moderator:  this is getting ridiculous.


[EMAIL PROTECTED] wrote:
> 
> > >Unfortunately, that doesn't help, because it is not where the issues
> > >are.  What I don't know is how much you know about numerical models,
> > >IEEE 754 in particular, and C99.  You weren't active on the SC22WG14
> > >reflector, but there were some lurkers.
>
> Hand wave, hand wave, hand wave.  Many of us here aren't stupid and have
> more than passing experience with numerical issues, even if we haven't been
> "active on SC22WG14".  Let's stop with the high-level pissing contest and
> lay out a clear technical description of exactly what has your knickers in a
> twist, how it hurts Python, and how we can all work together to make the
> pain go away.

SC22WG14 is the ISO committee that handles C standardisation.  One
of the reasons that the UK voted "no" was because the C99 standard
was seriously incomprehensible in many areas to anyone who had not
been active on the reflector.  If you think that I can summarise a
blazing row that went on for over 5 years and produced over a million
lines of technical argument alone in a "clear technical description",
you have an exaggerated idea of my abilities.

I have a good many documents that I could post, but they would not
help.  Some of them could be said to be "clear technical descriptions"
but most of them were written for other audiences, and assume those
audiences' backgrounds.  I recommend starting by reading the comments
in floatobject.c and mathmodule.c and then looking up the sections of
the C89 and C99 standards that are referenced by them.

> A good place to start: You mentioned earlier that there were some
> nonsensical things in floatobject.c.  Can you list some of the most serious
> of these?

Well, try the following for a start:

Python 2.4.2 (#1, May  2 2006, 08:28:01)
[GCC 4.1.0 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = "NaN"
>>> b = float(a)
>>> c = int(b)
>>> d = (b == b)
>>> print a, b, c, d
NaN nan 0 False

Python 2.3.3 (#1, Feb 18 2004, 11:58:04) 
[GCC 2.8.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> a = "NaN"
>>> b = float(a)
>>> c = int(b)
>>> d = (b == b)
>>> print a, b, c, d
NaN NaN 0 True

That demonstrates that the error state is lost by converting to int,
and that NaN testing isn't reliable.
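
For what it is worth, a bit-level test sidesteps both the unreliable
comparison and the platform-dependent string form.  A sketch, assuming
IEEE 754 double format (which both machines above use):

    import struct

    def is_nan(x):
        # Pack as a big-endian IEEE 754 double and inspect the bits:
        # a NaN has all exponent bits set and a non-zero fraction.
        bits = struct.unpack('>Q', struct.pack('>d', x))[0]
        exponent = (bits >> 52) & 0x7FF
        fraction = bits & ((1 << 52) - 1)
        return exponent == 0x7FF and fraction != 0

    x = 1.0e300
    x = (x * x) / (x * x)     # inf/inf: generates a quiet NaN
    print(is_nan(x))          # True, even where x == x misbehaves
    print(is_nan(1.0))        # False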


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-24 Thread Nick Maclaren
Michael Hudson <[EMAIL PROTECTED]> wrote:
>
> But, a floating point exception isn't a machine check interrupt, it's
> a program interrupt...

For reasons that I could go into, but are irrelevant, almost all
modern CPU architectures have only ONE interrupt mechanism, and use
it for both of those.  It is the job of the interrupt handler (i.e.
FLIH, first-level interrupt handler, usually in Assembler) to
classify those, get into the appropriate state and call the interrupt
handling code.

Now, this is a Bad Idea, but separating floating-point exceptions
from machine checks at the hardware level died with mainframes, as
far as I know.  The problem with the current approach is that it
makes it very hard for the operating system to allow the application
to handle the former.  And the problem with most modern operating
systems is that they don't even do what they could do at all well, because
THAT died with the mainframes, too :-(

The impact of all this mess on things like Python is that exception
handling is a nightmare area, especially when you bring in threading
(i.e. genuinely parallel threading across multiple cores, or hardware
threading within a single core).  Yes, I have brought a system down by
causing too many floating-point exceptions in all threads of a
highly parallel program on a large SMP ...

> See, that wasn't so hard!  We'd have saved a lot of heat and light if
> you'd said that at the start (and if you think you'd made it clear
> already: you hadn't).

I thought I had.  I accept your statement that I hadn't.  Sorry.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-24 Thread Nick Maclaren
"Tim Peters" <[EMAIL PROTECTED]> wrote:
>
> I suspect Nick spends way too much time reading standards ;-)

God help me, YES!  And in trying to get them improved.  Both of which
are very bad for my blood pressure :-(

My real interest is in portable, robust programming - I DON'T abuse
the terms to mean bitwise identical, but that is by the way - and I
delved in here trying to write a nifty little bit of just such code
as part of a course example.  BANG!!!  It failed in both respects on
the first two systems I tried on, and it wasn't my code that was wrong.

The killer is that standards are the nearest thing to a roadmap for
portability, especially when combined with robustness.  If you have
non-conforming code,
and it goes bananas, the compiler vendor will refuse to do anything, no
matter how clearly it is a bug in the compiler or library.  What is
worse is that there is an incentive for the leading vendors (see below)
to implement down to the standard, even when it is easier to do better.
And this is happening in this area.

> What he said is:
> 
> If you look at floatobject.c, you will find it solid with constructions
> that make limited sense in C99 but next to no sense in C89.
> 
> And, in fact, C89 truly defines almost nothing about floating-point
> semantics or pragmatics.  Nevertheless, if a thing "works" under gcc
> and under MS C, then "it works" for something like 99.9% of Python's
> users, and competitive pressures are huge for other compiler vendors
> to play along with those two.

Yup, though you mean gcc on an x86/AMD64/EM64T system, and 99.9% is a
rhetorical exaggeration - but one of the failures WAS on one of those! 

> I don't know what specifically Nick had in mind, and join the chorus
> asking for specifics.

That is why I wanted to:

   a) Read the decimal stuff and play around with the module
and:
   b) Write a sandbox and sort out my obvious errors
and:
   c) Write a PEP describing the issue and proposals

BEFORE going into details.  The devil is in the details, and I wanted
to leave him sleeping until I had lined up my howitzers ...

> I _expect_ he's got a keen eye for genuine
> coding bugs here, but also expect I'd consider many technically
> dubious bits of FP code to be fine under the "de facto standard"
> dodge.

Actually, I tried to explain that I don't have many objections to the
coding of the relevant files - whoever wrote them and I have a LOT of
common attitudes :-) And I have been strongly into de facto standards
for over 30 years, so am happy with them.  Yes, I have found a couple of
bugs, but not ones worth fixing (e.g. there is a use of x != x where
PyISNAN should be used, and a redundant test for an already excluded
case, but what the hell?)  My main objection is that they invoke C
behaviour in many places, and that is (a) mostly unspecified in C, (b)
numerically insane in C99 and (c) broken in practice.

> So, sure, everything we do is undefined, but, no, we don't really care
> :-)  If a non-trivial 100%-guaranteed-by-the-standard-to-work C
> program exists, I don't think I've seen it.

I can prove that none exists, though I would have to trawl over
SC22WG14 messages to prove it.  I spent a LONG time trying to get
"undefined" defined and used consistently (let alone sanely) in C, and
failed dismally.

> BTW, Nick, are you aware of Python's fpectl module?  That's
> user-contributed code that attempts to catch overflow, div-by-0, and
> invalid operation on 754 boxes and transform them  into raising a
> Python-level FloatingPointError exception.  Changes were made all over
> the place to try to support this at the time.  Every time you see a
> PyFPE_START_PROTECT or PyFPE_END_PROTECT macro in Python's C code,
> that's the system it's trying to play nicely with.  "Normally", those
> macros have empty expansions.

Aware of, yes.  Have looked at, no.  I have already beaten my head
against that area and already knew the issues.  I have even implemented
run-time systems that got it right, and that is NOT pretty.

> fpectl is no longer built by default, because repeated attempts failed
> to locate any but "ya, I played with it once, I think" users, and the
> masses of platform-specific #ifdef'ery in fpectlmodule.c were
> suffering fatal bit-rot.  No users + no maintainers means I expect
> it's likely that module will go away in the foreseeable future.  You'd
> probably hate its _approach_ to this anyway ;-)

Oh, yes, I know that problem.  You would be AMAZED at how many 'working'
programs blow up when I turn it on on systems that I manage - not
excluding Python itself (integer overflow) :-)  And, no, I don't hate
that approach, because it is one of the plausible ones; not good, but
what can ...

Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-24 Thread Nick Maclaren
"Tim Peters" <[EMAIL PROTECTED]> wrote:
> 
> > SC22WG14?  is that some marketing academy?  not a very good one, obviously.
> 
> That's because it's European ;-)

Er, please don't post ironic satire of that nature - many people will
believe it!

ISO is NOT European.  It is the International Standards Organisation,
of which ANSI is a member.  And, for reasons and with consequences that
are only peripherally relevant, SC22WG14 has always been dominated by
ANSI.  In fact, C89 was standardised by ANSI (sic), acting as an agent
for ISO.  C99 was standardised by ISO directly, but for various reasons
only some of which I know, was even more ANSI-dominated than C89.

Note that I am NOT saying "big bad ANSI", as a large proportion of that
was and is the ghastly support provided by many countries to their
national standards bodies.  The UK not excepted.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-24 Thread Nick Maclaren
"Terry Reedy" <[EMAIL PROTECTED]> wrote:
>
> Of interest among their C-EPs is one for adding the equivalent of our 
> decimal module
> http://www.open-std.org/jtc1/sc22/wg14/www/projects#24732 

IBM is mounting a major campaign to get its general decimal arithmetic
standardised as THE standard form of arithmetic.  There is a similar
(more advanced) move in C++, and they are working on Fortran.  I assume
that Cobol is already on board, and there may be others.

There is nothing underhand about this - IBM is quite open about it,
and I believe that they are making all critical technologies freely
available.  The design has been thought out and is at least half-sane -
which makes it among the best 1-2% of IT technologies :-(

Personally, I think that it is overkill, because it is a MASSIVELY
complex solution, and will make sense only where at least two of
implementation cost, performance, power usage and CPU/memory size are
not constraints.  E.g. mainframes, heavyweight commercial codes etc.
but definitely NOT massive parallelism, very low power computing,
micro-miniaturisation and so on.  IEEE 754 was bad (which is why it is
so often implemented only in part), but this is MUCH worse.  God alone
knows whether IBM will manage to move the whole of IT design - they
have done it before, and have failed before (often after having got
further than this).

Now, whether that makes it a good match for Python is something that
is clearly fruitful grounds for debate :-)


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-24 Thread Nick Maclaren
"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> 
> > The conventional interpretation was that any operation that
> > was not mathematically continuous in an open region including its
> > argument values (in the relevant domain) was an error, and that all
> > such errors should be flagged.  That is what I am talking about.
> 
> Not a bad goal, but not worth sweating over, since it isn't
> sufficient.  It still allows functions whose continuity does not
> extend to the next possible floating point approximation, or functions
> whose value, while continuous, changes "too much" in that region.

Oh, yes, quite.  But I wasn't describing something that needed effort;
I was merely describing the criterion that was traditionally used (and
still is, see below).  There is also the Principle of Least Surprise:
the behaviour of a language should be at least explicable to mere
mortals (a.k.a. ordinary programmers) - one that says "whatever the
designing committee thought good at the time" is a software engineering
disaster.

> For some uses, it is more important to be consistent with established
> practice than to be as correct as possible.  If the issues are still
> problems, and can't be solved in languages like java, then ... the
> people who want "correct" behavior will be a tiny minority, and it
> makes sense to have them use a 3rd-party extension.

I don't think that you understand the situation.

I was and am describing established practice, as used by the numeric
programmers who care about getting reliable answers - most of those
still use Fortran, for good and sufficient reasons.  There are two
other established practices:

Floating-point is a figment of your imagination - don't support it.

Yeah.  Right.  Whatever.  It's only approximate, so who gives a
damn what it does?

Mine is the approach taken by the Fortran, C and C++ standards
and many Fortran implementations, but the established practice in
highly optimised Fortran and most C is the last.  Now, Java (to
some extent) and C99 introduced something that attempts to eliminate
errors by defining what they do (more-or-less arbitrarily); much as
if Python said that, if a list or dictionary entry wasn't found, it
would create one and return None.  But that is definitely NOT
established practice, despite the fact that its proponents claim it
is.  Even IEEE 754 (as specified) has never reached established
practice at the language level.
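
To make that analogy concrete, in current Python:

    d = {}
    try:
        d['missing']                 # what Python actually does: diagnose
    except KeyError:
        print("lookup failure is flagged, not papered over")

    value = d.setdefault('missing', None)  # the C99-flavoured alternative:
    print(value)                           # quietly manufacture a result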

The very first C99 Annex F implementation that I came across appeared
in 2005 (Sun One Studio 9 under Solaris 10 - BOTH are needed); I have
heard rumours that HP-UX may have one, but neither AIX nor Linux does
(even now).  I have heard rumours that the latest Intel compiler may be
C99 Annex F, but don't use it myself, and I haven't heard anything
reliable either way for Microsoft.  What is more, many of the tender
documents for systems bought for numeric programming in 2005 said
explicitly that they wanted C89, not C99 - none asked for C99 Annex F
that I saw.  No, C99 Annex F is NOT established practice and, God
willing, never will be.

> > For example, consider conversion between float
> > and long - which class should control the semantics?
> 
> The current python approach with binary fp is to inherit from C
> (consistency with established practice).  The current python approach
> for Decimal (or custom classes) is to refuse to guess in the first
> place; people need to make an explicit conversion.  How is this a
> problem?

See above re C established practice.

The above is not my point.  I am talking about the generic problem
where class A says that overflow should raise an exception, class B
says that it should return infinity and class C says nothing.  What
should c = a*b do on overflow, for instances a of A and b of B?
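
A concrete rendering of that quandary, with hypothetical classes (the
policies are the point, not the arithmetic):

    class A(object):                 # policy: overflow raises
        def __mul__(self, other):
            raise OverflowError("class A forbids overflow")

    class B(object):                 # policy: overflow returns infinity
        def __mul__(self, other):
            return float('inf')
        __rmul__ = __mul__

    print(B() * B())                 # inf - B's policy applies
    try:
        A() * B()                    # A.__mul__ runs first, so A wins
    except OverflowError:
        print("A's policy applied")
    print(B() * A())                 # inf - B wins this time: the outcome
                                     # depends on operand order, which is
                                     # exactly the ambiguity in question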

> [ Threading and interrupts ]

No, that is a functionality issue, but the details are too horrible
to go into here.  Python can do next to nothing about them, except
to distrust them - just as it already does.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Numerical robustness, IEEE etc.

2006-06-25 Thread Nick Maclaren
[EMAIL PROTECTED] wrote:
> 
> I'm not asking you to describe SC22WG14 or post detailed technical summaries
> of the long and painful road.  I'd like you to post things directly relevant
> to Python with footnotes to necessary references.  It is then incumbent on
> those that wish to respond to your post to familiarize themselves with the
> relevant background material.  However, it is really darn hard to do that
> when we don't know what you're trying to fix in Python.  The examples you
> show below are a good start in that direction.

Er, no.  Given your response, it has merely started off a hare.  The
issues you raise are merely ones of DETAIL, and I was and am trying
to tackle the PRINCIPLE (a.k.a. design).

I originally stated my objective, and asked for information so that I
could investigate in depth and produce (in some order) a sandbox and
a PEP.  That is still my plan.

This example was NOT of problems with the existing implementation,
but was to show how even the most basic numeric code that attempts to
handle errors cannot avoid tripping over the issues.  I shall respond
to your points, but shall try to refrain from following up.

> 1) The string representation of NaN is not standardized across platforms

Try what I actually used:

x = 1.0e300
x = (x*x)/(x*x)

I converted that to float('NaN') to avoid confusing people.  There
are actually many issues around the representation of NaNs, including
whether signalling NaNs should be separated from quiet NaNs and whether
they should be allowed to have values.  See IEEE 754, IEEE 754R and
C99 for more details (but not clarification).

> 2) on a sane platform, int(float('NaN')) should raise an ValueError
> exception for the int() portion.

Well, I agree with you, but Java and many of the C99 people don't.

> 3) float('NaN') == float('NaN') should be false, assuming NaN is not a
> signaling NaN, by default

Why?  Why should it not raise ValueError?  See table 4 in IEEE 754.
I could go into this one in much more depth, but let's not, at least
not now.

> So the open question is how to both define the semantics of Python floating
> point operations and to implement them in a way that verifiably works on the
> vast majority of platforms without turning the code into a maze of
> platform-specific defines, kludges, or maintenance problems waiting to
> happen.

Well, in a sense, but the second is really a non-question - i.e. it
answers itself almost trivially once the first is settled.  ALL of your
above points fall into that category.  The first question to answer is
what the fundamental model should be, and I need to investigate in
more depth before commenting on that - which should tell you roughly
what I know and what I don't about the decimal model.

The best way to get a really ghastly specification is to decide on
the details before agreeing on the intent.  Committees being what they
are, that is a recipe for something that nobody else will ever get
their heads around.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


[Python-Dev] Python memory model (low level)

2006-06-30 Thread Nick Maclaren
I have been thinking about software floating point, and there are
some aspects of Python and decimal that puzzle me.  Basically, they
are things that are wanted for this sort of thing and seem to be
done in very contorted ways, so I may have missed something.

Firstly, can Python C code assume no COMPACTING garbage collector,
or should it allow for things shifting under its feet?

Secondly, is there any documentation on the constraints and necessary
ritual when allocating chunks of raw data and/or types of variable
size?  Decimal avoids the latter.

Thirdly, I can't find an efficient way for object-mangling code to
access class data and/or have some raw data attached to a class (as
distinct from an instance).

Fourthly, can I assume that no instance of a class will remain active
AFTER the class disappears?  This would mean that it could use a
pointer to class-level raw data.

I can explain why all of those are the 'right' way to approach the
problem, at an abstract level, but it is quite possible that Python
does not support the abstract model of class implementation that I
am thinking of.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Python memory model (low level)

2006-06-30 Thread Nick Maclaren
Aahz <[EMAIL PROTECTED]> wrote:
> 
> Without answering your specific questions, keep in mind that Python and
> Python-C code are very different things.  The current Decimal
> implementation was designed to be *readable* and efficient *Python* code.
> For a look at what the Python-C implementation of Decimal might look
> closer to, take a look at the Python long implementation.

Er, perhaps I should have said explicitly that I was looking at the
Decimal-in-C code and not the Python.  Most of my questions don't
make any sense at the Python level.

But you have a good point.  The long code will be both simpler and
have had a LOT more work done on it - but it will address only the
variable-size object issue, as it doesn't need class-level data
in the same way as Decimal and I do.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Python memory model (low level)

2006-06-30 Thread Nick Maclaren
"Tim Peters" <[EMAIL PROTECTED]> wrote:

[ Many useful answers ]

Thanks very much!  That helps.  Here are a few points where we are at
cross-purposes.

I am talking about the C level.  What I am thinking of is the standard
method of implementing the complicated housekeeping of a class (e.g.
inheritance) in Python, and the basic operations in C (for efficiency).
The model that I would like to stick to is that the Python layer never
knows about the actual object implementation, and the C never knows
about the housekeeping.

The housekeeping would include the class derivation, which would (inter
alia) fix the size of a number.  The C code would need to allocate
some space to store various constants and workspace, shared between
all instances of the derived class.  This would be accessible from the
object it returns.

Each instance would be of a length specified by its derivation (i.e.
like Decimal), but would be constant for all members of the class
(i.e. not like long).  So it would be most similar to tuple in that
respect.

Operations like addition would copy the pointer to the class data
from the arguments, and ones like creation would need to be passed
the appropriate class and whatever input data they need.

I believe that, using the above approach, it would be possible to
achieve good efficiency with very little C - certainly, it has worked
in other languages.
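
In pure Python, the model can be sketched as follows - FixedDec and
Dec6 are made-up illustrations, not the decimal module, and the
"addition" is a toy that ignores carries:

    class FixedDec(object):
        digits = None                # fixed by the derivation
        _classdata = None            # shared constants and workspace

        def __init__(self, coefficients):
            cls = type(self)
            if cls._classdata is None:
                # Set up once per derived class and shared by all its
                # instances - the analogue of the raw class-level data.
                cls._classdata = {'max_coeff': 10 ** cls.digits - 1}
            if len(coefficients) != cls.digits:
                raise ValueError("%s holds exactly %d digits"
                                 % (cls.__name__, cls.digits))
            self.coefficients = tuple(coefficients)  # fixed size

        def __add__(self, other):
            # An operation just picks up the class data from its
            # arguments; it never redoes the housekeeping.
            cls = type(self)
            pairs = zip(self.coefficients, other.coefficients)
            return cls([(a + b) % 10 for a, b in pairs])

    class Dec6(FixedDec):            # the derivation fixes the size
        digits = 6

    x = Dec6([1, 2, 3, 4, 5, 6])
    y = Dec6([9, 9, 9, 9, 9, 9])
    print((x + y).coefficients)      # -> (0, 1, 2, 3, 4, 5)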


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


[Python-Dev] Another 2.5 bug candidate?

2006-07-02 Thread Nick Maclaren
"A.M. Kuchling" <[EMAIL PROTECTED]> wrote:
> 
> http://www.python.org/sf/1488934 argues that Python's use of fwrite()
> has incorrect error checking; this most affects file.write(), but
> there are other uses of fwrite() in the core.  It seems fwrite() can
> return N bytes written even if an error occurred, and the code needs
> to also check ferror(f->fp).
> 
> At the last sprint I tried to assemble a small test case to exhibit
> the problem but failed.  The reporter's test case uses SSH, and I did
> verify that Python does loop infinitely if executed under SSH, but a
> test case would need to work without SSH.
> 
> Should this be fixed in 2.5?  I'm nervous about such a change to error
> handling without a test case to add; maybe it'll cause problems on one
> of our platforms.

So would assembling a test case.  NOTHING that isn't classed as
undefined behaviour will cause ferror to return true, and therefore
any such test case may fail on some platforms.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Handling of sys.args (Re: User's complaints)

2006-07-13 Thread Nick Maclaren
On systems that are not Unix-derived (which, nowadays, are rare),
there is commonly no such thing as a program name in the first place.
It is possible to get into that state on some Unices - i.e. ones which
have a form of exec that takes a file descriptor, inode number or
whatever.

This is another argument for separating off argv[0] and allowing the
program name to be None if there IS no program name.
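
A hypothetical sketch of the proposed shape (sys.argv is real, but the
program/args split below is invented for illustration and is not an
existing or agreed API):

    import sys

    # Today the split has to be emulated by slicing sys.argv by hand;
    # the proposal would supply the two parts directly, with the
    # program name allowed to be None.
    program = (sys.argv[0] or None) if sys.argv else None
    args = sys.argv[1:]

    if program is None:
        print("no program name (fd/inode exec, embedded interpreter, ...)")
    else:
        print("program: %s" % program)
    print("arguments: %r" % (args,))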


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Handling of sys.args (Re: User's complaints)

2006-07-14 Thread Nick Maclaren
Greg Ewing <[EMAIL PROTECTED]> wrote:
> 
> > On systems that are not Unix-derived (which, nowadays, are rare),
> > there is commonly no such thing as a program name in the first place.
> > It is possible to get into that state on some Unices - i.e. ones which
> > have a form of exec that takes a file descriptor, inode number or
> > whatever.
> 
> I don't think that applies to the Python args[] though,
> since its args[0] isn't the path of the OS-level
> executable, it's the path of the main Python script.

Oh, yes, it does!  The file descriptor or inode number could refer to
the script just as well as it could to the interpreter binary.

> But you could still end up without one, if the main
> script comes from somewhere other than a file.

I didn't want to include that, to avoid confusing people who haven't
used systems with such features.  Several systems have had the ability
to exec to a memory segment, for example.  But, yes.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Handling of sys.args (Re: User's complaints)

2006-07-15 Thread Nick Maclaren
"Guido van Rossum" <[EMAIL PROTECTED]> wrote:
>
> OK, then I propose that we wait to see which things you end up having
> to provide to sandboxed code, rather than trying to analyze it to
> death in abstracto.

However, the ORIGINAL proposal in this thread (to split off argv[0]
and/or make that and the arguments read-only) is entirely different.
That is purely a matter of convenience, cleanliness of specification
or whatever you call it.  I can't imagine any good reason to separate
argv[0] from argv[1:] by a sandbox (either way).


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Strategy for converting the decimal module to C

2006-07-19 Thread Nick Maclaren
Georg Brandl <[EMAIL PROTECTED]> wrote:
>
> > Even then, we need to drop the concept of having the flags as counters
> > rather than booleans.
>
> Yes. Given that even Tim couldn't imagine a use case for counting the
> exceptions, I think it's sensible.

Well, I can.  There is a traditional, important use - tuning.

When such arithmetic is implemented in hardware, it is normal for
exceptional cases to be handled by interrupt, and that is VERY
expensive - often 100-1,000 times the cost of a single operation,
occasionally 10,000 times.  It then becomes important to know how
many of the things you got, to know whether it is worth putting
code in to avoid them or even using a different algorithm.

Now, it is perfectly correct to say that this does not apply to
emulated arithmetic and that there is no justification for such
ghastly implementations.  But, regrettably, almost all exception
handling on modern systems IS ghastly - at least by the standards
of the 1960s.

Whether you regard as important the use of Python for tuning code
that is to be run using hardware, where the arithmetic will be
performance-critical, is a matter of taste.  I don't :-)
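
For what it is worth, such a count can be emulated at the Python level
even with boolean flags, by clearing and sampling them around each
operation - a sketch using the real decimal API (getcontext,
clear_flags, Context.divide), with precisely the per-operation
bookkeeping cost that true counters would avoid:

    from decimal import Decimal, Inexact, Overflow, getcontext

    ctx = getcontext()
    counts = {Inexact: 0, Overflow: 0}

    def counted(op, *args):
        ctx.clear_flags()            # reset the boolean flags
        result = op(*args)
        for signal in counts:
            if ctx.flags[signal]:    # did this operation raise the flag?
                counts[signal] += 1
        return result

    for i in range(1000):
        counted(ctx.divide, Decimal(1), Decimal(3))  # inexact each time

    print("%d inexact, %d overflow" % (counts[Inexact], counts[Overflow]))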


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Strategy for converting the decimal module to C

2006-07-20 Thread Nick Maclaren
Greg Ewing <[EMAIL PROTECTED]> wrote:
>
> But couldn't you just put in an interrupt handler that
> counts the interrupts, for the purpose of measurement?

No, but the reasons are very arcane.

The general reason is that taking an interrupt, running a handler and
returning is not transparent, and is often not possible on modern
systems.  If that problem is at the hardware level (as on the Alpha
and 8086/7), you are stuffed.  But, more often, it is due to the fact
that the architecture means that such handling can only be done at the
maximally privileged level.

Now, interrupting into that level has to be transparent, in order to
support TLB misses, clock interrupts, device interrupts, machine-check
interrupts and so on.  But the kernels rarely support transparent
callbacks from that state into user code (though they used to); it is
actually VERY hard to do, and even the mainframes had problems.  This
very commonly means that such counting breaks other facilities, unless
it is done IN the privileged code.

Of course, a GOOD hardware architecture wouldn't leave the process
state at all when it gets a floating-point interrupt, but would just
invoke an asynchronous routine call.  That used to be done.

As I said, none of this is directly relevant to emulated implementations,
such as the current Python ones, but it IS relevant to the design of an
arithmetic specification.  It could become relevant if Python wants to
start to use a hardware implementation, because your proposal would mean
that it would have to try to ensure that such callbacks are transparent.

As one of the few people still working who has extensive experience
with doing that, I can assure you that it is an order of magnitude
fouler than you can imagine.  A decimal order of magnitude :-(

But note that I wasn't saying that such things should be put into the
API, merely that there is a very good reason to do so for hardware
implementations and ones used to tune code for such implementations.
Personally, I wouldn't bother.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] new security doc using object-capabilities

2006-07-20 Thread Nick Maclaren
"Giovanni Bajo" <[EMAIL PROTECTED]> wrote:
> 
> This recipe for safe_eval:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496746
> which is otherwise very cute, does not handle this case as well: it tries to
> catch and interrupt long-running operations through a secondary thread, but
> fails on a single long operation because the GIL is not released and the
> alarm thread does not get its chance to run.

Grin :-)

You have put your finger on the Great Myth of such virtualisations,
which applies to the system-level ones and even to the hardware-level
ones.  In practice, there is always some request that a sandbox can
make to the hypervisor that can lock out or otherwise affect other
sandboxes.
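
The failure mode is easy to demonstrate without the recipe.  A sketch
(CPython-specific; _thread is the Python 3 name of the old thread
module; depending on machine speed the multiplication may finish before
the timer fires, but the interrupt can never land in the middle of it):

    import _thread
    import threading
    import time

    def watchdog():
        # Runs in a second thread and asks the main thread to raise
        # KeyboardInterrupt - which CPython delivers only BETWEEN
        # bytecodes, so one long-running operation cannot be stopped.
        _thread.interrupt_main()

    threading.Timer(0.2, watchdog).start()
    start = time.time()
    try:
        x = 10 ** 3000000 * 10 ** 3000000   # a single huge operation
        print("finished uninterrupted in %.1fs" % (time.time() - start))
    except KeyboardInterrupt:
        print("interrupt delivered only after %.1fs of work"
              % (time.time() - start))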

The key is, of course, to admit that and to specify what is and is
not properly virtualised, so that the consequences can at least be
analysed.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Strategy for converting the decimal module to C

2006-07-21 Thread Nick Maclaren
Greg Ewing <[EMAIL PROTECTED]> wrote:
> 
> > Now, interrupting into that level has to be transparent, in order to
> > support TLB misses, clock interrupts, device interrupts, machine-check
> > interrupts and so on.
> 
> I thought we were just talking about counting the number
> of floating point exceptions that a particular piece of
> code generates. Surely that's deterministic, and isn't

Er, no.  Rather fundamentally, on two grounds.  Please bear with me, as
this IS relevant to Python.  See the summary at the end if you like :-)

The first is that such things are NOT deterministic, not even on simple
CPUs - take a look at the Alpha architecture for an example, and then
follow it up with the IA64 one if you have the stomach for it.  But
that wasn't my main point.

It is that modern CPUs have a SINGLE interrupt mechanism (a mistake in
itself, but they do), so a CPU may be interrupted when it is running
a device driver, another kernel thread or a system call as much as
when running an application.  In fact, to some extent, interrupt handlers
can themselves be interrupted (let's skip the details).

Now, in order to allow the application to run its handler, the state
has to be saved, sanitised and converted back to application context;
and conversely on return.  That is hairy, and is why it is not possible
to handle interrupts generated within system calls on many systems.
But that is not directly Python's problem.

What is, is that the code gets interrupted at an unpredictable place,
and the registers and other state may not be consistent as far as the language
run-time system and Python are concerned.  It is critical (a) that a sane
state is restored before calling the handler and (b) that calling the
handler neither relies on nor disturbs any of the "in flight" actions
in the interrupted code.

To cut a long story short, it is impractical for a language run-time
system to call user-defined handlers with any degree of reliability
unless the compiled code and run-time interoperate carefully - I have
been there and done that many times, but few people still working have.
On architectures with out-of-order execution (and interrupts), you
have to assume that an interrupt may occur anywhere, even when the
code does not use the relevant facility.  Floating-point overflow
in the middle of a list insertion?  That's to be expected.

It becomes considerably easier if the (run-time system) interrupt
handler merely needs to flag or count interrupts, as it can use a
minimal handler which is defensive and non-intrusive.  Even that is
a pretty fair nightmare, as many systems temporarily corrupt critical
registers when they think that it is safe.  And few think of interrupts
when deciding that ...

So, in summary, please DON'T produce a design that relies on trapping
floating-point exceptions and passing control to a Python function.
This is several times harder than implementing fpectl.
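
For contrast, here is a minimal sketch of the flag-and-count style that
IS tractable, written at the Python level purely for illustration (the
real handler would live in C, as fpectl's does):

import signal

tripped = 0    # inspected, and cleared, later at a safe point

def minimal_handler(signum, frame):
    # Defensive and non-intrusive: touch one counter and return.
    # No I/O, no allocation, no calls back into the run-time.
    global tripped
    tripped += 1

signal.signal(signal.SIGFPE, minimal_handler)

Even that flatters the situation, of course: CPython defers Python-level
handlers to a safe point itself, which is exactly the machinery in
question here.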


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Strategy for converting the decimal module to C

2006-07-24 Thread Nick Maclaren
James Y Knight <[EMAIL PROTECTED]> wrote:
> 
> > To cut a long story short, it is impractical for a language run-time
> > system to call user-defined handlers with any degree of reliability
> > unless the compiled code and run-time interoperate carefully - I have
> > been there and done that many times, but few people still working  
> > have.
> > On architectures with out-of-order execution (and interrupts), you
> > have to assume that an interrupt may occur anywhere, even when the
> > code does not use the relevant facility.  Floating-point overflow
> > in the middle of a list insertion?  That's to be expected.
> 
> While this _is_ a real problem, is it _not_ a general problem as you  
> are describing it. Processors are perfectly capable of generating  
> precise interrupts, and the inability to do so has nothing to do with  
> the out-of-order execution, etc. Almost all interrupts are precise.  

I am sorry, but this is almost totally wrong, though I agree that you
will get that impression upon reading the architecture books unless
you are very deeply into that area.

Let's skip the hardware issues, as they aren't what I am talking about
(though see [*]).  I am referring to the interaction between the
compiled code, deep library functions and run-time interrupt handler.

It is almost universal for some deep library functions and common for
compiled code to leave data structures inconsistent in a short window
that "cannot possibly fail" - indeed, most system interfaces do this
around system calls.  If an interrupt occurs then, the run-time system
will receive control with those data structures in a state where they
must not be accessed.  And it is fairly common for such data structures
to include ones critical to the functioning of the run-time system.

Now, it IS possible to write run-time systems that are safe against
this, and still allow asynchronous interrupts, but I am one of three
people in the world that I know have done it in the past two decades.
There may be as many as six, but I doubt more, and I know of no such
implementation on any Unix or Microsoft system.  It is even possible
to do this for compiled code, but that is where the coordination between
the compiler and run-time system comes in.

> The only interesting one which is not, on x86 processors, is the x87  
> floating point exception, ...

Er, no.  Try a machine-check in a TLB miss handler.  But it is all
pretty irrelevant, as the problem arises with asynchronous exceptions
(e.g. timer interrupts, signals from other processes), anyway.

> Also, looking forward, the "simd" floating point instructions (ie mmx/ 
> sse/sse2/sse3) _do_ ...

The critical problems with the x87 floating-point exception were
resolved in the 80386.


[*]  Whether or not it is a fundamental problem, it is very much
a general problem at present, and it will become more so as more CPUs
implement micro-threading.  For why it is tied up with out-of-order
execution etc., consider a system with 100 operations flying, of which
10 are memory accesses, and then consider what happens when you have
combinations of floating-point exceptions, TLB misses, machine-checks
(e.g. ECC problems on memory) and device/timer interrupts.  Once you
add user-defined handlers into that mix, you either start exposing
that mess to the program or have to implement them by stopping the
CPU, unwinding the pipeline, and rerunning in very, very serial mode
until the handler is called.  Look at IA64 ...


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Strategy for converting the decimal module to C

2006-07-25 Thread Nick Maclaren
Greg Ewing <[EMAIL PROTECTED]> wrote:
> 
> But we weren't talking about asynchronous exceptions,
> we were talking about floating point exceptions. Unless
> your TLB miss handler uses floating point arithmetic,
> there's no way it can get interrupted by one. (And if
> it does use floating point arithmetic in a way that
> can cause an exception, you'd better write it to deal
> with that!)

I am really not getting my message across, am I?

Yes, that is true - as far as it goes.  The trouble is that designing
systems on the assumption that it IS true as far as it goes means that
they don't work when it goes further.  And it does.  Here are a FEW of the
many examples of where the simplistic model is likely to fail in an
x86 context:

The compiled code has made a data structure temporarily inconsistent
because the operation is safe (say, list insertion), and then gets an
asynchronous interrupt (e.g. SIGINT).  The SIGINT handler does some
operation (e.g. I/O) that implicitly uses floating-point, which then
interrupts.

The x86 architecture is extended to include out-of-order floating-point,
as it had in the past, as many systems have today, and as is very likely
to happen in the future.  It is one of the standard ways to get better
performance, after all, and is on the increase.

The x86 architecture is extended to support micro-threading.  I have
not been told by Intel or AMD that either have such plans, but I have
very good reason to believe that both have such projects.  IBM and Sun
certainly do, though I don't know if IBM's is/are relevant.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rounding float to int directly ...

2006-08-01 Thread Nick Maclaren
"M.-A. Lemburg" <[EMAIL PROTECTED]> wrote:
>
> You often have a need for controlled rounding when doing
> financial calculations or in situations where you want to
> compare two floats with a given accuracy, e.g. to work
> around rounding problems ;-)

The latter is a crude hack, and was traditionally used to save cycles
when floating-point division was very slow.  There are better ways,
and have been for decades.

> Float formatting is an entirely different issue.

Not really.  You need controlled rounding to a fixed precision in
the other base.  But I agree that controlled rounding in binary
does not help with controlled rounding in decimal.
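
The familiar illustration, as an interactive transcript (output as
printed by a modern Python; decimal's quantize does the controlled
decimal rounding that binary floats cannot):

>>> round(2.675, 2)        # 2.675 has no exact binary representation
2.67
>>> from decimal import Decimal
>>> Decimal("2.675").quantize(Decimal("0.01"))
Decimal('2.68')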


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rounding float to int directly ...

2006-08-01 Thread Nick Maclaren
Aahz <[EMAIL PROTECTED]> wrote:
> On Tue, Aug 01, 2006, M.-A. Lemburg wrote:
> >
> > You often have a need for controlled rounding when doing financial
> > calculations or in situations where you want to compare two floats
> > with a given accuracy, e.g. to work around rounding problems ;-)
> >
> > The usual approach is to use full float accuracy throughout the
> > calculation and then apply rounding a certain key places.
> 
> That's what Decimal() is for.

Well, maybe.  There are other approaches, too, and Decimal has its
problems with that.  In particular, when people need precisely
defined decimal rounding, they ALWAYS need fixed-point and not
floating-point.

> (Note that I don't care all that much about round(), but I do think we
> want to canonicalize Decimal() for all purposes in standard Python where
> people care about accuracy.  Additional float features can go into
> NumPy.)

Really?  That sounds like dogma, not science.

Decimal doesn't even help people who care about accuracy.  At most
(and with the reservation mentioned above), it means that you can
map external decimal formats to internal ones without loss of
precision.  Not a big deal, as there ARE no requirements for doing
that for floating-point, and there are plenty of other solutions for
fixed-point.

People who care about the accuracy of calculations prefer binary,
as it is a more accurate model.  That isn't a big deal, either.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rounding float to int directly ...

2006-08-01 Thread Nick Maclaren
Raymond Hettinger <[EMAIL PROTECTED]> wrote:
>
> Hogwash.  The only issues with decimal are ease-of-use and speed.

I suggest that you get hold of a good 1960s or 1970s book on computer
arithmetic, and read up about "wobbling precision".  While it is not
a big deal, it was regarded as such, and is important enough to cause
significant numerical problems to the unwary - which means 99.99% of
modern programmers :-(

And, as I am sure that Aahz could point out, there are significant
reliability issues concerned with frequent base changes where any loss
of precision is unacceptable.  Yes, it can always be done, but only a
few people are likely to do it correctly in all cases.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rounding float to int directly ...

2006-08-02 Thread Nick Maclaren
Greg Ewing <[EMAIL PROTECTED]> wrote:
>
> You should NOT be using binary floats for money
> in the first place.

Or floating-point at all, actually.  But binary floating-point is
definitely unsuited for such a use.

> Pseudo-rounding to decimal places is not
> the right way to do that. The right way is
> to compare the difference to a tolerance.

Right.  Where the tolerance should be a combination of relative
and absolute accuracy.  1.0e-300 should usually be 'similar' to 0.0.
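
A sketch of that combined test (the tolerances are illustrative; much
later Pythons grew math.isclose with the same shape):

def similar(a, b, rel_tol=1.0e-9, abs_tol=1.0e-12):
    # The relative term scales with the operands; the absolute term
    # is what makes 1.0e-300 compare 'similar' to 0.0.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)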


Simon Burton <[EMAIL PROTECTED]> wrote:
>
> It's not even clear to me that int(round(x)) is always the
> nearest integer to x.

There is a sense in which that is true: either you get the nearest
integer, or overflow occurs.

> Is it always true that float(some_int)>=some_int ?  (for positive values).
>
> (ie. I am wondering if truncating the float representation
> of an int always gives back the original int).

No.

Consider 'standard' Python representations on a 64-bit system.  There
are only 53 bits in the mantissa, but an integer can have up to 63.
Very large integers need to be rounded, and can be rounded up or down.


Please note that I am not arguing against an int_rounded() function.
There is as much reason to want one as an int_truncated() one, but
there is no very good reason to want more than one of the two.
int_expanded() [i.e. ceiling] is much less useful.

For people interested in historical trivia, the dominance of the
truncating form of integer conversion over the rounding form seems to
be yet another side-effect of the Fortran / IBM 370 dominance over
the Algol / other hardware, despite the fact that most modern
languages are rooted in CPL rather than Fortran.  I am unaware of
any technical grounds to prefer one over the other (i.e. the reasons
for wanting each are equally balanced).

It all comes down to the simple question "Do we regard a single
primitive for int(round()) as important enough to provide?"

I abstain :-)


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rounding float to int directly ...

2006-08-02 Thread Nick Maclaren
Michael Chermside <[EMAIL PROTECTED]> wrote:
>
> > Decimal doesn't even help people who care about accuracy.
>
> Not true! The float class is incapable of maintaining 700 digits of
> precision, but Decimal handles it just fine. (Why you would WANT
> more than around 53 bits of precision is a different question, but
> Decimal handles it.)

Oh, yes, the CURRENT decimal class is potentially more accurate than
the CURRENT floating class, but that has nothing to do with the
intrinsic differences in the base.

> > People who care about the accuracy of calculations prefer binary,
> > as it is a more accurate model.
> 
> Huh? That doesn't even make sense! A model is not inherently accurate
> or inaccurate, it is only an accurate or inaccurate representation
> of some "real" system. Neither binary nor decimal is a better
> representation of either rational or real numbers, the first
> candidates for "real" system I thought of. Financial accounting rules
> tend to be based on paper-and-pencil calculations for which
> decimal is usually a far better representation.
> 
> If you had said that binary floats squeeze out more digits of
> precision per bit of storage than decimal floats, or that binary
> floats are faster because they are supported by specialized hardware,
> then I'd go along, but they're not a "better model".

No, that isn't true.  The "wobbling precision" effect may be overstated,
but is real, and gets worse the larger the base is.  To the best of my
knowledge, that is almost the only way in which binary is more accurate
than decimal, in absolute terms, and it is a marginal difference.  Note
that I said "prefer", not "require" :-)

For example, calculating the relative difference between two close
numbers is sensitive to whether you are using the numbers in their
normal or inverse forms (by a factor of N in base N), and this is a
common cause of incorrect answers.  A factor of 2 is better than one
of 10.
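
As a small illustration of the wobble, using the decimal module and
assuming a working precision of 3 significant digits:

from decimal import Decimal

# The ulp is 0.01 across the whole decade [1.00, 9.99], so the
# RELATIVE spacing swings by nearly a factor of the base:
for x in (Decimal("1.00"), Decimal("9.99")):
    print(Decimal("0.01") / x)    # ~1e-2 at the bottom, ~1e-3 at the top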


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rounding float to int directly ...

2006-08-03 Thread Nick Maclaren
James Y Knight <[EMAIL PROTECTED]> wrote:
>
> I'd be happy to see floats lose their __int__ method entirely,  
> replaced by an explicit truncate function.

Come back Algol - all is forgiven :-)  Yes, indeed.  I have favoured
that view for 35 years - anything that can lose information quietly
should be explicit.


[EMAIL PROTECTED] (Christian Tanzer) wrote:
> Greg Ewing <[EMAIL PROTECTED]> wrote:
> 
> > What's the feeling about this? If, e.g. int()
> > were changed in Py3k to round instead of truncate,
> > would it cause anyone substantial pain?
> 
> Gratuitous breakage!
> 
> I shudder at the thought of checking hundreds of int-calls to see if
> they'd still be correct under such a change.

My experience of doing that when compilers sometimes did one and sometimes
the other is that such breakages are rarer than the conversions to integer
that are broken with both rules!  And both are rarer than the code that
works with either rule.

However, a 5% breakage rate is still enough to be of concern.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rounding float to int directly ...

2006-08-03 Thread Nick Maclaren
Ka-Ping Yee <[EMAIL PROTECTED]> wrote:
>
> That's my experience as well.  In my opinion, the purpose of round()
> is most commonly described as "to make an integer".  So it should
> yield an integer.

Grrk.  No, that logic is flawed.

There are algorithms where the operation of rounding (or truncation)
is needed, but where the value may be larger than can be held in an
integer, and that is not an error.  If the only rounding or truncation
primitive converts to an integer, those algorithms are unimplementable.
You need at least one primitive that converts a float to an integer,
held as a float.
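
A sketch of such a primitive, assuming the Python 2 convention that
math.floor and math.ceil return floats (and ignoring the usual
half-way subtleties):

import math

def round_to_integral(x):
    # Round half away from zero; the result stays a float, so a
    # value like 1.0e300 never passes through an integer type.
    if x >= 0.0:
        return math.floor(x + 0.5)
    return math.ceil(x - 0.5)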

Which is independent of whether THIS particular facility should yield
an integer or float!


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rounding float to int directly ...

2006-08-03 Thread Nick Maclaren
Ronald Oussoren <[EMAIL PROTECTED]> wrote:
> 
> > There are algorithms where the operation of rounding (or truncation)
> > is needed, but where the value may be larger than can be held in an
> > integer, and that is not an error.
> 
> Is that really true for python? Python integers are unbounded in  
> magnitute, they are not the same as int or long in C, therefore any  
> float except exceptional values like NaN can be converted to an  
> integer value. The converse is not true, python integers can contain  
> values that are larger than any float (aka C's double).

It depends a great deal on what you mean by a Python integer!  Yes,
I was assuming the (old) Python model, where it is a C long, but so
were many (most?) of the other postings.

If you are assuming the (future?) model, where there is a single
integer type of unlimited size, then that is true.  There is still
an efficiency point, in that such algorithms positively don't want
a float value like 1.0e300 (or, worse, 1.0e4000) expanded to its
full decimal representation as an intermediate step.

Whatever.  There should still be at least one operation that rounds
or truncates a float value, returning a float of the same type, on
either functionality or efficiency grounds.  I and most programmers
of such algorithms don't give a damn which it does, provided that it
is clearly documented, at least half-sane and doesn't change with
versions of Python.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dicts are broken ...

2006-08-04 Thread Nick Maclaren
Michael Hudson <[EMAIL PROTECTED]> wrote:
> 
> I'd say it's more to do with __eq__.  It's a strange __eq__ method
> that raises an Exception, IMHO.

Not entirely.  Any type that supports invalid values (e.g. IEEE 754)
and is safe against losing the invalid state by accident needs to
raise an exception on A == B.  IEEE 754 is not safe.
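
A sketch of the kind of type I mean (the class and its invalid state
are purely illustrative):

class Checked(object):
    """A value with an explicit invalid state.  Comparison with an
    invalid value raises, instead of quietly answering 'no'."""
    def __init__(self, value, valid=True):
        self.value = value
        self.valid = valid
    def __eq__(self, other):
        if not self.valid or (isinstance(other, Checked) and not other.valid):
            raise ValueError("comparison with an invalid value")
        return self.value == (other.value if isinstance(other, Checked) else other)

IEEE 754 instead answers False to NaN == NaN, which is precisely how
the invalid state gets lost.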

> Please do realize that the motivation for this change was hours and
> hours of tortous debugging caused by a buggy __eq__ method making keys
> "impossibly" seem to not be in dictionaries.

Quite.  Been there - been caught by that.  It is a catastrophic (but
very common) misdesign to conflate failure and the answer "no".
There is a fundamental flaw of that nature in card-based banking,
that I pointed out was insoluble to the Grid people, and then got
caught by just a month later!


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gcc 4.2 exposes signed integer overflows

2006-08-30 Thread Nick Maclaren
"Tim Peters" <[EMAIL PROTECTED]> wrote:
>
> This is a wrong time in the release process to take on chance on
> discovering a flaky LONG_MIN on some box, so I want to keep the code
> as much as possible like what's already there (which worked fine for >
> 10 years on all known boxes) for now.

No, it didn't.  I reported a bug a couple of years back.

A blanket rule not to use symbols is clearly wrong, but there are
good reasons not to want to rely on LONG_MIN (or INT_MIN for that
matter).  Because of some incredibly complex issues (which I only
know some of), it hasn't been consistently -(1+LONG_MAX) on two's
complement machines.  There are good reasons for making it -LONG_MAX,
but they aren't the ones that actually cause it to be so.

There are, however, very good reasons for using BOTH tests.  I.e.
if I have a C system which defines LONG_MIN to be -LONG_MAX because
it uses -(1+LONG_MAX) for an integer NaN indicator in some contexts,
you really DON'T want to create such a value.  I don't know if there
are any such C systems, but there have been other languages that did.
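
In Python terms (assuming a 64-bit long; the names are illustrative),
using both tests amounts to:

LONG_MAX = 2**63 - 1        # from <limits.h>; 64-bit assumed
LONG_MIN = -LONG_MAX - 1    # what most systems define

def fits_in_long(n):
    # Apply BOTH the symbolic test and the arithmetic one; on a
    # system that defined LONG_MIN as -LONG_MAX, reserving
    # -(1+LONG_MAX) as a NaN indicator, the first clause would
    # refuse to manufacture that value.
    return n >= LONG_MIN and n >= -(LONG_MAX + 1) and n <= LONG_MAX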

I hope that Guido wasn't saying that Python should EVER rely on
signed integer overflow wrapping in two's complement.  Despite the
common usage, Java and all that, it is perhaps the worst systematic
architectural change to have happened in 30 years, and accounts for
a good 30% of all hard bugs in many classes of program.  Simple
buffer overflow is fairly easy to avoid by good programming style;
integer overflow causing trashing of unpredictable data isn't.

Any decent programming language (like Python!) regards integer
overflow as an error, and the moves to make C copy Java semantics
are yet another step away from software engineering in the direction
of who-gives-a-damn hacking.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-02 Thread Nick Maclaren
"Gustavo Carneiro" <[EMAIL PROTECTED]> wrote:
> 
> We have to resort to timeouts in pygtk in order to catch unix signals
> in threaded mode.

A common defect of modern designs - TCP/IP is particularly objectionable
in this respect, but that battle was lost and won over two decades ago :-(

> The reason is this.  We call gtk_main() (mainloop function) which
> blocks forever.  Suppose there are threads in the program; then any
> thread can receive a signal (e.g. SIGINT).  Python catches the signal,
> but doesn't do anything; it simply sets a flag in a global structure
> and calls Py_AddPendingCall(), and I guess it expects someone to call
> Py_MakePendingCalls().  However, the main thread is blocked calling a
> C function and has no way of being notified it needs to give control
> back to python to handle the signal.  Hence, we use a 100ms timeout
> for polling.  Unfortunately, timeouts needlessly consume CPU time and
> drain laptop batteries.

Yup.

> According to [1], all python needs to do to avoid this problem is
> block all signals in all but the main thread; then we can guarantee
> signal handlers are always called from the main thread, and pygtk
> doesn't need a timeout.

1) That page is password protected, so I can't see what it says, and
am disinclined to register myself to yet another such site.

2) No way, Jose, anyway.  The POSIX signal handling model was broken
beyond redemption, even before threading was added, and the combination
is evil almost beyond belief.  That procedure is good practice, yes,
but that is NOT all that you have to do - it may be all that you CAN
do, but that is not the same.

Come back MVS (or even VMS) - all is forgiven!  That is only partly
a joke.

> Another alternative would be to add a new API like
> Py_AddPendingCallNotification, which would let python notify
> extensions that new pending calls exist and need to be processed.

Nope.  Sorry, but you can't solve a broken design by adding interfaces.

>   But I would really prefer the first alternative, as it could be
> fixed within python 2.5; no need to wait for 2.6.

It clearly should be done, assuming that Python's model is that it
doesn't want to get involved with subthread signalling (and I really,
but REALLY, recommend not doing so).  The best that can be done is to
say that all signal handling is the business of the main thread and
that, when the system bypasses that, all bets are off.

>   Please, let's make Python ready for the enterprise! [2]

Given that no Unix variant or Microsoft system is, isn't that rather
an unreasonable demand?

I am probably one of the last half-dozen people still employed in a
technical capacity who has implemented run-time systems that supported
user-level signal handling with threads/asynchronicity and allowing
for signals received while in system calls.  It would be possible to
modify/extend POSIX or Microsoft designs to support this, but currently
they don't make it possible.  There is NOTHING that Python can do but
to minimise the chaos.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-02 Thread Nick Maclaren
"Gustavo Carneiro" <[EMAIL PROTECTED]> wrote:
> 
>   Oh, sorry, here's the comment:
> 
(comment by Arjan van de Ven):
> | afaik the kernel only sends signals to threads that don't have them blocked.
> | If python doesn't want anyone but the main thread to get signals, it
> should just
> | block signals on all but the main thread and then by nature, all
> signals will go
> | to the main thread

Well, THAT'S wrong, I am afraid!  Things ain't that simple :-(

Yes, POSIX implies that things work that way, but there are so many
get-out clauses and problems with trying to implement that specification
that such behaviour can't be relied on.

>   Well, Python has a broken design too; it postpones tasks and expects
> to magically regain control in order to finish the job.  That often
> doesn't happen!

Very true.  And that is another problem with POSIX :-(

>   Python is halfway there; it assumes signals are to be handled in the
> main thread.  However, it _catches_ them in any thread, sets a flag,
> and just waits for the next opportunity when it runs again in the main
> thread.  It is precisely this "split handling" of signals that is
> failing now. 

I agree that is not how to do it, but that code should not be removed.
Despite best attempts, there may well be circumstances under which
signals are received in a subthread, despite all attempts of the
program to ensure that the main thread gets them.

>   Anyway, attached a patch that should fix the problem in posix
> threads systems, in case anyone wants to review.

Not "fix" - "improve" :-)

I haven't looked at it, but I agree that what you have said is the
way to proceed.  The best solution is to enable the main thread for
all relevant signals, disable all subthreads, but to not rely on
any of that working in all cases.
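
A sketch of that discipline, using the interface that much later
Pythons expose (signal.pthread_sigmask appeared in 3.3; at the time it
meant C and pthread_sigmask(3)):

import signal
import threading

def worker():
    # Block asynchronous signals in the subthread, so that delivery
    # falls to the main thread - and then still don't rely on it.
    signal.pthread_sigmask(signal.SIG_BLOCK,
                           {signal.SIGINT, signal.SIGTERM, signal.SIGHUP})
    ...                     # the thread's real work

threading.Thread(target=worker).start()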

It won't help with the problem where merely receiving a signal causes
chaos, or where blocking them does so, but there is nothing that Python
can do about that, in general.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-04 Thread Nick Maclaren
"Gustavo Carneiro" <[EMAIL PROTECTED]> wrote:
>
> That's a very good point; I wasn't aware that child processes
> inherited the signals mask from their parent processes.

That's one of the few places where POSIX does describe what happens.
Well, usually.  You really don't want to know what happens when you
call something revolting, like csh or a setuid program.  This
particular mess is why I had to write my own nohup - the new POSIX
interfaces broke the existing one, and it remains broken today on
almost all systems.

>   I am now thinking of something along these lines:
> typedef void (*PyPendingCallNotify)(void *user_data);
> PyAPI_FUNC(void) Py_AddPendingCallNotify(PyPendingCallNotify callback,
> void *user_data);
> PyAPI_FUNC(void) Py_RemovePendingCallNotify(PyPendingCallNotify
> callback, void *user_data);

Why would that help?  The problems are semantic, not syntactic.

Anthony Baxter isn't exaggerating the problem, despite what you may
think from his posting.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-04 Thread Nick Maclaren
Chris McDonough <[EMAIL PROTECTED]> wrote:
>
> Would adding an API for sigprocmask help here?

No.  sigprocmask is a large part of the problem.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-04 Thread Nick Maclaren
"Gustavo Carneiro" <[EMAIL PROTECTED]> wrote:
>
>   You guys are tough customers to please.  I am just trying to solve a
> problem here, not create a new one; you have to believe me.

Oh, I believe you.

Look at it this way.  You are trying to resolve the problem that your
farm is littered with cluster bombs, and your cows keep blowing their
legs off.  Your solution is effectively saying "well, let's travel
around and pick them all up then".

>   We want to get rid of timeouts.  Now my idea: add a Python API to say:
>  "dear Python, please call me when you start having pending calls,
> even if from a signal handler context, ok?"

Yes, I know.  I have been there and done that, both academically and
(observing, as a consultant) to the vendor.  And that was on a system
that was a damn sight better engineered than any of the main ones that
Python runs on today.

I have attempted to do much EASIER tasks under both Unix and (earlier)
versions of Microsoft Windows, and failed dismally because the system
wasn't up to it.

> From that point on, signals will get handled by Python, python calls
> PyGTK, PyGTK calls a special API to safely wake up the main loop even
> from a thread or signal handler, then main loop checks for signal by
> calling PyErr_CheckSignals(), it is handled by Python, and the process
> lives happily ever after, or die trying.

The first thing that will happen to that beautiful theory when it goes
out into Unix County or Microsoft City is that a gang of ugly facts
will find it and beat it into a pulp.

>  I sincerely hope my explanation was satisfactory this time.

Oh, it was last time.  It isn't that that is the problem.

> Are signal handlers guaranteed to not be interrupted by another
> signal, at least?  What about threads?

No and no.  In theory, what POSIX says about blocking threads should
be reliable; in my experience, it almost is, except under precisely the
circumstances that you most want it to work.



Look, I am agreeing that your basic design is right.  What I am saying
is that (a) you cannot make delivery reliable and abolish timeouts
and (b) that it is such a revoltingly system-dependent mess that I
would much rather Python didn't fiddle with it.

Do you know how signalling is misimplemented at the hardware level?
And that it is possible for a handler to be called with any of its
critical pointers (INCLUDING the global code and data pointers) in
undefined states?  Do you know how to program round that sort of
thing?

I can answer "yes" to all three - for my sins, which must be many and
grievous, for that to be the case :-(


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-04 Thread Nick Maclaren
Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> On Mon, 04 Sep 2006 17:24:56 +0100,
> David Hopwood <[EMAIL PROTECTED]> wrote:
> >Jean-Paul Calderone wrote:
> >> PyGTK would presumably implement its pending call callback by writing a
> >> byte to a pipe which it is also passing to poll().
> >
> >But doing that in a signal handler context invokes undefined behaviour
> >according to POSIX.
> 
> write(2) is explicitly listed as async-signal safe in IEEE Std 1003.1, 2004.
> Was this changed in a later edition?  Otherwise, I don't understand what you
> mean by this.

Try looking at the C90 or C99 standard, for a start :-(

NOTHING may safely be done in a real signal handler, except possibly
setting a value of type static volatile sig_atomic_t.  And even that
can be problematic.  And note that POSIX defers to C on what the C
languages defines.  So, even if the function is async-signal-safe,
the code that calls it can't be!

POSIX's lists are complete fantasy, anyway.  Look at the one that
defines thread-safety, and then try to get your mind around what
exit being thread-safe actually implies (especially with regard to
atexit functions).


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-04 Thread Nick Maclaren
Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> 
> Thanks for expounding.  Given that it is basically impossible to do
> anything useful in a signal handler according to the relevant standards
> (does Python's current signal handler even avoid relying on undefined
> behavior?), how would you suggest addressing this issue?

Much as you are doing, and as I described; but the first step would be
to find out what 'most' Python people need for signal handling in
threaded programs.  This is because there is an unavoidable conflict
between portability/reliability and functionality.

I would definitely block all signals in threads, except for those that
are likely to be generated ON the thread (SIGFPE etc.)  It is a very
good idea not to touch the handling of several of those, because doing
so can cause chaos.

I would have at least two 'standard' handlers, one of which would simply
set a flag and return, and the other of which would abort.  Now, NEITHER
is a very useful specification, but providing ANY information is risky,
which is why it is critical to know what people need.

I would not TRUST the blocking of signals, so would set up handlers even
when I blocked them, and would do the minimum fiddling in the main
thread compatible with decent functionality.

I would provide a call to test if the signal flag was set, and another
to test and clear it.  This would be callable ONLY from the main thread,
and that would be checked.
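
As a sketch, with purely hypothetical names (threading.main_thread is
the modern spelling of the ownership check):

import threading

_flag = False    # set only by the run-time's minimal signal handler

def _assert_main_thread():
    # The checked restriction: callable ONLY from the main thread.
    if threading.current_thread() is not threading.main_thread():
        raise RuntimeError("signal state is owned by the main thread")

def signal_tripped():
    """Test the flag."""
    _assert_main_thread()
    return _flag

def take_signal():
    """Test and clear the flag in one operation."""
    global _flag
    _assert_main_thread()
    tripped, _flag = _flag, False
    return tripped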

It is possible to do better, but that starts needing serious research.

> It seems to me that it is actually possible to do useful things in a
> signal handler, so long as one accepts that doing so is relying on
> platform specific behavior.

Unfortunately, that is wrong.  That was true under MVS and VMS, but
in Unix and Microsoft systems, the problem is that the behaviour is
both platform and circumstance-dependent.  What you can do reliably
depends mostly on what is going on at the time.

For example, on many Unix and Microsoft platforms, signals received
while you are in the middle of certain functions or system calls, or
certain particular signals (often SIGFPE), call the C handler with a
bad set of global pointers or similar.  I believe that this is one of the
reasons (perhaps the main one) that some such failures so often cause
debuggers to be unable to find the stack pointer.

I have tracked a few of those down, and have occasionally identified
the cause (and even got it fixed!), but it is a murderous task, and
I know of few other people who have ever succeeded.

> How hard would it be to implement this for the platforms Python supports,
> rather than for a hypothetical standards-exact platform?

I have seen this effect on OSF/1, IRIX, Solaris, Linux and versions
of Microsoft Windows.  I have never used a modern BSD, haven't used
HP-UX since release 9, and haven't used Microsoft systems seriously
in years (though I did hang my new laptop in its GUI fairly easily).

As I say, this isn't so much a platform issue as a circumstance one.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Cross-platform math functions?

2006-09-05 Thread Nick Maclaren
Andreas Raab <[EMAIL PROTECTED]> wrote:
> 
> I'm curious if there is any interest in the Python community to achieve 
> better cross-platform math behavior. A quick test[1] shows a 
> non-surprising difference between the platform implementations. 
> Question: Is there any interest in changing the behavior to produce 
> identical results across platforms (for example by utilizing fdlibm 
> [2])? Since I have need for a set of cross-platform math functions I'll 
> probably start with a math-compatible fdlibm module (unless somebody has 
> done that already ;-)
> 
> [1] Using Python 2.4:
>  >>> import math
>  >>> math.cos(1.0e32)
> 
> WinXP:    -0.39929634612021897
> LinuxX86: -0.49093671143542561

Well, I hope not, but I am afraid that there is :-(

The word "better" is emotive and inaccurate.  Such calculations are
numerically meaningless, and merely encourage the confusion between
consistency and correctness.  There is a strong sense in which giving
random results between -1 and 1 would be better.
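
One way to see why: near 1.0e32 the gap between adjacent doubles
dwarfs the period of cos, so each platform is faithfully answering a
slightly different question.

import math

x = 1.0e32
gap = x * 2.0 ** -52          # roughly the spacing of doubles near x
print(gap / (2 * math.pi))    # ~3.5e15 full periods between neighbours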

Now, I am not saying that you don't have a requirement for consistency
but I am saying that confusing it with correctness (as has been fostered
by IEEE 754, Java etc.) is harmful.  One of the great advantages of the
wide variety of arithmetics available in the 1970s is that numerical
testing was easier and more reliable - if you got wildly different
results on two platforms, you got a strong pointer to numerical problems.

That viewpoint is regarded as heresy nowadays, but used not to be!


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-05 Thread Nick Maclaren
"Adam Olsen" <[EMAIL PROTECTED]> wrote:
> On 9/4/06, Gustavo Carneiro <[EMAIL PROTECTED]> wrote:
> 
> >   Now, we've had this API for a long time already (at least 2.5
> > years).  I'm pretty sure it works well enough on most *nix systems.
> > Event if it works 99% of the times, it's way better than *failing*
> > *100%* of the times, which is what happens now with Python.
> 
> Failing 99% of the time is as bad as failing 100% of the time, if your
> goal is to eliminate the short timeout on poll().  1% is quite a lot,
> and it would probably have an annoying tendency to trigger repeatedly
> when the user does certain things (not reproducible by you of course).

That can make it a lot WORSE than repeated failure.  At least with hard
failures, you have some hope of tracking them down in a reasonable time.
The problem with exception handling code that goes off very rarely,
under non-reproducible circumstances, is that it is almost untestable
and that bugs in it are positive nightmares.  I have been inflicted
with quite a large number in my time, and have a fairly good success
rate, but the number of people who know the tricks is decreasing.

Consider the (real) case where an unpredictable process on a large
server (64 CPUs) was failing about twice a week (detectably), with
no indication of how many failures were giving wrong answers.  We
replaced dozens of DIMMs, took days of down time and got nowhere;
it then went hard (i.e. one failure a day).  After a week's total
down time, with me spending 100% of my time on it and the vendor
allocating an expert at high priority, we cracked it.  We were very
lucky to find it so fast.

I could give you other examples that were/are there years and decades
later, because the pain threshold never got high enough to dedicate
the time (and the VERY few people with experience).  I know of at
least one such problem in generic TCP/IP (i.e. on Linux, IRIX,
AIX and possibly Solaris) that has been there for decades and causes
occasional failure in most networked applications/protocols.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-05 Thread Nick Maclaren
"Gustavo Carneiro" <[EMAIL PROTECTED]> wrote:
>
> Anyway, I was speaking hypothetically.  I'm pretty sure writing to a
> pipe is async signal safe.  It is the oldest trick in the book,
> everyone uses it.  I don't have to see a written signed contract to
> know that it works.

Ah.  Well, I can assure you that it's not the oldest trick in the book,
and not everyone uses it.

> This is all the evidence that I need.  And again I reiterate that
> whether or not async safety can be achieved in practice for all
> platforms is not Python's problem.

I wish you the joy of trying to report a case where it doesn't work
to a large vendor and get them to accept that it is a bug.

> Although I believe writing to a
> pipe is 100% reliable for most platforms.  Even if it is not, any
> mission critical application relying on signals for correct behaviour
> should be rewritten to use unix sockets instead; end of argument.

Er, no.  There are lots of circumstances where that isn't feasible,
such as wanting to close down an application cleanly when the scheduler
sends it a SIGXCPU.
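
For reference, the trick under discussion looks like this at the
Python level (a sketch only; note that a Python-level handler is
itself deferred by the interpreter, which is part of the point of this
thread - later CPythons grew signal.set_wakeup_fd to do the write in C):

import os
import select
import signal

rfd, wfd = os.pipe()

def handler(signum, frame):
    os.write(wfd, b'\0')    # sole job: make the poll() below wake up

signal.signal(signal.SIGUSR1, handler)

poller = select.poll()
poller.register(rfd, select.POLLIN)
poller.poll()               # blocks, with NO timeout, until a signal
os.read(rfd, 1)             # drain, then handle the event properly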


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-05 Thread Nick Maclaren
Johan Dahlin <[EMAIL PROTECTED]> wrote:
>
> Are you saying that we should let less commonly used platforms dictate
> features and functionality for the popular ones?
> I mean, who uses HP/UX, SCO and [insert your favorite flavor] as a modern
> desktop system where this particular bug makes a difference?

You haven't been following the thread.  As I posted, this problem
occurs to a greater or lesser degree on all platforms.  This will be
my last posting on the topic, but I shall try to explain.

The first problem is in the hardware and operating system.  A signal
interrupts the thread, and passes control to a handler with a very
partial environment and (usually) information on the environment
when it was interrupted.  If it interrupted the thread in the middle
of a system call or other library routine that uses non-Python
conventions, the registers and other state may be weird.  There ARE
solutions to this, but they are unbelievably foul, and even Linux
on x86 gas had trouble with this.  And, on return, everything has to
be reversed entirely transparently!

It is VERY common for there to be bugs in the C run-time system and
not rare for there to be ones in the kernel (that area of Linux has
been rewritten MANY times, for this reason).  In many cases, the
run-time system simply doesn't pretend to handle interrupts in
arbitrary code (which is where the C undefined behaviour is used by
vendors).

The second problem is that what you can do depends both on what you
were doing and how your 'primitive' is implemented.  For example, if
you call something that takes out even a very short term lock or uses
a spin loop to emulate an atomic operation, you had better not use it
if you interrupted code that was doing the same.  Your thread may
hang, crash or otherwise go bananas.  Can you guarantee that even
write is free of such things?  No, and certainly not if you are using
a debugger, a profiling library or even tracing system calls.  I have
often used programs that crashed as soon as I did one of those :-(

Related to this is that it is EXTREMELY hard to write synchronisation
primitives (mutexes etc.) that are interrupt-safe - MUCH harder than
to write thread-safe ones - and few people are even aware of the
issues.  There was a thread on some Linux kernel mailing list about
this, and even the kernel developers were having headaches thinking
about the issues.

Even if write is atomic, there are gotchas.  What if the interrupted
code is doing something to that file at the time?  Are you SURE that
an unexpected operation on it (in the same thread) won't cause the
library function or program to get confused?  And can you be sure
that the write will terminate fast enough to not cause time-critical
code to fail?  And have you studied the exact semantics of blocking
on pipes?  They are truly horrible.

So this is NOT a matter of platform X is safe and platform Y isn't.
Even Linux x86 isn't entirely safe - or wasn't, the last time I heard.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-09 Thread Nick Maclaren
I was hoping to have stopped, but here are a few comments.

I agree with Jan Kanis.  That is the way to tackle this one.

"Adam Olsen" <[EMAIL PROTECTED]> wrote:
> 
> I don't think we should let this die, at least not yet.  Nick seems to
> be arguing that ANY signal handler is prone to random crashes or
> corruption (due to bugs).  However, we already have a signal handler,
> so we should already be exposed to the random crashes/corruption.

No.  I am afraid that is a common myth and often catastrophic mistake.
In this sort of area, NEVER assume that even apparently unrelated changes
won't cause 'working' code to misbehave.  Yes, Python is already exposed,
but it would be easy to turn a very rare failure into a more common one.

What I was actually arguing for was defensive programming.

> If we're going to rely on signal handling being correct then I think
> we should also rely on write() being correct.  Note that I'm not
> suggesting an API that allows arbitrary signal handlers, but rather
> one that calls write() on an array of prepared file descriptors
> (ignoring errors).

For your interpretation of 'correct'.  The cause of this chaos is that
the C and POSIX standards are inconsistent, even internally, and they
are wildly incompatible.  So, even if things 'work' today, don't bet on
the next release of your favourite system behaving the same way.

It wouldn't matter if there was a de facto standard (i.e. a consensus),
but there isn't.

> Ensuring modifications to that array are atomic would be tricky, but I
> think it would be doable if we use a read-copy-update approach (with
> two alternating signal handler functions).  Not sure how to ensure
> there's no currently running signal handlers in another thread though.
>  Maybe have to rip the atomic read/write stuff out of the Linux
> sources to ensure it's *always* defined behavior.

Yes.  But even that wouldn't solve the problem, as that code is very
gcc-specific.

> Looking into the existing signalmodule.c, I see no attempts to ensure
> atomic access to the Handlers data structure.  Is the current code
> broken, at least on non-x86 platforms?

Well, at a quick glance at the actual handler (the riskiest bit):

1) It doesn't check the signal range - bad practice, as systems
do sometimes generate wayward numbers.

2) Handlers[sig_num].tripped = 1; is formally undefined, but
actually pretty safe.  If that breaks, nothing much will work.  It
would be better to make the int sig_atomic_t, as you say.

3) is_tripped++; and Py_AddPendingCall(checksignals_witharg, NULL);
will work only because the handler ignores all signals in subthreads
(which is definitely NOT right, as the comments say).

Despite the implication, the code of Py_AddPendingCall is NOT safe
against simultaneous registration.  It is just plain broken, I am
afraid.  The note starting "Darn" should be a LOT stronger :-)

[ For example, think of two threads calling the function at exactly
the same time, in almost perfect step.  Oops. ]
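
The shape of that race, caricatured in Python (the real code is C,
with a fixed-size array of pending calls):

pending_calls = [None] * 32
pending_last = 0

def add_pending_call(func):
    global pending_last
    i = pending_last         # two threads can read the same index...
    pending_calls[i] = func  # ...both store into the same slot...
    pending_last = i + 1     # ...and one registration is silently lost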

I can't honestly promise to put any time into this in the forseeable
future, but will try (sometime).  If anyone wants to tackle this,
please ask me for comments/help/etc.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-09-29 Thread Nick Maclaren
"Jason Orendorff" <[EMAIL PROTECTED]> wrote:
>
> Anyway, this kind of static analysis is probably more entertaining
> than relevant.  ...

Well, yes.  One can tell that by the piffling little counts being
bandied about!  More seriously, yes, it is Well Known that 0.0 is
the Most Common Floating-Point Number is most numerical codes; a
lot of older (and perhaps modern) sparse matrix algorithms use that
to save space.

In the software floating-point that I have started to draft some
example code but have had to shelve (no, I haven't forgotten) the
values I predefine are Invalid, Missing, True Zero and Approximate
Zero.  The infinities and infinitesimals (a.k.a. signed zeroes)
could also be included, but are less common and more complicated.
And so could common integers and fractions.

It is generally NOT worth doing a cache lookup for genuinely
numerical code, as the common cases that are not the above rarely
account for enough of the numbers to be worth it.  I did a fair
amount of investigation looking for compressibility at one time,
and that conclusion jumped out at me.

The exact best choice depends entirely on what you are doing.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
"Terry Reedy" <[EMAIL PROTECTED]> wrote:
>
> For true floating point measurements (of temperature, for instance), 
> 'integral' measurements (which are an artifact of the scale used (degrees F 
> versus C versus K)) should generally be no more common than other realized 
> measurements.

Not quite, but close enough.  A lot of algorithms use a conversion to
integer, or some of the values are actually counts (e.g. in statistics),
which makes them a bit more likely.  Not enough to get excited about,
in general.

> Thirty years ago, a major stat package written in Fortran (BMDP) required 
> that all data be stored as (Fortran 4-byte) floats for analysis.  So a 
> column of yes/no or male/female data would be stored as 0.0/1.0 or perhaps 
> 1.0/2.0.  That skewed the distribution of floats.  But Python and, I hope, 
> Python apps, are more modern than that.

And SPSS and Genstat and others - now even Excel ...

> Float caching strikes me a a good subject for cookbook recipies, but not, 
> without real data and a willingness to slightly screw some users, for the 
> default core code.

Yes.  It is trivial (if tedious) to add analysis code - the problem
is finding suitable representative applications.  That was always
my difficulty when I was analysing this sort of thing - and still
is when I need to do it!

> Nick Craig-Wood <[EMAIL PROTECTED]> wrote:
> 
> For my application caching 0.0 is by far the most important. 0.0 has
> ~200,000 references - the next highest reference count is only about ~200.

Yes.  All the experience I have ever seen over the past 4 decades
confirms that is the normal case, with the exception of floating-point
representations that have a missing value indicator.

Even in IEEE 754, infinities and NaN are rare unless the application
is up the spout.  There are claims that a lot of important ones have
a lot of NaNs and use them as missing values but, despite repeated
requests, none of the people claiming that have ever provided an
example.  There are some pretty solid grounds for believing that
those claims are not based in fact, but are polemic.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
=?iso-8859-1?Q?Kristj=E1n_V=2E_J=F3nsson?= <[EMAIL PROTECTED]> wrote:
>
> The total count of floating point numbers allocated at this point is 985794.
> Without the reuse, they would be 1317145, so this is a saving of 25%, and
> of 5Mb.

And, if you optimised just 0.0, you would get 60% of that saving at
a small fraction of the cost and considerably greater generality.
It isn't clear whether the effort justifies doing more.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
[EMAIL PROTECTED] wrote:
>
> Doesn't that presume that optimizing just 0.0 could be done easily?  Suppose
> 0.0 is generated all over the place in EVE?

Yes, and it isn't, respectively!  The changes in floatobject.c would
be trivial (if tedious), and my recollection of my scan is that
floating values are not generated elsewhere.

It would be equally easy to add a general caching algorithm, but
that would be a LOT slower than a simple floating-point comparison.
The problem (in Python) isn't hooking the checks into place,
though it could be if Python were implemented differently.
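
To make the cost difference concrete, here is the comparison
transliterated into Python (in floatobject.c this would of course be C;
the function names are mine):

_ZERO = 0.0

def alloc_float_zero_only(value):
    # One floating-point comparison, then the normal allocation path.
    # (value == 0.0 also matches -0.0 - the objection raised below.)
    if value == 0.0:
        return _ZERO
    return float(value)

_cache = {}

def alloc_float_general(value):
    # General caching: a hash and a lookup on EVERY allocation, which
    # is the "LOT slower" case.
    obj = _cache.get(value)
    if obj is None:
        obj = _cache[value] = float(value)
    return obj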


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <[EMAIL PROTECTED]> wrote:
> 
> >> The total count of floating point numbers allocated at this point is 
> >> 985794.
> >> Without the reuse, they would be 1317145, so this is a saving of 25%, and
> >> of 5Mb.
> > 
> > And, if you optimised just 0.0, you would get 60% of that saving at
> > a small fraction of the cost and considerably greater generality.
> 
> As Michael Hudson observed, this is difficult to implement, though:
> You can't distinguish between -0.0 and +0.0 easily, yet you should.

That was the point of a previous posting of mine in this thread :-(

You shouldn't, despite what IEEE 754 says, at least if you are
allowing for either portability or numeric validation.

There are a huge number of good reasons why IEEE 754 signed zeroes
fit extremely badly into any normal programming language and are
seriously incompatible with numeric validation, but Python adds more.
Is there any other type where there are two values that are required
to be different, but where both the hash is required to be zero and
both are required to evaluate to False in truth value context?
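
The oddity is easy to see from the interpreter (output as on a typical
IEEE 754 platform):

>>> x, y = 0.0, -0.0
>>> x == y                # required to be different values, yet equal
True
>>> hash(x), hash(y)      # both required to hash to zero
(0, 0)
>>> bool(x), bool(y)      # both false in truth-value context
(False, False)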


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <[EMAIL PROTECTED]> wrote:
>
> Ah, you are proposing a semantic change, then: -0.0 will become
> unrepresentable, right?

Well, it is and it isn't.

Python currently supports only some of IEEE 754, and that is more by
accident than design - because that is exactly what C90 implementations
do!  There is code in floatobject.c that assumes IEEE 754, but Python
does NOT attempt to support it in toto (it is not clear if it could),
not least because it uses C90.

And, as far as I know, none of that is in the specification, because
Python is at least in theory portable to systems that use other
arithmetics and there is no current way to distinguish -0.0 from 0.0
except by comparing their representations!  And even THAT depends
entirely on whether the C library distinguishes the cases, as far
as I can see.
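
Comparing representations amounts to something like the following
sketch, which itself assumes the platform stores doubles in IEEE 754
format:

import struct

def same_float_repr(a, b):
    # Compare the raw bytes of the C doubles, not their values.
    return struct.pack('d', a) == struct.pack('d', b)

# same_float_repr(0.0, -0.0) is False, although 0.0 == -0.0 is True.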

So distinguishing -0.0 from 0.0 isn't really in Python's current
semantics at all.  And, for reasons that we could go into, I assert
that it should not be - which is NOT the same as not supporting
branch cuts in cmath.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <[EMAIL PROTECTED]> wrote:
>
> py> x=-0.0
> py> y=0.0
> py> x,y

Nobody is denying that SOME C90 implementations distinguish them,
but it is no part of the standard - indeed, a C90 implementation is
permitted to use ANY criterion for deciding when to display -0.0 and
0.0.  C99 is ambiguous to the point of internal inconsistency, except
when __STDC_IEC_559__ is set to 1, though the intent is clear.

And my reading of Python's code is that it relies on C's handling
of such values.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Nick Maclaren
Alastair Houghton <[EMAIL PROTECTED]> wrote:
> 
> AFAIK few systems have floating point traps enabled by default (in  
> fact, isn't that what IEEE 754 specifies?), because they often aren't  
> very useful.

The first two statements are true; the last isn't.  They are extremely
useful, not least because they are the only practical way to locate
numeric errors in most 3 GL programs (including C, Fortran etc.)

> And in the specific case of the Python interpreter, why  
> would you ever want them turned on?  Surely in order to get  
> consistent floating point semantics, they need to be *off* and Python  
> needs to handle any exceptional cases itself; even if they're on, by  
> your argument Python must do that to avoid being terminated.

Grrk.  Why are you assuming that turning them off means that the
result is what you expect?  That isn't always so - sometimes it
merely means that you get wrong answers but no indication of that.

> or see if it can't turn them off using the C99  APIs.

That is a REALLY bad idea.  You have no idea how broken that is,
and what the impact on Python would be.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Nick Maclaren
James Y Knight <[EMAIL PROTECTED]> wrote:
> 
> This is a really poor argument. Python should be moving *towards*  
> proper '754 fp support, not away from it. On the platforms that are  
> most important, the C implementations distinguish positive and  
> negative 0. That the current python implementation may be defective  
> when the underlying C implementation is defective doesn't excuse a  
> change to intentionally break python on the common platforms.

Perhaps you might like to think about why IBM POWERx (and NOT the
Cell or most embedded POWERs) is the ONLY mainstream system to have
implemented all of IEEE 754 in hardware after 22 years?  Or why
NO programming language has provided support in those 22 years,
and only Java and C have even claimed to?

See Kahan's "How Java's Floating-Point Hurts Everyone Everywhere",
note that C99 is much WORSE, and then note that Java and C99 are
the only languages that have even attempted to include IEEE 754.

You have also misunderstood the issue.  The fact that a C implementation
doesn't support it does NOT mean that the implementation is defective;
quite the contrary.  The issue always has been that IEEE 754's basic
model is incompatible with the basic models of all programming
languages that I am familiar with (which is a lot).  And the specific
problems with C99 are in the STANDARD, not the IMPLEMENTATIONS.

> IEEE 754 is so widely implemented that IMO it would make sense to  
> make Python's floating point specify it, and simply declare floating  
> point operations on non-IEEE 754 machines as "use at own risk, may  
> not conform to python language standard". (or if someone wants to use  
> a software fp library for such machines, that's fine too).

Firstly, see the above.  Secondly, Python would need MAJOR semantic
changes to conform to IEEE 754R.  Thirdly, what would you say to
the people who want reliable error detection on floating-point of
the form that Python currently provides?


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Nick Maclaren
On Wed, Oct 04, 2006 at 12:42:04AM -0400, Tim Peters wrote:
>
> > If C90 doesn't distinguish -0.0 and +0.0, how can Python?
> 
> > Can you give a simple example where the difference between the two
> > is apparent to the Python programmer?
> 
> Perhaps surprisingly, many (well, comparatively many, compared to none)
> people have noticed that the platform atan2 cares a lot:

Once upon a time, floating-point was used as an approximation to
mathematical real numbers, and anything which was mathematically
undefined in real arithmetic was regarded as an error in floating-
point.  This allowed a reasonable amount of numeric validation,
because the main remaining discrepancy was that floating-point
has only limited precision and range.

Most of the numerical experts that I know of still favour that
approach, and it is the one standardised by the ISO LIA-1, LIA-2
and LIA-3 standards for floating-point arithmetic.

atan2(0.0,0.0) should be an error.

But C99 differs.  While words do not fail me, they are inappropriate
for this mailing list :-(
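
For the record, the C99 behaviour is visible from Python on IEEE 754
platforms (digits as printed by a recent interpreter; the exact output
is the platform libm's choice):

>>> from math import atan2
>>> atan2(0.0, 0.0)       # mathematically undefined; C99 defines it as +0
0.0
>>> atan2(0.0, -0.0)      # and the sign of a zero changes the answer
3.141592653589793
>>> atan2(-0.0, -0.0)
-3.141592653589793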



Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Cloning threading.py using proccesses

2006-10-12 Thread Nick Maclaren
"M.-A. Lemburg" <[EMAIL PROTECTED]> wrote:
>
> This is hard to believe. I've been in that business for a few
> years and so far have not found an OS/hardware/network combination
> with the mentioned features.

Surely you must have - unless there is another M.-A. Lemburg in IT!
Some of the specialist systems, especially those used for communication,
were like that, and it is very likely that many still are.  But they
aren't currently in Python's domain.  I have never used any, but have
colleagues who have.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Cloning threading.py using proccesses

2006-10-12 Thread Nick Maclaren
Josiah Carlson <[EMAIL PROTECTED]> wrote:
> 
> It would be convenient, yes, but the question isn't always 'threads or
> processes?'  In my experience (not to say that it is more or better than
> anyone else's), when going multi-process, the expense on some platforms
> is significant enough to want to persist the process (this is counter to
> my previous forking statement, but its all relative). And sometimes one
> *wants* multiple threads running in a single process handling multiple
> requests.

Yes, indeed.

This is all confused by the way that POSIX (and Microsoft) threads
have become essentially just processes with shared resources.  If
one had a system with real, lightweight threads, the same might
well not be so.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-10-12 Thread Nick Maclaren

Sorry.  I was on holiday, and then buried this when sorting out my
thousands of Emails on my return, partly because I had to look up the
information!

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <[EMAIL PROTECTED]> wrote:
> 
> >> | afaik the kernel only sends signals to threads that don't have them 
> >> blocked.
> >> | If python doesn't want anyone but the main thread to get signals, it
> >> should just
> >> | block signals on all but the main thread and then by nature, all
> >> signals will go
> >> | to the main thread
> > 
> > Well, THAT'S wrong, I am afraid!  Things ain't that simple :-(
> > Yes, POSIX implies that things work that way, but there are so many
> > get-out clauses and problems with trying to implement that specification
> > that such behaviour can't be relied on.
> 
> Can you please give one example for each (one get-out clause, and
> one problem with trying to implement that).

http://www.opengroup.org/onlinepubs/009695399/toc.htm

2.4.1 Signal Generation and Delivery

It is extremely unclear what that means, but it talks about the
generation and delivery of signals to both threads and processes.
I can tell you (from speaking to system developers) that they
understand that to mean that they are allowed to send signals to
specific threads when that is appropriate.  But they are as
confused by POSIX's verbiage as I am!

> I fail to see why it isn't desirable to make all signals occur
> in the main thread, on systems where this is possible.

Oh, THAT's easy.  Consider a threaded application running on a
muti-CPU machine and consider hardware generated signals (e.g.
SIGFPE, SIGSEGV etc.)  Sending them to the master thread involves
either moving them between CPUs or moving the master thread; both
are inefficient and neither may be possible.

[ I have brought systems down with signals that did have to be
handled on a particular CPU, by flooding that with signals from
dozens of others (yes, big SMPs) and blocking out high-priority
interrupts.  The efficiency point can be serious. ]

That also applies to many of the signals that do not reach programs,
such as TLB misses, ECC failure etc.  But, in those cases, what does
Python or even POSIX need to know about them?


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals, threads, blocking C functions

2006-10-12 Thread Nick Maclaren
=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <[EMAIL PROTECTED]> wrote:
Michael Hudson schrieb:
> 
> >> According to [1], all python needs to do to avoid this problem is
> >> block all signals in all but the main thread;
> > 
> > Argh, no: then people who call system() from non-main threads end up
> > running subprocesses with all signals masked, which breaks other
> > things in very mysterious ways.  Been there...
> 
> Python should register a pthread_atfork handler then, which clears
> the signal mask. Would that not work?

No.  It's not the only such problem.

Personally, I think that anyone who calls system(), fork(), spawn()
or whatever from threads is cuckoo.  It is precisely the sort of
thing that is asking for trouble, because there are so many ways
of doing it 'right' that you can't be sure exactly what mental
model the system developers will have.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Floor division

2007-01-22 Thread Nick Maclaren
"Guido van Rossum" <[EMAIL PROTECTED]> wrote:
> 
> That really sucks, especially since the whole point of making int
> division return a float was to make the integers embedded in the
> floats... I think the best solution would be to remove the definition
> of % (and then also for divmod()) for floats altogether, deferring to
> math.fmod() instead.

Please, NO!!!

At least not without changing the specification of the math module.
The problem with it is that it is specified to be a mapping of the
underlying C library, complete with its error handling.
fmod isn't bad, as C library functions go, BUT:

God alone knows what happens with fmod(x,0.0), let alone fmod(x,-0.0).
C99 says that it is implementation-defined whether a domain error
occurs or the function returns zero, but domain errors are defined
behaviour only in C90 (and not in C99!)  It is properly defined only
if Annex F is in effect (with all the consequences that implies).
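
A small probe makes the point; what it prints is platform-dependent,
as just described - some libms raise ValueError via EDOM, others
quietly return a NaN:

import math

for y in (0.0, -0.0):
    try:
        print 'fmod(1.0, %r) = %r' % (y, math.fmod(1.0, y))
    except ValueError, e:
        print 'fmod(1.0, %r) raised ValueError: %s' % (y, e)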

Note that I am not saying that syntactic support is needed, because
Fortran gets along perfectly well with this as a function.  All I
am saying is that we want a defined function with decent error
handling!  Returning a NaN is fine on systems with proper NaN support,
which is why C99 Annex F fmod is OK.

> For ints and floats, real could just return self, and imag could
> return a 0 of the same type as self. I guess the conjugate() function
> could also just return self (although I see that conjugate() for a
> complex with a zero imaginary part returns something whose imaginary
> part is -0; is that intentional? I'd rather not have to do that when
> the input is an int or float, what do you think?)

I don't see the problem in doing that - WHEN implicit conversion
to a smaller domain, losing information, raises an exception.  The
errors caused by needing a 'cast' (including Fortran INT, DBLE and
(ugh) COMPLEX, here) causing not just conversion but information
loss have caused major trouble for as long as I have been around.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Floor division

2007-01-23 Thread Nick Maclaren
"Tim Peters" <[EMAIL PROTECTED]> wrote:
>
> > I guess the conjugate() function could also just return self (although I see
> > that conjugate() for a complex with a zero imaginary part returns
> > something whose imaginary part is -0; is that intentional?
> 
> That's wrong, if true:  it should return something with the opposite
> sign on the imaginary part, whether or not that equals 0 (+0. and -0.
> both "equal 0").

Grrk.  Why?  Seriously.  IEEE 754 signed zeroes are deceptive enough
for float, but are a gibbering nightmare for complex; Kahan may be
able to handle them, but mere mortals can't.  Inter alia, the only
sane forms of infinity for complex numbers are a SINGLE one (the
compactified model) and to map infinity into NaN (which I prefer,
as it leads to less nonsense).

And, returning to 'floor' - if one is truncating towards -infinity,
should floor(-0.0) deliver -1.0, 0.0 or -0.0?

> math.fmod is 15 years old -- whether or not someone likes it has
> nothing to do with whether Python should stop trying to use the
> current integer-derived meaning of % for floats.

Eh?  No, it isn't.  Because of the indirection to the C library, it
is changing specification as we speak!  THAT is all I am getting at;
not that the answer might not be A math.fmod with defined behaviour.

> On occasion we've added additional error checking around functions
> inherited from C.  But adding code to return a NaN has never been
> done.  If you want special error checking added to the math.fmod
> wrapper, it would be easiest to "sell" by far to request that it raise
> ZeroDivisionError (as integer mod does) for a modulus of 0, or
> ValueError (Python's historic mapping of libm EDOM, and what Python's
> fmod(1, 0) already does on some platforms).  The `decimal` module
> raises InvalidOperation in this case, but that exception is specific
> to the `decimal` module for now.

I never said that it should; I said that it is reasonable behaviour
on systems that support them.  I personally much prefer an exception
in this case.  What I was trying to point out is that the current
behaviour is UNDEFINED (and may give total nonsense).  That is not
good.

> >> For ints and floats, real could just return self, and imag could
> >> return a 0 of the same type as self. I guess the conjugate() function
> >> could also just return self (although I see that conjugate() for a
> >> complex with a zero imaginary part returns something whose imaginary
> >> part is -0; is that intentional? I'd rather not have to do that when
> >> the input is an int or float, what do you think?)
> 
> > I don't see the problem in doing that - WHEN implicit conversion
> > to a smaller domain, losing information, raises an exception.
> 
> Take it as a pragmatic fact that it wouldn't.  Besides, e.g., the
> conjugate of 10**5 is exactly 10**5 mathematically.  Why raise
> an exception just because it can't be represented as a float?  The
> exact result is easily supplied with a few lines of "obviously
> correct" implementation code (incref `self` and return it).

Eh?  I don't understand.  Are you referring to float("1.0e5"),
pow(10,5), pow(10.0,5), or a conjugate (and, if so, of what?)

float(conjg(1.23)) obviously need not raise an exception, except
possibly "Sanity failure" :-)  


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Floor division

2007-01-23 Thread Nick Maclaren
A generic comment.  Many of my postings seem to be being misunderstood. 
I hold no brief for ANY particular floating-point religion, sect or
heresy, except insofar as it affects robustness and portability (i.e.
"software engineering").  I can work with and teach almost any model,
and have done so with some pretty weird ones.

My objection to some proposals is that they are sacrificing those
in favour of some ill-defined objectives.


"Tim Peters" <[EMAIL PROTECTED]> wrote:
> [TIm Peters]
> >> That's wrong, if true:  it should return something with the opposite
> >> sign on the imaginary part, whether or not that equals 0 (+0. and -0.
> >> both "equal 0").
> 
> [Nick Maclaren]
> > Grrk.  Why?  Seriously.
> 
> Seriously:  because there's some reason to do so and no good reason
> not to.

Hmm.  That doesn't fully support the practice, except for IEEE 754(R)
numbers.  To require a floating-point format to have signed zeroes
is a religious matter.  But I agree that specifying something different
if the numbers are an IEEE 754(R) format makes no sense.
 
> > And, returning to 'floor' - if one is truncating towards -infinity,
> > should floor(-0.0) deliver -1.0, 0.0 or -0.0?
> 
> I'd leave a zero argument alone (for ceiling too), and am quite sure
> that's "the right" 754-ish behavior.

It's not clear, and there was a debate about it!  But it is what
IEEE 754R ended up specifying.
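
On a typical IEEE 754 platform that is what Python's libm wrappers
already show (assuming repr distinguishes the signed zero, which not
every platform's does):

>>> import math
>>> math.floor(-0.0), math.ceil(-0.0)   # zero argument left alone, sign kept
(-0.0, -0.0)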

> Couldn't quite parse that, but nearly all of Python's math-module
> functions inherit most behavior from the platform libm.  This is often
> considered to be a feature:  the functions called from Python
> generally act much like they do when called from C or Fortran on the
> same platform, easing cross-language development on a single platform.

And making it impossible to write robust, portable code :-(  Note that
most platforms have several libms and the behaviour even for a single
libm can be wildly variable.  It can also ENHANCE cross-language
problems, where a user needs to use a library that expects a different
libm or libm option.

> Do note the flip side:  to the extent that different platform
> religions refuse to standardize libm endcase behavior, Python plays
> along with whatever libm gods the platform it's running on worships.
> That's of value to some too.

Actually, no, it doesn't.  Because Python doesn't support any libm
behaviour other than the one that it was compiled with, and that is
often NOT what is wanted.

> So which one would you prefer?  As explained, there are 3 plausible
> candidates.
> 
> You seem to be having some trouble taking "yes" for an answer here ;-)

Actually, there are a lot more candidates, but let that pass.  All
I am saying is that there should be SOME defined AND SANE behaviour.
While I would prefer an exception, I am not dogmatic about it.  What
I can't stand is completely undefined behaviour, as was introduced
into Python by C99.

> > What I was trying to point out is that the current behaviour is
> > UNDEFINED (and may give total nonsense).  That is not
> > good.
> 
> Eh -- I can't get excited about it.  AFAIK, in 15 years nobody has
> complained about passing a 0 modulus to math.fmod (possibly because
> most newbies use the Windows distro, and it does raise ValueError
> there).  

Some people write Python that is intended to be robust and portable;
it is those people who suffer.

> What Guido would rather do, which I agreed with, was to have
> x.conjugate() simply return x when x is float/int/long.  No change in
> value, no change in type, and the obvious implementation would even
> make ...

Fine.  I am happy with that.  What I was pointing out that forcible
changes of type aren't harmful if you remove the "gotchas" of loss
of information with coercions that aren't intended to do so.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Floor division

2007-01-23 Thread Nick Maclaren
"Jim Jewett" <[EMAIL PROTECTED]> wrote:
>
> >... I can work with and teach almost any model,
> > and have done so with some pretty weird ones.
>
> I think python's model is "Whatever your other tools use.  Ask them."
> And I think that is a reasonable choice.

Answer:  It's undefined.  Just because you have tested your code
today doesn't mean it will work tomorrow, or on a different set of
values (however similar), or that it will give the same answer
every time you do the same operation on the same input, or that the
effects will be limited to wrong answers and stray exceptions.

Still think that it is reasonable?

> > Some people write Python that is intended to be robust and portable;
> > it is those people who suffer.
> 
> If your users stick to sensible inputs, then it doesn't matter which
> model you used.

Sigh.  Let's step back a step.  Who decides when inputs are sensible?
And where is it documented?  Answers:  God alone knows, and nowhere.

One of Python's general principles is that its operations should
either do roughly what a reasonable user would expect, or it will
raise an exception.  It doesn't always get there, but it isn't bad.
What you are saying is that is undesirable.

The old Fortran and C model of saying that any user error can cause
any effect (including nasal demons) is tolerable only if there is
agreement on what IS an error, and there is some way for a user to
find that out.  In the case of C, neither is true.

> If not, there is no way to get robust and portable; it is just a
> matter of which users you annoy.

Well, actually, there is.  Though I agree that the techniques have
rather been forgotten in the past 30 years.  Python implements more
of them than most languages.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] compex numbers (was Floor Division)

2007-01-23 Thread Nick Maclaren
"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> Tim Peters wrote:
> 
> >  complex_new() ends with:
> 
> >   cr.real -= ci.imag;
> >   cr.imag += ci.real;
> 
> > and I have no idea what that thinks it's doing.  Surely this isn't 
> > intended?!
> :
> 
> I think it is.  python.org/sf/1642844 adds comments to make it less unclear.

Agreed.

> 
> If "real" and "imag" are themselves complex numbers, then normalizing
> the result will move the imaginary portion of the "real" vector into
> the imaginary part and vice versa.

Not really.  What it does is to make complex(a,b) exactly equivalent
to a+1j*b.  For example:

>>> a = 1+2j
>>> b = 3+4j
>>> complex(a)
(1+2j)
>>> b*1j
(-4+3j)
>>> complex(a,b)
(-3+5j)

> Note that changing this (to discard the imaginary parts) would break
> passing complex numbers to their own constructor.

Eh?  Now, I am baffled.  There are several ways of changing it, all
of which would turn one bizarre behaviour into another - or would
raise an exception.  Personally, I would do the following:

complex(a) would permit a to be complex.

complex(a,b) would raise an exception if either a or b were complex.

But chacun a son gout (accents omitted).
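
As a sketch of that proposal (the function name is mine; this is NOT
the built-in's behaviour):

def strict_complex(a, b=None):
    # complex(a): a may itself be complex.
    if b is None:
        return complex(a)
    # complex(a, b): raise if either argument is complex.
    if isinstance(a, complex) or isinstance(b, complex):
        raise TypeError('both arguments must be real')
    return complex(a, b)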


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Problem with signals in a single threaded application

2007-01-24 Thread Nick Maclaren
On Tue, Jan 23, 2007, Ulisses Furquim wrote:
>
> I've read some threads about signals in the archives and I was under
> the impression signals should work reliably on single-threaded
> applications. Am I right?  I've thought about a way to fix this, but I
> don't know what is the current plan for signals support in python, so
> what can be done?

This one looks like an oversight in Python code, and so is a bug,
but it is important to note that signals do NOT work reliably under
any Unix or Microsoft system.  Inter alia, all of the following are
likely to lead to lost signals:

Two related signals received between two 'checkpoints' (i.e. when
the signal is tested and cleared).  You may only get one of them,
and 'related' does not mean 'the same'.

A second signal received while the first is being 'handled' by the
operating system or language run-time system.

A signal sent while the operating system is doing certain things to
the application (including, sometimes, when it is swapped out or
deep in I/O.)

And there is more, some of which can cause program misbehaviour or
crashes.  You are also right that threading makes the situation a
lot worse.

Obviously, Unix and Microsoft systems depend on signals, so you
can't simply regard them as hopelessly broken, but you can't assume
that they are RELIABLE.  All code should be designed to cope with
the case of signals getting lost, if at all possible.  Defending
yourself against the other failures is an almost hopeless task,
but luckily they are extremely rare except on specialist systems.
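
One defensive pattern implied here, as a minimal sketch (Unix-only,
and SIGUSR1 is merely an example): the handler only sets a flag, and
the main loop polls on a timeout as well, so a lost signal delays the
work instead of losing it.

import signal, time

woken = []                       # flag set by the handler, polled below

def handler(signum, frame):
    woken.append(signum)         # keep the handler itself trivial

signal.signal(signal.SIGUSR1, handler)

def main_loop(check_for_work, interval=5.0):
    while True:
        time.sleep(interval)     # a signal normally cuts the sleep short
        del woken[:]
        check_for_work()         # run on the timeout too: this copes
                                 # with signals that never arrive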


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Complex constructors [was Re: Floor division]

2007-01-24 Thread Nick Maclaren
Gareth McCaughan <[EMAIL PROTECTED]> wrote:
> 
> ...  The question is whether
> it makes sense to define complex(a,b) = a+ib for all a,b
> or whether the two-argument form is always in practice going
> to be used with real numbers[1]. If it is, which seems pretty
> plausible to me, then changing complex() to complain when
> passed two complex numbers would (1) notify users sooner
> when they have errors in their programs, (2) simplify the
> code, and (3) avoid the arguably broken behaviour Tim was
> remarking on, where complex(-0.0).real is +0 instead of -0.
> 
> [1] For the avoidance of ambiguity: "real" is not
> synonymous with "double-precision floating-point".

Precisely.  On this matter, does anyone know of an application
where making that change would harm anything?  I cannot think of
a circumstance under which the current behaviour adds any useful
function over the one that raises an exception if there are two
arguments and either is complex.

Yes, of course, SOME people will find it cool to write complex(a,b)
when they really mean a+1j*b, but 


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Floor division

2007-01-25 Thread Nick Maclaren
Armin Rigo <[EMAIL PROTECTED]> wrote:
>
> Thanks for the clarification.  Yes, it makes sense that __mod__,
> __divmod__ and __floordiv__ on float and decimal would eventually follow
> the same path as for complex (where they make even less sense and
> already raise a DeprecationWarning).

Yes.  Though them not doing so would also make sense.  The difference
is that they make no mathematical sense for complex, but the problems
with float are caused by floating-point (and do not occur for the
mathematical reals).

There is an argument for saying that divmod should return a long
quotient and a float remainder, which is what C99 has specified for
remquo (except that it requires only the last 3 bits of the quotient
for reasons that completely baffle me).  Linux misimplemented that
the last time I looked.

Personally, I think that it is bonkers, as it is fiendishly expensive
compared to its usefulness - especially with Decimal!  But it isn't
obviously WRONG.
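
For concreteness, an illustrative (and deliberately naive)
transliteration of what C99 remquo reports - real implementations
avoid forming x/y, which loses precision badly for large exponent
differences, as the next message discusses:

def remquo_sketch(x, y, bits=3):
    # Round-to-nearest integral quotient, as IEEE-style remainder
    # uses (ties here differ from IEEE's nearest-even); x/y can
    # overflow or round for extreme arguments, hence "naive".
    n = round(x / y)
    return x - n * y, int(n) & ((1 << bits) - 1)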


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Floor division

2007-01-26 Thread Nick Maclaren
"Guido van Rossum" <[EMAIL PROTECTED]> wrote:
>
> "(int)float_or_double" truncates in C (even in K&R C) /provided that/
> the true result is representable as an int.  Else behavior is
> undefined (may return -1, may cause a HW fault, ...).

Actually, I have used Cs that didn't, but haven't seen any in over
10 years.  C90 is unclear about its intent, but C99 is specific that
truncation is towards zero.  This is safe, at least for now.

> So Python uses C's modf() for float->int now, which is always defined
> for finite floats, and also truncates.

Yes.  And that is clearly documented and not currently likely to
change, as far as I know.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Floor division

2007-01-26 Thread Nick Maclaren
"Tim Peters" <[EMAIL PROTECTED]> wrote:
>
> It could, but who would have a (sane) use for a possibly 2000-bit quotient?

Well, the 'exact rounding' camp in IEEE 754 seem to think that there
is one :-)

As you can gather, I can't think of one.  Floating-point is an inherently
inaccurate representation for anything other than small integers.

> This is a bit peculiar to me, because there are ways to compute
> "remainder" using a number of operations proportional to the log of
> the exponent difference.  It could be that people who spend their life
> doing floating point forget how to work with integers ;-)

Aargh!  That is indeed the key!  Given that I claim to know something
about integer arithmetic, too, how can I have been so STUPID?  Yes,
you are right, and that is the only plausible way to calculate the
remainder precisely.  You don't get the quotient precisely, which is
what my (insane) specification would have provided.

I would nitpick with your example, because you don't want to reduce
modulo 3.14 but modulo pi and therefore the modular arithmetic is
rather more expensive (given Decimal).  However, it STILL doesn't
help to make remquo useful!

The reason is that pi is input only to the floating-point precision,
and so the result of remquo for very large arguments will depend
more on the inaccuracy of pi as input than on the mathematical
result.  That makes remquo totally useless for the example you quote.

Yes, I have implemented 'precise' range reduction, and there is no
substitute for using an arbitrary precision pi value :-(

> > But it isn't obviously WRONG.
>
> For floats, fmod(x, y) is exactly congruent to x modulo y -- I don't
> think it's possible to get more right than exactly right ;-)

But, as a previous example of yours pointed out, it's NOT exactly
right.  It is also supposed to be in the range [0,y) and it isn't.
-1%1e100 is mathematically wrong on two counts.
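
Both counts are visible directly (IEEE 754 platform assumed; math.fmod
itself is exact here, which the follow-up below comes back to):

>>> import math
>>> math.fmod(-1.0, 1e100)   # exactly congruent to -1
-1.0
>>> -1 % 1e100               # neither in [0, y) nor exactly congruent
1e+100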


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Floor division

2007-01-26 Thread Nick Maclaren
"Tim Peters" <[EMAIL PROTECTED]> wrote:
>
> [Tim (misattributed to Guido)]

Apologies to both!

> > C90 is unclear about its intent,
>
> But am skeptical of that.  I don't have a copy of C90 here, but before
> I wrote that I checked Kernighan & Ritchie's seminal C book, Harbison
> & Steele's generally excellent "C: A Reference Manual" (2nd ed), and a
> web version of Plauger & Brodie's "Standard C":
>
>  http://www-ccs.ucsd.edu/c/
> 
> They all agree that the Cs they describe (all of which predate C99)
> convert floating to integral types via truncation, when possible.

I do.  Kernighan & Ritchie's seminal C book describes the Unix style
of "K&R" C - one of the reasons that ANSI/ISO had to make incompatible
changes was that many important PC and embedded Cs differed.  Harbison
and Steele is generally reliable, but not always; I haven't looked at
the last, but I would regard it suspiciously.

What C90 says is:

When a value of floating type is converted to integer type, the
fractional part is discarded.

There is other wording, but none relevant to this issue.  Now, given
the history of floating-point remainder, that is seriously ambiguous.

> > but C99 is specific that truncation is towards zero.
>
> As opposed to what?  Truncation away from zero?  I read "truncation"
> as implying toward 0, although the Plauger & Brodie source is explicit
> about "the integer part of X, truncated toward zero" for the sake of
> logic choppers ;-)

Towards -infinity, of course.  That was as common as truncation towards
zero up until the 1980s.  It was near-universal on twos complement
floating-point systems, and not rare on signed magnitude ones.  During
the standardisation of C90, the BSI tried to explain to ANSI that this
needed spelling out, but were ignored.  C99 did not add the normative
text "(i.e., the value is truncated toward zero)" because there was
no ambiguity, after all!


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Floor division

2007-01-26 Thread Nick Maclaren
"Tim Peters" <[EMAIL PROTECTED]> wrote:
>
> OTOH, I am a fan of analyzing FP operations as if the inputs were in
> fact exactly what they claim to be, which 754 went a long way toward
> popularizing.  That largely replaced mountains of idiosyncratic
> "probabilistic arguments" (and where it seemed no two debaters ever
> agreed on the "proper" approach)  with a common approach that
> sometimes allows surprisingly sharp analysis.  Since I spent a good
> part of my early career as a professional apologist for Seymour Cray's
> "creative" floating point, I'm probably much more grateful to leave
> sloppy arithmetic behind than most.

Well, I spent some of it working with code (and writing code) that
was expected to work, unchanged, on an ICL 1900, CDC 6600/7600,
IBM 370 and others.  I have seen the harm caused by the 'exact
arithmetic' mindset and so don't like it, but I agree about your
objections to the "probabilistic arguments" (which were and are
mostly twaddle).  But that is seriously off-topic.

> [remquo]  It's really off-topic for Python-Dev, so
> I didn't/don't want to belabor it.

Agreed, except in one respect.  I stand by my opinion that the C99
specification has no known PRACTICAL use (your example is correct,
but I know of no such use in a real application), and so PLEASE
don't copy it as a model for Python divmod/remainder.

> No, /Python's/ definition of mod is inexact for that example.  fmod
> (which is not Python's definition) is always exact:  fmod(-1, 1e100) =
> -1, and -1 is trivially exactly congruent to -1 modulo anything
> (including modulo 1e100).  The result of fmod(x, y) has the same sign
> as x; Python's x.__mod__(y) has the same sign as y; and that makes all
> the difference in the world as to whether the exact result is always
> exactly representable as a float.

Oops.  You're right, of course.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Python's C interface for types

2007-01-26 Thread Nick Maclaren
I have a fair amount of my binary floating-point model written,
though even of what I have done only some is debugged (and none
has been rigorously tested).  But I have hit some things that I
can't work out, and one query reduced comp.lang.python to a
stunned silence :-)

Note that I am not intending to do all the following, at least for
now, but I have had to restructure half a dozen times to match my
implementation requirements to the C interface (as I have learnt
more about Python!) and designing to avoid that is always good.

Any pointers appreciated.

I can't find any detailed description of the methods that I need
to provide.  Specifically:

Does Python use classic division (nb_divide) and inversion (nb_invert)
or are they entirely historical?  Note that I can very easily provide
the latter.

Is there any documentation on the coercion function (nb_coerce)?  It
seems to have unusual properties.

How critical is the 'numeric' property of the nb_hash function?  I
can certainly honour it, but is it worth it?

I assume that Python will call nb_richcompare if defined and 
nb_compare if not.  Is that right?

Are the inplace methods used and, if so, what is their specification?

I assume that I can ignore all of the allocation, deallocation and
attribute handling functions, as the default for a VAR object is
fine.  That seems to work.

Except for one thing!  My base type is static, but I create some
space for every derivation (and it can ONLY be used in derived form).
The space creation is done in C but the derivation in Python.  I
assume that I need a class (not instance) destructor, but what
should it do to free the space?  Call C to Py_DECREF it?

I assume that a class structure will never go away until after all
instances have gone away (unless I use Py_DECREF), so a C pointer
from an instance to something owned by the class is OK.

Is there any documentation on how to support marshalling/pickling
and the converse from C types?

I would quite like to provide some attributes.  They are 'simple'
but need code executing to return them.  I assume that means that
they aren't simple enough, and have to be provided as methods
(like conjugate).  That's what I have done, anyway.

Is there any obvious place for a reduction method to be hooked in?
That is a method that takes a sequence, all members of which must
be convertible to a single class, and returns a member of that
class.  Note that it specifically does NOT make sense on a single
value of that class.

Sorry about the length of this!


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python's C interface for types

2007-01-26 Thread Nick Maclaren
Thanks very much!  That answers most things.  Yes, I had got many
of my answers from searching the source, but there is clearly some
history there, and it isn't always clear what is current.  Here are
a few responses to the areas of confusion:

> nb_invert is used for bitwise inversion (~) and PyNumber_Invert(). It's not
> historical, it's actual.

Ah!  So it's NOT 1/x!  Not relevant to floating-point, then.

> I don't recall ever seeing useful documentation on coerce() and nb_coerce.
> I suggest not to use it; it's gone in Python 3.0 anyway.

Excellent!  Task completed :-)

> Which numeric property? the fact that it returns a C long? Or that, for
> natural numbers, it *seems* to return self? 

The latter.  hash(123) == hash(123.0) for example.  It is a real
pain for advanced formats.  Making it the same for things that compare
equal isn't a problem.
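
A sketch of how a new numeric type can honour the invariant without
hard-wiring CPython's hashing internals: delegate to the built-in hash
of an exactly-equal built-in value.  (The class is illustrative; for
values with no exact float or long equivalent - the "advanced formats"
pain - this delegation is where the real work starts.)

class MyFloat(object):
    def __init__(self, value):
        self.value = float(value)
    def __eq__(self, other):
        return self.value == other
    def __hash__(self):
        # Values that compare equal must hash equally, so borrow
        # the built-in hash of the equal built-in value.
        return hash(self.value)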

> [inplace ] I assume your floating-point type is
> immutable, so you won't have to implement them.

I haven't done anything special to flag it as such, but it is.

> Where do you allocate this space, and how do you allocate it? If it's space
> you malloc() and store somewhere in the type struct, yecchh. You should not
> just allocate stuff at the end of the type struct, as the type struct's
> layout is not under your control (we actually extend the type struct as
> needed, which is why newer features end up in less logical places at the end
> of the struct ;) I would suggest using attributes of the type instead, with
> the normal Python refcounting. That means the 'extra space' has to be an
> actual Python object, though.

PyMem_Malloc.  I can certainly make it an attribute, as the overhead
isn't large for a per-class object.  It is just a block of mutable
memory, opaque to the Python layer, and NOT containing any pointers!

> I don't you can make your own type marshallable. For pickle it's more or
> less the same as for Python types. The pickle docs (and maybe
> http://www.python.org/dev/peps/pep-0307/) probably cover what you want to
> know. You can also look at one of the complexer builtin types that support
> pickling, like the datetime types.

The only documentation I have found is how to do it in Python.  Is
that what you mean?  I will look at the datetime types.

> You can use PyGetSetDef to get 'easy' attributes with getters and setters.
> http://docs.python.org/api/type-structs.html#l2h-1020

I was put off by some of the warnings.  I will revisit it.

> There's nothing I can think of that is a natural match for that in standard
> Python methods. I would suggest just making it a classmethod.
> (dict.fromkeys is a good example of a classmethod in C.)

Thanks.  That is a useful reference.  Reductions are a problem in
many languages.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python's C interface for types

2007-01-26 Thread Nick Maclaren
Oops.  Something else fairly major I forgot to ask.  Python long.
I can't find any clean way of converting to or from this, and
would much rather not build a knowledge of long's internals into
my code.  Going via text is, of course, possible - but is not very
efficient, even using hex/octal.
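
The "via text" route amounts to something like this (Python 2 idiom,
non-negative values only) - portable, but with an obvious cost for
big numbers:

def long_to_bytes(n):
    # Render as hex text, then decode to raw bytes.
    h = '%x' % n
    if len(h) % 2:
        h = '0' + h
    return h.decode('hex')

def bytes_to_long(b):
    return long(b.encode('hex'), 16)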


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python's C interface for types

2007-01-26 Thread Nick Maclaren
Giovanni Bajo <[EMAIL PROTECTED]> wrote:
>
> I personally consider *very* important that hash(5.0) == hash(5) (and
> that 5.0 == 5, of course).

It gets a bit problematic with floating-point, when you can have
different values "exactly 5.0" and "approximately 5.0".  IEEE 754
has signed zeroes.  And so it goes.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python's C interface for types

2007-01-26 Thread Nick Maclaren
Josiah Carlson <[EMAIL PROTECTED]> wrote:
>
> See _PyLong_FromByteArray and _PyLong_AsByteArray .

Oops!  Thanks very much.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python's C interface for types

2007-01-26 Thread Nick Maclaren
Having looked into the answers a bit more deeply, I am afraid that
I am still a bit puzzled.

1) As I understand it, PyMem_Malloc won't cause trouble, but won't
be automatically freed, either, as it doesn't return a new reference.
I don't think that immediately following it by PyCObject_FromVoidPtr
(which is what I do) helps with that.  What I need is some standard
type that allows me to allocate an anonymous block of memory; yes,
I can define such a type, but that seems excessive.  Is there one?

2) _PyLong_FromByteArray and _PyLong_AsByteArray aren't in the API
and have no comments.  Does that mean that they are unstable, in the
sense that they may change behaviour in new versions of Python?
And will they be there in 3.0?

Thanks for any help, again.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Problem with signals in a single threaded application

2007-01-27 Thread Nick Maclaren
I apologise for going off-topic, but this is an explanation of why
I said that signal handling is not reliable.  The only relevance to
Python is that Python should avoid relying on signals if possible,
and try to be a little defensive if not.  Signals will USUALLY do
what is expected, but not always :-(

Anything further by Email, please.


Greg Ewing <[EMAIL PROTECTED]> wrote:
> 
> > This one looks like an oversight in Python code, and so is a bug,
> > but it is important to note that signals do NOT work reliably under
> > any Unix or Microsoft system.
> 
> That's a rather pessimistic way of putting it. In my
> experience, signals in Unix mostly do what they're
> meant to do quite reliably -- it's just a matter of
> understanding what they're meant to do.

Yes, it is pessimistic, but I am afraid that my experience is that
it is so :-(  That doesn't deny your point that they MOSTLY do
'work', but car drivers MOSTLY don't need to wear seat belts, either.
I am talking about high-RAS objectives, and ones where very rare
failure modes can become common (e.g. HPC and other specialist uses).

More commonly, there are plain bugs in the implementations which are
sanctioned by the standards (Linux is relatively disdainful of such
legalistic games).  Because the standards say that everything is
undefined behaviour, many vendors' support mechanisms will refuse to
accept bug reports unless you push like hell.  And, as some of these
bugs are DIABOLICALLY difficult to explain, let alone demonstrate,
they can remain lurking for years or decades.

> There may be bugs in certain systems that cause
> signals to get lost under obscure circumstances, but
> that's no reason for Python to make the situation
> worse by introducing bugs of its own.

100% agreed.

> > Two related signals received between two 'checkpoints' (i.e. when
> > the signal is tested and cleared).  You may only get one of them,
> > and 'related' does not mean 'the same'.
> 
> I wasn't aware that this could happen between
> different signals. If it can, there must be some
> rationale as to why the second signal is considered
> redundant. Otherwise there's a bug in either the
> design or the implementation.

Nope.  There is often a clash between POSIX and the hardware, or
a case where a 'superior' signal overrides an 'inferior' one.
I have seen SIGKILL flush some other signals, for example.  And, on
some systems, SIGFPE may be divided into the basic hardware exceptions.
If you catch SIGFPE as such, all of those may be cleared.  I don't
think that many (any?) current systems do that.

And it is actually specified to occur for the SIGSTOP, SIGTSTP,
SIGTTIN, SIGTTOU, SIGCONT group.
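
The simpler case of two IDENTICAL signals coalescing is easy to
demonstrate.  The sketch below uses signal.pthread_sigmask, which
appeared in Python long after this thread, so treat it as purely
illustrative:

    import os, signal, time

    count = 0
    def handler(signum, frame):
        global count
        count += 1

    signal.signal(signal.SIGUSR1, handler)
    # Block SIGUSR1, send it twice, then unblock: classic (non-queued)
    # Unix semantics keep at most ONE pending instance of the signal.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    os.kill(os.getpid(), signal.SIGUSR1)
    os.kill(os.getpid(), signal.SIGUSR1)
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
    time.sleep(0.1)
    print(count)   # typically 1, not 2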

> > A second signal received while the first is being 'handled' by the
> > operating system or language run-time system.
> 
> That one sounds odd to me. I would expect a signal
> received during the execution of a handler to be
> flagged and cause the handler to be called again
> after it returns. But then I'm used to the BSD
> signal model, which is relatively sane.

It's nothing to do with the BSD model, which may be saner but still
isn't 100% reliable, but occurs at a lower layer.  At the VERY lowest
level, when a genuine hardware event causes an interrupt, the FLIH
(first-level interrupt handler) runs in God mode (EVERYTHING disabled)
until it classifies what is going on.  This is a ubiquitous misdesign
of modern hardware, but that is off-topic.  Hardware 'signals' from
other CPUs/devices may well get lost if they occur in that window.

And there are other, but less extreme, causes at higher levels in the
operating system.  Unix and Microsoft do NOT have a reliable signal
delivery model, where the sender of a signal checks if the recipient
has got it and retries if not.  Some operating systems do - but I don't
think that BSD does.

> > A signal sent while the operating system is doing certain things to
> > the application (including, sometimes, when it is swapped out or
> > deep in I/O.)
> 
> That sounds like an outright bug. I can't think
> of any earthly reason why the handler shouldn't
> be called eventually, if it remains installed and
> the process lives long enough.

See above.  It gets lost at a low level.  That is why you can cause
serious time drift on an "IBM PC" (most modern ones) by hammering
the video card or generating streams of floating-point fixups.  Most
people don't notice, because xntp or equivalent fixes it up.

And there are worse problems.  I could start on cross-CPU TLB and ECC
handling on large shared memory systems.  I managed to get an Origin
in a state where it wouldn't even power down from the power-off button,
and I had to flip the breaker.

Re: [Python-Dev] Python's C interface for types

2007-01-27 Thread Nick Maclaren
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
> [not sure what "And so it goes" means in English]

I apologise.  I try to restrain myself from using excessive idiom,
but sometimes I forget.  It means "That is how things are, and there
is and will be more of the same."

> It may be a bit problematic to implement, but I think a clean
> specification is possible. If a and b are numbers, and a==b,
> then hash(a)==hash(b). I'm not sure whether "approximately 5.0"
> equals 5 or not: if it does, it should hash the same as 5,
> if it doesn't, it may or may not hash the same (whatever is
> easier to implement).
> For 0: hash(+0.0)==hash(-0.0)==hash(0)=hash(0L)=0

Unfortunately, that assumes that equality is transitive.  With the
advanced floating-point models, it may not be.  For example, if you
want to avoid the loss of error information, exact infinity and
approximate infinity (the result of overflow) have different
semantics.  Similarly with infinitesimals.

Even at present, Python's float (Decimal probably more so) doesn't
allow you to do some things that are quite reasonable.  For example,
let us say that I am implementing a special function and want to
distinguish -0.0 and +0.0.  Why can't I use a dictionary?

>>> a = float("+0.0")
>>> b = float("-0.0")
>>> print a, b
0.0 -0.0
>>> c = {a: "+0.0", b: "-0.0"}
>>> print c[a], c[b]
-0.0 -0.0

Well, we all know why.  But it is not what some quite reasonable
programmers will expect.  And Decimal (with its cohorts and variant
precisions) has this problem quite badly - as do I.
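
(If the distinction is genuinely needed, the usual workaround is to
fold the sign of the zero into the key by hand.  A small sketch -
math.copysign is a later addition, and the helper name is invented:

    from math import copysign

    def zero_aware(x):
        # Include the sign of the zero in the key, since 0.0 == -0.0
        # but copysign(1.0, -0.0) is -1.0.
        return (x, copysign(1.0, x))

    a, b = float("+0.0"), float("-0.0")
    c = {zero_aware(a): "+0.0", zero_aware(b): "-0.0"}
    print(c[zero_aware(a)], c[zero_aware(b)])   # +0.0 -0.0

But reasonable programmers should not have to know that trick.)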

No, I don't have an answer.  You are damned if you do, and damned
if you don't.  It is an insoluble problem, and CURRENTLY doesn't
justify two hashing mechanisms (i.e. ANY difference and EQUALITY
difference).



Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


[Python-Dev] Python's C interface for types

2007-02-01 Thread Nick Maclaren
"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> 
> >> For 0: hash(+0.0)==hash(-0.0)==hash(0)=hash(0L)=0
> 
> > Unfortunately, that assumes that equality is transitive.
> 
> No, but the (transitively closed set of equivalent objects) must have
> the same hash.  ...

Er, how do you have a transitive closure for a non-transitive operation?

I really do mean that quite a lot of floating-point bells and whistles
are non-transitive.  The only one most people will have come across is
IEEE NaNs, where 'a is b' does not imply 'a == b', but there are a
lot of others (and have been since time immemorial).  I don't THINK
that IEEE 754R decimal introduces any, though I am not prepared to
bet on it.

> > let us say that I am implementing a special function and want to
> > distinguish -0.0 and +0.0.  Why can't I use a dictionary?
> 
> Because they are equal.  They aren't identical, but they are equal.

You have missed my point, which is that extended floating-points
effectively downgrade the status of the purely numeric comparisons, and
therefore introduce a reasonable requirement for using a tighter match.
Note that I am merely commenting that this needs bearing in mind, and
NOT that anything should be changed.

> >>>> a = float("+0.0")
> >>>> b = float("-0.0")
> >>>> print a, b
> >0.0 -0.0
> 
> With the standard windows distribution, I get just
> 
> 0.0 0.0

Watch that space :-)  Expect it to change.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Python's C interface for types

2007-02-01 Thread Nick Maclaren
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
> > I really do mean that quite a lot of floating-point bells and whistles
> > are non-transitive.
>
> If so, they just shouldn't use the equal operator (==). == ought to
> be transitive. It should be consistent with hash().

Fine.  A very valid viewpoint.  Would you like to explain that to
the IEEE 754 people?

Strictly, it is only the reflexive property that IEEE 754 and the
Decimal module lack.  Yes, A == A is False, if A is a NaN.  But
the definition of 'transitive' often requires 'reflexive'.

>>> from decimal import *
>>> x = Decimal("NaN")
>>> x == x
False

I don't know any CURRENT systems where basic floating-point doesn't
have the strict transitive relation, but I wouldn't bet that there
aren't any.  You don't need to extend floating-point to have trouble;
even the basic forms often had it.  I sincerely hope that one is dead,
but people keep reinventing old mistakes :-(

The most common form was where comparison was equivalent to subtraction,
and there were numbers such that A-B == 0, B-C == 0 but A-C != 0.  That
could occur even for integers on some systems.  I don't THINK that the
Decimal specification has reintroduced this, but I am not quite sure.
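
The hazard is easy to reproduce whenever 'equal' means 'the difference
rounds to zero'.  A toy illustration, with an explicit tolerance
standing in for the flushed subtraction:

    def eq(a, b, tol=0.75):
        # Stand-in for comparison-by-subtraction with underflow flushing.
        return abs(a - b) <= tol

    a, b, c = 0.0, 0.5, 1.0
    print(eq(a, b), eq(b, c), eq(a, c))   # True True False: not transitive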

> > You have missed my point, which is extended floating-points effectively
> > downgrade the status of the purely numeric comparisons, and therefore
> > introduce a reasonable requirement for using a tighter match.  Note
> > that I am merely commenting that this needs bearing in mind, and NOT
> > that anything should be changed.
> 
> If introducing extended floating-points would cause trouble to existing
> operations, I think extended floating-points should not be introduced
> to Python. If all three of you really need them, come up with method
> names to express "almost equal" or "equal only after sunset".

Fine.  Again, a very valid viewpoint.  Would you like to explain it
to the IEEE 754, Decimal and C99 people, and the Python people who
think that tracking C is a good idea?

We already have the situation where A == B == 0, but where
'C op A' != 'C op B' != 'C op 0'.  Both where op is a built-in
operator and where 'C op' is a standard library function.

This one is NOT going to go away, and is going to get more serious,
especially if extended floating-point formats like Decimal take off.
Note that it is not a fault in Decimal, but a feature of almost all
extended floating-points.  As I said, I have no answer to it.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Python's C interface for types

2007-02-01 Thread Nick Maclaren
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> 
> >> If so, they just shouldn't use the equal operator (==). == ought to
> >> be transitive. It should be consistent with hash().
> > 
> > Fine.  A very valid viewpoint.  Would you like to explain that to
> > the IEEE 754 people?
> 
> Why should I? I don't talk about IEEE 754, I talk about Python.

The problem is that Python is increasingly assuming IEEE 754 by
implication, and you were stating something as a requirement that
isn't true in IEEE 754.

> > Strictly, it is only the reflexive property that IEEE 754 and the
> > Decimal module lack.  Yes, A == A is False, if A is a NaN.  But
> > the definition of 'transitive' often requires 'reflexive'.
> 
> I deliberately stated 'transitive', not 'reflexive'. The standard
> definition of 'transitive' is "if a==b and b==c then a==c".

When I was taught mathematics, the lecturer said that a transitive
relation is a reflexive one that has that extra property.  It was
then (and may still be) a fairly common usage.  I apologise for being
confusing!

> > The most common form was where comparison was equivalent to subtraction,
> > and there were numbers such that A-B == 0, B-C == 0 but A-C != 0.  That
> > could occur even for integers on some systems.  I don't THINK that the
> > Decimal specification has reintroduced this, but am not quite sure.
> 
> I'm not talking about subtraction, either. I'm talking about == and
> hash.

Grrk.  Look again.  So am I.  But let this one pass, as I don't think
that mistake will return - and I sincerely hope not!

> > Fine.  Again, a very valid viewpoint.  Would you like to explain it
> > to the IEEE 754, Decimal and C99 people, and the Python people who
> > think that tracking C is a good idea?
> 
> I'm not explaining anything. I'm stating an opinion.

You are, however, stating an opinion that conflicts with the direction
that Python is currently taking.

> It doesn't look like you *need* to give an answer now. I thought
> you were proposing some change to Python (although I'm uncertain
> what that change could have been). If you are merely explaining
> things (to whom?), just keep going.

Thanks.  I hope the above clarifies things a bit.  My purpose in
posting is to point out that some changes are already happening,
by inclusion from other standards, that are introducing problems
to Python.  And to many other languages, incidentally, including
Fortran and C.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


[Python-Dev] Python's C interface for types

2007-02-02 Thread Nick Maclaren
"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> 
> > Fine.  A very valid viewpoint.  Would you like to explain that to
> > the IEEE 754 people?
> 
> When Decimal was being argued, Tim pointed out that the standard
> requires certain operations, but doesn't require specific spelling
> shortcuts.  If you managed to do (and document) it right, people would
> be grateful for methods like
> 
> a.exactly(b)
> a.close_enough(b)
> a.same_expected_value(b)
> 
> but that doesn't mean any of them should be used when testing a==b

Hmm.  That is misleading, as you state it.  IEEE 754R doesn't include
specific spellings, but IEEE 754 assuredly does.  For example, it
states that the equality operator that delivers False for NaN = NaN
is spelled .EQ. in Fortran.

There was no C standard at the time, but the 'ad hoc' spellings are
clearly intended for C-like languages, and C99 is very clear that
the above equality operator is spelled '=='.

However, there is no requirement that Python uses those names.  What
IS important is (a) that the comparisons are consistent, (b) that
IEEE 754 (and IEEE 754R) define no reflexivity-preserving equality
operator and (c) that the current float type derives its comparisons
from C.

> (In Lisp, you typically can specify which equality predicate a
> hashtable should use on pairs of keys; in python, you only specify
> which it should use on objects of your class, and if the other object
> in the comparison disagrees, you're out of luck.)

Yup.

> > Strictly, it is only the reflexive property that IEEE 754 and the
> > Decimal module lack.  Yes, A == A is False, if A is a NaN.
> 
> Therefore NaNs should never be used (in python) as dictionary keys.
> Therefore, they should be unhashable.

Again, a very valid point.  Are you suggesting a change? :-)

Currently, on my Linux system, Decimal raises an exception when trying
to hash a NaN value but float doesn't.  Is that a bug?
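
The difference is easy to show; the exact behaviour has varied between
Python versions, so treat this as illustrative:

    from decimal import Decimal

    for value in (float("nan"), Decimal("NaN")):
        try:
            print(type(value).__name__, "hashes to", hash(value))
        except TypeError as exc:
            # Decimal NaN was unhashable on the interpreters under
            # discussion here; float NaN has always hashed.
            print(type(value).__name__, "is unhashable:", exc)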

> Also note that PyObject_RichCompareBool (from Objects/object.c)
> assumes the reflexive property, and if you try to violate it, you will
> get occasional surprises.

Oh, yes, indeed!
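
The classic surprise, for anyone who has not met it:

    nan = float("nan")
    print(nan == nan)      # False
    print(nan in [nan])    # True: the membership test short-circuits on
                           # identity, i.e. it assumes reflexivity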

> > We already have the situation where A == B == 0, but where
> > 'C op A' != 'C op B' != 'C op 0'.  Both where op is a built-in
> > operator and where 'C op' is a standard library function.
> 
> That's fine; it just means that numeric equality may not be the
> strongest possible equivalence.  hash in particular just happens to be
> defined in terms of ==, however == is determined.

NO!!!  What it means is that the equality operator may not be the
strongest numeric equivalence!  A much stronger statement.

As I said, I am not grinding an axe, and have no answers.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] generic async io (was: microthreading vs. async io)

2007-02-15 Thread Nick Maclaren
[EMAIL PROTECTED] wrote:
>
> I think this discussion would be facilitated by teasing the first
> bullet-point from the latter two: the first deals with async IO, while
> the latter two deal with cooperative multitasking.
> 
> It's easy to write a single package that does both, but it's much harder
> to write *two* fairly generic packages with a clean API between them,
> given the varied platform support for async IO and the varied syntax and
> structures (continuations vs. microthreads, in my terminology) for
> multitasking.  Yet I think that division is exactly what's needed.

Hmm.  Now, please, people, don't take offence, but I don't know how
to phrase this tactfully :-(

The 'threading' approach to asynchronous I/O was found to be a BAD
IDEA back in the 1970s, was abandoned in favour of separating
asynchronous I/O from threading, and God alone knows why it was
reinvented - except that most of the people with prior experience
had died or retired :-(

Let's go back to the days when asynchronous I/O was the norm, and
I/O performance critical applications drove the devices directly.
In those days, yes, that approach did make sense.  But it rapidly
ceased to do so with the advent of 'semi-intelligent' devices and
the virtualisation of I/O by the operating system.  That was in
the mid-1970s.  Nowadays, ALL devices are semi-intelligent and no
system since Unix has allowed applications direct access to devices,
except for specialised HPC and graphics.

We used to get 90% of theoretical peak performance on mainframes
using asynchronous I/O from clean, portable applications, but it
was NOT done by treating the I/O as threads and controlling their
synchronisation by hand.  In fact, quite the converse!  It was done
by realising that asynchronous I/O and explicit threading are best
separated ENTIRELY.  There were two main models:

Streaming, as in most languages (Fortran, C, Python, but NOT in
POSIX).  The key properties here are that the transfer boundaries
have no significance, only heavyweight synchronisation primitives
(open, close etc.) provide any constraints on when data are actually
transferred and (for very high performance) buffers are unavailable
from when a transfer is started to when it is checked.  If copying
is acceptable, the last constraint can be dropped.

In the simple case, this allows the library/system to reblock and
perform transfers asynchronously.  In the more advanced case, the
application has to use multiple buffering (at least double), but
can get full performance without any form of threading.  IBM MVT
applications used to get up to 90% without hassle in parallel with
computation and using only a single thread (well, there was only a
single CPU, anyway).
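
Double buffering of that kind can be sketched in modern Python, with a
one-worker pool standing in for the asynchronous channel; the worker
stays hidden behind a streaming call, which is exactly the separation
being argued for here.  All names are invented:

    from concurrent.futures import ThreadPoolExecutor

    def produce(i):
        return b"x" * 4096            # stand-in for real computation

    def write_stream(out, nblocks):
        # Compute block i while block i-1 is still draining; a buffer
        # is unavailable from the start of its transfer until checked.
        with ThreadPoolExecutor(max_workers=1) as pool:
            pending = None
            for i in range(nblocks):
                block = produce(i)
                if pending is not None:
                    pending.result()
                pending = pool.submit(out.write, block)
            if pending is not None:
                pending.result()

    with open("demo.bin", "wb") as f:
        write_stream(f, 8)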

The other model is transactions.  This has the property that there
is a global commit primitive, and the order of transfers is undefined
between commits.  Inter alia, it means that overlapping transfers
are undefined behaviour, whether in a single thread or in multiple
threads.  BSP uses this model.

The MPI-2 design team included a lot of ex-mainframe people and
specifies both models.  While it is designed for parallel applications,
the I/O per se is not controlled like threads.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] generic async io (was: microthreading vs. async io)

2007-02-15 Thread Nick Maclaren
[EMAIL PROTECTED] wrote:
>
> Knowing the history of something like this is very helpful, but I'm not
> sure what you mean by this first paragraph.  I think I'm most unclear
> about the meaning of "The 'threading' approach to asynchronous I/O"?
> Its opposite ("separating asynchronous I/O from threading") doesn't
> illuminate it much more.  Could you elaborate?

I'll try.  Sorry about being unclear - it is one of my failings.
Here is an example draft of some interfaces:

Threading
---------

An I/O operation passes a buffer, length, file and action and receives a
token back.  This token can be queried for completion, waited on and so
on, and is cancelled by waiting on it and getting a status back.  I.e.
it is a thread-like object.  This is the POSIX-style operation, and is
what I say cannot be made to work effectively.

Streaming
---------

An I/O operation either writes some data to a stream or reads some data
from it; such actions are sequenced within a thread, but not between
threads (even if the threads coordinate their I/O).  Data written goes
into limbo until it is read, and there is no way for a reader to find
the block boundaries it was written with or whether data HAS been
written.  A non-blocking read merely tests if data are ready for
reading, which is not the same.

There are no positioning operations, and only open, close and POSSIBLY a
heavyweight synchronise or rewind (both equivalent to close+open) force
written data to be transferred.  Think of Fortran sequential I/O without
BACKSPACE or C I/O without ungetc/fseek/fsetpos.

Transactions
------------

An I/O operation either writes some data to a file or reads some data
from it.  There is no synchronisation of any form until a commit.  If
two transfers between a pair of commits overlap (including file length
changes), the behaviour is undefined.  All I/O includes its own
positioning, and no positioning is relative.
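
Rendered as Python interface stubs, the three models might look like
this (purely illustrative; none of these names is a real API):

    from abc import ABC, abstractmethod

    class ThreadingIO(ABC):
        # The POSIX-style interface argued against above.
        @abstractmethod
        def start(self, buf, length, file, action):
            """Begin a transfer; return a token to query, wait on or cancel."""

    class StreamIO(ABC):
        # No positioning; only open/close (or rewind) force transfers.
        @abstractmethod
        def read(self, nbytes): ...
        @abstractmethod
        def write(self, data): ...
        @abstractmethod
        def close(self): ...

    class TransactionIO(ABC):
        # Every transfer is self-positioning; ordering is undefined
        # between commits.
        @abstractmethod
        def read_at(self, offset, nbytes): ...
        @abstractmethod
        def write_at(self, offset, data): ...
        @abstractmethod
        def commit(self): ...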


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] generic async io (was: microthreading vs. async io)

2007-02-16 Thread Nick Maclaren
Greg Ewing <[EMAIL PROTECTED]> wrote:
>
> > An I/O operation passes a buffer, length, file and action and receives a
> > token back.
> 
> You seem to be using the word "threading" in a completely
> different way than usual here, which may be causing some
> confusion.

Not really, though I may have been unclear again.  Here is why that
approach is best regarded as a threading concept:

Perhaps the main current approach to asynchronous I/O is to implement
it with threads: the main thread carries on with the computation, and
the I/O threads transfer the data synchronously on its behalf.  The
reason that a token is needed is to avoid a synchronous data copy that
would block both threads.

My general point is that all experience is that asynchronous I/O is
best done by separating it completely from threads, and defining a
proper asynchronous but NOT threaded interface.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


[Python-Dev] Class destructor

2007-02-28 Thread Nick Maclaren
I am gradually making progress with my binary floating-point software,
but have had to rewrite several times as I have forgotten most of the
details of how to do it!  After 30 years, I can't say I am surprised.

But I need to clean up workspace when a class (not object) is
deallocated.  I can't easily use attributes, as people suggested,
because there is no anonymous storage built-in type.  I could subvert
one of the existing storage types (buffer, string etc.), but that is
unclean.  And I could write one, but that is excessive.

So far, I have been unable to track down how to get something called
when a class is destroyed.  The obvious attempts all didn't work, in
a variety of ways.  Surely there must be a method?  This could be in
either Python or C.

Thanks.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Class destructor

2007-02-28 Thread Nick Maclaren
"Phillip J. Eby" <[EMAIL PROTECTED]> wrote:
> 
> >But I need to clean up workspace when a class (not object) is
> >deallocated.  I can't easily use attributes, as people suggested,
> >because there is no anonymous storage built-in type.  I could subvert
> >one of the existing storage types (buffer, string etc.), but that is
> >unclean.  And I could write one, but that is excessive.
> >   
> >So far, I have been unable to track down how to get something called
> >when a class is destroyed.  The obvious attempts all didn't work, in
> >a variety of ways.  Surely there must be a method?  This could be in
> >either Python or C.
> 
> Have you tried a PyCObject?  This is pretty much what they're for:

Oh, yes, I use them in several places, but they don't really help.

Their first problem is that they take a 'void *' and not a request
for space, so I have to allocate and deallocate the space manually.
Now, I could add a destructor to each of them and do that, but it
isn't really much prettier than subverting one of the semi-generic
storage types for an improper purpose!

It would be a heck of a lot cleaner to deallocate all of my space
in exactly the converse way that I allocate and initialise it.  It
would also allow me to collect and log statistics, should I so choose.
This could be VERY useful for tuning!  I haven't done that, yet, but
might well do so.

All in all, what I need is some way to get a callback when a class
object is destroyed.  Well, actually, any time between its last use
for object work and the time that its space is reclaimed - I don't
need any more precise timing than that.

I suppose that I could add a C object as an attribute that points to
a block of memory that contains copies of all my workspace pointers,
and use the object deallocator to clean up.  If all else fails, I
will try that, but it seems a hell of a long way round for what I
would have thought was a basic requirement.
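
(For what it is worth, the much later weakref.finalize gives exactly
such a callback at the Python level, without a custom metaclass.  A
sketch with invented names:

    import gc, weakref

    def make_class():
        class Derived(object):
            pass
        # Run a callback some time after the class object becomes garbage.
        weakref.finalize(Derived, print, "workspace for Derived reclaimed")
        return Derived

    cls = make_class()
    del cls
    gc.collect()   # class objects sit in reference cycles, so collect()

Nothing like it existed at the time of this thread.)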


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Class destructor

2007-02-28 Thread Nick Maclaren
"Phillip J. Eby" <[EMAIL PROTECTED]> wrote:
>
> Well, you could use a custom metaclass with a tp_dealloc or whatever.

Yes, I thought of that, but a custom metaclass to provide one callback
is pretty fair overkill!

> But I just mainly meant that a PyCObject is almost as good as a weakref
> for certain purposes -- i.e. it's got a pointer and a callback.

Ah.  Yes.  Thanks for suggesting it - if it is the simplest way, I
may as well do it.

> You could of course also use weak references, but that's a bit more
> awkward as well.

Yes.  And they aren't a technology I have used (in Python), so I would
have to find out about them.  Attributes etc. I have already played
with.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Class destructor

2007-03-02 Thread Nick Maclaren
"Guido van Rossum" <[EMAIL PROTECTED]> wrote:
>
> Can you explain the reason for cleaning up in this scenario? Are you
> rapidly creating and destroying temporary class objects? Why can't you
> rely on the regular garbage collection process? Or does your class
> create an external resource like a temp file?

Effectively the latter.  The C level defines a meta-class, which is
instantiated with a specific precision, range etc. to derive the class
that can actually be used.  There can be an arbitrary number of such
derived classes, with different properties.  Very like Decimal, but
with the context as part of the derived class.

The instantiation creates quite a lot of constants and scratch space,
some of which are Python objects but others of which are just Python
memory (PyMem_Malloc); this is where an anonymous storage built-in
type would be useful.  The contents of these are of no interest to
any Python code, and even the objects are ones which mustn't be
accessed by the exported interfaces.

Also, on efficiency grounds, all of those need to be accessible by
C pointers from the exported class.  Searching by name every time they
are needed is far too much overhead.  Note that, as with Decimal, the
issue is that they are arbitrary sized and therefore can't simply be
put in the class structure.
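
At the Python level, the shape of the arrangement is roughly this (a
toy sketch; every name and field is invented):

    def make_float_class(precision, emin, emax):
        # One derived class per (precision, range), with per-class
        # constants and scratch space hung off the class itself.
        scratch = {"digits": [0] * precision}
        class BoundedFloat(object):
            PRECISION, EMIN, EMAX = precision, emin, emax
            _scratch = scratch           # opaque workspace, not for users
            def __init__(self, value):
                self.value = value       # real code would round to spec here
        BoundedFloat.__name__ = "BoundedFloat_p%d" % precision
        return BoundedFloat

    Single = make_float_class(24, -126, 127)
    print(Single.__name__, Single(1.5).value)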

Now, currently, I have implemented the suggestion of using the callback
on the C object that points to the structure that contains the pointers
to all of those.  I need to investigate it in more detail, because I
have had mixed success - that could well be the result of another bug
in my code, so let's not worry about it.

In THIS case, I am now pretty sure that I don't need any more, but I
can imagine classes where it wouldn't be adequate.  In particular,
THIS code doesn't need to do anything other than free memory, so I
don't care whether the C object attribute callback is called before
or after the class object is disposed of.  But that is obviously not
the case in general.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Class destructor

2007-03-02 Thread Nick Maclaren
Sorry about a second message, but I mentioned this aspect earlier,
and it is semi-independent.

If I want to produce statistics, such as the times spent in various
operations, I need a callback when the class is disposed of.  Now,
it WOULD be inconvenient to use the C object attribute callback for
that, unless I could be sure it would be called while the class
structure is still around.  That could be resolved by taking a copy,
of course, but that is messy.

This also relates to one of my problems with the callback.  I am not
being called back if the class is still live at program termination;
ones that have had their use counts drop to zero do cause a callback,
but not ones whose use count is above zero.  I am not sure whether
this is my error or a feature of the garbage collector.

If the latter, it doesn't matter from the point of view of freeing
space, but is assuredly a real pain for producing statistics.  I
haven't looked into it, as it is not an immediate task.
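
(For the statistics, one workaround is to register a dump at
interpreter exit rather than relying on the destruction callback.  A
sketch with invented names:

    import atexit

    _live_classes = []

    def track(cls):
        _live_classes.append(cls)
        return cls

    @track
    class Timings(object):
        calls = 0

    @atexit.register
    def _dump_stats():
        # Runs even for classes still alive at shutdown.
        for cls in _live_classes:
            print(cls.__name__, "calls:", cls.calls)

That sidesteps the use-count question entirely.)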


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


[Python-Dev] except Exception as err, tb [was: with_traceback]

2007-03-02 Thread Nick Maclaren
"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> 
> > Since this can conceivably be going on in parallel in multiple
> > threads, we really don't ever want to be sharing whatever object
> > contains the head of the chain of tracebacks since it mutates at every
> > frame bubble-up.
> 
> So (full) exceptions can't be unitary objects.
> 
> In theory, raising an already-instantiated instance could indicate "no
> traceback", which could make pre-cooked exceptions even lighter.

Grrk.  I think that this is right, but the wrong way to think of it!

If we regard a kind of exception as a class, and an actual occurrence
as an instance, things become a lot cleaner.  The class is very
simple, because all it says is WHAT happened - let's say divide by
zero, or an attempt to finagle an object of class chameleon.

The instance contains all of the information about the details, such
as the exact operation, the values and the context (including the
traceback).  It CAN'T be an object, because it is not 'assignable'
(i.e. a value) - it is inherently bound to its context.  You can
turn it into an object by copying its context into an assignable form,
but the actual instance is not assignable.
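
The Py3k semantics under discussion make the point concrete: raising
mutates the instance by binding a traceback to it, which is why
sharing a pre-cooked instance is delicate.  An illustrative sketch:

    exc = ValueError("pre-cooked")
    print(exc.__traceback__)                 # None: no context yet
    try:
        raise exc
    except ValueError as caught:
        print(caught is exc)                 # True: the very same object...
    print(exc.__traceback__ is not None)     # ...now bound to its context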

This becomes VERY clear when you try to implement advanced exception
handling - rare nowadays - including the ability to trap exceptions,
fix up the failure and continue (especially in a threaded environment).
This makes no sense whatsoever in another context, and it becomes
clear that the action of turning an instance into an object disables
the ability to fix up the exception and continue.  You can still
raise a Python-style exception (i.e. abort up to the closest handler),
but you can't resume transparently.

I have implemented such a system; IBM CEL was one, and VMS had/has
another.  I don't know of any in the Unix or Microsoft environments,
but there may be a few in specialised areas.

Harking back to your point, your "already-instantiated instance"
is actually an object derived directly from the exception class, and
everything becomes clear.  Because it is an object, any context it
includes was a snapshot and is no longer valid.  In your case, you
would want it to have "context: unknown".


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Access to bits for a PyLongObject

2007-03-06 Thread Nick Maclaren
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Eric V. Smith schrieb:
> > I'm working on PEP 3101, Advanced String Formatting.  About the only 
> > built-in numeric formatting I have left to do is for converting a 
> > PyLongOjbect to binary.
> >
> > I need to know how to access the bits in a PyLong.  
>
> I think it would be a major flaw in PEP 3101 if you really needed it.
> The long int representation should be absolutely opaque - even the
> fact that it is a sign+magnitude representation should be hidden.

Well, it depends on the level for which PEP 3101 is intended.  I had
the same problem, and was pointed at _PyLong_AsByteArray, which was
all I needed.  In my case, going though a semi-generic formatter
would not have been an acceptable interface (performance).

However, if PEP 3101 is intended to be a higher level of formatting,
then I agree with you.  So I have nailed my colours firmly to the
fence :-)


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Status of thread cancellation

2007-03-19 Thread Nick Maclaren
Grrk.  I have done this myself, and been involved in one of the VERY
few commercial projects that attempted to do it properly (IBM CEL,
the other recent one being VMS).  I am afraid that there are a lot
of misapprehensions here.

Several people have said things like:

> The thing to model this on, I think, would be the
> BSD sigmask mechanism, which lets you selectively
> block certain signals to create a critical section
> of code. A context manager could be used to make
> its use easier and less error-prone (i.e. harder
> to block async exceptions and then forget to unblock
> them).

No, no, no!  That is TRULY horrible!  It works fairly well for
things like device drivers, which are both structurally simple and
with no higher level recovery mechanism, so that a failure turning
into a hard hang is not catastrophic.  But it is precisely what you
DON'T want for complex applications, especially when a thread may
need to call an external service 'non-interruptibly'.

Think of updating a complex object in a multi-file database, for
example.  Interrupting half-way through leaves the database in a
mess, but blocking interrupts while (possibly remote) file updates
complete is asking for a hang.  You also see it in horrible GUI
(including raw mode text) programs that won't accept interrupts
until you have completed the action they think you have started.
One of the major advantages of networked systems is that you can
usually log in remotely and kill -9 the damn process!



The way that I, IBM and DEC approached it was by the classic
callback mechanism, with a carefully designed way of promoting
unhandled exceptions/interrupts.  For example, the following is
roughly what I did (somewhat extended, as I didn't do all of this
for all exceptions):

An event set a defined flag, which could be tested (and cleared) by
the thread.  If a second, similar event arrived (or it was not
handled after a suitable time), the event was escalated.

If so, a handler was called that HAD to return (again within a
specific time).  If a second, similar event arrived or it didn't
return by a suitable time, the event was escalated.

If so, another handler was called that COULDN'T return.  If another
event arrived, it returned, or it failed to close down the thread,
the event was escalated.

If so, the thread's built-in environment was closed down without
giving the thread a chance to intervene.  If that failed, the event
was escalated.

If so, the thread was frozen and process termination started.  If
clean termination failed, the event was escalated.

If so, the run-time system produced a dump and killed itself.
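
A toy rendering of that ladder for a single signal, with all the
thresholds and actions invented for illustration:

    import signal, sys

    class Escalator(object):
        def __init__(self):
            self.level = 0
            self.flag = False      # polite request: code should test this
        def event(self, signum, frame):
            self.level += 1
            if self.level == 1:
                self.flag = True   # stage 1: set a testable flag
            elif self.level == 2:
                self.cleanup()     # stage 2: forced, must-return handler
            elif self.level == 3:
                sys.exit(1)        # stage 3: close the thread/process down
            else:
                # stage 4: give up and let the default action happen
                signal.signal(signum, signal.SIG_DFL)
        def cleanup(self):
            print("second event: running forced cleanup")

    esc = Escalator()
    signal.signal(signal.SIGINT, esc.event)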



You can implement a BSD-style ignore by having an initial handler
that just clears the flag and returns, but a third interrupt before
it does so will force close-down.  There was also a facility to
escalate an exception at the point of generation, which could be
useful for forcible closedown.

There are a zillion variations of the above, but all mainframe
experience is that callbacks are the only sane way to approach the
problem IN APPLICATIONS.  In kernel code, that is not so, which is
why so many of the computer scientists design BSD-style handling
(i.e. they think of kernel programming rather than very complex
application programming).



> Unconditionally killing a whole process is no big
> problem because all the resources it's using get
> cleaned up by the OS, and the effect on other
> processes is minimal and well-defined (pipes and
> sockets get EOF, etc.). But killing a thread can
> leave the rest of the program in an awkward state.

I wish that were so :-(

Sockets, terminals etc. are stateful devices, and killing a process
can leave them in a very unclean state.  It is one of the most
common causes of unkillable processes (the process can't go until
its files do, and the socket is jammed).  Many people can witness
the horrible effects of ptys being left in 'echo off' or worse
states, the X focus being left in a stuck override redirect window
and so on.

But you also have the multi-file database problem, which also applies
to shared memory segments.  Even if the process dies cleanly, it may
be part of an application whose state is global across many processes.
One common example is adding or deleting a user, where an unclean
kill can leave the system in a very weird state.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Status of thread cancellation

2007-03-19 Thread Nick Maclaren
Jon Ribbens <[EMAIL PROTECTED]> wrote:
> 
> Can you elaborate on this? You can get zombie entries in the process
> table if nobody's called 'wait()' on them, and you can (extremely
> rarely) get unkillable process in 'disk-wait' state (usually due to
> hardware failure or a kernel bug, I suspect), but I've never heard
> of a process on a Unix-like system being unkillable due to something
> to do with sockets (or any other kind of file descriptor for that
> matter). How could a socket be 'jammed'? What does that even mean?

Well, I have seen it hundreds of times on a dozen different Unices;
it is very common.  You don't always SEE the stuck process - sometimes
the 'kill -9' causes the pid to become invisible to ps etc., and
just occasionally it can continue to use CPU until the system is
rebooted.  That is rare, however, and it normally just hangs onto
locks, memory and other such resources.  Very often its vampiric
status is visible only because such things haven't been freed,
or when you poke through kernel structures.

Sockets get jammed because they are used to connect to subprocesses
or kernel threads, which in turn access unreliable I/O devices.  If
there is a glitch on the device, the error recovery very often fails
to work, cleanly, and may wait for an event that will never occur
or go into a loop (usually a sleep/poll loop).  Typically, a HIGHER
level then times out the failing error recovery, so that the normal
programmer doesn't notice.  But it very often fails to kill the
lower level code.

As far as applications are concerned, a jammed socket is one where
the higher level recovery has NOT done that, and is waiting for the
lower level to complete - which it isn't going to do!

The other effect that ordinary programmers notice is a system very
gradually starting to run down after days/weeks/months of continual
operation.  The state is cleared by rebooting.


Regards,
Nick Maclaren.