[Python-Dev] GC Changes

2007-09-30 Thread Justin Tulloss
Hello,

I've been doing some tests on removing the GIL, and it's becoming clear that
some basic changes to the garbage collector may be needed in order for this
to happen efficiently. Reference counting as it stands today is not very
scalable.

I've been looking into a few options, and I'm leaning towards
implementing IBM's Recycler GC
(http://www.research.ibm.com/people/d/dfb/recycler-publications.html),
since it is very similar to what is in place now from the users'
perspective. However, I haven't been around the list long enough to really
understand the feeling in the community on GC in the future of the
interpreter. It seems that a full GC might have a lot of benefits in terms
of performance and scalability, and I think that the current gc module is of
the mark-and-sweep variety. Is the trend going to be to move away from
reference counting and towards the mark-and-sweep implementation that
currently exists, or is reference counting a firmly ingrained tradition?

On a more immediately relevant note, I'm not certain I understand the full
extent of the gc module. From what I've read, it sounds like it's fairly
close to a fully functional GC, yet it seems to exist only as a
cycle-detecting backup to the reference counting mechanism. Would somebody
care to give me a brief overview on how the current gc module interacts with
the interpreter, or point me to a place where that is done? Why isn't the
mark-and-sweep mechanism used for all memory management?
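
For readers following along, here is a minimal sketch of the division of
labour being asked about, assuming CPython. Reference counting frees an
object as soon as its last reference goes away; the gc module only steps
in for container objects kept alive by reference cycles. The class and
function names are made up for illustration.

```python
import gc
import weakref


class Node:
    """Small container object that can point at another Node."""

    def __init__(self):
        self.other = None


def freed_by_refcount():
    """Return True if dropping the last reference frees the object at once."""
    node = Node()
    probe = weakref.ref(node)   # a weak reference does not keep node alive
    del node                    # refcount reaches zero; CPython frees it here
    return probe() is None


def freed_by_cycle_collector():
    """Return (freed before gc.collect(), freed after gc.collect())."""
    gc.disable()                # keep an automatic collection from interfering
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a     # reference cycle: a <-> b
        probe = weakref.ref(a)
        del a, b                    # refcounts stay above zero
        before = probe() is None
        gc.collect()                # the cycle detector finds and frees the pair
        after = probe() is None
        return before, after
    finally:
        gc.enable()
```

On CPython, freed_by_refcount() should return True and
freed_by_cycle_collector() should return (False, True). Other
implementations may behave differently.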

Thanks a lot!

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows

2007-09-30 Thread Guido van Rossum
On 9/30/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Michael Foord wrote:
> > We stick to using the .NET file I/O and so don't
> > have a problem. The only time it is an issue for us is our tests, where
> > we have string literals in our test code (where new lines are obviously
> > '\n')
>
> If you're going to do that, you really need to be consistent
> about it and have IronPython use \r\n internally for line endings
> *everywhere*, including string literals.

I don't know what you mean by "internally". There's lots of portable
code that uses the \n character in string literals (either to generate
line endings or to recognize them). That code can't suddenly be made
invalid. And changing all string literals that say "\n" to secretly
become "\r\n" would be worse than the \r <--> \n swap that some old
Apple tools used to do. (If len("\n") == 2, what would len("\r\n")
be?)

> > It is just slightly ironic that the time Python 'gets it wrong' (for
> > some value of wrong) is when you are using text mode for I/O :-)
>
> I would say IronPython is getting it wrong by using inconsistent
> internal representations of line endings.

Honestly, I find it hard to see much merit in this discussion. A
number of Python libraries, including print() and io.py, use \n to
represent line endings in memory, and translate these to/from
platform-appropriate line endings when reading/writing text files.
OTOH, some other APIs, for example, sockets talking various internet
protocols (from SMTP to HTTP) as well as most (all?) native .NET APIs,
use \r\n to represent line endings. There are any number of ways to
convert between these conventions, including various invocations of
s.replace() and s.splitlines() (the latter does a
universal-newlines-like thing). Applications can take care of this,
and APIs can choose to use either convention for line endings (or
both, in the case of input).
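
A minimal sketch of the s.replace() and s.splitlines() conversions
mentioned above. The helper names to_lf and to_crlf are made up for
illustration; only the str methods themselves come from Python.

```python
def to_lf(text):
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def to_crlf(text):
    """Convert any line ending to CRLF, without doubling existing CRs."""
    return to_lf(text).replace("\n", "\r\n")


mixed = "one\r\ntwo\nthree\rfour"
lines = mixed.splitlines()      # splits on all three conventions
as_lf = "\n".join(lines)
as_crlf = "\r\n".join(lines)
```

Normalizing to LF first is what keeps to_crlf() from turning an existing
"\r\n" into "\r\r\n". Note that str.splitlines() also treats a few other
characters, such as \f and \v, as line boundaries.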

Yes, occasionally users get confused. Too bad. They'll have to learn
about this issue. The issue isn't going away by wishing it to go away;
it is a fundamental difference between Windows and Unix, and neither
is likely to change or disappear. Changing Python to use the Windows
convention internally isn't going to help one bit. Changing Python to
use the platform's convention is impossible without introducing a new
string escape that would mean \r\n on Windows and \n on Unix; and
given that there are legitimate reasons to sometimes deal with \r\n
explicitly even on Unix (and with just \n even on Windows) we wouldn't
be completely isolated from the issue. Changing APIs to not represent
the line ending as a character (as the Java I/O libraries do) would be
too big a change (and how would we distinguish between readline()
returning an empty line and EOF?) -- and I'm sure the issue still pops
up in plenty of places in Java.
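
The readline() point can be seen in a few lines. This sketch uses
io.StringIO as a stand-in for a text file; the variable names are only
for illustration.

```python
import io

stream = io.StringIO("first\n\nlast\n")

first = stream.readline()   # "first\n"
blank = stream.readline()   # "\n": an empty line still carries its newline
last = stream.readline()    # "last\n"
eof = stream.readline()     # "": only end-of-file yields an empty string
```

Because the line ending is part of the returned string, an empty line
and EOF cannot be confused.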

The best solution for IronPython is probably to have the occasional
wrapper around .NET APIs that translates between \r\n and \n on the
boundary between Python and .NET; but one must be able to turn this
off or bypass the wrappers in cases where the data retrieved from one
.NET API is just passed straight on to another .NET API (and the
translation would just cause two redundant copies to be made).
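
One way such a wrapper could look, as a rough sketch only. Everything
here is hypothetical: newline_boundary, its raw attribute, and
fake_dotnet_append (a plain Python stand-in for a .NET API) are invented
for illustration and are not part of IronPython.

```python
import functools


def to_lf(text):
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def to_crlf(text):
    """Convert any line ending to CRLF."""
    return to_lf(text).replace("\n", "\r\n")


def newline_boundary(dotnet_func):
    """Wrap a CRLF-based function so Python callers see only LF."""
    @functools.wraps(dotnet_func)
    def wrapper(*args):
        converted = [to_crlf(a) if isinstance(a, str) else a for a in args]
        result = dotnet_func(*converted)
        return to_lf(result) if isinstance(result, str) else result

    wrapper.raw = dotnet_func   # bypass: call this to skip both translations
    return wrapper


def fake_dotnet_append(text):
    """Hypothetical stand-in for a .NET API that speaks CRLF."""
    return text + "appended\r\n"


append = newline_boundary(fake_dotnet_append)
via_wrapper = append("line\n")          # LF in, LF out
passthrough = append.raw("line\r\n")    # CRLF data handed on untouched
```

The raw attribute covers the case where data from one .NET API goes
straight to another, so no copies are made just to translate back and
forth.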

Get used to it. End of discussion.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows

2007-09-30 Thread Greg Ewing
Michael Foord wrote:
> We stick to using the .NET file I/O and so don't 
> have a problem. The only time it is an issue for us is our tests, where 
> we have string literals in our test code (where new lines are obviously 
> '\n')

If you're going to do that, you really need to be consistent
about it and have IronPython use \r\n internally for line endings
*everywhere*, including string literals.

> It is just slightly ironic that the time Python 'gets it wrong' (for 
> some value of wrong) is when you are using text mode for I/O :-)

I would say IronPython is getting it wrong by using inconsistent
internal representations of line endings.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------+
University of Canterbury,          | Carpe post meridiem!           |
Christchurch, New Zealand          | (I'm not a morning person.)    |
[EMAIL PROTECTED]                  +--------------------------------+


Re: [Python-Dev] New lines, carriage returns, and Windows

2007-09-30 Thread Greg Ewing
[EMAIL PROTECTED] wrote:
> I've been thinking about this some more (in lieu of actually writing up any
> sort of proposal ;-) and I'm not so sure it would be all that useful.

Yes, despite being the one who suggested it, I've come to
the same conclusion myself. The problem should really be
addressed at the source, which is the Python/.NET boundary.
Anything else would just lead to ambiguity.

So I'm voting -1 on my own proposal here.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------+
University of Canterbury,          | Carpe post meridiem!           |
Christchurch, New Zealand          | (I'm not a morning person.)    |
[EMAIL PROTECTED]                  +--------------------------------+


Re: [Python-Dev] New lines, carriage returns, and Windows

2007-09-30 Thread Greg Ewing
Nick Maclaren wrote:
> I have implemented both of those two models
> on systems that are FAR more different than most people can imagine.
> Both work, and neither causes confusion.  The C/Unix/Python one does.

Now I'm not sure what *you* mean by the C/Unix/Python
model. As far as newlines are concerned, the internal
model is fine as far as I can see.

> a mismatch between
> the world of Python strings which use "\n" and .NET
> library code expecting strings which use "\r\n".
> 
> That's an I/O problem :-)

If you define passing a string to/from any .NET function
as I/O, I suppose that's true, but it's not what people
normally mean by the term.

> the REASON it causes trouble is the inconsistency
> in the basic C/Unix/Python text I/O model.  Let's consider just
> \f, \r and \n,

But we're not talking about \f or anything else here, only
newlines. I've never heard anyone complain about getting
confused over the handling of \f in Python. That may be
because nobody uses \f for anything these days, but whatever
the reason, it seems to be a non-issue.

In any case, it still doesn't mean that you "don't get
back what you wrote". If you write "\f\n" to a file using
Python and read it back, you get "\f\n". If you write just
"\f", you get back "\f". What the \f *means* is a separate
issue.
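
A small sketch of the round trip described above, assuming Python 3's
open() with its default text-mode settings. The helper name round_trip
is made up for illustration.

```python
import os
import tempfile


def round_trip(text):
    """Write text to a temporary text-mode file and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="ascii") as out:
            out.write(text)
        with open(path, "r", encoding="ascii") as src:
            return src.read()
    finally:
        os.remove(path)


form_feed_line = round_trip("\f\n")
form_feed_only = round_trip("\f")
```

Both values come back unchanged. A lone "\r" is a different story:
with the default universal-newlines reading it would come back as "\n",
which is the case the other side of this argument is concerned with.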

-- 
Greg Ewing, Computer Science Dept, +--------------------------------+
University of Canterbury,          | Carpe post meridiem!           |
Christchurch, New Zealand          | (I'm not a morning person.)    |
[EMAIL PROTECTED]                  +--------------------------------+


Re: [Python-Dev] OpenSSL httplib bug

2007-09-30 Thread Brett Cannon
On 9/30/07, Richie Ward <[EMAIL PROTECTED]> wrote:
> I was using httplib to power my xml rpc script.
>
> I had problems when I wanted to use SSL and I got this error:
>   File "/usr/lib/python2.5/httplib.py", line 1109, in recv
>     return self._ssl.read(len)
> socket.sslerror: (8, 'EOF occurred in violation of protocol')
>
> I figured out this was because of poor error handling in python.
>
> May I suggest this as a fix to this bug:
> $ diff /usr/lib/python2.5/httplib.py /usr/lib/python2.5/httplib.py~
> 1109,1112c1109
> <         try:
> <             return self._ssl.read(len)
> <         except socket.sslerror:
> <             return
> ---
> >         return self._ssl.read(len)
>
> Just a note. I am by no means a python expert, just good enough to get
> my work done.
> I use Ubuntu gutsy.

Richie, could you please open a bug report at bugs.python.org and
attach your patch to it? That way it won't get lost and it can be
attended to properly.

-Brett


Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows

2007-09-30 Thread Greg Ewing
Nick Maclaren wrote:
> I don't know PRECISELY what you mean by "universal newlines mode"

I mean precisely what Python means by the term: any of
"\r", "\n" or "\r\n" represent a newline, and no distinction
is made between them.

You only need to use that if you don't know what convention
is being used by the file you're reading. And if you don't
know that, you've already lost information about what the
contents of the file means, and there's nothing that any
I/O system can do to get it back.
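
To make the term concrete, here is a short sketch using the io module
from Python 3 (an assumption; this thread predates it, and the helper
name read_text is invented). With newline=None all three endings are
folded into "\n"; with newline="" they are passed through unchanged.

```python
import io


def read_text(data, newline):
    """Decode bytes through a text layer using the given newline mode."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="ascii",
                               newline=newline)
    return wrapper.read()


data = b"mac\runix\ndos\r\n"
universal = read_text(data, None)    # every ending becomes "\n"
untranslated = read_text(data, "")   # endings are left as they were
```

After the universal read there is no way to tell which convention each
line used, which is the information loss described above.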

> Model 1:  certain characters can be used only in combination.
> ...

That's all fine if you know the file adheres to those
conventions. Just open it in binary mode and go for it.

The I/O systems of C and/or Python are designed for
environments where the files *don't* adhere to conventions
as helpful as that. They're making the best of what they're
given.

> Note that the above is what the program sees - what is written
> to the outside world and how input is read is another matter.
> 
> But I can assure you, from my own and many other people's experience,
> that neither of the above models causes the confusion being shown by
> the postings in this thread.

There's no confusion about how newlines are represented
*inside* a Python program. The convention is quite clear -
a newline is "\n" and only "\n". Confusion only arises when
people try to process strings internally that don't adhere
to that convention.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------+
University of Canterbury,          | Carpe post meridiem!           |
Christchurch, New Zealand          | (I'm not a morning person.)    |
[EMAIL PROTECTED]                  +--------------------------------+


[Python-Dev] OpenSSL httplib bug

2007-09-30 Thread Richie Ward
I was using httplib to power my xml rpc script.

I had problems when I wanted to use SSL and I got this error:
  File "/usr/lib/python2.5/httplib.py", line 1109, in recv
    return self._ssl.read(len)
socket.sslerror: (8, 'EOF occurred in violation of protocol')

I figured out this was because of poor error handling in python.

May I suggest this as a fix to this bug:
$ diff /usr/lib/python2.5/httplib.py /usr/lib/python2.5/httplib.py~
1109,1112c1109
<         try:
<             return self._ssl.read(len)
<         except socket.sslerror:
<             return
---
>         return self._ssl.read(len)
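
A narrower variant of the same idea, as a hedged sketch only. It assumes
the modern ssl module (ssl.SSLEOFError does not exist in Python 2.5,
where the error is socket.sslerror), and read_until_eof is a made-up
helper, not part of httplib. Unlike the diff above, it swallows only
the EOF case and returns an empty bytes object rather than None.

```python
import ssl


def read_until_eof(ssl_read, size):
    """Call ssl_read(size); report an abrupt SSL EOF as an empty read."""
    try:
        return ssl_read(size)
    except ssl.SSLEOFError:
        # Only the "EOF occurred in violation of protocol" case is
        # swallowed; every other SSL error still reaches the caller.
        return b""
```

Whether an abrupt EOF may safely be treated as a normal end of stream
depends on the protocol, since it can also mean the data was truncated.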

Just a note. I am by no means a python expert, just good enough to get
my work done.
I use Ubuntu gutsy.
-- 
Thanks, Richie Ward


Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows

2007-09-30 Thread Nick Maclaren
Michael Foord <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> 
> > Michael> Actually, I usually get these strings from Windows UI
> > Michael> components. A file containing '\r\n' is read in with '\r\n'
> > Michael> being translated to '\n'. New user input is added containing
> > Michael> '\r\n' line endings. The file is written out and now contains a
> > Michael> mix of '\r\n' and '\r\r\n'.
> >   
> > So you need a translation layer between the UI component and your code.
> > Treat the component as a text file and perform the desired mapping.  Yes?
> 
> Actually the problem was reported by one of the IronPython developers on 
> behalf of another user. We stick to using the .NET file I/O and so don't 
> have a problem. The only time it is an issue for us is our tests, where 
> we have string literals in our test code (where new lines are obviously 
> '\n') and we do a manual 'replace'. Not very difficult.
> 
> It is just slightly ironic that the time Python 'gets it wrong' (for 
> some value of wrong) is when you are using text mode for I/O :-)

Plus ca change, 

That has been the problem for as long as I have been using the byte
stream model (nearly 40 years now).  Provided that you can get
control, OR there are well-defined semantics, you can sort things
out.  The semantics "we define only the trivial case, and the
programmer must do something arcane, undefined and system-dependent
for the rest" means that it is impossible for an interface to do
the 'right' translation unless it knows what each side of it is
assuming.

As I say, there are solutions.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows

2007-09-30 Thread Michael Foord
[EMAIL PROTECTED] wrote:
> Michael> Actually, I usually get these strings from Windows UI
> Michael> components. A file containing '\r\n' is read in with '\r\n'
> Michael> being translated to '\n'. New user input is added containing
> Michael> '\r\n' line endings. The file is written out and now contains a
> Michael> mix of '\r\n' and '\r\r\n'.
>
> So you need a translation layer between the UI component and your code.
> Treat the component as a text file and perform the desired mapping.  Yes?
>
>   

Actually the problem was reported by one of the IronPython developers on 
behalf of another user. We stick to using the .NET file I/O and so don't 
have a problem. The only time it is an issue for us is our tests, where 
we have string literals in our test code (where new lines are obviously 
'\n') and we do a manual 'replace'. Not very difficult.

It is just slightly ironic that the time Python 'gets it wrong' (for 
some value of wrong) is when you are using text mode for I/O :-)

Michael

> Skip
>
>   



Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows

2007-09-30 Thread skip

Michael> Actually, I usually get these strings from Windows UI
Michael> components. A file containing '\r\n' is read in with '\r\n'
Michael> being translated to '\n'. New user input is added containing
Michael> '\r\n' line endings. The file is written out and now contains a
Michael> mix of '\r\n' and '\r\r\n'.

So you need a translation layer between the UI component and your code.
Treat the component as a text file and perform the desired mapping.  Yes?
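
A sketch of both the problem and the suggested mapping. It simulates a
Windows text-mode file on any platform with io.TextIOWrapper and
newline="\r\n" (Python 3's io module, an assumption here); the function
and variable names are invented for illustration.

```python
import io


def write_as_windows_text(text):
    """Encode text the way a Windows text-mode file would: LF becomes CRLF."""
    raw = io.BytesIO()
    writer = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
    writer.write(text)
    writer.flush()
    writer.detach()             # leave raw open after the wrapper goes away
    return raw.getvalue()


def from_ui(text):
    """Translation layer: map CRLF from the UI component to Python's LF."""
    return text.replace("\r\n", "\n")


file_text = "old line\n"        # read from disk, CRLF already translated
ui_text = "new line\r\n"        # fresh input from a Windows UI component

broken = write_as_windows_text(file_text + ui_text)
fixed = write_as_windows_text(file_text + from_ui(ui_text))
```

Here broken ends in "\r\r\n" on the second line, the mix described
above, while fixed uses "\r\n" throughout.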

Skip


Re: [Python-Dev] New lines, carriage returns, and Windows

2007-09-30 Thread Nick Maclaren
[EMAIL PROTECTED] wrote:
> 
> I've been thinking about this some more (in lieu of actually writing up any
> sort of proposal ;-) and I'm not so sure it would be all that useful.  If
> you've opened a file in text mode you should only be writing newlines as
> '\n' anyway.  If you want to translate a text file imported from another
> system to use the current system's line ending just open both the input and
> output files in text mode.

I.e. at least \r, \f and \v are discouraged - i.e. system-dependent,
at best.  That works.

> With universal newlines mode for output, should writing '\r\n' result in one
> or two newlines (or one-and-a-half)?  Depending on the platform you can
> argue that it should write out '\r\r', '\r\n\r\n' or '\n\n' or if on Windows
> that it should be left alone as '\r\n'.  There is, of course, the current
> '\r\r\n' behavior as well.  I don't think there's obviously one best answer.

Quite.  And it has nothing to do with the format the outside system
uses - your first question is purely a matter of what the semantics
of the Python program are.  The question applies as much to zOS as
to any of the systems Python supports.

> If you want to do something esoteric, open the file in binary mode and do
> whatever you like.

Er, no.  That's the Unix mistake.  It works, provided two things are
true:

1) You don't need to write portable formatting.

2) The 'outside system' uses the control characters of a byte
stream for formatting.

Let's skip (1) - but (2) is universally true, nowadays, isn't it?
Er, no.  Consider reading and writing to an X window (NOT an xterm).
Such formatting is out-of-band (sorry, I used out-of-bound in a
previous posting).

Ouch.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] New lines, carriage returns, and Windows

2007-09-30 Thread skip

Greg> Maybe there should be a universal newlines mode defined for output
Greg> as well as input, which translates any of "\r", "\n" or "\r\n"
Greg> into the platform line ending.

Skip> I'd be open to such a change.  Principle of least surprise?

Guido> The symmetry isn't as strong as you suggest, but I agree it would
Guido> be a useful feature. Would you mind filing a Py3k feature request
Guido> so we don't forget?

Guido> A proposal for an API given the existing newlines=... parameter
Guido> (described in detail in PEP 3116) would be even better.

I've been thinking about this some more (in lieu of actually writing up any
sort of proposal ;-) and I'm not so sure it would be all that useful.  If
you've opened a file in text mode you should only be writing newlines as
'\n' anyway.  If you want to translate a text file imported from another
system to use the current system's line ending just open both the input and
output files in text mode.

With universal newlines mode for output, should writing '\r\n' result in one
or two newlines (or one-and-a-half)?  Depending on the platform you can
argue that it should write out '\r\r', '\r\n\r\n' or '\n\n' or if on Windows
that it should be left alone as '\r\n'.  There is, of course, the current
'\r\r\n' behavior as well.  I don't think there's obviously one best answer.
If you want to do something esoteric, open the file in binary mode and do
whatever you like.
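
A minimal sketch of the "open both files in text mode" approach,
assuming Python 3's open() and its newline parameter. The function name
convert_newlines and the file names are made up for illustration.

```python
import os
import tempfile


def convert_newlines(src_path, dst_path, newline=None):
    """Copy a text file, rewriting every line ending.

    Reading uses universal newlines. On output, newline=None writes the
    platform's own line ending; an explicit LF or CRLF string forces one.
    """
    with open(src_path, "r", encoding="ascii", newline=None) as src, \
         open(dst_path, "w", encoding="ascii", newline=newline) as dst:
        for line in src:
            dst.write(line)


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "imported.txt")
    dst_path = os.path.join(tmp, "converted.txt")
    with open(src_path, "wb") as f:
        f.write(b"mac\runix\ndos\r\n")
    convert_newlines(src_path, dst_path, newline="\r\n")
    with open(dst_path, "rb") as f:
        converted = f.read()
```

The program only ever sees and writes "\n"; the two text layers do the
translating on the way in and on the way out.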

Skip


Re: [Python-Dev] New lines, carriage returns, and Windows

2007-09-30 Thread Nick Maclaren
Greg Ewing <[EMAIL PROTECTED]> wrote:
> 
> I don't see how this is different from Unix/C "\n" being
> an atomic newline character.

Have you used systems with the I/O models I referred to (or ones
with newlines being out-of-bound data)?

> If you're saying that BCPL is better because it defines
> standard semantics for more control characters than just
> "\n", that may be true, but C is doing about the best it
> can with "\n" as far as I can see, given all the crazy
> things that different OSes want to do with line endings.

I am afraid that you are wrong - see my other posting for how
to do it better.  Look, I have implemented both of those two models
on systems that are FAR more different than most people can imagine.
Both work, and neither causes confusion.  The C/Unix/Python one does.

> In any case, the problem which started all this isn't
> really an I/O problem at all, it's a mismatch between
> the world of Python strings which use "\n" and .NET
> library code expecting strings which use "\r\n".

That's an I/O problem :-)

> The correct thing to do with that is to translate whenever
> a string crosses a boundary between Python code and
> .NET code. This is something that ought to be done
> automatically by the Python/.NET interfacing machinery,
> maybe by having a different type for .NET strings.

Agreed.  But the REASON it causes trouble is the inconsistency
in the basic C/Unix/Python text I/O model.  Let's consider just
\f, \r and \n, and a few questions:

Exactly what does a free-standing \f mean?

Does \n\f\n mean starting at the top of a page or one line down?

How do \r and \f interact with line-buffering?  Think about
MacOS here.

I could go on, but those are enough to indicate that the problem
is insoluble.  The answer "Undefined but not even explicitly
discouraged" is a recipe for confusion.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows

2007-09-30 Thread Nick Maclaren
Greg Ewing <[EMAIL PROTECTED]> wrote:
> 
> > Grrk.  That's the problem.  You don't get back what you have written
> 
> You do as long as you *don't* use universal newlines mode
> for reading. This is the best that can be done, because
> universal newlines are inherently ambiguous.

I don't know PRECISELY what you mean by "universal newlines mode",
and this issue is all about the details, so any response would
merely enhance the confusion.

> If you want universal newlines, you just have to accept
> that you can't also have \r characters meaning something
> other than newlines in your files. This is true regardless
> of what programming language or I/O model is being used.

No, that is not true, and I have used more than one model where
it wasn't.  Let's stick to models where newlines are special
characters - I prefer the ones where they are not, but that is
by the way.

Model 1:  certain characters can be used only in combination.
E.g. \f must occur immediately before (or after) a \n, which
it modifies.  \r is either a newline-with-overprint or must be
associated with a \n.  In both cases, only ONE of the alternatives
is permitted in the chosen model - the other use then becomes an
error (and raises an exception).

Model 2: (BCPL) there are a variety of newline characters, \n for
plain newline, \f for newline-with-form-feed and \r for newline-
with-overprint.  ALL cause a newline, with the associated property.
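
One possible reading of Model 2, sketched in Python. This is an
interpretation for illustration, not BCPL's actual I/O library: the
names NEWLINE_KINDS and split_bcpl_style are invented, and each of \n,
\f and \r simply ends a line and tags it with its property.

```python
import re

# Every one of these characters ends a line; they differ only in the
# extra property attached to that line break.
NEWLINE_KINDS = {"\n": "plain", "\f": "form-feed", "\r": "overprint"}


def split_bcpl_style(text):
    """Split text into (line, kind) pairs, one per newline character."""
    records = []
    start = 0
    for match in re.finditer(r"[\n\f\r]", text):
        records.append((text[start:match.start()],
                        NEWLINE_KINDS[match.group()]))
        start = match.end()
    if start < len(text):
        records.append((text[start:], None))   # trailing unterminated text
    return records


records = split_bcpl_style("a\nb\fc\rd")
```

Under this reading no control character is left without a defined
meaning, which is the property being claimed for the model.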

Note that the above is what the program sees - what is written
to the outside world and how input is read is another matter.

But I can assure you, from my own and many other people's experience,
that neither of the above models causes the confusion being shown by
the postings in this thread.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679