Re: [Python-Dev] Documentation idea

2008-10-16 Thread Raymond Hettinger

From: "Doug Hellmann" <[EMAIL PROTECTED]>

This seems like a large undertaking.


Not necessarily.  It can be done incrementally, starting with things like
str.split() that almost no one understands completely.  It should be put
here and there where it adds some clarity.



I'm sure you're not underestimating the effort, but I have the sense that
you may be overestimating the usefulness of the results (or maybe I'm
underestimating them through some lack of understanding).  Would it be
more optimal (in terms of both effort and results) to extend the existing
documentation and/or docstrings with examples that use all of the functions
so developers can see how to call them and what results to expect?


The idea includes pure python code augmented by doctestable docstrings
with enough examples.  So, we're almost talking about the same thing.
There is one difference: since the new attribute is guaranteed to be
executable, it can be reliably run through doctest.  The same is *not* true
for arbitrary docstrings.
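A minimal sketch of why an exec-able attribute can be doctested reliably, assuming the proposed attribute is nothing more than a string of source code. Here the hypothetical constant `ALL_SOURCE` stands in for something like `all.python`, and `run_equivalent_doctests` is an illustrative helper: it execs the string into a fresh namespace, then hands the resulting function to the standard `doctest` finder and runner.

```python
import doctest

# Hypothetical stand-in for the proposed attribute (e.g. ``all.python``):
# a string of pure Python source whose docstring carries doctests.
ALL_SOURCE = '''
def all(iterable):
    """Return True if all elements of the iterable are true.

    >>> all([1, 2, 3])
    True
    >>> all([1, 0, 3])
    False
    >>> all([])
    True
    """
    for element in iterable:
        if not element:
            return False
    return True
'''


def run_equivalent_doctests(source, name):
    """Exec ``source`` and run the doctests of the object it defines as ``name``."""
    namespace = {'__name__': 'equivalent_' + name}
    exec(source, namespace)          # the string is executable by construction
    obj = namespace[name]
    runner = doctest.DocTestRunner()
    for test in doctest.DocTestFinder().find(obj, name, globs=namespace):
        runner.run(test)             # failures are reported on stdout
    return runner.summarize(verbose=False)   # has .failed and .attempted


if __name__ == '__main__':
    print(run_equivalent_doctests(ALL_SOURCE, 'all'))
```

Because the examples are executed against the string's own definition, a drifted equivalent shows up as a doctest failure rather than as silently stale prose.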


Raymond

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Documentation idea

2008-10-16 Thread Doug Hellmann


On Oct 16, 2008, at 5:11 PM, Raymond Hettinger wrote:


Raymond Hettinger wrote:
* It will assist pypy style projects and other python implementations
when they have to build equivalents to CPython.

* Will eliminate confusion about what functions were exactly intended to
do.

* Will confer benefits similar to test driven development where the
documentation and pure python version are developed first and doctests
gotten to pass, then the C version is created to match.


I haven't seen anyone comment about this assertion of "equivalence".
Doesn't it strike you as difficult to maintain *two* versions of every
function, and ensure they match *exactly*?


Glad you brought this up.  My idea is to present rough equivalence
in unoptimized python that is simple and clear.  The goal is to provide
better documentation where code is more precise than English prose.
That being said, some subset of the existing tests should be runnable
against the rough equivalent and the python code should incorporate doctests.
Running both sets of tests should suffice to maintain the rough equivalence.


This seems like a large undertaking.  I'm sure you're not  
underestimating the effort, but I have the sense that you may be  
overestimating the usefulness of the results (or maybe I'm  
underestimating them through some lack of understanding).  Would it be  
more optimal (in terms of both effort and results) to extend the  
existing documentation and/or docstrings with examples that use all of  
the functions so developers can see how to call them and what results  
to expect?


Doug



Re: [Python-Dev] Documentation idea

2008-10-16 Thread Raymond Hettinger

Raymond Hettinger wrote:

* It will assist pypy style projects and other python implementations
when they have to build equivalents to CPython.

* Will eliminate confusion about what functions were exactly intended to
do.

* Will confer benefits similar to test driven development where the
documentation and  pure python version are developed first and doctests
gotten to pass, then the C version is created to match.


I haven't seen anyone comment about this assertion of "equivalence".
Doesn't it strike you as difficult to maintain *two* versions of every
function, and ensure they match *exactly*?


Glad you brought this up.  My idea is to present rough equivalence
in unoptimized python that is simple and clear.  The goal is to provide
better documentation where code is more precise than English prose.
That being said, some subset of the existing tests should be runnable
against the rough equivalent and the python code should incorporate doctests.
Running both sets of tests should suffice to maintain the rough equivalence.

The notion of exact equivalence should be left to the PyPy folks, who can
attest that the code gets convoluted when you try to simulate exactly when
error checking is performed, which attributes are read-only, and how the
stack traces look when there are errors.  In contrast, my goal is an
approximation that is executable but highly readable and expository.
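A small illustration of the error-timing gap, assuming a generator-based rough equivalent of `enumerate` written in the style of the docs' pure Python equivalents (`py_enumerate` and `raises_type_error` are names chosen here for illustration). Both agree on valid input, but the built-in rejects a non-iterable as soon as it is called, while the generator only fails once iteration begins.

```python
def py_enumerate(iterable, start=0):
    """Rough, readable equivalent of the built-in enumerate()."""
    n = start
    for elem in iterable:            # the iterability check happens here, lazily
        yield n, elem
        n += 1


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# Same results on valid input ...
same_output = list(py_enumerate('abc', 1)) == list(enumerate('abc', 1))

# ... but different timing for the error on invalid input.
eager = raises_type_error(enumerate, 5)             # built-in: fails at call time
deferred = not raises_type_error(py_enumerate, 5)   # generator: the call succeeds
on_first_next = raises_type_error(lambda: next(py_enumerate(5)))  # fails when iterated
```

Reproducing the eager check would mean calling `iter(iterable)` up front and wrapping the loop in an inner generator, which is exactly the kind of extra machinery that makes an "exact" equivalent less readable.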

My thought is to do this only with tools where it really does enhance the
documentation.  The exercise is worthwhile in and of itself.  For example,
I'm working on a pure python version of str.split() and quickly determined
that the docs are *still* in error even after many revisions over the years
(the whitespace version does not, in fact, start by stripping whitespace
from both ends).  Here's what I have so far:

def split(s, sep=None, maxsplit=-1):
    """split(S, [sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.

    >>> from itertools import product
    >>> from collections import namedtuple
    >>> s = ' 11   2  333  4  '
    >>> split(s, None)
    ['11', '2', '333', '4']
    >>> Err = namedtuple('Err', 'str maxsplit result target')
    >>> n = 8
    >>> for chars in product('ab ', repeat=n):
    ...     s = ''.join(chars)
    ...     for maxsplit in range(-2, len(s) + 2):
    ...         result, target = split(s, None, maxsplit), s.split(None, maxsplit)
    ...         assert result == target, Err(repr(s), maxsplit, result, target)

    """
    # This draft models only the whitespace (sep=None) algorithm.
    if sep is not None:
        raise NotImplementedError('only sep=None is handled so far')
    result = []
    spmode = True       # True while scanning whitespace, False inside a word
    start = 0           # index where the current word (or the remainder) begins
    if maxsplit != 0:
        for i, c in enumerate(s):
            if spmode:
                if not c.isspace():
                    start = i
                    spmode = False
            elif c.isspace():
                result.append(s[start:i])
                start = i
                spmode = True
                if len(result) == maxsplit:
                    break
    # The remainder loses its leading whitespace but keeps its trailing whitespace.
    rest = s[start:].lstrip()
    return (result + [rest]) if rest else result

Once I have the cleanest possible, self-explanatory code that passes tests, I'll improve the variable names and make a more sensible
docstring with readable examples.  Surprisingly, it hasn't been a trivial exercise to come up with an equivalent that corresponds
more closely to the way we think instead of corresponding to the C code -- I want to show *what* it does more than *how* it does it.
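The documentation point above can be checked against CPython's own `str.split()`: with `sep=None` and a limited `maxsplit`, leading whitespace is skipped, but the unsplit remainder keeps its trailing whitespace, so the method does not behave as if the string were stripped at both ends first. A quick check (the helper `stripped_first` is hypothetical and exists only for contrast):

```python
def stripped_first(s, maxsplit):
    """What split() would return if it really stripped both ends first."""
    return s.strip().split(None, maxsplit)


s = '  a b  c  '
actual = s.split(None, 1)            # remainder keeps its trailing blanks
hypothetical = stripped_first(s, 1)  # remainder loses them

print(actual)        # ['a', 'b  c  ']
print(hypothetical)  # ['a', 'b  c']

# With maxsplit=0 nothing is split, yet leading whitespace is still removed.
print(s.split(None, 0))  # ['a b  c  ']
```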



Raymond



Re: [Python-Dev] Documentation idea

2008-10-16 Thread Brett Cannon
On Thu, Oct 16, 2008 at 11:13 AM, Scott Dial
<[EMAIL PROTECTED]> wrote:
> Raymond Hettinger wrote:
>> * It will assist pypy style projects and other python implementations
>> when they have to build equivalents to CPython.
>>
>> * Will eliminate confusion about what functions were exactly intended to
>> do.
>>
>> * Will confer benefits similar to test driven development where the
>> documentation and  pure python version are developed first and doctests
>> gotten to pass, then the C version is created to match.
>
> I haven't seen anyone comment about this assertion of "equivalence".
> Doesn't it strike you as difficult to maintain *two* versions of every
> function, and ensure they match *exactly*?

More time-consuming than difficult. Raymond is currently talking about
things like built-ins and methods on types, which do not change very
often.

> The utility to PyPy-style
> projects is minimized if the two versions aren't identical. And while
> it's possible to say, "the tests say they are equivalent, so they are;"
> history is quite clear about people depending on "features" that are
> untested and were unintended side-effects of the manner in which
> something was implemented.

Right, and when we find out that there is a difference, we typically
standardize on a specific version and developers using the bogus
semantics switch.

> I think it would be a dilution of developer
> man-hours to force them to maintain two versions in lock-step, and it
> significantly adds to the burden of writing and reviewing potential
> bugfixes.
>

Well, I don't see this applying to every extension module in the
stdlib that does not already have a pure Python equivalent. This view
also assumes that, if this position were taken, people would continue to
write extension modules when they are not necessarily needed. If this
actually leads people to write more pure Python code instead of extension
modules, I think that is a plus.

And Raymond, more than probably anyone, can address the overhead he
has faced in maintaining both the pure Python version of itertools in
the docs and the extension module.

> While I applaud the idea of documenting C functions in this manner,
> let's not confuse documentation with equivalence. If the standard
> distribution of Python exports the C version, then all bets are off
> whether the Python version is a drop-in replacement (even if the
> buildbots regularly test them).

Well, considering we have not even gotten far enough to actually do
this for the documentation case, I think worrying about equivalence
might be jumping the gun slightly as it is more work as you point out,
Scott.

But one thing about doing this is it might draw in the various
alternative VM folks to help maintain the Python code. If Jython,
IronPython, and/or PyPy actually use the Python code for themselves
then I suspect they would help with maintenance.

-Brett


Re: [Python-Dev] Documentation idea

2008-10-16 Thread Scott Dial
Raymond Hettinger wrote:
> * It will assist pypy style projects and other python implementations
> when they have to build equivalents to CPython.
> 
> * Will eliminate confusion about what functions were exactly intended to
> do.
> 
> * Will confer benefits similar to test driven development where the
> documentation and  pure python version are developed first and doctests
> gotten to pass, then the C version is created to match.

I haven't seen anyone comment about this assertion of "equivalence".
Doesn't it strike you as difficult to maintain *two* versions of every
function, and ensure they match *exactly*? The utility to PyPy-style
projects is minimized if the two versions aren't identical. And while
it's possible to say, "the tests say they are equivalent, so they are;"
history is quite clear about people depending on "features" that are
untested and were unintended side-effects of the manner in which
something was implemented. I think it would be a dilution of developer
man-hours to force them to maintain two versions in lock-step, and it
significantly adds to the burden of writing and reviewing potential
bugfixes.

While I applaud the idea of documenting C functions in this manner,
let's not confuse documentation with equivalence. If the standard
distribution of Python exports the C version, then all bets are off
whether the Python version is a drop-in replacement (even if the
buildbots regularly test them). I feel so strongly about this that I
think the consideration of adding this should be framed /solely/ as
a documentation tool and nothing more.

-Scott

-- 
Scott Dial
[EMAIL PROTECTED]
[EMAIL PROTECTED]


Re: [Python-Dev] Documentation idea

2008-10-15 Thread Fernando Perez
Raymond Hettinger wrote:


> Bright idea
> --
> Let's go one step further and do this just about everywhere and instead of
> putting it in the docs, attach an exec-able string as an
> attribute to our C functions.  Further, those pure python examples should
> include doctests so that the user can see a typical invocation and calling
> pattern.
> 
> Say we decide to call the attribute something like ".python", then you
> could write something like:
> 
> >>> print(all.python)
> def all(iterable):
>     '''Return True if all elements of the iterable are true.
>

[...]

+1 from the peanut gallery, with a note: since ipython is a common way for
many to use/learn python interactively, if this is adopted, we'd
*immediately* add to ipython's '?' introspection machinery the ability to
automatically find this information.  This way, when people type "all?"
or "all??" we'd fetch the doc and source code.

A minor question inspired by this: would it make sense to split the
docstring part from the code of this .python object?  I say this because in
principle, the docstring should be the same as the 'parent's', and it would
simplify our implementation to eliminate the duplicate printout.
The .python object could always be a special string-like object made from
combining the pure python code with a single docstring, common to the C and
the Python versions, that would remain exec-able.
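One way to read this suggestion, sketched under the assumption that the `.python` attribute only needs to be a `str` that `exec` accepts: store the code without a docstring and splice in the C object's own `__doc__` when the string is built. `PythonEquivalent` and `ALL_EQUIVALENT` are hypothetical names, and the built-in `all` is used only as a convenient object that already has a docstring.

```python
import textwrap


class PythonEquivalent(str):
    """Hypothetical str subclass: exec-able source whose docstring is borrowed
    from the C object instead of being stored a second time."""

    def __new__(cls, c_object, code):
        lines = textwrap.dedent(code).strip('\n').splitlines()
        header, body = lines[0], lines[1:]      # 'def name(...):' and its body
        doc = c_object.__doc__ or ''
        # Prefer a readable triple-quoted literal; fall back to repr() when the
        # docstring contains characters that would break that literal.
        if '"""' in doc or '\\' in doc or doc.endswith('"'):
            literal = repr(doc)
        else:
            literal = '"""' + doc + '"""'
        docstring = textwrap.indent(literal, '    ')
        return super().__new__(cls, '\n'.join([header, docstring, *body]) + '\n')


ALL_EQUIVALENT = PythonEquivalent(all, '''
    def all(iterable):
        for element in iterable:
            if not element:
                return False
        return True
''')

if __name__ == '__main__':
    print(ALL_EQUIVALENT)       # what an introspection lookup could display
```

The docstring then exists once (on the C object), the displayed source shows it in place, and the result is still an ordinary string that can be exec'd.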

In any case, details aside I think this is great and if it comes to pass,
we'll be happy to make it readily accessible to interactive users via
ipython.

Cheers,

f



Re: [Python-Dev] Documentation idea

2008-10-10 Thread Brett Cannon
On Fri, Oct 10, 2008 at 9:46 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Brett Cannon wrote:
>>
>> On Fri, Oct 10, 2008 at 1:45 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
>
>>> The advantage of the decorator version is that the compiler or module
>>> loader could be special cased to recognize the 'C' decorator and try it
>>> first *before* using the Python version, which would serve as a backup.
>>> There could be a standard version in builtins that people could replace
>>> to implement non-standard loading on a particular system.  To cater to
>>> other implementations, the name could be something other than 'C', or we
>>> could define 'C' to be the initial of "Code" (in the implementation
>>> language).  Either way, other implementations could start with a
>>> do-nothing "C" decorator and run the file as is, then gradually replace
>>> it with lower-level code.
>>>
>>
>> The decorator doesn't have to require any special casing at all
>> (changing the parameters to keep the code short)::
>>
>>  def C(module_name, want):
>>      def choose_version(ob):
>>          try:
>>              module = __import__(module_name, fromlist=[want])
>>              return getattr(module, want)
>>          except (ImportError, AttributeError):
>>              return ob
>>      return choose_version
>>
>> The cost is purely during importation of the module and does nothing
>> fancy at all and relies on stuff already available in all Python VMs.
>
> If I understand correctly, this decorator would only be applied *after* the
> useless Python level function object was created.

Yes.

>  I was proposing bypassing
> that step when not necessary, and I believe special casing *would* be
> required for that.

Yes, that would.

-Brett


Re: [Python-Dev] Documentation idea

2008-10-10 Thread Terry Reedy

Brett Cannon wrote:

On Fri, Oct 10, 2008 at 1:45 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:



The advantage of the decorator version is that the compiler or module loader
could be special cased to recognize the 'C' decorator and try it first
*before* using the Python version, which would serve as a backup.  There
could be a standard version in builtins that people could replace to
implement non-standard loading on a particular system.  To cater to other
implementations, the name could be something other than 'C', or we could
define 'C' to be the initial of "Code" (in the implementation language).
Either way, other implementations could start with a do-nothing "C"
decorator and run the file as is, then gradually replace with lower-level
code.



The decorator doesn't have to require any special casing at all
(changing the parameters to keep the code short)::

  def C(module_name, want):
      def choose_version(ob):
          try:
              module = __import__(module_name, fromlist=[want])
              return getattr(module, want)
          except (ImportError, AttributeError):
              return ob
      return choose_version

The cost is purely during importation of the module and does nothing
fancy at all and relies on stuff already available in all Python VMs.


If I understand correctly, this decorator would only be applied *after* 
the useless Python level function object was created.  I was proposing 
bypassing that step when not necessary, and I believe special casing 
*would* be required for that.




Re: [Python-Dev] Documentation idea

2008-10-10 Thread Brett Cannon
On Fri, Oct 10, 2008 at 1:45 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
>>
>> On 9 Oct, 11:12 pm, [EMAIL PROTECTED] wrote:
>>>
>>> Background
>>> --
>>> In the itertools module docs, I included pure python equivalents for each
>>> of the C functions.  Necessarily, some of those equivalents are only
>>> approximate but they seem to have greatly enhanced the docs.
>>
>> Why not go the other direction?
>>
>> Ostensibly the reason for writing a module like 'itertools' in C is purely
>> for performance.  There's nothing that I'm aware of in that module which
>> couldn't be in Python.
>>
>> Similarly, cStringIO, cPickle, etc.  Everywhere these diverge, it is (if
>> not a flat-out bug) not optimal.  External projects are encouraged by a
>> wealth of documentation to solve performance problems in a similar way:
>> implement in Python, once you've got the interface right, optimize into C.
>>
>> So rather than have a C implementation, which points to Python, why not
>> have a Python implementation that points at C?  'itertools' (and similar)
>> can actually be Python modules, and use a decorator, let's call it "C", to
>> do this:
>>
>>   @C("_c_itertools.count")
>>   class count(object):
>>       """
>>       This is the documentation for both the C version of itertools.count
>>       and the Python version - since they should be the same, right?
>>       """
>
> The ancient string module did something like this, except that the rebinding
> of function names was done at the end by 'from _string import *' where
> _string had C versions of some but not all of the functions in string.  (And
> the list of replacements could vary by version and platform and compiler
> switches.)  This was great for documenting the string module.  It was some
> of the first Python code I studied after the tutorial.
>
> The problem with that and the above (with modification, see below) is the
> creation and discarding of unused function objects and the time required to
> do so.
>
> The advantage of the decorator version is that the compiler or module loader
> could be special cased to recognize the 'C' decorator and try it first
> *before* using the Python version, which would serve as a backup.  There
> could be a standard version in builtins that people could replace to
> implement non-standard loading on a particular system.  To cater to other
> implementations, the name could be something other than 'C', or we could
> define 'C' to be the initial of "Code" (in the implementation language).
> Either way, other implementations could start with a do-nothing "C"
> decorator and run the file as is, then gradually replace with lower-level
> code.
>

The decorator doesn't have to require any special casing at all
(changing the parameters to keep the code short)::

  def C(module_name, want):
      def choose_version(ob):
          try:
              module = __import__(module_name, fromlist=[want])
              return getattr(module, want)
          except (ImportError, AttributeError):
              return ob
      return choose_version

The cost is purely during importation of the module and does nothing
fancy at all and relies on stuff already available in all Python VMs.
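A runnable sketch of this decorator in use. The decorator body follows the message above; the decorated functions are illustrative. `_heapq` is CPython's real accelerator module for `heapq`, so on CPython the first function is replaced by the C version, while `_no_such_accelerator` is a deliberately nonexistent, hypothetical module name, so the second keeps its Python definition. Note that in both cases the Python function object is built before the decorator decides, which is the cost Terry raises in his reply.

```python
def C(module_name, want):
    """Prefer ``want`` from ``module_name``; otherwise keep the decorated object."""
    def choose_version(ob):
        try:
            module = __import__(module_name, fromlist=[want])
            return getattr(module, want)
        except (ImportError, AttributeError):
            return ob        # accelerator unavailable: use the pure Python version
    return choose_version


@C('_heapq', 'heappush')
def heappush(heap, item):
    """Push item onto heap, maintaining the min-heap invariant."""
    heap.append(item)
    pos = len(heap) - 1
    while pos > 0:                    # sift the new item up toward the root
        parent = (pos - 1) // 2
        if heap[parent] <= heap[pos]:
            break
        heap[parent], heap[pos] = heap[pos], heap[parent]
        pos = parent


@C('_no_such_accelerator', 'pairwise_sums')   # hypothetical module: import fails
def pairwise_sums(values):
    """Return the sums of adjacent items."""
    return [a + b for a, b in zip(values, values[1:])]
```

Callers see the same name and calling convention either way; only the object bound to that name differs.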

-Brett


Re: [Python-Dev] Documentation idea

2008-10-10 Thread Terry Reedy

[EMAIL PROTECTED] wrote:


On 9 Oct, 11:12 pm, [EMAIL PROTECTED] wrote:

Background
--
In the itertools module docs, I included pure python equivalents for 
each of the C functions.  Necessarily, some of those equivalents are 
only approximate but they seem to have greatly enhanced the docs.


Why not go the other direction?

Ostensibly the reason for writing a module like 'itertools' in C is 
purely for performance.  There's nothing that I'm aware of in that 
module which couldn't be in Python.


Similarly, cStringIO, cPickle, etc.  Everywhere these diverge, it is (if 
not a flat-out bug) not optimal.  External projects are encouraged by a 
wealth of documentation to solve performance problems in a similar way: 
implement in Python, once you've got the interface right, optimize into C.


So rather than have a C implementation, which points to Python, why not 
have a Python implementation that points at C?  'itertools' (and 
similar) can actually be Python modules, and use a decorator, let's call 
it "C", to do this:


    @C("_c_itertools.count")
    class count(object):
        """
        This is the documentation for both the C version of itertools.count
        and the Python version - since they should be the same, right?
        """


The ancient string module did something like this, except that the
rebinding of function names was done at the end by 'from _string import *'
where _string had C versions of some but not all of the functions in
string.  (And the list of replacements could vary by version and
platform and compiler switches.)  This was great for documenting the
string module.  It was some of the first Python code I studied after the
tutorial.


The problem with that and the above (with modification, see below) is 
the creation and discarding of unused function objects and the time 
required to do so.


The advantage of the decorator version is that the compiler or module
loader could be special cased to recognize the 'C' decorator and try it
first *before* using the Python version, which would serve as a backup.
There could be a standard version in builtins that people could replace
to implement non-standard loading on a particular system.  To cater to
other implementations, the name could be something other than 'C', or we
could define 'C' to be the initial of "Code" (in the implementation
language).  Either way, other implementations could start with a
do-nothing "C" decorator and run the file as is, then gradually replace
it with lower-level code.


Terry Jan Reedy



Re: [Python-Dev] Documentation idea

2008-10-10 Thread Brett Cannon
On Thu, Oct 9, 2008 at 8:37 PM,  <[EMAIL PROTECTED]> wrote:
>
> On 9 Oct, 11:12 pm, [EMAIL PROTECTED] wrote:
>>
>> Background
>> --
>> In the itertools module docs, I included pure python equivalents for each
>> of the C functions.  Necessarily, some of those equivalents are only
>> approximate but they seem to have greatly enhanced the docs.
>
> Why not go the other direction?
>
> Ostensibly the reason for writing a module like 'itertools' in C is purely
> for performance.  There's nothing that I'm aware of in that module which
> couldn't be in Python.
>
> Similarly, cStringIO, cPickle, etc.  Everywhere these diverge, it is (if not
> a flat-out bug) not optimal.  External projects are encouraged by a wealth
> of documentation to solve performance problems in a similar way: implement
> in Python, once you've got the interface right, optimize into C.
>
> So rather than have a C implementation, which points to Python, why not have
> a Python implementation that points at C?  'itertools' (and similar) can
> actually be Python modules, and use a decorator, let's call it "C", to do
> this:
>
>   @C("_c_itertools.count")
>   class count(object):
>       """
>       This is the documentation for both the C version of itertools.count
>       and the Python version - since they should be the same, right?
>       """
>

And that decorator is generic enough to work for both classes and functions.

> In Python itself, the Python module would mostly be for documentation, and
> therefore solve the problem that Raymond is talking about, but it could also
> be a handy fallback for sanity checking, testing, and use in other Python
> runtimes (ironpython, jython, pypy).

Which is why I would love to make this almost a policy for new modules
that do not have any C dependency.

>  Many third-party projects already use
> ad-hoc mechanisms for doing this same thing, but an officially-supported way
> of saying "this works, but the optimized version is over here" would be a
> very useful convention.
>
> In those modules which absolutely require some C stuff to work, the python
> module could still serve as documentation.
>

Add to this some function to help test both the pure Python and C
implementation, like ``py_version, c_version =
import_versions('itertools', '_c_itertools')``, so you can run the
test suite against both versions, and you then have pretty much
everything covered for writing the code in Python to start and
optimizing as needed in C.
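`import_versions()` is only named in passing above, so here is one possible sketch, assuming the pure Python module pulls in its accelerator with an import that is allowed to fail (as `heapq` does with `_heapq`). Putting `None` into `sys.modules` for the accelerator makes that import raise `ImportError`; CPython's test-suite helper `import_fresh_module` relies on the same trick. The helper name and argument order follow the example in the message; everything else, including `check_heap`, is an illustrative assumption.

```python
import importlib
import sys

_MISSING = object()


def import_versions(py_name, c_name):
    """Return (pure_python_module, c_module_or_None) for side-by-side testing."""
    saved = {name: sys.modules.pop(name, _MISSING) for name in (py_name, c_name)}
    try:
        sys.modules[c_name] = None                      # block the accelerator ...
        py_version = importlib.import_module(py_name)   # ... so this is pure Python
        del sys.modules[c_name]
        try:
            c_version = importlib.import_module(c_name)
        except ImportError:
            c_version = None                            # e.g. a VM without the C module
    finally:
        for name, module in saved.items():              # leave sys.modules as found
            if module is _MISSING:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module
    return py_version, c_version


def check_heap(impl):
    """A tiny shared test that runs against either implementation."""
    heap = []
    for value in [5, 1, 4, 2, 3]:
        impl.heappush(heap, value)
    return [impl.heappop(heap) for _ in range(len(heap))]
```

Usage: `py_heapq, c_heapq = import_versions('heapq', '_heapq')`, then run the same checks on each module that is not `None`.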

-Brett


Re: [Python-Dev] Documentation idea

2008-10-09 Thread Jared Grubb
This is a really interesting idea. If extra memory/lookup overhead is  
a concern, you could enable this new feature by default when the  
interactive interpreter is started (where it's more likely to be  
invoked), and turn it off by default when running scripts/modules.


Jared

On 9 Oct 2008, at 20:37, [EMAIL PROTECTED] wrote:



On 9 Oct, 11:12 pm, [EMAIL PROTECTED] wrote:

Background
--
In the itertools module docs, I included pure python equivalents  
for each of the C functions.  Necessarily, some of those  
equivalents are only approximate but they seem to have greatly  
enhanced the docs.


Why not go the other direction?

Ostensibly the reason for writing a module like 'itertools' in C is  
purely for performance.  There's nothing that I'm aware of in that  
module which couldn't be in Python.


Similarly, cStringIO, cPickle, etc.  Everywhere these diverge, it is  
(if not a flat-out bug) not optimal.  External projects are  
encouraged by a wealth of documentation to solve performance  
problems in a similar way: implement in Python, once you've got the  
interface right, optimize into C.


So rather than have a C implementation, which points to Python, why  
not have a Python implementation that points at C?  'itertools' (and  
similar) can actually be Python modules, and use a decorator, let's  
call it "C", to do this:


  @C("_c_itertools.count")
  class count(object):
      """
      This is the documentation for both the C version of itertools.count
      and the Python version - since they should be the same, right?
      """

In Python itself, the Python module would mostly be for  
documentation, and therefore solve the problem that Raymond is  
talking about, but it could also be a handy fallback for sanity  
checking, testing, and use in other Python runtimes (ironpython,  
jython, pypy).  Many third-party projects already use ad-hoc  
mechanisms for doing this same thing, but an officially-supported  
way of saying "this works, but the optimized version is over here"  
would be a very useful convention.


In those modules which absolutely require some C stuff to work, the  
python module could still serve as documentation.





Re: [Python-Dev] Documentation idea

2008-10-09 Thread glyph


On 9 Oct, 11:12 pm, [EMAIL PROTECTED] wrote:

Background
--
In the itertools module docs, I included pure python equivalents for 
each of the C functions.  Necessarily, some of those equivalents are 
only approximate but they seem to have greatly enhanced the docs.


Why not go the other direction?

Ostensibly the reason for writing a module like 'itertools' in C is 
purely for performance.  There's nothing that I'm aware of in that 
module which couldn't be in Python.


Similarly, cStringIO, cPickle, etc.  Everywhere these diverge, it is (if 
not a flat-out bug) not optimal.  External projects are encouraged by a 
wealth of documentation to solve performance problems in a similar way: 
implement in Python, once you've got the interface right, optimize into 
C.


So rather than have a C implementation, which points to Python, why not 
have a Python implementation that points at C?  'itertools' (and 
similar) can actually be Python modules, and use a decorator, let's call 
it "C", to do this:


    @C("_c_itertools.count")
    class count(object):
        """
        This is the documentation for both the C version of itertools.count
        and the Python version - since they should be the same, right?
        """

In Python itself, the Python module would mostly be for documentation, 
and therefore solve the problem that Raymond is talking about, but it 
could also be a handy fallback for sanity checking, testing, and use in 
other Python runtimes (ironpython, jython, pypy).  Many third-party 
projects already use ad-hoc mechanisms for doing this same thing, but an 
officially-supported way of saying "this works, but the optimized 
version is over here" would be a very useful convention.


In those modules which absolutely require some C stuff to work, the 
python module could still serve as documentation.
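The proposal never defines `C` itself. A minimal sketch of a one-argument version that accepts the dotted `"module.attribute"` string used above, assuming everything before the last dot is an importable module name (`_c_itertools` is the proposal's placeholder and does not exist, so that class falls back to its Python definition). As noted elsewhere in the thread, the same decorator works for classes and functions.

```python
import importlib


def C(dotted_name):
    """Hypothetical one-argument form: @C("package.module.attribute")."""
    module_name, _, attr = dotted_name.rpartition('.')

    def choose_version(ob):
        try:
            return getattr(importlib.import_module(module_name), attr)
        except (ImportError, AttributeError):
            return ob            # no optimized version: keep the Python class/function
    return choose_version


@C("_c_itertools.count")         # placeholder name from the proposal; import fails
class count(object):
    """Pure Python count(start=0, step=1): documentation and fallback."""

    def __init__(self, start=0, step=1):
        self.n, self.step = start, step

    def __iter__(self):
        return self

    def __next__(self):
        value = self.n
        self.n += self.step
        return value


@C("itertools.count")            # an existing C implementation is picked up instead
class fast_count(object):
    """Never used on CPython: itertools.count replaces it."""
```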



Re: [Python-Dev] Documentation idea

2008-10-09 Thread Raymond Hettinger

Yes, I'm looking at a couple of different approaches to loading the strings.
For now though, I want to focus on the idea itself, not the implementation.
The important thing is to gather widespread support before getting into
the details of how the strings get loaded.


Raymond



- Original Message - 
From: "Lisandro Dalcin" <[EMAIL PROTECTED]>


Have you ever considered the same approach for docstrings in C code?
For reference, NumPy already has some trickery for maintaining
docstrings outside C sources. Of course, descriptors would be a far
better way to implement and support this in core Python and other
projects...


 This keeps the C build from getting fat.  More




Re: [Python-Dev] Documentation idea

2008-10-09 Thread Lisandro Dalcin
On Thu, Oct 9, 2008 at 8:50 PM, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> [Christian Heimes]
>>
>> The idea sounds great!
>>
>> Are you planning to embed the pure python code in C code?
>
> Am experimenting with a descriptor that fetches the attribute string from a
> separate text file.

Have you ever considered the same approach for docstrings in C code?
For reference, NumPy already has some trickery for maintaining
docstrings outside C sources. Of course, descriptors would be far
better for implementing and supporting this in core Python and other
projects...

>  This keeps the C build from getting fat.  More
> importantly, it lets us write the exec-able string in a more natural way (it
> bites to write C style docstrings using \n and trailing backslashes).  The
> best part is that people without C compilers can still submit patches to the
> text files.
>
>
> Raymond



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594


Re: [Python-Dev] Documentation idea

2008-10-09 Thread Brett Cannon
On Thu, Oct 9, 2008 at 4:12 PM, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
[SNIP]
> Bright idea
> --
> Let's go one step further and do this just about everywhere and instead of
> putting it in the docs, attach an exec-able string as an attribute to our C
> functions.  Further, those pure python examples should include doctests so
> that the user can see a typical invocation and calling pattern.
>
> Say we decide to call the attribute something like ".python", then you could
> write something like:
>
>   >>> print(all.python)
>   def all(iterable):
>       '''Return True if all elements of the iterable are true.
>
>       >>> all(isinstance(x, int) for x in [2, 4, 6.13, 8])
>       False
>       >>> all(isinstance(x, int) for x in [2, 4, 6, 8])
>       True
>       '''
>
>       for element in iterable:
>           if not element:
>               return False
>       return True
>
> There you have it, a docstring, doctestable examples, and pure python
> equivalent all in one place.  And since the attribute is distinguished from
> __doc__, we can insist that the string be exec-able (something we can't
> insist on for arbitrary docstrings).
>

The idea is great. I assume the special file support is mostly for the
built-ins since extension modules can do what heapq does: have a pure
Python version people import and that code pulls in any supporting C
code.

As for an implementation, you could go as far as to have a flag in the
extension module that says, "look for Python equivalents" and during
module initialization find the file and pull it in. Although doing it
that way would not necessarily encourage people as much to start with
the pure Python version and then only do C equivalents when
performance or design requires it.

> Benefits
> 
>
> * I think this will greatly improve the understanding of tools like
> str.split() which have proven to be difficult to document with straight
> prose.  Even with simple things like any() and all(), it makes it
> self-evident that the functions have early-out behavior upon hitting the
> first mismatch.
>
> * The exec-able definitions and docstrings will be testable
>

And have some way to test both the Python and C version with the same
tests (when possible)?
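One way to do that, sketched with unittest: keep the pure-Python equivalent as an ordinary function (`py_all` is a hypothetical name) and run every test against both it and the builtin. The second test also pins down the early-out behaviour of all() by checking how much of an iterator is left unconsumed.

```python
import unittest


def py_all(iterable):
    """Pure-Python equivalent of the builtin all() (hypothetical name)."""
    for element in iterable:
        if not element:
            return False
    return True


class AllEquivalenceTests(unittest.TestCase):
    # Each test runs once per implementation; subTest reports which one failed.
    implementations = (all, py_all)

    def test_results(self):
        for impl in self.implementations:
            with self.subTest(impl=impl):
                self.assertTrue(impl([]))
                self.assertTrue(impl([1, 2, 3]))
                self.assertFalse(impl([1, 0, 3]))

    def test_stops_at_first_false_value(self):
        for impl in self.implementations:
            with self.subTest(impl=impl):
                it = iter([1, 0, 2, 3])
                self.assertFalse(impl(it))
                # Only the items after the first false value remain.
                self.assertEqual(list(it), [2, 3])
```

Run it with `python -m unittest`; adding a third implementation (say, from another Python runtime) only means extending the `implementations` tuple.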

-Brett


Re: [Python-Dev] Documentation idea

2008-10-09 Thread Raymond Hettinger

[Christian Heimes]

> The idea sounds great!
>
> Are you planning to embed the pure python code in C code?


Am experimenting with a descriptor that fetches the attribute string from a separate text file.  This keeps the C build from getting 
fat.  More importantly, it lets us write the exec-able string in a more natural way (it bites to write C-style docstrings using \n 
and trailing backslashes).  The best part is that people without C compilers can still submit patches to the text files.
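The descriptor itself is not shown in the thread. A minimal sketch, assuming the exec-able string lives in a UTF-8 text file and is read lazily on first access; `PythonSource` and the file layout are hypothetical. It is shown on an ordinary class, because builtin functions such as all() do not accept new attributes from Python code.

```python
class PythonSource:
    """Hypothetical descriptor that loads an exec-able string from a text file.

    The file is read on first access and cached afterwards, so nothing is
    loaded (or compiled into the binary) unless someone asks for it.
    """

    def __init__(self, path):
        self.path = path
        self._text = None

    def __get__(self, instance, owner=None):
        if self._text is None:
            with open(self.path, encoding="utf-8") as f:
                self._text = f.read()
        return self._text
```

For example, `class Demo: python = PythonSource("all.py")` would make `Demo.python` return the contents of a (hypothetical) `all.py` the first time it is read.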



Raymond 




Re: [Python-Dev] Documentation idea

2008-10-09 Thread Christian Heimes

Raymond Hettinger wrote: lots of cool stuff!

The idea sounds great!

Are you planning to embed the pure python code in C code? That's going to 
increase the data segment of the executable. It should be possible to 
disable and remove the pure python example with a simple ./configure 
option and some macro magic. File size and in-memory size are still 
critical for embedders.


Christian



[Python-Dev] Documentation idea

2008-10-09 Thread Raymond Hettinger

Background
--
In the itertools module docs, I included pure python equivalents for each of the C functions.  Necessarily, some of those 
equivalents are only approximate but they seem to have greatly enhanced the docs.  Something similar is in the builtin docs for 
any() and all().  The new collections.namedtuple() factory function also includes a verbose option that prints a pure python 
equivalent for the generated class. And in the decimal module, I took examples directly from the spec and included them in 
doc-testable docstrings.  This assured compliance with the spec while providing clear examples to anyone who bothers to look at the 
docstrings.


For itertools docs, I combined those best practices and included sample calls in the pure-python code (see the current docs for 
itertools to see what I mean -- perhaps look at the docs for a new tool like itertools.product() or itertools.izip_longest() to see 
how useful it is).
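For illustration, the itertools documentation gives a pure-Python equivalent for product() along these lines (shown here in Python 3 syntax, with `repeat` as a keyword-only argument). It builds the full list of results before yielding any, unlike the C version, so it documents what is produced rather than how the C code produces it.

```python
def product(*iterables, repeat=1):
    """Cartesian product of input iterables.

    >>> list(product('ab', range(2)))
    [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
    """
    # Materialize each input once, then repeat the whole sequence of pools.
    pools = [tuple(pool) for pool in iterables] * repeat
    result = [[]]
    for pool in pools:
        # Extend every partial result by each element of the next pool.
        result = [x + [y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
```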


Bright idea
--
Let's go one step further and do this just about everywhere and instead of putting it in the docs, attach an exec-able string as an 
attribute to our C functions.  Further, those pure python examples should include doctests so that the user can see a typical 
invocation and calling pattern.


Say we decide to call the attribute something like ".python", then you could 
write something like:

   >>> print(all.python)
   def all(iterable):
       '''Return True if all elements of the iterable are true.

       >>> all(isinstance(x, int) for x in [2, 4, 6.13, 8])
       False
       >>> all(isinstance(x, int) for x in [2, 4, 6, 8])
       True
       '''

       for element in iterable:
           if not element:
               return False
       return True

There you have it, a docstring, doctestable examples, and pure python equivalent all in one place.  And since the attribute is 
distinguished from __doc__, we can insist that the string be exec-able (something we can't insist on for arbitrary docstrings).
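Because the string is guaranteed to be exec-able, a checker can compile it, run its doctests against the resulting pure-Python function, and compare that function with the builtin. A minimal sketch, assuming the strings are kept in a plain dict keyed by the builtin; `PYTHON_EQUIVALENTS` and `check_equivalent` are hypothetical stand-ins for the proposed `.python` attribute, since builtins such as all() reject new attributes.

```python
import doctest

# Hypothetical registry standing in for the proposed all.python attribute.
PYTHON_EQUIVALENTS = {
    all: '''\
def all(iterable):
    """Return True if all elements of the iterable are true.

    >>> all(isinstance(x, int) for x in [2, 4, 6.13, 8])
    False
    >>> all(isinstance(x, int) for x in [2, 4, 6, 8])
    True
    """
    for element in iterable:
        if not element:
            return False
    return True
''',
}


def check_equivalent(builtin):
    """Exec the stored source and run its doctests on the pure-Python version."""
    source = PYTHON_EQUIVALENTS[builtin]
    namespace = {}
    exec(source, namespace)  # fails loudly if the string is not exec-able
    func = namespace[builtin.__name__]
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, dict(namespace), builtin.__name__, None, 0)
    results = doctest.DocTestRunner(verbose=False).run(test)
    return func, results
```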


Benefits


* I think this will greatly improve the understanding of tools like str.split() which have proven to be difficult to document with 
straight prose.  Even with simple things like any() and all(), it makes it self-evident that the functions have early-out behavior 
upon hitting the first mismatch.


* The exec-able definitions and docstrings will be testable

* It will assist pypy style projects and other python implementations when they 
have to build equivalents to CPython.

* We've gotten good benefits from doctests for pure python functions, so why not 
extend this best practice to our C functions?

* The whole language will become more self-explanatory and self-documenting.

* Will eliminate confusion about what functions were exactly intended to do.

* Will confer benefits similar to test driven development, where the documentation and pure python version are developed first and 
the doctests are made to pass, then the C version is created to match.


* For existing code, this is a perfect project for people who want to start contributing to the language but aren't ready to start 
writing C (they should be able to read C, however, so that the equivalent really does match the C code).
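To make the str.split() point concrete, here is a pure-Python sketch of only the whitespace-splitting mode (sep=None). It was written for illustration and is not the CPython source; the function name `split_whitespace` is hypothetical, and the explicit-separator mode is deliberately left out. It spells out a rule that prose tends to blur: once maxsplit is reached, the remainder keeps its trailing whitespace.

```python
def split_whitespace(s, maxsplit=-1):
    """Sketch of str.split() with sep=None (hypothetical helper name)."""
    result = []
    i, n = 0, len(s)
    while i < n:
        # Skip the run of whitespace before the next word.
        while i < n and s[i].isspace():
            i += 1
        if i == n:
            break
        if maxsplit >= 0 and len(result) == maxsplit:
            # Limit reached: the rest, trailing whitespace included, is one item.
            result.append(s[i:])
            break
        # Collect one word up to the next whitespace character.
        j = i
        while j < n and not s[j].isspace():
            j += 1
        result.append(s[i:j])
        i = j
    return result
```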



Limits
-

* In some cases, there may be no pure python equivalent (e.g. sys.getsizeof()).

* Sometimes the equivalent can only be approximate because the actual C 
function is too complex (e.g. itertools.tee()).

* Some cases, like int(), are useful as a type, have multiple functions, and 
are hard to write as pure python equivalents.

* For starters, it probably only makes sense to do this for things that are more "algorithmic" like any() and all() or things that have a 
straightforward equivalent like property() written using descriptors.
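property() shows how direct such an equivalent can be. The sketch below follows the pure-Python version in the Descriptor HowTo Guide in spirit; the class is named `Property` so it does not shadow the builtin, and the error messages are simplified.

```python
class Property:
    """Pure-Python equivalent of the builtin property()."""

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__  # inherit the getter's docstring
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    # Each of these returns a new descriptor with one accessor replaced.
    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)
```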



Premise
---

Sometimes pure python is more expressive, precise, and easy to read than English prose. 

