Is it your intent to push for more use of the abstract API instead of
the concrete APIs for all of Python's C data structures?  Current API
aside, are you advocating this approach for all new built-in types?
Would you argue that Python 3.0's C API be stripped of everything but
the abstract API and the bare essentials of the concrete API?  

If so, then I think this is extremely misguided.  C is not Python, and
while the abstract API is useful for some things, so is the concrete
API.

In fact, the Python C API's clarity, utility, completeness, and
discoverability have made Python one of the nicest languages to embed and
extend, and I see no reason to deviate from that for the sake of blind
TOOWTDI worship.  We have a rich tradition of providing both concrete
and abstract APIs at the C layer, and I think that's a good thing that
we should continue here.

On Mon, 2006-03-20 at 03:44 -0500, Raymond Hettinger wrote:

> PySet_Clear()
> -------------
> Use PyObject_CallMethod(s, "clear", NULL).
> 
> Or if you need to save a millisecond on an O(n) operation, use 
> PyNumber_InPlaceSubtract(s,s) as shown in the docs.  If the name bugs you, it 
> only takes a one-line macro to define a wrapper.  The set API should not be 
> cluttered with unnecessary and redundant functions.

This is a great example of what I'm talking about.  You lose some static
C compiler checks when you use either of these alternatives.  C is not
Python and we shouldn't try to make it so.

The documentation is much less concise too, and if macros are
encouraged, then every extension will invent its own name, further
reducing readability, or will use the obvious choice of PySet_Clear() and
then question why Python doesn't provide it itself.

This also has a detrimental effect on debugging.  Macros suck for
debugging and going through all the abstract API layers sucks.  A nice,
clean, direct call is so much more embedder-friendly.

In addition, you essentially have all the pieces for PySet_Clear() right
there in front of you, so why not expose them to embedders and make
their lives easier?  Forcing them to go through the abstract API or use
obscure alternatives does not improve the API.  It seems a false economy
to not include concrete API calls just to end up back in setobject.c
after layers of indirection.

> PySet_Next()
> ------------
> This is also redundant.  The preferred way to iterate over a set should be
> PyObject_GetIter(s).  The iter api is generic and works for all containers.
> It ought to be the one-way-to-do-it.

For the C API, I disagree for the reasons stated above.  In this
specific case, using the iterator API actually imposes more pain on
embedders because there are more things you have to keep track of and
that can go wrong.  PyDict_Next() is a very nice and direct API, where
you often don't have to worry about reference counting (borrowed refs in
this case are the right thing to return).  You also don't have to worry
about error conditions, and both of these things reduce bugs because it
usually means less code.  PySet_Next() would provide the same benefits.

I don't buy the safety argument against PyDict_Next()/PySet_Next()
because they are clearly documented as requiring no modification during
iteration.  Again, this is what I mean by useful concrete vs. abstract
APIs.  When you /know/ you have a set and you /know/ you won't be
modifying it, PySet_Next() is the perfect interface.  If you will be
modifying the set, or don't know what kind of container you have, then
the abstract API is the right thing to use.

> Further, it doesn't make sense to model this after the dictionary API where
> the next function is needed to avoid double lookups by returning pointers to
> both the key and value fields at the same time (allowing for modification of
> the value field).  In contrast, for sets, there is no value field to look-up
> or mutate (the key should not be touched).  So, we shouldn't be passing
> around pointers to the internal structure. I want to keep the internal
> structure of sets much more private than they were for dictionaries -- all
> access should be through the provided C API functions -- that keeps the
> implementation flexible and allows for future improvements without worrying
> that we've broken code for someone who has touched the internal structure
> directly.

The implementation of PySet_Next() would not return setentry structs; it
would return PyObjects.  Yes, those would be borrowed refs to the
setentry key fields, but you still avoid direct access to internal
structures.
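
The difference between handing out entries and handing out keys can be shown in plain C without Python.h. The following is a hypothetical toy string set (toy_set, toy_set_add() and toy_set_next() are invented for illustration and are not CPython code): the PyDict_Next()-style function returns only a borrowed key pointer, so callers never depend on the entry layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAPACITY 8

/* Entry layout: in a real library this and toy_set would live only in
   the .c file behind an opaque type; it is inline here so the sketch is
   self-contained. */
typedef struct {
    const char *key;   /* NULL marks an empty slot */
} toy_entry;

typedef struct {
    toy_entry table[TOY_CAPACITY];
    size_t used;
} toy_set;

/* djb2-style string hash */
static unsigned long toy_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Linear probing.  Returns 1 if added, 0 if already present, -1 if the
   (fixed-size) table is full. */
static int toy_set_add(toy_set *set, const char *key)
{
    assert(set != NULL && key != NULL);
    size_t start = toy_hash(key) % TOY_CAPACITY;
    for (size_t probe = 0; probe < TOY_CAPACITY; probe++) {
        toy_entry *e = &set->table[(start + probe) % TOY_CAPACITY];
        if (e->key == NULL) {
            e->key = key;
            set->used++;
            return 1;
        }
        if (strcmp(e->key, key) == 0)
            return 0;
    }
    return -1;
}

/* *pos is an opaque cursor that starts at 0.  Stores a borrowed key
   pointer and returns 1, or returns 0 when the walk is finished.  The
   toy_entry itself is never handed to the caller. */
static int toy_set_next(const toy_set *set, size_t *pos, const char **key)
{
    assert(set != NULL && pos != NULL && key != NULL);
    while (*pos < TOY_CAPACITY) {
        const toy_entry *e = &set->table[(*pos)++];
        if (e->key != NULL) {
            *key = e->key;
            return 1;
        }
    }
    return 0;
}
```

Because callers only ever see keys and an opaque cursor, the entry struct could gain a cached hash or change its probing scheme without breaking them.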

> Also, the _Next() api is not as safe as the _GetIter api which checks for
> mutation during iteration.  The safety should not be tossed aside without
> good reason.
> 
> 
> PySet_Update()
> --------------
> Use PyObject_CallMethod(s, "update", "O", iterable).  That is the preferred
> way to access all of the high volume methods.

Again, I disagree, but I don't think I need to restate my reasons.

> Only the fine grained methods (like contains, add, pop, or discard) have a
> need for a direct call.  Adding unnecessary functions for the many-at-once
> methods gains you nothing -- perhaps saving a tiny O(1) look-up step in an
> O(n) operation.
> 
> FWIW, the same reasoning also applies to why the list API defines 
> PyList_Append() but not PyList_Extend().

Personally, I think that's a bug in the PyList C API.  I haven't
complained because I've rarely needed it, but it /is/ a deficiency.

> PySet_AsList()
> --------------
> There is already a function expressly for this purpose, PySequence_List(s).

I'll grant you this one. ;)  Forget PySet_AsList().

I'll try to answer the rest of your message without repeating myself too
much. ;)

> As it stands now, it is possible to use sets in C programs and access them in
> a way that has a direct correspondence to pure Python code -- using
> PyObject_CallMethod() for things we would usually access by name, using the
> PyNumber API for things we would access using operators, using other parts of
> the abstract API exactly as we would in Python (PyObject_Repr,
> PyObject_GetIter, PySequence_List, PyObject_Print, etc.), and using a handful
> of direct access functions for the fine grained methods like (add, pop,
> contains, etc.).  IOW, I like the way the C code looks now and I like the
> minimal, yet complete API.  Let's don't muck it up.

This is where you and I disagree.  Again, C is not Python.  I actually
greatly dislike having to use things like PyObject_Call() for concrete
objects.  First, the C code does not look like Python at all, and is
actually /less/ readable because now you have to look in two places to
understand what the code does.  Second, it imposes much more pain when
debugging because of all the extra layers you have to step through.

But of course, with a rich concrete and abstract API, as most Python
types have, we both get to appease our aesthetic demons and choose the
right tool for the job.

> FWIW, the C implementation in Py2.5 already provides nice speed-ups for many 
> operations.  Likewise, its memory requirements have been reduced by a third. 
> Try to enjoy the improvements without gilding the lily.

Let's embrace C and continue to make life easier for the C coder.  You
can't argue that going through all the rigamarole of the iterator API
would be faster than PySet_Next(), and it certainly won't be more
readable or easier to debug.  A foolish consistency, and all that...

Cheers,
-Barry

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev