Let me preface this reply with the concern that my level of competence, in this area, is insufficient. However, there are a number of folk 'here' who are 'into' Python's internals, and will (hopefully) jump-in...

Also, whilst we appear to be concentrating on understanding the content of a data-structure, have we adequately defined "length"?
(per msg title)

- total number of o/p lines (per paper.ref)
- total number of characters 'printed'
- total number of elements l-r (of any/all embedded data-structures)
- the number of elements in each embedded data-structure
- the number of characters displayed from each embedded d-s
- depth of data-structure t-d
- something else?
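
To make a few of those candidate meanings concrete, here is a rough sketch. The helper name depth and the sample data are my own inventions for illustration, and only the built-in container types are considered:

```python
from pprint import pformat

def depth(obj):
    """Nesting depth, top-down: a scalar is 0, a flat list is 1."""
    if isinstance(obj, dict):
        children = list(obj.values())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = list(obj)
    else:
        return 0
    return 1 + max((depth(child) for child in children), default=0)

sample = {"a": [1, 2, [3, 4]], "b": (5, 6)}   # hypothetical data

print(len(pformat(sample).splitlines()))  # total number of o/p lines
print(len(repr(sample)))                  # total number of characters 'printed'
print(len(sample))                        # number of elements at the top level
print(depth(sample))                      # depth of the data-structure
```

Each of these gives a different answer for the same object, which is rather the point of the question.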


On 25/07/2020 10:52, Stavros Macrakis wrote:
dn, Thanks again.

For background, I come from C and Lisp hacking (one of the MIT developers of Macsyma <https://en.wikipedia.org/wiki/Macsyma>/Maxima <https://sourceforge.net/p/maxima/wiki/Home/>) and also play with R, though I haven't been a professional developer for many years. I know better than to Reply to a Digest -- sorry about that, I was just being sloppy.

Us 'silver-surfers' have to stick-together! Also in the seventies I decided Lisp was not for me...


The reason I wanted print-length limitation was that I wanted to get an overview of an object I'd created, which contains some very long lists. I expected that this was standard functionality that I simply couldn't find in the docs.

I'm familiar with writing pretty-printer ("grind") functions with string output (from way back: see section II.I, p. 12 <http://bitsavers.trailing-edge.com/pdf/mit/ai/aim/AIM-279.pdf>), but I'm not at all familiar with Python's type/class system, which is why I'm trying to understand it by playing with it.

I accept, one might say 'on faith', that in Python "everything is an object", and proceed from there. Sorry!

Similarly, I've merely accepted the limitations of pprint() - and probably use it less-and-less, as I become more-and-more oriented towards TDD...
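
That said, the PSL's reprlib module may be closer to the "print-length" idea than pprint is. A minimal sketch, assuming a private Repr instance (the name limiter and the chosen limits are mine) rather than altering the module-wide defaults:

```python
import reprlib

limiter = reprlib.Repr()   # a private instance; module defaults stay untouched
limiter.maxlist = 3        # show at most 3 list elements
limiter.maxdict = 1        # show at most 1 dict entry
limiter.maxlevel = 2       # stop descending after 2 levels of nesting

long_list = list(range(1000))
print(limiter.repr(long_list))           # [0, 1, 2, ...]
print(limiter.repr({"a": 1, "b": 2}))    # abbreviated, with a trailing '...'
```

The output is a string, so it suits an overview rather than further processing.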


I did try looking at the Python Standard Library docs, but I don't see where it mentions the superclasses of the numerics or of the collection types or the equivalent of *numberp*. If I use *type(4).__bases__*, I get just *(<class 'object'>,)*, which isn't very helpful. I suspect that that isn't the correct way of finding a class's superclasses -- what is?

If you haven't already, try:
- The Python Language Reference Manual (see Python docs)
        in particular "Data Model"
- PSL: Data Types = "types -- Dynamic type creation and names for built-in types"
- PSL: collections
- PSL: collections.abc

Another source of 'useful background' is the PEPs (Python Enhancement Proposals). Note that some have been accepted and are part of the current-language - so the "proposal" part has become an historic record. By contrast, some have been rejected, and others are still under-discussion...

- PEP 0: an index
- PEP 3119 -- Introducing Abstract Base Classes
- PEP 3141 -- A Type Hierarchy for Numbers
- and no-doubt many more, which will keep you happily entertained, and save the members of your local flock of sheep from thinking that to you they are a mere number...


BTW, where do I look to understand the difference between *dir(type(4))* (which does not include *__bases__*) and *type(4).__dir__(type(4))* (which does)? According to Martelli (2017, p. 127), *dir(x)* just calls *x.__dir__()*; but *type(4).__dir__()* => ERR for me. Has this changed since 3.5, or is Martelli just wrong?

I don't know - and I'm not about to question Alex!
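
A quick experiment, offered tentatively: as far as I can tell, dir(x) looks __dir__ up on type(x) rather than on x itself. For a class that means the metaclass's version (type.__dir__) is used, whereas int.__dir__ is the one inherited from object. The variable names below are mine:

```python
cls = type(4)                            # int

# dir(cls) appears to use the metaclass's __dir__, ie type.__dir__
via_dir = dir(cls)
via_metaclass = type(cls).__dir__(cls)

# int.__dir__ is inherited from object, and treats int as 'just an instance'
via_object = cls.__dir__(cls)

print("__bases__" in via_dir)            # False
print("__bases__" in via_object)         # True
print(set(via_dir) == set(via_metaclass))

try:
    cls.__dir__()                        # called on the class: no instance supplied
except TypeError as exc:
    print("TypeError:", exc)
```

If that reading is right, the book's description holds for ordinary instances, and the ERR comes from calling the method on the class without passing an object.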


There's nothing else obvious in dir(0) or in dir(type(0)). After some looking around, I find that the base classes are not built-in, but need to be added with the *numbers* and *collections.abc* modules? That's a surprise!

Yes, to me there is much mystery in this (hence "faith", earlier).
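
A small sketch of what seems to be going on, using only the numbers module: the abstract base classes are attached by registration, so they answer isinstance() without appearing in __bases__ or the MRO.

```python
import numbers

print(int.__bases__)                       # (<class 'object'>,)
print(int.__mro__)                         # int, then object

# The nearest thing to numberp that I know of:
print(isinstance(4, numbers.Number))       # True
print(isinstance(4, numbers.Integral))     # True
print(isinstance(4.0, numbers.Integral))   # False
print(isinstance(4.0, numbers.Real))       # True

# Registered ('virtual') base classes do not show up in the MRO
print(numbers.Number in int.__mro__)       # False
```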

Everything is a sub-class of object. When I need to differentiate, eg between a list and a dict; I either resort to isinstance() or back to the helpful table/taxonomy in collections.abc and hasattr() - thus a tuple is a Collection and a Sequence, but not a MutableSequence like a list. A set looks like a list until it comes to duplicate values or behaving as a Sequence. A dict is a MutableMapping, but as you say (below), when considered a Collection will only behave as a list of keys. So, we then chase the *View-s...
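
By way of illustration, a short sketch of those isinstance() checks against the collections.abc taxonomy (the sample values are arbitrary):

```python
from collections.abc import Collection, Sequence, MutableSequence, MutableMapping

samples = [(1, 2), [1, 2], {1, 2}, {"a": 1}]
for obj in samples:
    print(type(obj).__name__,
          isinstance(obj, Collection),
          isinstance(obj, Sequence),
          isinstance(obj, MutableSequence),
          isinstance(obj, MutableMapping))

d = {"a": 1, "b": 2}
print(list(d))            # ['a', 'b'] - keys only
print(list(d.items()))    # [('a', 1), ('b', 2)] - pairs, via a view
```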


You suggested I try *pp.__builtins__.__dict__()* . I couldn't figure out what you meant by *pp* here (the module name *pprint*? the class *pprint.PrettyPrint*? the configured function *pprint.PrettyPrinter(width=20,indent=3).pprint*? none worked...). I finally figured out that you must have meant something like *pp=pprint.PrettyPrinter(width=80).pprint; pp(__builtins__.__dict__)*. Still not sure which attributes could be useful.

With apologies: "pp" is indeed pprint. The code-example should have been prefaced with:

    from pprint import pprint as pp

This is (?my) short-hand, whenever I'm using pprint within a module. (You will find our number-crunching friends referring to "np" rather than the full "numpy", and similar...)


    With bottom-up prototyping it is wise to start with the 'standard'
    cases! (and to 'extend' one-bite at a time)
Agreed! I started with lists, but couldn't figure out how to extend that to tuples and sets.  I was thinking I could build a list then convert it to a tuple or set. The core recursive step looks something like this:

      CONVERT( map( lambda i: limit(i, length, depth-1) , obj[0:length] ) + ( [] if len(obj) <= length else ['...'] ) )

... since map returns an iterator, not a collection of the same type as its input -- so how do I convert to the right result type (CONVERT)?

After discovering that typ.__new__(typ,obj) doesn't work for mutables, and thrashing for a while, I tried this:

    def convert(typ, obj):
        newobj = typ.__new__(typ, obj)   # fills immutables (eg tuple)
        newobj.__init__(obj)             # fills mutables (eg list, set, dict)
        return newobj

which is pretty ugly, because the *__new__* initializer is magically ignored for a mutable (with no exception) and the *__init__* setter is magically ignored for an immutable (with no exception). But it seems to work....
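
On the CONVERT question: my understanding is that the more usual idiom is simply to call the type, which runs __new__ and __init__ together. A minimal sketch, where rebuild is a hypothetical helper name and only the built-in containers are assumed:

```python
def rebuild(obj, items):
    """Build a new container of the same type as obj from items.

    Assumes type(obj) accepts a single iterable, which holds for
    list, tuple, set, frozenset and dict (given key-value pairs),
    but not for str or for namedtuples.
    """
    return type(obj)(items)

print(rebuild([1, 2, 3], [1, 2]))               # [1, 2]
print(rebuild((1, 2, 3), [1, 2]))               # (1, 2)
print(rebuild({1, 2, 3}, [1, 2]))               # {1, 2}
print(rebuild({"a": 1, "b": 2}, [("a", 1)]))    # {'a': 1}
```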

Now, on to dictionaries! Bizarrely, the *list()* of a dictionary, and its iterator, return only the keys, not the key-value pairs. No problem! We'll create yet another special case, and use */dict/.items()* (which for some reason doesn't exist for other collection types). And /mirabile dictu/, *convert* works correctly for that:

    dicttype = type({'a':1})
    test = {'a':1,'b':2}
    convert(dicttype,test.items()) => {'a':1,'b':2}

So we're almost done. Now all we have to do is slice the result to the desired length:

    convert(dicttype,test.items()[0:1])      # ERR


But */dict/.items()* is not sliceable. However, it /is/ iterable... but we need another count variable (or is there a better way?):


    c = 0
    convert(dicttype, [ i for i in test.items() if (c:=c+1)<2 ])

Phew! That was a lot of work, and I'm left with a bunch of special cases, but it works. Now I need to understand from a Python guru what the Pythonic way of doing this is which /doesn't/ require all this ugliness.

(This doesn't really work for the original problem, because there's no way of putting "..." at the end of a dictionary object, but I still think I learned something about Python.)
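
Regarding the count variable: itertools.islice() will take the first n items of any iterable, sliceable or not, which may be the "better way". A small sketch, reusing the test dictionary from the message (the name first_one is mine):

```python
from itertools import islice

test = {"a": 1, "b": 2}

# Take the first item of the (non-sliceable) items view, then rebuild a dict
first_one = dict(islice(test.items(), 1))
print(first_one)          # {'a': 1}

# The same call works on a set, or any other iterable
print(list(islice({10, 20, 30}, 2)))
```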

I did take a look at the pprint source code, and could no doubt modify it to handle print-length, but at this point, I'm still trying to understand how Python code can be written generically. So I was disappointed to see that *_pprint_list*, *_pprint_tuple*, and *_pprint_set* are not written generically, but as three separate functions. I also wonder what the '({' case is supposed to cover.

A lot of questions -- probably based on a lot of misunderstandings!

...and any response severely limited by my competence in these topics, eg my assumption that the data would be presented with different brackets, ie the "({", according to type.

Recently, I offered a "Friday Finking" to the list, relating a Junior Programmer's wrestling with the challenge of expanding an existing API from scalar values to a choice between scalars and a tuple. (or was it a list - or does it really matter?) There seemed to be no suggestion beyond isinstance().

In this case, there will be a 'ladder' of if...elif...else clauses, and quite possibly needed in two places - parsing and printing. (The ref.paper talked of two passes, so...) PS there is talk of a case/switch which will handle class-distinction, but alas, not currently available (PEP 622? Python 3.10?).
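
A rough sketch of such a ladder on the printing side, assuming string output and only the built-in containers. The function name brief and its length parameter are hypothetical, and edge-cases (an empty set, a one-element tuple) are not rendered faithfully:

```python
from itertools import islice

def brief(obj, length=3):
    """Return a shortened string form of obj, showing at most `length` items."""
    if isinstance(obj, dict):
        shown = [f"{k!r}: {brief(v, length)}"
                 for k, v in islice(obj.items(), length)]
        opener, closer = "{", "}"
    elif isinstance(obj, list):
        shown = [brief(item, length) for item in obj[:length]]
        opener, closer = "[", "]"
    elif isinstance(obj, tuple):
        shown = [brief(item, length) for item in obj[:length]]
        opener, closer = "(", ")"
    elif isinstance(obj, (set, frozenset)):
        shown = [brief(item, length) for item in islice(obj, length)]
        opener, closer = "{", "}"
    else:
        return repr(obj)              # scalars, and anything unrecognised
    if len(obj) > length:
        shown.append("...")           # mark the truncation
    return opener + ", ".join(shown) + closer

print(brief(list(range(10))))                   # [0, 1, 2, ...]
print(brief({"a": [1, 2, 3, 4], "b": 2}, 1))    # {'a': [1, ...], ...}
```

Because the result is text rather than a rebuilt object, the "..." can follow a dictionary's entries as easily as a list's.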

Is the challenge one of attempting to retain and represent the values within the data-structure? There is an implicit issue here, that a first-approach may be essentially replication (ie storage-expensive).

Nevertheless, proceeding in this fashion, remember that a Python list is not the "array" of other languages! Content-elements need not be homogeneous. So, an 'accumulator' list could contain a dict, a set, sundry scalars, and/or inner-lists, in perfect happiness.

Through zip(), Python has a very handy (IMHO) way of linking two lists, without any effort on my part. (I work very hard at being this lazy!) So, during 'parsing', it is possible to (say) build one list recording 'type', eg list, dict, set, ... and a parallel list containing the values (or k-v pairs, or i,j, or...). Yet more, if/when a 'length' metric can be computed... Thereafter when it comes to presentation, the assembled lists can be zip-ped together in a for-loop to yield the final-presentation.
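
A toy sketch of that parallel-lists idea, with made-up components and the hypothetical names kinds and values:

```python
kinds = []     # one entry per component: the type noted while 'parsing'
values = []    # the matching content, in the same order

for component in ([1, 2, 3], {"a": 1}, 42):
    kinds.append(type(component).__name__)
    values.append(component)

# Presentation: walk both lists in step
for kind, value in zip(kinds, values):
    print(f"{kind}: {value!r}")
```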

Apologies, the above seems to be 'fluff around the edges' rather than addressing the central needs of the problem. I'm also a little concerned about your expertise level and the likelihood that I may be 'talking down'.
--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list
