[Python-ideas] Re: Percent notation for array and string literals, similar to Perl, Ruby

Todd Tue, 22 Oct 2019 17:56:53 -0700

On Tue, Oct 22, 2019 at 7:57 PM Steven D'Aprano <st...@pearwood.info> wrote:


> On Tue, Oct 22, 2019 at 04:11:45PM -0400, Todd wrote:
> > On Tue, Oct 22, 2019 at 3:54 PM Steve Jorgensen <ste...@stevej.name> wrote:
> >
> > > See
> > > https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#The_%_Notation
> > > for what Ruby offers.
> > >
> > > For me, the arrays are the most useful aspect.
> > >
> > >     %w{one two three}
> > >     => ["one", "two", "three"]
>
>
> I would expect %w{ ... } to return a set, not a list:
>
>     %w[ ... ]  # list
>     %w{ ... }  # set
>     %w( ... )  # tuple
>

This is growing into an entire new group of constructors for a very, very
limited number of operations that have been privileged for some reason.
Should %{a=b c=d} create dicts, too?  Why not?  Why should strings be
privileged over, say, numbers?  Why should %w[1 2 3] make ['1', '2', '3']
instead of [1, 2, 3]?  And why whitespace instead of a comma?  We have
general ways to handle all of this stuff that doesn't lock us into a single
special case.
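
A rough sketch of what I mean by general ways, using only existing syntax
(the sample strings and variable names are just for illustration):

```python
# Strings split on whitespace: the existing idiom.
words = 'one two three'.split()

# Numbers instead of strings: add a conversion step.
numbers = [int(n) for n in '1 2 3'.split()]

# A different separator: pass it to split().
fields = 'a,b,c'.split(',')

# Key/value pairs: split each item again and hand the pairs to dict().
pairs = dict(item.split('=') for item in 'a=b c=d'.split())

print(words, numbers, fields, pairs)
```

Each variation is one more ordinary call or comprehension, rather than a
new literal prefix.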


> and I would describe them as list/set/tuple "word literals". Unlike
> list etc displays [spam, eggs, cheese] these would actually be true
> literals that can be determined entirely at compile-time.
>

I don't know enough about the internals to say whether this would be
possible or not.


> > I am not seeing the advantage of this.  Can you provide some specific
> > examples that you think would benefit from this syntax?
>
> I would use this feature, or something like it, a lot, especially in
> doctests where there is a premium in being able to keep examples short
> and on one line.
>
> Here is a small selection of examples from my code that would be
> improved by something like the suggested syntax. I have trimmed some of
> them for brevity, and to keep them on one line. (Anything with an
> ellipsis ... has been trimmed.) I have dozens more, but they're all
> pretty similar and I don't want to bore you.
>
>
>     __slots__ = ('key', 'value', 'prev', 'next', 'count')
>
>     __all__ = ["Mode_Estimators", "Location", "mfv", ...]
>
> The "string literal".split() idiom is especially common, especially for
> data tables of strings. Here are some examples:
>
>     NUMBERS = ('zero one two three ... twenty-eight twenty-nine').split()
>
>     _TOKENS = set("indent assign addassign subassign ...".split())
>
>     __all__ = 'loopup loopdown reduce whileloop recursive product'.split()
>
>     for i, colour in enumerate('Black Red Green Yellow Blue Magenta Cyan White'.split()):
>
>     for methodname in 'pow add sub mul truediv'.split():
>
>     attrs = "__doc__  __version__  __date__  __author__  __all__".split()
>
>     names = 'meta private dunder ignorecase invert'.split()
>
>     unsorted = "The quick brown Fox jumps over the lazy Dog".split()
>
>     blocks = chaff.pad('flee to south'.split(), key='george')
>
>     minmax('aa bbbb c ddd eeeee f ggggg'.split(), key=len)
>
>
> My estimate is that I would use this "string literal".split() idiom:
>
> - about 60-70% in doctests;
> - about 5-10% in other tests;
> - about 25% in non-test code.
>
>
> Anyone who has had to write out a large, or even not-so-large, list of
> words could benefit from this. Why quote each word individually like a
> drudge, when the compiler could do it for you at compile-time?
>
> Specifically as a convenience for this "list of words" use-case,
> namedtuple splits a single string into words, e.g.
>
>     namedtuple('Parameter', 'name alias default')
>
> I do the same in some of my functions as well, to make it easier to pass
> lists of words.
>
> Similarly, support for keyword arguments in the dict constructor was
> specifically added to ease the case where your keys were single words:
>
>     # {'spam': 1, 'eggs': 2}
>     dict(spam=1, eggs=2)
>
>
> Don't underestimate the annoyance factor of having to write out things
> by hand when the compiler could do it for you. Analogy: we have list
> displays to make it easy to construct a list:
>
>     mylist = [2, 7, -1]
>
> but that's strictly unnecessary, since we could construct it like
> this:
>
>     mylist = list()
>     mylist.append(2)
>     mylist.append(7)
>     mylist.append(-1)
>
> If you think I'm being facetious about the list example, you've probably
> never used standard Pascal, which had arrays but no syntax to initialise
> them except via a sequence of assignments. That wasn't too bad if you
> could put the assignments in a loop, but was painful if the initial
> entries were strings or floats.
>

Yes, I understand that Python has syntactic sugar.  But any new syntactic
sugar necessarily has an uphill battle due to people having to learn it, books
and classes having to be updated, linters updated, new pep8 guidelines
written, etc.  We already have a way to split strings.  So the question is
why we need this in addition to what we already have, especially
considering it is so radically different than anything else in Python.  If
the primary use-case is doctests, then this is something everyone will
have to learn very early on.  It wouldn't be something people could just
ignore if they didn't want to use it, the way they can with, say, the @
matrix multiplication operator. So everyone would have to learn a
completely new way of building
lists, tuples, and sets that only applies to a particular combination of
strings and whitespace.


> > For the example you gave, besides saving a few characters I don't see the
> > advantage over the existing way we have to do that:
> >
> > 'one two three'.split()
>
> One of the reasons why Python is "slow" is that lots of things that can
> be done at compile-time are deferred to run-time. I doubt that splitting
> short strings will often be a bottle-neck, but idioms like this cannot
> help to contribute (even if only a little bit) to the extra work the
> Python interpreter does at run-time:
>
>     load a pre-allocated string constant
>     look up the "split" attribute in the instance (not found)
>     look up the "split" attribute in the class
>     call the descriptor protocol which returns a method
>     call the method
>     build and return a list
>     garbage collect the string constant
>
> versus:
>
>     build and return a list from pre-allocated strings
>
> (Or something like this, I'm not really an expert on the Python
> internals, I just pretend to know what I'm talking about.)
>

Yes, but as far as I am aware Python doesn't typically add new syntax just
to avoid a small performance penalty.  The new syntax should have some real
use-cases that current syntax can't solve.  I am not seeing that here.
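
For reference, the compile-time versus run-time difference being discussed
can be observed along these lines.  This is only a sketch: it relies on
CPython implementation details (constant folding and the `co_consts`
attribute of code objects), and other implementations or versions may
behave differently.

```python
# CPython-specific sketch: inspect which constants the compiler stores.
tuple_code = compile("x = ('one', 'two', 'three')", "<sketch>", "exec")
split_code = compile("x = 'one two three'.split()", "<sketch>", "exec")

# A tuple display made only of constants is stored as one constant.
tuple_is_constant = ('one', 'two', 'three') in tuple_code.co_consts

# The split() version stores only the unsplit string; the list itself
# is built when the code runs.
string_is_constant = 'one two three' in split_code.co_consts
split_result_is_constant = any(
    isinstance(const, (list, tuple)) for const in split_code.co_consts
)

print(tuple_is_constant, string_is_constant, split_result_is_constant)
```

So a tuple of quoted words is already handled at compile-time today; it is
only the split() spelling that defers work to run-time.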


> > Python usually uses [ ] for list creation or indexing.  Co-opting it for a
> > substantially different purpose of string processing like this doesn't
> > strike me as a good idea, especially since we have two string identifiers
> > already, ' and ".
>
> I'm not sure why you describe this as "string processing". The result
> you get is a list, not a string. This would be pure syntactic sugar for:
>
>     %w[words]  # "words".split()
>     %w{words}  # set("words".split())
>     %w(words)  # tuple("words".split())
>
> except done by the compiler, at compile-time, not runtime.
>
>
The result is a list, but the input is a string.  It is string processing
the same way all the string methods are string processing.
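
To make that concrete, here is a minimal sketch of the three proposed forms
written with ordinary calls.  The helper name `words` and its `kind`
parameter are made up for illustration; they are not an existing function
or part of the proposal.

```python
def words(text, kind=list):
    """Split text on whitespace and build the requested container.

    Hypothetical helper, not an existing function or a proposed API.
    """
    return kind(text.split())


as_list = words('one two three')           # stands in for %w[one two three]
as_set = words('one two three', set)       # stands in for %w{one two three}
as_tuple = words('one two three', tuple)   # stands in for %w(one two three)

print(as_list, sorted(as_set), as_tuple)
```

In every case the input is a string and the work done is splitting it,
which is why I call it string processing.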

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WVGR2BDS3EPUZFVLRYZA3T2RAS3T6OL2/
Code of Conduct: http://python.org/psf/codeofconduct/
