Re: [Python-ideas] Optional parameters without default value

2017-03-09 Thread Terry Reedy

On 3/2/2017 3:03 AM, Serhiy Storchaka wrote:

Function implemented in Python can have optional parameters with default
value. It also can accept arbitrary number of positional and keyword
arguments if use var-positional or var-keyword parameters (*args and
**kwargs).


In other words, Python signature possibilities are already unusually 
complex.



But there is no way to declare an optional parameter that
don't have default value.

... [moving the following up]
> I propose to add a new syntax for optional parameters. If the argument
> corresponding to the optional parameter without default value is not
> specified, the parameter takes no value.

-1

Being able to do this would violate what I believe is the fundamental 
precondition for python-coded function bodies: all parameters are bound 
to an object (so that using a parameter name is never a NameError); all 
arguments are used exactly once in the binding process; the binding is 
done without ambiguity (or resort to disambiguation rules).  Calls that 
prevent establishment of this precondition result in an exception.


This precondition is normal in computing languages.  I believe that all 
of the ~20 languages I have used over decades have had it.  In any case, 
I believe it is important in understanding Python signatures and calls, 
and that there would need to be a strong reason to alter this 
precondition.  (Stronger than I judge the one given here to be.)


> Currently you need to use the sentinel idiom

which binds a special object to a parameter, thus fulfilling the 
precondition.



for implementing this:

_sentinel = object()
def get(store, key, default=_sentinel):
if store.exists(key):
return store.retrieve(key)
if default is _sentinel:
raise LookupError
else:
return default

There are drawback of this:

* Module's namespace is polluted with sentinel's variables.


If one cares, one can change the internal reference to 'get._sentinel' 
and add

  get._sentinel = _sentinel; del _sentinel
after the def (or package this in a decorator.


* You need to check for the sentinel before passing it to other function
by accident.


This might be a feature.


* Possible name conflicts between sentinels for different functions of
the same module.


Since None can be used as a sentinel for multiple functions, I don't 
understand the problem you are pointing to.



* Since the sentinel is accessible outside of the function, it possible
to pass it to the function.


1. Give it a more private name (___private___?) similar to a reserved name.
2. Hide it better (as a object attribute, for instance).


* help() of the function shows reprs of default values. "foo(bar=)" looks ugly.


Someone suggested a subclass of object with str = repr that prints 
something like 'bar = '.  I think functools would be the 
appropriate place for the class, predefined instance, and possibly a 
decorator.  Make the instance an attribute of the class so it a) would 
have the same name both in the header and body, and b) would not be an 
attribute of user module or user functions.


__
Terry Jan Reedy


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Tim Peters
[Tim Delaney ]
> Isn't there already always a scan of the iterable to build the keys array
> for sorting (even if no key keyword param is specified)?

No - `.sort()` is a list method, and as such has nothing to do with
arbitrary iterables, just lists (perhaps you're thinking of the
`sorted()` function?).  If no `key=` argument is passed, the list guts
itself is used (as-is) as the vector of keys.


> In which case adding the homogenity check there seems like it shouldn't
> add much overhead at all (I have to say that I was surprised with 10+%
> reductions in speed in some of the heterogenous TimSort tests for this 
> reason).

Those are worst cases where the current sort does very few compares
(like just N-1 for a list of length N).  Because they do "amazingly"
few compares, they're already "amazingly" fast.  And they also do
little data movement (e.g., none at all for /sort & =sort, and N//2
pointer swaps for \sort).  Because of that any new O(N) overhead would
make them significantly slower - unless the new overhead pays off by
allowing a larger time saving than it costs.(as it does when the list
is same-type).

There is a "natural" place to insert "same type?" checks:  the outer
loop of the sort marches over the vector once, left to right,
alternately identifying the next natural run, then possibly extending
it and/or merging it into previous runs.  The checks could be put
there instead, but the code would be ugly and more invasive - I
wouldn't even bother trying it.


> And could specific richcompares be refactored so there was a "we really know
> what the types are is, no need to check" version available to sort() (with
> the typechecking version available for general use/unoptimised sorting)?

They're not _just_ type-aware.  For example, the special function for
ints is specialized to integers that happen to fit in one (internal)
"digit", and the special function for strings is specialized to those
that happen to be stored in PyUnicode_1BYTE_KIND format.  Adding such
stuff to the public C API would be ... well, for a start, tedious ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Submitted a PR!

2017-03-09 Thread Elliot Gorokhovsky
There is an "apple to apple" compare function. It's unsafe_object_compare.
If you look at the pre-sort check, you will find that if the list is
homogeneous but not of float, int, string, or tuple, it sets
compare_funcs.key_richcompare = ob_type->tp_richcompare and sets
compare_funcs.key_compare = unsafe_object_compare. The latter is a wrapper
for the former, bypassing the type checks in PyObject_RichCompareBool.
Examples of benchmarks that use this functionality include the non-latin
string and unbounded int benchmarks.

In the table at the bottom of my patch description, it's described as
follows:

Compare function for general homogeneous lists; just a wrapper for
ob_type->tp_richcompare, which is stored by the pre-sort check at
compare_funcs.key_richcompare. This yields modest optimization
(neighbourhood of 10%), but we generally hope we can do better.

Further, in the code, the comments describe it as follows:

/* Homogeneous compare: safe for any two compareable objects of the same
type.
* (compare_funcs.key_richcompare is set to ob_type->tp_richcompare in the
* pre-sort check.)
*/

Does that answer your question?



On Thu, Mar 9, 2017 at 6:18 PM Erik  wrote:

> Hi.
>
> I may be way off-base here, but having scanned the patch I'm not sure I
> agree that it's the right way forward.
>
> What seems to be happening is that the homogeneity of the list is
> determined somehow (whether tracked with a hint or scanned just-in-time)
> and then a specific comparison function for a known subset of built-in
> types is selected if appropriate.
>
> I had assumed that there would be an "apples-to-apples" comparison
> function in the type structure and that the patch was simply tracking
> the list's homogeneity in order to enter a (generic) alternative loop to
> call that function over PyObject_RichCompare().
>
> Why is that not the case? When a new C-level type is introduced (either
> a built-in or an extension module), why does the list object's code need
> to know about it in order to perform this optimisation?
>
> Why is there not a "tp_apple2apple" slot in the type structure which
> higher level functions (including the RichCompare() stuff - the first
> thing that function does is check the type of the objects anyway) can
> call if it determines that the two objects have the same type?
>
> Such a slot would also speed up "contains", "count", etc (for all
> classes) with no extra work, and no overhead of tracking or scanning the
> sequence's homogeneity.
>
> E.
>
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Submitted a PR!

2017-03-09 Thread Erik

Hi.

I may be way off-base here, but having scanned the patch I'm not sure I 
agree that it's the right way forward.


What seems to be happening is that the homogeneity of the list is 
determined somehow (whether tracked with a hint or scanned just-in-time) 
and then a specific comparison function for a known subset of built-in 
types is selected if appropriate.


I had assumed that there would be an "apples-to-apples" comparison 
function in the type structure and that the patch was simply tracking 
the list's homogeneity in order to enter a (generic) alternative loop to 
call that function over PyObject_RichCompare().


Why is that not the case? When a new C-level type is introduced (either 
a built-in or an extension module), why does the list object's code need 
to know about it in order to perform this optimisation?


Why is there not a "tp_apple2apple" slot in the type structure which 
higher level functions (including the RichCompare() stuff - the first 
thing that function does is check the type of the objects anyway) can 
call if it determines that the two objects have the same type?


Such a slot would also speed up "contains", "count", etc (for all 
classes) with no extra work, and no overhead of tracking or scanning the 
sequence's homogeneity.


E.

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] dict(default=int)

2017-03-09 Thread Chris Angelico
On Fri, Mar 10, 2017 at 11:29 AM, Erik  wrote:
> On 09/03/17 23:04, Spencer Brown wrote:
>>
>> Might make more sense to be dict.default(int), that way it doesn't
>> have redundant dict names.
>
>
> I thought that, too.
>
>> since you could do {1:2, 3:4}.default(int)
>
>
> Could you?
>
> Python 3.6.0 (default, Mar  9 2017, 00:43:06)
> [GCC 5.4.0 20160609] on linux
> Type "help", "copyright", "credits" or "license" for more information.
 type(dict())
> 
 type({})
> 
 type(dict)
> 
>
> The thing bound to the name 'dict' is not the same as the object returned by
> _calling_ 'dict'.

Yes, you could; it'd be a classmethod, like dict.fromkeys. IMO it
should just ignore any instance argument - same as you see here:

>>> dict.fromkeys(range(3))
{0: None, 1: None, 2: None}
>>> {1:2,3:4}.fromkeys(range(3))
{0: None, 1: None, 2: None}

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] dict(default=int)

2017-03-09 Thread Erik

On 09/03/17 23:04, Spencer Brown wrote:

Might make more sense to be dict.default(int), that way it doesn't
have redundant dict names.


I thought that, too.


since you could do {1:2, 3:4}.default(int)


Could you?

Python 3.6.0 (default, Mar  9 2017, 00:43:06)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> type(dict())

>>> type({})

>>> type(dict)


The thing bound to the name 'dict' is not the same as the object 
returned by _calling_ 'dict'.


E.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Chris Barker
On Thu, Mar 9, 2017 at 2:04 AM, Barry Scott  wrote:

> It was not clear to me that you need to scan the list at the start to make
> sure its homogeneous. Given that the type check is so cheap will it
> slow the sort if you do the pointer check in the compare code? I am not
> suggesting you run rich compare full fat on each compare.
>

I think you have a point here.

IIUC, the current code makes no assumptions about type homogeneity. So it
must do something generic at each compare. Perhaps that generic thing (of
course, Elliot knows that is) does do a pointer compare, and then something
smart and fast for some built-ins. but there is still a few steps:

these are both the same type
what type are they
how do I compare them?
do the compare

These may well all be fast, but it's still a few steps, and have to be done
O(n*log(n)) times

IIUC, Elliot's patch does something like:

First go through the whole list and do:
  What is the type of the first item (only once)
  How do I compare that type (only once)
  Is everything else in the list the same type ( O(n) )

Then the actual sort:
 - do the compare ( O(n*log(n)) )

So there is one operation done n times, and one done n*log(n) times, rather
than 4 done n*log(n) times, if the constants are about the same, that saves
3n*log(n) - n operations, or -- if n is non-trivial, then n*3*log(n)
operations

so we go from 4*n*log(n) to n*log(n) about a 4 times speed up -- in theory.
with all the other complications of computer performance, only profiling
will tell you! (also, it's not 4 times on the whole sort -- just the
compare part -- you still need to shuffle the values around, which
presumably takes at last as long as each of the above operations)

(not sure about all four of those steps, it may be only three -- but still
a 3 time speed up)

Now Barry's idea:

Assume that the list is homogenous, and crap out if it turns out it's not:

Start off:
  What is the type of the first item (only once)
  How do I compare that type (only once)
Do the sort:
  Is the next item the correct type? O(n*log(n))
  Do the compare ( O(n*log(n)) )

So we now have 2 operations to be run O(n*log(n)) -- so not as good at only
one, but still better than the 3 or 4 of the fully general sort. And this
would have less impact on the non-homogenous case.

Though Elliot points out that this would be much harder to branch-predict,
etc... so maybe not.

Maybe worth trying out and profiling??

NOTE: I really have no idea what really goes in in that code -- so I may
have this all wrong...

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] dict(default=int)

2017-03-09 Thread Spencer Brown

 Might make more sense to be dict.default(int), that way it doesn't have 
redundant dict names. Only problem is then it might be a bit confusing, since 
you could do {1:2, 3:4}.default(int) and not get the values back. Maybe 
'withdefault', and return a copy if called on the instance?
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] dict(default=int)

2017-03-09 Thread Chris Barker
>If we really want to make defaultdict feel more "builtin" (and I don't see
> >any reason to do so), I'd suggest adding a factory function:
> >
> >dict.defaultdict(int)
>


> Nice.
>

I agree -- what about:

dict.sorteddict() ??

make easy access to various built-in dict variations...

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Tim Delaney
On 9 March 2017 at 21:04, Barry Scott  wrote:

> It was not clear to me that you need to scan the list at the start to make
> sure its homogeneous. Given that the type check is so cheap will it
> slow the sort if you do the pointer check in the compare code? I am not
> suggesting you run rich compare full fat on each compare.
>

Isn't there already always a scan of the iterable to build the keys array
for sorting (even if no key keyword param is specified)? In which case
adding the homogenity check there seems like it shouldn't add much overhead
at all (I have to say that I was surprised with 10+% reductions in speed in
some of the heterogenous TimSort tests for this reason).

And could specific richcompares be refactored so there was a "we really
know what the types are is, no need to check" version available to sort()
(with the typechecking version available for general use/unoptimised
sorting)?

Tim Delaney
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] dict(default=int)

2017-03-09 Thread Barry Warsaw
On Mar 08, 2017, at 05:49 PM, Eric V. Smith wrote:

>If we really want to make defaultdict feel more "builtin" (and I don't see
>any reason to do so), I'd suggest adding a factory function:
>
>dict.defaultdict(int)
>
>Similar in spirit to dict.fromkeys(), except of course returning a
>defauldict, not a dict.

Nice.

-Barry


pgpK0qyKJMncc.pgp
Description: OpenPGP digital signature
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Submitted a PR!

2017-03-09 Thread Elliot Gorokhovsky
Just submitted a PR implementing this:
https://github.com/python/cpython/pull/582
-- just need someone to review it now :)

Thanks for all your feedback, everyone!

On Sun, Mar 5, 2017 at 12:19 AM Elliot Gorokhovsky <
elliot.gorokhov...@gmail.com> wrote:

> (Summary of results: my patch at https://bugs.python.org/issue28685 makes
> list.sort() 30-50% faster in common cases, and at most 1.5% slower in the
> uncommon worst case.)
>
> Hello all,
>
> You may remember seeing some messages on here about optimizing list.sort()
> by exploiting type-homogeneity: since comparing apples and oranges is
> uncommon (though possible, i.e. float to int), it pays off to check if the
> list is type-homogeneous (as well as homogeneous with respect to some other
> properties), and, if it is, to replace calls to PyObject_RichCompareBool
> with calls to ob_type->tp_richcompare (or, in common special cases, to
> optimized compare functions). The entire patch is less than 250 lines of
> code, most of which is pretty boilerplate (i.e. a lot of assertions in
> #ifdef Py_DEBUG blocks, etc).
>
> I originally wrote that patch back in November. I've learned a lot since
> then, both about CPython and about mailing list etiquette :). Previous
> discussion about this can be found at
> https://mail.python.org/pipermail/python-dev/2016-October/146648.html and
> https://mail.python.org/pipermail/python-ideas/2016-October/042807.html.
>
> Anyway, I recently redid the benchmarks much more rigorously (in
> preparation for presenting this project at my school's science fair),
> achieving a standard deviation of less than 0.5% of the mean for all
> measurements. The exact benchmark script used can be found at
> https://github.com/embg/python-fastsort-benchmark (it's just sorting
> random lists of/lists of tuples of [type]. While listsort.txt talks about
> benchmarking different kinds of structured lists, instead of just random
> lists, the results here would hold in those cases just as well, because
> this makes individual comparisons cheaper, instead of reducing the number
> of comparisons based on structure).
>
> I also made a poster describing the optimization and including a pretty
> graph displaying the benchmark data:
> https://github.com/embg/python-fastsort-benchmark/blob/master/poster.pdf.
> For those who would rather read the results here (though it is a *really*
> pretty graph):
>
> ***
> Percent improvement for sorting random lists of [type]
> (1-patched/unpatched):
> float: 48%
> bounded int (magnitude smaller than 2^32): 48.4%
> latin string (all characters in [0,255]): 32.7%
> general int (reasonably uncommon?): 17.2%
> general string (reasonably uncommon?): 9.2%
> tuples of float: 63.2%
> tuples of bounded int: 64.8%
> tuples of latin string: 55.8%
> tuples of general int: 50.3%
> tuples of general string: 44.1%
> tuples of heterogeneous: 41.5%
> heterogeneous (lots of float with an int at the end; worst-case): -1.5%
> ***
>
> Essentially, it's a gamble where the payoff is 20-30 times greater than
> the cost, and the odds of losing are very small. Sorting is perhaps not a
> bottleneck in most applications, but considering how much work has gone
> into Python's sort (Timsort, etc; half of listobject.c is sort code), I
> think it's interesting that list.sort() can be made essentially twice
> faster by a relatively simple optimization. I would also add that Python
> dictionaries already implement this optimization: they start out optimizing
> based on the assumption that they'll only be seeing string keys, checking
> to make sure that assumption holds as they go. If they see a non-string
> key, they permanently switch over to the general implementation. So it's
> really the same idea, except here it doesn't matter as much what type we're
> dealing with, which is important, because lists are commonly used with lots
> of different types, as opposed to dictionaries, which overwhelmingly
> commonly use string keys, especially internally. (Correct me if I'm wrong
> in any of the above).
>
> I got a lot of great feedback/input on this patch as I was writing it, but
> after submitting it, I didn't hear much from anybody. (The reason I took so
> long to post was because I wanted to wait until I had the chance to do the
> benchmarks *right*). What do you all think?
>
> Thanks,
> Elliot
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Elliot Gorokhovsky
On Thu, Mar 9, 2017 at 3:04 AM Barry Scott  wrote:

>
> So you do O(nlogn)*2 pointer compares with my suggestion it seems? Which
> is smaller the O(n) pointer checks?
>

Not sure what you mean here... pretty sure your inequality is backwards?

Look -- the point is, we already *do* the pointer checks during every
compare. That's the current implementation. I believe my benchmarks are
sufficient to prove that my patch is much faster than the current
implementation. Which makes sense, because we do one type check per object,
as opposed to something like log n per object, and we do them all at once,
which I do believe is faster than doing them across calls.

Now, you're not entirely wrong -- the current implementation doesn't do the
type checks as efficiently as it could. Specifically, it does them once in
PyObject_RichCompareBool, and then *again* in ob_type->tp_richcompare
(which it has to for safety: ob_type->tp_richcompare doesn't know the
arguments have already been type-checked!) You could totally define
optimized compares that do the type-checks more efficiently, and you would
see a benefit, just not nearly as extreme as the benefit I demonstrate.

Thanks again for your feedback!
Elliot
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Barry Scott

> On 8 Mar 2017, at 22:58, Elliot Gorokhovsky  
> wrote:
> 
> On Wed, Mar 8, 2017 at 2:14 PM Barry  > wrote:
> Can you assume that list of of type(list[0]) and use that type's optimised 
> sort?
> But in the optimised sort code check that the types are as required.
> If you hit an element that is not of the required type then fall back to the 
> unoptimised sort.
> 
> Well, how would you tell if you've hit an element that is not of the required 
> type? You'd have to check during every compare, right? And that's precisely 
> what we're trying to avoid!

What it seemed the trick for optimisation is is to compare the type pointer of 
an object to see if its the same as a type supported by the chosen optimised 
sort.

It was not clear to me that you need to scan the list at the start to make sure 
its homogeneous. Given that the type check is so cheap will it
slow the sort if you do the pointer check in the compare code? I am not 
suggesting you run rich compare full fat on each compare.

> The whole point of my patch is that we do O(nlogn) compares, but only have 
> O(n) elements, so it's much cheaper to do all the type checks in advance, and 
> in the very likely case that our list is homogeneous, switch to an optimized 
> special-case compare function.

So you do O(nlogn)*2 pointer compares with my suggestion it seems? Which is 
smaller the O(n) pointer checks?

> 
> Even when we only do O(n) compares, my patch is still much faster (see 
> benchmarks higher up in this thread). Why? Well, if you're doing the type 
> checks during the compares, you're doing them across different function 
> calls, with other stuff interspersed in between. So pipeline/branch 
> prediction/caching is less able to optimize away the overhead of the safety 
> checks (I don't know how CPUs work, but one of those is relevant here). With 
> the pre-sort check, it's all in a single iteration through the list, and 
> we're taking the same exact branch every time; it's much faster. Think 
> iterating over a matrix row-wise versus iterating column-wise (not exactly 
> relevant here since that's about cache, but that's the idea. Again, I don't 
> really know how CPUs work).

Provided the code is small I think both versions of the algorithm will benefit 
from cache and branch prediction.

> 
> So in short: no, we can't just check as we go, because that's what we're 
> already doing. But we can optimize significantly by checking in advance. I 
> mean, practically speaking, the advance check is basically free. The 
> compare-time checks, in sum, are not, both asymptotically and practically.

I not clear this is a true. But I have not read the code.

Barry___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/