Re: [Python-ideas] [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Greg Ewing

Elliot Gorokhovsky wrote:
I will be able to rule that out 
when I implement this as a patch instead of an extension module and test 
my own build.


You could test it against a locally built Python without
having to go that far.

--
Greg


Re: [Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Greg Ewing

Elliot Gorokhovsky wrote:
if the list is all 
floats, just copy all the floats into a separate array, use the standard
library quicksort, and then construct a sorted PyObject* array.


My question would be whether sorting a list of just floats
(or where the keys are just floats) is common enough to
make this worth doing.

--
Greg


Re: [Python-ideas] suppressing exception context when it is not relevant

2016-10-10 Thread Nick Coghlan
On 11 October 2016 at 10:43, Václav Dvořák  wrote:
> But I find this misleading, as the original KeyError is not really an error
> at all. I could of course avoid the situation by changing the try/except
> (EAFP) into a test for the key's presence (LBYL) but that's not very
> Pythonic and less thread-friendly (not that the above is thread-safe as is,
> but that's beside the point). Also, yes, I could instead subclass dict and
> implement __missing__, but that's only a solution for this particular case.
> The problem (if you agree it's a problem) occurs any time an exception is
> not actually an error, but rather a condition that just happens to be
> indicated by an exception.
>
> It's unreasonable to expect all code in some_api to change their raise X to
> raise X from None (and it wouldn't even make sense in all cases). Is there a
> clean solution to avoid the unwanted exception chain in the error message?

Yes, you can restructure the code so you're not doing further work in
the exception handler, and instead do the work after the try/except
block finishes and the exception context is cleared automatically:

value = MISSING = object()
try:
    value = cache_dict[key]
except KeyError:
    pass
if value is MISSING:
    value = some_api.get_the_value_via_web_service_call(key)
    cache_dict[key] = value

(This is the general case of MRAB's answer, as the try/except
KeyError/pass pattern above is what dict.get() implements)
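
Spelled out with dict.get() and an explicit sentinel, the same logic is
just (a sketch using the names from your snippet):

MISSING = object()

value = cache_dict.get(key, MISSING)   # never raises, so there is no context to chain
if value is MISSING:
    value = some_api.get_the_value_via_web_service_call(key)
    cache_dict[key] = value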

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-ideas] [Python-Dev] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Guido van Rossum
On Mon, Oct 10, 2016 at 7:56 PM, Elliot Gorokhovsky
 wrote:
> So here's a simple attempt at taking lots of measurements just using
> time.time() with lists of ints. The results are great, if they are valid
> (which I leave to you to judge); even for lists with just one element, it's
> 16% faster!

But that's suspicious in itself -- since no comparisons are needed to
sort a 1-element list, if it's still faster, there must be something
else you're doing (or not doing) that's affecting the time measured.

I wonder if it's the method lookup that's slower than the entire
call duration? That explains why s[:1] == 'x' is faster than
s.startswith('x'), for example.

A simple nit on your test code: calling time() twice per iteration
could also affect things. I would just call time() once before and
once after the innermost for-loops. (IIRC timeit tries to compensate
for the cost of the loop itself by measuring an empty loop, but that's
got its own set of problems.)
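
Roughly something like this (just a sketch; test_lists stands in for
whatever your innermost loop iterates over):

from time import time   # time.perf_counter() is a better clock, but time() matches your test

start = time()
for lst in test_lists:   # test_lists: placeholder for the benchmark's lists of ints
    lst.sort()
elapsed = time() - start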

Anyway, you should ignore me and listen to Tim, so I'll shut up now.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-ideas] Improve error message when missing 'self' in method definition

2016-10-10 Thread Stephen J. Turnbull
Chris Angelico writes:

 > Given that it's not changing semantics at all, just adding info/hints
 > to an error message, it could well be added in a point release.

But it does change semantics, specifically for doctests.

I seem to recall that that is considered a blocker for this kind of
change in a maintenance-only branch.  In the end that's probably up to
the RM, but I would be mildly against it.  FWIW YMMV of course.




Re: [Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Jonathan Goble
On Mon, Oct 10, 2016 at 11:30 PM Elliot Gorokhovsky
<elliot.gorokhov...@gmail.com> wrote:

> - I expect tuples will also be worth specializing (complex sort keys are
> often implemented as tuples).
>
> I'm not sure what you mean here... I'm looking at the types of lo.keys,
> not of saved_ob_item (I think I said that earlier in this thread by mistake
> actually). So if someone is passing tuples and using itemgetter to extract
> ints or strings or whatever, the current code will work fine; lo.keys will
> be scalar types. Unless I misunderstand you here. I mean, when would
> lo.keys actually be tuples?
>

If someone wanted to sort, e.g., a table (likely a list of tuples) by
multiple columns at once, they might pass the key function as
`itemgetter(3, 4, 5)`, meaning to sort by "column" (actually item) 3, then
columns 4 and then 5 as tiebreakers. This itemgetter will return a new
tuple of three items, that tuple being the key to sort by. Since tuples
sort by the first different item, in this theoretical example the result of
sort() will be exactly what the user wanted: a table sorted by three
columns at once.

A practical example of such a use case is sorting by last name first and
then by first name where two people have the same last name. Assuming a
list of dicts in this case, the key function passed to sort() would simply
be `itemgetter("lastname", "firstname")`, which returns a tuple of two
items to use as the key.

So yes, there are perfectly valid use cases for tuples as keys.
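
For example (made-up data, just to show that the keys the sort sees really
are tuples):

from operator import itemgetter

people = [
    {"lastname": "Smith", "firstname": "Zoe"},
    {"lastname": "Adams", "firstname": "Bob"},
    {"lastname": "Smith", "firstname": "Anna"},
]
# itemgetter("lastname", "firstname") returns a 2-tuple per element,
# so every sort key here is a tuple:
people.sort(key=itemgetter("lastname", "firstname"))
# Result order: Adams/Bob, Smith/Anna, Smith/Zoe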

Re: [Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Elliot Gorokhovsky
Oh no, the idea here is just that you would copy over the floats associated
with the PyObject* and keep them in an array of such structs, so that we know
which PyObject* are associated with which floats. Then, after the standard
library quicksort sorts them, you would copy the PyObject* back into the list.
So you sort the PyObject* keyed by the floats. Anyway, I think the copying
back and forth would probably be too expensive; it's just an idea. Also, I
apologize for the formatting of my last email; I didn't realize Inbox would
mess up the quoting like that. I'll ensure I use plain-text quotes from now
on.
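
In pure-Python terms it's basically decorate-sort-undecorate, just done at
the C level with unboxed doubles and the C library's qsort; a toy sketch
(lst stands in for the list of floats being sorted):

pairs = [(x, float(x)) for x in lst]           # (payload, key): each object next to its unboxed value
pairs.sort(key=lambda pair: pair[1])           # the C version would qsort() the doubles instead
lst[:] = [payload for payload, _key in pairs]  # same objects, new order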

On Mon, Oct 10, 2016 at 9:38 PM Chris Angelico  wrote:

> On Tue, Oct 11, 2016 at 2:29 PM, Elliot Gorokhovsky
>  wrote:
> > Ya, I think this may be a good approach for floats: if the list is all
> > floats, just copy all the floats into a separate array, use the standard
> > library quicksort, and then construct a sorted PyObject* array. Like
> maybe
> > set up a struct { PyObject* payload, float key } type of deal.
>
> Not quite sure what you mean here. What is payload, what is key? Are
> you implying that the original float objects could be destroyed and
> replaced with others of equal value? Python (unlike insurance claims)
> guarantees that you get back the exact same object as you started
> with.
>
> ChrisA

Re: [Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Chris Angelico
On Tue, Oct 11, 2016 at 2:29 PM, Elliot Gorokhovsky
 wrote:
> Ya, I think this may be a good approach for floats: if the list is all
> floats, just copy all the floats into a separate array, use the standard
> library quicksort, and then construct a sorted PyObject* array. Like maybe
> set up a struct { PyObject* payload, float key } type of deal.

Not quite sure what you mean here. What is payload, what is key? Are
you implying that the original float objects could be destroyed and
replaced with others of equal value? Python (unlike insurance claims)
guarantees that you get back the exact same object as you started
with.

ChrisA


[Python-ideas] Optimizing list.sort() by checking type in advance

2016-10-10 Thread Elliot Gorokhovsky
Thanks for looking at this!

> That's why I spent months of my life (overall) devising a sequence of
> sorting algorithms for Python that reduced the number of comparisons needed.


Yes, that's why I think this is so cool: for a couple dozen lines of code,
we can get (at least for some cases, according to my questionable
benchmarks) the kinds of massive improvements you had to use actual
computer science to achieve (as opposed to mere hackery).


> Note that when Python's current sort was adopted in Java, they still kept a
> quicksort variant for "unboxed" builtin types.  The adaptive merge sort
> incurs many overheads that often cost more than they save unless
> comparisons are in fact very expensive compared to the cost of pointer
> copying (and in Java comparison of unboxed types is cheap).  Indeed, for
> native numeric types, where comparison is dirt cheap, quicksort generally
> runs faster than mergesort despite that the former does _more_ comparisons
> (because mergesort does so much more pointer-copying).


Ya, I think this may be a good approach for floats: if the list is all
floats, just copy all the floats into a separate array, use the standard
library quicksort, and then construct a sorted PyObject* array. Like maybe
set up a struct { PyObject* payload, float key } type of deal. This
wouldn't work for strings (unicode is scary), and probably not for ints
(one would have to check that all the ints are within C long bounds).
Though on the other hand perhaps this would be too expensive?

> I had considered something "like this" for Python 2, but didn't pursue it
> because comparison was defined between virtually any two types (34 < [1],
> etc), and people were careless about that (both by design and by
> accident).  In Python 3, comparison "blows up" for absurdly mixed types, so
> specializing for homogeneously-typed lists is a more promising idea on the
> face of it.
>
> The comparisons needed to determine _whether_ a list's objects have a
> common type is just len(list)-1 C-level pointer comparisons, and so goes
> fast.  So I expect that, when it applies, this would speed even sorting an
> already-ordered list with at least 2 elements.

That's what my crude benchmarks indicate... when I applied my sort to a
list of 1e7 ints with a float tacked on the end, my sort actually ended up
being a bit faster over several trials (which I attribute to
PyObject_RichCompare == Py_True being faster than PyObject_RichCompareBool
== 1, apologies for any typos in that code).
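
(For reference, in pure Python the pre-scan you describe is just the
following; in the C version each type check is a single ob_type pointer
comparison against the first element's type:)

def common_type(lst):
    """Return the type shared by every element, or None if mixed/empty."""
    if not lst:
        return None
    t = type(lst[0])
    for x in lst[1:]:           # len(lst) - 1 cheap checks
        if type(x) is not t:    # mixed types: fall back to the generic path
            return None
    return t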


> For a mixed-type list with at least 2 elements, it will always be pure
> loss.  But (a) I expect such lists are uncommon (and especially uncommon in
> Python 3); and (b) a one-time scan doing C-level pointer comparisons until
> finding a mismatched type is bound to be a relatively tiny cost compared to
> the expense of all the "rich comparisons" that follow.
>
> So +1 from me on pursuing this.
>
> Elliot, please:
>
> - Keep this on python-ideas.  python-dev is for current issues in Python
> development, not for speculating about changes.
>
> - Open an issue on the tracker:  https://bugs.python.org/


OK


> - At least browse the info for developers:
> https://docs.python.org/devguide/


Ya, I'm working on setting this up as a patch in the hg repo as opposed to
an extension module to make benchmarking cleaner/more sane.


> - Don't overlook Lib/test/sortperf.py.  As is, it should be a good test of
> what your approach so far _doesn't_ help, since it sorts only lists of
> floats (& I don't think you're special-casing them).  If the timing results
> it reports aren't significantly hurt (and I expect they won't be), then add
> specialization for floats too and gloat about the speedup :-)

Ya, I mean they aren't special-cased, but homogeneous lists of floats still
fit in the tp->rich_compare case, which still bypasses the expensive
PyObject_RichCompare. I guess I'll see when I implement this as a patch
and can run it on sortperf.py.


> - I expect tuples will also be worth specializing (complex sort keys are
> often implemented as tuples).


I'm not sure what you mean here... I'm looking at the types of lo.keys, not
of saved_ob_item (I think I said that earlier in this thread by mistake
actually). So if someone is passing tuples and using itemgetter to extract
ints or strings or whatever, the current code will work fine; lo.keys will
be scalar types. Unless I misunderstand you here. I mean, when would
lo.keys actually be tuples?


> Nice start! :-)

Thanks!

[Python-ideas] suppressing exception context when it is not relevant

2016-10-10 Thread Václav Dvořák
I'm aware of "raise ... from None" (from PEP 415). However, how can I
achieve that same effect (of suppressing the "During handling of the above
exception, another exception occurred" message) without having control over
the code that is executed from the except clause? I thought that
sys.exc_clear() could be used for this, but that function doesn't exist in
Python 3 anymore.

Why would I want this? I have some simple caching code that looks like
(simplified):

try:
    value = cache_dict[key]
except KeyError:
    value = some_api.get_the_value_via_web_service_call(key)
    cache_dict[key] = value


When there's an exception in the API call, the output will be something
like this:

Traceback (most recent call last):
  File ..., line ..., in ...
KeyError: '...'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ..., line ..., in ...
some_api.TheInterestingException: ...


But I find this misleading, as the original KeyError is not really an error
at all. I could of course avoid the situation by changing the try/except
(EAFP) into a test for the key's presence (LBYL) but that's not very
Pythonic and less thread-friendly (not that the above is thread-safe as is,
but that's beside the point). Also, yes, I could instead subclass dict and
implement __missing__, but that's only a solution for this particular case.
The problem (if you agree it's a problem) occurs any time an exception is
not actually an error, but rather a condition that just happens to be
indicated by an exception.

It's unreasonable to expect all code in some_api to change their
raise X to raise X from None (and it wouldn't even make sense in all
cases). Is there a clean solution to avoid the unwanted exception
chain in the error message?

If not, would it make sense to re-introduce sys.exc_clear() for this
purpose?

(I originally asked about this here:
http://stackoverflow.com/questions/30235516/how-to-suppress-displaying-the-parent-exception-the-cause-for-subsequent-excep
but find the answer unappealing.)

Vashek

Re: [Python-ideas] Flagging blocking functions not to be used with asyncio

2016-10-10 Thread Nathaniel Smith
On Mon, Oct 10, 2016 at 2:59 AM, Martin Teichmann
 wrote:
> This is why I got my idea to flag such calls. Unfortunately, I
> realized that it is nearly impossible to tell whether a read call is
> blocking or not. We would need to know whether the file descriptor we
> read from was created as non-blocking, or whether it was an actual
> file, and how fast the file storage is for this file (SSD: maybe fine,
> Network: too slow, magnetic disk: dunno). All of this is unfortunately
> not a Python issue, but an issue for the underlying operating system.

Yeah, it really doesn't help that a synchronous network query to a
remote SSD-backed database can easily be lower latency than a
synchronous local disk read to spinning media, yet the fact that we
have async network APIs but no standard async disk APIs means that we
would inevitably find ourselves warning about the former case while
letting the latter one pass silently...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Flagging blocking functions not to be used with asyncio

2016-10-10 Thread Chris Angelico
On Mon, Oct 10, 2016 at 8:59 PM, Martin Teichmann
 wrote:
> We would need to know whether the file descriptor we
> read from was created as non-blocking, or whether it was an actual
> file, and how fast the file storage is for this file (SSD: maybe fine,
> Network: too slow, magnetic disk: dunno). All of this is unfortunately
> not a Python issue, but an issue for the underlying operating system.

Probably not worth trying to categorize those reads by source.
However, one important feature would be: coming from cache, or
actually waiting for content? With pipes and sockets, this is a very
significant difference, and if you've done a peek() or select() to
find that there is content there, a read() should be perfectly legal,
even in an asyncio world.
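
Something along these lines (just a sketch; sock stands in for whatever
socket you're reading from, and the zero timeout means select() itself
never blocks):

import select

ready, _, _ = select.select([sock], [], [], 0)  # poll: is data already buffered?
if ready:
    data = sock.recv(4096)  # bytes are already there, so this won't block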

ChrisA


Re: [Python-ideas] Flagging blocking functions not to be used with asyncio

2016-10-10 Thread Martin Teichmann
Hi,

> Honestly before writing a lot of code here I'd like to hear more from
> Martin about the spread of mistakes he's observed among his users.

Over the weekend, I tried to classify the mistakes I found. Most of
the time, it's something like "I'm just doing a quick lookup on the
database, that shouldn't be a problem". For people coming from a
threading background, these are indeed fast operations, so they don't
consider such calls blocking. In the end, it all boils down to some
read operation down in some non-asyncio code.

This is why I got my idea to flag such calls. Unfortunately, I
realized that it is nearly impossible to tell whether a read call is
blocking or not. We would need to know whether the file descriptor we
read from was created as non-blocking, or whether it was an actual
file, and how fast the file storage is for this file (SSD: maybe fine,
Network: too slow, magnetic disk: dunno). All of this is unfortunately
not a Python issue, but an issue for the underlying operating system.

So I guess I have to tell my users to program carefully and think
about what they're reading from. No automatic detection of problems
seems to be possible, at least not easily.

Greetings

Martin