Re: [Python-Dev] PEP 471 (scandir): Add a new DirEntry.inode() method?

2015-02-14 Thread Marko Rauhamaa
Antoine Pitrou solip...@pitrou.net:

 The whole point of scandir is to expose low-level system calls in a
 cross-platform way.

Cross-platform is great and preferable, but low-level system facilities
should be made available even when they are unique to a particular OS.
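
For reference, the method under discussion is available as DirEntry.inode() in Python 3.5 and later. A minimal sketch of its use follows; the helper name list_inodes is made up for illustration:

```python
import os


def list_inodes(path="."):
    """Map each entry name in `path` to the inode number reported by scandir."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}
```

On POSIX systems the value normally comes straight from the directory entry, so no extra stat() call is needed per file.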


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 2.x and 3.x use survey, 2014 edition

2014-12-16 Thread Marko Rauhamaa
Mark Roberts wiz...@gmail.com:

 it's outright insulting to be told my complaints about writing 2/3
 compatible code are invalid on the basis of premature optimization.

IMO, you should consider forking your library code for Python2 and
Python3. The multidialect code will look unidiomatic for each dialect.
When the critical mass favors Python3 (possibly within a couple of
years), the transition will be as total and quick as from VHS to DVDs.
At that point, a multidialect library would seem quaint, while a
separate Python2 fork can simply be left behind (bug fixes only).


Marko


Re: [Python-Dev] Python 2.x and 3.x use survey, 2014 edition

2014-12-16 Thread Marko Rauhamaa
Brian Curtin br...@python.org:

 I'm a few inches shorter than Brett, but having done several sizable
 ports, dual-source has never even been on the table. I would prefer the
 "run 2to3 at installation time" option before maintaining two versions
 (which I do not prefer at all in reality).

How about "run 3to2 at installation time"?


Marko


Re: [Python-Dev] datetime nanosecond support (ctd?)

2014-12-11 Thread Marko Rauhamaa
Ethan Furman et...@stoneleaf.us:

 On 12/11/2014 11:14 AM, Guido van Rossum wrote:
 (I wouldn't be surprised if there wasn't -- while computer clocks
 have a precision in nanoseconds, that doesn't mean they are that
 *accurate* at all (even with ntpd running).

 The real-world use cases deal with getting this information from other
 devices (network cards, GPS, particle accelerators, etc.), so it's not
 really a matter of cross-computer accuracy, but relative accuracy
 (i.e. how long did something take?).

It would be nice if it were possible to deal with high-precision epoch
times and time deltas without special tricks. I have had to deal with
femtosecond-precision IRL (albeit in a realtime C application, not in
Python). Quad-precision floats (URL:
http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format)
would do it for Python:

 * just do it in seconds

 * have enough precision for any needs

 * have enough range for any needs
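
The shortfall of ordinary double-precision floats can be checked by looking at the spacing between adjacent representable values near a current epoch time. This is a minimal sketch; the sample timestamp is an arbitrary assumption, math.ulp needs Python 3.9 or later, and time.time_ns (integer nanoseconds, Python 3.7 or later) is shown only as one alternative to a wider float:

```python
import math
import time

# An epoch time from roughly December 2014, stored as a binary64 float.
epoch_seconds = 1418325240.0

# Distance to the next representable double: about 0.24 microseconds here,
# so individual nanoseconds cannot be told apart at this magnitude.
resolution = math.ulp(epoch_seconds)

# Integer nanoseconds since the epoch do not suffer from that rounding.
now_ns = time.time_ns()
```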


Marko


Re: [Python-Dev] Critical bash vulnerability CVE-2014-6271 may affect Python on *n*x and OSX

2014-09-25 Thread Marko Rauhamaa
Steven D'Aprano st...@pearwood.info:

 Perhaps I'm missing something, but aren't there easier ways to attack 
 os.system than the bash env vulnerability?

The main concern is the cases where you provide a service accessible
through an SSH login and try to sandbox the client with limited
functionality. SSH passes some environment variables on to the service
which can then be used as an attack vector.

For example, if you wrote an SVN server's SSH front end with Python and
used subprocess.Popen(shell=True) to execute the SVN operations, you
could become a victim.
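
One way to reduce the exposure is to avoid the shell and to pass only an explicit set of environment variables to the child. The following is a sketch under assumptions, not a complete fix: the helper name run_without_shell and the default whitelist are hypothetical, and the value of any variable that is passed on may still be client-controlled.

```python
import os
import subprocess


def run_without_shell(argv, allowed=("PATH",)):
    """Run argv directly (no shell) with a whitelisted environment."""
    # Only variables named in `allowed` reach the child process.
    env = {name: os.environ[name] for name in allowed if name in os.environ}
    return subprocess.run(
        argv,
        shell=False,          # argv is executed directly, not parsed by a shell
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
```

With shell=False no shell interprets the command line, and variables outside the whitelist never reach the child.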


Marko


Re: [Python-Dev] PEP 394 - Clarification of what python command should invoke

2014-09-19 Thread Marko Rauhamaa
Donald Stufft don...@stufft.io:

 My biggest problem with ``python3``, is what happens after 3.9. I know
 Guido doesn’t particularly like two digit version numbers and it’s
 been suggested on this list that instead of 3.10 we’re likely to move
 directly into 4.0 regardless of if it’s a “big” change or not.

python3 should be the name of the programming language specification. If
CPython-4.0 supports all of the python3 programming language,
/usr/bin/python3 should be a symbolic link to CPython-4.0. Or, if I
reimplemented python3 with cyphon0.7, python3 could be a link to that.

If CPython-4.x dropped support for some python3 features, you would no
longer link python3 to it.

Now, what would plain python be? It would make life easier for a lot
of people if it were python2 for all eternity.

By analogy, look at #!/bin/sh, which used to invoke the Bourne shell,
later /bin/bash and on modern Debian, /bin/dash.


Marko


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-09-01 Thread Marko Rauhamaa
Victor Stinner victor.stin...@gmail.com:

 No, it's the opposite. The PEP doesn't change the default behaviour of
 SIGINT: CTRL+C always interrupts the program.

Which raises an interesting question: what happens to the os.read()
return value if SIGINT is received?


Marko


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-09-01 Thread Marko Rauhamaa
Charles-François Natali cf.nat...@gmail.com:

 Which raises an interesting question: what happens to the os.read()
 return value if SIGINT is received?

 There's no return value, a KeyboardInterrupt exception is raised.
 The PEP wouldn't change this behavior.

Slightly disconcerting... but I'm sure overriding SIGINT would cure
that. You don't want to lose data if you want to continue running.

 As for the general behavior: all programming languages/platforms
 handle EINTR transparently.

C doesn't. EINTR is there for a purpose. I sure hope Python won't bury
it under opaque APIs.

The two requirements are:

 * Allow the application to react to signals immediately in the main
   flow.

 * Don't lose information.
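
Overriding SIGINT as suggested above might look like the sketch below. The names InterruptFlag and read_until_interrupted are hypothetical. Note that under PEP 475 as adopted in Python 3.5, a blocked os.read() is retried once such a non-raising handler returns, so the flag is only examined after the next read completes:

```python
import os
import signal


class InterruptFlag:
    """Signal handler that records the signal instead of raising."""

    def __init__(self):
        self.seen = False

    def __call__(self, signum, frame):
        self.seen = True


def read_until_interrupted(fd, flag, chunk_size=4096):
    """Collect data from fd until EOF or until flag.seen becomes true."""
    chunks = []
    while not flag.seen:
        chunk = os.read(fd, chunk_size)
        if not chunk:          # EOF
            break
        chunks.append(chunk)
    # Everything read so far is returned; nothing is lost to an exception.
    return b"".join(chunks)
```

The handler would be installed from the main thread with signal.signal(signal.SIGINT, flag).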


Marko


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-09-01 Thread Marko Rauhamaa
R. David Murray rdmur...@bitdance.com:

 Windows.  Enough said?
 [...]
 This should tell you just about everything you need to know about why
 we want to fix this problem so that things work cross platform.

I feel your pain. Well, not really; I just don't want my linux bliss to
be taken away.


Marko


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-09-01 Thread Marko Rauhamaa
R. David Murray rdmur...@bitdance.com:

 On Mon, 01 Sep 2014 14:15:52 +0300, Marko Rauhamaa ma...@pacujo.net wrote:
  * Allow the application to react to signals immediately in the main
    flow.

 You don't want to be writing your code in Python then. In Python you
 *never* get to react immediately to signals. The interpreter sets a
 flag and calls the python signal handler later. Yes, the call is ASAP,
 but ASAP is *not* immediately.

You don't have to get that philosophical.

"Immediately" means without delay, without further I/O.


Marko


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-08-31 Thread Marko Rauhamaa
Victor Stinner victor.stin...@gmail.com:

 Proposition
 ===

 If a system call fails with ``EINTR``, Python must call signal
 handlers: call ``PyErr_CheckSignals()``. If a signal handler raises
 an exception, the Python function fails with the exception.
 Otherwise, the system call is retried.  If the system call takes a
 timeout parameter, the timeout is recomputed.

Signals are tricky and easy to get wrong, to be sure, but I think it is
dangerous for Python to unconditionally commandeer signal handling. If
the proposition is accepted, there should be a way to opt out.


Marko


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-08-31 Thread Marko Rauhamaa
Victor Stinner victor.stin...@gmail.com:

 Sorry but I don't understand your remark. What is your problem with
 retrying syscall on EINTR?

The application will often want the EINTR return (exception) instead of
having the function resume on its own.

 Can you please elaborate? What do you mean by "get wrong"?

Proper handling of signals is difficult and at times even impossible.
For example it is impossible to wake up reliably from the select(2)
system call when a signal is generated (which is why linux now has
pselect).


Marko


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-08-31 Thread Marko Rauhamaa
Victor Stinner victor.stin...@gmail.com:

 But I don't get your point. How does this PEP make the situation worse?

Did I say it would? I just wanted to make sure the system call
resumption doesn't become mandatory.

Haven't thought through what the exception raising technique would
entail. It might be perfectly ok apart from being a change to the signal
handler API.

 I don't know the issues of signals with select() (and without a file
 descriptor used to wake it up).

A signal handler often sets a flag, which is inspected when select()
returns. The problem is when a signal arrives between testing the flag
and calling select(). The pselect() system call allows you to block
signals and have the system call unblock them correctly to avoid the
race.
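
Python does not expose pselect(), but the same race can be avoided with a wakeup file descriptor: a signal that arrives before select() leaves a byte behind, so the call returns at once. This is a minimal sketch assuming a POSIX system and the main thread; the class name SignalWaker is hypothetical, and a Python-level handler must be installed for the signal in question:

```python
import select
import signal
import socket


class SignalWaker:
    """Self-pipe style wakeup: caught signals make a socket readable."""

    def __init__(self):
        # Both ends must be non-blocking; set_wakeup_fd() insists on it.
        self._reader, self._writer = socket.socketpair()
        self._reader.setblocking(False)
        self._writer.setblocking(False)
        # From now on every caught signal writes its number as one byte.
        self._previous_fd = signal.set_wakeup_fd(self._writer.fileno())

    def wait(self, fds, timeout=None):
        """Return (readable_fds, signal_numbers_seen)."""
        ready, _, _ = select.select([self._reader, *fds], [], [], timeout)
        signums = []
        if self._reader in ready:
            ready.remove(self._reader)
            signums = list(self._reader.recv(4096))
        return ready, signums

    def close(self):
        """Restore the previous wakeup descriptor and release the sockets."""
        signal.set_wakeup_fd(self._previous_fd)
        self._reader.close()
        self._writer.close()
```

Because the byte stays in the socket until it is read, there is no window between testing a flag and entering select() in which a signal can go unnoticed.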


Marko


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-08-31 Thread Marko Rauhamaa
Ethan Furman et...@stoneleaf.us:

 On 08/31/2014 02:19 PM, Marko Rauhamaa wrote:
 The application will often want the EINTR return (exception) instead
 of having the function resume on its own.

 Examples?

 As an ignorant person in this area, I do not know why I would ever
 want to have EINTR raised instead just getting the results of, say, my
 read() call.

Say you are writing data into a file and it takes a long time (because
there is a lot of data or the medium is very slow or there is a hardware
problem). You might have designed in a signaling scheme to address just
this possibility. Then, the system call had better come out right away
without trying to complete the full extent of the call.

If a signal is received when read() or write() has completed its task
partially (> 0 bytes), no EINTR is returned but the partial count.
Obviously, Python should take that possibility into account so that
raising an exception in the signal handler (as mandated by the PEP)
doesn't cause the partial result to be lost on os.read() or os.write().
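
On the caller's side, partial counts are usually handled with a loop that resumes where the previous call stopped. A minimal sketch follows; the helper name write_all is made up for illustration:

```python
import os


def write_all(fd, data):
    """Write all of data to fd, resuming after partial writes."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write() may accept fewer bytes than offered; it returns the count.
        total += os.write(fd, view[total:])
    return total
```

The running count in total is exactly the information that must not be discarded if a signal handler raises between two calls.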


Marko


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-08-31 Thread Marko Rauhamaa
R. David Murray rdmur...@bitdance.com:

 PS: I recently switched from using selectors to using a timeout on a
 socket because in that particular application I could, and because
 reading a socket with a timeout handles EINTR (in recent python
 versions), whereas reading a non-blocking socket doesn't. Under the
 hood, a socket with a timeout is a non-blocking socket.

Under what circumstances would a nonblocking socket generate an EINTR?

I believe the biggest EINTR problem child is file I/O, which is always
blocking in linux.


Marko


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-30 Thread Marko Rauhamaa
Paul Moore p.f.mo...@gmail.com:

 Cool, in which case this sounds like a good plan. I have no particular
 opinion on whether there should be a global Python-level "don't check
 certificates" option, but I would suggest that the docs include a
 section explaining how a user can implement a
 "--no-check-certificates" flag in their program if they want to (with
 appropriate warnings as to the risks, of course!). Better to explain
 how to do it properly than to say "you shouldn't do that" and have
 developers implement awkward or incorrect hacks in spite of the
 advice.

Will there be a way to specify a particular CA certificate (as in wget
--ca-certificate)?

Will there be a way to specify a particular CA certificate directory (as
in wget --ca-directory)?
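
For what it is worth, the ssl module (Python 3.4 and later) accepts both a CA file and a CA directory when building a context. A minimal sketch; the wrapper name make_verifying_context is hypothetical:

```python
import ssl


def make_verifying_context(cafile=None, capath=None):
    """Build a certificate-verifying context.

    cafile: path to a single PEM file of CA certificates.
    capath: path to a directory of hashed CA certificates.
    With neither given, the system default trust store is used.
    """
    return ssl.create_default_context(cafile=cafile, capath=capath)
```

The resulting context can be handed to urllib.request.urlopen() through its context argument.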


Marko


Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
Stephen J. Turnbull step...@xemacs.org:

 Just read as bytes and decode piecewise in one way or another. For
 Oleg's HTML case, there's a well-understood structure that can be used
 to determine retry points

HTML and XML are interesting examples since their encoding is initially
unknown:

  <?xml version="1.0"?>
                     ^
                     +--- Now I know it is UTF-8

  <?xml version="1.0" encoding="UTF-16"?>
                      ^
                      +--- Now I know it was UTF-16
                           all along!

Then we have:


  HTTP/1.1 200 OK
  Content-Type: text/html; charset=ISO-8859-1

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-16">

See how deep you have to parse the TCP stream before you realize the
content encoding is UTF-16.
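
The "read as bytes first, decide the encoding later" step for the XML case can be sketched as follows. This assumes an ASCII-compatible encoding; a document that really is UTF-16 would need byte-order-mark detection first. The function name sniff_xml_encoding is made up for illustration:

```python
import re

# Matches an encoding attribute inside a leading XML declaration.
_XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def sniff_xml_encoding(raw, default="utf-8"):
    """Return the declared encoding of an XML byte string, else the default."""
    match = _XML_DECL.match(raw)
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()
```

Only after this step can the rest of the byte stream be decoded into text.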


Marko


Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
Isaac Morland ijmor...@uwaterloo.ca:

  HTTP/1.1 200 OK
  Content-Type: text/html; charset=ISO-8859-1

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-16">

 For HTML it's not quite so bad.  According to the HTML 4 standard:
 [...]

 The Content-Type header takes precedence over a meta element. I
 thought I read once that the reason was to allow proxy servers to
 transcode documents but I don't have a cite for that. Also, the meta
 element must only be used when the character encoding is organized
 such that ASCII-valued bytes stand for ASCII characters so the
 initial UTF-16 example wouldn't be conformant in HTML.

That's not how I read it:

   The META declaration must only be used when the character encoding is
   organized such that ASCII characters stand for themselves (at least
   until the META element is parsed). META declarations should appear as
   early as possible in the HEAD element.

   URL: http://www.w3.org/TR/1998/REC-html40-19980424/charset.html#doc-char-set

IOW, you must obey the HTTP character encoding until you have parsed a
conflicting META content-type declaration.

The author of the standard keeps a straight face and continues:

   For cases where neither the HTTP protocol nor the META element
   provides information about the character encoding of a document, HTML
   also provides the charset attribute on several elements. By combining
   these mechanisms, an author can greatly improve the chances that,
   when the user retrieves a resource, the user agent will recognize the
   character encoding.


Marko


Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
R. David Murray rdmur...@bitdance.com:

 The same problem existed in python2 if your goal was to produce a stream
 with a consistent encoding, but now python3 treats that as an error.

I have a different interpretation of the situation: as a rule, use byte
strings in Python3. Text strings are a special corner case for
applications that have to deal with human languages.

If your application has to talk SMTP, use bytes.

If your application has to do IPC, use bytes.

If your application has to do file I/O, use bytes.

If your application is a word processor or an IM client, you have text
strings available. You might find, though, that barely any modern GUI
application is satisfied with crude text strings. You will need weights,
styles, sizes, emoticons, positions, directions, shadows, alignment etc
etc so it may be that Python's text strings are only good enough for
storing individual characters or short snippets.

In sum, Python's text strings might have one sweet spot: Usenet clients.


Marko


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
Martin v. Löwis mar...@v.loewis.de:

 I think the people defending the "Unix file names are just bytes" side
 often miss an important detail: displaying file names to the user, and
 allowing the user to enter file names.

The user interface is a real issue and needs to be addressed. It is
separate from the OS interface, though.

 A script that just needs to traverse a directory tree and look at
 files by certain criteria can easily do so with not worrying about a
 text interpretation of the file names.

A single system often has file names that have been encoded with
different schemes. Only today, I have had to deal with the JIS character
table (URL:
http://i.msdn.microsoft.com/cc305152.932%28en-us,MSDN.10%29.gif) -- you
will notice that it doesn't have a backslash character. A coworker uses
ISO-8859-1.

I use UTF-8. UTF-8, of course, will refuse to deal with some byte
sequences.

My point is that the poor programmer cannot ignore the possibility of
funny character sets. If Python tried to protect the programmer from
that possibility, the result might be even more intractable: how to act
on a file with a non-UTF-8 filename if you are unable to express it as
a text string?
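
For reference, Python 3 handles this case with the surrogateescape error handler (PEP 383), which os.fsdecode() and os.fsencode() apply on POSIX systems. A minimal sketch with an explicit codec so that it does not depend on the locale; the sample filename is an assumption:

```python
# ISO-8859-1 bytes for "café.txt"; the 0xE9 byte is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# The undecodable byte becomes a lone surrogate (U+DCE9) in the text string.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")
```

The text form is not printable as is, but it can be passed back to the OS interface without loss.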


Marko


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
Nick Coghlan ncogh...@gmail.com:

 Python 3 says it's *our* problem to deal with on behalf of our
 developers.

URL: http://www.imdb.com/title/tt0120623/quotes?item=qt0353406

Flik: I was just trying to help.

Mr. Soil: Then help us; *don't* help us.


Marko


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Marko Rauhamaa
Tres Seaver tsea...@palladion.com:

 On 08/19/2014 01:43 PM, Ben Hoyt wrote:
 Fair enough. I don't quite understand, though -- why is the official
 policy to kill something that's essential on *nix?

 ISTM that the policy is based on a fantasy that "it looks like text to
 me in my use cases, so therefore it must be text for everyone".

What I like about Python is that it allows me to write native linux code
without having to make portability compromises that plague, say, Java. I
have select.epoll(). I have os.fork(). I have socket.TCP_CORK. The
textualization of Python3 seems part of a conscious effort to make
Python more Java-esque.


Marko


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Marko Rauhamaa
Guido van Rossum gu...@python.org:

 With my serious hat on, I would like to claim that *conceptually*
 filenames are most definitely text. Due to various historical
 accidents the UNIX system calls often encoded text as arguments, and
 we sometimes need to control that encoding.

Due to historical accidents, text (in the Python sense) is not a
first-class data type in Unix. Text, machine language, XML, Python etc
are interpretations of bytes. Bytes are the first-class data type
recognized by the kernel. That reality cannot be wished away.

 Hence the occasional need for bytes arguments. But most of the time
 you don't have to think about that, and forcing users to worry about
 it is mostly as counter-productive as forcing them to think about the
 encoding of every text file.

The users of Python programs can often be given higher-level facades.
Unix programmers, though, shouldn't be shielded from bytes.


Marko


Re: [Python-Dev] Multiline with statement line continuation

2014-08-16 Thread Marko Rauhamaa
Steven D'Aprano st...@pearwood.info:

 I simply don't care. They will try it, discover that tuples are not 
 context managers, fix their code, and move on.

*Could* tuples (and lists and sequences) be context managers?

*Should* tuples (and lists and sequences) be context managers?

 I don't think that some vague similarity between it and tuples is
 justification for rejecting the proposal.

You might be able to have it both ways. You could have:

   with (open(name) for name in os.listdir(config)) as files:
       ...
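
The syntax above is hypothetical. With what the standard library already offers, a variable number of context managers can be entered through contextlib.ExitStack. A minimal sketch; the helper name read_all is made up for illustration:

```python
from contextlib import ExitStack


def read_all(paths):
    """Open every path, read each file, and close them all on exit."""
    with ExitStack() as stack:
        # Each file is registered with the stack as soon as it is opened,
        # so a failure halfway through still closes the earlier ones.
        files = [stack.enter_context(open(path)) for path in paths]
        return [f.read() for f in files]
```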


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Marko Rauhamaa
Nick Coghlan ncogh...@gmail.com:

 Right, it's not a mere optimisation - it's the only way to get
 containers to behave sensibly. Otherwise we'd end up with nonsense
 like:

 >>> x = float("nan")
 >>> x in [x]
 False

Why is that nonsense? I mean, why is it any more nonsense than

   >>> x == x
   False

Anyway, personally, I'm perfectly happy to live with the choices of
past generations, regardless of whether they were good or not. What you
absolutely don't want to do is correct the choices of past generations.
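
The two results being compared can be reproduced side by side in current Python. A minimal sketch; the variable names are only for illustration:

```python
nan = float("nan")

# IEEE 754 comparison: NaN is not equal to anything, itself included.
equal_to_itself = nan == nan

# Container membership checks identity before equality, so the same
# object is always found in a list that holds it.
found_in_list = nan in [nan]
```

Here equal_to_itself is False while found_in_list is True.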


Marko


Re: [Python-Dev] collections.sortedtree

2014-03-30 Thread Marko Rauhamaa
Guido van Rossum gu...@python.org:

 Yeah, so the pyftp fix is to keep track of how many timers were
 cancelled, and if the number exceeds a threshold it just recreates the
 heap, something like

 heap = [x for x in heap if not x.cancelled]
 heapify(heap)

I measured my target use case with a simple emulation on my linux PC.

The simple test case emulates this scenario:

Start N connections at frequency F and have each connection start a
timer T. Then, rotate over the connections at the same frequency F
restarting timer T. Stop after a duration that is much greater than
T.

Four different timer implementations were considered:

   HEAPQ: straight heapq

   HEAPQ*: heapq with the pyftp fix (reheapify whenever 80% of the
   outstanding timers have been canceled)

   SDICT: sorteddict (my C implementation)

   PyAVL: Python AVL tree (my implementation)
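
The HEAPQ* variant can be sketched roughly as follows. This is an illustration of the technique under assumptions, not the code that was measured; the class names and the pop_expired method are hypothetical:

```python
import heapq
import itertools


class Timer:
    __slots__ = ("expires", "seq", "cancelled")

    def __init__(self, expires, seq):
        self.expires = expires
        self.seq = seq            # tie-breaker for equal expiry times
        self.cancelled = False

    def __lt__(self, other):
        return (self.expires, self.seq) < (other.expires, other.seq)


class TimerHeap:
    """heapq-based timers with lazy cancellation and periodic rebuild."""

    def __init__(self, rebuild_ratio=0.8):
        self._heap = []
        self._cancelled = 0
        self._seq = itertools.count()
        self._ratio = rebuild_ratio

    def __len__(self):
        return len(self._heap)

    def start(self, expires):
        timer = Timer(expires, next(self._seq))
        heapq.heappush(self._heap, timer)
        return timer

    def cancel(self, timer):
        if timer.cancelled:
            return
        timer.cancelled = True    # stays in the heap for now
        self._cancelled += 1
        if self._cancelled >= self._ratio * len(self._heap):
            # Too many dead entries: drop them and restore the heap property.
            self._heap = [t for t in self._heap if not t.cancelled]
            heapq.heapify(self._heap)
            self._cancelled = 0

    def pop_expired(self, now):
        """Remove and return the live timers that expire at or before now."""
        fired = []
        while self._heap and self._heap[0].expires <= now:
            timer = heapq.heappop(self._heap)
            if timer.cancelled:
                self._cancelled -= 1
            else:
                timer.cancelled = True   # a later cancel() becomes a no-op
                fired.append(timer)
        return fired
```

With an 80% threshold the heap holds at most about five times the number of live timers.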


Here are the results:

N = 1000, F = 100 Hz, T = 10 min, duration 1 hr

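=========================================================
          Virt    Res   max len()     usr     sys     CPU
            MB     MB                   s       s       %
=========================================================
HEAPQ       22     16       60001    21.5     4.3     0.7
HEAPQ*      11      7        5000    18.4     4.2     0.6
SDICT       11      6        1000    18.2     3.9     0.6
PyAVL       11      6        1000    39.3     3.6     1.2
=========================================================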


N = 1, F = 1000 Hz, T = 10 min, duration 1 hr

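=========================================================
          Virt    Res   max len()     usr     sys     CPU
            MB     MB                   s       s       %
=========================================================
HEAPQ      125    120      600044   223.0    25.8     6.9
HEAPQ*      21     16           5   186.8    30.0     6.0
SDICT       15     10           1   196.6    25.7     6.2
PyAVL       16     11           1   412.5    22.3    12.1
=========================================================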


Conclusions:

 * The CPU load is almost identical in HEAPQ, HEAPQ* and SDICT.

 * HEAPQ* is better than HEAPQ because of the memory burden.

 * PyAVL is not all that bad compared with SDICT.


Marko


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Marko Rauhamaa
Raymond Hettinger raymond.hettin...@gmail.com:

 * An AVL balanced tree isn't the only solution or necessarily the best
 solution to the problem. Tree nodes tend to take more space than
 denser structures and they have awful cache locality (these are the
 same reasons that deques use doubly-linked blocks rather than a plain
 doubly linked list).

Maybe. The key is the API. The implementation underneath should be
changeable. For example, Jython would probably use SortedTree to
implement it.

Performance tests should help decide when an implementation is switched
for a more efficient one. In some of my tests, I haven't seen any
significant performance differences between RB trees and AVL trees, for
example. The blist implementation, which I have taken a quick glance at,
buys cache locality at the price of block copying; I have no data to
decide if the tradeoff is a good one.

The main thing, IMO, is to get one sorted dictionary in.

 * The name of the tool probably should not be sortedtree.

Correct. That was a mistake on the Subject line. In the code, it's
sorteddict.

 * That said, it is a reasonable possibility that the standard library
 would benefit from some kind of sorted collection (the idea comes up from
 time to time).

Yes. As a user, I have craved for an implementation, which is readily
available in Java and the linux kernel, for example.


Marko


Re: [Python-Dev] Negative timedelta strings

2014-03-28 Thread Marko Rauhamaa
Greg Ewing greg.ew...@canterbury.ac.nz:

 ISO 8601 doesn't seem to define a representation for
 negative durations, though, so it wouldn't solve the
 original problem.

XSD uses ISO 8601 durations and allows a sign before the initial P.

It would appear PT1M means 60 or 61 seconds. P1D means 23, 24 or 25
hours. P1M means 28..31 days etc. Timedelta would have no option but to
stick to seconds:

   PT29389453.2345S

but then, why not simply use a number:

   29389453.2345
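
A seconds-only rendering with an XSD-style leading sign might look like the sketch below. The function name to_xsd_duration is hypothetical; ISO 8601 and XSD place a T designator before time components, hence the PT prefix:

```python
from datetime import timedelta


def to_xsd_duration(delta):
    """Format a timedelta as a signed, seconds-only duration string."""
    seconds = delta.total_seconds()
    sign = "-" if seconds < 0 else ""
    # Fixed-point with microsecond resolution, trailing zeros removed.
    digits = format(abs(seconds), "f").rstrip("0").rstrip(".")
    return f"{sign}PT{digits}S"
```

For comparison, str(timedelta(seconds=-1)) yields "-1 day, 23:59:59", which is the representation that prompted the thread.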


Marko


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Marko Rauhamaa
Marko Rauhamaa ma...@pacujo.net:

 For example, Jython would probably use SortedTree to implement it.

That word just keeps coming out of my keyboard. The Java class is of
course the TreeMap.


Marko


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Marko Rauhamaa
Thomas Wouters tho...@python.org:

 Not to mention discussion about whether it shouldn't just be an existing
 PyPI package, like http://pypi.python.org/pypi/blist, rather than a new
 implementation.

I'm fine with any implementation as long as it is in the standard
library.


Marko


[Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa

I have made a full implementation of a balanced tree and would like to
know what the process is to have it considered for inclusion in Python
3.

To summarize, the implementation closely parallels dict() features and
resides in _collectionsmodule.c under the name collections.sortedtree.
It uses solely the < operator to compare keys. I have chosen the AVL
tree as an implementation technique.

The views support a number of optional arguments:

sorteddict.keys(reversed=False, start=unspecified, inclusive=True)


The primary objective of having a balanced tree in the standard library
is to support ordered access in an efficient manner. The typical
applications would include timers (networking), aging (cache) and prefix
patterns (routing).
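
To give a feel for the keys() arguments, here is a pure-Python sketch built on bisect rather than a tree. It is not the proposed implementation: insertion is O(n), keys must be hashable, and the meaning of start in reversed mode (an upper bound) is an assumption. The class name SortedDictSketch is hypothetical:

```python
import bisect

_UNSPECIFIED = object()


class SortedDictSketch:
    """Mapping whose keys() view iterates in sorted order."""

    def __init__(self):
        self._keys = []      # kept sorted; bisect compares with < only
        self._values = {}

    def __setitem__(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]
        self._keys.pop(bisect.bisect_left(self._keys, key))

    def __len__(self):
        return len(self._keys)

    def keys(self, reversed=False, start=_UNSPECIFIED, inclusive=True):
        keys = self._keys
        if reversed:
            # Descending order, beginning at or just below `start`.
            if start is _UNSPECIFIED:
                hi = len(keys)
            elif inclusive:
                hi = bisect.bisect_right(keys, start)
            else:
                hi = bisect.bisect_left(keys, start)
            for i in range(hi - 1, -1, -1):
                yield keys[i]
        else:
            # Ascending order, beginning at or just above `start`.
            if start is _UNSPECIFIED:
                lo = 0
            elif inclusive:
                lo = bisect.bisect_left(keys, start)
            else:
                lo = bisect.bisect_right(keys, start)
            for i in range(lo, len(keys)):
                yield keys[i]
```

A timer wheel would, for example, iterate keys() from the start and stop at the first expiry time that lies in the future.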


Marko


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Guido van Rossum gu...@python.org:

 Actually, the first step is to publish it on PyPI, the second is to get a
 fair number of happy users there. The bar for getting something
 included into the stdlib is pretty high -- you need to demonstrate
 that there is a need *and* that having it as a 3rd party module is a
 problem.

I hear you about the process.

About the need part, I'm wondering if you haven't felt it in
implementing the timers for asyncio. I have had that need in several
network programming projects and have ended up using my AVL tree
implementation (C and Python).

Well, time will tell whether frequently canceled timers end up piling up in
the heap queue.


Marko


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Antoine Pitrou solip...@pitrou.net:

 Wouldn't a heapq work as well for those two?

In my experience, networking entities typically start a timer at each
interaction and cancel the pending one. So you have numerous timers that
virtually never expire. You might have 100 interactions per second, each
canceling and restarting a 10-minute timer.

I don't know first hand if that causes heap queues to cause measurable
heap or CPU pressure.


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Terry Reedy tjre...@udel.edu:

 Perhaps the collections doc should mention that there are other
 specialized container types available on PyPI and either list some or
 point to a wiki page listing some. There must be at least 10 that
 could be included in such a list.

There *is* a relatively high threshold for importing C extensions from an
external source. If I build an application making use of them and advise
coworkers to use it, they would likely balk at having to compile them.
Not all machines have a development toolkit.

Furthermore:

   # which pip
   /usr/bin/which: no pip in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:\
   /usr/sbin:/usr/bin:/root/bin)
   # yum install pip
   No package pip available.


Marko


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Antoine Pitrou solip...@pitrou.net:

 Marko Rauhamaa ma...@pacujo.net wrote:
 In my experience, networking entities typically start a timer at each
 interaction and cancel the pending one. So you have numerous timers
 that virtually never expire. You might have 100 interactions per
 second, each canceling and restarting a 10-minute timer.

 Each individual heapq operation (push or pop) will be O(log n). That's
 not different from a balanced search tree (although of course the
 constant multiplier may vary).

Yes, but if I have 1000 connections with one active timer each, the size
of my sorted tree is 1000 timer objects. There are typically no expiries
to react to.

If the canceled timer lingers in the heapq till its expiry (in 10
minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up
constantly to clear the expired timers.

In practice, none of that might matter.


Marko


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Dan Stromberg drsali...@gmail.com:

 It'd likely make sense to have either a pure python implementation, or
 pure python and C-extended, so that Pypy and Jython can share the
 feature with CPython.

Jython can build directly on Java's native SortedMap implementation. The
API should not tie it to a tree. Optimizations and refactorings should
be allowed. Only O(log N) worst-case behavior should be mandated.

(And now I notice I named this thread wrong; I named my thingy
collections.sorteddict.)


Marko