i've seen discussion about improving the IO stack of python, to rely less on
the low-level lib-c implementation. so, i wanted to share my ideas in
that niche.

i feel today's sockets are way *outdated* and overloaded. python's sockets are
basically a wrapper for the low-level BSD sockets... but IMHO it would be much
nicer to alleviate this dependency: expose a more high-level interface to socket
programming. the *BSD-socket methodology* does not sit well with pythonic
paradigms.

let's start with {set/get}sockopt... that's one of the ugliest things in
python, i believe most would agree. it's basically C programming in python.
granted, it's a way to overcome differences between platforms and protocols,
but i believe it's not the way python should handle it.
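
for reference, this is roughly what the current style looks like. it's only a sketch in python 3 syntax; the constants are the ones the socket module already exposes, and the variable names are just for illustration:

```python
import socket

# the status quo: every option is a (level, name, value) triple, as in C
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# reading a flag back yields an int, not a bool
reuse = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
s.close()
```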

my suggestion is nothing "revolutionary". it's basically taking the existing
socket module and extending it for most common use cases.

there are two types of sockets, streaming and datagram. the underlying
protocols don't matter. and these two types of sockets have different semantics
to them: send/recv vs. sendto/recvfrom. so why not introduce StreamSocket
and DgramSocket types? and of course RawSocket should be introduced
to complement them.

you can argue that recvfrom and sendto can be used on streaming sockets
as well, but did anyone ever use it? i never saw such code, and i can't think
why you would want to use it.

next, all the socket options would become properties or methods (i prefer
properties). each protocol would subclass {Stream/Dgram/Raw}Socket
and add its protocol-specific options.

here's an example for a hierarchy:
Socket
    RawSocket
    DgramSocket
        UDPSocket
    StreamSocket
        TCPSocket
            SSLSocket

the above tree is only partial of course, but it needn't be complete,
either. less used protocols, like X25 or ICMP, could be constructed directly
with the Socket class, in the old fashion of passing parameters. after all,
the suggested class hierarchy only wraps the existing socket constructor and
adds a more pythonic API to its options.

here's an example:
s = TCPSocket(AF_INET6)
s.reuse_address = True # this option is inherited from Socket
s.no_delay = True # this is a TCP-level option
s.bind(("", 12345))
s.listen(1)
s2 = s.accept()
s2.send("hello")

or
s = UDPSocket()
s.allow_broadcast = True
s.sendto("hello everybody", ("255.255.255.255", 12345))
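
the two examples above could be backed by something like the following. this is a minimal sketch in python 3, assuming the classes simply subclass the existing socket.socket; the `_flag` helper is hypothetical, and things like dup() (which re-invokes the constructor with the old signature) are not handled:

```python
import socket


def _flag(level, name):
    """build a bool property backed by get/setsockopt (hypothetical helper)."""
    def getter(self):
        return bool(self.getsockopt(level, name))

    def setter(self, value):
        self.setsockopt(level, name, int(bool(value)))

    return property(getter, setter)


class Socket(socket.socket):
    # option shared by every socket type
    reuse_address = _flag(socket.SOL_SOCKET, socket.SO_REUSEADDR)


class StreamSocket(Socket):
    def __init__(self, family=socket.AF_INET, proto=0):
        super().__init__(family, socket.SOCK_STREAM, proto)


class DgramSocket(Socket):
    # broadcasting only makes sense for datagram sockets
    allow_broadcast = _flag(socket.SOL_SOCKET, socket.SO_BROADCAST)

    def __init__(self, family=socket.AF_INET, proto=0):
        super().__init__(family, socket.SOCK_DGRAM, proto)


class TCPSocket(StreamSocket):
    # TCP-level option, only visible on TCP sockets
    no_delay = _flag(socket.IPPROTO_TCP, socket.TCP_NODELAY)

    def __init__(self, family=socket.AF_INET):
        super().__init__(family, socket.IPPROTO_TCP)


class UDPSocket(DgramSocket):
    def __init__(self, family=socket.AF_INET):
        super().__init__(family, socket.IPPROTO_UDP)
```

with these definitions, `s.reuse_address = True` and `s.no_delay = True` on a TCPSocket just translate to the matching setsockopt calls.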

perhaps we should consider adding an "options" namespace, in order to
keep the root level of the instance simpler. for example:
s.options.reuse_address = True

it clarifies that reuse_address is an option. is it necessary? donno.
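
one way the "options" namespace might be done, again only as a sketch: `SocketOptions` is a made-up helper class that forwards to get/setsockopt on the socket it wraps.

```python
import socket


class SocketOptions:
    """hypothetical namespace object exposing a socket's options."""

    def __init__(self, sock):
        self._sock = sock

    @property
    def reuse_address(self):
        return bool(self._sock.getsockopt(socket.SOL_SOCKET,
                                          socket.SO_REUSEADDR))

    @reuse_address.setter
    def reuse_address(self, value):
        self._sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR,
                              int(bool(value)))


class Socket(socket.socket):
    @property
    def options(self):
        # built on demand, so the socket itself carries no extra state
        return SocketOptions(self)
```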

and since we can override bind(), perhaps we should override it to provide
a more specific interface, i.e.
def bind(self, addr, port):
    super(TCPSocket, self).bind((addr, port))

because we *know* it's a tcp socket, so we don't need to *retain support* for
all addressing forms: it's an IP address and a port.
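
spelled out as a runnable python 3 sketch (the TCPSocket class here is the hypothetical one from this proposal, reduced to the bare minimum):

```python
import socket


class TCPSocket(socket.socket):
    def __init__(self, family=socket.AF_INET):
        super().__init__(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)

    def bind(self, addr, port):
        # repack the two arguments into the tuple the underlying call expects
        super().bind((addr, port))


s = TCPSocket()
s.bind("127.0.0.1", 0)  # port 0: let the OS pick a free port
bound_addr, bound_port = s.getsockname()
s.close()
```

note that for AF_INET6 the underlying address tuple can also carry a flow info and scope id, which this sketch ignores.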

---

i would also want to replace the current BSD semantics for *client sockets*,
of first creating a socket and then connecting it, i.e.,
s = socket()
s.connect(("localhost", 80))

i would prefer
s = ConnectedSocket(("localhost", 80))

because *connecting the socket* is part of *initializing* it, hence it should
be part of the class' constructor, and not a separate phase of the socket's
life.

perhaps the syntax should be
s = TCPSocket.connect(("localhost", 80))
# or s = TCPSocket.connect("localhost", 80)
# if we override connect()

where <socketclass>.connect would be a classmethod, which returns a
new instance of the class, connected to the server. of course DgramSockets
don't need such a mechanism.
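
a sketch of how the classmethod variant might look in python 3. the TCPSocket class and its connect(host, port) signature are assumptions taken from this proposal, not an existing API; the demo uses a throwaway listener on the loopback interface:

```python
import socket


class TCPSocket(socket.socket):
    def __init__(self, family=socket.AF_INET):
        super().__init__(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)

    @classmethod
    def connect(cls, host, port):
        """create a socket, connect it, and return it (hypothetical API)."""
        sock = cls()
        try:
            # the classmethod shadows the inherited instance method,
            # so call the original one explicitly
            socket.socket.connect(sock, (host, port))
        except OSError:
            sock.close()
            raise
        return sock


# demo: connect to a listener that only exists for this example
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = TCPSocket.connect("127.0.0.1", listener.getsockname()[1])
peer = client.getpeername()
client.close()
listener.close()
```

later versions of the standard library offer socket.create_connection(), which covers a similar need as a plain function rather than a constructor.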

i would like to suggest the same about connection-oriented server sockets,
but the case with those is a little more complicated, and possibly
asynchronous (select()ing before accept()ing), so i would retain the existing
semantics.

---

another thing i find quite silly is the way sockets behave on shutdown and
in non-blocking mode.

when the connection breaks (the peer closes it), i would expect recv() to
raise EOFError, or some sort of socket.error, instead of returning "".
moreover, when i'm using a non-blocking recv() or one with a timeout, and
there's no data to return, i would expect "", not a socket.timeout exception
(or, in pure non-blocking mode, a socket.error with EWOULDBLOCK).

to sum it up:
* no data = ""
* connection breaks = EOFError

the situation, however, is *exactly the opposite*, which is neither intuitive
nor logical, and i remember having to write this code:

import socket

def recv(s):
    try:
        data = s.recv(1000)
        if not data: # socket closed
            raise EOFError
    except socket.timeout:
        data = "" # timeout: no data yet
    return data

to accumulate data from non-blocking sockets, in a friendly way.
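
here is a python 3 take on the same wrapper, as a sketch only: `recv_friendly` is a made-up name, it returns bytes, and it also catches BlockingIOError, which is what a truly non-blocking socket raises when nothing is ready. the demo uses socket.socketpair() and assumes a platform where the sent data is readable right away:

```python
import socket


def recv_friendly(sock, bufsize=1000):
    """return b"" when no data is ready, raise EOFError when the peer closed."""
    try:
        data = sock.recv(bufsize)
    except (BlockingIOError, socket.timeout):
        return b""  # nothing to read right now
    if not data:
        raise EOFError("connection closed by peer")
    return data


a, b = socket.socketpair()
a.setblocking(False)

nothing_yet = recv_friendly(a)  # nothing has been sent so far
b.sendall(b"hello")
got = recv_friendly(a)
b.close()
try:
    recv_friendly(a)
    saw_eof = False
except EOFError:
    saw_eof = True
a.close()
```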

so yeah, the libsocket version of recv returns 0 on EOF and -1 with some
errno when there's no data, but the pythonic version shouldn't just *copy*
this behavior -- it should *translate* it to pythonic standards.

you have to remember that libsocket and the rest were written in the 80's,
and are very platform-dependent. plus, C doesn't allow multiple return values
or exceptions, so they had to do it this way.

the question that should guide you is, "if you were to write pythonic sockets,
how would they look?" rather than "how do i write a 1:1 wrapper for libsocket?"

---

by the way, a little cleanup:

* why does accept return a tuple? instead of
newsock, peername = sock.accept()

why not do
newsock = sock.accept()
peername = newsock.getpeername()

i'm always having strange bugs because i forget accept gives me a tuple rather
than just a socket... and you don't generally need the peer's address,
especially since you can get it later with getpeername.
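
a sketch of the socket-only accept() in python 3. the StreamSocket subclass is hypothetical; the address the stock accept() returns is the remote peer's, so it is compared against getpeername() here:

```python
import socket


class StreamSocket(socket.socket):
    def accept(self):
        # drop the address; it stays available through getpeername()
        conn, _addr = super().accept()
        return conn


listener = StreamSocket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
newsock = listener.accept()
peername = newsock.getpeername()   # the client's address, asked for later
client_addr = client.getsockname()

for s in (newsock, client, listener):
    s.close()
```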

* the host-to-network functions, are they needed? can't you just use struct.pack
and unpack? why not throw them away?
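
to illustrate the point, a small sketch comparing struct with socket.htons() (the port value is arbitrary; "!" is struct's prefix for network byte order, "=" for native order):

```python
import socket
import struct

port = 0x1234

# struct's "!" prefix means network (big-endian) byte order
packed = struct.pack("!H", port)
(unpacked,) = struct.unpack("!H", packed)

# htons() returns an int whose native-order bytes are in network order,
# so packing it natively should give the same two bytes
via_htons = struct.pack("=H", socket.htons(port))
```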

what do you say?


-tomer