Hi all,

I've been putting more thought into this problem, particularly in
contrast to the "client manages N transport [connections]" approach.
I believe that the latter is not very workable given the variance in
underlying transports.
transports. Not enough information is available to the client to
manage the connections properly without putting transport-specific
information into the client (e.g., the difference between EPIPE and
ECONNREFUSED). I think it would be wrong to put connection-related
code into the client.

Here are the three layers that I see, and the direction my branch takes:

1. client: responsible for a high-level API for applications. It maps
this API into the underlying transport primitives (as defined by
riak.transports.transport.RiakTransport).

2. transport: maps the primitives into the appropriately formatted
wire request(s), and handles the response(s).

3. connection manager (CM): handles multiple connections to multiple
hosts for use by the transport.
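To make the layering concrete, here is a minimal sketch of how the three pieces might relate. All names here are illustrative stand-ins, not code from the branch:

```python
class ConnectionManager:
    """Layer 3: owns connections to multiple hosts for the transport."""
    def __init__(self, hosts, connection_class):
        self.hosts = list(hosts)                  # available (host, port) pairs
        self.connection_class = connection_class

    def take(self):
        # Simplest possible policy: hand out a connection to the first host.
        host, port = self.hosts[0]
        return self.connection_class(host, port)

    def remove_host(self, host, port):
        # Connections call back here when a host becomes unavailable.
        if (host, port) in self.hosts:
            self.hosts.remove((host, port))


class EchoConnection:
    """Stand-in for a protocol-aware connection (e.g. HTTPConnection)."""
    def __init__(self, host, port):
        self.host, self.port = host, port


class Transport:
    """Layer 2: would map client primitives into wire requests."""
    def __init__(self, cm):
        self.cm = cm


class Client:
    """Layer 1: high-level application API, built on a transport."""
    def __init__(self, transport):
        self.transport = transport
```

The point is only the direction of the arrows: the client talks to the transport, the transport borrows connections from the CM, and connections report failures back to the CM.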


The CM creates connection objects that understand the protocol (e.g
HTTPConnection), which is also an object that the transport
understands how to use. The connection objects can signal errors to
the CM for removal when (say) the server goes down or is otherwise
unavailable.

The client does need to be aware of host/port pairs and pass those
into the transport for provision to the CM (or possibly the client
creates the appropriate CM and passes that to the transport). The
different types of CMs (connection type, connection policies/params,
etc.) imply that the client may be the one to create it, with the
right params. Otherwise, the transport would receive a list of
host/port pairs plus CM options, and would create the CM with
connection objects appropriate to the transport.
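Either wiring is straightforward; a minimal sketch of both options, with hypothetical names:

```python
class Pool:
    """Stand-in connection manager: hosts, a connection factory, and policy."""
    def __init__(self, hosts, conn_factory, **policy):
        self.hosts = list(hosts)
        self.conn_factory = conn_factory
        self.policy = policy


class TransportA:
    """Option (a): the client builds the CM and hands it in."""
    def __init__(self, cm):
        self.cm = cm


class TransportB:
    """Option (b): the transport builds the CM from host/port pairs plus
    CM options, supplying the connection class appropriate to itself."""
    conn_factory = object  # would be e.g. an HTTP or protobuf connection class

    def __init__(self, hosts, **cm_options):
        self.cm = Pool(hosts, self.conn_factory, **cm_options)


# (a) the client knows the CM type and its params:
ta = TransportA(Pool([("10.0.0.1", 8098)], object, max_per_host=2))
# (b) the client only passes hosts and options:
tb = TransportB([("10.0.0.1", 8087)], max_per_host=2)
```

Option (b) keeps CM construction out of the client entirely, at the cost of funneling all CM policy through the transport's constructor.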

To follow up with my original concern, and to show a concrete example:

In connection.py, we would create a subclass of HTTPConnection that
overrides the .connect() method. If the superclass raises
ECONNREFUSED, the subclass removes the host from the CM.

The subclass does not have to manage EPIPE, since HTTPConnection can
already handle that itself (except for certain variant-sized requests,
such as those needed for Luwak).
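A sketch of what that subclass could look like, using Python 3's http.client for illustration; the CM hook (remove_host) is a hypothetical name, not settled API:

```python
import errno
import http.client


class ManagedHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that reports unreachable hosts back to its CM."""

    def __init__(self, host, port, cm, timeout=2.0):
        super().__init__(host, port, timeout=timeout)
        self.cm = cm  # the connection manager that owns this connection

    def connect(self):
        try:
            super().connect()
        except OSError as e:
            if e.errno == errno.ECONNREFUSED:
                # Server is down or unreachable: drop it from the
                # available set, then let the error propagate.
                self.cm.remove_host(self.host, self.port)
            raise
```

Re-raising after notifying the CM keeps the retry decision out of the connection object: the transport (or CM) can decide whether to try another host.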

There is a Socket class in connection.py for managing bare sockets for
the protobuf connections. That class needs a .send() method that
handles EPIPE in some way. It would also need logic for ECONNREFUSED
similar to our HTTPConnection subclass: remove the host from the
available set.
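A sketch of that wrapper (again hypothetical: the one-shot reconnect in .send() is a placeholder for whatever EPIPE policy we settle on):

```python
import errno
import socket


class Socket:
    """Bare-socket wrapper for the protobuf connections (sketch)."""

    def __init__(self, host, port, cm, timeout=2.0):
        self.host, self.port, self.cm = host, port, cm
        self.timeout = timeout
        self.sock = None

    def connect(self):
        try:
            self.sock = socket.create_connection(
                (self.host, self.port), timeout=self.timeout)
        except OSError as e:
            if e.errno == errno.ECONNREFUSED:
                # Same policy as the HTTP subclass: drop the host.
                self.cm.remove_host(self.host, self.port)
            raise

    def send(self, data):
        try:
            self.sock.sendall(data)
        except OSError as e:
            if e.errno == errno.EPIPE:
                # Peer closed the connection; reconnect once and retry.
                self.connect()
                self.sock.sendall(data)
            else:
                raise
```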

Long-running client applications need to monitor the state of the
ring, and propagate join/leave changes into the available host/port
pairs in the CM. If one server returns ECONNREFUSED and is removed
from the available set, but the failure turns out to be *transient*,
then the client would need to recognize that and put the host back
into the set.
I do not have an answer for how the system can know the problem was
transient, or how it recognizes the server is back. Possibly, the host
moves to an "offline" list, and the CM periodically pings it to see if
it is alive (again). If the client removes it (due to a detected ring
change), then it removes it from the offline list. Possibly after time
period T, it is removed from the offline list. I believe these are all
workable details.
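One way those details could fit together, purely as an illustration (the ping mechanism and the timeout T are open questions, not decisions):

```python
import time


class HostSet:
    """Tracks available vs. offline hosts; offline entries expire after T."""

    def __init__(self, hosts, offline_ttl=60.0, clock=time.monotonic):
        self.available = set(hosts)
        self.offline = {}            # (host, port) -> time it went offline
        self.offline_ttl = offline_ttl
        self.clock = clock

    def mark_offline(self, pair):
        # Called on ECONNREFUSED: move the host to the offline list.
        self.available.discard(pair)
        self.offline[pair] = self.clock()

    def ring_removed(self, pair):
        # Client detected a ring change: forget the host entirely.
        self.available.discard(pair)
        self.offline.pop(pair, None)

    def sweep(self, ping):
        """Ping offline hosts; restore live ones, expire stale ones."""
        now = self.clock()
        for pair, since in list(self.offline.items()):
            if ping(pair):
                del self.offline[pair]
                self.available.add(pair)       # server is back
            elif now - since > self.offline_ttl:
                del self.offline[pair]         # give up after T seconds
```

The CM would call sweep() periodically; ring-change handling stays in the client via ring_removed().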

Cheers,
-g

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
