Hi all,

I've been putting more thought into this problem, particularly in contrast to the "client manages N transport [connections]" approach. I believe that the latter is not very workable given the variance in underlying transports. The client does not have enough information to manage the connections properly without putting transport-specific knowledge into it (e.g. the difference between EPIPE and ECONNREFUSED). I think it would be wrong to put connection-related code into the client.
Here are the three layers that I see, and the direction my branch takes:

1. client: responsible for a high-level API for applications. It maps this API onto the underlying transport primitives (as defined by riak.transports.transport.RiakTransport).

2. transport: maps the primitives into the appropriately formatted wire request(s), and handles the response(s).

3. connection manager (CM): handles multiple connections to multiple hosts for use by the transport.

The CM creates connection objects that understand the protocol (e.g. HTTPConnection), and that the transport also knows how to use. The connection objects can signal errors to the CM for removal when (say) the server goes down or is otherwise unavailable.

The client does need to be aware of host/port pairs, and it passes those into the transport for provision to the CM (or possibly the client creates the appropriate CM and passes that to the transport). The different types of CMs (connection type, connection policies/params, etc.) suggest that the client may be the one to create the CM, with the right params. Otherwise, the transport would get a list of host/port pairs plus CM options, and the transport would create the CM with connection objects appropriate to the transport.

To follow up on my original concern, and to show a concrete example: in connection.py, we would create a subclass of HTTPConnection that overrides the .connect() method. If the superclass raises ECONNREFUSED, then the subclass removes the host from the CM. The subclass does not have to manage EPIPE, since HTTPConnection can already handle that itself (except for certain variable-sized requests, such as those needed for Luwak).

There is also a Socket class in connection.py that manages bare sockets for the protobuf connections. It needs a .send() method that handles EPIPE in some way.
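To make that concrete, here is a minimal sketch of the subclass idea. I'm writing against Python 3's http.client for illustration (our code uses httplib), and the names ConnectionManager, ManagedHTTPConnection, and remove_host are purely illustrative, not a proposal for the actual API:

```python
import errno
import socket
import http.client  # httplib in the Python 2 code this thread is about


class ConnectionManager:
    """Sketch of the CM: tracks available and offline host/port pairs.

    A real CM would also hand out connections to the transport, apply
    connection policies, and periodically ping offline hosts to see
    whether they are alive again.
    """

    def __init__(self, hosts):
        self.available = list(hosts)  # (host, port) pairs usable by a transport
        self.offline = []             # pairs removed after connection errors

    def remove_host(self, pair):
        # Move a failing host aside rather than forgetting it, so a later
        # ping (or a ring-change notification) can restore or purge it.
        if pair in self.available:
            self.available.remove(pair)
            self.offline.append(pair)


class ManagedHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection subclass that reports ECONNREFUSED to the CM."""

    def __init__(self, cm, host, port):
        http.client.HTTPConnection.__init__(self, host, port)
        self._cm = cm

    def connect(self):
        try:
            http.client.HTTPConnection.connect(self)
        except socket.error as e:
            if e.errno == errno.ECONNREFUSED:
                # Server is down or unreachable: drop it from the
                # available set. EPIPE is left to HTTPConnection itself,
                # which already handles a peer-closed connection.
                self._cm.remove_host((self.host, self.port))
            raise
```

For example, with hosts 10.0.0.1 and 10.0.0.2 (port 8098 is Riak's default HTTP port), a refused connection to the first host moves it to the offline list, and the transport's next request goes to the second. The bare Socket class for the protobuf connections would follow the same pattern from its .send(), catching EPIPE there itself since there is no httplib machinery to do it for us.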
It would also have logic for ECONNREFUSED similar to our HTTPConnection subclass: remove the host from the available set.

Long-running client applications need to monitor the state of the ring and propagate join/leave changes into the available host/port pairs in the CM. If one server returns ECONNREFUSED and is removed from the available set, but the failure is determined to be *transient*, then the client would need to recognize that and put the host back into the set. I do not have an answer for how the system can know the problem was transient, or how it recognizes that the server is back. Possibly the host moves to an "offline" list, and the CM periodically pings it to see if it is alive again. If the client removes the host (due to a detected ring change), then it is also removed from the offline list. Possibly after a time period T, it is removed from the offline list. I believe these are all workable details.

Cheers,
-g

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
