TinkerPop uses WebSocket as the primary transport.  WebSocket was designed
to ride over and/or be compatible with HTTP.  Thus, either by design or by
accident, TinkerPop inherits many of HTTP's features.  These include
the ability to support HTTP proxies (load balancers), name-based virtual
hosting, and wildcard SSL certs, to name a few.  These things are achieved
through HTTP's Host header. The client sets a Host header to indicate the
hostname of the service it is attempting to communicate with.  So, when
connecting to dmitry.gremlin.cosmosdb.azure.com, the client sets the Host
to dmitry.gremlin.cosmosdb.azure.com.  The proxy can read the Host header
and route the request to an internal Gremlin Server instance hosting dmitry's
graph database.  Clearly some services have taken advantage of this ability.
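To make the routing step concrete, here is a minimal sketch of how a proxy
might pick a backend from the Host header of the WebSocket upgrade request.
The routing table, the internal addresses, and the function names are all
hypothetical; this is not how any particular service is known to be
implemented.

```python
from typing import Optional

# Hypothetical routing table: public hostname -> internal Gremlin Server.
BACKENDS = {
    "dmitry.gremlin.cosmosdb.azure.com": "10.0.0.11:8182",
    "oliver.gremlin.cosmosdb.azure.com": "10.0.0.12:8182",
}


def parse_host_header(handshake: str) -> Optional[str]:
    """Return the Host header value, without any port, from a raw request.

    Simplification: bracketed IPv6 literals are not handled.
    """
    for line in handshake.split("\r\n")[1:]:  # skip the request line
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            return host.rsplit(":", 1)[0] if ":" in host else host
    return None


def route(handshake: str) -> Optional[str]:
    """Pick a backend by hostname; None means the proxy cannot route."""
    host = parse_host_header(handshake)
    return BACKENDS.get(host.lower()) if host else None


def upgrade_request(host: str) -> str:
    """Build a bare-bones WebSocket upgrade request for the given Host."""
    return (
        "GET /gremlin HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "\r\n"
    )
```

With a name in the Host header, route() finds dmitry's backend.  With an IP
address in the Host header, the lookup has nothing to match and returns None.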

TINKERPOP-2289 [1] broke this behavior by resolving all hostnames to IP
addresses.  When the client connects to the service, the Host header is set
to an IP address instead of a name.  The problem is that dmitry, oliver,
stephen, and 50 million other services all resolve to the same IP address
[2].  The HTTP proxy has no idea how to route the request (or a poorly
configured proxy routes them all to a single, default instance).
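The effect on the Host header can be sketched as follows.  This is an
illustration only, not the driver's actual code: the helper names are made
up, and 203.0.113.10 is a documentation-range address standing in for the
shared IP.

```python
from urllib.parse import urlsplit

# Default ports that a client normally omits from the Host header.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def host_header_for(uri: str) -> str:
    """Derive the Host header a client would send for a WebSocket URI."""
    parts = urlsplit(uri)
    if parts.port is None or parts.port == DEFAULT_PORTS.get(parts.scheme):
        return parts.hostname
    return f"{parts.hostname}:{parts.port}"


def with_resolved_address(uri: str, ip: str) -> str:
    """Rewrite the URI so the hostname is replaced by an already-resolved IP."""
    parts = urlsplit(uri)
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    return parts._replace(netloc=netloc).geturl()


SHARED_IP = "203.0.113.10"  # hypothetical: every service resolves here

dmitry = "wss://dmitry.gremlin.cosmosdb.azure.com:443/gremlin"
oliver = "wss://oliver.gremlin.cosmosdb.azure.com:443/gremlin"

# Connecting by name: each service gets a distinct Host header.
by_name = {host_header_for(dmitry), host_header_for(oliver)}

# Connecting by resolved IP: both collapse to the same Host header.
by_ip = {
    host_header_for(with_resolved_address(dmitry, SHARED_IP)),
    host_header_for(with_resolved_address(oliver, SHARED_IP)),
}
```

Here by_name holds two distinct hostnames, while by_ip holds only the shared
address, so the proxy has nothing left to distinguish the two services.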

I think the first action to take is to revert the breaking change in order
to restore the previous functionality (whether it existed by design or by
accident).  I've created a PR that should restore the previous behavior [3].

The next action is to discuss what to support.  I don't think TinkerPop
should support DNS-based load balancing in the way implemented in
TINKERPOP-2289.  I'm not sure that it even needs to support it in general.
TinkerPop already supports a list of hosts for this purpose.  We should
verify that IP addresses work in that list as well.


1. https://issues.apache.org/jira/browse/TINKERPOP-2289
2. https://groups.google.com/d/msg/gremlin-users/A9rr9jLh5AY/DLguF9QmAQAJ
3. https://github.com/apache/tinkerpop/pull/1213


Robert Dale
