Bala Nathan created LENS-658:
--------------------------------
Summary: Distributed Lens Server Deployment
Key: LENS-658
URL: https://issues.apache.org/jira/browse/LENS-658
Project: Apache Lens
Issue Type: New Feature
Components: server
Reporter: Bala Nathan
Currently, Lens can be deployed and run only on a single node. This JIRA
tracks the approach for making Lens work in a clustered environment. The
following key aspects of clustered deployment are discussed below:
1) Session Management
Creating and Persisting a Session:
A Lens session needs to be created before any queries can be submitted to the
Lens server. A Lens session is associated with a unique session handle, the
current database being used, and any jars that have been added, all of which
must be available when making queries or doing metadata operations in the same
session. The session service is started as part of the Lens server. Today the
session object is persisted on a local filesystem periodically (configured via
lens.server.persist.location) for the purposes of recovery, i.e. the Lens
server will read from that location when it is restarted and recovery is
enabled. In a multi-node deployment, this destination needs to be available to
all nodes. Hence, the session information should be persisted in a shared
datastore instead of the local filesystem, so that any node in the cluster can
read from it, reconstruct the session, and execute queries.
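As a rough illustration, the sketch below shows the kind of session state that
would need to round-trip through such a shared datastore. The class and field
names are illustrative, not the actual Lens server classes; any shared store
readable by every node (an RDBMS, ZooKeeper, etc.) could sit behind the
SessionStore contract.
{code:java}
import java.io.Serializable;
import java.util.List;

// Illustrative sketch only: the session state that must survive node failover.
// These are not the actual Lens classes.
public class PersistedSessionState implements Serializable {
    private final String sessionHandle;   // unique handle returned to the client
    private final String currentDatabase; // database in use for this session
    private final List<String> addedJars; // jars needed for queries in this session

    public PersistedSessionState(String sessionHandle, String currentDatabase,
                                 List<String> addedJars) {
        this.sessionHandle = sessionHandle;
        this.currentDatabase = currentDatabase;
        this.addedJars = addedJars;
    }

    public String getSessionHandle() { return sessionHandle; }
    public String getCurrentDatabase() { return currentDatabase; }
    public List<String> getAddedJars() { return addedJars; }
}

// Hypothetical datastore abstraction; any store readable by all nodes works.
interface SessionStore {
    void save(PersistedSessionState state);
    PersistedSessionState load(String sessionHandle); // null if unknown
}
{code}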
Restoring a Session:
Let's consider the following scenario:
1) Assume there are three nodes in the Lens cluster (L1, L2, L3). These nodes
are behind a load balancer (LB) and the load-balancing policy is round robin.
2) When a user sends a create-session request to the LB, the LB will send the
request to L1. L1 will create a session, persist it in the datastore, and
return the session handle to the caller.
3) The user sends a submit-query request to the LB, and the LB sends the
request to L2. L2 will check that the session is valid before it can submit
the query. However, since L2 does not have the session info locally, it cannot
process the query as-is. Hence, L2 must rebuild the session information on
itself from the datastore and then submit the query, as sketched below. Based
on the load-balancing policy, every node in the cluster will eventually build
up all session information.
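A minimal sketch of this restore-on-miss flow, assuming the hypothetical
PersistedSessionState and SessionStore types from the sketch above:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: resolve a session from the local cache, falling
// back to the shared datastore when this node has never seen the handle.
public class SessionResolver {
    private final Map<String, PersistedSessionState> localSessions =
        new ConcurrentHashMap<>();
    private final SessionStore store; // shared datastore, as sketched earlier

    public SessionResolver(SessionStore store) {
        this.store = store;
    }

    /** Returns the session, rebuilding it from the datastore on a local miss. */
    public PersistedSessionState resolve(String sessionHandle) {
        return localSessions.computeIfAbsent(sessionHandle, handle -> {
            PersistedSessionState state = store.load(handle);
            if (state == null) {
                throw new IllegalArgumentException("Invalid session handle: " + handle);
            }
            // Re-apply session state locally (switch database, add jars) here.
            return state;
        });
    }
}
{code}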
2) Query Lifecycle Management
Once a query is issued to the Lens server, it moves through various states:
Queued -> Launched -> Running -> Failed / Successful -> Completed
A client can check the status of a query using its query handle. Since this
request can be routed to any server in the cluster, the query status also
needs to be persisted in a datastore.
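For illustration, the record below captures the state machine above together
with the node that owns the query; this is the kind of row each server would
persist to and read from the shared datastore. Names are illustrative, not the
actual Lens classes.
{code:java}
// Illustrative sketch only: persisted status of a query, readable by any node.
public class QueryStatusRecord {
    enum State { QUEUED, LAUNCHED, RUNNING, FAILED, SUCCESSFUL, COMPLETED }

    private final String queryHandle;   // handle returned to the client at submit
    private volatile String owningNode; // node responsible for status updates
    private volatile State state;

    public QueryStatusRecord(String queryHandle, String owningNode) {
        this.queryHandle = queryHandle;
        this.owningNode = owningNode;
        this.state = State.QUEUED;
    }

    public String getQueryHandle() { return queryHandle; }
    public String getOwningNode() { return owningNode; }
    public void setOwningNode(String node) { this.owningNode = node; }
    public State getState() { return state; }
    public void setState(State state) { this.state = state; }
}
{code}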
Updating the status of a query in the case of node failure: When the status
of a query is requested, any node in the Lens cluster can receive the request.
If the node which initially issued the query is down for some reason, another
node needs to take over updating the status of the query. This can be achieved
by maintaining either a heartbeat or a health check for each node in the
cluster. For example:
1) L1 submitted the query and updated the datastore.
2) The L1 server went down.
3) A status request for the query comes to L2. L2 checks whether L1 is up via
a health-check URL. If the response is 200 OK, it routes the request to L1;
otherwise, it updates the datastore (changing the owning node from L1 to L2)
and takes over, as sketched below.
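A minimal sketch of this takeover check, assuming the hypothetical
QueryStatusRecord from above and a /healthcheck endpoint on each node (the
endpoint path and class names are assumptions for illustration):
{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch only: before answering a status request for a query
// owned by another node, probe that node's health-check URL and take over
// ownership if it is down.
public class QueryTakeover {
    private final String localNode;

    public QueryTakeover(String localNode) {
        this.localNode = localNode;
    }

    /** True if the owning node answers 200 OK on its health-check endpoint. */
    boolean isAlive(String node) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL("http://" + node + "/healthcheck").openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            return conn.getResponseCode() == 200;
        } catch (IOException e) {
            return false; // an unreachable node is treated as down
        }
    }

    /** Returns the node that should serve this query, taking over if the owner is down. */
    String resolveOwner(QueryStatusRecord record) {
        if (isAlive(record.getOwningNode())) {
            return record.getOwningNode(); // route the request to the original owner
        }
        record.setOwningNode(localNode);   // persist the change: this node takes over
        return localNode;
    }
}
{code}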
In the case of a Lens cluster restart, there may be a high load on the Hive
server if every server in the cluster attempts to check the status of all Hive
query handles. Hence, this would require some sort of stickiness to be
maintained in the query handle such that, upon a restart, each server attempts
to restore only those queries which originated from itself.
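One simple way to get this stickiness, sketched below, is to embed the
originating node's id in the query handle itself. The handle format shown is
an assumption for illustration, not the current Lens handle format.
{code:java}
import java.util.UUID;

// Illustrative sketch only: node-sticky query handles.
public final class StickyQueryHandle {
    private StickyQueryHandle() {}

    /** e.g. "L1:550e8400-e29b-41d4-a716-446655440000" */
    static String create(String nodeId) {
        return nodeId + ":" + UUID.randomUUID();
    }

    /** True if the given node originally issued this handle. */
    static boolean originatedOn(String handle, String nodeId) {
        return handle.startsWith(nodeId + ":");
    }
}
{code}
On restart, each server would scan the persisted handles and resume Hive
status checks only for those where originatedOn(handle, localNode) is true.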
3) Result Set Management
Lens supports two consumption modes for a query: in-memory and persistent. A
user can ask for a query to persist its results in an HDFS location and can
retrieve them from a different node. However, this does not apply to in-memory
result sets (particularly for JDBC drivers). In the case of an in-memory
result set, a request for retrieving the result set must go to the same node
which executed the query. Hence, query-handle stickiness is unavoidable.
Let's take an example to illustrate this:
1) Again, assume there are three nodes in the Lens cluster (L1, L2, L3). These
nodes are behind a load balancer (LB) and the load-balancing policy is round
robin.
2) The user created a session and submitted a query with an in-memory result
set to the LB. The LB routed the request to L1 in the cluster. L1 submitted
the query, persisted the state in a datastore, and returned the query handle
to the user.
3) The user sent a request to the LB asking for the status of the query. The
LB routed it to L2. L2 checked the datastore and reported the query as
completed.
4) The user asks the LB for the result set of the query. The LB now sends the
request to L3. However, L3 does not have the result set, so it cannot serve
the request.
However, if we persist the node information along with the query handle in the
datastore (as in step 2), then when the user asks for the result set, L3 can
redirect the request to L1, the node which executed the query and holds the
result set in memory, as sketched below.
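A rough sketch of that redirect, reusing the hypothetical QueryStatusRecord
from above. The JAX-RS Response usage is standard, but the endpoint layout is
an assumption for illustration.
{code:java}
import java.net.URI;
import javax.ws.rs.core.Response;

// Illustrative sketch only: if this node did not execute the query, redirect
// the result-set request to the node that did (in-memory result sets live
// only on the executing node).
public class ResultSetRouter {
    private final String localNode;

    public ResultSetRouter(String localNode) {
        this.localNode = localNode;
    }

    Response fetchResultSet(QueryStatusRecord record) {
        if (!localNode.equals(record.getOwningNode())) {
            URI target = URI.create("http://" + record.getOwningNode()
                + "/queries/" + record.getQueryHandle() + "/resultset"); // assumed path
            return Response.temporaryRedirect(target).build();
        }
        // ... serve the in-memory result set held on this node ...
        return Response.ok().build();
    }
}
{code}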