Bala Nathan created LENS-658:
--------------------------------

             Summary: Distributed Lens Server Deployment
                 Key: LENS-658
                 URL: https://issues.apache.org/jira/browse/LENS-658
             Project: Apache Lens
          Issue Type: New Feature
          Components: server
            Reporter: Bala Nathan


Currently Lens can be deployed and can function only on a single node. This JIRA 
tracks the approach for making Lens work in a clustered environment. The following 
key aspects of a clustered deployment are discussed:

1) Session Management

Creating and Persisting a Session:

A Lens session needs to be created before any queries can be submitted to the 
Lens server. A Lens session is associated with a unique session handle, the 
current database in use, and any jars that have been added; the handle must be 
passed when making queries or performing metadata operations in the same 
session. The session service is started as part of the Lens server. Today the 
session object is persisted periodically on a local filesystem (set via the 
lens.server.persist.location conf) for the purposes of recovery, i.e. the Lens 
server will read from this location when it is restarted and recovery is 
enabled. In a multi-node deployment, this destination needs to be available to 
all nodes. Hence the session information can instead be persisted in a 
datastore that any node in the cluster can read from, so that any node can 
reconstruct the session and execute queries.
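
As a minimal sketch, assuming a JDBC-accessible datastore shared by all nodes 
(the table, columns and serialized-blob layout below are hypothetical, not the 
existing Lens persistence format), session state could be persisted and read 
back keyed by its session handle:

{code:java}
// Hypothetical sketch: datastore-backed session persistence shared by all nodes.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SessionStore {
  private final String jdbcUrl;

  public SessionStore(String jdbcUrl) {
    this.jdbcUrl = jdbcUrl;
  }

  /** Serialize the session state and upsert it keyed by the session handle. */
  public void persist(String sessionHandle, Serializable sessionState) throws Exception {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(sessionState);
    }
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         PreparedStatement stmt = conn.prepareStatement(
             // MySQL-style REPLACE used only for brevity in this sketch
             "REPLACE INTO lens_sessions (handle, state) VALUES (?, ?)")) {
      stmt.setString(1, sessionHandle);
      stmt.setBytes(2, bytes.toByteArray());
      stmt.executeUpdate();
    }
  }

  /** Read the session back so any node in the cluster can reconstruct it. */
  public Serializable load(String sessionHandle) throws Exception {
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         PreparedStatement stmt = conn.prepareStatement(
             "SELECT state FROM lens_sessions WHERE handle = ?")) {
      stmt.setString(1, sessionHandle);
      try (ResultSet rs = stmt.executeQuery()) {
        if (!rs.next()) {
          return null;
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(rs.getBytes(1)))) {
          return (Serializable) in.readObject();
        }
      }
    }
  }
}
{code}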

Restoring a Session:

Let's consider the following scenario:

1) Let's assume there are three nodes in the Lens cluster (L1, L2, L3). These 
nodes are behind a load balancer (LB) and the load balancing policy is 
round robin.

2) When a user sends a create-session request to the LB, the LB will send the 
request to L1. L1 will create a session, persist it in the datastore and return 
the session handle to the caller.

3) The user sends a submit-query request to the LB, and the LB sends it to L2. 
L2 will check whether the session is valid before it can submit the query. 
However, since L2 does not have the session info, it cannot process the query 
as-is. Hence L2 must rebuild the session information on itself from the 
datastore and then submit the query (see the sketch after this list). Based on 
the load balancing policy, every node in the cluster will eventually build all 
session information.
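
A hedged sketch of step 3, building on the hypothetical SessionStore above: a 
node that receives a request for a session handle it has never seen rebuilds 
the session from the shared datastore before validating it and submitting the 
query.

{code:java}
// Hypothetical sketch: lazily rebuilding a session created on another node.
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionRestorer {
  private final Map<String, Serializable> localSessions = new ConcurrentHashMap<>();
  private final SessionStore store;

  public SessionRestorer(SessionStore store) {
    this.store = store;
  }

  /** Return the locally cached session, rebuilding it from the datastore on a miss. */
  public Serializable getOrRestore(String sessionHandle) throws Exception {
    Serializable session = localSessions.get(sessionHandle);
    if (session == null) {
      // This node did not create the session (e.g. L2 serving a session created
      // on L1), so reconstruct it from the shared datastore.
      session = store.load(sessionHandle);
      if (session == null) {
        throw new IllegalArgumentException("Unknown session handle: " + sessionHandle);
      }
      localSessions.put(sessionHandle, session);
    }
    return session;
  }
}
{code}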


2) Query Lifecycle Management

Once a query is issued to the lens server, it moves through various states:
                                                                        
Queued -> Launched -> Running -> Failed / Successful -> Completed 

A client can check the status of a query based on a query handle. Since this 
request can be routed to any server in the cluster, the query status also needs 
to be persisted in a datastore. 
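
One possible shape for this, again assuming a shared JDBC datastore (table and 
column names are illustrative only): persist the query status together with the 
owning node, keyed by the query handle, so any node can answer a status 
request.

{code:java}
// Hypothetical sketch: query status and owning node keyed by query handle.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class QueryStatusStore {
  private final String jdbcUrl;

  public QueryStatusStore(String jdbcUrl) {
    this.jdbcUrl = jdbcUrl;
  }

  /** Called by the owning node whenever the query's state changes. */
  public void updateStatus(String handle, String status, String node) throws Exception {
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         PreparedStatement stmt = conn.prepareStatement(
             "REPLACE INTO lens_queries (handle, status, node) VALUES (?, ?, ?)")) {
      stmt.setString(1, handle);
      stmt.setString(2, status);
      stmt.setString(3, node);
      stmt.executeUpdate();
    }
  }

  /** Reassign ownership of a query, e.g. when the original node is down. */
  public void updateOwner(String handle, String node) throws Exception {
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         PreparedStatement stmt = conn.prepareStatement(
             "UPDATE lens_queries SET node = ? WHERE handle = ?")) {
      stmt.setString(1, node);
      stmt.setString(2, handle);
      stmt.executeUpdate();
    }
  }

  /** Any node can read the last persisted status for a handle. */
  public String getStatus(String handle) throws Exception {
    return readColumn(handle, "status");
  }

  /** Look up which node currently owns the query. */
  public String getOwner(String handle) throws Exception {
    return readColumn(handle, "node");
  }

  private String readColumn(String handle, String column) throws Exception {
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         PreparedStatement stmt = conn.prepareStatement(
             "SELECT " + column + " FROM lens_queries WHERE handle = ?")) {
      stmt.setString(1, handle);
      try (ResultSet rs = stmt.executeQuery()) {
        return rs.next() ? rs.getString(1) : null;
      }
    }
  }
}
{code}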

Updating the status of a query in the case of node failure: when the status of 
a query is requested, any node in the Lens cluster can receive the request. If 
the node which originally issued the query is down for some reason, another 
node needs to take over updating the status of the query. This can be achieved 
by maintaining either a heartbeat or a health check for each node in the 
cluster. For example:

1) L1 submitted the query and updated the datastore

2) L1 server went down

3) A request for the query comes to L2. L2 checks whether L1 is up via a 
healthcheck URL. If the response is 200 OK, it routes the request to L1; 
otherwise it updates the datastore (changing the owner from L1 to L2) and takes 
over (see the sketch after this list).
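
A rough sketch of the take-over check in step 3, reusing the hypothetical 
QueryStatusStore above; the healthcheck path and port are assumptions for 
illustration.

{code:java}
// Hypothetical sketch: probe the owning node's health before routing or taking over.
import java.net.HttpURLConnection;
import java.net.URL;

public class QueryOwnershipManager {
  private final QueryStatusStore store;  // shared datastore from the sketch above
  private final String localNode;        // e.g. "L2"

  public QueryOwnershipManager(QueryStatusStore store, String localNode) {
    this.store = store;
    this.localNode = localNode;
  }

  /** Returns true if the request should be routed to owningNode, false if this node took over. */
  public boolean routeOrTakeOver(String queryHandle, String owningNode) throws Exception {
    if (isAlive(owningNode)) {
      return true;  // the owner is healthy, so route the request to it
    }
    // The owner appears to be down: record this node as the new owner and
    // take over responsibility for tracking the query's status.
    store.updateOwner(queryHandle, localNode);
    return false;
  }

  private boolean isAlive(String node) {
    try {
      // Port 9999 and the /healthcheck path are assumed here for illustration.
      HttpURLConnection conn =
          (HttpURLConnection) new URL("http://" + node + ":9999/healthcheck").openConnection();
      conn.setConnectTimeout(2000);
      conn.setReadTimeout(2000);
      return conn.getResponseCode() == 200;
    } catch (Exception e) {
      return false;
    }
  }
}
{code}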

In the case of a Lens cluster restart, there may be a high load on the Hive 
server if every server in the cluster attempts to check the status of all Hive 
query handles. Hence, this would require some form of stickiness to be 
maintained in the query handle so that, upon a restart, each server attempts to 
restore only those queries which originated from itself.
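
One way to get this stickiness, purely as an illustration, is to embed the 
originating node's id in the query handle so that on restart each server 
restores only its own queries; the handle format below is an assumption.

{code:java}
// Hypothetical sketch: encode the submitting node in the query handle.
import java.util.UUID;

public class StickyQueryHandle {
  /** Generate a handle that records which node submitted the query. */
  public static String newHandle(String localNode) {
    return localNode + "-" + UUID.randomUUID();
  }

  /** On restart, a server restores only handles carrying its own node id. */
  public static boolean ownedBy(String queryHandle, String localNode) {
    return queryHandle.startsWith(localNode + "-");
  }
}
{code}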

3) Result Set Management 

Lens supports two consumption modes for a query: in-memory and persistent. A 
user can ask for a query to persist its results in an HDFS location and can 
retrieve them from a different node. However, this does not apply to in-memory 
result sets (particularly for JDBC drivers). In the case of an in-memory result 
set, the request for retrieving the result set must go to the same node which 
executed the query. Hence, query handle stickiness is unavoidable.

Let's take an example to illustrate this:

1) Again, let's assume there are three nodes in the Lens cluster (L1, L2, L3). 
These nodes are behind a load balancer (LB) and the load balancing policy is 
round robin.

2) The user created a session and submitted a query with an in-memory result 
set to the LB. The LB routed the request to L1 in the cluster. L1 submitted the 
query, persisted the state in a datastore and returned the query handle to the 
user.

3) The user sent a request to the LB asking for the status of the query. The LB 
routed it to L2. L2 checked the datastore and reported that the query was 
completed.

4) The user asks the LB for the result set of the query. The LB now sends the 
request to L3. However, L3 does not have the result set, and hence it cannot 
serve the request.

However, if we persist the node information along with the query handle in the 
datastore (step 2), then when the user asks for the result set, L3 can redirect 
the request to L1, which executed the query and holds the in-memory result set 
(as sketched below).
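
A possible sketch of that redirect, reusing the hypothetical QueryStatusStore 
from earlier; the result-set path, port and the JAX-RS redirect are 
illustrative assumptions.

{code:java}
// Hypothetical sketch: redirect result-set requests to the node holding the data.
import java.net.URI;
import javax.ws.rs.core.Response;

public class ResultSetRouter {
  private final QueryStatusStore store;  // shared datastore sketched earlier
  private final String localNode;

  public ResultSetRouter(QueryStatusStore store, String localNode) {
    this.store = store;
    this.localNode = localNode;
  }

  /** Serve the result set locally if this node owns the query, otherwise redirect. */
  public Response fetchResultSet(String queryHandle) throws Exception {
    String owningNode = store.getOwner(queryHandle);
    if (localNode.equals(owningNode)) {
      return Response.ok(loadLocalResultSet(queryHandle)).build();
    }
    // In-memory result sets live only on the executing node, so send the
    // client there with an HTTP redirect.
    URI target = URI.create("http://" + owningNode + ":9999/resultset/" + queryHandle);
    return Response.temporaryRedirect(target).build();
  }

  private Object loadLocalResultSet(String queryHandle) {
    // Placeholder for reading the in-memory result set held on this node.
    return new Object();
  }
}
{code}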


