[ https://issues.apache.org/jira/browse/LENS-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624790#comment-14624790 ]

Himanshu Gahlaut commented on LENS-658:
---------------------------------------

Could the session information be maintained on the client and passed in each 
request? This would make the Lens APIs truly RESTful by implementing the ST 
(State Transfer) part of REST: session state is transferred with each request 
instead of being stored on the server. It would also distribute the cost of 
storing session state across the clients connected to the server, which 
scales naturally since that cost is directly proportional to the number of 
connected clients. It would also make the implementation simpler.

https://en.wikipedia.org/wiki/Representational_state_transfer#Stateless
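
A minimal sketch of the idea, assuming a hypothetical session-state header
and a SessionState codec (neither exists in Lens today): the client
serializes its session state and sends it with every call, and any server
node rebuilds the session from the token instead of consulting server-side
storage. In practice such a token would have to be signed or encrypted so
that clients cannot tamper with it.

    // Hypothetical sketch: session state travels with each request instead
    // of living on the server. Names (SessionState, the header) are
    // illustrative, not part of the Lens API.
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class StatelessSessionExample {

        // Minimal stand-in for the state a Lens session carries today.
        static class SessionState {
            final String user;
            final String currentDatabase;
            final String addedJars;   // comma-separated, for illustration

            SessionState(String user, String currentDatabase, String addedJars) {
                this.user = user;
                this.currentDatabase = currentDatabase;
                this.addedJars = addedJars;
            }

            // Client side: encode the state into a header-safe token.
            String toToken() {
                String raw = user + "|" + currentDatabase + "|" + addedJars;
                return Base64.getUrlEncoder()
                             .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
            }

            // Server side: rebuild the session from the token on every
            // request, so any node can serve it without shared storage.
            static SessionState fromToken(String token) {
                String raw = new String(Base64.getUrlDecoder().decode(token),
                                        StandardCharsets.UTF_8);
                String[] parts = raw.split("\\|", 3);
                return new SessionState(parts[0], parts[1], parts[2]);
            }
        }

        public static void main(String[] args) {
            SessionState client = new SessionState("user1", "default", "/tmp/udfs.jar");
            String header = client.toToken();             // sent on every request
            SessionState onServer = SessionState.fromToken(header);
            System.out.println(onServer.currentDatabase); // default
        }
    }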


> Distributed Lens Server Deployment
> ----------------------------------
>
>                 Key: LENS-658
>                 URL: https://issues.apache.org/jira/browse/LENS-658
>             Project: Apache Lens
>          Issue Type: New Feature
>          Components: server
>            Reporter: Bala Nathan
>
> Currently lens can be deployed and function only on a single node. This JIRA 
> tracks the approach to make lens work in a clustered environment. The 
> following key aspects of clustered deployment are discussed below:
> 1) Session Management
> Creating and Persisting a Session:
> A lens session needs to be created before any queries can be submitted to the 
> Lens server. A Lens session is associated with a unique session handle, the 
> current database being used, and any jars that have been added, all of which 
> must be available when making queries or doing metadata operations in the 
> same session. The session service is started as part of the lens server. 
> Today the session object is persisted on the local filesystem periodically 
> (at the location set via the lens.server.persist.location conf) for recovery 
> purposes, i.e. the Lens Server will read from that location when it is 
> restarted and recovery is enabled. In a multi-node deployment, this 
> destination needs to be available to all nodes. Hence the session 
> information should be persisted in a datastore, rather than a local 
> filesystem, that allows reads from any node, so that any node in the 
> cluster can re-construct the session and execute queries (see the sketch 
> below). 
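> A minimal sketch of what such a shared session store could look like (the
> SessionStore interface and the in-memory stand-in below are hypothetical,
> not existing Lens classes; a real deployment would back the store with a
> database or a distributed KV store reachable from every node):
>
>     // Hypothetical shared session store: any Lens node can persist or
>     // load a session by its handle.
>     import java.util.Map;
>     import java.util.Optional;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     interface SessionStore {
>         void save(String sessionHandle, byte[] serializedSession);
>         Optional<byte[]> load(String sessionHandle);
>     }
>
>     // In-memory stand-in for illustration only.
>     class InMemorySessionStore implements SessionStore {
>         private final Map<String, byte[]> store = new ConcurrentHashMap<>();
>
>         public void save(String sessionHandle, byte[] serializedSession) {
>             store.put(sessionHandle, serializedSession);
>         }
>
>         public Optional<byte[]> load(String sessionHandle) {
>             return Optional.ofNullable(store.get(sessionHandle));
>         }
>     }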
> Restoring a Session:
> Let's consider the following scenario:
> 1) Assume there are three nodes in the lens cluster (L1, L2, L3). These 
> nodes are behind a load balancer (LB) and the load balancing policy is 
> round robin.
> 2) When a user sends a create-session request to LB, LB will send the 
> request to L1. L1 will create a session, persist it in the datastore and 
> return the session handle to the caller.
> 3) The user submits a query to LB, and LB sends the request to L2. L2 must 
> check that the session is valid before it can submit the query. However, 
> since L2 does not have the session info locally, it cannot process the 
> query as-is. Hence L2 must rebuild the session information from the 
> datastore and then submit the query (see the sketch after this list). 
> Based on the load balancing policy, every node in the cluster will 
> eventually build up all session information.
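> A sketch of this restore-on-miss behaviour, reusing the hypothetical
> SessionStore above (rebuilding a full Lens session is reduced to a
> stand-in deserialize method):
>
>     // Hypothetical restore-on-miss: a node serves from its local cache
>     // and falls back to the shared datastore when another node created
>     // the session.
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     class SessionResolver {
>         private final Map<String, Object> localSessions = new ConcurrentHashMap<>();
>         private final SessionStore sharedStore;   // hypothetical, above
>
>         SessionResolver(SessionStore sharedStore) {
>             this.sharedStore = sharedStore;
>         }
>
>         Object resolve(String sessionHandle) {
>             return localSessions.computeIfAbsent(sessionHandle, handle ->
>                 sharedStore.load(handle)
>                            .map(SessionResolver::deserialize)
>                            .orElseThrow(() -> new IllegalArgumentException(
>                                "Unknown session: " + handle)));
>         }
>
>         // Stand-in for rebuilding a full Lens session object from bytes.
>         private static Object deserialize(byte[] bytes) {
>             return new String(bytes);
>         }
>     }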
> 2) Query Lifecycle Management
> Once a query is issued to the lens server, it moves through various states:
> Queued -> Launched -> Running -> Failed / Successful -> Completed 
> A client can check the status of a query based on a query handle. Since this 
> request can be routed to any server in the cluster, the query status also 
> needs to be persisted in a datastore. 
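> A minimal sketch of a status store keyed by query handle, so that any node
> can answer a status request (the QueryStatusStore class and state enum are
> illustrative; Lens's real status model carries more detail):
>
>     // Hypothetical query-status store: the node that launched the query
>     // writes status transitions, and any node can read them.
>     import java.util.Map;
>     import java.util.Optional;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     enum QueryState { QUEUED, LAUNCHED, RUNNING, FAILED, SUCCESSFUL, COMPLETED }
>
>     class QueryStatusStore {
>         // In-memory stand-in; a clustered deployment would use a shared
>         // database instead.
>         private final Map<String, QueryState> statuses = new ConcurrentHashMap<>();
>
>         void update(String queryHandle, QueryState state) {
>             statuses.put(queryHandle, state);
>         }
>
>         Optional<QueryState> status(String queryHandle) {
>             return Optional.ofNullable(statuses.get(queryHandle));
>         }
>     }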
> Updating the status of a query in the case of node failure: when the 
> status of a query is requested, any node in the lens cluster can receive 
> the request. If the node which originally issued the query is down for 
> some reason, another node needs to take over updating the status of the 
> query. This can be achieved by maintaining either a heartbeat or a health 
> check for each node in the cluster. For example:
> 1) L1 submitted the query and updated the datastore
> 2) L1 server went down
> 3) A request for the query comes to L2. L2 checks whether L1 is up via a 
> healthcheck URL. If the response is 200 OK, it routes the request to L1; 
> else it updates the datastore (changing the owner from L1 to L2) and takes 
> over (see the sketch below)
> In the case of a full Lens cluster restart, there may be a high load on 
> the hive server if every server in the cluster attempts to check the 
> status of all hive query handles. Hence, this would require some sort of 
> stickiness to be maintained in the query handle, such that upon a restart 
> each server attempts to restore only those queries that originated from 
> it. 
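> A sketch of the takeover check, assuming each node exposes a plain HTTP
> healthcheck endpoint (the /healthcheck path and the in-memory owner map
> are assumptions; a real implementation would keep ownership in the shared
> datastore and update it atomically, e.g. with a conditional write, so that
> two nodes cannot both take over the same query):
>
>     // Hypothetical takeover: before answering for a query owned by
>     // another node, probe that node's healthcheck URL and take over if
>     // it is down.
>     import java.net.HttpURLConnection;
>     import java.net.URL;
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     class QueryOwnershipManager {
>         // query handle -> base URL of the node responsible for it.
>         private final Map<String, String> owners = new ConcurrentHashMap<>();
>         private final String myUrl;
>
>         QueryOwnershipManager(String myUrl) { this.myUrl = myUrl; }
>
>         // True if the owning node answers 200 OK on its healthcheck URL.
>         static boolean isNodeAlive(String nodeBaseUrl) {
>             try {
>                 HttpURLConnection conn = (HttpURLConnection)
>                     new URL(nodeBaseUrl + "/healthcheck").openConnection();
>                 conn.setConnectTimeout(2000);
>                 conn.setReadTimeout(2000);
>                 return conn.getResponseCode() == 200;
>             } catch (Exception e) {
>                 return false;   // an unreachable node counts as down
>             }
>         }
>
>         // Returns the node that should serve this query, taking over
>         // when the recorded owner is down.
>         String resolveOwner(String queryHandle) {
>             String owner = owners.getOrDefault(queryHandle, myUrl);
>             if (owner.equals(myUrl) || isNodeAlive(owner)) {
>                 return owner;
>             }
>             owners.put(queryHandle, myUrl);
>             return myUrl;
>         }
>     }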
> 3) Result Set Management 
> Lens supports two consumption modes for a query: In-Memory and Persistent. 
> A user can ask for a query to persist results in an HDFS location and can 
> retrieve them from a different node. However, this does not apply to 
> in-memory result sets (particularly for JDBC drivers). In the case of an 
> in-memory resultset, a request for retrieving the resultset must go to the 
> same node which executed the query. Hence, some query handle stickiness is 
> unavoidable. 
> Let's take an example to illustrate this:
> 1) Again, let's assume there are three nodes in the lens cluster (L1, L2, 
> L3). These nodes are behind a load balancer (LB) and the load balancing 
> policy is round robin.
> 2) The user created a session and submitted a query with an in-memory 
> resultset to LB. LB routed the request to L1 in the cluster. L1 submitted 
> the query, persisted the state in a datastore and returned the query 
> handle to the user.
> 3) The user sent a request to LB asking for the status of the query. LB 
> routed it to L2. L2 checked the datastore and reported the query as 
> completed.
> 4) The user asks LB for the resultset of the query. LB now sends the 
> request to L3. However, L3 does not have the resultset, so it cannot serve 
> the request. 
> However, if we persist the node information along with the query handle in 
> the datastore (step 2 above), then when the user asks for the resultset, 
> L3 can redirect the request to L1, the node that executed the query. 
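> A sketch of that redirect decision, assuming the executing node's URL is
> stored next to the query handle (the ResultSetLocator class and the
> resource path in the redirect are illustrative, not the actual Lens REST
> layout):
>
>     // Hypothetical redirect for in-memory resultsets: the datastore
>     // records which node executed each query, and other nodes redirect
>     // the client to it.
>     import java.net.URI;
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>     import javax.ws.rs.core.Response;
>
>     class ResultSetLocator {
>         // query handle -> base URL of the node holding the in-memory rows.
>         private final Map<String, String> executingNode = new ConcurrentHashMap<>();
>         private final String myUrl;
>
>         ResultSetLocator(String myUrl) { this.myUrl = myUrl; }
>
>         void recordExecution(String queryHandle, String nodeUrl) {
>             executingNode.put(queryHandle, nodeUrl);
>         }
>
>         // Serve locally when this node executed the query; otherwise
>         // send the client to the node that did.
>         Response fetchResultSet(String queryHandle) {
>             String owner = executingNode.get(queryHandle);
>             if (owner == null) {
>                 return Response.status(Response.Status.NOT_FOUND).build();
>             }
>             if (owner.equals(myUrl)) {
>                 return Response.ok("...in-memory rows...").build();  // illustrative
>             }
>             return Response.seeOther(URI.create(
>                 owner + "/queries/" + queryHandle + "/resultset")).build();
>         }
>     }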


