[ 
https://issues.apache.org/jira/browse/HIVE-25237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25237:
--------------------------------
    Summary: Thrift CLI Service Protocol: Enhance HTTP variant to be more 
resilient  (was: Thrift CLI Service Protocol: Enhance HTTP variant)

> Thrift CLI Service Protocol: Enhance HTTP variant to be more resilient
> ----------------------------------------------------------------------
>
>                 Key: HIVE-25237
>                 URL: https://issues.apache.org/jira/browse/HIVE-25237
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Major
>
> I have been thinking about the (Thrift) CLI Service protocol between the 
> client and server.
> Cloudera's Prashanth Jayachandran (private e-mail) told me that its original 
> BINARY (TCP/IP) transport is designed +_differently_+ from the newer HTTP 
> transport. HTTP is used when we go through a Gateway. The design for HTTP is 
> stateless and different in nature from the direct BINARY TCP/IP connection. 
> This means that today a Hive Server 2 response to an HTTP query request can 
> be lost, and that is part of the design. It is the WARNING we have seen when 
> the Gateway drops its HTTP connection to Hive Server 2. We had been thinking 
> this was a bug, but it is by design.
> I think the HTTP design needs a rethink.
> When I worked for Tandem Computers a long time ago, messages were 
> fault-tolerant. They used a message sequence number. When you send a message 
> to a Tandem server, it goes to a process pair. The message gets routed to the 
> current process, called the primary. The primary performs the work for the 
> message and, before replying, tells the backup process to remember the result 
> in case there is a failure. You can see where this goes: if there is a 
> failure before the client gets the result, the client retries and the backup 
> process can resiliently give back the result the primary sent it. This isn't 
> unique to Tandem; even without a process pair, this is a general resilient 
> protocol.
> The HTTP design says a message can be lost in either direction (request or 
> response). I think we should adopt a better scheme, though not necessarily a 
> process pair.
> The first principle of the rethink is that the +_client_+ needs to generate 
> a new operation number (an integer) that replaces the random GUID generated 
> on the server side. The client also generates a new message number within 
> each new operation. So beeline might say ExecuteStatement operationNum = 57 
> NEW, operationMsgNum = 1. If the client gets an OS connection kind of error, 
> it retries with the same (57, 1) numbers. Hive Server 2 will remember the 
> last response. When Hive Server 2 gets a message, there are 3 cases:
> 1) The sessionId GUID is not valid -- for now we reject the request because 
> it is likely Hive Server 2 killed the session perhaps because it was 
> restarted.
> 2) The operationNum or operationMsgNum is new. (Assert the msg num increases 
> monotonically.) Perform the request, save the response, and respond.
> 3) The (operationNum, operationMsgNum) matches the last request. Resiliently 
> respond with the saved result.
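The three cases above can be sketched as a small server-side handler. This is a minimal Python sketch under stated assumptions, not Hive Server 2 code: `ResilientHandler`, `SessionState`, `InvalidSessionError`, and the `execute` callback are hypothetical names, and comparing requests by the (operationNum, operationMsgNum) pair is only one possible reading of "increases monotonically".

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


class InvalidSessionError(Exception):
    """Case 1: the session GUID is not known to the server."""


@dataclass
class SessionState:
    # (operationNum, operationMsgNum) of the last request handled, if any.
    last_key: Optional[Tuple[int, int]] = None
    # Response saved for last_key so that a retry can be answered from memory.
    last_response: Any = None


class ResilientHandler:
    """Hypothetical handler that remembers one response per session."""

    def __init__(self, execute: Callable[[str], Any]) -> None:
        self._execute = execute  # does the real work for a request payload
        self._sessions: Dict[str, SessionState] = {}

    def open_session(self, session_id: str) -> None:
        self._sessions[session_id] = SessionState()

    def handle(self, session_id: str, operation_num: int,
               operation_msg_num: int, payload: str) -> Any:
        state = self._sessions.get(session_id)
        if state is None:
            # Case 1: unknown session (for example after a server restart).
            raise InvalidSessionError(session_id)

        key = (operation_num, operation_msg_num)
        if key == state.last_key:
            # Case 3: a retry of the last request; replay the saved response
            # without executing the payload again.
            return state.last_response

        # Assumption: both numbers only grow, so an older pair is an error.
        if state.last_key is not None and key < state.last_key:
            raise ValueError(f"stale request {key}, last was {state.last_key}")

        # Case 2: a new request; perform it, save the response, and respond.
        response = self._execute(payload)
        state.last_key = key
        state.last_response = response
        return response
```

Only the last response per session is kept here, which matches "Hive Server 2 will remember the last response"; a server that allows several requests in flight per session would need a different retention rule.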
> I think this message handling aligns with the HTTP philosophy of being 
> stateless and allowing any message in between to be lost. And it will shield 
> the client from a whole category of message failures that unnecessarily kill 
> queries.
> This also lets us stop worrying about which requests are idempotent; 
> instead, all requests are resilient.
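On the client side, the retry rule described in this proposal might look like the following. This is a hedged Python sketch, not the JDBC HttpClient: `send_with_retry` and the `send` callable are hypothetical, and treating Python's `ConnectionError` as the "OS connection kind of error" is an assumption made for illustration.

```python
from typing import Any, Callable, Optional


def send_with_retry(send: Callable[[int, int, str], Any],
                    operation_num: int,
                    operation_msg_num: int,
                    payload: str,
                    max_attempts: int = 3) -> Any:
    """Resend the same (operationNum, operationMsgNum) on connection errors.

    The numbers are chosen once by the caller and never changed between
    attempts, so the server can recognise a retry and replay its saved
    response instead of executing the request a second time.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Optional[ConnectionError] = None
    for _ in range(max_attempts):
        try:
            return send(operation_num, operation_msg_num, payload)
        except ConnectionError as err:
            # Request or response may have been lost; the client cannot
            # tell which, so it retries with identical numbers.
            last_error = err
    assert last_error is not None
    raise last_error
```

Because the retry carries the same pair, the client does not need to know whether the statement is idempotent: if the first attempt already ran on the server and only the response was lost, the second attempt returns the saved result.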
> ---------------------
> Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for 
> idempotent and unsent http methods by prasanthj · Pull Request #1983 · 
> apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)