[ 
https://issues.apache.org/jira/browse/THRIFT-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775739#comment-16775739
 ] 

Mithun Radhakrishnan commented on THRIFT-4805:
----------------------------------------------

Hey, [~vihangk1]. I'm grateful for your continued interest and scrutiny of this 
patch.
{quote}I am a bit confused when you say ".. correct the behavior of 
THRIFT-3769".
{quote}
I apologize for my lack of clarity. I'd be happy to consider the title of this 
Jira, based on our discussion here.
{quote}Can you give a example stack trace which was not being printed before 
and gets logged after THRIFT-3769?
{quote}
 The stack-trace included in the description of this Jira is the primary 
example. When we switched from Thrift 0.9.3 to Thrift 0.12.0, we saw the 
exception listed above filling up our server logs. Looking at the snippet of 
code in the description, we see that THRIFT-3769 is not entirely to blame for 
this. For some reason, with Thrift 0.12.0, the health-checks cause 
{{TTransportExceptions}} that aren't wrapped by RTEs, so the current code 
didn't detect and handle them. They're caught and logged by the {{catch 
(TException)}} block.


bq. My understanding is that with this patch we are suppressing types of 
exceptions which is a super-set of the ones suppressed by THRIFT-3769. Is that 
not correct?
That would be a legitimate concern, but unless I'm mistaken, that's not what my 
proposed fix in [the pull 
request|https://github.com/apache/thrift/pull/1746/files] does. That is not my 
intent, either.

Let's briefly take a step back:
# THRIFT-2268 intended to suppress only those {{TExceptions}} that arise from 
{{TTransportException.END_OF_FILE}} in  
[TSaslTransport.java|https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/transport/TSaslTransport.java#L317].
 That's why {{TSaslTransportException}} was introduced: to represent the 
{{END_OF_FILE}} case.
# THRIFT-3769 removed the special handling for {{TSaslTransportException}}, and 
attempted to handle it transparently as a {{TTransportException}}, as follows:
{code:java|title=TThreadpoolServer.java}
      } catch (TException tx) {
        LOGGER.error("Thrift error occurred during processing of message.", tx);
      } catch (Exception x) {
        // We'll usually receive RuntimeException types here
        // Need to unwrap to ascertain real causing exception before we choose 
to ignore
        Throwable realCause = x.getCause();
        // Ignore err-logging all transport-level/type exceptions
        if ((realCause != null && realCause instanceof TTransportException)
            || (x instanceof TTransportException)) {...}
{code}
>From the {{x instanceof TTransportException}}, it appears that the author 
>*intended* here to suppress *all* {{TTransportExceptions}} (wrapped inside 
>other exceptions, or not), but erred. Notice how {{|| (x instanceof 
>TTransportException)}} will never be satisfied, because of the preceding 
>{{catch (TException)}} block, so it's a superfluous check. Given the error, I 
>can't be completely sure of the intention. :|

Regardless, I propose narrowing the suppression to {{TSaslTransportExceptions}} 
(i.e. the {{END_OF_FILE}} case). Testing it in our load-balancer environments, 
they appear *not* to be wrapped inside RTEs in Thrift 0.12, so my fix checks 
both the caught exception *and* the {{getCause()}}.

Please ignore the patches attached to this Jira. (They're already stale.) The 
proposed fix is here:
https://github.com/apache/thrift/pull/1746/files

Your review and recommendations will be greatly appreciated.


> Fix logic of THRIFT-3769
> ------------------------
>
>                 Key: THRIFT-4805
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4805
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.12.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>            Priority: Major
>         Attachments: THRIFT-4805.1.patch, THRIFT-4805.bonus.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This has to do with the fix checked in for THRIFT-3769 and THRIFT-2268, which 
> was to mute the noise from {{TSaslTransportExceptions}} raised from 
> load-balancer health-checks for Thrift services, such as the Hive metastore. 
> Please consider [the code in 
> TThreadPoolServer::run()|https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/server/TThreadPoolServer.java#L322]:
> {code:java|title=TThreadPoolServer.java}
>       } catch (TException tx) {
>         LOGGER.error("Thrift error occurred during processing of message.", 
> tx);
>       } catch (Exception x) {
>         // We'll usually receive RuntimeException types here
>         // Need to unwrap to ascertain real causing exception before we 
> choose to ignore
>         Throwable realCause = x.getCause();
>         // Ignore err-logging all transport-level/type exceptions
>         if ((realCause != null && realCause instanceof TTransportException)
>             || (x instanceof TTransportException)) {
>           LOGGER.debug(
>               "Received TTransportException during processing of message. 
> Ignoring.",
>               x);
>         } else {
>           // Log the exception at error level and continue
>           LOGGER.error("Error occurred during processing of message.", x);
>         }
>       } finally {...}
> {code}
> The logic here is solid for {{RuntimeExceptions}} that wrap 
> {{TTansportExceptions}}. But it slips up on handling the condition being 
> checked for at 
> [line#323|https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/server/TThreadPoolServer.java#L323],
>  i.e.:
> {code:java|title=TThreadPoolServer.java}
>             || (x instanceof TTransportException)) {
> {code}
> The {{catch (Exception x)}} comes *after* the {{catch (TException tx)}}, so 
> it's a guarantee that {{!(x instanceof TTransportException)}}. When a 
> {{TTransportException}} (or {{TSaslTransportException}}) is thrown, it will 
> be caught in the first catch block, and logged. This rather defeats the 
> purpose of the fix. The error manifests with the following stack-trace 
> filling up my logs:
> {noformat}
> org.apache.thrift.transport.TTransportException
>         at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>         at 
> org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
>         at 
> org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
>         at 
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
>         at 
> org.apache.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:43)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>         at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
>         at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
>         at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
>         at 
> org.apache.hive.service.cli.thrift.ThriftCLIMetricsProcessor.process(ThriftCLIMetricsProcessor.java:63)
>         at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:777)
>         at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:773)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
>         at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:773)
>         at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The fix is simple. I'll upload it shortly, for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to