[ https://issues.apache.org/jira/browse/HIVE-26633?focusedWorklogId=817721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-817721 ]
ASF GitHub Bot logged work on HIVE-26633:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Oct/22 15:54
            Start Date: 17/Oct/22 15:54
    Worklog Time Spent: 10m

Work Description: jfsii commented on code in PR #3674:
URL: https://github.com/apache/hive/pull/3674#discussion_r997240526


##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -2913,7 +2913,10 @@ public static enum ConfVars {
     HIVE_STATS_MAX_NUM_STATS("hive.stats.max.num.stats", (long) 10000,
         "When the number of stats to be updated is huge, this value is used to control the number of \n" +
         " stats to be sent to HMS for update."),
-
+    HIVE_THRIFT_MAX_MESSAGE_SIZE("hive.thrift.max.message.size", "1gb",

Review Comment:
   For point 1 - I looked into hive.server2.thrift.max.message.size; it has nothing to do with HTTP. I wouldn't trust the grouping comments in HiveConf: they have been edited by so many developers with different styles that they tend to be a bit messy, IMO.

   The "hive.server2.thrift.max.message.size" setting ends up getting used here:
   https://github.com/apache/hive/blob/d17bd5bf12e62563cc35016dbb4445c89333d9a3/service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java#L102
   which then gets passed to this constructor in Thrift:
   https://github.com/apache/thrift/blob/2a93df80f27739ccabb5b885cb12a8dc7595ecdf/lib/java/src/org/apache/thrift/protocol/TBinaryProtocol.java#L103

   So it ends up being used to define these two limits for the binary protocol:
   ```java
   /**
    * The maximum number of bytes to read from the transport for
    * variable-length fields (such as strings or binary) or {@link #NO_LENGTH_LIMIT} for
    * unlimited.
    */
   private final long stringLengthLimit_;

   /**
    * The maximum number of elements to read from the network for
    * containers (maps, sets, lists), or {@link #NO_LENGTH_LIMIT} for unlimited.
    */
   private final long containerLengthLimit_;
   ```

   I would argue "hive.server2.thrift.max.message.size" is itself a misnomer; however, we are stuck with it since it has been in the wild for so long. I am not sure it would be a good idea to use the same setting for three different Thrift limits. I am open to an alternative name for my setting, though, since "hive.thrift.max.message.size" is easy to confuse with the existing one. If you think my concerns about re-using one setting for stringLengthLimit, containerLengthLimit, and the overall bytes-received limit of a Thrift client are overblown, I could be swayed.

   For point 2 - I'd argue my way is more consistent. I am using a SizeValidator(), and from a quick check, most settings that use a size validator have a default that is a size with a unit. I'd also argue this is more user-friendly. The usages of SizeValidator() that do not specify a unit look to be using SizeValidator() incorrectly; for example, "hive.mv.files.thread" is not a size. I could skip SizeValidator and just accept an integer, but I think that would remove a user-friendly way of setting the value in favor of something more engineer-oriented.

   On both of the above points I can be swayed; I'm just sharing my thought process.
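To make the wiring above concrete, here is a minimal sketch (not the actual ThriftBinaryCLIService code) of how one size setting ends up feeding both quoted limits, assuming the four-argument TBinaryProtocol.Factory(strictRead, strictWrite, stringLengthLimit, containerLengthLimit) constructor in Thrift's Java library; the value shown is illustrative:

```java
import org.apache.thrift.protocol.TBinaryProtocol;

public class BinaryProtocolLimits {
  public static void main(String[] args) {
    // Illustrative value standing in for hive.server2.thrift.max.message.size.
    long maxMessageSize = 100L * 1024 * 1024;

    // The single setting is passed twice, becoming both the string/binary
    // length limit and the container (map/set/list) element-count limit
    // quoted from TBinaryProtocol above.
    TBinaryProtocol.Factory factory =
        new TBinaryProtocol.Factory(true, true, maxMessageSize, maxMessageSize);
  }
}
```

This is why re-using the same setting for a third, transport-level bytes-received limit would conflate limits that measure different things (bytes of one field, elements of one container, bytes of one whole message).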
Issue Time Tracking
-------------------

    Worklog Id:     (was: 817721)
    Time Spent:     40m  (was: 0.5h)

> Make thrift max message size configurable
> -----------------------------------------
>
>                 Key: HIVE-26633
>                 URL: https://issues.apache.org/jira/browse/HIVE-26633
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: John Sherman
>            Assignee: John Sherman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Since Thrift >= 0.14, Thrift enforces maximum message sizes through a
> TConfiguration object, as described here:
> https://github.com/apache/thrift/blob/master/doc/specs/thrift-tconfiguration.md
> By default, MaxMessageSize gets set to 100MB.
> As a result, HMS clients may be unable to retrieve certain metadata for
> tables with a large number of partitions or other large metadata.
> For example, on a cluster configured with Kerberos between HS2 and HMS,
> querying a large table (10k partitions, 200 columns with names of 200
> characters) results in this backtrace:
> {code:java}
> org.apache.thrift.transport.TTransportException: MaxMessageSize reached
>   at org.apache.thrift.transport.TEndpointTransport.countConsumedMessageBytes(TEndpointTransport.java:96)
>   at org.apache.thrift.transport.TMemoryInputTransport.read(TMemoryInputTransport.java:97)
>   at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:390)
>   at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:39)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
>   at org.apache.hadoop.hive.metastore.security.TFilterTransport.readAll(TFilterTransport.java:63)
>   at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464)
>   at org.apache.thrift.protocol.TBinaryProtocol.readByte(TBinaryProtocol.java:329)
>   at org.apache.thrift.protocol.TBinaryProtocol.readFieldBegin(TBinaryProtocol.java:273)
>   at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:461)
>   at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:454)
>   at org.apache.hadoop.hive.metastore.api.FieldSchema.read(FieldSchema.java:388)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1269)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1248)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor.read(StorageDescriptor.java:1110)
>   at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1270)
>   at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1205)
>   at org.apache.hadoop.hive.metastore.api.Partition.read(Partition.java:1062)
>   at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:420)
>   at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:399)
>   at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult.read(PartitionsByExprResult.java:335)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java)
> {code}
> Making this configurable (and defaulting to a higher value) would allow
> these tables to still be accessible.
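As an illustration of the TConfiguration mechanism the issue describes (a hedged sketch, not the patch itself), a client built against Thrift >= 0.14 can raise the limit by constructing a custom TConfiguration and handing it to the transport; the host and port below are hypothetical:

{code:java}
import org.apache.thrift.TConfiguration;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransportException;

public class LargeMessageClient {
  public static void main(String[] args) throws TTransportException {
    // The default MaxMessageSize is 100MB; metadata for a table with ~10k
    // partitions and very wide schemas can exceed it, so raise it to 1GB.
    TConfiguration conf = TConfiguration.custom()
        .setMaxMessageSize(1024 * 1024 * 1024)
        .build();

    // Hypothetical HMS endpoint; the TConfiguration governs every message
    // read through transports layered on this socket.
    TSocket socket = new TSocket(conf, "metastore-host", 9083);
  }
}
{code}

A Hive-side setting such as the proposed hive.thrift.max.message.size would feed a value like this into the transports Hive creates, instead of relying on Thrift's hard-coded default.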
--
This message was sent by Atlassian Jira
(v8.20.10#820010)