cxzl25 commented on pull request #30725: URL: https://github.com/apache/spark/pull/30725#issuecomment-745305483
> I see, thanks for the reference. So IIUC this patch is primarily targeting the `spark.hadoop.cloneConf = true` use case?

No. When `spark.hadoop.cloneConf=false`, `HadoopRDD#getPartitions` creates a `JobConf` and adds it to the `hadoopJobMetadata` cache. When the queried Hive table has a large number of partitions, many `JobConf` objects are created and added to the cache. If the driver memory is configured too small, the driver exhausts its heap and then spends its time in full GC.

If your Hadoop client version is 2.7 or above, or you apply the patch from [HADOOP-11209](https://issues.apache.org/jira/browse/HADOOP-11209), you can enable `spark.hadoop.cloneConf=true`; the driver then no longer accumulates so many `JobConf` objects.
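To make the memory pattern concrete, here is a minimal sketch of the caching behavior described above. It is a simplified stand-in, not Spark's actual code: `FakeJobConf` is a hypothetical placeholder for `org.apache.hadoop.mapred.JobConf`, and a plain mutable map stands in for Spark's internal `hadoopJobMetadata` cache (which in reality holds soft references and lives in `HadoopRDD`).

```scala
import scala.collection.mutable

object JobConfCacheSketch {
  // Hypothetical stand-in for org.apache.hadoop.mapred.JobConf;
  // a real JobConf carries a full copy of the Hadoop configuration.
  final case class FakeJobConf(cacheKey: String)

  // Simplified stand-in for HadoopRDD.hadoopJobMetadata
  // (the real cache uses soft-referenced values).
  val hadoopJobMetadata: mutable.Map[String, FakeJobConf] =
    mutable.Map.empty

  // With spark.hadoop.cloneConf=false, getPartitions-style code looks up
  // the JobConf in the cache and creates one on a miss. Querying a Hive
  // table with many partitions can populate many such entries, so the
  // driver heap grows with the number of cached JobConf objects.
  def getJobConf(cacheKey: String): FakeJobConf =
    hadoopJobMetadata.getOrElseUpdate(cacheKey, FakeJobConf(cacheKey))
}
```

The sketch only illustrates the growth pattern: one cached object per distinct cache key, retained for the lifetime of the cache rather than the lifetime of a single job.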