[ https://issues.apache.org/jira/browse/SPARK-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park resolved SPARK-10970.
-----------------------------------
    Resolution: Fixed

Closing the JIRA because this is fixed by SPARK-10679. SPARK-10679 addresses a different issue, but it also fixes this issue as a byproduct.

> Executors overload Hive metastore by making massive connections at execution time
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-10970
>                 URL: https://issues.apache.org/jira/browse/SPARK-10970
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>         Environment: Hive 1.2, Spark on YARN
>            Reporter: Cheolsoo Park
>            Priority: Critical
>
> This is a regression in Spark 1.5, more specifically after upgrading the Hive
> dependency to 1.2.
> HIVE-2573 introduced a new feature that allows users to register functions in a
> session. The problem is that it added a [static code block|https://github.com/apache/hive/blob/branch-1.2/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L164-L170]
> to Hive.java:
> {code}
> // register all permanent functions. need improvement
> static {
>   try {
>     reloadFunctions();
>   } catch (Exception e) {
>     LOG.warn("Failed to access metastore. This class should not accessed in runtime.",e);
>   }
> }
> {code}
> This code block is executed by every Spark executor in the cluster when HadoopRDD
> tries to access JobConf. So if a Spark job has high parallelism (e.g., 1000+),
> executors will hammer the HCat server, causing it to go down in the worst case.
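Every executor pays this cost because of how Java static initializers work. A class's static block runs once per class loader, at the moment the class is first initialized (for example, on the first call to one of its static methods). It does not run at application startup, and it does not run only in the driver. The sketch below is a hypothetical, self-contained stand-in, not Hive code:

* `FakeHive` plays the role of Hive.java.
* Its static block increments a counter where the real class calls reloadFunctions().
* `configureJobProperties()` stands in for the executor-side path through PlanUtils shown in the stack trace of this report.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: FakeHive stands in for org.apache.hadoop.hive.ql.metadata.Hive,
// and CONNECTIONS counts the metastore connections that reloadFunctions() would open.
public class StaticInitDemo {
    static final AtomicInteger CONNECTIONS = new AtomicInteger();

    static class FakeHive {
        // Runs exactly once per class loader, when FakeHive is first initialized
        // (here: on the first call to one of its static methods).
        static {
            CONNECTIONS.incrementAndGet(); // stands in for reloadFunctions()
        }

        // Hypothetical stand-in for the code path PlanUtils reaches on the executor.
        static void configureJobProperties() {
            // Needs no metastore access itself, but the static block has already run.
        }
    }

    public static void main(String[] args) {
        System.out.println("before first use: " + CONNECTIONS.get()); // prints 0
        FakeHive.configureJobProperties(); // triggers FakeHive's static initializer
        FakeHive.configureJobProperties(); // already initialized: counter unchanged
        System.out.println("after two calls: " + CONNECTIONS.get()); // prints 1
    }
}
```

Each executor is a separate JVM, so the counter is per executor. A job spread over 1000 executors would therefore run the static block, and attempt a metastore connection, roughly 1000 times, whether or not its tasks need any Hive functions.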
> Here is the stack trace I captured in an executor when it makes a connection to
> the Hive metastore:
> {code}
> 15/10/06 19:26:05 WARN conf.HiveConf: HiveConf of name hive.optimize.s3.query does not exist
> 15/10/06 19:26:05 INFO hive.metastore: XXX: java.lang.Thread.getStackTrace(Thread.java:1589)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:803)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:782)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:347)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.sql.hive.HadoopTableReader$anonfun$17.apply(TableReader.scala:322)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.sql.hive.HadoopTableReader$anonfun$17.apply(TableReader.scala:322)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.HadoopRDD$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.HadoopRDD$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: scala.Option.map(Option.scala:145)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:179)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.HadoopRDD$anon$1.<init>(HadoopRDD.scala:231)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:227)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:103)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:97)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:63)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.scheduler.Task.run(Task.scala:88)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 15/10/06 19:26:05 INFO hive.metastore: XXX: java.lang.Thread.run(Thread.java:745)
> 15/10/06 19:26:05 INFO hive.metastore: Trying to connect to metastore with URI thrift://admin.gateway.dataeng.netflix.net:11002
> {code}
> As can be seen, HadoopRDD tries to get JobConf in the executor, which in turn
> triggers Hive's static initializer ({{Hive.<clinit>}}) and invokes the
> {{reloadFunctions()}} function in Hive.java.
> What's worse, due to HIVE-10319, a single {{reloadFunctions()}} call ends up
> making hundreds of thrift calls to the Hive metastore if the metastore contains
> a large number of databases. So any Spark job can easily take down the HCat
> server in production.
> As a workaround, I forked Databricks' [Hive 1.2 repo|https://github.com/pwendell/hive/commits/release-1.2.1-spark],
> removed the static code block from Hive.java, and rebuilt Spark against this
> forked version of Hive. I don't know if there is a better way of fixing this problem.
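The two problems multiply: the static block runs in every executor JVM, and each run can issue many thrift calls. The back-of-the-envelope model below is illustrative only and rests on three assumptions:

* The per-reload cost of one call to list databases plus one call per database is a simplification based on the HIVE-10319 description, not on Hive's actual code.
* It ignores any per-function calls, so it likely under-counts.
* The cluster and metastore sizes are hypothetical.

```java
// Back-of-the-envelope model, not measured data: metastore calls triggered by one job
// when every executor JVM runs Hive's static reloadFunctions() block.
public class MetastoreLoadEstimate {

    // Assumption: one call to list databases plus one call per database
    // (simplified from the HIVE-10319 description); per-function calls are ignored.
    static long callsPerReload(int databases) {
        return 1L + databases;
    }

    // Assumption: the static block runs once per executor JVM, not once per task.
    static long callsPerJob(int executors, int databases) {
        return (long) executors * callsPerReload(databases);
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 1000 executors, 300 databases in the metastore.
        System.out.println(callsPerJob(1000, 300)); // prints 301000
    }
}
```

Even under these conservative assumptions, a single job's startup burst is in the hundreds of thousands of calls, all aimed at one metastore endpoint.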