Saif Addin created SPARK-21198:
----------------------------------

             Summary: SparkSession catalog is terribly slow
                 Key: SPARK-21198
                 URL: https://issues.apache.org/jira/browse/SPARK-21198
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Saif Addin

We have a considerably large Hive metastore and a Spark program that goes through Hive to check data availability.

In Spark 1.x, we were using sqlContext.tableNames or sqlContext.sql() to go through Hive. Once we migrated to Spark 2.x, we switched over to SparkSession.catalog instead, but it turns out that both listDatabases() and listTables() take between 5 and 20 minutes, depending on the database, to return results, using operations such as the following one:

    spark.catalog.listTables(db).filter(_.isTemporary).map(_.name).collect

This made the program unbearably slow at returning a list of tables.

I know we still have spark.sqlContext.tableNames as a workaround, but I am assuming this is going to be deprecated soon?