Saif Addin created SPARK-21198:
----------------------------------

             Summary: SparkSession catalog is terribly slow
                 Key: SPARK-21198
                 URL: https://issues.apache.org/jira/browse/SPARK-21198
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Saif Addin


We have a considerably large Hive metastore and a Spark program that 
checks Hive data availability.

In Spark 1.x, we were using sqlContext.tableNames or sqlContext.sql() to go 
through Hive.
Once migrated to Spark 2.x, we switched over to SparkSession.catalog instead, 
but it turns out that both listDatabases() and listTables() take between 5 and 
20 minutes, depending on the database, to return results, using operations 
such as the following one:

spark.catalog.listTables(db).filter(_.isTemporary).map(_.name).collect

This makes the program unbearably slow at returning a list of tables.

I know we still have spark.sqlContext.tableNames as a workaround, but I 
assume it is going to be deprecated soon?
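
For reference, here is a minimal sketch of how the listing paths mentioned 
above could be timed side by side against one database. The object name 
CatalogTiming, the timed helper, and the fallback database name "default" are 
illustrative assumptions, not part of our actual program, and the numbers will 
depend entirely on the metastore being queried.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical harness for comparing table-listing paths; names are illustrative.
object CatalogTiming {

  // Evaluates `body` once and returns its result with elapsed wall-clock milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("catalog-timing")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._ // provides the String encoder used by map and as[String]

    // Database to inspect; "default" is only a placeholder.
    val db = args.headOption.getOrElse("default")

    // Path 1: the Spark 2.x catalog API described in this report.
    val (viaCatalog, catalogMs) = timed {
      spark.catalog.listTables(db).map(_.name).collect()
    }

    // Path 2: the older SQLContext helper used as a workaround.
    val (viaSqlContext, sqlContextMs) = timed {
      spark.sqlContext.tableNames(db)
    }

    // Path 3: plain SQL, reading only the tableName column.
    val (viaShowTables, showTablesMs) = timed {
      spark.sql(s"SHOW TABLES IN $db").select("tableName").as[String].collect()
    }

    println(s"catalog.listTables:    ${viaCatalog.length} tables in $catalogMs ms")
    println(s"sqlContext.tableNames: ${viaSqlContext.length} tables in $sqlContextMs ms")
    println(s"SHOW TABLES:           ${viaShowTables.length} tables in $showTablesMs ms")

    spark.stop()
  }
}
```

Each path is run only once here, so the first one may also pay for the 
initial metastore connection; running the comparison a second time would give 
a fairer picture.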



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
